MECHANISMS FOR DETECTING HATE SPEECH USING ARTIFICIAL INTELLIGENCE (WITH EXAMPLES OF HATE SPEECH AGAINST ROMA)
Author: Anna Krainova

In today's digital age, platforms are increasingly relying on Artificial Intelligence (AI) to monitor and regulate online content. One of the key areas where AI plays a critical role is in detecting and removing hate speech.

If you type a hateful comment or ask a question based on obvious prejudice, your text will most likely be deleted automatically. For example, when I asked ChatGPT about a commonly held stereotype that Roma allegedly steal children, my question was instantly removed: such a prompt is offensive and violates the platform's policy. How does AI recognize hatred and intolerance?

AI is trained to detect hate speech through technologies such as machine learning, natural language processing, and sentiment analysis. A new profession has even emerged recently - the AI trainer - whose duties include “teaching” AI platforms how to monitor hate speech. Algorithms assess a text by analyzing word choice, syntax (the way sentences are built), and the general context.

One of the tools of AI “training” is machine learning, which develops algorithms that make predictions and decisions independently. First, a large amount of data is fed in; then significant features are selected from the data; only after that do the algorithms learn patterns. To put it simply, you need to show the AI enough examples labeled “hate speech” and “not hate speech” for it to find the differences between them and form a pattern for recognizing hate speech. Machine learning supports monitoring and detection by applying algorithms that analyze language features and identify words violating a platform’s policy.
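As a rough illustration of learning from labeled examples (a minimal sketch, not any platform’s actual system), here is a tiny Naive Bayes-style text classifier in Python. The toy phrases, labels, and thresholds are all invented for the example:

```python
from collections import Counter
import math

def train(examples):
    """Count word frequencies per label from (text, label) pairs."""
    counts = {"hate": Counter(), "ok": Counter()}
    totals = Counter()
    for text, label in examples:
        words = text.lower().split()
        counts[label].update(words)
        totals[label] += len(words)
    vocab = set(counts["hate"]) | set(counts["ok"])
    return counts, totals, vocab

def classify(text, counts, totals, vocab):
    """Pick the label with the higher Laplace-smoothed log-likelihood."""
    best_label, best_logp = None, float("-inf")
    for label in counts:
        logp = 0.0
        for word in text.lower().split():
            logp += math.log((counts[label][word] + 1) / (totals[label] + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Invented toy training data; real systems use large annotated corpora.
examples = [
    ("those people are all criminals", "hate"),
    ("they always lie and steal",      "hate"),
    ("the weather is nice today",      "ok"),
    ("we enjoyed the music festival",  "ok"),
]
model = train(examples)
print(classify("those people steal", *model))  # → "hate" (words match the learned pattern)
```

The classifier has never seen the exact sentence “those people steal”, but because those words occur only in the “hate speech” examples, the learned pattern flags it - exactly the generalization described above.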

The next technology to explain is natural language processing. It is how AI understands and works with human language not in theory but in practice, with all its nuances. This is how AI can spot hidden slurs and hatred.
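One basic natural language processing step is text normalization, which undoes common obfuscations (character substitutions, stretched letters, punctuation) before checking a blocklist. A minimal sketch in Python, where `badword` is a hypothetical placeholder standing in for a real slur:

```python
import re

# Hypothetical placeholder term; a real blocklist would hold actual slurs.
BLOCKLIST = {"badword"}

# Undo common character substitutions (leetspeak-style symbols back to letters).
SUBSTITUTIONS = str.maketrans("@4301$5", "aaeoiss")

def normalize(text):
    """Lowercase, reverse substitutions, strip symbols, collapse repeats."""
    text = text.lower().translate(SUBSTITUTIONS)
    text = re.sub(r"[^a-z\s]", "", text)      # drop leftover punctuation
    text = re.sub(r"(.)\1{2,}", r"\1", text)  # "baaaad" -> "bad"
    return text

def contains_slur(text):
    """True if any normalized word appears on the blocklist."""
    return any(word in BLOCKLIST for word in normalize(text).split())

print(contains_slur("b@dw0rd!!"))  # True: obfuscation is normalized away
```

A plain string match would miss “b@dw0rd”, but normalizing the text first exposes the hidden slur - the kind of nuance-handling described above.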

Sentiment analysis remains the problematic part, because AI can misinterpret a text due to cultural context, sarcasm, humor, slang, and similar factors. It is vital for identifying hate speech, and it is also one of the reasons AI might struggle to spot it, since such utterances are frequently based on cultural or situational context. For example, some offensive remarks build on recent local events that the AI knows nothing about and therefore cannot identify - but hate speech remains hate speech.
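A lexicon-based sentiment scorer makes this limitation concrete. The word list and scores below are invented for illustration; note how the sarcastic sentence scores as positive even though its intent is hostile:

```python
# Invented mini-lexicon; production systems use large weighted vocabularies.
LEXICON = {"great": 1, "love": 1, "wonderful": 1,
           "hate": -1, "awful": -1, "worst": -1}

def sentiment(text):
    """Sum the polarity of known words; a score above 0 reads as positive."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return sum(LEXICON.get(w, 0) for w in words)

print(sentiment("i hate this awful neighborhood"))                      # -2: flagged
print(sentiment("oh great, they moved in next door. just wonderful."))  # +2: sarcasm missed
```

The second sentence could be a hostile remark about new neighbors, yet word-level scoring reads it as friendly - exactly the sarcasm and context problem described above.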

In the end, these algorithms are combined to analyze the context and identify potential risks. By improving these aspects, AI could make a significant step forward in recognizing and tackling hate speech. When hate speech is identified, AI can add a disclaimer and invite users to have a conversation about the stereotypes involved.
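The end of that pipeline often amounts to a threshold policy over a model’s confidence score. The cutoffs and action names below are invented for the sketch, not any real platform’s values:

```python
def moderate(hate_probability):
    """Map a model's hate-speech probability to an action (illustrative thresholds)."""
    if hate_probability >= 0.9:
        return "remove"       # clear violation: delete the content
    if hate_probability >= 0.5:
        return "review"       # uncertain: escalate to a human moderator
    if hate_probability >= 0.2:
        return "disclaimer"   # borderline: attach context, invite discussion
    return "allow"

print(moderate(0.95))  # remove
print(moderate(0.30))  # disclaimer
```

The middle tiers matter most: rather than deleting everything, a platform can respond to borderline content with a disclaimer and a conversation about stereotypes, as described above.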

Besides hate speech (including hate speech against national minorities), there are several more topics prohibited for AI, such as illegal activities, harmful content (self-harm, eating disorders, etc.), inappropriate content, privacy violations, medical and financial advice, plagiarism, denial of historical events (in the case of Roma, denial of the Porajmos), harassment, and nationalism (for example, someone might argue that Roma cannot live in a particular country because they don’t belong to “our nation”).

By applying machine learning, natural language processing, and sentiment analysis, AI systems can be taught to spot and flag offensive material based on linguistic patterns and context. However, obstacles remain: the understanding of cultural nuances, sarcasm, and situational context must still improve, since these often lead to misinterpretation.