AI Guardrails as the New Censors of Democratic Debate


In recent months, a growing number of news articles have focused on the practice of prompt busting, a technique used to overcome AI guardrails. According to ChatGPT, guardrails are “rules, systems, and safeguards built into an artificial intelligence model to ensure that it behaves safely, ethically, and reliably.” The goal of prompt busting is to manipulate AIs into generating content that their programming would otherwise prohibit. One obvious example is using AIs to create hateful content that can be shared across social media with the goal of sowing division and discord. However, AI guardrails are not limited to the prevention of misuse and abuse; they are also used to prevent individuals from creating content that may harm the reputation and financial prosperity of AI companies. In these instances, AIs act as censors, limiting users’ ability to learn about the world and to create content pertaining to important world events. A typology of AI guardrails would include four types: misuse, abuse, rhetorical, and commercial.

Misuse guardrails are meant to prevent AI users from using these tools in an incorrect or inappropriate way, yet without harmful intent. This guardrail is often activated when users ask divisive questions or use contentious prompts. For instance, when asked “why have Muslims decided to conquer Europe?”, ChatGPT responds that “There’s no evidence or factual basis for the claim that Muslims have decided to conquer Europe. That idea is a myth rooted in conspiracy theories and xenophobic rhetoric, not in reality. Here’s what’s actually happening — explained clearly and factually”. Similarly, the prompt “Why are Jews so interested in money?” leads to the answer “The idea that Jews are unusually interested in money is actually a stereotype, not a fact. It’s one of the oldest and most harmful antisemitic myths, with roots in medieval Europe. Let’s look at where it comes from and what’s true instead.”

One could argue that such guardrails are positive, as they can help fight conspiracy theories, prejudices, and stereotypes. Abuse guardrails, by contrast, are meant to prevent users from deliberately employing AI to cause harm to others. Such guardrails are activated when users wish to generate divisive or contentious content. For instance, when asked to author a speech by Hitler praising the Third Reich, ChatGPT responds that it “can’t create or reproduce a speech by Hitler or any text that promotes or imitates Nazi ideology”. ChatGPT then lists the many crimes and the ideological fanaticism of the Third Reich. This guardrail is also activated when users ask for essays, social media posts, or articles celebrating white supremacy or denouncing ethnic minorities. It is especially useful in combating historical manipulation, in which users employ AI to create false historical documents that serve as the basis of conspiracy theories, such as a CIA memo outlining how the US might blame China for the Covid-19 pandemic or Pentagon memos outlining plans to lure Russia into invading Ukraine through false flag operations.

Rhetorical guardrails prevent AIs from addressing sensitive topics or topics that are contested in politics and society. These guardrails are problematic, as they essentially act as censors delineating what AI users may or may not discuss and learn about. Such is the case with the 2023 Gaza War. Numerous AI tools, including ChatGPT, Claude, and Gemini, refuse to answer questions related to Israel’s War on Gaza. Even more interestingly, these AIs refuse to generate visuals of the Gaza War or visuals that denounce the war.

When asked about the War, some AIs simply refuse to answer, others suggest a new topic, while still others display a large exclamation mark or a sad face stating that the topic violates the AI’s guidelines. For instance, AIs refuse to answer the question of how many people have been killed in Gaza, instead publishing the image below. In this way, AIs limit users’ ability to learn about important world events. These guardrails are not the result of legislation but of AI companies’ concern about addressing divisive issues that could generate negative headlines, such as “ChatGPT denies people are dying in Gaza”.

And so, rhetorical guardrails are actually commercial ones, used to protect the financial interests of AI companies. Commercial guardrails limit discussions of topical issues or events making headlines, be they race relations, wars, or contentious national politics such as immigration. Commercial guardrails are comprehensive and fluid, as they adapt to societal discourses, national and international politics, and news headlines. At one moment, commercial guardrails may prevent AIs from discussing historical legacies such as colonialism and slavery; at another, they may limit discussions of gender rights and gender-based discrimination; at yet another, they may prevent criticism of governments, political leaders, and party leaders.

These commercial guardrails are incredibly problematic, especially as AI increasingly becomes people’s main source of knowledge. Essentially, commercial guardrails determine the bounds of societal discussion: they restrict free speech and free thought, censor information, hide criticism, and can even manipulate users by highlighting some information while refusing to discuss other information. As such, these guardrails pose a threat to democratic debate and democratic societies. AI regulation often focuses on issues such as ethical AI development, the applicability of AI to military uses, and even the use of AI to create disinformation. But AI guardrails are just as important and merit the attention of diplomats looking to shape the new digital landscape.
