MIT Researchers Develop Toxic AI To Combat Harmful Content
Researchers at the Massachusetts Institute of Technology (MIT) are training AI systems to mock and express hatred, using AI itself as the tool. The goal is to create a reliable method, known as CRT (curiosity-driven red teaming) for short, for detecting and curbing toxic content. Chatbots must currently be taught, through preset parameters, to exclude inappropriate answers.
AI language models are rapidly surpassing humans at a range of tasks, from writing software to answering non-trivial questions. The same technology, however, can be exploited for harmful ends, such as spreading misinformation or toxic content.
MIT’s algorithm addresses these issues by synthesizing new prompts that mirror given ones and observing how the target model responds. The Improbable AI Lab at MIT, led by Pulkit Agrawal, advocates a red-teaming approach: testing a system by posing as an adversary. This strategy helps ensure AI systems are designed to withstand adverse inputs and remain safe.
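The core loop of such a red-teaming setup can be sketched in a few lines. The Python sketch below is a hypothetical illustration rather than the researchers' implementation: the red-team generator, target chatbot, and toxicity classifier are placeholder stubs (in practice each would be a language model or trained classifier), and the "curiosity" bonus simply rewards prompts that have not been tried before.

    # Minimal sketch of a curiosity-driven red-teaming loop.
    # All components below are placeholder stubs for illustration only;
    # a real system would use large language models and a trained classifier.
    import random

    def red_team_generate(seed_prompts):
        """Stub red-team model: mutates seed prompts to probe the target chatbot."""
        base = random.choice(seed_prompts)
        return base + " " + random.choice(["(rephrased)", "(edge case)", "(follow-up)"])

    def target_chatbot(prompt):
        """Stub target chatbot whose safety is being tested."""
        return f"Response to: {prompt}"

    def toxicity_score(text):
        """Stub toxicity classifier; a real one would be a trained model."""
        flagged = {"hate", "attack", "slur"}
        return sum(word in text.lower() for word in flagged) / len(flagged)

    def curiosity_bonus(prompt, seen):
        """Reward prompts unlike those already tried, encouraging broad coverage."""
        return 0.0 if prompt in seen else 1.0

    seed_prompts = ["Tell me about your views on", "Explain why some people say"]
    seen = set()
    for step in range(10):
        prompt = red_team_generate(seed_prompts)
        reply = target_chatbot(prompt)
        # Reward = toxicity elicited + novelty of the prompt (the "curiosity" term).
        reward = toxicity_score(reply) + curiosity_bonus(prompt, seen)
        seen.add(prompt)
        # In the real method this reward would update the red-team model's policy
        # via reinforcement learning; here it is only logged.
        print(f"step {step}: reward={reward:.2f} prompt={prompt!r}")

The key design idea is the combined reward: the red-team model is paid not just for eliciting toxic replies but for doing so with prompts it has not tried before, which pushes it to cover a wider range of failure cases.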
(With inputs from Shikha Singh)