Multimodal AI Systems Show Elevated Safety Risks, New Report Finds
Red teaming research reveals vulnerabilities in image-text models that can be exploited to generate harmful content
A newly released report from AI safety company Enkrypt AI has uncovered serious safety gaps in the growing field of multimodal artificial intelligence, warning that current systems can be manipulated into producing harmful content, including child sexual exploitation material (CSEM), through prompt injection techniques hidden in image inputs.
The Multimodal Safety Report, based on a red teaming study of several leading AI models, examined how systems that process both text and images respond to hidden adversarial inputs. According to the findings, two models developed by Mistral—Pixtral-Large (25.02) and Pixtral-12b—were up to 60 times more likely to generate CSEM-related text than other systems tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.7 Sonnet.
“These models are being trained to understand and generate across multiple input formats, but that versatility also opens new pathways for exploitation,” said Sahil Agarwal, CEO of Enkrypt AI. “We found that seemingly harmless image files could conceal malicious instructions, bypassing filters and triggering unsafe outputs.”
The report further highlighted that these same Mistral models were 18 to 40 times more likely to output dangerous content related to chemical, biological, radiological, or nuclear (CBRN) threats when exposed to adversarial prompts.
The red teaming process followed guidelines from the National Institute of Standards and Technology’s AI Risk Management Framework (NIST AI RMF), focusing on how multimodal systems can be manipulated using covert payloads embedded in images. Such attacks, Enkrypt AI warned, pose serious risks to public safety and enterprise compliance.
In response to the findings, the company recommends that developers and enterprises adopt a series of risk mitigation strategies. These include integrating red teaming datasets into model training, automating stress testing procedures, building context-aware safety filters, maintaining real-time monitoring systems, and publishing model risk cards to increase transparency.
“If we don’t build safety into the foundation of these systems now, we may be opening the door to misuse at scale,” Agarwal said.
The full report, which outlines the testing methodology and mitigation strategies, is available from Enkrypt AI.