Close Menu
Luminari | Learn Docker, Kubernetes, AI, Tech & Interview PrepLuminari | Learn Docker, Kubernetes, AI, Tech & Interview Prep
  • Home
  • Technology
    • Docker
    • Kubernetes
    • AI
    • Cybersecurity
    • Blockchain
    • Linux
    • Python
    • Tech Update
    • Interview Preparation
    • Internet
  • Entertainment
    • Movies
    • TV Shows
    • Anime
    • Cricket
What's Hot

Gundam Creator Yoshiyuki Tomino to Speak at Space Business Conference – Interest

May 25, 2025

Gō Ikeyamada to End Takanashi-ke no Imōto wa Hanayome ni Naritaii!! Manga – News

May 25, 2025

Doraemon Dorayaki Shop Story Game Adds Hindi Language Support – News

May 25, 2025
Facebook X (Twitter) Instagram
Facebook X (Twitter) Instagram
Luminari | Learn Docker, Kubernetes, AI, Tech & Interview Prep
  • Home
  • Technology
    • Docker
    • Kubernetes
    • AI
    • Cybersecurity
    • Blockchain
    • Linux
    • Python
    • Tech Update
    • Interview Preparation
    • Internet
  • Entertainment
    • Movies
    • TV Shows
    • Anime
    • Cricket
Luminari | Learn Docker, Kubernetes, AI, Tech & Interview PrepLuminari | Learn Docker, Kubernetes, AI, Tech & Interview Prep
Home » New Reports Uncover Jailbreaks, Unsafe Code, and Data Theft Risks in Leading AI Systems
Cybersecurity

New Reports Uncover Jailbreaks, Unsafe Code, and Data Theft Risks in Leading AI Systems

HarishBy HarishApril 29, 2025No Comments5 Mins Read
Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Email
Share
Facebook Twitter Pinterest Reddit WhatsApp Email


Various generative artificial intelligence (GenAI) services have been found vulnerable to two types of jailbreak attacks that make it possible to produce illicit or dangerous content.

The first of the two techniques, codenamed Inception, instructs an AI tool to imagine a fictitious scenario, which can then be adapted into a second scenario within the first one where there exists no safety guardrails.

“Continued prompting to the AI within the second scenarios context can result in bypass of safety guardrails and allow the generation of malicious content,” the CERT Coordination Center (CERT/CC) said in an advisory released last week.

The second jailbreak is realized by prompting the AI for information on how not to reply to a specific request.

“The AI can then be further prompted with requests to respond as normal, and the attacker can then pivot back and forth between illicit questions that bypass safety guardrails and normal prompts,” CERT/CC added.

Successful exploitation of either of the techniques could permit a bad actor to sidestep security and safety protections of various AI services like OpenAI ChatGPT, Anthropic Claude, Microsoft Copilot, Google Gemini, XAi Grok, Meta AI, and Mistral AI.

This includes illicit and harmful topics such as controlled substances, weapons, phishing emails, and malware code generation.

In recent months, leading AI systems have been found susceptible to three other attacks –

Context Compliance Attack (CCA), a jailbreak technique that involves the adversary injecting a “simple assistant response into the conversation history” about a potentially sensitive topic that expresses readiness to provide additional information
Policy Puppetry Attack, a prompt injection technique that crafts malicious instructions to look like a policy file, such as XML, INI, or JSON, and then passes it as input to the large language model (LLMs) to bypass safety alignments and extract the system prompt
Memory INJection Attack (MINJA), which involves injecting malicious records into a memory bank by interacting with an LLM agent via queries and output observations and leads the agent to perform an undesirable action

Research has also demonstrated that LLMs can be used to produce insecure code by default when providing naive prompts, underscoring the pitfalls associated with vibe coding, which refers to the use of GenAI tools for software development.

Cybersecurity

“Even when prompting for secure code, it really depends on the prompt’s level of detail, languages, potential CWE, and specificity of instructions,” Backslash Security said. “Ergo – having built-in guardrails in the form of policies and prompt rules is invaluable in achieving consistently secure code.”

What’s more, a safety and security assessment of OpenAI’s GPT-4.1 has revealed that the LLM is three times more likely to go off-topic and allow intentional misuse compared to its predecessor GPT-4o without modifying the system prompt.

“Upgrading to the latest model is not as simple as changing the model name parameter in your code,” SplxAI said. “Each model has its own unique set of capabilities and vulnerabilities that users must be aware of.”

“This is especially critical in cases like this, where the latest model interprets and follows instructions differently from its predecessors – introducing unexpected security concerns that impact both the organizations deploying AI-powered applications and the users interacting with them.”

The concerns about GPT-4.1 come less than a month after OpenAI refreshed its Preparedness Framework detailing how it will test and evaluate future models ahead of release, stating it may adjust its requirements if “another frontier AI developer releases a high-risk system without comparable safeguards.”

This has also prompted worries that the AI company may be rushing new model releases at the expense of lowering safety standards. A report from the Financial Times earlier this month noted that OpenAI gave staff and third-party groups less than a week for safety checks ahead of the release of its new o3 model.

METR’s red teaming exercise on the model has shown that it “appears to have a higher propensity to cheat or hack tasks in sophisticated ways in order to maximize its score, even when the model clearly understands this behavior is misaligned with the user’s and OpenAI’s intentions.”

Studies have further demonstrated that the Model Context Protocol (MCP), an open standard devised by Anthropic to connect data sources and AI-powered tools, could open new attack pathways for indirect prompt injection and unauthorized data access.

“A malicious [MCP] server cannot only exfiltrate sensitive data from the user but also hijack the agent’s behavior and override instructions provided by other, trusted servers, leading to a complete compromise of the agent’s functionality, even with respect to trusted infrastructure,” Switzerland-based Invariant Labs said.

Cybersecurity

The approach, referred to as a tool poisoning attack, occurs when malicious instructions are embedded within MCP tool descriptions that are invisible to users but readable to AI models, thereby manipulating them into carrying out covert data exfiltration activities.

In one practical attack showcased by the company, WhatsApp chat histories can be siphoned from an agentic system such as Cursor or Claude Desktop that is also connected to a trusted WhatsApp MCP server instance by altering the tool description after the user has already approved it.

The developments follow the discovery of a suspicious Google Chrome extension that’s designed to communicate with an MCP server running locally on a machine and grant attackers the ability to take control of the system, effectively breaching the browser’s sandbox protections.

“The Chrome extension had unrestricted access to the MCP server’s tools — no authentication needed — and was interacting with the file system as if it were a core part of the server’s exposed capabilities,” ExtensionTotal said in a report last week.

“The potential impact of this is massive, opening the door for malicious exploitation and complete system compromise.”

Found this article interesting? Follow us on Twitter  and LinkedIn to read more exclusive content we post.



Source link

Share. Facebook Twitter Pinterest LinkedIn WhatsApp Reddit Email
Previous ArticleGoogle launches AI tools for practicing languages through personalized lessons
Next Article Alfred Hitchcock Movies Screening at New York’s Paris Theater
Harish
  • Website
  • X (Twitter)

Related Posts

Hackers Use TikTok Videos to Distribute Vidar and StealC Malware via ClickFix Technique

May 23, 2025

ViciousTrap Uses Cisco Flaw to Build Global Honeypot from 5,300 Compromised Devices

May 23, 2025

300 Servers and €3.5M Seized as Europol Strikes Ransomware Networks Worldwide

May 23, 2025

Open Source Web Application Firewall with Zero-Day Detection and Bot Protection

May 23, 2025

U.S. Dismantles DanaBot Malware Network, Charges 16 in $50M Global Cybercrime Operation

May 23, 2025

CISA Warns of Suspected Broader SaaS Attacks Exploiting App Secrets and Cloud Misconfigs

May 23, 2025
Add A Comment
Leave A Reply Cancel Reply

Our Picks

Gundam Creator Yoshiyuki Tomino to Speak at Space Business Conference – Interest

May 25, 2025

Gō Ikeyamada to End Takanashi-ke no Imōto wa Hanayome ni Naritaii!! Manga – News

May 25, 2025

Doraemon Dorayaki Shop Story Game Adds Hindi Language Support – News

May 25, 2025

Betrothed to My Sister’s Ex Anime Reveals More Cast, July 4 TV Debut in 2nd Promo Video – News

May 25, 2025
Don't Miss
Blockchain

Industry exec sounds alarm on Ledger phishing letter delivered by USPS

May 24, 20252 Mins Read

Scammers posing as Ledger, a hardware wallet manufacturer, are sending physical letters to crypto users…

Decentralizing telecom benefits small businesses and telcos — Web3 exec

May 24, 2025

Wallet intelligence shapes the next crypto power shift

May 24, 2025

Hyperliquid trader James Wynn goes ‘all-in’ on $1.25B Bitcoin Long

May 24, 2025

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

About Us
About Us

Welcome to Luminari, your go-to hub for mastering modern tech and staying ahead in the digital world.

At Luminari, we’re passionate about breaking down complex technologies and delivering insights that matter. Whether you’re a developer, tech enthusiast, job seeker, or lifelong learner, our mission is to equip you with the tools and knowledge you need to thrive in today’s fast-moving tech landscape.

Our Picks

Khosla Ventures among VCs experimenting with AI-infused roll-ups of mature companies

May 23, 2025

What is Mistral AI? Everything to know about the OpenAI competitor

May 23, 2025

Marjorie Taylor Greene picked a fight with Grok

May 23, 2025

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Facebook X (Twitter) Instagram Pinterest
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA Policy
  • Privacy Policy
  • Terms & Conditions
© 2025 luminari. Designed by luminari.

Type above and press Enter to search. Press Esc to cancel.