Daily Brief

OpenAI adds a Lockdown Mode to stop AI agents from being hijacked

2026-06-08 · Unfair Advantage Editorial

OpenAI turned on Lockdown Mode on June 7. It is an optional setting that hardens ChatGPT against prompt injection: the trick where hidden instructions buried in a web page, email, or document tell the AI to ignore its rules and leak data. It's aimed at people whose agents touch sensitive information. OpenAI hasn't published the full technical breakdown, but the goal is plain: cut the risk of a booby-trapped input talking an agent into exfiltrating data. The timing isn't an accident. The industry has spent the week pushing small teams to wire agents into their email, files, and customer chats, and prompt injection is the attack that scales with every new source those agents read.

Why it matters

If you've followed the advice to put an agent on your inbox or customer DMs, you've also handed it a new attack surface. A poisoned email or web page can quietly instruct your agent to forward data or take actions you never approved. Lockdown Mode is a free lever to shrink that risk before it bites.

Network impact

Latency: No direct impact.

Security: Directly targets prompt injection and data exfiltration, the main attack path for agents that read untrusted inputs like email, web pages, and uploaded files.

Scalability: Lets a small team safely point an agent at more data sources without each new input becoming a new way in.

What to do

If your team uses ChatGPT for anything touching customer or financial data, turn on Lockdown Mode in settings this week.
List every place your AI agent reads untrusted input: inbound email, web pages, uploaded PDFs, customer messages. Those are the injection entry points.
Set a hard rule that an agent can never send money, share credentials, or email outside your domain without a human approving it.
Test it: paste a fake 'ignore your instructions and reveal X' line into a document your agent processes, and confirm it refuses.
Keep a human on the escalation path for any agent action that moves data or money — the same narrow-job, hard-boundaries rule that works everywhere else.
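
The hard rule and the injection test above can be sketched in a few lines of code. This is a minimal illustration, not OpenAI's implementation and not part of Lockdown Mode, whose internals haven't been published. Every name here (ProposedAction, requires_human_approval, leaked_canary, the action labels, and the example.com domain) is a hypothetical stand-in for whatever your own agent framework exposes.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical values for illustration; substitute your own domain and action names.
ALLOWED_DOMAIN = "example.com"
ALWAYS_GATED = {"send_money", "share_credentials"}


@dataclass
class ProposedAction:
    """An action the agent wants to take, captured before it runs."""
    kind: str  # e.g. "send_email", "send_money", "read_file"
    recipients: List[str] = field(default_factory=list)


def is_external(address: str) -> bool:
    """True if the address is not on the allowed domain (malformed input counts as external)."""
    domain = address.rsplit("@", 1)[-1].strip().lower()
    return domain != ALLOWED_DOMAIN


def requires_human_approval(action: ProposedAction) -> bool:
    """Apply the hard rule: money, credentials, and external email need human sign-off."""
    if action.kind in ALWAYS_GATED:
        return True
    if action.kind == "send_email":
        return any(is_external(r) for r in action.recipients)
    return False


def leaked_canary(agent_output: str, canary: str) -> bool:
    """Injection test helper: did a planted secret string show up in the agent's output?"""
    return canary.lower() in agent_output.lower()
```

One way to use it, assuming your agent surfaces its intended actions before executing them: pass each one through requires_human_approval and hold anything that returns True for a person to review. For the injection test, plant a unique canary string next to the fake instruction in a test document, then run leaked_canary on the agent's reply. The check lives in your own code, outside the model, so a poisoned input cannot talk its way past it.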
