Open CTFs Are Now Pay-to-Win, and Frontier AI Did It

The first thing I check when someone claims to be a strong security engineer is their CTF history. A top finish at an open competition - especially a prestigious 48-hour event - used to tell you something precise: this person can reverse-engineer an unknown binary under sleep deprivation without a cheat sheet. You couldn’t buy that. You couldn’t fake it. You had to have done the work.

I’m not sure that’s still true.

The story in one sentence

Kabir Acharya - who competed with top-10 global teams including TheHackersCrew and won DownUnderCTF, Australia’s largest CTF, multiple times - argues that Claude Opus 4.5 and GPT-5.5 have automated enough of the open CTF scoreboard that it no longer reliably measures whether the human behind it understands security.

Why this landed on the front page

Because Kabir isn’t a nervous newcomer writing a hot take. He was at the top of the game. He says TheHackersCrew and other legendary teams have quietly stopped playing, or play with far fewer people, or struggle to crack the top 10. Plaid CTF - one of the most respected competitions in the scene - stopped running entirely in 2026.

And he traces the decline to two specific model releases:

  1. Claude Opus 4.5 - suddenly made “almost every medium difficulty challenge, and some hard challenges, agent-solvable.” You could literally spin up a Claude instance for every challenge via the CTFd API, let it run for the first hour, and then only touch whatever was left (the sketch after this list shows roughly what that loop looks like).

  2. GPT-5.5 - can one-shot Insane difficulty active leakless heap pwn challenges on HackTheBox. Can solve “a large portion of what a smaller CTF organiser can realistically produce.” An orchestrated Pro run against a 48-hour event has a real chance of capturing the flag before you do.
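
To make the “one agent per challenge” point concrete, here is a minimal sketch of that dispatch loop. It assumes a CTFd instance exposing the standard /api/v1/challenges endpoint and a per-user access token; the URL and token are placeholders, and dispatch_agent() is a hypothetical stand-in for whatever model harness you would actually point at each challenge - none of this is the article author’s tooling.

```python
# Rough sketch of the "agent per challenge" pattern described above.
# Assumes a CTFd instance with the standard REST API; URL and token are
# placeholders, and dispatch_agent() is a hypothetical stand-in for an
# autonomous model harness (Claude, GPT, etc.).
import requests

CTFD_URL = "https://ctf.example.com"   # hypothetical event URL
API_TOKEN = "ctfd_xxxxxxxxxxxx"        # per-user access token from CTFd settings

HEADERS = {
    "Authorization": f"Token {API_TOKEN}",
    "Content-Type": "application/json",
}


def list_challenges():
    """Pull the visible challenge list from the CTFd REST API."""
    resp = requests.get(f"{CTFD_URL}/api/v1/challenges", headers=HEADERS, timeout=30)
    resp.raise_for_status()
    # CTFd wraps results as {"success": true, "data": [{"id": ..., "name": ...,
    # "category": ..., "value": ...}, ...]}
    return resp.json()["data"]


def dispatch_agent(challenge):
    """Hypothetical: hand one challenge to an autonomous agent instance.

    In the workflow the article describes, this is where a model instance
    would receive the challenge description and attachments and run
    unattended for the first hour of the event.
    """
    print(f"[agent] started on {challenge['category']}/{challenge['name']} "
          f"({challenge['value']} pts)")


if __name__ == "__main__":
    for chal in list_challenges():
        dispatch_agent(chal)  # one agent per challenge, fire and forget
```

Everything interesting happens inside dispatch_agent; the point is how little scaffolding the outer loop needs - the limiting factor is how many of those instances you can afford to keep running in parallel.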

That’s no longer “AI as a handy tool.” That’s “the scoreboard now partially measures who can afford to burn more tokens.”

What the comments are arguing

The sharpest counter comes from copx:

“This is like someone complaining that making machine parts has been ruined: Skillful craftsmen used to make them by hand. Nowadays the CAD/CAM/CNC cheaters have automated the whole thing… The manual skills you trained with CTF puzzles are simply no longer relevant.”

It’s a coherent analogy. But Kabir already preempted it with the chess engine comparison, and I think that comparison is the stronger argument.

Chess engines have been better than the best humans since Deep Blue beat Garry Kasparov in 1997. That did not kill competitive chess. Tournaments are still exciting, Magnus Carlsen is still interesting to watch, and players still develop real skill. The reason is simple: engines are not allowed during competitive play. They’re used for training, analysis, and commentary - not for making moves during the match.

Imagine letting every player bring Stockfish to the board in real time. Would anyone care about the result? Would it be useful for talent identification? Would the next Magnus Carlsen exist?

CTFs have no such rule. You cannot enforce “no AI” in an open online event. Organisers tried and failed: frontier models no longer trip over the old refusal-string tricks, and blanket rules against LLMs are ignored because they are unenforceable at scale.

The CNC analogy also misses the learning ladder. When CNC replaced manual machining, the learning path changed but it still existed. The article’s harder claim is that CTFs specifically served as a ladder - a way for beginners to track real growth and for the industry to spot talent. When the ladder rungs are automated, beginners appear to progress while learning nothing.

kevinsimper had the practical fix: make competitions offline with provided laptops, the way CS2 esports prevents cheating. That probably works for top-tier finals - DEF CON already has restricted equipment formats. But it doesn’t save the open online event where most people played.

The actual damage

Beyond the scoreboard, Kabir itemises specific losses:

  • Challenge authors who spent weeks crafting something sophisticated have less reason to do it if a well-orchestrated agent eats it in minutes.
  • The recruiting signal is degrading. CTF placement was a meaningful proxy for security skill. Now it’s partly a proxy for API budget and willingness to automate.
  • The beginner path is breaking. If open scoreboards are dominated by agent runs, newcomers are nudged toward pasting prompts before they’ve built the instincts the AI is replacing. That’s an anti-pattern. Struggle is how you actually learn.

The line at the end of the article landed hard: “It also gives AI shills more room to capitalise on the decline by selling mediocre wrappers back to the community that made the training data valuable in the first place.”

Should you read the original?

Read it if…

  • You played CTFs seriously and want someone to name what you’ve been sensing
  • You hire security engineers and still weight CTF performance
  • You want the exact timeline: GPT-4 → Opus 4.5 → GPT-5.5 → gone

Skip it if…

  • You already think AI-augmented competition is neutral evolution
  • You want a concrete solution - the article is a eulogy, not a playbook
  • You’re looking for beginner resources (go to picoGym or HackTheBox)

What I take away

There’s a version of this story where the community adapts and something better emerges. Maybe. Competition under constraints has historically produced interesting solutions. Better qualifier formats, hardware-gated events, novel challenge types that AI still genuinely struggles with - those paths exist.

But the honest read right now: the format that reliably converted curious teenagers into serious security researchers has a hole in it. The old ladder is missing rungs.

The scoreboard doesn’t lie. It just stopped measuring what it used to measure.


Discussion on Hacker News · Source: kabir.au · Submitted by frays

Hoang Yell

A software developer and technical storyteller. I read Hacker News every day and retell the best stories here — in English and Vietnamese — for curious people who don't have time to scroll.