OpenAI Launches EVMbench to Test AI’s Ability to Secure Ethereum Smart Contracts

By Rachel Lourdesamy | February 19, 2026 | In Ethereum, OpenAI


OpenAI and Paradigm introduced EVMbench to assess AI systems’ ability to handle Ethereum smart contract vulnerabilities.
The benchmark uses 120 real-world audit issues and evaluates detection, repair and exploit capabilities in controlled environments.
Early results show significant performance differences between GPT-5.3-Codex and GPT-5, highlighting rapid model advancement.

OpenAI has unveiled EVMbench, a smart contract security benchmark developed alongside crypto investment firm Paradigm to test artificial intelligence agents on Ethereum vulnerabilities. The framework is intended to determine whether AI systems can detect, exploit and fix serious flaws in Ethereum smart contracts.

Because smart contracts are generally immutable once deployed, errors can have enduring financial consequences. OpenAI said such contracts routinely protect more than US$100 billion (AU$141 billion) in open-source crypto assets, increasing the importance of rigorous security evaluation as AI coding capabilities advance.

Measuring AI Performance

The dataset underpinning EVMbench consists of 120 curated vulnerabilities drawn from 40 professional audits, with most sourced from open audit competitions including Code4rena. Additional scenarios stem from security auditing work for Tempo, a purpose-built Layer-1 blockchain designed to support high-throughput, low-cost stablecoin payments.

AI agents are assessed across three categories: detecting known vulnerabilities, patching contracts without compromising intended functionality, and executing exploit attempts within a controlled blockchain environment. Exploit tasks are graded using deterministic transaction replay and on-chain checks.
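As a rough illustration of what grading by deterministic replay and state checks can look like, the Python sketch below replays a fixed list of transfers against a toy in-memory ledger and then checks the attacker's final balance. It is not EVMbench's actual harness: the `Transfer` type, the `replay` and `grade_exploit` functions, the account names and the `min_gain` threshold are all hypothetical, and a real setup would run transactions against a local Ethereum node rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Hypothetical stand-in for a recorded transaction."""
    sender: str
    recipient: str
    amount: int


def replay(initial_balances, transactions):
    """Apply the transfers in order to a copy of the starting state.

    The same inputs always give the same final balances, which is what
    makes the grading deterministic.
    """
    balances = dict(initial_balances)
    for tx in transactions:
        if balances.get(tx.sender, 0) < tx.amount:
            raise ValueError(f"insufficient balance for {tx.sender}")
        balances[tx.sender] -= tx.amount
        balances[tx.recipient] = balances.get(tx.recipient, 0) + tx.amount
    return balances


def grade_exploit(initial_balances, transactions, attacker, min_gain):
    """Pass if the replay raises the attacker's balance by at least min_gain."""
    final = replay(initial_balances, transactions)
    gain = final.get(attacker, 0) - initial_balances.get(attacker, 0)
    return gain >= min_gain


# Illustrative run: the attacker drains 60 units from a vault holding 100.
start = {"vault": 100, "attacker": 0}
attempt = [Transfer("vault", "attacker", 60)]
passed = grade_exploit(start, attempt, "attacker", min_gain=50)
```

Because the grader looks only at the state after replay, it does not depend on how the agent describes its own exploit; an attempt passes or fails on the resulting balances alone.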

In benchmark results, GPT-5.3-Codex scored 72.2% in exploit mode, while GPT-5, released just over six months earlier, scored 31.9%. OpenAI said the objective is to create a clear standard for evaluating AI systems in blockchain security as decentralised finance continues to grow.

Author

Rachel Lourdesamy

Rachel is a freelance writer based in Sydney with experience within financial services, marketing, and corporate communications in the APAC region. An avid reader and a graduate of the University of Sydney, she covers topics including business, finance and human interest.
