OpenAI releases new macOS app for agentic coding

AI is already having a major impact on how software is written, and much of the heavy lifting of programming is now performed by swarms of agents and subagents. But as developers experiment with new interfaces and form factors for human-AI collaboration, even the most advanced AI labs are finding it difficult to keep up.

The current trend is agentic software development, in which AI agents work on coding tasks independently, as exemplified by the Claude Code and Cowork apps. Meanwhile, OpenAI has been gradually building out Codex, which launched as a command-line tool last April and expanded to a web interface a month later.

Now, OpenAI is taking big steps to catch up. On Monday, the company released a new macOS app for Codex that integrates many of the agentic practices that have become popular over the past year. The new app is designed to run multiple agents in parallel and to integrate agent skills and other cutting-edge workflows. The launch comes less than two months after the release of OpenAI’s most powerful coding model, GPT-5.2-Codex, which the company hopes will be enough to win over Claude Code users.

“If you really want to do sophisticated work on complex things, 5.2 is the most powerful model we’ve ever had,” CEO Sam Altman told reporters at a press conference. “But it’s getting harder to use, so we think it’s going to be pretty important to build that level of model functionality into a more flexible interface.”

Altman’s confidence in GPT-5.2 is understandable, but the coding benchmarks tell a more complicated story. At the time of writing, GPT-5.2 holds the top spot on Terminal Bench, a test that measures how well an AI handles command-line programming tasks. However, agents powered by Gemini 3 and Claude Opus posted nearly identical scores: lower, but within the benchmark’s margin of error. Results from SWE Bench, another coding benchmark that tests an AI’s ability to fix bugs in real-world software, are similar and show no clear advantage for GPT-5.2. Agentic use cases are difficult to benchmark effectively, though, and state-of-the-art models can offer very different user experiences.

The Codex app also comes with a variety of new features, which OpenAI says match, and in some cases outperform, those of the various Claude apps. Users can set up automations that run in the background on a schedule, with the results queued for review when they return. They can also choose different personalities for their agents, from down-to-earth to empathetic, depending on their working style.

But the biggest selling point for the company is the speed of development made possible by AI. “You can use this from a clean sheet of paper to create very sophisticated software in a matter of hours,” Altman said. “The ability to input new ideas as quickly as possible is the limit of what you can build.”

