Why behind AI: OpenAI DevDay 2025
Welcome to the first edition of Why behind AI, where I’ll focus on translating the tech aspects of AI into applied value. This is a more technical deep dive and will include a lot of external content, as it's important to have details explained from a variety of viewpoints.
Today, I'll go over OpenAI DevDay 2025, the annual conference by OpenAI focused on its developer ecosystem. DevDay matters because it showcases many of the tools and ideas for building on top of OpenAI products. I've previously covered OpenAI's business strategy in detail here, around the launch of GPT-5. My key insights from that article:
It's clear that GPT-5 is an attempt to put efficiency and outcome-based pricing at the forefront of the product. This is needed both because it improves their hand in case things get more contentious with Microsoft, but also because it should give them a little more breathing room when 20% of that revenue is going to Microsoft, essentially killing any chance of a positive margin.
On the product side they need to achieve 3 outcomes:
Towards (pro)consumers: Improve average revenue per customer and reduce the cost of subsidizing free usage. The best way to achieve this is to minimize unproductive usage (essentially the companion aspect of 4o usage), push as many queries as possible to lower compute configurations without a significant penalty on retention, and push paying users into the Pro tier. GPT-5 is clearly aimed in this direction, particularly through the significant reduction of the Plus subscription benefits. Both the message limits and the context window on Plus have been reduced to the point where it's no longer useful as a primary subscription for heavy users. When you also account for the fact that the best performance comes from GPT-5 Pro, the need for the highest subscription is obvious.
Towards developers: Developers predominantly need to use the API and will do so through application layers with quality of life improvements. There is a reason why the Cursor team was positioned quite heavily in the presentation. The problem here is that Claude Code appears to be strongly preferred as a primary tool by developers for agentic workflows (the most token-consuming ones). Hence what looks like a joint play with Cursor to offer GPT-5 as the best model for the newly launched Cursor CLI agent (free usage the first days of the launch). Whether this will be a successful strategy is yet to be seen, but Anthropic is on pace to reach 40% of OpenAI's revenue this year thanks to dominating with developers and this use case is too important to play catch-up.
Towards businesses/Enterprise: This is a highly awkward product currently. Revenue in the range of $1B is negligible at their size, and Microsoft offers essentially an equivalent product in terms of average outcomes through Copilot for Business. The most confusing part is that Enterprise usage doesn't include the Pro model and the increased limits on Deep Research, both of which are the essential killer apps for pro users. Google pulled a similar confusing feat by launching an Ultra plan that's not available for businesses and includes their Deep Thinking mode, which is highly competitive. If GPT-5 is meant to improve outcomes in this direction, it's difficult to see how.
More importantly, for OpenAI to "escape the Microsoft death grip" and establish itself as the most valuable company long-term in tech, it needs to win in all 3 directions. The potential for this lies with launching their own browser and productivity applications. I spoke about this last week when discussing Google, who are arguably the biggest competitor today to OpenAI.
This is critical due to the business realities of models:
1. Last year, the research lab trained a model, spending Y dollars on it.
2. This year, the product and sales teams monetize that model across the estate, ideally for Y + profit.
3. In parallel, the research lab is training a new model this year, which likely costs 2x or 3x Y.
4. Next year, the current model will be retired (ending its revenue contribution) and the new model needs to monetize at least its training cost + profit.
To become profitable, research labs need to reach a point where they can monetize each model for longer before spending massive R&D costs on the next one. Most importantly, if they do spend heavily on R&D, the updated model also needs to be meaningfully better in order to recoup the investment.
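The hot-potato dynamic in the four steps above can be sketched numerically. All figures below (a $1B training cost for Y, 2.5x cost growth per generation, a 20% target margin) are placeholder assumptions for illustration, not OpenAI's actual economics:

```python
# Hypothetical illustration of the model "hot potato" economics described above.
# All figures are assumptions for the sake of the example, not real lab costs.

def required_revenue(train_cost: float, cost_growth: float, margin: float) -> list[float]:
    """Revenue each model generation must earn (training cost + target margin),
    assuming each new model costs `cost_growth`x its predecessor to train."""
    revenues = []
    for _generation in range(3):               # three model generations
        revenues.append(train_cost * (1 + margin))
        train_cost *= cost_growth              # next model costs 2-3x more to train
    return revenues

# Assume Y = $1B training cost, 2.5x cost growth per generation, 20% target margin.
print(required_revenue(1.0, 2.5, 0.20))        # revenue needed per generation, in $B
```

Even with these mild assumptions, the revenue each generation must clear grows geometrically, which is why a bigger monetization surface (Google's advantage) matters so much.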
The reason why Google is the best positioned company in this little game of hot potato is because they can productize new models across a much bigger surface and drive revenue from the model.
The reason why OpenAI is going so hard directly against them is because their best bet is to essentially outplay Google on its own turf, potentially capturing enough business from search, browsers, and work applications that Google can't outlast them by being able to soak up the R&D of less successful model launches in the future.
Apps within ChatGPT
Today we’re introducing a new generation of apps you can chat with, right inside ChatGPT. Developers can start building them today with the new Apps SDK, available in preview.
Apps in ChatGPT fit naturally into conversation. You can discover them when ChatGPT suggests one at the right time, or by calling them by name. Apps respond to natural language and include interactive interfaces you can use right in the chat.
For ChatGPT users, apps meet you in the chat and adapt to your context to help you create, learn, and do more. For developers, building with the Apps SDK makes it possible to reach over 800 million ChatGPT users at just the right time.
This is the third time that OpenAI is trying to make interactions within ChatGPT extend beyond the core interaction with the models. There are two factors at play here.
The biggest moat for OpenAI, besides, you know, having the best models on the market, is the 800M users and the increasing amount of time they are spending within the app. This has created the first real competition to both Google and Meta in a decade, outside of TikTok. To drive higher user retention within the app, they want to extend the potential use cases, which means allowing for other types of experiences.
For the companies offering their services natively within ChatGPT, the motivation is the rising odds that, long term, most usage of their services will happen through APIs rather than through their native interfaces. This is an attempt to get ahead of that shift and establish early dominance.
When you start a message to ChatGPT with the name of an available app, like “Spotify, make a playlist for my party this Friday,” ChatGPT can automatically surface the app in your chat and use relevant context to help. The first time you use an app, ChatGPT will prompt you to connect so you know what data may be shared with the app.
ChatGPT can also suggest apps when they’re relevant to the conversation. For example, if you’re talking about buying a new home, ChatGPT can surface the Zillow app as a suggestion so you can browse listings that match your budget on an interactive map right inside ChatGPT.
The magic of this new generation of apps in ChatGPT is how they blend familiar interactive elements–like maps, playlists and presentations–with new ways of interacting through conversation. You can start with an outline and ask Canva to transform it into a slide deck, or take a course with Coursera and ask ChatGPT to elaborate on something in the video as you watch.
AgentKit
With AgentKit, developers can now design workflows visually and embed agentic UIs faster using new building blocks like:
Agent Builder: a visual canvas for creating and versioning multi-agent workflows
Connector Registry: a central place for admins to manage how data and tools connect across OpenAI products
ChatKit: a toolkit for embedding customizable chat-based agent experiences in your product
We’re also expanding evaluation capabilities with new features like datasets, trace grading, automated prompt optimization, and third-party model support to measure and improve agent performance.
Source: HubSpot support agent as shown on OpenAI’s blog
AgentKit is an AIOps play that should make it easier to build agentic workflows across a variety of organizations. It's seen as direct competition to tools that sit in the middle between no code and some code required, e.g., n8n.
OpenAI defines an agent as "a system that can do work independently on behalf of the user." This is a bit generic compared to the terminology currently used amongst a lot of early adopters as introduced in #113, which was “An LLM agent runs tools in a loop to achieve a goal.”
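The #113 definition can be made concrete with a minimal sketch: a loop in which the model either requests a tool call or returns a final answer. Everything here is an illustration of the pattern, not any particular SDK; `fake_llm` is a stand-in for a real model API call, and the tool names are made up:

```python
# Minimal "LLM agent" loop: tools run in a loop until the goal is achieved.
# `fake_llm` is a stand-in for a real model API; tool names are hypothetical.

def calculator(expression: str) -> str:
    return str(eval(expression))  # toy tool; never eval untrusted input in practice

TOOLS = {"calculator": calculator}

def fake_llm(goal: str, observations: list[str]) -> dict:
    """Pretend model: asks for the calculator once, then answers."""
    if not observations:
        return {"tool": "calculator", "args": "6 * 7"}
    return {"answer": f"The result is {observations[-1]}"}

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):           # tools in a loop...
        action = fake_llm(goal, observations)
        if "answer" in action:           # ...until the model declares the goal met
            return action["answer"]
        result = TOOLS[action["tool"]](action["args"])
        observations.append(result)      # feed the tool result back to the model
    return "Gave up after max_steps."

print(run_agent("What is 6 times 7?"))  # → The result is 42
```

OpenAI's broader "works independently on behalf of the user" definition covers this loop but also much fuzzier systems, which is why the early-adopter definition is the more useful one for builders.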
"We saved over two weeks of time building a support agent for our Canva Developers community with ChatKit, and integrated it in less than an hour. This support agent will transform the way developers engage with our docs by turning it into a conversational experience, making it easy to build apps and integrations on Canva."
Canva
"By adopting ChatKit, we developed UI for an AI agent in a day, reducing development time costs by as much as 80%. The agent streamlines compliance workflows, saving customers time on tedious and routine tasks."
LegalOn
"ChatKit saved us weeks of custom front-end work, making it easy for us to prototype enhancements to the UI of HubSpot’s Breeze Assistant and Agents. With the custom response widget, our agents can deliver interactive, guided solutions instead of static replies."
HubSpot
“Evals allowed us to measure and improve the performance of our new sales agent that we built in the Agent Builder. We can test each component individually and the whole agent end-to-end to identify failures and get confident in performance. This reduced our evaluation timelines by 40% while also significantly boosting agent performance.”
Rippling
The early testimonials are quite good, showing that this is a robust framework for end-to-end workflows, from designing agents to evaluating and securing them. The timeline (i.e., the AI early adopters and builders on X) was in disarray, since this launch is likely to push a lot of recently funded agentic companies into a corner.
API gets new models
gpt-5-pro (model)
gpt-audio-mini-2025-10-06 (model)
gpt-realtime-mini-2025-10-06 (model) (70% cheaper)
gpt-image-1-mini (model) (80% cheaper)
sora-2 (model)
sora-2-pro (model)
GPT-5 Pro is the flagship model whose API access was delayed; until now it was only available on the highest subscription tier ($200/month). It's my preferred model for complex analysis and what I consider the best showcase of what advanced models are capable of.
The two audio models are there to power the agents, with a more complex one and a significantly cheaper alternative.
The image model takes the generator behind ChatGPT's image creation and offers it at a significantly lower price point.
Sora 2 caused a big stir over the last week as the best-in-class video generation model, and it had no guardrails for the first three days until they "nerfed" it. Users were able to upload movie clips and swap faces in them; this capability has now been removed, and attempting it risks a ban of the user's entire ChatGPT account if detected. The whole launch was a provocative play against the copyright lobby, since it foregrounds that users want to be creative without limitations, while legacy media and publishing houses will do everything they can to prevent this.
Personally, I think that, long term, either the rights holders will have to cave and allow the frontier model labs to offer uncapped models, or users themselves will move away from existing IP (which has been exploited a bit too much, as the last 10 years of cinema history show).
So did DevDay 2025 move the needle? From a hype perspective, no. There were no wow moments, particularly since the other launch was Codex CLI going GA (generally available). That by itself would have been massive had it launched for the first time today, given the current state of affairs (i.e., it's better than Claude Code). Unfortunately, all the power users have already been building with it for a month, so there was nothing to actually pivot on.
From a practical, business perspective, this was a strong build-up on top of the work that started with the GPT-5 router and the overall focus on efficiency. Together with the massive effort of building out infrastructure that will either be owned by OpenAI or leased on favourable terms, there is a clear play here to become the monopoly winner of monetization on the path to AGI.
The Information had an interesting article related to that from the perspective of Founders Fund and Peter Thiel’s investments in OpenAI:
Thiel used the 20-year-old firm’s annual general meeting last month to lay out its plan to concentrate its investments in a handful of AI companies. Speaking to the firm’s hundreds of limited partners and Silicon Valley luminaries in a video presentation, Founders Fund partners outlined plans to stand behind a single company operating at each level of the AI business, according to notes prepared by an attendee and people familiar with the meeting.
One of them is OpenAI. The company’s CEO, Sam Altman, took the stage with Thiel to promote their shared view that the winners in AI will be companies that can get to enormous scale the fastest. That strategy justifies Founders Fund’s recent roughly $1 billion investment in OpenAI at a $300 billion valuation when including more than $40 billion in new capital, one of the biggest investments in the firm’s history.
The OpenAI investment fits with the longtime strategy of Founders Fund, led by Thiel and a handful of other partners, to concentrate funds into a small number of its favorite companies, such as defense contractor Anduril. It has profited from big early bets on companies including Airbnb and Palantir and has developed a reputation for going against the grain, making an investment in SpaceX when others were not eager to put money behind rocket launches.
Source: The Information
At the scale that OpenAI operates today, the FF investment is pocket money. That doesn't mean Thiel's first-principles advice on how to scale will be ignored. It's clear that OpenAI is making a full effort across all three domains of its work: research on cutting-edge models, building on top of its consumer lead to become the "chat app" of this technology shift, and an aggressive push to differentiate in the B2B space away from Microsoft and into a leadership position. Both Codex CLI and the Agent SDKs are a clear sign that they not only want to play in developer tools but actually win.
Source: Polymarket on which company will have the best AI model at the end of the year
All eyes are now on Google, as they need a strong response both on efficiency (flash models) and performance (Gemini Pro).
Source: OpenRouter graph on market share
OpenRouter is currently the "default" place for integrating and testing new models by developers outside of the hyperscalers or the APIs by the frontier labs themselves. Since the hyperscalers do not release breakdowns, this is one of the best tools we have to gauge the shift in early adoption of specific models.
More recently, xAI has taken a leading position thanks to Grok Fast, which I covered in depth in my article on the frontier lab. Anthropic has been losing ground here, partly because much of their focus has been on driving inference through AWS, and partly because they have had repeated performance issues (on top of being the most expensive model of the bunch).
OpenAI has started to win back adoption, and these new announcements are all aimed at increasing API consumption. As such, a more interesting metric to track is probably not who has the best frontier model on benchmarks, but actual ARR.