Good Forecasts, Bad Products
The people who have been selling forecasting the longest will tell you that the problem has not been accuracy. Cultivate Labs ran a decade-long prediction market inside the US intelligence community (decommissioned in 2020 for institutional reasons rather than accuracy ones), and the CTO writes: "In 10+ years, I can't remember a single time that someone told us they had issues with the forecasts not being accurate enough." The forecasts are good enough, and they are getting cheaper through automation. The product has been bad.
For all the investment, forecasting has not really changed how important decisions get made. Coefficient Giving, previously Open Philanthropy, has put over $50 million into the field across thirty-plus grants. Prediction markets, part of the extended forecasting community, have exploded over the past few years and raised billions. The biggest players do not seem to be optimizing for socially important information; Vitalik Buterin has written about this, and there is not much for me to add. Capabilities have kept improving: more and better forecasters, more platforms, and automated forecasting. If accuracy were the bottleneck, you would expect probability estimates to have started appearing in earnings calls.
This is not a new problem. Good Judgment Inc., the commercial spinoff of Tetlock's Good Judgment Project, has been selling tournament-vetted superforecaster work to governments, foundations, and enterprises for years. Hypermind has run prediction markets out of Paris since 2014, including an Open Philanthropy-funded AI forecasting tournament. INFER, now run as the RAND Forecasting Initiative, hosts ongoing geopolitical questions for the national-security and policy community. The work has been respected, but the impact has been muted relative to what the simulation companies have done in fifteen months. The reason is not that any of these groups are weak forecasters; it is that they have been delivering bad products. The newer generation of forecasting companies, with comparable or stronger capabilities, will have their chance to fix this.
I ran into this early myself. One of the earliest baseball exchanges I had, prompted by Philip Tetlock's Superforecasting, was with a now-retired senior team executive. His main point was that what I was describing was simply a new tool, and a new tool alone would not be enough to make a club more competitive. The difficulty would not be how capable the tool was, but getting the club to use it in meaningful ways. In baseball, the first use is straightforward: take these techniques, sharpen the scouting outputs, and sort a big board accordingly. Anything more expansive is much harder. Baseball has been quite thoughtful about how it makes predictions, and a lot of the value of Tetlock's work has since made its way into the sport. Higher-leverage institutions have not updated in the same way.
Part of the reason is that forecasting has not given them much to adopt. The gap between what these models can do and what is legible to decision-makers is wide. Mantic's blogpost Forecasting the Iran Crisis, together with its demo content of dashboards, predictions, and one-page reasoning summaries, is a great snapshot of current forecasting capability. The post delivers probability trajectories, but not to a particular decision-maker. The more tailored translation may well be private, which is fine, but the forecasting community has broadly built the dashboard without the recommendations, and without publicly showing the work that turns forecasts into decision-relevant value. Aaru's public work with EY, by contrast, delivered a synthetic panel of 3,600 respondents that replicated a six-month wealth research survey in a day, and in several cases predicted actual client behavior more accurately than the surveys themselves. The presentation looks like the kind of consulting deliverable these companies already buy. That is what a decision-relevant product looks like, and the forecasting companies have not built it yet.
Aaru and Simile are looking to deliver a lot of the same value as the forecasting companies, and they have been better at picking up customers and convincing VCs of their viability. Aaru builds synthetic populations, Simile runs agent-based organizational simulations, and Mantic delivers probability briefs; the products differ, but the market is the same: someone buying structured anticipation of the future. Aaru raised a $50 million Series A in December 2025 at a headline valuation near $1 billion, with customers including Accenture, EY, IPG, and political campaigns. Simile raised a $100 million Series A in February 2026 from Index Ventures, with customers including CVS (for example, simulations across nine thousand stores for shelf placement), Gallup, and Wealthfront. Simile, which builds on Joon Park's Generative Agents work, has positioned simulation as a decision-making tool rather than a prediction, and has found a much larger market for that framing than the forecasting companies have for theirs. Mantic, in the same fifteen months, raised a $4 million pre-seed.
Simulation is winning because it has built something decision-makers can use. A synthetic panel slots into a research process in a way that a probability trajectory does not, and that is what the customer is buying. The shape of the artifact and the workflow it sits in are not separate problems; they are the same problem. The simulation companies have been solving it from the start, delivering outputs on the customer's terms. The forecasting companies have been building for accuracy benchmarks and track records, which is necessary but not sufficient; the leading forecasting startups should now start to behave differently.
I tried this myself, in Sudan a few years ago, integrating judgmental forecasting into a political context. The forecasts, about funding availability and whether particular agreements would hold, fed into strategy meaningfully for a short period. Then I left, and it died, and the reason was not that the forecasts were wrong. My effort had gone into the generation side of the problem, getting people to forecast and aggregating outputs; I had not reconfigured the output into an artifact the institution could use on its own terms. The right artifact, in hindsight, would have been fortified policy proposals, with the forecasts and issue decompositions folded into the shape the committee already read.
There is an attractive set of customers who have been doing a lot of forecasting themselves: scenarios teams at the energy majors, hedge funds, participants in specialty insurance markets like Lloyd's of London, and the foreign ministries and policy planning staffs where geopolitics has long been a focus. The broader customer base is more diverse still.
A few of the botmakers at the top of the Metaculus leaderboard should be trying to build businesses. They are accurate enough that accuracy will not be the barrier, and that puts them in position to compete on the deployment end. What they need to build is what the simulation companies have already been building: a bench of Deployment Managers and Forward Deployed Engineers. These are the people who turn probabilities into a shape the customer's decision loop will accept.
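To make that concrete, here is a minimal sketch of the translation step, in Python. Everything in it is hypothetical: ForecastPoint, render_brief, and the action thresholds are illustrative stand-ins I made up, not any company's product or API. The point is only that the artifact leads with a recommendation tied to thresholds the customer agreed to in advance, rather than with a raw probability.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ForecastPoint:
    as_of: date
    probability: float  # P(event resolves yes) as of that date

def render_brief(question: str, trajectory: list[ForecastPoint],
                 act_above: float = 0.65, watch_above: float = 0.35) -> str:
    """Collapse a probability trajectory into the one-page shape a
    committee already reads: headline number, trend, recommendation."""
    first, latest = trajectory[0], trajectory[-1]
    delta = latest.probability - first.probability
    # The thresholds encode a decision rule the customer set in advance,
    # so the brief answers "what do we do" rather than "what is the number".
    if latest.probability >= act_above:
        rec = "ACT: trigger the contingency agreed for this question."
    elif latest.probability >= watch_above:
        rec = "WATCH: brief the principals; prepare but do not trigger."
    else:
        rec = "HOLD: no action; revisit at the next review cycle."
    trend = "rising" if delta > 0 else "falling" if delta < 0 else "flat"
    return (
        f"{question}\n"
        f"Current estimate: {latest.probability:.0%} as of {latest.as_of} "
        f"({trend}, {delta:+.0%} since {first.as_of})\n"
        f"Recommendation: {rec}"
    )

print(render_brief(
    "Will the ceasefire hold through Q3?",
    [ForecastPoint(date(2026, 1, 5), 0.55),
     ForecastPoint(date(2026, 2, 9), 0.38)],
))
```

The code is trivially simple by design. The Deployment Manager's actual work is negotiating those thresholds and the brief's shape with the customer, which is exactly the part the simulation companies have been staffing for.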
If you run a scenarios team, a hedge fund, a foreign ministry's analysis unit, or any other operation that depends on understanding where the world is going, it is worth your while to talk to one of the leading forecasting companies. Most of forecasting's impact is yet to come.