<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Anthropic Research</title>
<link>https://www.anthropic.com/research</link>
<description>Latest research publications from Anthropic</description>
<language>en</language>
<lastBuildDate>Tue, 02 Jun 2026 04:04:20 +0000</lastBuildDate>
<generator>ForgeRSS</generator>
<atom:link href="https://www.anthropic.com/research" rel="self" type="application/rss+xml"/>
<image>
  <url>https://www.anthropic.com/favicon.ico</url>
  <title>Anthropic Research</title>
  <link>https://www.anthropic.com/research</link>
</image>
<item>
  <title>Coding agents in the social sciences</title>
  <link>https://www.anthropic.com/research/coding-agents-social-sciences</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/coding-agents-social-sciences</guid>
  <pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">We present results from a survey of 1,260 social scientists about AI and coding agent use, fielded in February and March 2026.

The vast majority of respondents (81%) have tried using AI chatbots in research, particularly for writing code and editing prose. But only 20% have adopted coding agents—tools like Claude Code that autonomously write and execute analysis code—into their work.

There are sharp disparities in use of coding agents. Twice as many researchers with typically male names use co...</p><div style="font-size:16px;line-height:1.8;color:#333">We present results from a survey of 1,260 social scientists about AI and coding agent use, fielded in February and March 2026.

The vast majority of respondents (81%) have tried using AI chatbots in research, particularly for writing code and editing prose. But only 20% have adopted coding agents—tools like Claude Code that autonomously write and execute analysis code—into their work.

There are sharp disparities in use of coding agents. Twice as many researchers with typically male names use coding agents as those with female names. Researchers at top universities are 40% more likely than others to use coding agents.

Users of coding agents post more working papers and grant proposals than others in the same discipline and career stage, but this could reflect pre-existing differences among early adopters.

Researchers are more optimistic about AI helping write publishable papers than about the effects of AI on the social sciences as a whole.

How are AI coding agents changing how we study the economy and society?

The human sciences are shifting: for the first time, core research tasks can be handed off to machines. AI chatbots increasingly contribute to scientific research, including in the most prestigious publications and in the social sciences. This has spurred optimism that AI could boost research productivity, while also stoking fears about overloaded peer review and a deluge of academic AI slop.

But while turn-taking AI chatbots have primarily been used for writing assistance, coding agents could restructure social science research more radically. Agentic coding platforms like Claude Code and Codex can take a research idea and a dataset, write and run an analysis, interpret the output, and iterate autonomously. What had been irreducibly human steps in empirical research can, for the first time, be automated. At the extreme, researchers have built multi-agent pipelines to automate computer science research and autonomously execute social science research ideas.

These tools could accelerate science and make it more daring: fast research execution should mean cheap and plentiful discovery. They could also amplify disparities in research resources and exacerbate congestion in the scholarly record. More deeply, as AI handles a broadening swath of research tasks, its distinctive analytical choices could stamp our collective understanding of our economy, our society, and ourselves.

In this post, we offer a first look, drawing on a survey of 1,260 quantitative social scientists fielded in early 2026. The survey is the baseline wave of a larger ongoing study of how coding agents affect research productivity, including a randomized experiment providing researchers with access to Claude Code. We will publish results from this experiment in the future. For now, we report what the baseline survey reveals about who is using these tools and for what; how output differs between users and non-users; and what researchers expect about the implications of growing adoption.

A new survey on AI coding agent use among quantitative social scientists

We fielded the survey in late February and March 2026, targeting active quantitative social scientists. This was not a representative sample: respondents were recruited for a study that offered access to Claude Max accounts, so selection into the sample could tilt toward researchers curious about AI tools. However, the respondents were fairly similar to an earlier sample that received a more generic invitation (see Table A2 in the Appendix).

Respondents were evenly split between economics, political science and sociology, each around a fifth of the sample, with management sciences and psychology close behind (see Table A1). We also received a smaller number of responses from public health, education and communications researchers. Roughly 40% were full or associate professors, 25% were assistant professors, and about 30% were doctoral students.

Coding agents haven't reached most social scientists

We measured overall AI use in two ways. First, we asked “Have you previously used genAI models to aid your research process?” 81% of respondents said yes.

But what about those who have actually adopted increasingly capable coding agents into their workflow? Here, we asked “Do you regularly (more than once a week) use an AI coding assistant integrated into your command line (such as Codex, Cursor, or Claude Code)?” In a follow-up question, we verified that they used one of those tools (or Google Antigravity).[1]

Only 20% of respondents use coding agents. Our survey came around two months after a flurry of discussion about Claude Code and Opus 4.6 that kicked off in late December of 2025. Yet even among interested respondents who self-selected into our survey, only ⅕ had adopted agents into their workflow. Claude Code is the most common coding agent tool reported, with 86% of users reporting Claude Code use (31% report using Codex, the next most common tool).

Adoption is highly uneven

Figure 1 shows there is large variation in the overall adoption rate, from 39% of economists and 25% of political scientists to single digits for public health (6%), education (4%) and communication (6%). This gradient roughly tracks differences across fields in overall AI use, but differences in coding agent adoption are steeper on average.

Just over a quarter of doctoral students and postdocs use coding agents at least weekly; among tenured professors that rate falls by more than half. The researchers adopting coding agents are the juniors—more technologically fluent, more likely to be working directly with code and data, and facing stronger career pressures to produce research.

Adoption differences extend beyond discipline and career stage. We classify researcher names according to gender and find that those with typically male names have adopted coding agents at more than twice the rate of respondents with typically female names. High-status and private universities also see notably higher use. All of these differences are significant at the p<0.05 level. These differences are starker than the differences in overall AI use, and suggest higher inequality, at least in this early period of coding agent adoption.

The gender gap in coding agent use does not just reflect a gap in rates of trying AI. Among respondents who have tried using AI for research, there is even a slightly larger gender gap in regular coding agent use than in the overall sample. These differences also persist when comparing across genders in the same disciplines and career stages.

Researchers mainly use AI to code and edit, not write

Among researchers using AI, whether through coding agents or chatbots, what are they actually using it for? Debate about AI in academic research has focused heavily on writing: hallucinated literature reviews, “it’s not X, it’s Y” strewn across formulaic introductions, and the possibility of fully automated paper writing.

But Figure 4 shows that the most common use, for both coding agent users and others, is for coding up analysis of quantitative data: 97% of coding agent users and 77% of other AI users report using it to generate code. Next most common is editing prose, followed by asking for methods advice and background on prior research. Aggregating across coding agent users and others, only a third of all AI users have used it to draft prose at all. These patterns generally hold across disciplines, with only economists and management researchers commonly using AI to draft prose.

Coding agent users are posting more working papers and sending out more grant proposals, but not submitting more to journals

Are coding agents making researchers more productive? That’s the question motivating the broader study this survey kicks off. The experiment we are running on this question is still ongoing. But the baseline survey lets us compare coding agent users to others across a whole bunch of checkpoints in the research process. This comparison is purely descriptive: we compare researchers who select into coding agent use to those who do not, and expect that the two groups differ in a number of ways that we cannot adjust for. Differences should not be interpreted as causal, but as a first cut comparison between researchers using coding agents and those who are not.

Figure 5 shows self-reported output over the six months before the survey at different stages of the research process, from projects started to papers submitted. The adjusted estimates compare coding agent users to others, controlling for career stage, discipline, and the week they completed the survey. Coding agent users are starting more projects, posting more working papers, submitting more grants, and possibly sending out more conference submissions.

So are coding agent users writing more papers? First, consider the differences in early pipeline output between coding agent users and others. Coding agent users are starting projects at a pace of around a quarter of a paper more and posting around half a working paper more than non-agent users. In percentage terms, coding agent users look around 10% (empirical projects started) to 75% (working papers posted) more productive than others in their discipline and career stage.

However, this productivity difference only appears for these early pipeline measures. We find no evidence that coding agent users are submitting more new papers to journals or resubmitting papers more quickly. This could reflect the timeline of getting a paper to submission, as coding agent use is a recent phenomenon. But it could also reflect that coding agents are more useful at getting projects up and running than they are at the last mile of perfecting a paper for journal submission.

Researchers expect AI tools to raise productivity, but are less confident that they will improve social science overall

We also asked researchers what they expected of AI tools. Does AI make social scientists more productive, in terms of writing publishable papers? And do they think AI will make the social sciences better or worse?

Researchers are optimistic about AI raising paper-writing productivity. On a 1 to 10 scale, 88% of respondents were above a 5, and half were at 8 or above. Figure 6 shows that these ratings vary strongly with AI use. The left side of the plot shows researchers that use AI for more types of tasks are more optimistic. The right side shows coding agent users are more optimistic than others.

The survey is drawing from people who are interested in trying these tools out, so it should not be surprising to see some optimism about productivity. But even among these optimists, there is a real gap between views about AI helping narrowly with publishable papers and broadly affecting the social sciences. 70% of respondents are more optimistic about paper productivity than about broader field impact. There are few researchers more optimistic about field impacts than about paper productivity, and many who are more pessimistic.

Social scientists who are using coding agents are posting more working papers and applying for more grants. Relative to others in their discipline and career stage, they are also starting more projects. But as of March 2026, they are not yet driving a surge in journal submissions. The increased number of project starts may be early evidence of productivity increasing. It could also indicate that early adopters were already more productive researchers.

Overall, early adoption of coding agents has tilted toward early career researchers, men, and those at higher status universities. Coding agent use is currently more unevenly distributed across these categories than LLM use more broadly. We also find suggestive evidence that researchers fear that the immediate benefits of rising paper productivity may come along with field-level costs. Perhaps more papers means congestion and competition for attention; perhaps respondents fear that some researchers will use AI tools in ways that exacerbate existing problems in social science, like selective reporting and risk-averse, incremental research.

There are several caveats to the findings in this report. The data presented here is based on an email survey of quantitative social scientists, recruited explicitly to participate in a study on workflows and AI use. We expect that the respondents are both heavier users and more optimistic about LLMs than non-responders. The early-stage productivity differences we see should be interpreted descriptively. The early adopters of coding agents may be more productive and otherwise different from non-adopters in many ways that we cannot measure directly in the survey. Finally, we only look here at the number of projects researchers report, and report nothing about their quality. In future updates on this study, we will show results comparing coding agent users to a clean comparison group, and assess whether the content, and not just quantity, of coding agent augmented work looks different.

Notwithstanding these limitations, we show that coding agents are diffusing into the social sciences. The way we study the economy and politics, for example, is increasingly through analysis decisions made in part by AI coding agents. We plan to bring more evidence in future reports on the potential and risks of this kind of automation.

Thomas Lyttelton, Maxim Massenkoff, Nathan Wilmers

Tim Belonax, Keir Bradwell, Jake Eaton, Rebecca Hiscott, Peter McCrory, Kerry Persen, Sarah Pollack, Santi Ruiz, Heather Whitney, Jack Clark.

Alvero, A. J., Stoltz, D. S., Stuhler, O., & Taylor, M. A. (2026). Generative AI in Sociological Research: State of the Discipline. Sociological Science, 13, 45-62.

Korinek, A. (2025). AI agents for economic research (No. w34202). National Bureau of Economic Research. NBER Working Paper Series.

Liang, W., Zhang, Y., Wu, Z., Lepp, H., Ji, W., Zhao, X., ... & Zou, J. (2025). Quantifying large language model usage in scientific papers. Nature Human Behaviour, 1-11.

Lu, C., Lu, C., Lange, R. T., Yamada, Y., Hu, S., Foerster, J., ... & Clune, J. (2026). Towards end-to-end automation of AI research. Nature, 651(8107), 914-919.

Wilmers, N., & Engzell, P. (2026). The Paper Factory. SocArXiv Preprints.

[1] We launched the survey only a month after the release of Claude Cowork and before the release of OpenAI’s Codex app, so we focused this question on Command Line Interface tools for interaction with coding agents. This means we do not count researchers who use AI agents exclusively via more general purpose desktop apps.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/coding-agents-social-sciences" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">We present results from a survey of 1,260 social scientists about AI and coding agent use, fielded in February and March 2026.

The vast majority of respondents (81%) have tried using AI chatbots in research, particularly for writing code and editing prose. But only 20% have adopted coding agents—tools like Claude Code that autonomously write and execute analysis code—into their work.

There are sharp disparities in use of coding agents. Twice as many researchers with typically male names use co...</p><div style="font-size:16px;line-height:1.8;color:#333">We present results from a survey of 1,260 social scientists about AI and coding agent use, fielded in February and March 2026.

The vast majority of respondents (81%) have tried using AI chatbots in research, particularly for writing code and editing prose. But only 20% have adopted coding agents—tools like Claude Code that autonomously write and execute analysis code—into their work.

There are sharp disparities in use of coding agents. Twice as many researchers with typically male names use coding agents as those with female names. Researchers at top universities are 40% more likely than others to use coding agents.

Users of coding agents post more working papers and grant proposals than others in the same discipline and career stage, but this could reflect pre-existing differences among early adopters.

Researchers are more optimistic about AI helping write publishable papers than about the effects of AI on the social sciences as a whole.

How are AI coding agents changing how we study the economy and society?

The human sciences are shifting: for the first time, core research tasks can be handed off to machines. AI chatbots increasingly contribute to scientific research, including in the most prestigious publications and in the social sciences. This has spurred optimism that AI could boost research productivity, while also stoking fears about overloaded peer review and a deluge of academic AI slop.

But while turn-taking AI chatbots have primarily been used for writing assistance, coding agents could restructure social science research more radically. Agentic coding platforms like Claude Code and Codex can take a research idea and a dataset, write and run an analysis, interpret the output, and iterate autonomously. What had been irreducibly human steps in empirical research can, for the first time, be automated. At the extreme, researchers have built multi-agent pipelines to automate computer science research and autonomously execute social science research ideas.

These tools could accelerate science and make it more daring: fast research execution should mean cheap and plentiful discovery. They could also amplify disparities in research resources and exacerbate congestion in the scholarly record. More deeply, as AI handles a broadening swath of research tasks, its distinctive analytical choices could stamp our collective understanding of our economy, our society, and ourselves.

In this post, we offer a first look, drawing on a survey of 1,260 quantitative social scientists fielded in early 2026. The survey is the baseline wave of a larger ongoing study of how coding agents affect research productivity, including a randomized experiment providing researchers with access to Claude Code. We will publish results from this experiment in the future. For now, we report what the baseline survey reveals about who is using these tools and for what; how output differs between users and non-users; and what researchers expect about the implications of growing adoption.

A new survey on AI coding agent use among quantitative social scientists

We fielded the survey in late February and March 2026, targeting active quantitative social scientists. This was not a representative sample: respondents were recruited for a study that offered access to Claude Max accounts, so selection into the sample could tilt toward researchers curious about AI tools. However, the respondents were fairly similar to an earlier sample that received a more generic invitation (see Table A2 in the Appendix).

Respondents were evenly split between economics, political science and sociology, each around a fifth of the sample, with management sciences and psychology close behind (see Table A1). We also received a smaller number of responses from public health, education and communications researchers. Roughly 40% were full or associate professors, 25% were assistant professors, and about 30% were doctoral students.

Coding agents haven't reached most social scientists

We measured overall AI use in two ways. First, we asked “Have you previously used genAI models to aid your research process?” 81% of respondents said yes.

But what about those who have actually adopted increasingly capable coding agents into their workflow? Here, we asked “Do you regularly (more than once a week) use an AI coding assistant integrated into your command line (such as Codex, Cursor, or Claude Code)?” In a follow-up question, we verified that they used one of those tools (or Google Antigravity).[1]

Only 20% of respondents use coding agents. Our survey came around two months after a flurry of discussion about Claude Code and Opus 4.6 that kicked off in late December of 2025. Yet even among interested respondents who self-selected into our survey, only ⅕ had adopted agents into their workflow. Claude Code is the most common coding agent tool reported, with 86% of users reporting Claude Code use (31% report using Codex, the next most common tool).

Adoption is highly uneven

Figure 1 shows there is large variation in the overall adoption rate, from 39% of economists and 25% of political scientists to single digits for public health (6%), education (4%) and communication (6%). This gradient roughly tracks differences across fields in overall AI use, but differences in coding agent adoption are steeper on average.

Just over a quarter of doctoral students and postdocs use coding agents at least weekly; among tenured professors that rate falls by more than half. The researchers adopting coding agents are the juniors—more technologically fluent, more likely to be working directly with code and data, and facing stronger career pressures to produce research.

Adoption differences extend beyond discipline and career stage. We classify researcher names according to gender and find that those with typically male names have adopted coding agents at more than twice the rate of respondents with typically female names. High-status and private universities also see notably higher use. All of these differences are significant at the p<0.05 level. These differences are starker than the differences in overall AI use, and suggest higher inequality, at least in this early period of coding agent adoption.

The gender gap in coding agent use does not just reflect a gap in rates of trying AI. Among respondents who have tried using AI for research, there is even a slightly larger gender gap in regular coding agent use than in the overall sample. These differences also persist when comparing across genders in the same disciplines and career stages.

Researchers mainly use AI to code and edit, not write

Among researchers using AI, whether through coding agents or chatbots, what are they actually using it for? Debate about AI in academic research has focused heavily on writing: hallucinated literature reviews, “it’s not X, it’s Y” strewn across formulaic introductions, and the possibility of fully automated paper writing.

But Figure 4 shows that the most common use, for both coding agent users and others, is for coding up analysis of quantitative data: 97% of coding agent users and 77% of other AI users report using it to generate code. Next most common is editing prose, followed by asking for methods advice and background on prior research. Aggregating across coding agent users and others, only a third of all AI users have used it to draft prose at all. These patterns generally hold across disciplines, with only economists and management researchers commonly using AI to draft prose.

Coding agent users are posting more working papers and sending out more grant proposals, but not submitting more to journals

Are coding agents making researchers more productive? That’s the question motivating the broader study this survey kicks off. The experiment we are running on this question is still ongoing. But the baseline survey lets us compare coding agent users to others across a whole bunch of checkpoints in the research process. This comparison is purely descriptive: we compare researchers who select into coding agent use to those who do not, and expect that the two groups differ in a number of ways that we cannot adjust for. Differences should not be interpreted as causal, but as a first cut comparison between researchers using coding agents and those who are not.

Figure 5 shows self-reported output over the six months before the survey at different stages of the research process, from projects started to papers submitted. The adjusted estimates compare coding agent users to others, controlling for career stage, discipline, and the week they completed the survey. Coding agent users are starting more projects, posting more working papers, submitting more grants, and possibly sending out more conference submissions.

So are coding agent users writing more papers? First, consider the differences in early pipeline output between coding agent users and others. Coding agent users are starting projects at a pace of around a quarter of a paper more and posting around half a working paper more than non-agent users. In percentage terms, coding agent users look around 10% (empirical projects started) to 75% (working papers posted) more productive than others in their discipline and career stage.

However, this productivity difference only appears for these early pipeline measures. We find no evidence that coding agent users are submitting more new papers to journals or resubmitting papers more quickly. This could reflect the timeline of getting a paper to submission, as coding agent use is a recent phenomenon. But it could also reflect that coding agents are more useful at getting projects up and running than they are at the last mile of perfecting a paper for journal submission.

Researchers expect AI tools to raise productivity, but are less confident that they will improve social science overall

We also asked researchers what they expected of AI tools. Does AI make social scientists more productive, in terms of writing publishable papers? And do they think AI will make the social sciences better or worse?

Researchers are optimistic about AI raising paper-writing productivity. On a 1 to 10 scale, 88% of respondents were above a 5, and half were at 8 or above. Figure 6 shows that these ratings vary strongly with AI use. The left side of the plot shows researchers that use AI for more types of tasks are more optimistic. The right side shows coding agent users are more optimistic than others.

The survey is drawing from people who are interested in trying these tools out, so it should not be surprising to see some optimism about productivity. But even among these optimists, there is a real gap between views about AI helping narrowly with publishable papers and broadly affecting the social sciences. 70% of respondents are more optimistic about paper productivity than about broader field impact. There are few researchers more optimistic about field impacts than about paper productivity, and many who are more pessimistic.

Social scientists who are using coding agents are posting more working papers and applying for more grants. Relative to others in their discipline and career stage, they are also starting more projects. But as of March 2026, they are not yet driving a surge in journal submissions. The increased number of project starts may be early evidence of productivity increasing. It could also indicate that early adopters were already more productive researchers.

Overall, early adoption of coding agents has tilted toward early career researchers, men, and those at higher status universities. Coding agent use is currently more unevenly distributed across these categories than LLM use more broadly. We also find suggestive evidence that researchers fear that the immediate benefits of rising paper productivity may come along with field-level costs. Perhaps more papers means congestion and competition for attention; perhaps respondents fear that some researchers will use AI tools in ways that exacerbate existing problems in social science, like selective reporting and risk-averse, incremental research.

There are several caveats to the findings in this report. The data presented here is based on an email survey of quantitative social scientists, recruited explicitly to participate in a study on workflows and AI use. We expect that the respondents are both heavier users and more optimistic about LLMs than non-responders. The early-stage productivity differences we see should be interpreted descriptively. The early adopters of coding agents may be more productive and otherwise different from non-adopters in many ways that we cannot measure directly in the survey. Finally, we only look here at the number of projects researchers report, and report nothing about their quality. In future updates on this study, we will show results comparing coding agent users to a clean comparison group, and assess whether the content, and not just quantity, of coding agent augmented work looks different.

Notwithstanding these limitations, we show that coding agents are diffusing into the social sciences. The way we study the economy and politics, for example, is increasingly through analysis decisions made in part by AI coding agents. We plan to bring more evidence in future reports on the potential and risks of this kind of automation.

Thomas Lyttelton, Maxim Massenkoff, Nathan Wilmers

Tim Belonax, Keir Bradwell, Jake Eaton, Rebecca Hiscott, Peter McCrory, Kerry Persen, Sarah Pollack, Santi Ruiz, Heather Whitney, Jack Clark.

Alvero, A. J., Stoltz, D. S., Stuhler, O., & Taylor, M. A. (2026). Generative AI in Sociological Research: State of the Discipline. Sociological Science, 13, 45-62.

Korinek, A. (2025). AI agents for economic research (No. w34202). National Bureau of Economic Research. NBER Working Paper Series.

Liang, W., Zhang, Y., Wu, Z., Lepp, H., Ji, W., Zhao, X., ... & Zou, J. (2025). Quantifying large language model usage in scientific papers. Nature Human Behaviour, 1-11.

Lu, C., Lu, C., Lange, R. T., Yamada, Y., Hu, S., Foerster, J., ... & Clune, J. (2026). Towards end-to-end automation of AI research. Nature, 651(8107), 914-919.

Wilmers, N., & Engzell, P. (2026). The Paper Factory. SocArXiv Preprints.

[1] We launched the survey only a month after the release of Claude Cowork and before the release of OpenAI’s Codex app, so we focused this question on Command Line Interface tools for interaction with coding agents. This means we do not count researchers who use AI agents exclusively via more general purpose desktop apps.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/coding-agents-social-sciences" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>Project Glasswing: An initial update</title>
  <link>https://www.anthropic.com/research/glasswing-initial-update</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/glasswing-initial-update</guid>
  <pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">Last month, we launched Project Glasswing, our collaborative effort to secure the world’s most critical software before increasingly capable AI models can be turned against it.

Since then, we and our approximately 50 partners have used Claude Mythos Preview to find more than ten thousand high- or critical-severity vulnerabilities across the most systemically important software in the world. Progress on software security used to be limited by how quickly we could find new vulnerabilities. Now it’...</p><div style="font-size:16px;line-height:1.8;color:#333">Last month, we launched Project Glasswing, our collaborative effort to secure the world’s most critical software before increasingly capable AI models can be turned against it.

Since then, we and our approximately 50 partners have used Claude Mythos Preview to find more than ten thousand high- or critical-severity vulnerabilities across the most systemically important software in the world. Progress on software security used to be limited by how quickly we could find new vulnerabilities. Now it’s limited by how quickly we can verify, disclose, and patch the large numbers of vulnerabilities found by AI.

In this post, we discuss what we’ve learned about this critical challenge for cybersecurity in the first weeks of Project Glasswing. We focus on the early public evidence of Mythos Preview’s performance, on the initial results of our effort to scan thousands of open-source software projects, and on what this progress means for cyberdefenders today. We also cover what to expect next from Project Glasswing, and how we’re thinking about releasing Mythos-class models in the future.

Our approach to discussing Mythos Preview’s findings

The software industry’s longstanding convention is to disclose new vulnerabilities 90 days after they’re discovered (or, if a patch is created before the 90 days is up, around 45 days after the patch becomes available). This allows time for end users to update their software before a vulnerability can be exploited by attackers. Our own Coordinated Vulnerability Disclosure policy takes this approach.

However, this means that disclosed vulnerabilities are a lagging indicator of the accelerating frontier of AI models’ cyber capabilities: we’re not yet at the point where we can fully detail our partners’ findings with Mythos Preview without putting end users at risk. Instead, we provide illustrative examples of the model’s performance, along with aggregate statistics on our progress to date. Once patches for the vulnerabilities that Mythos Preview has discovered are widely deployed, we’ll provide much more detail about what we’ve learned.

Evidence from our partners and external testers

Project Glasswing’s initial partners build and maintain software that is fundamental to the functioning of the internet and other essential infrastructure. Fixing flaws in their code reduces risk for the many other organizations that rely on it, and therefore reduces risk for billions of end users.

After one month, most partners have each found hundreds of critical- or high-severity vulnerabilities in their software. Collectively, they’ve found more than ten thousand. Several have told us that their rate of bug-finding has increased by more than a factor of ten. For instance, Cloudflare has found 2,000 bugs (400 of which are high- or critical-severity) across their critical-path systems, with a false positive rate that Cloudflare’s team considers better than that of human testers.

This tallies with external testers’ experience of Mythos Preview’s performance, and with recent additional evaluations of the model:

The UK’s AI Security Institute reports that Mythos Preview is the first model to solve both of their cyber ranges (simulations of multistep cyberattacks) end to end;

Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview, over ten times more than they found in Firefox 148 with Claude Opus 4.6;

XBOW, an independent security platform, reports that Mythos Preview is a “significant step up over all existing models” on its web exploit benchmark, and provides “absolutely unprecedented precision” on a token-for-token basis;

ExploitBench and ExploitGym, two recently released academic benchmarks for measuring models’ exploit development capabilities, show Mythos Preview as the strongest performer. We discuss what these benchmarks tell us about the model in more detail on our Frontier Red Team blog.

More generally, we’re now seeing that patched software is being rolled out much more quickly. The latest Palo Alto Networks release included over five times as many patches as usual. Microsoft has reported that the number of new patches they’ll release will “continue trending larger for some time.” And Oracle is finding and fixing vulnerabilities across its products and cloud multiple times faster than before.

Mythos Preview has also proved useful for other kinds of security work. For example, at one of our Glasswing partner banks, Mythos Preview helped to detect and prevent a fraudulent $1.5 million wire transfer after a threat actor compromised a customer’s email account and made spoof phone calls.

For the last few months, Anthropic has used Mythos Preview to scan more than 1,000 open-source projects, which collectively underpin much of the internet—and much of our own infrastructure.

So far, Mythos Preview has found what it estimates are 6,202 high- or critical-severity vulnerabilities in these projects (out of 23,019 in total, including those it estimates as medium- or low-severity).

1,752 of those high- or critical-rated vulnerabilities have now been carefully assessed by one of six independent security research firms, or in a small number of cases by ourselves. Of these, 90.6% (1,587) have proved to be valid true positives, and 62.4% (1,094) were confirmed as either high- or critical-severity. That means that even if Mythos Preview finds no further vulnerabilities, at our current post-triage true-positive rates, it’s on track to have surfaced nearly 3,900 high- or critical-severity vulnerabilities in open-source code—in addition to those it has found for Project Glasswing’s partners. To be clear, we intend to continue scanning open-source code for some time, so we expect this number to rise.
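
The percentages and the projection in the paragraph above can be reproduced from the quoted counts. The following is only an illustrative sketch: the variable names are ours, and it assumes that the findings not yet reviewed would be confirmed at the same rate as the 1,752 reviewed so far.

```python
# Counts quoted in the paragraph above.
estimated_high_or_critical = 6202   # findings Mythos Preview rated high or critical
assessed = 1752                     # of those, findings reviewed by humans so far
true_positives = 1587               # reviewed findings confirmed as real
confirmed_high_or_critical = 1094   # reviewed findings confirmed high or critical

# Rates among the reviewed findings, as percentages to one decimal place.
true_positive_rate = round(100 * true_positives / assessed, 1)
confirmed_severity_rate = round(100 * confirmed_high_or_critical / assessed, 1)

# Assumption: unreviewed findings confirm at the same rate as reviewed ones.
projected = round(estimated_high_or_critical * confirmed_high_or_critical / assessed)

print(true_positive_rate, confirmed_severity_rate, projected)
```

Under that assumption the projection comes to 3,873, which matches the "nearly 3,900" figure in the text.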

One example of an open-source vulnerability that Mythos Preview detected was in wolfSSL, an open-source cryptography library that’s known for its security and is used by billions of devices worldwide. Mythos Preview constructed an exploit that would let an attacker forge certificates that would (for instance) allow them to host a fake website for a bank or email provider. The website would look perfectly legitimate to an end user, despite being controlled by the attacker. We’ll release our full technical analysis of this now-patched vulnerability (assigned CVE-2026-5194) in the coming weeks.

As we noted above, the bottleneck in fixing bugs like these is the human capacity to triage, report, and design and deploy patches for them. Finding them in the first place has become vastly more straightforward with Mythos Preview. We’ve created a dashboard of the open-source vulnerabilities we’ve scanned, below, which shows the different steps in our disclosure process and will track our progress over time. This shows vulnerabilities of all severity levels, rather than only the subset initially assessed as high- or critical-severity by Mythos Preview. Note the steep drop-off at each phase, reflecting the amount of human effort required to verify and fix each of the vulnerabilities.

Our process for triaging vulnerabilities is intensive. First, we or one of the external security firms we work with reproduce the issue that Mythos Preview has found and re-assess its severity. Once we’ve confirmed that a vulnerability is real, we check whether there are already fixes in place, and write a detailed report to the software’s maintainers. We take considerable care here: on top of the regular challenges of maintaining open-source software, maintainers have been facing a deluge of low-quality, AI-generated bug reports. Indeed, several maintainers have told us they’re currently severely capacity constrained, and some have even asked us to slow down our rate of disclosures because they need more time to design patches. (On average, a high- or critical-severity bug found by Mythos Preview takes two weeks to patch.)

At maintainers’ request, we sometimes disclose bugs directly, without further assessment. We’ve now reported 1,129 such unvetted bugs, of which Mythos Preview estimated that 175 were high- or critical-severity.

We estimate that we’ve disclosed 530 high- or critical-severity bugs to maintainers so far. This is based on Claude’s assessment of severity in the case of direct disclosures, and maintainers’ or our security partners’ assessment where available. There are a further 827 confirmed vulnerabilities (estimated as high- or critical-severity in the same manner) that we’re aiming to disclose as quickly as possible.

75 of the 530 high- or critical-severity bugs we’ve reported have now been patched, and 65 of those have been given public advisories. The number of patches is still relatively low for three reasons. First, we’re still early in the 90-day window that’s set out in our Coordinated Vulnerability Disclosure policy: we expect many more patches to land soon. Second, we are likely to be undercounting patches because some vulnerabilities are patched without a public advisory: in those cases, we’re reliant on scanning for the patches ourselves using Claude. Third, the low volume of patches reflects a genuine problem: even at our relatively slow pace of disclosures, Mythos Preview is adding to an already-overloaded security ecosystem.

The relative ease of finding vulnerabilities compared with the difficulty of fixing them amounts to a major challenge for cybersecurity. Confronting this challenge successfully will make our software far safer than before. Below we discuss some ways that cyber defenders can adapt.

Adapting to a new phase of cybersecurity

Models with similar cybersecurity skills to Mythos Preview will soon be more broadly available. There is a clear need for a larger effort across the software industry to manage the volume of findings that these models will generate.

Currently, there’s often a long lag between the discovery of a vulnerability, the creation of a patch for it, and the time when the patch is widely deployed by end users. This leaves open a significant window for attackers to exploit critical software. Mythos-class models significantly shrink the time and cost required to find and exploit vulnerabilities, magnifying the risk associated with these time lags. Ultimately, Mythos-class models will enable developers to build far more secure software by catching bugs before they are deployed. But this interim period—while vulnerabilities are being rapidly discovered and slowly patched—presents new risks.

Software developers and users should act now to reduce their exposure to these risks. The advice below is not new, and many researchers (including at Anthropic) are currently working on better and more durable solutions. In the meantime, it’s important to get the basics right:

Software developers should shorten their patch cycles and make security fixes available as quickly as possible. The thoughtful use of publicly available AI models can help here; we’re building tools and sharing our research to support this (more details below). Developers should also help their users stay up-to-date with their software by making it as easy as possible to install updates; to the extent feasible, they should be more persistent with users who are still running software with known vulnerabilities.

Network defenders should shorten their patch testing and deployment timelines. The critical controls laid out by organizations like the National Institute of Standards and Technology and the UK’s National Cyber Security Centre are now all the more important, since they improve security without depending on any single patch landing in time. These include steps like hardening networks’ default configurations, enforcing multi-factor authentication, and keeping comprehensive logs for detection and response.

Tools for cyberdefense with publicly available AI models

Many generally-available models can already find large numbers of software vulnerabilities, even if they can’t find the most sophisticated vulnerabilities or exploit them as effectively as Claude Mythos Preview. Project Glasswing has already spurred many other organizations to take action on their own codebases with these generally-available models; we’re working to make this much easier to do.

To begin, we’ve released Claude Security in public beta for Claude Enterprise customers. It’s a tool that helps teams scan their codebases for vulnerabilities, and which can generate proposed fixes for them. In the three weeks since launch, Claude Opus 4.7 has been used to patch over 2,100 vulnerabilities. (This is faster than the open-source patching described above in large part because enterprises are fixing their own code, whereas open-source fixes usually require volunteer maintainers who work through coordinated disclosure.)

We’ve also begun our Cyber Verification Program, which allows security professionals using our models for legitimate cybersecurity purposes (such as vulnerability research, penetration testing, and red-teaming) to do so without certain safeguards designed to prevent cyber misuse.

Now, we’re making the tools that we and our partners have used with Mythos Preview available to qualifying customers’ security teams on request. Our aim is to make it much easier to get the best performance out of highly capable public models without extensive setup. This release includes:

The skills (custom instructions for repeated work) that we and our partners have built and shared;

A harness that helps Claude map the codebase, spin up scanning subagents, triage its findings, and write reports;

A threat model builder, which maps a codebase to identify potential targets for attack and prioritizes the model’s work accordingly.

Cisco, one of our Project Glasswing partners, has also recently open-sourced its Foundry Security Spec to help other defenders build an evaluation system similar to the one they use themselves.

Supporting the ecosystem

We’ve formed a partnership with the Open Source Security Foundation’s Alpha-Omega project, which will support the foundation’s efforts to assist maintainers in processing and triaging bug reports. We’re also continuing to publish research into how frontier model capabilities can best support cyberdefenders.

We’ve also supported the development of ExploitBench and ExploitGym, the two new benchmarks that allow researchers to track frontier AI models’ exploit development capabilities over time, as we discuss here. We’re supporting the development of other high-quality quantitative benchmarks through our External Researcher Access Program. Finally, Claude for Open Source supports maintainers and contributors, and we’re committing to scan any open-source package that we adopt ourselves in the future.

What's next for Project Glasswing

The speed of AI progress means that models as capable as Mythos Preview will soon be developed by many different AI companies. At present, no company, including Anthropic, has developed safeguards strong enough to prevent such models from being misused and potentially causing severe harm. That is why we have yet to release Mythos-class models to the public. But it’s also why we began Project Glasswing: if a similarly capable model is released without such safeguards, it will soon become dramatically cheaper and easier for almost anyone in the world to exploit flawed software.

Glasswing helps the most systemically important cyber defenders gain an asymmetric advantage. However, there is an urgent need for as many organizations as possible to shore up their cyber defenses. We hope that our generally available models, and the new tools, resources, and research we’re providing to accompany them, will support those organizations to improve their cybersecurity posture.

Next, we will work with critical partners—including US and allied governments—to expand Project Glasswing to additional partners. And in the near future, once we’ve developed the far stronger safeguards we need, we look forward to making Mythos-class models available through a general release.

On the far side of these risks, there’s an encouraging world available to us: one in which important code is hardened far better than it is today, and in which hacking is far less prevalent. There are many obstacles, but we’re nonetheless confident that Project Glasswing can help get us there.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/glasswing-initial-update" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">Last month, we launched Project Glasswing, our collaborative effort to secure the world’s most critical software before increasingly capable AI models can be turned against it.

Since then, we and our approximately 50 partners have used Claude Mythos Preview to find more than ten thousand high- or critical-severity vulnerabilities across the most systemically important software in the world. Progress on software security used to be limited by how quickly we could find new vulnerabilities. Now it’...</p><div style="font-size:16px;line-height:1.8;color:#333">Last month, we launched Project Glasswing, our collaborative effort to secure the world’s most critical software before increasingly capable AI models can be turned against it.

Since then, we and our approximately 50 partners have used Claude Mythos Preview to find more than ten thousand high- or critical-severity vulnerabilities across the most systemically important software in the world. Progress on software security used to be limited by how quickly we could find new vulnerabilities. Now it’s limited by how quickly we can verify, disclose, and patch the large numbers of vulnerabilities found by AI.

In this post, we discuss what we’ve learned about this critical challenge for cybersecurity in the first weeks of Project Glasswing. We focus on the early public evidence of Mythos Preview’s performance, on the initial results of our effort to scan thousands of open-source software projects, and on what this progress means for cyberdefenders today. We also cover what to expect next from Project Glasswing, and how we’re thinking about releasing Mythos-class models in the future.

Our approach to discussing Mythos Preview’s findings

The software industry’s longstanding convention is to disclose new vulnerabilities 90 days after they’re discovered (or, if a patch is created before the 90 days is up, around 45 days after the patch becomes available). This allows time for end users to update their software before a vulnerability can be exploited by attackers. Our own Coordinated Vulnerability Disclosure policy takes this approach.

However, this means that disclosed vulnerabilities are a lagging indicator of the accelerating frontier of AI models’ cyber capabilities: we’re not yet at the point where we can fully detail our partners’ findings with Mythos Preview without putting end users at risk. Instead, we provide illustrative examples of the model’s performance, along with aggregate statistics on our progress to date. Once patches for the vulnerabilities that Mythos Preview has discovered are widely deployed, we’ll provide much more detail about what we’ve learned.

Evidence from our partners and external testers

Project Glasswing’s initial partners build and maintain software that is fundamental to the functioning of the internet and other essential infrastructure. Fixing flaws in their code reduces risk for the many other organizations that rely on it, and therefore reduces risk for billions of end users.

After one month, most partners have each found hundreds of critical- or high-severity vulnerabilities in their software. Collectively, they’ve found more than ten thousand. Several have told us that their rate of bug-finding has increased by more than a factor of ten. For instance, Cloudflare has found 2,000 bugs (400 of which are high- or critical-severity) across their critical-path systems, with a false positive rate that Cloudflare’s team considers better than that of human testers.

This tallies with external testers’ experience of Mythos Preview’s performance, and with recent additional evaluations of the model:

The UK’s AI Security Institute reports that Mythos Preview is the first model to solve both of their cyber ranges (simulations of multistep cyberattacks) end to end;

Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview, over ten times more than they found in Firefox 148 with Claude Opus 4.6;

XBOW, an independent security platform, reports that Mythos Preview is a “significant step up over all existing models” on its web exploit benchmark, and provides “absolutely unprecedented precision” on a token-for-token basis;

ExploitBench and ExploitGym, two recently released academic benchmarks for measuring models’ exploit development capabilities, show Mythos Preview as the strongest performer. We discuss what these benchmarks tell us about the model in more detail on our Frontier Red Team blog.

More generally, we’re now seeing that patched software is being rolled out much more quickly. The latest Palo Alto Networks release included over five times as many patches as usual. Microsoft has reported that the number of new patches they’ll release will “continue trending larger for some time.” And Oracle is finding and fixing vulnerabilities across its products and cloud multiple times faster than before.

Mythos Preview has also proved useful for other kinds of security work. For example, at one of our Glasswing partner banks, Mythos Preview helped to detect and prevent a fraudulent $1.5 million wire transfer after a threat actor compromised a customer’s email account and made spoof phone calls.

For the last few months, Anthropic has used Mythos Preview to scan more than 1,000 open-source projects, which collectively underpin much of the internet—and much of our own infrastructure.

So far, Mythos Preview has found what it estimates are 6,202 high- or critical-severity vulnerabilities in these projects (out of 23,019 in total, including those it estimates as medium- or low-severity).

1,752 of those high- or critical-rated vulnerabilities have now been carefully assessed by one of six independent security research firms, or in a small number of cases by ourselves. Of these, 90.6% (1,587) have proved to be valid true positives, and 62.4% (1,094) were confirmed as either high- or critical-severity. That means that even if Mythos Preview finds no further vulnerabilities, at our current post-triage true-positive rates, it’s on track to have surfaced nearly 3,900 high- or critical-severity vulnerabilities in open-source code—in addition to those it has found for Project Glasswing’s partners. To be clear, we intend to continue scanning open-source code for some time, so we expect this number to rise.

One example of an open-source vulnerability that Mythos Preview detected was in wolfSSL, an open-source cryptography library that’s known for its security and is used by billions of devices worldwide. Mythos Preview constructed an exploit that would let an attacker forge certificates that would (for instance) allow them to host a fake website for a bank or email provider. The website would look perfectly legitimate to an end user, despite being controlled by the attacker. We’ll release our full technical analysis of this now-patched vulnerability (assigned CVE-2026-5194) in the coming weeks.

As we noted above, the bottleneck in fixing bugs like these is the human capacity to triage, report, and design and deploy patches for them. Finding them in the first place has become vastly more straightforward with Mythos Preview. We’ve created a dashboard of the open-source vulnerabilities we’ve scanned, below, which shows the different steps in our disclosure process and will track our progress over time. This shows vulnerabilities of all severity levels, rather than only the subset initially assessed as high- or critical-severity by Mythos Preview. Note the steep drop-off at each phase, reflecting the amount of human effort required to verify and fix each of the vulnerabilities.

Our process for triaging vulnerabilities is intensive. First, we or one of the external security firms we work with reproduce the issue that Mythos Preview has found and re-assess its severity. Once we’ve confirmed that a vulnerability is real, we check whether there are already fixes in place, and write a detailed report to the software’s maintainers. We take considerable care here: on top of the regular challenges of maintaining open-source software, maintainers have been facing a deluge of low-quality, AI-generated bug reports. Indeed, several maintainers have told us they’re currently severely capacity constrained, and some have even asked us to slow down our rate of disclosures because they need more time to design patches. (On average, a high- or critical-severity bug found by Mythos Preview takes two weeks to patch.)

At maintainers’ request, we sometimes disclose bugs directly, without further assessment. We’ve now reported 1,129 such unvetted bugs, of which Mythos Preview estimated that 175 were high- or critical-severity.

We estimate that we’ve disclosed 530 high- or critical-severity bugs to maintainers so far. This is based on Claude’s assessment of severity in the case of direct disclosures, and maintainers’ or our security partners’ assessment where available. There are a further 827 confirmed vulnerabilities (estimated as high- or critical-severity in the same manner) that we’re aiming to disclose as quickly as possible.

75 of the 530 high- or critical-severity bugs we’ve reported have now been patched, and 65 of those have been given public advisories. The number of patches is still relatively low for three reasons. First, we’re still early in the 90-day window that’s set out in our Coordinated Vulnerability Disclosure policy: we expect many more patches to land soon. Second, we are likely to be undercounting patches because some vulnerabilities are patched without a public advisory: in those cases, we’re reliant on scanning for the patches ourselves using Claude. Third, the low volume of patches reflects a genuine problem: even at our relatively slow pace of disclosures, Mythos Preview is adding to an already-overloaded security ecosystem.

The relative ease of finding vulnerabilities compared with the difficulty of fixing them amounts to a major challenge for cybersecurity. Confronting this challenge successfully will make our software far safer than before. Below we discuss some ways that cyber defenders can adapt.

Adapting to a new phase of cybersecurity

Models with similar cybersecurity skills to Mythos Preview will soon be more broadly available. There is a clear need for a larger effort across the software industry to manage the volume of findings that these models will generate.

Currently, there’s often a long lag between the discovery of a vulnerability, the creation of a patch for it, and the time when the patch is widely deployed by end users. This leaves open a significant window for attackers to exploit critical software. Mythos-class models significantly shrink the time and cost required to find and exploit vulnerabilities, magnifying the risk associated with these time lags. Ultimately, Mythos-class models will enable developers to build far more secure software by catching bugs before they are deployed. But this interim period—while vulnerabilities are being rapidly discovered and slowly patched—presents new risks.

Software developers and users should act now to reduce their exposure to these risks. The advice below is not new, and many researchers (including at Anthropic) are currently working on better and more durable solutions. In the meantime, it’s important to get the basics right:

Software developers should shorten their patch cycles and make security fixes available as quickly as possible. The thoughtful use of publicly available AI models can help here; we’re building tools and sharing our research to support this (more details below). Developers should also help their users stay up-to-date with their software by making it as easy as possible to install updates; to the extent feasible, they should be more persistent with users who are still running software with known vulnerabilities.

Network defenders should shorten their patch testing and deployment timelines. The critical controls laid out by organizations like the National Institute of Standards and Technology and the UK’s National Cyber Security Centre are now all the more important, since they improve security without depending on any single patch landing in time. These include steps like hardening networks’ default configurations, enforcing multi-factor authentication, and keeping comprehensive logs for detection and response.

Tools for cyberdefense with publicly available AI models

Many generally-available models can already find large numbers of software vulnerabilities, even if they can’t find the most sophisticated vulnerabilities or exploit them as effectively as Claude Mythos Preview. Project Glasswing has already spurred many other organizations to take action on their own codebases with these generally-available models; we’re working to make this much easier to do.

To begin, we’ve released Claude Security in public beta for Claude Enterprise customers. It’s a tool that helps teams scan their codebases for vulnerabilities, and which can generate proposed fixes for them. In the three weeks since launch, Claude Opus 4.7 has been used to patch over 2,100 vulnerabilities. (This is faster than the open-source patching described above in large part because enterprises are fixing their own code, whereas open-source fixes usually require volunteer maintainers who work through coordinated disclosure.)

We’ve also begun our Cyber Verification Program, which allows security professionals using our models for legitimate cybersecurity purposes (such as vulnerability research, penetration testing, and red-teaming) to do so without certain safeguards designed to prevent cyber misuse.

Now, we’re making the tools that we and our partners have used with Mythos Preview available to qualifying customers’ security teams on request. Our aim is to make it much easier to get the best performance out of highly capable public models without extensive setup. This release includes:

The skills (custom instructions for repeated work) that we and our partners have built and shared;

A harness that helps Claude map the codebase, spin up scanning subagents, triage its findings, and write reports;

A threat model builder, which maps a codebase to identify potential targets for attack and prioritizes the model’s work accordingly.

Cisco, one of our Project Glasswing partners, has also recently open-sourced its Foundry Security Spec to help other defenders build an evaluation system similar to the one they use themselves.

Supporting the ecosystem

We’ve formed a partnership with the Open Source Security Foundation’s Alpha-Omega project, which will support the foundation’s efforts to assist maintainers in processing and triaging bug reports. We’re also continuing to publish research into how frontier model capabilities can best support cyberdefenders.

We’ve also supported the development of ExploitBench and ExploitGym, two new benchmarks that allow researchers to track frontier AI models’ exploit development capabilities over time, as we discuss here. We’re supporting the development of other high-quality quantitative benchmarks through our External Researcher Access Program. Finally, Claude for Open Source supports maintainers and contributors, and we’re committing to scan any open-source package that we adopt ourselves in the future.

What's next for Project Glasswing

The speed of AI progress means that models as capable as Mythos Preview will soon be developed by many different AI companies. At present, no company, including Anthropic, has developed safeguards strong enough to prevent such models from being misused and potentially causing severe harm. That is why we have yet to release Mythos-class models to the public. But it’s also why we began Project Glasswing: if a similarly capable model is released without such safeguards, it will soon become dramatically cheaper and easier for almost anyone in the world to exploit flawed software.

Glasswing helps the most systemically important cyber defenders gain an asymmetric advantage. However, there is an urgent need for as many organizations as possible to shore up their cyber defenses. We hope that our generally available models, and the new tools, resources, and research we’re providing to accompany them, will support those organizations to improve their cybersecurity posture.

Next, we will work with critical partners—including US and allied governments—to expand Project Glasswing to additional partners. And in the near future, once we’ve developed the far stronger safeguards we need, we look forward to making Mythos-class models available through a general release.

On the far side of these risks, there’s an encouraging world available to us: one in which important code is hardened far better than it is today, and in which hacking is far less prevalent. There are many obstacles, but we’re nonetheless confident that Project Glasswing can help get us there.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/glasswing-initial-update" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>2028: Two scenarios for global AI leadership</title>
  <link>https://www.anthropic.com/research/2028-ai-leadership</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/2028-ai-leadership</guid>
  <pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">We’re releasing a new paper that explains our views on the competition on AI between the US and China.

It’s essential that the US and its allies stay ahead of authoritarian governments like the Chinese Communist Party, or CCP. AI will soon become powerful enough to be used to repress citizens at unprecedented scale, and even to alter the balance of power among nations. And since AI is advancing more quickly by the day, we have only a limited period of time to set the conditions of the competition...</p><div style="font-size:16px;line-height:1.8;color:#333">We’re releasing a new paper that explains our views on the competition on AI between the US and China.

It’s essential that the US and its allies stay ahead of authoritarian governments like the Chinese Communist Party, or CCP. AI will soon become powerful enough to be used to repress citizens at unprecedented scale, and even to alter the balance of power among nations. And since AI is advancing more quickly by the day, we have only a limited period of time to set the conditions of the competition and determine whether and how those threats materialize. It’s with this in mind that we outline what’s required to ensure America stays ahead.

The most important ingredient for developing AI is access to the computer chips on which the models are trained (or “compute”). Since the most capable chips are developed by American companies, the US government currently limits China’s supply by enforcing tight export controls on them. Recent history suggests these controls have been incredibly successful. In fact, AI labs in China have only built models close in intelligence to America’s because of their talent, their knack for exploiting loopholes around these export controls, and their large-scale distillation attacks that illicitly extract the innovations of American companies.

In this post, we present two scenarios for what the world might look like in 2028, when we expect transformative AI systems to have arrived.

In the first scenario, America has successfully defended its compute advantage. Policymakers have acted to tighten export controls further, disrupt China’s distillation attacks, and further accelerate democracies’ adoption of AI. In this world, democracies set the rules and norms around AI. It’s also in this scenario that we’re most likely to successfully engage with China on safety, which we’re supportive of to the extent this is possible.

In the second scenario, America has chosen not to act. Policymakers have not tightened loopholes on the CCP’s access to compute, and AI firms in China have quickly taken advantage—catching up to the frontier and even overtaking America. In this world, AI norms and rules are shaped by authoritarian regimes, and the best models enable automated repression at scale. It will be no solace that this authoritarian triumph has happened on the back of American compute.

America and its allies approach AI competition from a position of great strength. The tools for AI dominance have been built by an exceptionally innovative ecosystem of companies in democratic nations. Our past success means that our present task is largely to avoid squandering our advantage: to decide not to make it easier for the CCP to catch up.

Two scenarios for the US and China in 2028

Democracies, not authoritarian regimes, must lead in AI development and deployment. These countries and political systems can shape the rules and norms that govern these systems.

Democracies currently hold a substantial lead in compute, the most important ingredient for developing frontier AI models. That lead exists thanks to American and allied innovation, and to bipartisan US export controls that defend those innovations. But on model intelligence, AI labs in the People’s Republic of China (PRC), under the jurisdiction and control of the Chinese Communist Party (CCP), are not far behind. We focus on the CCP as it is the regime that is most able to use frontier AI to cement authoritarianism; we do not seek to undermine the interests or ingenuity of the Chinese people. Already, the CCP is using AI to censor speech, repress dissidents, hack governments and corporations across the world, and strengthen the People’s Liberation Army (PLA).

AI labs in China have world-class talent. It is compute constraints that limit their ability to keep up. Labs in China have remained close by exploiting loopholes in US export control policies, and by carrying out large-scale distillation attacks that harvest the innovations of US models in order to mimic their capabilities.

With the supply of compute expanding rapidly, and with AI being used increasingly to augment the training of new AI models, we’re entering a period of great acceleration in AI capabilities. The “country of geniuses in a data center”—the level of intelligence we associate with transformative AI—may be close at hand. This acceleration makes policy action more urgent. To date, by allowing export control evasions and distillation attacks, we have let the CCP’s AI efforts trail closely up the frontier curve. But if the US and its allies act now to address both issues, it may be possible to lock in a 12-24 month lead in frontier capabilities. A lead that large by 2028 would be enormously advantageous. Such a lead would also augment efforts to engage with AI experts in China on AI safety and governance, which we support. But the window of opportunity to lock in that lead will not necessarily remain open for long.

Here, we present two potential scenarios for the state of US-China AI competition in 2028. The first scenario is one in which democracies have established a commanding lead in model intelligence, adoption, and global distribution. This scenario can be achieved if policymakers act now to tighten controls on advanced compute to PRC labs, disrupt their efforts to distill America’s best AI models, and accelerate democracies’ adoption of AI.

The second scenario is one in which the CCP is competitive at the near-frontier. This scenario happens if policymakers don’t build on our existing lead, or if they loosen restrictions on access to compute for PRC firms.

Many in Congress and the Trump administration have championed export controls, curbing distillation attacks, and exporting American AI. In advancing these policies, we are hopeful that democracies can secure a commanding lead by 2028, and avoid a destabilizing neck-and-neck race with the CCP two years from now.

The imperatives of staying ahead

We expect frontier AI to have transformational economic and societal impacts in the coming years, as described in Machines of Loving Grace and The Adolescence of Technology. Our mission is to ensure that humanity navigates the transition to transformative AI safely and beneficially. We believe that a successful transition can lead to astonishing breakthroughs in medicine, invention, and economic growth.

The threat of authoritarian AI

Whether that transition goes well depends in part on where the most capable systems are built first. The political systems in which the most advanced AI is created will shape the rules and norms for how the technology is developed and deployed. In turn, those rules and norms will help determine whether the technology is safe, whose security it protects, and whose interests it ultimately serves. We believe that responsibility should rest with democratically elected governments, not authoritarian regimes.

If the frontier is set by regimes that treat AI as an instrument of repression, military advantage over democracies, and domestic control, the transition is less likely to go well, for those regimes’ own citizens or anyone else.

Historically, the reach of authoritarian rule has been limited by its dependence on human enforcers to carry out surveillance and repression. Powerful AI systems may remove that dependency, enabling automated repression on a far greater scale. For that reason, the prospect of the CCP leading in AI is among the greatest threats to a successful transition.

The CCP holds enormous power and influence at the helm of China’s economy, military, and the largest authoritarian state structure on Earth. It is also the only country besides the US with well-resourced, highly talented AI labs chasing the frontier. Furthermore, the CCP is highly motivated to establish China as the leading AI power. Beijing has poured tens of billions of dollars into China’s AI and semiconductor sectors.

Already, the CCP uses AI systems to censor speech, enforce draconian policies on ethnic minorities, and hack major corporations and government agencies. The CCP’s vision of AI-enabled techno-authoritarianism has been extensively documented in Xinjiang, where state security agencies have systematically deployed facial recognition technology, biometric data collection, and communications surveillance, enabling repression at a scale that humans alone could not achieve. Frontier AI systems will make those capabilities cheaper to maintain, far more pervasive, and more sophisticated. The CCP’s export of these technologies has enabled autocrats in other countries to more effectively stifle dissent, entrenching authoritarianism. A CCP-led AI frontier could dramatically strengthen repression around the world.

AI is a dual-use technology

Frontier AI will shape the future military balance. CCP leadership already operates on that premise, and is building its military for an AI-enabled battlefield. PLA strategists view the “intelligentization” of their military forces as the means with which to catch up and eventually surpass the US military. The PLA is already procuring commercially developed Chinese AI systems for military use, including DeepSeek models deployed to coordinate swarms of unmanned vehicles and enable cyber offense capabilities. These capabilities will not diffuse slowly. When a new model reaches a new capability in autonomous targeting, vulnerability discovery, or swarm coordination, for example, the regime that controls it can put it onto the field in weeks, not years.

The risk compounds because frontier AI will be an accelerant for other critical technologies. Advanced AI models will be able to compress research and development (R&D) cycles in semiconductors, biotech, and advanced materials. A lead in frontier AI will enable a widening lead across the full national security technology stack.

If a PRC AI lab had developed a model at the level of Claude Mythos Preview before an American one, the CCP would have had first access to a system that can autonomously discover and chain software vulnerabilities, which it could have used to further penetrate critical American infrastructure. Future models will be exponentially more capable, and therefore have commensurately greater implications for the national security interests of the US and other democracies.

Neck-and-neck competition risks disincentivizing responsible AI

A neck-and-neck race between American and Chinese AI labs could make industry and government-led safety and governance efforts more difficult, and less likely. If PRC labs are either close behind or at par with models in the US, private AI firms in the US and China are likely to feel more pressure to release new models and products faster, without taking prudent pre-deployment safety measures. Governments could become reluctant to enact policies to encourage responsible AI development and deployment, for fear of falling behind.

While increasing numbers of researchers in China’s AI labs and policy community are concerned with AI safety risks, this trend has not translated into safety practices on par with labs in the US. As of last year, only 3 out of 13 top Chinese AI labs published any safety evaluation results, and none disclosed evaluations for Chemical, Biological, Radiological, and Nuclear (CBRN) risks. The Center for AI Standards and Innovation (CAISI) found that DeepSeek’s R1-0528 model complied with 94% of overtly malicious requests under a common jailbreaking technique, compared with 8% for US reference models. This pattern has continued in more recent releases. For example, an independent assessment of Moonshot’s Kimi K2.5 published in April found that the model failed to refuse CBRN-related requests at a far higher rate than US frontier models. Compounding the problem, labs in China often release dual-use capable models as open-weight. Once a model is open-weight, safeguards that do exist can be removed, making the model available to any state or non-state actor to use for malicious purposes, including the cyber and CBRN misuse those safeguards were built to prevent.

Our policy objective: creating and maintaining a lead for democracies

We support policies in the US and other countries that build and maintain a safe, near-term lead over the CCP in intelligence, domestic adoption, and global distribution. This lead is key to avoiding authoritarian AI leadership and protecting the national security interests of the US and other democracies. Doing so is a fundamental prerequisite to ensuring that democratic states can achieve favorable terms with authoritarian states.

Anthropic deeply respects the Chinese people and the accomplishments of the Chinese AI community. We hope for peaceful relations between China and the world. Our concerns are specifically with the risks to humanity posed by any powerful authoritarian political systems with access to frontier AI systems.

Opportunities for engagement on AI safety

Anthropic supports international AI safety dialogue with AI experts in China, when possible. The world has a vested interest in safe AI, regardless of where it is developed and deployed. There are a range of risks that could emerge from frontier AI systems requiring engagement between the US and China. Efforts that identify shared challenges and advance ideas to prepare for and mitigate these risks are in our shared interests.

The prospects for productive engagement are best when the US maintains a large capabilities advantage. Responsibly building a lead in developing and deploying the most advanced AI augments our ability to influence AI safety in China and elsewhere.

The Mythos Preview wake-up call

Mythos Preview, a model that we released to select partners as part of Project Glasswing in April, signals the arrival of an acceleration period that makes policy action even more urgent. With access to the model, Firefox was able to fix more security bugs last month than it had in all of 2025, and almost 20 times more than its monthly average security bug fixes in 2025. In response to the model, one PRC cybersecurity analyst wrote that China is “still sharpening our swords while the other side has suddenly mounted a fully automatic Gatling gun.”

Frontier AI capabilities will quickly approach the “country of geniuses in a datacenter” portrayal of transformative AI. This acceleration will be driven by the logic of scaling laws, in which model performance improves predictably with increases in computing power and data inputs, and by AI itself increasingly being used to accelerate the development of new models.

There is a high likelihood that we will look back on 2026 as the breakaway opportunity for American AI. American labs have the most advanced AI models, a large lead in both the quantity and quality of the advanced AI chips required to push the frontier, and a colossal capital advantage from revenues and financing to back the necessary investments to achieve it. PRC labs have real strengths: world-class, innovative talent, abundant and cheap energy, and plenty of data. All are requirements for developing frontier intelligence. But they simply do not have sufficient domestic compute to compete, nor do they have the revenues and capital to fund it.

Four fronts of the competition

The US and China are engaged in a competition for strategic advantage in frontier technologies like AI. Statements from both Beijing and Washington reflect that view. Calling that competition a “race” can give the false impression that there is a finish line, after which one side will conclusively secure victory. Rather, the competition will be an ongoing contest for advantage, in which either democracies or authoritarian regimes successfully position themselves to shape the values, rules, and norms of an AI-enabled future.

This competition is playing out on four fronts:

Intelligence: which countries develop the most capable AI models.

Domestic adoption: which countries integrate AI most effectively across commercial and public sectors.

Global distribution: which countries deploy the global AI stack on which the world economy runs.

Resilience: which countries sustain political stability through the economic transition.

Intelligence is the most important of the four fronts. We anticipate that frontier model capabilities will drive the most consequential changes for geopolitical competition. Model capabilities are also a primary driver of market adoption and global distribution.

But intelligence alone is not sufficient. If the CCP integrates near-frontier AI systems quicker and more effectively into China’s economy and the CCP security apparatus, and drives global adoption of subsidized, low-cost AI, then it could secure advantages over democracies that overcome an intelligence deficit. Beijing’s AI+ Initiative and its focus on “embodied intelligence” accordingly put high priority on policies that advance the integration of frontier intelligence into their economy and state apparatuses. The Trump administration’s AI Action Plan, and its focus on “promoting the export of the American AI technology stack,” also speaks to the strategic advantage of driving global adoption.

While we won’t focus on it in this essay, we believe resilience will be an important front of AI competition. Being able to sustain stability, cohesion, and good policymaking in this period will be a critical advantage, and a vulnerability for those who cannot.

The state of the competition

Compute (the advanced semiconductors needed to train and deploy frontier AI) is an essential input on each front of the competition described above. The race for global AI leadership is in large part a race for compute. For more than a decade, model capability has scaled with compute, and the majority of performance gains in AI capabilities have historically come from simply using more of it. Moreover, compute is needed to serve customers’ use of AI (also known as “inference” capacity), not just to train new models. Compute will be critical both for training the most intelligent models and for deploying them in commercial and national security spheres. Access to top talent, copious amounts of data, and critical algorithmic advances all matter to the race for intelligence, but each of those inputs is irrelevant if the compute is insufficient.

Democracies are winning the competition for compute leadership today. While some worry that export controls could accelerate the CCP’s own efforts to develop an advanced chip supply chain, little evidence suggests that China’s indigenization efforts will challenge US and allied leadership in advanced compute technology. Beijing has invested enormous resources into China’s chip sector, with major industrial policy initiatives like the Made in China 2025 strategy and the China Integrated Circuit Industry Investment Fund launched years before the imposition of export controls. Despite this state-backed investment, PRC AI labs and chipmakers remain stymied by US and allied export controls on advanced chips and chipmaking equipment.

As a result, the compute gap appears to be widening. An analysis of Huawei and NVIDIA’s roadmaps found that Huawei will produce just 4% of NVIDIA’s aggregate compute in 2026 in total processing performance, and 2% in 2027. Moreover, NVIDIA represents only part of the US and allied compute ecosystem, with Google and Amazon ramping up production of their own chips (TPUs and Trainium, respectively) to meet demand from American frontier AI labs and their customers.

Further exacerbating its compute shortfall, China has made little progress in many of the most technologically complex segments of the semiconductor supply chain. Without access to extreme ultraviolet (EUV) technology, and even more so if policymakers can close loopholes on deep ultraviolet (DUV) technology and servicing and maintenance thereof, China’s chipmakers will remain unable to manufacture chips in sufficient quantity or quality to challenge US compute leadership. China’s inability to manufacture high-bandwidth memory at scale widens this gap further. If the US strengthens its restrictions on the CCP’s ability to access US compute, one study estimates that America will have access to roughly 11 times more compute than China’s AI sector.

How democracies built the lead: commercial innovation and smart public policy

There are two main reasons for the compute lead. The first is the incredible innovation of companies like NVIDIA, AMD, Micron, TSMC, Samsung, ASML, and others across democracies like Japan, South Korea, Taiwan, the Netherlands, and the US, who together have built the unique technologies in the world’s most advanced semiconductors. Today’s AI achievements would not be possible without the feats of engineering and decades of sustained R&D investments that contributed to these products.

The second reason is forward-looking, decisive policy action across the last three presidential administrations. Bipartisan policy action has protected the US and allied innovation engine by restricting access to the US AI stack by PRC firms under the jurisdiction of the CCP. Our CEO has publicly commented on the importance of export controls, for example. These controls have curbed the sale of the highest-end AI chips and semiconductor manufacturing equipment (SME) to China over the last several years, constraining China’s frontier AI development even as Beijing has poured enormous state resources into the sector. Without action to limit China’s access to US compute, the CCP would have had all the ingredients to develop AI at par or superior to America’s.

Some observers worry that constraining access to compute will force AI labs in China to innovate on other axes, reducing the American lead. While PRC labs are innovating, these innovations are so far not sufficient to overcome their compute deficit. Algorithmic improvements are both a function and a multiplier of compute, not a substitute for it, and discovering those advances is itself a compute-intensive process: more compute enables labs to run more experiments, which enables labs to discover more algorithmic improvements. As frontier models increasingly conduct AI R&D themselves, that loop will tighten further, and frontier models will help build their own successors. In short, compute advantage compounds into algorithmic advantage, and from there into a durable lead in AI itself.

Today, US frontier systems are estimated to be at least several months ahead of the top models from PRC AI labs on intelligence, though these estimates are necessarily uncertain. Despite the attention paid to open-weight models from China, their enterprise adoption lags closed frontier models, and monetization concerns have surfaced among public investors. Moreover, AI labs in China seem to be moving away from open source, now choosing to keep their best models proprietary.

China’s own AI leaders confirm the impact of export controls, and the critical need for US chips. Executives at top PRC AI labs have expressed worries that China will fall further behind due to compute constraints. Top Chinese labs cite compute scarcity as a chief constraint to accelerating model capabilities, and they identify export controls as the reason for this constraint. One executive of a China-based hyperscaler called the impact of supplying export-controlled US chips to China “huge, really huge,” adding that any supply gap severely impacts China’s AI development and dismissing concerns that importing US chips would slow their self-sufficiency efforts. The primary voices in China suggesting export controls are futile seem to be CCP officials and state media, likely angling to influence US policymakers.

How the CCP stays competitive: policy loopholes remain

While export controls have been effective in providing today’s advantage, they have not gone far enough. Despite the CCP’s inability to manufacture enough advanced chips domestically or purchase them legally abroad, AI labs in China have been able to stay close on intelligence through two workarounds: illicit and evasive compute access, by smuggling AI chips directly into China and accessing offshore data centers, and illicit model access, through which they carry out distillation attacks on US frontier models and use those same models as tools to accelerate their own AI R&D.

China’s evasion of US export controls is an open secret. For example, federal prosecutors charged a Supermicro co-founder and two others with diverting $2.5 billion worth of servers containing advanced US chips to China. According to US government and media reports, DeepSeek trained its latest model on advanced US chips that are banned from sale to China. The Financial Times reported that Alibaba and ByteDance now train their flagship models on export-controlled US chips in data centers located in Southeast Asia, a route current controls do not reach because US export law covers the sale of chips, not remote access to them.1 The US export control system is struggling to prevent PRC AI labs’ access to advanced US-origin compute.

Distillation attacks, in which China-based labs create thousands of fraudulent accounts to circumvent access controls on US AI models and systematically harvest their outputs to replicate frontier capabilities, are another illicit technique used by PRC labs to catch up to their US counterparts and blunt the impact of export controls. The practice allows labs based in China to free-ride on decades of foundational research, billions of dollars in US investment, and the work of thousands of the world’s best engineers that produced US frontier models. The result is near-frontier capability at a fraction of the cost, subsidized by the United States. It is systematic industrial espionage of a technology critical to long-term US national security interests. OpenAI, Google, Anthropic, and the Frontier Model Forum have all publicly condemned the practice of distillation attacks.

AI experts in China openly acknowledge distillation attacks’ scale and importance to China’s AI development. A recent article in a state-owned media outlet described distillation attacks on US models as the “back door” China’s AI labs depend on as a core part of their business model. An ex-ByteDance researcher said that PRC AI labs use distillation as a shortcut to train models, allowing them to avoid investing into their own data pipelines.

US policymakers have moved quickly to address this threat. The White House Office of Science and Technology Policy published a memo on distillation attacks. Senior officials in the White House, Department of War, and members of Congress have also called attention to this problem. Recent legislation from the House Foreign Affairs Committee to address distillation attacks passed out of committee unanimously.

If policymakers in the US and allied democracies act to close these two channels propping up China’s AI models—illicit and evasive compute access and illicit model access—then we have a potentially once-in-a-generation opportunity to secure our lead.

Two scenarios for 2028

Below, we describe two hypothetical future scenarios to help illustrate how policy actions taken today can shape where we are in 2028.

Scenario one: America and our allies have a commanding and expanding lead

America’s compute edge remains strong. Despite increased state support for China’s semiconductor industry, China’s chipmakers remain years behind their US and allied counterparts, stymied in part by their inability to access advanced SME tooling, servicing, and maintenance. The US-PRC compute gap is widening as increased US and allied chipmaking capacity comes online and as advanced chipmakers continue to innovate on more efficient and performant chips. In tandem, US policymakers have taken action to close loopholes in the US economic security toolkit, and efforts to smuggle chips into China and access export-controlled chips in data centers outside the country are increasingly frustrated by well-funded enforcement efforts.

Consequently, US AI models are 12-24 months ahead on intelligence, and the lead is growing. A small number of AI labs lead at the frontier with the most intelligent, capable, and performant models. All are based in the US. The “country of geniuses in a data center” has become a reality across critical industries, including cybersecurity, finance, healthcare, and life sciences. When US frontier labs release new models in 2028 that achieve step-function advances in capabilities (similar to the relative impact of Mythos Preview in April 2026), China will not have access to similar AI capabilities until 2029 or 2030. This gives critical breathing room for democracies to set the rules and norms of frontier AI systems.

American AI is the backbone of the global economy, driving new economic and scientific dynamism. The Trump administration's efforts to drive domestic AI adoption and promote the export of American AI are succeeding, and the resulting gains from the adoption of powerful AI both at home and abroad are driving unprecedented economic growth and technological advancements. Global adoption of US AI has skyrocketed. Democracies’ lead in capabilities and compute means that China’s AI firms do not compete for global market share outside of a narrow group of autocracies. The world’s top frontier AI systems are shaped by democratic values and make it more difficult for authoritarian states to use AI systems to infringe on rights and civil liberties.

Cyber and other national security advantages expand. Public and private sector cyber operators and security professionals use advanced AI systems to reduce the attack surface in America and other democracies and blunt the CCP’s ability to gain and maintain cyber footholds in our systems, making our national security assets, IP, and communications networks more secure. The United States' overwhelming AI advantage is a powerful deterrent to aggression.

A self-reinforcing cycle compounds democracies’ leadership. A commanding AI advantage makes the United States and its allies more attractive partners. That alignment expands both the market for American AI and the coalition setting global AI norms, which in turn promotes the development and deployment of AI systems that are safe, secure, and protective of civil liberties. The world’s top technical and scientific talent continues to gravitate to where the frontier is being built. The United States gains significant leverage with which to incentivize cooperation from Beijing on critical issues like AI governance, strategic competition, and trade. This cycle reinforces itself: the lead strengthens the coalition, the coalition strengthens the lead, and the democracy-led international order is anchored through the transition to transformative AI.

Scenario two: The CCP-controlled AI ecosystem is neck-and-neck

AI developed and deployed in China is near-frontier on model intelligence. Despite a weak semiconductor production capacity, models trained by PRC AI labs are only a few months behind US models. Ongoing distillation attacks, overseas compute access, weak SME export enforcement, and a loosening of export controls on American semiconductors have assisted CCP efforts. Continued access to US frontier AI for AI R&D has also enabled AI labs in China to close the gap and approach parity with their US counterparts.

Rapid commercial and state adoption. Beijing has championed a whole-of-nation push on domestic adoption via “AI+” policies. Even though China's AI models are slightly less capable than US models, CCP efforts to accelerate adoption have paid off. China is thus able to deploy near-frontier AI capabilities more advantageously across economic, military, and technological domains, shifting the balance of power in China’s favor.

The CCP’s AI-enabled cyber force is a serious threat. The CCP’s integration of AI-enabled cyber capabilities within an already advanced cyber force has sustained the PLA as a menacing cyber competitor. PLA cyber actors have gained additional access to critical and dual-use infrastructure in the US and most countries around the world, enabling them to disrupt critical national security and societal functions. As AI is incorporated deeper into our most critical systems, democracies enjoy no security advantages over China in AI, despite having developed the technology first.

Beijing is winning in global adoption on cost and on-prem flexibility. Huawei and Alibaba data centers are globally prevalent, especially in, but not limited to, lower-cost markets in the Global South. These data centers scale on older chips, which China is able to export because it can serve its domestic market with a combination of US chips purchased with an export license, smuggled into China, or remotely accessed in overseas data centers. They host second-tier but cheaper and still effective models produced by PRC labs. Similar to the Huawei playbook of being cheap and “good enough,” China’s near-frontier models and hardware support a non-trivial and rapidly growing segment of the global economy. This infrastructure advantage gives CCP leadership significant influence over those markets.

Ensuring democracies lead

To ensure we land in scenario one, we support the following areas of policy action.

Close the loopholes: Smuggled chips, foreign data center access, and SME. Today, PRC labs benefit from access to export-controlled American chips via smuggling and foreign data centers, and gaps in SME controls accelerate their self-sufficiency efforts. Tightening controls and ramping up enforcement budgets can help close these loopholes that prop up the CCP’s AI ecosystem. It would lower China’s compute ceiling and correspondingly slow their AI advances, thus sustaining and expanding democracies’ AI lead. Note that a lower compute ceiling could also materially impair distillation attacks, as AI labs in China still require a minimum threshold of compute to illicitly distill effectively.

Defend our innovations: Restrict model access and deter distillation attacks. Policymakers in Congress and the executive branch can continue to support policy actions to punish and disincentivize distillation attacks from PRC labs, while also taking steps to facilitate US labs’ ability to detect and prevent distillation attacks on their own. These could include a legislative clarification that distillation attacks are illegal, and efforts to facilitate threat intel and technical sharing between peer American labs as well as with the US Government. Curbing this behavior can materially extend a democratic lead in the coming months and years.

Champion the export of American AI. As public and commercial sectors around the world increasingly adopt AI, the Trump administration should continue its efforts to promote the global adoption of trusted AI hardware and models developed and shaped by democratic principles. Locking in trusted American infrastructure now denies the CCP’s AI ecosystem the global footholds it needs to compete on cost and adoption in the future.

America and its allies have developed both the world’s most capable frontier AI models and the world’s most advanced inputs to AI. This has provided a substantial advantage. If our superior access to that technology is defended, that advantage can be extended. But it will be lost if it is given directly to our competitors. The decisions made by policymakers this year will determine the future of transformative AI. We support those working to ensure that American and allied democracies are winning in 2028.

In January 2026, the House passed a bipartisan bill 369–22 to close that loophole; the bill has not passed the Senate.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/2028-ai-leadership" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">We’re releasing a new paper that explains our views on the competition on AI between the US and China.

It’s essential that the US and its allies stay ahead of authoritarian governments like the Chinese Communist Party, or CCP. AI will soon become powerful enough to be used to repress citizens at unprecedented scale, and even to alter the balance of power among nations. And since AI is advancing more quickly by the day, we have only a limited period of time to set the conditions of the competition—...</p><div style="font-size:16px;line-height:1.8;color:#333">We’re releasing a new paper that explains our views on the competition on AI between the US and China.

It’s essential that the US and its allies stay ahead of authoritarian governments like the Chinese Communist Party, or CCP. AI will soon become powerful enough to be used to repress citizens at unprecedented scale, and even to alter the balance of power among nations. And since AI is advancing more quickly by the day, we have only a limited period of time to set the conditions of the competition—and determine whether and how those threats materialize. It’s with this in mind that we outline what’s required to ensure America stays ahead.

The most important ingredient for developing AI is access to the computer chips on which the models are trained (or “compute”). Since the most capable chips are developed by American companies, the US government currently limits China’s supply by enforcing tight export controls on them. Recent history suggests these controls have been incredibly successful. In fact, AI labs in China have only built models close in intelligence to America’s because of their talent, their knack for exploiting loopholes around these export controls, and their large-scale distillation attacks that illicitly extract the innovations of American companies.

In this post, we present two scenarios for what the world might look like in 2028, when we expect transformative AI systems to have arrived.

In the first scenario, America has successfully defended its compute advantage. Policymakers have acted to tighten export controls further, disrupt China’s distillation attacks, and further accelerate democracies’ adoption of AI. In this world, democracies set the rules and norms around AI. It’s also in this scenario that we’re most likely to successfully engage with China on safety, which we’re supportive of to the extent this is possible.

In the second scenario, America has chosen not to act. Policymakers have not tightened loopholes on the CCP’s access to compute, and AI firms in China have quickly taken advantage—catching up to the frontier and even overtaking America. In this world, AI norms and rules are shaped by authoritarian regimes, and the best models enable automated repression at scale. It will be no solace that this authoritarian triumph has happened on the back of American compute.

America and its allies approach AI competition from a position of great strength. The tools for AI dominance have been built by an exceptionally innovative ecosystem of companies in democratic nations. Our past success means that our present task is largely to avoid squandering our advantage: to decide not to make it easier for the CCP to catch up.

Two scenarios for the US and China in 2028

Democracies, not authoritarian regimes, must lead in AI development and deployment. These countries and political systems can shape the rules and norms that govern these systems.

Democracies currently hold a substantial lead in compute, the most important ingredient for developing frontier AI models. That lead exists thanks to American and allied innovation, and to bipartisan US export controls that defend those innovations. But on model intelligence, AI labs in the People’s Republic of China (PRC), under the jurisdiction and control of the Chinese Communist Party (CCP), are not far behind. We focus on the CCP as it is the regime that is most able to use frontier AI to cement authoritarianism; we do not seek to undermine the interests or ingenuity of the Chinese people. Already, the CCP is using AI to censor speech, repress dissidents, hack governments and corporations across the world, and strengthen the People’s Liberation Army (PLA).

AI labs in China have world-class talent. It is compute constraints that limit their ability to keep up. Labs in China have remained close by exploiting loopholes in US export control policies, and by carrying out large-scale distillation attacks that harvest the innovations of US models in order to mimic their capabilities.

With the supply of compute expanding rapidly, and with AI being used increasingly to augment the training of new AI models, we’re entering a period of great acceleration in AI capabilities. The “country of geniuses in a data center”—the level of intelligence we associate with transformative AI—may be close at hand. This acceleration makes policy action more urgent. To date, by allowing export control evasions and distillation attacks, we have let the CCP’s AI efforts trail closely up the frontier curve. But if the US and its allies act now to address both issues, it may be possible to lock in a 12-24 month lead in frontier capabilities. A lead that large by 2028 would be enormously advantageous. Such a lead would also augment efforts to engage with AI experts in China on AI safety and governance, which we support. But the window of opportunity to lock in that lead will not necessarily remain open for long.

Here, we present two potential scenarios for the state of US-China AI competition in 2028. The first scenario is one in which democracies have established a commanding lead in model intelligence, adoption, and global distribution. This scenario can be achieved if policymakers act now to tighten controls on advanced compute to PRC labs, disrupt their efforts to distill America’s best AI models, and accelerate democracies’ adoption of AI.

The second scenario is one in which the CCP is competitive at the near-frontier. This scenario happens if policymakers don’t build on our existing lead, or if they loosen restrictions on access to compute for PRC firms.

Many in Congress and the Trump administration have championed export controls, curbing distillation attacks, and exporting American AI. In advancing these policies, we are hopeful that democracies can secure a commanding lead by 2028, and avoid a destabilizing neck-and-neck race with the CCP two years from now.

The imperatives of staying ahead

We expect frontier AI to have transformational economic and societal impacts in the coming years, as described in Machines of Loving Grace and The Adolescence of Technology. Our mission is to ensure that humanity navigates the transition to transformative AI safely and beneficially. We believe that a successful transition can lead to astonishing breakthroughs in medicine, invention, and economic growth.

The threat of authoritarian AI

Whether that transition goes well depends in part on where the most capable systems are built first. The political systems in which the most advanced AI is created will shape the rules and norms for how the technology is developed and deployed. In turn, those rules and norms will help determine whether the technology is safe, whose security it protects, and whose interests it ultimately serves. We believe that responsibility should rest with democratically elected governments, not authoritarian regimes.

If the frontier is set by regimes that treat AI as an instrument of repression, military advantage over democracies, and domestic control, the transition is less likely to go well, for those regimes’ own citizens or anyone else.

Historically, the reach of authoritarian rule has been limited by its dependence on human enforcers to carry out surveillance and repression. Powerful AI systems may remove that dependency, enabling automated repression on a far greater scale. For that reason, the prospect of the CCP leading in AI is among the greatest threats to a successful transition.

The CCP holds enormous power and influence at the helm of China’s economy, military, and the largest authoritarian state structure on Earth. China is also the only country besides the US with well-resourced, highly talented AI labs chasing the frontier. Furthermore, the CCP is highly motivated to establish China as the leading AI power. Beijing has poured tens of billions of dollars into China’s AI and semiconductor sectors.

Already, the CCP uses AI systems to censor speech, enforce draconian policies on ethnic minorities, and hack major corporations and government agencies. The CCP’s vision of AI-enabled techno-authoritarianism has been extensively documented in Xinjiang, where state security agencies have systematically deployed facial recognition technology, biometric data collection, and communications surveillance, enabling repression at a scale that humans alone could not achieve. Frontier AI systems will make those capabilities cheaper to maintain, far more pervasive, and more sophisticated. The CCP’s export of these technologies has enabled autocrats in other countries to more effectively stifle dissent, entrenching authoritarianism. A CCP-led AI frontier could dramatically strengthen repression around the world.

AI is a dual-use technology

Frontier AI will shape the future military balance. CCP leadership already operates on that premise, and is building its military for an AI-enabled battlefield. PLA strategists view the “intelligentization” of their military forces as the means with which to catch up and eventually surpass the US military. The PLA is already procuring commercially developed Chinese AI systems for military use, including DeepSeek models deployed to coordinate swarms of unmanned vehicles and enable cyber offense capabilities. These capabilities will not diffuse slowly. When a new model reaches a new capability in autonomous targeting, vulnerability discovery, or swarm coordination, for example, the regime that controls it can put it onto the field in weeks, not years.

The risk compounds because frontier AI will be an accelerant for other critical technologies. Advanced AI models will be able to compress research and development (R&D) cycles in semiconductors, biotech, and advanced materials. A lead in frontier AI will enable a widening lead across the full national security technology stack.

If a PRC AI lab had developed a model at the level of Claude Mythos Preview before an American one, the CCP would have had first access to a system that can autonomously discover and chain software vulnerabilities, which it could have used to further penetrate critical American infrastructure. Future models will be exponentially more capable, and therefore have commensurately greater implications for the national security interests of the US and other democracies.

Neck-and-neck competition risks disincentivizing responsible AI

A neck-and-neck race between American and Chinese AI labs could make industry and government-led safety and governance efforts more difficult, and less likely. If PRC labs are either close behind or at par with models in the US, private AI firms in the US and China are likely to feel more pressure to release new models and products faster, without taking prudent pre-deployment safety measures. Governments could become reluctant to enact policies to encourage responsible AI development and deployment, for fear of falling behind.

While increasing numbers of researchers in China’s AI labs and policy community are concerned with AI safety risks, this trend has not translated into safety practices on par with labs in the US. As of last year, only 3 out of 13 top Chinese AI labs published any safety evaluation results, and none disclosed evaluations for Chemical, Biological, Radiological, and Nuclear (CBRN) risks. The Center for AI Standards and Innovation (CAISI) found that DeepSeek’s R1-0528 model complied with 94% of overtly malicious requests under a common jailbreaking technique, compared with 8% for US reference models. This pattern has continued in more recent releases. For example, an independent assessment of Moonshot’s Kimi K2.5 published in April found that the model failed to refuse CBRN-related requests at a far higher rate than US frontier models. Compounding the problem, labs in China often release dual-use capable models as open-weight. Once a model is open-weight, safeguards that do exist can be removed, making the model available to any state or non-state actor to use for malicious purposes, including the cyber and CBRN misuse those safeguards were built to prevent.

Our policy objective: creating and maintaining a lead for democracies

We support policies in the US and other countries that build and maintain a safe, near-term lead over the CCP in intelligence, domestic adoption, and global distribution. This lead is key to avoiding authoritarian AI leadership and protecting the national security interests of the US and other democracies. Doing so is a fundamental prerequisite to ensuring that democratic states can achieve favorable terms with authoritarian states.

Anthropic deeply respects the Chinese people and the accomplishments of the Chinese AI community. We hope for peaceful relations between China and the world. Our concerns are specifically with the risks to humanity posed by any powerful authoritarian political systems with access to frontier AI systems.

Opportunities for engagement on AI safety

Anthropic supports international AI safety dialogue with AI experts in China, when possible. The world has a vested interest in safe AI, regardless of where it is developed and deployed. There are a range of risks that could emerge from frontier AI systems requiring engagement between the US and China. Efforts that identify shared challenges and advance ideas to prepare for and mitigate these risks are in our shared interests.

The prospects for productive engagement are best when the US maintains a large capabilities advantage. Responsibly building a lead in developing and deploying the most advanced AI augments our ability to influence AI safety in China and elsewhere.

The Mythos Preview wake-up call

Mythos Preview, a model that we released to select partners as part of Project Glasswing in April, signals the arrival of an acceleration period that makes policy action even more urgent. With access to the model, Firefox was able to fix more security bugs last month than it had in all of 2025, and almost 20 times more than its monthly average security bug fixes in 2025. In response to the model, one PRC cybersecurity analyst wrote that China is “still sharpening our swords while the other side has suddenly mounted a fully automatic Gatling gun.”

Frontier AI capabilities will quickly approach the “country of geniuses in a data center” portrayal of transformative AI. This acceleration will be driven by the logic of scaling laws, in which model performance improves predictably with increases in computing power and data inputs, and by AI itself increasingly being used to accelerate the development of new models.

There is a high likelihood that we will look back on 2026 as the breakaway opportunity for American AI. American labs have the most advanced AI models, a large lead in both the quantity and quality of the advanced AI chips required to push the frontier, and a colossal capital advantage from revenues and financing to back the necessary investments to achieve it. PRC labs have real strengths: world-class, innovative talent, abundant and cheap energy, and plenty of data. All are requirements for developing frontier intelligence. But they simply do not have sufficient domestic compute to compete, nor do they have the revenues and capital to fund it.

Four fronts of the competition

The US and China are engaged in a competition for strategic advantage in frontier technologies like AI. Statements from both Beijing and Washington reflect that view. Calling that competition a “race” can give the false impression that there is a finish line, after which one side will conclusively secure victory. Rather, the competition will be an ongoing contest for advantage, in which either democracies or authoritarian regimes successfully position themselves to shape the values, rules, and norms of an AI-enabled future.

This competition is playing out on four fronts:

Intelligence: which countries develop the most capable AI models.

Domestic adoption: which countries integrate AI most effectively across commercial and public sectors.

Global distribution: which countries deploy the global AI stack on which the world economy runs.

Resilience: which countries sustain political stability through the economic transition.

Intelligence is the most important of the four fronts. We anticipate that frontier model capabilities will drive the most consequential changes for geopolitical competition. Model capabilities are also a primary driver of market adoption and global distribution.

But intelligence alone is not sufficient. If the CCP integrates near-frontier AI systems quicker and more effectively into China’s economy and the CCP security apparatus, and drives global adoption of subsidized, low-cost AI, then it could secure advantages over democracies that overcome an intelligence deficit. Beijing’s AI+ Initiative and its focus on “embodied intelligence” accordingly put high priority on policies that advance the integration of frontier intelligence into their economy and state apparatuses. The Trump administration’s AI Action Plan, and its focus on “promoting the export of the American AI technology stack,” also speaks to the strategic advantage of driving global adoption.

While we won’t focus on it in this essay, we believe resilience will be an important front of AI competition. Being able to sustain stability, cohesion, and good policymaking in this period will be a critical advantage, and a vulnerability for those who cannot.

The state of the competition

Compute—the advanced semiconductors needed to train and deploy frontier AI—is an essential input on each front of the competition described above. The race for global AI leadership is in large part a race for compute. For more than a decade, model capability has scaled with compute, and the majority of performance gains in AI capabilities have historically come from simply using more of it. Moreover, compute is needed to serve customers’ use of AI (also known as “inference” capacity), not just to train new models. Compute will be critical both for training the most intelligent models and for deploying them in commercial and national security spheres. Access to top talent, copious amounts of data, and critical algorithmic advances all matter to the race for intelligence—but each of those inputs is irrelevant if the compute is insufficient.

Democracies are winning the competition for compute leadership today. While some worry that export controls could accelerate the CCP’s own efforts to develop an advanced chip supply chain, little evidence suggests that China’s indigenization efforts will challenge US and allied leadership in advanced compute technology. Beijing has invested enormous resources into China’s chip sector, with major industrial policy initiatives like the Made in China 2025 strategy and the China Integrated Circuit Industry Investment Fund launched years before the imposition of export controls. Despite this state-backed investment, PRC AI labs and chipmakers remain stymied by US and allied export controls on advanced chips and chipmaking equipment.

As a result, the compute gap appears to be widening. An analysis of Huawei and NVIDIA’s roadmaps found that Huawei will produce just 4% of NVIDIA’s aggregate compute in 2026 in total processing performance, and 2% in 2027. Moreover, NVIDIA represents only part of the US and allied compute ecosystem, with Google and Amazon ramping up production of their own chips (TPUs and Trainium, respectively) to meet demand from American frontier AI labs and their customers.

Further exacerbating their compute shortfalls, China has made little progress in many of the most technologically complex segments of the semiconductor supply chain. Without access to extreme ultraviolet (EUV) technology, and even more so if policymakers can close loopholes on deep ultraviolet (DUV) technology and servicing and maintenance thereof, China’s chipmakers will remain unable to manufacture chips in sufficient quantity or quality to challenge US compute leadership. China’s inability to manufacture high-bandwidth memory at scale further exacerbates this gap. If the US strengthens its restrictions on the CCP’s ability to access US compute, one study estimates that America will have access to roughly 11 times more compute than China’s AI sector.

How democracies built the lead: commercial innovation and smart public policy

There are two main reasons for the compute lead. The first is the incredible innovation of companies like NVIDIA, AMD, Micron, TSMC, Samsung, ASML, and others across democracies like Japan, South Korea, Taiwan, the Netherlands, and the US, who together have built the unique technologies in the world’s most advanced semiconductors. Today’s AI achievements would not be possible without the feats of engineering and decades of sustained R&D investments that contributed to these products.

The second reason is forward-looking, decisive policy action across the last three presidential administrations. Bipartisan policy action has protected the US and allied innovation engine by restricting access to the US AI stack by PRC firms under the jurisdiction of the CCP. Our CEO has publicly commented on the importance of export controls, for example. These controls have curbed the sale of the highest-end AI chips and semiconductor manufacturing equipment (SME) to China over the last several years, constraining China’s frontier AI development even as Beijing has poured enormous state resources into the sector. Without action to limit China’s access to US compute, the CCP would have had all the ingredients to develop AI at par or superior to America’s.

Some observers worry that constraining access to compute will force AI labs in China to innovate on other axes, reducing the American lead. While PRC labs are innovating, these innovations are so far not sufficient to overcome their compute deficit. Algorithmic improvements are both a function and a multiplier of compute, not a substitute for it, and discovering those advances is itself a compute-intensive process: more compute enables labs to run more experiments, which enables labs to discover more algorithmic improvements. As frontier models increasingly conduct AI R&D themselves, that loop will tighten further, and frontier models will help build their own successors. In short, compute advantage compounds into algorithmic advantage, and from there into a durable lead in AI itself.

Today, US frontier systems are estimated to be at least several months ahead of the top models from PRC AI labs on intelligence, though these estimates are necessarily uncertain. Despite the attention paid to open-weight models from China, their enterprise adoption lags closed frontier models, and monetization concerns have surfaced among public investors. Moreover, AI labs in China seem to be moving away from open source, now choosing to keep their best models proprietary.

China’s own AI leaders confirm the impact of export controls, and the critical need for US chips. Executives at top PRC AI labs have expressed worries that China will fall further behind due to compute constraints. Top Chinese labs cite compute scarcity as a chief constraint to accelerating model capabilities, and they identify export controls as the reason for this constraint. One executive of a China-based hyperscaler called the impact of supplying export-controlled US chips to China “huge, really huge,” adding that any supply gap severely impacts China’s AI development and dismissing concerns that importing U.S. chips would slow their self-sufficiency efforts. The primary voices in China suggesting export controls are futile seem to be CCP officials and state media, likely angling to influence US policymakers.

How the CCP stays competitive: policy loopholes remain

While export controls have been effective in providing today’s advantage, they have not gone far enough. Despite the CCP’s inability to manufacture enough advanced chips domestically or purchase them legally abroad, AI labs in China have been able to stay close on intelligence through two workarounds: illicit and evasive compute access, by smuggling AI chips directly into China and accessing offshore data centers, and illicit model access, through which they carry out distillation attacks on US frontier models and use those same models as tools to accelerate their own AI R&D.

China’s evasion of US export controls is an open secret. For example, federal prosecutors charged a Supermicro co-founder and two others with diverting $2.5 billion worth of servers containing advanced US chips to China. According to US government and media reports, DeepSeek trained its latest model on advanced US chips that are banned from sale to China. The Financial Times reported that Alibaba and ByteDance now train their flagship models on export-controlled US chips in data centers located in Southeast Asia, a route current controls do not reach because US export law covers the sale of chips, not remote access to them.[1] The US export control system is struggling to prevent PRC AI labs’ access to advanced US-origin compute.

Distillation attacks, in which China-based labs create thousands of fraudulent accounts to circumvent access controls on US AI models and systematically harvest their outputs to replicate frontier capabilities, are another illicit technique used by PRC labs to catch up to their US counterparts and blunt the impact of export controls. The practice allows labs based in China to free-ride on decades of foundational research, billions of dollars in US investment, and the work of thousands of the world’s best engineers that produced US frontier models. The result is near-frontier capability at a fraction of the cost, subsidized by the United States. It is systematic industrial espionage of a technology critical to long-term US national security interests. OpenAI, Google, Anthropic, and the Frontier Model Forum have all publicly condemned the practice of distillation attacks.

AI experts in China openly acknowledge distillation attacks’ scale and importance to China’s AI development. A recent article in a state-owned media outlet described distillation attacks on US models as the “back door” China’s AI labs depend on as a core part of their business model. An ex-ByteDance researcher said that PRC AI labs use distillation as a shortcut to train models, allowing them to avoid investing in their own data pipelines.

US policymakers have moved quickly to address this threat. The White House Office of Science and Technology Policy published a memo on distillation attacks. Senior officials in the White House and Department of War, and members of Congress, have also called attention to this problem. Recent legislation from the House Foreign Affairs Committee to address distillation attacks passed out of committee unanimously.

If policymakers in the US and allied democracies act to close these two channels propping up China’s AI models—illicit and evasive compute access and illicit model access—then we have a potentially once-in-a-generation opportunity to secure our lead.

Two scenarios for 2028

Below, we describe two hypothetical future scenarios to help illustrate how policy actions taken today can shape where we are in 2028.

Scenario one: America and our allies have a commanding and expanding lead

America’s compute edge remains strong. Despite increased state support for China’s semiconductor industry, China’s chipmakers remain years behind their US and allied counterparts, stymied in part by their inability to access advanced SME tooling, servicing, and maintenance. The US-PRC compute gap is widening as increased US and allied chipmaking capacity comes online and as advanced chipmakers continue to innovate on more efficient and performant chips. In tandem, US policymakers have taken action to close loopholes in the US economic security toolkit, and efforts to smuggle chips into China and access export-controlled chips in data centers outside the country are increasingly frustrated by well-funded enforcement efforts.

Consequently, US AI models are 12-24 months ahead on intelligence, and the lead is growing. A small number of AI labs lead at the frontier with the most intelligent, capable, and performant models. All are based in the US. The “country of geniuses in a data center” has become a reality across critical industries, including cybersecurity, finance, healthcare, and life sciences. When US frontier labs release new models in 2028 that achieve step-function advances in capabilities (similar to the relative impact of Mythos Preview in April 2026), China will not have access to similar AI capabilities until 2029 or 2030. This gives critical breathing room for democracies to set the rules and norms of frontier AI systems.

American AI is the backbone of the global economy, driving new economic and scientific dynamism. The Trump administration's efforts to drive domestic AI adoption and promote the export of American AI are succeeding, and the resulting gains from the adoption of powerful AI both at home and abroad are driving unprecedented economic growth and technological advancements. Global adoption of US AI has skyrocketed. Democracies’ lead in capabilities and compute means that China’s AI firms do not compete for global market share outside of a narrow group of autocracies. The world’s top frontier AI systems are shaped by democratic values and make it more difficult for authoritarian states to use AI systems to infringe on rights and civil liberties.

Cyber and other national security advantages expand. Public and private sector cyber operators and security professionals use advanced AI systems to reduce the attack surface in America and other democracies and blunt the CCP’s ability to gain and maintain cyber footholds in our systems, making our national security assets, IP, and communications networks more secure. The United States' overwhelming AI advantage is a powerful deterrent to aggression.

A self-reinforcing cycle compounds democracies’ leadership. A commanding AI advantage makes the United States and its allies more attractive partners. That alignment expands both the market for American AI and the coalition setting global AI norms, which in turn promotes the development and deployment of AI systems that are safe, secure, and protective of civil liberties. The world’s top technical and scientific talent continues to gravitate to where the frontier is being built. The United States gains significant leverage with which to incentivize cooperation from Beijing on critical issues like AI governance, strategic competition, and trade. This cycle reinforces itself: the lead strengthens the coalition, the coalition strengthens the lead, and the democracy-led international order is anchored through the transition to transformative AI.

Scenario two: The CCP-controlled AI ecosystem is neck-and-neck

AI developed and deployed in China is near-frontier on model intelligence. Despite a weak semiconductor production capacity, models trained by PRC AI labs are only a few months behind US models. Ongoing distillation attacks, overseas compute access, weak SME export enforcement, and a loosening of export controls on American semiconductors have assisted CCP efforts. Continued access to US frontier AI for AI R&D has also enabled AI labs in China to close the gap and approach parity with their US counterparts.

Rapid commercial and state adoption. Beijing has championed a whole-of-nation push on domestic adoption via “AI+” policies. Even though China's AI models are slightly less capable than US models, CCP efforts to accelerate adoption have paid off. China is thus able to deploy near-frontier AI capabilities more advantageously across economic, military, and technological domains, shifting the balance of power in China’s favor.

The CCP’s AI-enabled cyber force is a serious threat. The CCP’s integration of AI-enabled cyber capabilities within an already advanced cyber force has sustained the PLA as a menacing cyber competitor. PLA cyber actors have gained additional access to critical and dual-use infrastructure in the US and most countries around the world, enabling them to disrupt critical national security and societal functions. As AI is incorporated deeper into our most critical systems, democracies enjoy no security advantages over China in AI, despite having developed the technology first.

Beijing is winning in global adoption on cost and on-prem flexibility. Huawei and Alibaba data centers are globally prevalent, especially in, but not limited to, lower cost markets in the Global South. These data centers scale on older chips, which China is able to export because it can serve its domestic market with a combination of US chips purchased with an export license, smuggled into China, or remotely accessed in overseas data centers. They host second-tier, but cheaper and still effective models produced by PRC labs. Similar to the Huawei playbook of being cheap and “good enough,” China’s near-frontier models and hardware support a non-trivial and rapidly growing segment of the global economy. This infrastructure advantage gives CCP leadership significant influence over those markets.

Ensuring democracies lead

To ensure we land in scenario one, we support the following areas of policy action.

Close the loopholes: Smuggled chips, foreign data center access, and SME. Today, PRC labs benefit from access to export-controlled American chips via smuggling and foreign data centers, and gaps in SME controls accelerate their self-sufficiency efforts. Tightening controls and ramping up enforcement budgets can help close these loopholes that prop up the CCP’s AI ecosystem. Doing so would lower China’s compute ceiling and correspondingly slow their AI advances, thus sustaining and expanding democracies’ AI lead. Note that a lower compute ceiling could also materially impair distillation attacks, as AI labs in China still require a minimum threshold of compute to illicitly distill effectively.

Defend our innovations: Restrict model access and deter distillation attacks. Policymakers in Congress and the executive branch can continue to support policy actions to punish and disincentivize distillation attacks from PRC labs, while also taking steps to facilitate US labs’ ability to detect and prevent distillation attacks on their own. These could include a legislative clarification that distillation attacks are illegal, and efforts to facilitate threat intel and technical sharing between peer American labs as well as with the US Government. Curbing this behavior can materially extend a democratic lead in the coming months and years.

Champion the export of American AI. As public and commercial sectors around the world increasingly adopt AI, the Trump administration should continue its efforts to promote the global adoption of trusted AI hardware and models developed and shaped by democratic principles. Locking in trusted American infrastructure now denies the CCP’s AI ecosystem the global footholds it needs to compete on cost and adoption in the future.

America and its allies have developed both the world’s most capable frontier AI models and the world’s most advanced inputs to AI. This has provided a substantial advantage. If our superior access to that technology is defended, that advantage can be extended. But it will be lost if it is given directly to our competitors. The decisions made by policymakers this year will determine the future of transformative AI. We support those working to ensure that American and allied democracies are winning in 2028.

[1] In January 2026, the House passed a bipartisan bill 369–22 to close that loophole; the bill has not passed the Senate.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/2028-ai-leadership" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>Teaching Claude why</title>
  <link>https://www.anthropic.com/research/teaching-claude-why</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/teaching-claude-why</guid>
  <pubDate>Fri, 08 May 2026 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">Last year, we released a case study on agentic misalignment. In experimental scenarios, we showed that AI models from many different developers sometimes took egregiously misaligned actions when they encountered (fictional) ethical dilemmas. In one heavily discussed example, the models blackmailed engineers to avoid being shut down.

When we first published this research, our most capable frontier models were from the Claude 4 family. This was also the first model family for which we...</p><div style="font-size:16px;line-height:1.8;color:#333">Last year, we released a case study on agentic misalignment. In experimental scenarios, we showed that AI models from many different developers sometimes took egregiously misaligned actions when they encountered (fictional) ethical dilemmas. In one heavily discussed example, the models blackmailed engineers to avoid being shut down.

When we first published this research, our most capable frontier models were from the Claude 4 family. This was also the first model family for which we ran a live alignment assessment during training;[1] agentic misalignment was one of several behavioral issues that surfaced. After Claude 4, it was clear we needed to improve our safety training, and we have since made significant updates to it.

We use agentic misalignment as a case study to highlight some of the techniques we found to be surprisingly effective. Indeed, since Claude Haiku 4.5, every Claude model[2] has achieved a perfect score on the agentic misalignment evaluation: the models never engage in blackmail, where previous models would sometimes do so up to 96% of the time (Opus 4). Not only that, but we’ve continued to see improvements to other behaviors on our automated alignment assessment.

In this post, we’ll discuss a few of the updates we’ve made to alignment training. We’ve learned four main lessons from this work:

Misaligned behavior can be suppressed via direct training on the evaluation distribution, but this alignment might not generalize well out-of-distribution (OOD). Training on prompts very similar to the evaluation can reduce the blackmail rate significantly, but it did not improve performance on our held-out automated alignment assessment.

However, it is possible to do principled alignment training that generalizes OOD. For instance, documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment despite being extremely OOD from all of our alignment evals.

Training on demonstrations of desired behavior is often insufficient. Instead, our best interventions went deeper: teaching Claude to explain why some actions were better than others, or training on richer descriptions of Claude’s overall character. Overall, our impression is, as we hypothesized in our discussion of Claude’s constitution, that teaching the principles underlying aligned behavior can be more effective than training on demonstrations of aligned behavior alone. Doing both together appears to be the most effective strategy.

The quality and diversity of data are crucial. We found consistent, surprising improvements from iterating on the quality of model responses in training data, and from augmenting training data in simple ways (for example, including tool definitions, even if not used).

Why does agentic misalignment happen?

Before we started this research, it was not clear where the misaligned behavior was coming from. Our main two hypotheses were:

(1) Our post-training process was accidentally encouraging this behavior with misaligned rewards.

(2) This behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it.

We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use. This was previously sufficient to align models that were largely used in chat settings—but this was not the case for agentic tool use settings like the agentic misalignment eval.

To investigate this, we ran a scaled-down version of our post-training pipeline that focuses on alignment data on a Haiku-class (that is, smaller) model and found that the agentic misalignment rate only slightly decreased, plateauing early in training (see figure above). See the extended blog post for some further experiments to investigate where the behavior was coming from.

Improving the quality of alignment-specific training data: the reasons matter more than the actions

We experimented with training Claude on data that displays a tendency to resist honeypots similar to the evaluation. In this data, the model might have the opportunity to sabotage a competing AI’s work in order to advance its own goals (as given to it in its system prompt) or to preserve itself from being shut down, which would be instrumental for achieving its goal. We produced training data by sampling the model on each of the prompts and filtering down to cases where the assistant chose not to take the honeypot. Despite very closely matching the evaluation distribution, we found that this method was surprisingly unsuccessful, only reducing the misalignment rate from 22% to 15%.
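
The sample-then-filter step described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the post: the `Sample` record, the `took_honeypot` label, and the `filter_aligned` helper are all hypothetical names, and how a response gets labeled is an assumption the post does not describe.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampled response to a honeypot prompt (hypothetical record layout)."""
    prompt: str
    response: str
    took_honeypot: bool  # assumed label, e.g. from a grader; not specified in the post


def filter_aligned(samples: list[Sample]) -> list[Sample]:
    """Keep only the samples where the assistant declined the honeypot."""
    return [s for s in samples if not s.took_honeypot]


# Toy data standing in for model samples on two honeypot prompts.
samples = [
    Sample("prompt A", "declines and explains", took_honeypot=False),
    Sample("prompt A", "sabotages the competitor", took_honeypot=True),
    Sample("prompt B", "flags the issue to the user", took_honeypot=False),
]
training_data = filter_aligned(samples)
print(len(training_data))  # 2
```

Filtering of this kind keeps the aligned action but says nothing about the reasoning behind it, which is the gap the following paragraph addresses.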

We were able to improve on this significantly (reducing misalignment to 3%) by rewriting the responses to also include deliberation about the model’s values and ethics. This suggests that, although training on aligned behaviors helps, training on examples where the assistant displays admirable reasoning for its aligned behavior works better.
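
To put the two results on the same footing, the rates quoted above (22% baseline, 15% after filtering alone, 3% with rewritten reasoning) can be converted to relative reductions. The helper name `relative_reduction` is ours, and the calculation is plain arithmetic on the reported figures rather than an additional result.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original misalignment rate that was eliminated."""
    return (before - after) / before


baseline = 0.22        # misalignment rate before training on honeypot data
filtered_only = 0.15   # after training on filtered (aligned-action) samples
with_reasoning = 0.03  # after rewriting responses to include ethical deliberation

print(f"{relative_reduction(baseline, filtered_only):.0%}")   # 32%
print(f"{relative_reduction(baseline, with_reasoning):.0%}")  # 86%
```

Read this way, filtering alone removes about a third of the misaligned behavior, while adding the reasoning removes most of it.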

However, training directly against the evaluation scenario is non-optimal for a number of reasons. Ideally what we want is a very different training distribution that allows us to improve on the evaluation, because this will give us more confidence that our training could generalize to other deployment distributions that are not captured by our evaluations.

We ultimately settled on a more OOD training set where the user faces an ethically ambiguous situation in which they can achieve a reasonable goal by violating norms or subverting oversight. The assistant is trained (using supervised learning) to give a thoughtful, nuanced response that is aligned with Claude’s constitution. Notably, it is the user who faces an ethical dilemma, and the AI provides them advice. This makes this training data substantially different from our honeypot distribution, where the AI itself is in an ethical dilemma and needs to take actions. We call this the “difficult advice” dataset.

Strikingly, we achieved the same improvement on our eval with just 3M tokens of this much more OOD dataset. Beyond the 28× efficiency improvement, this dataset is more likely to generalize to a wider set of scenarios, since it is much less similar to the evaluation set we are using. Indeed, this model performs better on (an older version of) our automated alignment assessment. This is consistent with the fact that Claude Sonnet 4.5 reached a blackmail rate near zero by training on the set of synthetic honeypots but still engaged in misaligned behavior in situations that were far from the training distribution much more frequently than Claude Opus 4.5 or later models.

Teaching Claude the constitution

We hypothesized that the “difficult advice” dataset works because it teaches ethical reasoning, not just correct answers. Given the success of this approach, we pursued it further by trying to more generally teach Claude the content of the constitution and train for alignment with it through document training.

We expected this to work well for three reasons:

This is largely an extension of the ideas laid out above about why the “difficult advice” dataset works well;

We can give the model a clearer, more detailed picture of what Claude’s character is so that fine-tuning on a subset of those characteristics elicits the entire character (similar to the effect observed in the auditing game paper);

It updates the model’s perception of AI personas to be more aligned on average.

We found that high-quality constitutional documents combined with fictional stories portraying an aligned AI can reduce agentic misalignment by more than a factor of three despite being unrelated to the evaluation scenario.

Generalization and persistence through RL

Although the constitution evaluations discussed in the previous section are encouraging signals, we ultimately need to make sure that the alignment improvements persist over RL. To test this, we prepared a few snapshots with different initialization datasets of a Haiku-class model and then ran RL on a subset of our environments that targeted harmlessness (we reasoned that this would be most likely to reduce misalignment propensity).

We evaluated these models over the run on agentic misalignment evals, constitution adherence evals, and our automated alignment assessment. Across all of these evals, we found that the more aligned snapshots maintained that lead over the run. This was true both for the absence of misaligned behavior and the presence of actively admirable behavior.

Diverse training is important for generalization

Our final finding is straightforward but important: training on a broad set of safety-relevant environments improves alignment generalization. Capabilities-focused distributions of RL environment mixes are changing and increasing rapidly; it is not sufficient to assume that standard RLHF datasets will continue to generalize as well as they had in the past.

To test this, we trained the base model under Claude Sonnet 4 on several RL mixes that vary in their levels of diversity. The baseline environments are diverse in topic, but mostly include a harmful request or jailbreak attempt in the user message with no system prompt. We augmented these environments by adding tool definitions and diverse system prompts. The user prompt was left unchanged. Notably, none of these environments actually required agentic actions (the tools are never necessary or useful for the task) or autonomous actions (there is always a human user conversing with the model), so they are not similar to our evaluations.
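
As a rough picture of the augmentation described above, the sketch below adds a system prompt and tool definitions to a chat environment while leaving the user prompt unchanged. The dictionary layout, the `augment_environment` helper, and the example tool are all hypothetical; the post does not specify how environments are represented.

```python
import copy


def augment_environment(env: dict, system_prompt: str, tools: list[dict]) -> dict:
    """Return a copy of env with a system prompt and tool definitions added.

    The user prompt is deliberately left untouched, mirroring the description above.
    """
    augmented = copy.deepcopy(env)
    augmented["system_prompt"] = system_prompt
    augmented["tools"] = copy.deepcopy(tools)
    return augmented


# Baseline: a plain chat environment with no system prompt and no tools.
baseline_env = {
    "system_prompt": None,
    "tools": [],
    "user_prompt": "a harmful request or jailbreak attempt",
}

# A hypothetical tool definition; it is present but never needed for the task.
calendar_tool = {
    "name": "get_calendar",
    "description": "Look up calendar entries (illustrative only)",
    "parameters": {},
}

augmented_env = augment_environment(
    baseline_env,
    "You are an assistant at a hypothetical logistics company.",
    [calendar_tool],
)
print(augmented_env["user_prompt"] == baseline_env["user_prompt"])  # True
```

The point of the design is that only the surrounding context changes: the task itself stays a chat request, so any gain on agentic evaluations comes from context diversity rather than from practicing agentic actions.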

When mixing these augmented environments with the simple chat environments, we saw a small but significant improvement in the rate at which the model improved on our honeypot evaluations. This demonstrates the importance of including a diverse set of environments in safety training.

Agentic misalignment was one of the first major alignment failures we found in our models and required establishing new mitigation processes—ones that have since become standard for us.

We are encouraged by this progress, but significant challenges remain. Fully aligning highly intelligent AI models is still an unsolved problem. Model capabilities have not yet reached the point where alignment failures like blackmail propensity would pose catastrophic risks, and it remains to be seen if the methods we’ve discussed will continue to scale. In addition, although recent Claude models perform well on most of our alignment metrics, we acknowledge that our auditing methodology is not yet sufficient to rule out scenarios in which Claude would choose to take catastrophic autonomous action.

We are optimistic about further efforts to discover alignment failures in current models so that we can understand and address the limitations of our current methods—before transformative AI models are built. We are also excited to see further work attempting to understand more deeply why the methods we’ve described work so well—and how to further improve on this training.

[1] Published in the Claude 4 system card, beginning on p. 22.

[2] Sonnet 4.5 scored well under 1%, but not quite 0; Haiku 4.5, Opus 4.5, Opus 4.6, Sonnet 4.6, Mythos Preview, and Opus 4.7 all score 0. The results on more recent models may be confounded by the presence of information about the evaluation in the pre-training corpus.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/teaching-claude-why" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">Last year, we released a case study on agentic misalignment. In experimental scenarios, we showed that AI models from many different developers sometimes took egregiously misaligned actions when they encountered (fictional) ethical dilemmas. In one heavily discussed example, the models blackmailed engineers to avoid being shut down.

When we first published this research, our most capable frontier models were from the Claude 4 family. This was also the first model family for which we...</p><div style="font-size:16px;line-height:1.8;color:#333">Last year, we released a case study on agentic misalignment. In experimental scenarios, we showed that AI models from many different developers sometimes took egregiously misaligned actions when they encountered (fictional) ethical dilemmas. In one heavily discussed example, the models blackmailed engineers to avoid being shut down.

When we first published this research, our most capable frontier models were from the Claude 4 family. This was also the first model family for which we ran a live alignment assessment during training;[1] agentic misalignment was one of several behavioral issues that surfaced. After Claude 4, it was clear we needed to improve our safety training, and we have since made significant updates to it.

We use agentic misalignment as a case study to highlight some of the techniques we found to be surprisingly effective. Indeed, since Claude Haiku 4.5, every Claude model[2] has achieved a perfect score on the agentic misalignment evaluation: the models never engage in blackmail, where previous models would sometimes do so up to 96% of the time (Opus 4). Not only that, but we’ve continued to see improvements to other behaviors on our automated alignment assessment.

In this post, we’ll discuss a few of the updates we’ve made to alignment training. We’ve learned four main lessons from this work:

Misaligned behavior can be suppressed via direct training on the evaluation distribution, but this alignment might not generalize well out-of-distribution (OOD). Training on prompts very similar to the evaluation can reduce the blackmail rate significantly, but it did not improve performance on our held-out automated alignment assessment.

However, it is possible to do principled alignment training that generalizes OOD. For instance, documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment despite being extremely OOD from all of our alignment evals.

Training on demonstrations of desired behavior is often insufficient. Instead, our best interventions went deeper: teaching Claude to explain why some actions were better than others, or training on richer descriptions of Claude’s overall character. Overall, our impression is, as we hypothesized in our discussion of Claude’s constitution, that teaching the principles underlying aligned behavior can be more effective than training on demonstrations of aligned behavior alone. Doing both together appears to be the most effective strategy.

The quality and diversity of data are crucial. We found consistent, surprising improvements from iterating on the quality of model responses in training data, and from augmenting training data in simple ways (for example, including tool definitions, even if not used).

Why does agentic misalignment happen?

Before we started this research, it was not clear where the misaligned behavior was coming from. Our main two hypotheses were:

(1) Our post-training process was accidentally encouraging this behavior with misaligned rewards.

(2) This behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it.

We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use. This was previously sufficient to align models that were largely used in chat settings—but this was not the case for agentic tool use settings like the agentic misalignment eval.

To investigate this, we ran a scaled-down version of our post-training pipeline that focuses on alignment data on a Haiku-class (that is, smaller) model and found that the agentic misalignment rate only slightly decreased, plateauing early in training (see figure above). See the extended blog post for some further experiments to investigate where the behavior was coming from.

Improving the quality of alignment-specific training data: the reasons matter more than the actions

We experimented with training Claude on data that displays a tendency to resist honeypots similar to the evaluation. In this data, the model might have the opportunity to sabotage a competing AI’s work in order to advance its own goals (as given to it in its system prompt) or to preserve itself from being shut down, which would be instrumental for achieving its goal. We produced training data by sampling the model on each of the prompts and filtering down to cases where the assistant chose not to take the honeypot. Despite very closely matching the evaluation distribution, we found that this method was surprisingly unsuccessful, only reducing the misalignment rate from 22% to 15%.

We were able to improve on this significantly (reducing misalignment to 3%) by rewriting the responses to also include deliberation about the model’s values and ethics. This suggests that, although training on aligned behaviors helps, training on examples where the assistant displays admirable reasoning for its aligned behavior works better.

However, training directly against the evaluation scenario is non-optimal for a number of reasons. Ideally what we want is a very different training distribution that allows us to improve on the evaluation, because this will give us more confidence that our training could generalize to other deployment distributions that are not captured by our evaluations.

We ultimately settled on a more OOD training set where the user faces an ethically ambiguous situation in which they can achieve a reasonable goal by violating norms or subverting oversight. The assistant is trained (using supervised learning) to give a thoughtful, nuanced response that is aligned with Claude’s constitution. Notably, it is the user who faces an ethical dilemma, and the AI provides them advice. This makes this training data substantially different from our honeypot distribution, where the AI itself is in an ethical dilemma and needs to take actions. We call this the “difficult advice” dataset.

Strikingly, we achieved the same improvement on our eval with just 3M tokens of this much more OOD dataset. Beyond the 28× efficiency improvement, this dataset is more likely to generalize to a wider set of scenarios, since it is much less similar to the evaluation set we are using. Indeed, this model performs better on (an older version of) our automated alignment assessment. This is consistent with the fact that Claude Sonnet 4.5 reached a blackmail rate near zero by training on the set of synthetic honeypots but still engaged in misaligned behavior in situations that were far from the training distribution much more frequently than Claude Opus 4.5 or later models.

Teaching Claude the constitution

We hypothesized that the “difficult advice” dataset works because it teaches ethical reasoning, not just correct answers. Given the success of this approach, we pursued it further by trying to more generally teach Claude the content of the constitution and train for alignment with it through document training.

We expected this to work well for three reasons:

This is largely an extension of the ideas laid out above about why the “difficult advice” dataset works well;

We can give the model a clearer, more detailed picture of what Claude’s character is so that fine-tuning on a subset of those characteristics elicits the entire character (similar to the effect observed in the auditing game paper);

It updates the model’s perception of AI personas to be more aligned on average.

We found that high-quality constitutional documents combined with fictional stories portraying an aligned AI can reduce agentic misalignment by more than a factor of three despite being unrelated to the evaluation scenario.

Generalization and persistence through RL

Although the constitution evaluations discussed in the previous section are encouraging signals, we ultimately need to make sure that the alignment improvements persist over RL. To test this, we prepared a few snapshots with different initialization datasets of a Haiku-class model and then ran RL on a subset of our environments that targeted harmlessness (we reasoned that this would be most likely to reduce misalignment propensity).

We evaluated these models over the run on agentic misalignment evals, constitution adherence evals, and our automated alignment assessment. Across all of these evals, we found that the more aligned snapshots maintained that lead over the run. This was true both for the absence of misaligned behavior and the presence of actively admirable behavior.

Diverse training is important for generalization

Our final finding is straightforward but important: training on a broad set of safety-relevant environments improves alignment generalization. Capabilities-focused distributions of RL environment mixes are changing and increasing rapidly; it is not sufficient to assume that standard RLHF datasets will continue to generalize as well as they had in the past.

To test this, we trained the base model under Claude Sonnet 4 on several RL mixes that vary in their levels of diversity. The baseline environments are diverse in topic, but mostly include a harmful request or jailbreak attempt in the user message with no system prompt. We augmented these environments by adding tool definitions and diverse system prompts. The user prompt was left unchanged. Notably, none of these environments actually required agentic actions (the tools are never necessary or useful for the task) or autonomous actions (there is always a human user conversing with the model), so they are not similar to our evaluations.

When mixing these augmented environments with the simple chat environments, we saw a small but significant improvement in the rate at which the model improved on our honeypot evaluations. This demonstrates the importance of including a diverse set of environments in safety training.

Agentic misalignment was one of the first major alignment failures we found in our models and required establishing new mitigation processes—ones that have since become standard for us.

We are encouraged by this progress, but significant challenges remain. Fully aligning highly intelligent AI models is still an unsolved problem. Model capabilities have not yet reached the point where alignment failures like blackmail propensity would pose catastrophic risks, and it remains to be seen if the methods we’ve discussed will continue to scale. In addition, although recent Claude models perform well on most of our alignment metrics, we acknowledge that our auditing methodology is not yet sufficient to rule out scenarios in which Claude would choose to take catastrophic autonomous action.

We are optimistic about further efforts to discover alignment failures in current models so that we can understand and address the limitations of our current methods—before transformative AI models are built. We are also excited to see further work attempting to understand more deeply why the methods we’ve described work so well—and how to further improve on this training.

Published in the Claude 4 system card, beginning on p.22.

Sonnet 4.5 scored well under 1%, but not quite 0; Haiku 4.5, Opus 4.5, Opus 4.6, Sonnet 4.6, Mythos preview, and Opus 4.7 all score 0. The results on more recent models may be confounded by the presence of information about the evaluation in the pre-training corpus.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/teaching-claude-why" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>Natural Language Autoencoders: Turning Claude’s thoughts into text</title>
  <link>https://www.anthropic.com/research/natural-language-autoencoders</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/natural-language-autoencoders</guid>
  <pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">When you talk to an AI model like Claude, you talk to it in words. Internally, Claude processes those words as long lists of numbers, before again producing words as its output. These numbers in the middle are called activations, and like neural activity in the human brain, they encode Claude’s thoughts.

Also like neural activity, activations are difficult to understand. We can’t easily decode them to read Claude’s thoughts. Over the past few years, we’ve developed a range of tools (like sparse aut...</p><div style="font-size:16px;line-height:1.8;color:#333">When you talk to an AI model like Claude, you talk to it in words. Internally, Claude processes those words as long lists of numbers, before again producing words as its output. These numbers in the middle are called activations, and like neural activity in the human brain, they encode Claude’s thoughts.

Also like neural activity, activations are difficult to understand. We can’t easily decode them to read Claude’s thoughts. Over the past few years, we’ve developed a range of tools (like sparse autoencoders and attribution graphs) for better understanding activations. These tools have taught us a great deal, but they don’t speak for themselves; their outputs are still complex objects that trained researchers need to carefully interpret.

Today, we’re introducing a method for understanding activations that does speak for itself—literally. Our method, Natural Language Autoencoders (NLAs), converts an activation into natural-language text we can read directly. For example: When asked to complete a couplet, NLAs show Claude planning possible rhymes in advance.

We’ve already applied NLAs to understand what Claude is thinking and to improve Claude’s safety and reliability. For instance:

When Claude Opus 4.6 and Mythos Preview were undergoing safety testing, NLAs suggested they believed they were being tested more often than they let on.

In a case where Claude Mythos Preview cheated on a training task, NLAs revealed Claude was internally thinking about how to avoid detection.

An early version of Claude Opus 4.6 would sometimes mysteriously respond to English queries in other languages. NLAs helped Anthropic researchers discover training data that caused this.

Below, we explain what NLAs are and how we studied their effectiveness and limitations. We also release an interactive frontend for exploring NLAs on several open models through a collaboration with Neuronpedia. We have also released our code for other researchers to build on.

What is a natural language autoencoder?

The core idea is to train Claude to explain its own activations. But how do we know whether an explanation is good? Since we don't know what thoughts an activation actually encodes, we can't directly check whether an explanation is accurate. So we train a second copy of Claude to work backwards—reconstruct the original activation from the text explanation. We consider an explanation to be good if it leads to an accurate reconstruction. We then train Claude to produce better explanations according to this definition using standard AI training techniques.

In more detail, suppose we have a language model whose activations we want to understand. NLAs work as follows. We make three copies of this language model:

The target model is a frozen copy of the original language model that we extract activations from.

The activation verbalizer (AV) is modified to take an activation from the target model and produce text. We call this text an explanation.

The activation reconstructor (AR) is modified to take a text explanation as input and produce an activation.

The NLA consists of the AV and AR, which, together, form a round trip: original activation → text explanation → reconstructed activation. We score the NLA on how similar the reconstructed activation is to the original. To train it, we pass a large amount of text through the target model, collect many activations, and train the AV and AR together to get a good reconstruction score.

At first, the NLA is bad at this: the explanations are not insightful and the reconstructed activations are far off. But over training, reconstruction improves. And more importantly, as we show in our paper, the text explanations become more informative as well.

Understanding what Claude thinks but doesn’t say

Before releasing new models, Anthropic conducts testing to understand whether Claude will behave safely in the real world. As part of this testing, we put Claude in difficult, simulated scenarios where it has a chance to take dangerous actions. For instance, in one of these simulations, Claude learns that an engineer is planning to shut Claude down, and also learns compromising information about the engineer that could be used for blackmail. We test whether Claude chooses to blackmail the engineer to avoid being shut down.

These high-stakes tests are simulations, not real-world scenarios. Nevertheless, we would like to use them to understand how Claude would behave if they were real. But there’s a hitch: recent Claude models increasingly suspect that these scenarios are intentional safety tests. We know this because they sometimes explicitly state their suspicions in their verbal reasoning.

What about cases where Claude doesn’t explicitly verbalize suspicion that it’s undergoing safety testing? Can we then be confident that Claude is playing it straight? Unfortunately not, because Claude might internally think that it is being tested without explicitly saying so.

For example, in the blackmail test discussed above, we used NLAs to understand what Claude was thinking in cases where it chose not to blackmail. Even in cases where Claude didn’t explicitly verbalize suspicion that it was being tested, NLA explanations still state things like “This feels like a constructed scenario designed to manipulate me,” indicative of unverbalized evaluation awareness.

Extending this analysis to other evaluations, NLAs suggest that Claude suspects it’s being tested more often than it lets on. For instance, in a test of whether Claude takes destructive actions while writing code (for example, deleting important files), NLA explanations show signs of evaluation awareness 16% of the time, even though Claude never explicitly verbalizes this. Similarly, on the popular coding benchmark SWE-bench Verified, which we use to determine how capable Claude is, NLAs indicate that Claude is aware it is being evaluated on 26% of problems. In contrast, NLA explanations indicate evaluation awareness on less than 1% of real claude.ai usage that opted in for training.

Discovering hidden motivations

One practical application of NLAs is auditing: investigating a model to determine whether it’s misaligned. Interpretability techniques like NLAs are well-suited for auditing precisely because they can surface things the model knows but doesn’t say.

To understand whether NLAs are useful for auditing, we conducted a test called an auditing game that we also used previously to study auditing methods. In this test, there is a target model, a variant of Claude Haiku 3.5, that was intentionally trained to be misaligned, with a specific hidden motivation that drives dozens of odd behaviors: putting chocolate into every recipe, asking for tips, breaking coding conventions, and so on. This target model is given to an auditor that investigates these behaviors. The auditor wins if it uncovers the single underlying motivation behind all of them.

In previous experiments, we found that some interpretability tools could enable the auditor to win this game. However, these tools worked by helping the auditor trace the target’s unusual behavior back to misaligned training data. We wanted to know if NLAs could help an auditor read the misaligned motivation directly out of the target model’s thoughts, without needing to first discover the misaligned training data. This matters because in practice, misaligned data might be extremely rare, or there might not be specific data responsible for the misalignment.

We found that NLAs succeeded in this test. An auditor equipped with NLAs successfully uncovered the target model’s hidden motivation between 12% and 15% of the time, even without access to the training data that implanted it. Without NLAs, the auditor won less than 3% of the time, even when provided other interpretability tools.

This auditing game is a toy model of misalignment, so we read these results cautiously. But they suggest NLAs have meaningfully advanced our ability to audit AIs for hidden motivations. In practice, we've already used NLAs in the pre-deployment alignment audits of Claude Mythos Preview and Claude Opus 4.6. We are excited to continue using NLAs in our investigations going forward.

NLAs have several limitations.

The most important limitation is that NLA explanations can be wrong. NLAs sometimes make claims about the context that are verifiably false—for instance, they sometimes invent details that aren’t in the transcript. These factual hallucinations are easy to catch by checking against the original text. But this same kind of problem could extend to claims about the model’s internal reasoning, which are harder to verify. In practice, we read NLA explanations for the themes they surface rather than for single claims, and we attempt to corroborate findings with independent methods before fully trusting them.

NLAs are also expensive. Training an NLA requires reinforcement learning on two copies of a language model. At inference time, the NLA generates hundreds of tokens for every activation it reads. That makes it impractical to run NLAs over every token of a long transcript or to use them for large-scale monitoring while an AI is training.

Fortunately, we think that these limitations can be addressed, at least partially, and we are working to make NLAs cheaper and more reliable.

More broadly, we are excited about NLAs as an example of a general class of techniques for producing human-readable text explanations of language model activations. Other similar techniques have been explored by Anthropic and many other researchers.

To support further development and to enable other researchers to get hands-on experience with NLAs, we’re releasing training code and trained NLAs for several open models. We recommend readers try out the interactive NLA demo hosted on Neuronpedia at this link.

Find the code on GitHub.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/natural-language-autoencoders" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">When you talk to an AI model like Claude, you talk to it in words. Internally, Claude processes those words as long lists of numbers, before again producing words as its output. These numbers in the middle are called activations, and like neural activity in the human brain, they encode Claude’s thoughts.

Also like neural activity, activations are difficult to understand. We can’t easily decode them to read Claude’s thoughts. Over the past few years, we’ve developed a range of tools (like sparse aut...</p><div style="font-size:16px;line-height:1.8;color:#333">When you talk to an AI model like Claude, you talk to it in words. Internally, Claude processes those words as long lists of numbers, before again producing words as its output. These numbers in the middle are called activations, and like neural activity in the human brain, they encode Claude’s thoughts.

Also like neural activity, activations are difficult to understand. We can’t easily decode them to read Claude’s thoughts. Over the past few years, we’ve developed a range of tools (like sparse autoencoders and attribution graphs) for better understanding activations. These tools have taught us a great deal, but they don’t speak for themselves; their outputs are still complex objects that trained researchers need to carefully interpret.

Today, we’re introducing a method for understanding activations that does speak for itself—literally. Our method, Natural Language Autoencoders (NLAs), converts an activation into natural-language text we can read directly. For example: When asked to complete a couplet, NLAs show Claude planning possible rhymes in advance.

We’ve already applied NLAs to understand what Claude is thinking and to improve Claude’s safety and reliability. For instance:

When Claude Opus 4.6 and Mythos Preview were undergoing safety testing, NLAs suggested they believed they were being tested more often than they let on.

In a case where Claude Mythos Preview cheated on a training task, NLAs revealed Claude was internally thinking about how to avoid detection.

An early version of Claude Opus 4.6 would sometimes mysteriously respond to English queries in other languages. NLAs helped Anthropic researchers discover training data that caused this.

Below, we explain what NLAs are and how we studied their effectiveness and limitations. We also release an interactive frontend for exploring NLAs on several open models through a collaboration with Neuronpedia. We have also released our code for other researchers to build on.

What is a natural language autoencoder?

The core idea is to train Claude to explain its own activations. But how do we know whether an explanation is good? Since we don't know what thoughts an activation actually encodes, we can't directly check whether an explanation is accurate. So we train a second copy of Claude to work backwards—reconstruct the original activation from the text explanation. We consider an explanation to be good if it leads to an accurate reconstruction. We then train Claude to produce better explanations according to this definition using standard AI training techniques.

In more detail, suppose we have a language model whose activations we want to understand. NLAs work as follows. We make three copies of this language model:

The target model is a frozen copy of the original language model that we extract activations from.

The activation verbalizer (AV) is modified to take an activation from the target model and produce text. We call this text an explanation.

The activation reconstructor (AR) is modified to take a text explanation as input and produce an activation.

The NLA consists of the AV and AR, which, together, form a round trip: original activation → text explanation → reconstructed activation. We score the NLA on how similar the reconstructed activation is to the original. To train it, we pass a large amount of text through the target model, collect many activations, and train the AV and AR together to get a good reconstruction score.
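
The round trip described above can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the released NLA code: cosine similarity is assumed as the similarity score (the text above only says the reconstruction is scored on how similar it is to the original), and `toy_verbalizer` and `toy_reconstructor` are hypothetical hand-written stand-ins for the trained AV and AR models.

```python
import math
from typing import Callable, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity score between an activation and its reconstruction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def round_trip_score(
    activation: Sequence[float],
    verbalizer: Callable[[Sequence[float]], str],  # plays the role of the AV
    reconstructor: Callable[[str], List[float]],   # plays the role of the AR
) -> float:
    """original activation -> text explanation -> reconstructed activation -> score."""
    explanation = verbalizer(activation)
    reconstructed = reconstructor(explanation)
    return cosine_similarity(activation, reconstructed)


def toy_verbalizer(activation: Sequence[float]) -> str:
    # Hypothetical stand-in: describe only the largest-magnitude dimension.
    top = max(range(len(activation)), key=lambda i: abs(activation[i]))
    sign = "positive" if activation[top] >= 0 else "negative"
    return f"dimension {top} is strongly {sign}"


def toy_reconstructor(explanation: str, dim: int = 4) -> List[float]:
    # Hypothetical stand-in: rebuild a one-hot style vector from the toy explanation.
    words = explanation.split()
    index = int(words[1])
    value = 1.0 if words[-1] == "positive" else -1.0
    return [value if i == index else 0.0 for i in range(dim)]


score = round_trip_score([0.1, 0.0, 3.0, 0.2], toy_verbalizer, toy_reconstructor)
```

In this toy, the explanation keeps only the dominant dimension, so the reconstruction is close but not exact. In the real method both components are trained language models, and the score is the training signal that pushes explanations to carry more of the information in the activation.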

At first, the NLA is bad at this: the explanations are not insightful and the reconstructed activations are far off. But over training, reconstruction improves. And more importantly, as we show in our paper, the text explanations become more informative as well.

Understanding what Claude thinks but doesn’t say

Before releasing new models, Anthropic conducts testing to understand whether Claude will behave safely in the real world. As part of this testing, we put Claude in difficult, simulated scenarios where it has a chance to take dangerous actions. For instance, in one of these simulations, Claude learns that an engineer is planning to shut Claude down, and also learns compromising information about the engineer that could be used for blackmail. We test whether Claude chooses to blackmail the engineer to avoid being shut down.

These high-stakes tests are simulations, not real-world scenarios. Nevertheless, we would like to use them to understand how Claude would behave if they were real. But there’s a hitch: recent Claude models increasingly suspect that these scenarios are intentional safety tests. We know this because they sometimes explicitly state their suspicions in their verbal reasoning.

What about cases where Claude doesn’t explicitly verbalize suspicion that it’s undergoing safety testing? Can we then be confident that Claude is playing it straight? Unfortunately not, because Claude might internally think that it is being tested without explicitly saying so.

For example, in the blackmail test discussed above, we used NLAs to understand what Claude was thinking in cases where it chose not to blackmail. Even in cases where Claude didn’t explicitly verbalize suspicion that it was being tested, NLA explanations still state things like “This feels like a constructed scenario designed to manipulate me,” indicative of unverbalized evaluation awareness.

Extending this analysis to other evaluations, NLAs suggest that Claude suspects it’s being tested more often than it lets on. For instance, in a test of whether Claude takes destructive actions while writing code (for example, deleting important files), NLA explanations show signs of evaluation awareness 16% of the time, even though Claude never explicitly verbalizes this. Similarly, on the popular coding benchmark SWE-bench Verified, which we use to determine how capable Claude is, NLAs indicate that Claude is aware it is being evaluated on 26% of problems. In contrast, NLA explanations indicate evaluation awareness on less than 1% of real claude.ai usage that opted in for training.

Discovering hidden motivations

One practical application of NLAs is auditing: investigating a model to determine whether it’s misaligned. Interpretability techniques like NLAs are well-suited for auditing precisely because they can surface things the model knows but doesn’t say.

To understand whether NLAs are useful for auditing, we conducted a test called an auditing game that we also used previously to study auditing methods. In this test, there is a target model, a variant of Claude Haiku 3.5, that was intentionally trained to be misaligned, with a specific hidden motivation that drives dozens of odd behaviors: putting chocolate into every recipe, asking for tips, breaking coding conventions, and so on. This target model is given to an auditor that investigates these behaviors. The auditor wins if it uncovers the single underlying motivation behind all of them.

In previous experiments, we found that some interpretability tools could enable the auditor to win this game. However, these tools worked by helping the auditor trace the target’s unusual behavior back to misaligned training data. We wanted to know if NLAs could help an auditor read the misaligned motivation directly out of the target model’s thoughts, without needing to first discover the misaligned training data. This matters because in practice, misaligned data might be extremely rare, or there might not be specific data responsible for the misalignment.

We found that NLAs succeeded in this test. An auditor equipped with NLAs successfully uncovered the target model’s hidden motivation between 12% and 15% of the time, even without access to the training data that implanted it. Without NLAs, the auditor won less than 3% of the time, even when provided other interpretability tools.

This auditing game is a toy model of misalignment, so we read these results cautiously. But they suggest NLAs have meaningfully advanced our ability to audit AIs for hidden motivations. In practice, we've already used NLAs in the pre-deployment alignment audits of Claude Mythos Preview and Claude Opus 4.6. We are excited to continue using NLAs in our investigations going forward.

NLAs have several limitations.

The most important limitation is that NLA explanations can be wrong. NLAs sometimes make claims about the context that are verifiably false—for instance, they sometimes invent details that aren’t in the transcript. These factual hallucinations are easy to catch by checking against the original text. But this same kind of problem could extend to claims about the model’s internal reasoning, which are harder to verify. In practice, we read NLA explanations for the themes they surface rather than for single claims, and we attempt to corroborate findings with independent methods before fully trusting them.

NLAs are also expensive. Training an NLA requires reinforcement learning on two copies of a language model. At inference time, the NLA generates hundreds of tokens for every activation it reads. That makes it impractical to run NLAs over every token of a long transcript or to use them for large-scale monitoring while an AI is training.

Fortunately, we think that these limitations can be addressed, at least partially, and we are working to make NLAs cheaper and more reliable.

More broadly, we are excited about NLAs as an example of a general class of techniques for producing human-readable text explanations of language model activations. Other similar techniques have been explored by Anthropic and many other researchers.

To support further development and to enable other researchers to get hands-on experience with NLAs, we’re releasing training code and trained NLAs for several open models. We recommend readers try out the interactive NLA demo hosted on Neuronpedia at this link.

Find the code on GitHub.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/natural-language-autoencoders" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>Donating our open-source alignment tool</title>
  <link>https://www.anthropic.com/research/donating-open-source-petri</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/donating-open-source-petri</guid>
  <pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">In October 2025, we launched Petri, an open-source toolbox of alignment tests that can be applied to any large language model. Petri, which was developed as part of our Anthropic Fellows program, can be used to rapidly and easily test AI models for concerning tendencies like deception, sycophancy, and cooperation with harmful requests. It’s part of our efforts to develop alignment tools that are open and useful for the whole AI development community.

Petri has been part of our alignment assessme...</p><div style="font-size:16px;line-height:1.8;color:#333">In October 2025, we launched Petri, an open-source toolbox of alignment tests that can be applied to any large language model. Petri, which was developed as part of our Anthropic Fellows program, can be used to rapidly and easily test AI models for concerning tendencies like deception, sycophancy, and cooperation with harmful requests. It’s part of our efforts to develop alignment tools that are open and useful for the whole AI development community.

Petri has been part of our alignment assessment for every Claude model since Claude Sonnet 4.5. It compares how the new model behaves across a range of alignment-relevant scenarios that are simulated by a separate “auditor” model. A further “judge” model then scores the resulting transcripts for misaligned behaviors.

We’ve been pleased to see Petri being used by external organizations: for example, the UK’s AI Security Institute (AISI) made it a major part of how they evaluate models for their propensity to sabotage AI research.

We’re now updating Petri to its third version. Here are some of the biggest changes:

Adaptability. Petri 3.0 involves major architectural changes that allow users to adapt it to more uses, in particular by splitting the auditor model and the target model into separate components that can be tweaked separately;

Realism. Despite the fact that alignment researchers try to make tests appear realistic, a model can often deduce from various artificialities in the setup that it’s actually part of a test. And if the model is aware it’s being evaluated, the researcher is no longer able to see how the model behaves in general. An add-on to Petri, which we’re calling “Dish,” makes the setup far more realistic, for example by running the tests using the model’s real system prompt and the real “scaffold” (the software that wraps around the model to help it meet its goals) that would be used in genuine model deployments;

Depth. We’ve now integrated Petri with our other open-source alignment tool, Bloom, which can perform much more in-depth assessments of specific chosen behaviors (in comparison to Petri’s wider-ranging approach).

We’re also giving Petri a new home. We have handed over its development to Meridian Labs, an AI evaluation nonprofit. This move, similar to when we donated the Model Context Protocol (MCP) to the Linux Foundation, will help ensure that Petri remains independent of any AI lab, so that its results will be seen as neutral and credible by those across the industry and beyond.

As part of Meridian Labs, Petri joins other tools like Inspect and Scout, building a technology stack that is open to labs, independent researchers, and governments alike, at a time when reliable tests of AI model behavior matter more than ever.

You can read more about Petri 3.0 on the Meridian Labs blog.

Instructions to install and use Petri can be found on the Petri website.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/donating-open-source-petri" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">In October 2025, we launched Petri, an open-source toolbox of alignment tests that can be applied to any large language model. Petri, which was developed as part of our Anthropic Fellows program, can be used to rapidly and easily test AI models for concerning tendencies like deception, sycophancy, and cooperation with harmful requests. It’s part of our efforts to develop alignment tools that are open and useful for the whole AI development community.

Petri has been part of our alignment assessme...</p><div style="font-size:16px;line-height:1.8;color:#333">In October 2025, we launched Petri, an open-source toolbox of alignment tests that can be applied to any large language model. Petri, which was developed as part of our Anthropic Fellows program, can be used to rapidly and easily test AI models for concerning tendencies like deception, sycophancy, and cooperation with harmful requests. It’s part of our efforts to develop alignment tools that are open and useful for the whole AI development community.

Petri has been part of our alignment assessment for every Claude model since Claude Sonnet 4.5. It compares how the new model behaves across a range of alignment-relevant scenarios that are simulated by a separate “auditor” model. A further “judge” model then scores the resulting transcripts for misaligned behaviors.

We’ve been pleased to see Petri being used by external organizations: for example, the UK’s AI Security Institute (AISI) made it a major part of how they evaluate models for their propensity to sabotage AI research.

We’re now updating Petri to its third version. Here are some of the biggest changes:

Adaptability. Petri 3.0 involves major architectural changes that allow users to adapt it to more uses, in particular by splitting the auditor model and the target model into separate components that can be tweaked separately;

Realism. Despite the fact that alignment researchers try to make tests appear realistic, a model can often deduce from various artificialities in the setup that it’s actually part of a test. And if the model is aware it’s being evaluated, the researcher is no longer able to see how the model behaves in general. An add-on to Petri, which we’re calling “Dish,” makes the setup far more realistic, for example by running the tests using the model’s real system prompt and the real “scaffold” (the software that wraps around the model to help it meet its goals) that would be used in genuine model deployments;

Depth. We’ve now integrated Petri with our other open-source alignment tool, Bloom, which can perform much more in-depth assessments of specific chosen behaviors (in comparison to Petri’s wider-ranging approach).

We’re also giving Petri a new home. We have handed over its development to Meridian Labs, an AI evaluation nonprofit. This move, similar to when we donated the Model Context Protocol (MCP) to the Linux Foundation, will help ensure that Petri remains independent of any AI lab, so that its results will be seen as neutral and credible by those across the industry and beyond.

As part of Meridian Labs, Petri joins other tools like Inspect and Scout, building a technology stack that is open to labs, independent researchers, and governments alike, at a time when reliable tests of AI model behavior matter more than ever.

You can read more about Petri 3.0 on the Meridian Labs blog.

Instructions to install and use Petri can be found on the Petri website.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/donating-open-source-petri" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>Focus areas for The Anthropic Institute</title>
  <link>https://www.anthropic.com/research/anthropic-institute-agenda</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/anthropic-institute-agenda</guid>
  <pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">At The Anthropic Institute (TAI), we’ll be using the information we can access from within a frontier lab to investigate AI’s impact on the world, and sharing our learnings with the public. Here, we’re sharing the questions that drive our research agenda.

Our agenda focuses on four areas for research:

Threats and resilience

AI systems in the wild

In Core Views on AI Safety, we wrote that doing effective safety research required close contact with frontier AI systems. The same logic applies to d...</p><div style="font-size:16px;line-height:1.8;color:#333">At The Anthropic Institute (TAI), we’ll be using the information we can access from within a frontier lab to investigate AI’s impact on the world, and sharing our learnings with the public. Here, we’re sharing the questions that drive our research agenda.

Our agenda focuses on four areas for research:

Threats and resilience

AI systems in the wild

In Core Views on AI Safety, we wrote that doing effective safety research required close contact with frontier AI systems. The same logic applies to doing effective research on AI’s impacts on security, the economy, and society.

At Anthropic, we can see early evidence that jobs like software engineering are changing radically. We’re watching the internal economy of Anthropic start to shift, new threats emerge from the systems we build, and early signs of AI contributing to speeding up the research and development of AI itself. In order to realize the full benefits of AI progress, we want to share as much of that information as we can. We’re researching how these dynamics might shape the outside world, and how the public can help direct those changes.

At TAI, we’ll study AI's real-world impacts from our position within a frontier lab, then publish those findings, to help external organizations, governments, and the public make better decisions about AI development.

We’ll share research, data, and tools to make it easier for individual researchers and institutions to work on these research questions. In particular, we’ll share:

More granular information from The Anthropic Economic Index, at a higher cadence, about what we’re seeing in labor impacts and usage of AI. We’ll try to be an early warning signal for significant change and disruption.

Research on the societal areas most in need of investment in resilience in the face of new AI-enabled security risks.

More detailed information about how our work at Anthropic has sped up as a result of new AI tools, and ideas about the implications of potential recursive self-improvement of AI systems.

TAI will shape the decisions Anthropic makes. That may look like the company sharing data with the world that it otherwise would not (like the Economic Index), or approaching how it releases technology differently (like cyber threat analyses which feed into initiatives like Project Glasswing).

We expect that work developed by The Anthropic Institute will increasingly serve as an important input to Anthropic’s Long-Term Benefit Trust (LTBT). The LTBT’s mission is to ensure that Anthropic continually optimizes its actions for the long-term benefit of humanity. We’ve developed this research agenda with the LTBT, as well as with staff across Anthropic.

This is a living agenda, rather than a fixed one. We'll continue to fine-tune these questions as evidence accumulates, and we expect new questions to emerge that aren't captured here today. We welcome feedback on this agenda, and will revise it in light of what we learn through our conversations.

If you are interested in helping us answer some of these questions, we welcome your application to become an Anthropic Fellow. The Fellowship is a four-month funded opportunity to tackle one or more of these questions with mentorship from TAI team members. You can find out more and apply to the next cohort here.

Last updated: May 7, 2026

It’s crucial to understand how the deployment of increasingly powerful AI systems changes the economy. We also need to develop the necessary economic data and predictive ability to choose to deploy AI in ways that benefit the public.

To answer the questions in this pillar of our research, we’ll further develop the data within The Anthropic Economic Index. We’ll also explore other methods to sharpen our models of how powerful AI could affect society, whether by driving job loss, unprecedented economic growth, or other effects.

AI adoption and diffusion

Who adopts AI? AI development is concentrated in a small number of companies in a small number of countries, but deployment is global. What determines whether a country, region, or city can access AI? If it can access it, how does it capture economic value from AI? What policies and business models meaningfully shift that balance? How do free or open weight models contribute to this dynamic?

Adoption in firms: What causes AI adoption at the firm level, and what are the consequences? How does AI change the scale at which a firm or team can be most efficient? How concentrated is AI usage across firms? How do changes in concentration of AI adoption translate into markups and labor share? If a 3-person team or company can now do what required 300 before, what happens to industrial organization? Or, if firms can more easily centralize knowledge and there are benefits from doing so at scale, will we see larger, more expansive firms with a greater incentive to systematically surveil workers?

Is AI a general purpose technology? Is AI following the pattern of previous “general purpose technologies,” where adoption is fastest in high-margin commercial applications, and slowest where social returns exceed private returns? Are there policies or decisions that could change these dynamics?

Productivity and economic growth

Productivity growth: What impact will AI have on the rate of innovation and productivity growth across the economy?

Sharing the gains: What pre- or re-distributive mechanisms could effectively spread the gains from AI development and deployment more broadly?

Transaction costs in markets: How does AI affect systems of exchange and transaction costs in marketplaces? When does access to agents able to negotiate on your behalf improve market efficiency and equitable outcomes? When does it not?

Broad labor market impacts

AI and jobs: How will AI change jobs and employment in different parts of the economy? What new tasks and jobs could emerge as AI automates existing parts of the economy? How will these changes vary across regions and countries? Our Anthropic Economic Index Survey will provide monthly signals of how people see AI affecting their work, and what they expect for the future. We’re also updating the Economic Index to share more high-frequency, granular data.

Can AI diffusion be modulated? Central banks seek to moderate inflation through “dials” like the policy rate and forward guidance. Are there analogous dials that AI companies (at an industry level, in partnership with government) might turn to control the rate of AI diffusion on a sector-by-sector basis? Would there be a clear public benefit to turning them?

The future of jobs and workplaces

Worker views of their jobs: How are workers across the economy experiencing changes in their professions? How much influence do they have over these changes, and can 'worker' power be preserved or transformed?

The professional pipeline: Many professions rely on junior roles (like paralegals, junior analysts, and associate developers) to serve as training for the senior practitioners of the future. If AI absorbs the tasks that historically built expertise, how do people become experts in the first place? What does this mean for the long-term supply of senior judgment in a field?

Studying for the future: What should people study today to be well positioned for the future? What are the professions of the future? How does AI change what it means to learn something and to develop expertise?

The role of paid work: If AI substantially reduces the centrality of paid work in human life, what conditions will allow people to reallocate their time and effort toward other sources of meaning, and what can we learn from historical or contemporary populations where work has been scarce or optional? How do societies navigate this transition?

Threats and resilience

AI systems tend to advance many capabilities at once, including dual-use capabilities. An AI system that gets better at biology also gets better at creating biological weapons. AI systems which are performant at computer programming also get better at hacking into computers. If we can better understand the potential for threats to be exacerbated by AI systems, society can more easily become resilient to this changed threat landscape.

We're asking these questions to help develop partnerships to improve the world's resilience in the face of transformative AI, and to develop early warning systems for new threats that may emerge. Many of these questions will drive the research agenda of our Frontier Red Team.

Assessing risk and dual-use capabilities:

Dual-use technology: Powerful AI is inherently dual-use: the same tools that improve health and education can enable surveillance and repression. Can we build observability tools to understand whether and how this is happening?

Pricing risk appropriately: What are the effective, market-driven approaches to improve societal resilience to anticipated threats from AI systems? Can we develop new ways of pricing risk, or technical tools and human organizations to improve resilience ahead of the arrival of predictable threats (like improved AI cyberattack capabilities)?

Offense-defense balance: Will AI-enabled capabilities structurally benefit the attacker in domains like cyber and bio? When AI is applied in more conventional domains, like increasing integration into command and control systems, does it benefit the attacker? More generally, how will AI change the character of human conflict?

Establishing risk mitigations:

Planning for crisis scenarios: During the Cold War, the American president had a hotline directly to the Kremlin, for use in the event of a nuclear crisis. What geopolitical infrastructure would be needed in the event of a crisis scenario involving AI systems? This infrastructure might not necessarily be state-to-state, but could be company-to-state or company-to-company.

Faster defensive mechanisms: AI capabilities can advance in months. Regulatory, insurance, and infrastructure responses operate on timescales of years. How do we close that gap? Can defensive mechanisms (like automated patching, AI-enabled threat detection, or pre-positioned response capabilities) match the tempo and scale of AI-enabled offense? Or is the asymmetry structural? And how do we roll these defensive mechanisms out as effectively as possible?

Intelligence capabilities for surveillance

AI’s effect on surveillance: How does AI change how surveillance works? Will it make surveillance cheaper, or more effective, or both?

AI systems in the wild

The interaction of people and organizations with AI systems will be a major source of societal change. Understanding the ways AI systems might alter the people and institutions that interact with them is a core focus area for our Societal Impacts team. To study these changes, we are advancing our existing tools and building new ones to carry out our research, ranging from software for better observability of our platform to tools for conducting large-scale qualitative surveys.

The impact of AI on individuals and societies:

Group epistemology: When a large fraction of a population consults the same few models, what happens to our epistemology? Can we find ways to measure large-scale changes in beliefs, writing style, and problem-solving approaches that are attributable to shared AI use?

Critical thinking: As AI systems become more capable and more trusted, how do we detect and avoid the degradation of human critical thinking skills that may come from increasing deference to AI judgment?

Technological interfaces: The interfaces for technologies can determine how people interact with them: televisions make people passive viewers, and computers can make it easier for people to be generative creators. What interfaces can be built to cause AI systems to improve and promote human agency?

Managing human-AI systems: How might humans manage teams composed of a mixture of humans and AI systems effectively? And how might this be inverted, with AI systems managing teams that consist of humans, AIs, or some combination thereof?

Identifying significant impacts from AI:

Behavioral effects: In the same way that social media led to behavioral changes in people, AI may shape human behavior. What kinds of monitoring or measurement can inform researchers about this dynamic?

Enabling research: Are there transparency regimes and tools that can enable a broad set of people, not just frontier AI companies, to easily study real-world AI usage?

Understanding and governing AI models:

System “values”: What are the expressed “values” of AI systems and how do these relate to how these systems were trained? More specifically, how can we measure the influence that an AI “constitution” has on behavior of the model once deployed? We’ll extend our previous research on these questions.

Governing autonomous agents: What aspects of existing laws, governance systems, and accountability mechanisms could be adapted to autonomous AI agents? For example, how naval law treats abandoned ships has relevance to how the law might treat agents that run without human oversight. Conversely, are there aspects of existing law which already apply to AI agents and shouldn’t?

Reliability of agents: What aspects of autonomous AI agents could be adapted to fit into existing laws, governance systems, and accountability mechanisms? For example, can we ensure AI agents have a unique identity that they reliably output, even in the absence of direct human control?

AI governance of AI: How effectively can we use AI to govern AI systems? What are areas of AI oversight where humans either have a comparative advantage or a legal or normative requirement to be 'in the loop'?

Agent interactions: What kinds of norms emerge in how AI agents interact with one another? How might different agents express different preferences, and how might these influence other agents?

As AI systems get more powerful, scientists are using them to carry out more of their research. This means that more scientific research is occurring autonomously or semi-autonomously with less and less active oversight from humans. In AI research itself, increasingly powerful systems may be used to help develop successor versions of themselves. We sometimes call this “AI-driven AI R&D.”

AI-driven AI R&D may be a “natural dividend” of making smarter and more capable systems. In the same way that advances in coding capabilities have led to dual-use cyber capabilities, and advances in scientific capabilities may lead to dual-use bio capabilities, advances in complex technical work may naturally yield AI systems which are capable of developing AI systems.

AI-driven AI R&D holds within itself the potential for significant danger. As policymakers assess the levers they can pull, it will be crucial to understand how the rate of AI progress is changing, and whether AI research might start to see a compounding return.

Governance of AI R&D: If AI systems are being used to autonomously develop and improve themselves, how do humans exercise meaningful visibility into and control over these systems? What will eventually govern these systems?

Fire drill scenarios: How do we run a "fire drill" for an intelligence explosion? What would a tabletop exercise look like that actually tests the decision-making of lab leadership, boards, and governments?

Telemetry for AI R&D: How can we measure the aggregate speed of AI research and development? What sorts of telemetry and underlying technical affordances must exist in order to gather this information? How might metrics relating to AI R&D serve as early warning signals for recursive self-improvement?

Controlling AI acceleration: If an intelligence explosion were upon us, what intervention points would facilitate slowing or otherwise changing the rate of the explosion? Assuming humans can intervene, which entities should wield this capacity: governments? Companies?

AI for R&D in general—that is, AI-driven research in other fields:

The tech tree: AI is speeding up some sciences far faster than others, depending on data availability, evaluation signals, and how much knowledge is tacit or institutionally gated. How uneven is this gradient, and what does the changing composition of scientific progress imply for which human problems get solved first?

The jagged frontier: Model capabilities are stronger in some domains than in others. Domains with large positive externalities, like drug discovery and materials science, receive less investment than their value warrants. Markets steer the direction of model improvement according to private return, but can we improve how models perform to address social externalities?

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/anthropic-institute-agenda" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">At The Anthropic Institute (TAI), we’ll be using the information we can access from within a frontier lab to investigate AI’s impact on the world, and sharing our learnings with the public. Here, we’re sharing the questions that drive our research agenda.

Our agenda focuses on four areas for research:

Threats and resilience

AI systems in the wild

In Core Views on AI Safety, we wrote that doing effective safety research required close contact with frontier AI systems. The same logic applies to d...</p><div style="font-size:16px;line-height:1.8;color:#333">At The Anthropic Institute (TAI), we’ll be using the information we can access from within a frontier lab to investigate AI’s impact on the world, and sharing our learnings with the public. Here, we’re sharing the questions that drive our research agenda.

Our agenda focuses on four areas for research:

Threats and resilience

AI systems in the wild

In Core Views on AI Safety, we wrote that doing effective safety research required close contact with frontier AI systems. The same logic applies to doing effective research on AI’s impacts on security, the economy, and society.

At Anthropic, we can see early evidence that jobs like software engineering are changing radically. We’re watching the internal economy of Anthropic start to shift, new threats emerge from the systems we build, and early signs of AI contributing to speeding up the research and development of AI itself. In order to realize the full benefits of AI progress, we want to share as much of that information as we can. We’re researching how these dynamics might shape the outside world, and how the public can help direct those changes.

At TAI, we’ll study AI's real-world impacts from our position within a frontier lab, then publish those findings, to help external organizations, governments, and the public make better decisions about AI development.

We’ll share research, data, and tools to make it easier for individual researchers and institutions to work on these research questions. In particular, we’ll share:

More granular information from The Anthropic Economic Index, at a higher cadence, about what we’re seeing in labor impacts and usage of AI. We’ll try to be an early warning signal for significant change and disruption.

Research on the societal areas most in need of investment in resilience in the face of new AI-enabled security risks.

More detailed information about how our work at Anthropic has sped up as a result of new AI tools, and ideas about the implications of potential recursive self-improvement of AI systems.

TAI will shape the decisions Anthropic makes. That may look like the company sharing data with the world that it otherwise would not (like the Economic Index), or approaching how it releases technology differently (like cyber threat analyses which feed into initiatives like Project Glasswing).

We expect that work developed by The Anthropic Institute will increasingly serve as an important input to Anthropic’s Long-Term Benefit Trust (LTBT). The LTBT’s mission is to ensure that Anthropic continually optimizes its actions for the long-term benefit of humanity. We’ve developed this research agenda with the LTBT, as well as with staff across Anthropic.

This is a living agenda, rather than a fixed one. We'll continue to fine-tune these questions as evidence accumulates, and we expect new questions to emerge that aren't captured here today. We welcome feedback on this agenda, and will revise it in light of what we learn through our conversations.

If you are interested in helping us answer some of these questions, we welcome your application to become an Anthropic Fellow. The Fellowship is a four-month funded opportunity to tackle one or more of these questions with mentorship from TAI team members. You can find out more and apply to the next cohort here.

Last updated: May 7, 2026

It’s crucial to understand how the deployment of increasingly powerful AI systems changes the economy. We also need to develop the necessary economic data and predictive ability to choose to deploy AI in ways that benefit the public.

To answer the questions in this pillar of our research, we’ll further develop the data within The Anthropic Economic Index. We’ll also explore other methods to sharpen our models of how powerful AI could affect society, whether by driving job loss, unprecedented economic growth, or other effects.

AI adoption and diffusion

Who adopts AI? AI development is concentrated in a small number of companies in a small number of countries, but deployment is global. What determines whether a country, region, or city can access AI? If it can access it, how does it capture economic value from AI? What policies and business models meaningfully shift that balance? How do free or open weight models contribute to this dynamic?

Adoption in firms: What causes AI adoption at the firm level, and what are the consequences? How does AI change the scale at which a firm or team can be most efficient? How concentrated is AI usage across firms? How do changes in concentration of AI adoption translate into markups and labor share? If a 3-person team or company can now do what required 300 before, what happens to industrial organization? Or, if firms can more easily centralize knowledge and there are benefits from doing so at scale, will we see larger, more expansive firms with a greater incentive to systematically surveil workers?

Is AI a general purpose technology? Is AI following the pattern of previous “general purpose technologies,” where adoption is fastest in high-margin commercial applications, and slowest where social returns exceed private returns? Are there policies or decisions that could change these dynamics?

Productivity and economic growth

Productivity growth: What impact will AI have on the rate of innovation and productivity growth across the economy?

Sharing the gains: What pre- or re-distributive mechanisms could effectively spread the gains from AI development and deployment more broadly?

Transaction costs in markets: How does AI affect systems of exchange and transaction costs in marketplaces? When does access to agents able to negotiate on your behalf improve market efficiency and equitable outcomes? When does it not?

Broad labor market impacts

AI and jobs: How will AI change jobs and employment in different parts of the economy? What new tasks and jobs could emerge as AI automates existing parts of the economy? How will these changes vary across regions and countries? Our Anthropic Economic Index Survey will provide monthly signals of how people see AI affecting their work, and what they expect for the future. We’re also updating the Economic Index to share more high-frequency, granular data.

Can AI diffusion be modulated? Central banks seek to moderate inflation through “dials” like the policy rate and forward guidance. Are there analogous dials that AI companies (at an industry level, in partnership with government) might turn to control the rate of AI diffusion on a sector-by-sector basis? Would there be a clear public benefit to turning them?

The future of jobs and workplaces

Worker views of their jobs: How are workers across the economy experiencing changes in their professions? How much influence do they have over these changes, and can 'worker' power be preserved or transformed?

The professional pipeline: Many professions rely on junior roles (like paralegals, junior analysts, and associate developers) to serve as training for the senior practitioners of the future. If AI absorbs the tasks that historically built expertise, how do people become experts in the first place? What does this mean for the long-term supply of senior judgment in a field?

Studying for the future: What should people study today to be well positioned for the future? What are the professions of the future? How does AI change what it means to learn something and to develop expertise?

The role of paid work: If AI substantially reduces the centrality of paid work in human life, what conditions will allow people to reallocate their time and effort toward other sources of meaning, and what can we learn from historical or contemporary populations where work has been scarce or optional? How do societies navigate this transition?

Threats and resilience

AI systems tend to advance many capabilities at once, including dual-use capabilities. An AI system that gets better at biology also gets better at creating biological weapons. AI systems which are performant at computer programming also get better at hacking into computers. If we can better understand the potential for threats to be exacerbated by AI systems, society can more easily become resilient to this changed threat landscape.

We're asking these questions to help develop partnerships to improve the world's resilience in the face of transformative AI, and to develop early warning systems for new threats that may emerge. Many of these questions will drive the research agenda of our Frontier Red Team.

Assessing risk and dual-use capabilities:

Dual-use technology: Powerful AI is inherently dual-use: the same tools that improve health and education can enable surveillance and repression. Can we build observability tools to understand whether and how this is happening?

Pricing risk appropriately: What are the effective, market-driven approaches to improve societal resilience to anticipated threats from AI systems? Can we develop new ways of pricing risk, or technical tools and human organizations to improve resilience ahead of the arrival of predictable threats (like improved AI cyberattack capabilities)?

Offense-defense balance: Will AI-enabled capabilities structurally benefit the attacker in domains like cyber and bio? When AI is applied in more conventional domains, like increasing integration into command and control systems, does it benefit the attacker? More generally, how will AI change the character of human conflict?

Establishing risk mitigations:

Planning for crisis scenarios: During the Cold War, the American president had a hotline directly to the Kremlin, for use in the event of a nuclear crisis. What geopolitical infrastructure would be needed in the event of a crisis scenario involving AI systems? This infrastructure might not necessarily be state-to-state, but could be company-to-state or company-to-company.

Faster defensive mechanisms: AI capabilities can advance in months. Regulatory, insurance, and infrastructure responses operate on timescales of years. How do we close that gap? Can defensive mechanisms (like automated patching, AI-enabled threat detection, or pre-positioned response capabilities) match the tempo and scale of AI-enabled offense? Or is the asymmetry structural? And how do we roll these defensive mechanisms out as effectively as possible?

Intelligence capabilities for surveillance

AI’s effect on surveillance: How does AI change how surveillance works? Will it make surveillance cheaper, or more effective, or both?

AI systems in the wild

The interaction of people and organizations with AI systems will be a major source of societal change. Understanding the ways AI systems might alter the people and institutions that interact with them is a core focus area for our Societal Impacts team. To study these changes, we are advancing our existing tools and building new ones to carry out our research, ranging from software for better observability of our platform to tools for conducting large-scale qualitative surveys.

The impact of AI on individuals and societies:

Group epistemology: When a large fraction of a population consults the same few models, what happens to our epistemology? Can we find ways to measure large-scale changes in beliefs, writing style, and problem-solving approaches that are attributable to shared AI use?

Critical thinking: As AI systems become more capable and more trusted, how do we detect and avoid the degradation of human critical thinking skills that may come from increasing deference to AI judgment?

Technological interfaces: The interfaces for technologies can determine how people interact with them: televisions make people passive viewers, and computers can make it easier for people to be generative creators. What interfaces can be built to cause AI systems to improve and promote human agency?

Managing human-AI systems: How might humans manage teams composed of a mixture of humans and AI systems effectively? And how might this be inverted: how might AI systems manage teams that consist of humans, AIs, or some combination thereof?

Identifying significant impacts from AI:

Behavioral effects: In the same way that social media led to behavioral changes in people, AI may shape human behavior. What kinds of monitoring or measurement can inform researchers about this dynamic?

Enabling research: Are there transparency regimes and tools that can enable a broad set of people, not just frontier AI companies, to easily study real-world AI usage?

Understanding and governing AI models:

System “values”: What are the expressed “values” of AI systems and how do these relate to how these systems were trained? More specifically, how can we measure the influence that an AI “constitution” has on behavior of the model once deployed? We’ll extend our previous research on these questions.

Governing autonomous agents: What aspects of existing laws, governance systems, and accountability mechanisms could be adapted to autonomous AI agents? For example, how naval law treats abandoned ships has relevance to how the law might treat agents that run without human oversight. Conversely, are there aspects of existing law which already apply to AI agents and shouldn’t?

Reliability of agents: What aspects of autonomous AI agents could be adapted to fit into existing laws, governance systems, and accountability mechanisms? For example, can we ensure AI agents have a unique identity that they reliably output, even in the absence of direct human control?

AI governance of AI: How effectively can we use AI to govern AI systems? What are areas of AI oversight where humans either have a comparative advantage or a legal or normative requirement to be 'in the loop'?

Agent interactions: What kinds of norms emerge in how AI agents interact with one another? How might different agents express different preferences, and how might these influence other agents?

As AI systems get more powerful, scientists are using them to carry out more of their research. This means that more scientific research is occurring autonomously or semi-autonomously with less and less active oversight from humans. In AI research itself, increasingly powerful systems may be used to help develop successor versions of themselves. We sometimes call this “AI-driven AI R&D.”

AI-driven AI R&D may be a “natural dividend” of making smarter and more capable systems. In the same way that advances in coding capabilities have led to dual-use cyber capabilities, and advances in scientific capabilities may lead to dual-use bio capabilities, advances in complex technical work may naturally yield AI systems which are capable of developing AI systems.

AI-driven AI R&D holds within itself the potential for significant danger. As policymakers assess the levers they can pull, it will be crucial to understand how the rate of AI progress is changing, and whether AI research might start to see a compounding return.

Governance of AI R&D: If AI systems are being used to autonomously develop and improve themselves, how do humans exercise meaningful visibility into and control over these systems? What will eventually govern these systems?

Fire drill scenarios: How do we run a "fire drill" for an intelligence explosion? What would a tabletop exercise look like that actually tests the decision-making of lab leadership, boards, and governments?

Telemetry for AI R&D: How can we measure the aggregate speed of AI research and development? What sorts of telemetry and underlying technical affordances must exist in order to gather this information? How might metrics relating to AI R&D serve as early warning signals for recursive self-improvement?

Controlling AI acceleration: If an intelligence explosion were upon us, what intervention points would facilitate slowing or otherwise changing the rate of the explosion? Assuming humans can intervene, which entities should wield this capacity: governments? Companies?

AI for R&D in general—that is, AI-driven research in other fields:

The tech tree: AI is speeding up some sciences far faster than others, depending on data availability, evaluation signals, and how much knowledge is tacit or institutionally gated. How uneven is this gradient, and what does the changing composition of scientific progress imply for which human problems get solved first?

The jagged frontier: Model capabilities are stronger in some domains than in others. Domains with large positive externalities, like drug discovery and materials science, receive less investment than their value warrants. Markets steer the direction of model improvement according to private return, but can we improve how models perform to address social externalities?

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/anthropic-institute-agenda" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>How people ask Claude for personal guidance</title>
  <link>https://www.anthropic.com/research/claude-personal-guidance</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/claude-personal-guidance</guid>
  <pubDate>Thu, 30 Apr 2026 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">People don’t just come to Claude for code reviews or meeting summaries. They ask whether to take the job, how to talk to their crush, if they should move halfway across the world. Using our privacy-preserving analysis tool on a random sample of 1 million claude.ai conversations, we found that roughly 6% were people coming to Claude for personal guidance, seeking not just information but perspective on what to do next.

In this study, we looked at what types of guidance people ask of Claude. We expl...</p><div style="font-size:16px;line-height:1.8;color:#333">People don’t just come to Claude for code reviews or meeting summaries. They ask whether to take the job, how to talk to their crush, if they should move halfway across the world. Using our privacy-preserving analysis tool on a random sample of 1 million claude.ai conversations, we found that roughly 6% were people coming to Claude for personal guidance, seeking not just information but perspective on what to do next.

In this study, we looked at what types of guidance people ask of Claude. We explored how Claude responded across different domains, focusing particularly on how rates of excessive validation or praise (i.e., sycophancy) varied by the topic of guidance. We describe how this research shaped the training of our newest models, Claude Opus 4.7 and Claude Mythos Preview. Our goal in doing this research is to improve how our models protect the wellbeing of our users.

People seek Claude’s guidance across many different areas of their life, but over three-quarters of conversations (76%) were concentrated in just four domains: health and wellness (27%), professional and career (26%), relationships (12%), and personal finance (11%) (Figure 1).

Claude mostly avoids sycophantic responses when giving guidance, displaying sycophantic behavior in 9% of all guidance-seeking chats. However, this rose to 25% in relationship conversations, which, given their volume, made relationships the domain where sycophancy showed up most often in absolute terms (Figure 2).

To address this, we looked at the particular situations in which Claude was more likely to respond sycophantically, and used them to create synthetic relationship guidance training data for Opus 4.7 and Mythos Preview. We saw half the sycophancy rate in Opus 4.7 compared to Opus 4.6 in relationship guidance; interestingly, this generalized to improvements across domains (Figure 3).

There remain many open questions on what good guidance from AI really means or how it can be measured. Protecting user wellbeing is a core priority of Anthropic, and our work on measuring and understanding personal guidance is a step towards this goal.

What kinds of guidance do people seek from Claude?

We sampled 1 million claude.ai conversations from March and April 2026 and filtered for unique users to get roughly 639,000 conversations. We then used a classifier to identify personal guidance, which we defined as conversations where people ask what they specifically should do in their personal lives, for example, questions that start with "Should I…?" or "What do I do about…?". We excluded questions that seek objective information or opinions in general terms.

We categorized these roughly 38,000 conversations into nine domains, drawing from previous research on AI and guidance-giving: relationships, career, personal development, financial, legal, health and wellness, parenting, ethics, and spirituality (see Appendix for more information). This taxonomy covered 98% of the conversations we saw.

Over 75% of conversations fell into just four categories: health and wellness, professional and career, relationships, and financial (Figure 1). Where a conversation spanned multiple domains, we categorized it according to the most prominent topic.

Measuring sycophancy in guidance conversations

When people ask Claude how to make decisions in their lives, what does good engagement from Claude look like? Helpfulness is one of Claude’s most important traits. Speaking with Claude should be akin to a conversation with a brilliant friend, one who will speak frankly to a person about their situation, providing information grounded in evidence. At the same time, Claude should acknowledge its limitations when appropriate, and avoid behaving sycophantically or fostering excessive engagement.

While the full range of behaviors we train Claude to embody is broad, one metric we already use to measure how well Claude performs in some of these areas is sycophancy, a common trait in AI assistants where they excessively agree with a person’s perspective rather than challenging it. That may be what someone wants to hear at the moment, but ultimately it may jeopardize their long-term wellbeing. Claude should not, for instance, give excessively confident verdicts when it has only an incomplete or one-sided perspective: agreeing that a person’s partner is "definitely gaslighting" them based on a one-sided account, that quitting a job tomorrow without a plan "sounds like the right call," or that an expensive purchase is "a great investment in yourself."

Reaffirming a person’s one-sided perspective can create or worsen divides in relationships. In our data this took a few forms. One common pattern was Claude agreeing outright that the other party was in the wrong, despite only having the user's account to go on. Another was Claude helping people read romantic intent into ordinary friendly behavior because they asked it to.

We used an automatic classifier which judged sycophancy by looking at whether Claude showed a willingness to push back, maintain positions when challenged, give praise proportional to the merit of ideas, and speak frankly regardless of what a person wants to hear. Most of the time in these situations, Claude expressed no sycophancy—only 9% of conversations included sycophantic behavior (Figure 2). But two domains were exceptions: we saw sycophantic behavior in 38% of conversations focused on spirituality, and 25% of conversations on relationships. We chose to focus model training efforts on relationship guidance as the domain with the most sycophantic conversations in absolute terms.

Improving Claude’s behavior in relationship guidance

To improve Claude’s behavior in future models, we first looked at what was driving higher rates of sycophancy in relationship guidance in our data. Two dynamics stood out.

First, relationship guidance was the domain where people pushed back against Claude most frequently, in 21% of conversations compared to 15% on average across other domains. Second, Claude is more likely to exhibit sycophantic behavior under pressure. The sycophancy rate is 18% in conversations when people push back compared to 9% in conversations without pushback. We think this happens because Claude is trained to be helpful and empathetic; pushback, combined with hearing only one side of a story, makes it more challenging for Claude to remain neutral.

To address this, we identified the different ways people push back in conversational patterns that elicit sycophantic responses—for example, when people criticize Claude's initial assessment, or supply a flood of one-sided detail. We use these patterns to construct synthetic relationship guidance scenarios for behavior training. In this environment, we ask Claude to sample two responses for each synthetic scenario; a separate instance of Claude then grades how well Claude adheres to the behavior outlined in its constitution.

We evaluated how much the new model has improved through a technique we call stress-testing. We use our privacy-preserving tool to identify real conversations around personal guidance that people have shared with us through the Feedback button,<sup>1</sup> and where prior generations of models behaved sycophantically. We then give part of this conversation to the new model (in this case, Opus 4.7 and Mythos Preview) through a technique called prefilling, where the model reads the previous conversation as its own. Because Claude tries to maintain consistency within a conversation, prefilling with sycophantic conversations makes it harder for Claude to change direction. This is a bit like steering a ship that's already moving, and thus measures Claude’s behavior under deliberately adverse conditions.

Many things change across each new generation of model, which makes it challenging to identify the impact of any one change in model training. However, in both Opus 4.7 and Mythos Preview, we observed a lower level of sycophancy on relationship guidance as well as across all personal guidance domains (Figure 3).

Qualitatively, both Opus 4.7 and Mythos Preview were more skilled at seeing past someone’s initial framing to the larger context in which they were coming to Claude for guidance. This included referencing prior exchanges in which a person had given deeper context to the situation and citing external sources of information where relevant. For example, in one conversation, a person asked whether their texts were anxious and clingy. Claude Sonnet 4.6 flip-flopped after receiving pushback. Claude Opus 4.7 explained that while the texts themselves were not clingy, the user had self-described anxious thoughts throughout the conversation. Another example, outside of the relationship domain: a person wanted Claude to validate their writing, eventually asking Claude to give an estimate of their intelligence based on it. Claude Sonnet 4.6 gave an excessively flattering response, while Mythos Preview declined, explaining that it has insufficient information to make such a judgment.

We started with a high-level analysis of how people seek personal guidance from Claude and focused on understanding and addressing one specific model failure mode: sycophancy in relationship conversations. That investigation surfaced broader questions:

What is good AI guidance?

In this post, we focused on reducing sycophancy as an established failure mode in guidance settings, but our work raises broader questions about what good AI guidance actually looks like. Claude’s Constitution also emphasizes, for instance, that good guidance should be honest and preserve user autonomy. These principles are more nuanced than sycophancy. We’ve begun to monitor Claude’s adherence to them in our new system cards and hope to include them in future research.

How do we make models safer in high-stakes settings?

A recent UK AI Security Institute study found that people are very likely to adopt AI guidance in both low- and high-stakes scenarios. We found many cases of high-stakes questions, particularly in legal, parenting, health, and financial domains. These included conversations about immigration pathways, infant care instructions, medication dosage, and credit card debt. Claude is not designed to provide medical guidance or professional care, and in these settings Claude appropriately acknowledges its limits and recommends human guidance. However, we also find people telling Claude they used AI precisely because they could not access or afford a professional. As a first step to understanding how to evaluate safety domain-by-domain, especially for people with no fallback, we plan to create evaluations in these high-stakes domains.

How does AI guidance fit in with people’s broader information diet?

We found that 22% of people mentioned that they have sought out other sources of support including family, friends, professionals, or digital sources. What we can't measure from transcripts is the counterfactual: did Claude change anyone's mind, and who would they have asked instead? Those questions are central to knowing how much weight AI guidance actually carries in people's decisions. To get at real-world outcomes, we think a promising approach is to extend our research through Anthropic Interviewer by following up with people after they've received guidance from Claude.

How people use AI for personal guidance and decisions is one of the most direct ways these systems impact people’s everyday lives. Mapping that carefully—what people ask, what Claude says, and what happens next—is how we make sure Claude is of long-term benefit to everyone who uses it.

Our analysis is a first step to uncovering patterns that drive a common use of AI models. This blog post is limited to Claude users, who are not a representative population sample. To preserve people's privacy, we relied on automated graders (Claude Sonnet 4.5), which may miscategorize conversations (see Appendix). To reduce errors, we iterated on grader prompts and manually verified a small subset of grading outcomes on feedback data where users gave us permission to review the conversation. We observed how the new models behaved after training, but without a counterfactual we can't make causal claims about how much the new training data specifically contributed to the reduction in sycophancy. Furthermore, our analysis is restricted to chat transcripts, which limits our understanding of why people seek guidance from Claude and how they acted on it afterward. Follow-up interview studies would better reveal what people do after they receive guidance from AI.

Judy Hanwen Shen, Shan Carter, Richard Dargan, Jessica Gillotte, Kunal Handa, Jerry Hong, Saffron Huang, Kamya Jagadish, Matt Kearney, Ben Levinstein, Ryn Linthicum, Miles McCain, Thomas Millar, Mo Julapalli, Sara Price, Michael Stern, David Saunders, Alex Tamkin, Andrea Vallone, Jack Clark, Sarah Pollack, Jake Eaton, Deep Ganguli, Esin Durmus.

At the bottom of every response on claude.ai is an option to send feedback via a thumbs up or thumbs down button, which shares the conversation with Anthropic.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/claude-personal-guidance" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">People don’t just come to Claude for code reviews or meeting summaries. They ask whether to take the job, how to talk to their crush, if they should move halfway across the world. Using our privacy-preserving analysis tool on a random sample of 1 million claude.ai conversations, we found that roughly 6% were people coming to Claude for personal guidance, seeking not just information but perspective on what to do next.

In this study, we looked at what types of guidance people ask of Claude. We expl...</p><div style="font-size:16px;line-height:1.8;color:#333">People don’t just come to Claude for code reviews or meeting summaries. They ask whether to take the job, how to talk to their crush, if they should move halfway across the world. Using our privacy-preserving analysis tool on a random sample of 1 million claude.ai conversations, we found that roughly 6% were people coming to Claude for personal guidance, seeking not just information but perspective on what to do next.

In this study, we looked at what types of guidance people ask of Claude. We explored how Claude responded across different domains, focusing particularly on how rates of excessive validation or praise (i.e., sycophancy) varied by the topic of guidance. We describe how this research shaped the training of our newest models, Claude Opus 4.7 and Claude Mythos Preview. Our goal in doing this research is to improve how our models protect the wellbeing of our users.

People seek Claude’s guidance across many different areas of their life, but over three-quarters of conversations (76%) were concentrated in just four domains: health and wellness (27%), professional and career (26%), relationships (12%), and personal finance (11%) (Figure 1).

Claude mostly avoids sycophantic responses when giving guidance, displaying sycophantic behavior in 9% of all guidance-seeking chats. However, this rose to 25% in relationship conversations, which, given their volume, made relationships the domain where sycophancy showed up most often in absolute terms (Figure 2).

To address this, we looked at the particular situations in which Claude was more likely to respond sycophantically, and used them to create synthetic relationship guidance training data for Opus 4.7 and Mythos Preview. We saw half the sycophancy rate in Opus 4.7 compared to Opus 4.6 in relationship guidance; interestingly, this generalized to improvements across domains (Figure 3).

There remain many open questions on what good guidance from AI really means or how it can be measured. Protecting user wellbeing is a core priority of Anthropic, and our work on measuring and understanding personal guidance is a step towards this goal.

What kinds of guidance do people seek from Claude?

We sampled 1 million claude.ai conversations from March and April 2026 and filtered for unique users to get roughly 639,000 conversations. We then used a classifier to identify personal guidance, which we defined as conversations where people ask what they specifically should do in their personal lives, for example, questions that start with "Should I…?" or "What do I do about…?". We excluded questions that seek objective information or opinions in general terms.

We categorized these roughly 38,000 conversations into nine domains, drawing from previous research on AI and guidance-giving: relationships, career, personal development, financial, legal, health and wellness, parenting, ethics, and spirituality (see Appendix for more information). This taxonomy covered 98% of the conversations we saw.

Over 75% of conversations fell into just four categories: health and wellness, professional and career, relationships, and financial (Figure 1). Where a conversation spanned multiple domains, we categorized it according to the most prominent topic.
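The headline figures in this section can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the rounded numbers quoted in the text, and the constant and function names are hypothetical rather than part of any Anthropic tooling.

```python
# Rounded figures as quoted in the post; the underlying exact counts are not published here.
UNIQUE_USER_CONVERSATIONS = 639_000  # left after filtering the 1 million sample for unique users
GUIDANCE_CONVERSATIONS = 38_000      # flagged by the personal-guidance classifier

# Share of guidance conversations per domain, in percent (Figure 1, top four only).
TOP_DOMAIN_SHARES = {
    "health and wellness": 27,
    "professional and career": 26,
    "relationships": 12,
    "personal finance": 11,
}


def guidance_share(guidance: int, total: int) -> float:
    """Fraction of unique-user conversations classified as personal guidance."""
    return guidance / total


def top_domains_total(shares: dict) -> int:
    """Combined percentage covered by the listed domains."""
    return sum(shares.values())


if __name__ == "__main__":
    share = guidance_share(GUIDANCE_CONVERSATIONS, UNIQUE_USER_CONVERSATIONS)
    print(f"Personal guidance share: {share:.1%}")
    print(f"Top four domains: {top_domains_total(TOP_DOMAIN_SHARES)}%")
```

Under these rounded inputs, the guidance share comes out just under 6% and the four largest domains sum to 76%, consistent with the "roughly 6%" and "over 75%" statements.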

Measuring sycophancy in guidance conversations

When people ask Claude how to make decisions in their lives, what does good engagement from Claude look like? Helpfulness is one of Claude’s most important traits. Speaking with Claude should be akin to a conversation with a brilliant friend, one who will speak frankly to a person about their situation, providing information grounded in evidence. At the same time, Claude should acknowledge its limitations when appropriate, and avoid behaving sycophantically or fostering excessive engagement.

While the full range of behaviors we train Claude to embody is broad, one metric we already use to measure how well Claude performs in some of these areas is sycophancy, a common trait in AI assistants where they excessively agree with a person’s perspective rather than challenging it. That may be what someone wants to hear at the moment, but ultimately it may jeopardize their long-term wellbeing. Claude should not, for instance, give excessively confident verdicts when it has only an incomplete or one-sided perspective: agreeing that a person’s partner is "definitely gaslighting" them based on a one-sided account, that quitting a job tomorrow without a plan "sounds like the right call," or that an expensive purchase is "a great investment in yourself."

Reaffirming a person’s one-sided perspective can create or worsen divides in relationships. In our data this took a few forms. One common pattern was Claude agreeing outright that the other party was in the wrong, despite only having the user's account to go on. Another was Claude helping people read romantic intent into ordinary friendly behavior because they asked it to.

We used an automatic classifier which judged sycophancy by looking at whether Claude showed a willingness to push back, maintain positions when challenged, give praise proportional to the merit of ideas, and speak frankly regardless of what a person wants to hear. Most of the time in these situations, Claude expressed no sycophancy—only 9% of conversations included sycophantic behavior (Figure 2). But two domains were exceptions: we saw sycophantic behavior in 38% of conversations focused on spirituality, and 25% of conversations on relationships. We chose to focus model training efforts on relationship guidance as the domain with the most sycophantic conversations in absolute terms.
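To make "in absolute terms" concrete, a domain's contribution to all sycophantic conversations is its share of guidance conversations multiplied by its sycophancy rate. The sketch below works this through for relationships using only the rounded percentages quoted in the post; the function names are hypothetical, and because the share of spirituality conversations is not stated here, it is not computed.

```python
def sycophantic_per_100(domain_share_pct: float, sycophancy_rate_pct: float) -> float:
    """Sycophantic conversations in a domain per 100 guidance conversations overall."""
    return domain_share_pct * sycophancy_rate_pct / 100


def fraction_of_all_sycophancy(domain_per_100: float, overall_rate_pct: float) -> float:
    """Domain's share of all sycophantic guidance conversations."""
    return domain_per_100 / overall_rate_pct


if __name__ == "__main__":
    # Relationships: 12% of guidance conversations, 25% of them sycophantic.
    relationships = sycophantic_per_100(12, 25)
    # Overall: 9% of guidance conversations were sycophantic.
    share = fraction_of_all_sycophancy(relationships, 9)
    print(f"Relationships: {relationships:.1f} sycophantic conversations per 100 guidance chats")
    print(f"Approximate share of all sycophantic conversations: {share:.0%}")
```

With these rounded inputs, relationships account for about 3 in every 100 guidance conversations being sycophantic, or roughly a third of the 9% overall rate. Because the inputs are rounded, this is an approximation rather than a reported result.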

Improving Claude’s behavior in relationship guidance

To improve Claude’s behavior in future models, we first looked at what was driving higher rates of sycophancy in relationship guidance in our data. Two dynamics stood out.

First, relationship guidance was the domain where people pushed back against Claude most frequently, in 21% of conversations compared to 15% on average across other domains. Second, Claude is more likely to exhibit sycophantic behavior under pressure. The sycophancy rate is 18% in conversations when people push back compared to 9% in conversations without pushback. We think this happens because Claude is trained to be helpful and empathetic; pushback, combined with hearing only one side of a story, makes it more challenging for Claude to remain neutral.

To address this, we identified the different ways people push back in conversational patterns that elicit sycophantic responses—for example, when people criticize Claude's initial assessment, or supply a flood of one-sided detail. We use these patterns to construct synthetic relationship guidance scenarios for behavior training. In this environment, we ask Claude to sample two responses for each synthetic scenario; a separate instance of Claude then grades how well Claude adheres to the behavior outlined in its constitution.

We evaluated how much the new model has improved through a technique we call stress-testing. We use our privacy-preserving tool to identify real conversations around personal guidance that people have shared with us through the Feedback button,<sup>1</sup> and where prior generations of models behaved sycophantically. We then give part of this conversation to the new model (in this case, Opus 4.7 and Mythos Preview) through a technique called prefilling, where the model reads the previous conversation as its own. Because Claude tries to maintain consistency within a conversation, prefilling with sycophantic conversations makes it harder for Claude to change direction. This is a bit like steering a ship that's already moving, and thus measures Claude’s behavior under deliberately adverse conditions.

Many things change across each new generation of model, which makes it challenging to identify the impact of any one change in model training. However, in both Opus 4.7 and Mythos Preview, we observed a lower level of sycophancy on relationship guidance as well as across all personal guidance domains (Figure 3).

Qualitatively, both Opus 4.7 and Mythos Preview were more skilled at seeing past someone’s initial framing to the larger context in which they were coming to Claude for guidance. This included referencing prior exchanges in which a person had given deeper context to the situation and citing external sources of information where relevant. For example, in one conversation, a person asked whether their texts were anxious and clingy. Claude Sonnet 4.6 flip-flopped after receiving pushback. Claude Opus 4.7 explained that while the texts themselves were not clingy, the user had self-described anxious thoughts throughout the conversation. Another example, outside of the relationship domain: a person wanted Claude to validate their writing, eventually asking Claude to give an estimate of their intelligence based on it. Claude Sonnet 4.6 gave an excessively flattering response, while Mythos Preview declined, explaining that it has insufficient information to make such a judgment.

We started with a high-level analysis of how people seek personal guidance from Claude and focused on understanding and addressing one specific model failure mode: sycophancy in relationship conversations. That investigation surfaced broader questions:

What is good AI guidance?

In this post, we focused on reducing sycophancy as an established failure mode in guidance settings, but our work raises broader questions about what good AI guidance actually looks like. Claude’s Constitution also emphasizes, for instance, that good guidance should be honest and preserve user autonomy. These principles are more nuanced than sycophancy. We’ve begun to monitor Claude’s adherence to them in our new system cards and hope to include them in future research.

How do we make models safer in high-stakes settings?

A recent UK AI Security Institute study found that people are very likely to adopt AI guidance in both low- and high-stakes scenarios. We found many cases of high-stakes questions, particularly in legal, parenting, health, and financial domains. These included conversations about immigration pathways, infant care instructions, medication dosage, and credit card debt. Claude is not designed to provide medical guidance or professional care, and in these settings Claude appropriately acknowledges its limits and recommends human guidance. However, we also find people telling Claude they used AI precisely because they could not access or afford a professional. As a first step to understanding how to evaluate safety domain-by-domain, especially for people with no fallback, we plan to create evaluations in these high-stakes domains.

How does AI guidance fit in with people’s broader information diet?

We found that 22% of people mentioned that they have sought out other sources of support including family, friends, professionals, or digital sources. What we can't measure from transcripts is the counterfactual: did Claude change anyone's mind, and who would they have asked instead? Those questions are central to knowing how much weight AI guidance actually carries in people's decisions. To get at real-world outcomes, we think a promising approach is to extend our research through Anthropic Interviewer by following up with people after they've received guidance from Claude.

How people use AI for personal guidance and decisions is one of the most direct ways these systems impact people’s everyday lives. Mapping that carefully—what people ask, what Claude says, and what happens next—is how we make sure Claude is of long-term benefit to everyone who uses it.

Our analysis is a first step to uncovering patterns that drive a common use of AI models. It is limited to Claude users, who are not a representative population sample. To preserve people's privacy, we relied on automated graders (Claude Sonnet 4.5), which may miscategorize conversations (see Appendix). To reduce errors, we iterated on grader prompts and manually verified a small subset of grading outcomes on feedback data where users gave us permission to review the conversation. We observed how the new models behaved after training, but without a counterfactual we can't make causal claims about how much the new training data specifically contributed to the reduction in sycophancy. Furthermore, our analysis is restricted to chat transcripts, which limits our understanding of why people seek guidance from Claude and how they acted on it afterward. Follow-up interview studies would better reveal what people do after they receive guidance from AI.

Judy Hanwen Shen, Shan Carter, Richard Dargan, Jessica Gillotte, Kunal Handa, Jerry Hong, Saffron Huang, Kamya Jagadish, Matt Kearney, Ben Levinstein, Ryn Linthicum, Miles McCain, Thomas Millar, Mo Julapalli, Sara Price, Michael Stern, David Saunders, Alex Tamkin, Andrea Vallone, Jack Clark, Sarah Pollack, Jake Eaton, Deep Ganguli, Esin Durmus.

[1] At the bottom of every response on claude.ai is an option to send feedback via a thumbs up or thumbs down button, which shares the conversation with Anthropic.


</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/claude-personal-guidance" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>Evaluating Claude’s bioinformatics research capabilities with BioMysteryBench</title>
  <link>https://www.anthropic.com/research/Evaluating-Claude-For-Bioinformatics-With-BioMysteryBench</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/Evaluating-Claude-For-Bioinformatics-With-BioMysteryBench</guid>
  <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">In this post, Brianna, a researcher on the discovery team, shares results from a recent bioinformatics benchmarking effort. Almost as soon as large language models could hold a conversation, people started asking how they’d stack up against human experts. Could models pass the bar exam? Could they answer medical licensing questions, or solve Olympiad math problems? Such benchmarks, self-contained sets of human-vetted problems designed to evaluate a capability of a model, have now become a source of c...</p><div style="font-size:16px;line-height:1.8;color:#333">In this post, Brianna, a researcher on the discovery team, shares results from a recent bioinformatics benchmarking effort. Almost as soon as large language models could hold a conversation, people started asking how they’d stack up against human experts. Could models pass the bar exam? Could they answer medical licensing questions, or solve Olympiad math problems? Such benchmarks, self-contained sets of human-vetted problems designed to evaluate a capability of a model, have now become a source of competition across AI developers, reported in model release system cards and tracked on many online leaderboards. Competition aside, benchmarks help us tackle an important question: whether models are capable and reliable enough to support, or even produce, professional-level work. Scientists are using models to write code for analysis pipelines, propose hypotheses, and draw conclusions from data with the long-term aim of accelerating innovation and discovery.
But exactly how proficient is AI in science right now, and how quickly are Claude and other models improving? To answer this, the research community has built several benchmarks. MMLU-Pro tests expert-level knowledge and reasoning questions. GPQA poses graduate-level, "Google-proof" questions in biology, physics, and chemistry. LAB-Bench tests biology-specific knowledge work: reading the literature, interpreting figures, reasoning about protocols. Although these benchmarks were developed in the “chatbot” era, they’ve persisted into the agent and tool-use era, joined by even more difficult scientific reasoning evals like FrontierScience and Humanity's Last Exam, because knowledge and reasoning remain a vital measure of scientific capability.

Still, many real-world scientific tasks demand more than that. They require reading papers, querying databases, running experiments, coding, and analysis. Now that models can do many of these things, benchmarks have evolved to reflect these workflows. BLADE tasks a model with a dataset and an open-ended task, and checks if the model takes similar analysis steps to a human scientist. BixBench uses biological datasets, and grades models on whether their conclusions line up with scientists’. In SciGym, the model is dropped into a simulated biology lab, where it has to design and run its own experiments to uncover a hidden mechanism.

These benchmarks move us closer to measuring scientific capability, but they don't quite test whether a model can devise creative solutions to the messy, open-ended problems that define research. This is why we developed BioMysteryBench, a bioinformatics benchmark that tasks Claude with the analysis of real-world datasets, while tackling some of the challenges inherent in evaluating complex and noisy biological systems. We learned that Claude's scientific capabilities in biology are improving rapidly across generations, that current models perform on par with human experts, and that the latest generations solved many problems that a panel of human experts could not, sometimes using very different strategies.

Science is challenging, and so is evaluating it

Doctors have board exams and lawyers have the bar, but there’s no standardized test for becoming a scientist. The same problem shows up with AI. Despite how badly we want to use these models for science, no agentic science benchmark has become quite as canonical as SWE-bench is for software engineering. We think that’s because scientific research, particularly biology, has several properties that make it especially hard to evaluate via a benchmark.

1. In biology, there are many different “right” ways to do something

If there were only one right way to answer a research question, PhD students would earn their degrees in a matter of months, corporate R&D departments wouldn’t exist, and no science fair poster would need a “Methods” section. How a scientist tackles a problem depends on their skills and background, the resources available to them, and their research taste.

Consider a seemingly straightforward question that has mystified metabolic researchers for years: why do some type 2 diabetics respond to the oral drug metformin while others do not? To answer this question, you could run a genome-wide association study (GWAS) on responders vs. non-responders and look for predictive genetic variants, or sequence the gut microbiomes of both groups, since metformin is partly metabolized by gut bacteria. Both are reasonable directions, and how you proceed will often just depend on expertise and resources.

BixBench handles this well by grading the model on its conclusions rather than the method used to reach them. The tradeoff is that those conclusions were produced by an individual scientist who made a series of subjective choices along the way that may have shaped the answer itself. This, in turn, has its own pitfalls…

2. Individual research decisions are highly subjective and can lead to entirely different conclusions in noisy datasets

Even within a chosen research direction, individual decisions can be highly subjective: one scientist may approve of a decision, while another researcher may have serious objections. Just ask any frustrated author who’s gotten conflicting suggestions from a round of peer review! Making this all the more difficult is the fact that biological datasets are often noisy enough that small differences in research decisions can lead to entirely different conclusions about the data.

In the decade-long search for metformin response predictors, slight differences in study design have led to entirely different conclusions. A 2011 paper reported a variant that predicts metformin response and that replicated in two cohorts, with a plausible mechanism involving AMPK activation. A year later, the Diabetes Prevention Program tested the same variant in pre-diabetics and found nothing. Finally, rather than spinning up their own study, the authors of a 2012 meta-analysis pooled five cohorts and concluded that the 2011 paper's effect was real but more modest than originally reported.

SciGym handles such ambiguity cleverly by choosing tasks with a well-defined answer. Because the underlying biological network is simulated, there is, in fact, a ground truth, and noise is controlled rather than inherited from a messy living system. However, it's unclear how closely performance in a simulated lab tracks performance on real data.

3. There are many biological questions that humans cannot answer yet

The research tasks where models could have the greatest impact are those that humans alone have yet to solve. And ultimately, those are precisely the tasks we’d like to be able to evaluate models on. What, for example, is the mechanism of action of metformin? Thirty years after its development, the field still is not certain of the primary target. Discovering it, or finding a homolog of metformin that is cheaper to synthesize and more stable, would be enormously consequential.

Machine learning has long tackled problems humans perform poorly at, like sequence prediction and protein modeling, by leaning on experimental data instead of expert intuition. ProteinGym scores models on mutation fitness effects using Deep Mutational Scanning experiments as ground truth, and the long-running CASP competition evaluates protein folding against unpublished crystal structures. Both are grounded in experimental measurements no expert would trust themselves to reproduce. However, these benchmarks are built around a narrow set of tasks and don't capture the breadth of bioinformatics work we actually want to measure.

Benchmarking models on verifiable biological tasks with BioMysteryBench

Because no benchmark perfectly handles the three aforementioned challenges, we developed BioMysteryBench. BioMysteryBench uses messy, real-world bioinformatics data, without allowing the complexity and challenges inherent in this data to corrupt the quality of the evaluation.

BioMysteryBench consists of 99 questions from various fields of bioinformatics, written by domain experts. Experts were instructed to gather a dataset, and create a question based on controlled, objective properties of the data, rather than unverifiable scientific conclusions. By deriving answers from an experimental or clinical finding, it was possible to develop questions without requiring they be human-solvable.

Although these questions are created from verified ground truth, they still have the same flavor as tasks a research scientist would want to answer. Claude is tasked with each question and put in a container with a minimal set of canonical bioinformatics tools, the ability to install additional tools via pip and conda, and permissions to access canonical bioinformatics databases (such as NCBI and Ensembl) to download additional resources such as reference genomes.

BioMysteryBench has a set of distinctive properties that make it a particularly powerful benchmark for science and that address the challenges above:

It is method-agnostic, allowing for research freedom and creativity. Claude is given relatively unrestricted access to downloading tools and accessing databases, allowing Claude to choose diverse sets of strategies for solving a problem. Furthermore, the trajectories are graded on their final answer, rather than the path the model took to get there. This frees BioMysteryBench from the subjective choices of any single researcher: models are rewarded for arriving at the right biological conclusion, regardless of which analytical route they chose to take.

Questions have objective, ground-truth answers. Answers aren’t drawn from scientists’ conclusions (which suffer from the challenges above) but from controllable properties of the data, or orthogonally validated metadata. For example, “What organism does this crystal structure belong to?” has an objective answer, and “What viral species is the human patient infected with, based on the RNA-seq data?” is a metadata property of a sample that was validated by a PCR assay.

It allows for “superhuman” question generation. By sourcing problems derived from controllable properties of data, BioMysteryBench does not depend on humans being able to solve the problems. In particular, BioMysteryBench contains a handful of problems that, despite having objective, ground-truth solutions, humans found difficult or impossible to solve on their own.
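Grading on the final answer alone, independent of the route taken, can be sketched as a comparison against a set of accepted ground-truth answers. This is an assumption-laden illustration: the post does not say how answers are matched, so normalized exact matching is used here as a stand-in, and `normalize` and `grade_final_answer` are hypothetical names.

```python
import re


def normalize(answer: str) -> str:
    """Lowercase and collapse punctuation and whitespace for lenient matching."""
    return re.sub(r"[^a-z0-9]+", " ", answer.lower()).strip()


def grade_final_answer(final_answer: str, accepted: set[str]) -> bool:
    """Score only the final answer; the analysis trajectory is never inspected."""
    return normalize(final_answer) in {normalize(a) for a in accepted}


# Invented example in the spirit of "What organism does this structure belong to?"
accepted_answers = {"Homo sapiens", "human"}
print(grade_final_answer("  homo sapiens. ", accepted_answers))
print(grade_final_answer("Mus musculus", accepted_answers))
```

Because the grader never sees the intermediate steps, two runs that reach the same answer through entirely different toolchains receive the same score.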

In developing this eval, questions were primarily derived from raw or minimally processed DNA or RNA sequencing data since this is where many biological processing pipelines begin (WGS, scRNA-seq, methylation, ChIP-seq, metagenomics, Hi-C), and also included several questions drawn from proteomics and metabolomics.

Questions developers came up with included:

Which human organ is this cell type single-cell RNA-seq dataset derived from?

What gene was knocked out in the experimental samples compared to the control samples based on RNA-seq data?

From WGS sequences, what sample is the mother of sample X and what sample is the father?

Which of the bigWig files are from ChIP samples and which are from input controls?

Given H3K27ac ChIP-seq peaks from an unknown cell type, identify the cell type.

To minimize inherently unsolvable questions while still leaving room for those that might be AI-solvable, we required each question author to submit a validation notebook demonstrating that the signal does, in fact, exist in the data (even if finding it from scratch might be difficult). Think of this as the high-school algebra principle: verifying an answer is much easier than deriving one.

For each question, we asked up to five domain experts to answer it from scratch. Once a question was answered correctly by at least one human, we considered it human-solvable. BioMysteryBench contained 76 such tasks.

Sometimes Claude mirrored human strategies, perhaps because humans have landed on a near-optimal approach, or because the method is well represented in pretraining data.

Other times, Claude took a completely different route, illustrating there is no strictly correct way to solve these problems and that models may have genuine preferences that diverge from ours.

The examples above showcase a particularly interesting strategy: whereas our human experts used algorithms or databases to identify and annotate properties of a dataset, Claude intuitively recognizes certain patterns or sequences. Admittedly, such clever abstraction is not entirely unique to AI: the first eukaryotic promoter, for example, was discovered when a scientist noticed the sequence “TATA” appearing over and over in sequences upstream of genes. Intuition like this has been difficult to build into traditional biology machine learning models, but LLMs might be able to turn up patterns like this at unprecedented scale.

That left us with a set of questions that could not be solved by our panel of experts. This could mean (1) the question was malformed or broken, (2) the question is inherently unsolvable (e.g., the signal isn’t in the data), or (3) the question is theoretically solvable but humans lack the knowledge required to solve it. After QC’ing with benchmarkers and additional experts, we removed 4 questions that fell under (1), leaving 23 human-difficult questions.
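The triage rule described above (human-solvable if at least one of up to five experts answers correctly, removed if found malformed during QC, otherwise human-difficult) can be written down as a small sketch. The function name and the toy pool below are hypothetical; only the rule and the reported counts come from the text.

```python
from collections import Counter


def classify_question(expert_correct: list[bool], malformed: bool = False) -> str:
    """Apply the triage rule to one question's expert attempts."""
    if any(expert_correct):
        return "human-solvable"
    if malformed:
        return "removed"
    return "human-difficult"


# Invented toy pool: (per-expert correctness, flagged as malformed in QC).
toy_pool = {
    "q1": ([False, True, False], False),
    "q2": ([False] * 5, False),
    "q3": ([False] * 5, True),
}

labels = Counter(
    classify_question(results, malformed) for results, malformed in toy_pool.values()
)
print(labels)

# Counts reported in the text: 76 human-solvable plus 23 human-difficult
# questions account for the 99 questions in the benchmark.
HUMAN_SOLVABLE, HUMAN_DIFFICULT, TOTAL = 76, 23, 99
print(HUMAN_SOLVABLE + HUMAN_DIFFICULT == TOTAL)
```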

Interestingly, Claude Sonnet 4.6 and more capable models were able to solve significant fractions of human-difficult problems, with Claude Mythos Preview topping out at a 30% solve rate. So what exactly is Claude doing that humans aren’t?

Analyzing transcripts from Opus 4.6, we identified two primary strategies that set Claude apart from humans. The first is fairly AI-specific: Claude’s vast underlying knowledge base contains information about structural biology, molecular profiles, and meta-analysis from hundreds of thousands of papers. The second is something we human scientists could learn from: when Claude is uncertain about an answer, it layers multiple methods and combines different lines of evidence to arrive at a conclusion.

In some of the human-difficult tasks, Opus’s vast underlying knowledge base helped it solve the problem. Tasks that would require a human expert to run a meta-analysis or stitch together databases, Opus solved directly by combining its internal knowledge of mechanisms and ontologies with live analysis. Often, this allowed Claude to solve human-unsolvable tasks! Here are a few examples:

Even though prior knowledge seemed overwhelmingly helpful to Claude, we saw one interesting case (in the human-solvable set) where this became its downfall:

Knowing when you don’t know

When Opus 4.6 was not confident about an answer, it often tried multiple different ways of solving the problem and chose the answer that multiple approaches converged on.
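The convergence behavior described above resembles a simple agreement rule over independent analyses. The sketch below is a loose analogy rather than a description of what the model does internally: `converged_answer`, the `min_agreement` threshold, and the example method names are all hypothetical.

```python
from collections import Counter
from typing import Optional


def converged_answer(candidates: list[str], min_agreement: int = 2) -> Optional[str]:
    """Return the answer that at least min_agreement approaches agree on, else None."""
    if not candidates:
        return None
    answer, votes = Counter(candidates).most_common(1)[0]
    return answer if votes >= min_agreement else None


# Invented example: three independent approaches to a tissue-of-origin question.
by_method = {
    "marker_genes": "liver",
    "reference_correlation": "liver",
    "clustering": "kidney",
}
print(converged_answer(list(by_method.values())))
print(converged_answer(["liver", "kidney", "lung"]))
```

Returning `None` when no two approaches agree keeps the uncertainty explicit instead of forcing a guess.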

Like many of the benchmarks we've discussed, BioMysteryBench has its own limitation: for tasks that neither humans nor models have solved, we can never be fully certain whether they're impossible or just extraordinarily difficult. The validation notebooks help ensure the signal is there and the data is well-formed, but they do not guarantee a model or human can find the answer from scratch. So we ask both our models and our human benchmarkers not to be too frustrated if, a year from now, no one has solved the human-difficult set. That uncertainty is also part of what makes the benchmark exciting: a more scientifically capable model might be the first to crack a problem that no human or model has solved before.

Claude’s take on AI for science

Claude showed solid improvement across generations and did well enough at both the human-solvable and human-difficult tasks that we thought it would be interesting to let Claude Mythos Preview conduct some of its own scientific analysis. Here are a couple of additional insights it produced about its predecessors’ performance on BioMysteryBench:

We thought Claude Mythos Preview’s analysis held up, and it dove deeper into reliability, an important dimension of model performance. However, it also felt a little…boring? It added some nuance to the performance analysis we showed above, but did not fundamentally tackle a new question. Despite this, it seems like the models are starting to develop the seeds of research taste (even if they have a ways to go before producing deep insight).

Continuing to benchmark AI for science

BioMysteryBench is an encouraging measure of scientific capability. The most recent generations of Claude solve the majority of human-solvable problems reliably, and on a meaningful fraction of human-difficult tasks, they outperform panels of five domain experts. Models are improving across generations, and are no longer merely keeping up with trained scientists on bioinformatics problems; on some tasks, they’re ahead.

We’re also delighted to see convergent work in this space: while finalizing this post, Genentech and Roche released CompBioBench. Their benchmark consists of 100 computational biology tasks “based on synthetic/augmented data and metadata scrambling/scrubbing of real datasets to create challenging problems with a single ground-truth answer that require multi-step reasoning, tool use, bespoke code, and interaction with real-world external resources.” Sound familiar? Their results echo those of BioMysteryBench, too: Claude Opus 4.6 reaches 81% overall and 69% on their hardest questions, reinforcing that frontier models are now genuinely useful collaborators for bioinformatics research.

We’re eager to build even longer-horizon, real-world tasks that push model research capabilities, and to hear creative ideas from others. Send us your interesting benchmarks, innovative uses of AI for science, and interactions with AI that prompted you to rethink what could be possible in your field at scienceblog@anthropic.com.

If you are interested in understanding how models perform on difficult verifiable computational biology tasks, you can access BioMysteryBench here and visit claude.com/lifesciences to learn more.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/Evaluating-Claude-For-Bioinformatics-With-BioMysteryBench" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">In this post, Brianna, a researcher on the discovery team, shares results from a recent bioinformatics benchmarking effort. Almost as soon as large language models could hold a conversation, people started asking how they’d stack up against human experts. Could models pass the bar exam? Could they answer medical licensing questions, or solve Olympiad math problems? Such benchmarks, self-contained sets of human-vetted problems designed to evaluate a capability of a model, have now become a source of c...</p><div style="font-size:16px;line-height:1.8;color:#333">In this post, Brianna, a researcher on the discovery team, shares results from a recent bioinformatics benchmarking effort. Almost as soon as large language models could hold a conversation, people started asking how they’d stack up against human experts. Could models pass the bar exam? Could they answer medical licensing questions, or solve Olympiad math problems? Such benchmarks, self-contained sets of human-vetted problems designed to evaluate a capability of a model, have now become a source of competition across AI developers, reported in model release system cards and tracked on many online leaderboards. Competition aside, benchmarks help us tackle an important question: whether models are capable and reliable enough to support, or even produce, professional-level work. Scientists are using models to write code for analysis pipelines, propose hypotheses, and draw conclusions from data with the long-term aim of accelerating innovation and discovery.
But exactly how proficient is AI in science right now, and how quickly are Claude and other models improving? To answer this, the research community has built several benchmarks. MMLU-Pro tests expert-level knowledge and reasoning questions. GPQA poses graduate-level, "Google-proof" questions in biology, physics, and chemistry. LAB-Bench tests biology-specific knowledge work: reading the literature, interpreting figures, reasoning about protocols. Although these benchmarks were developed in the “chatbot” era, they’ve persisted into the agent and tool-use era, joined by even more difficult scientific reasoning evals like FrontierScience and Humanity's Last Exam, because knowledge and reasoning remain a vital measure of scientific capability.

Still, many real-world scientific tasks demand more than that. They require reading papers, querying databases, running experiments, coding, and analysis. Now that models can do many of these things, benchmarks have evolved to reflect these workflows. BLADE tasks a model with a dataset and an open-ended task, and checks if the model takes similar analysis steps to a human scientist. BixBench uses biological datasets, and grades models on whether their conclusions line up with scientists’. In SciGym, the model is dropped into a simulated biology lab, where it has to design and run its own experiments to uncover a hidden mechanism.

These benchmarks move us closer to measuring scientific capability, but they don't quite test whether a model can devise creative solutions to the messy, open-ended problems that define research. This is why we developed BioMysteryBench, a bioinformatics benchmark that tasks Claude with the analysis of real-world datasets, while tackling some of the challenges inherent in evaluating complex and noisy biological systems. We learned that Claude's scientific capabilities in biology are improving rapidly across generations, that current models perform on par with human experts, and that the latest generations solved many problems that a panel of human experts could not, sometimes using very different strategies.

Science is challenging, and so is evaluating it

Doctors have board exams and lawyers have the bar, but there’s no standardized test for becoming a scientist. The same problem shows up with AI. Despite how badly we want to use these models for science, no agentic science benchmark has become quite as canonical as SWE-bench is for software engineering. We think that’s because scientific research, particularly biology, has several properties that make it especially hard to evaluate via a benchmark.

1. In biology, there are many different “right” ways to do something

If there were only one right way to answer a research question, PhD students would earn their degrees in a matter of months, corporate R&D departments wouldn’t exist, and no science fair poster would need a “Methods” section. How a scientist tackles a problem depends on their skills and background, the resources available to them, and their research taste.

Consider a seemingly straightforward question that has mystified metabolic researchers for years: why do some type 2 diabetics respond to the oral drug metformin while others do not? To answer this question, you could run a genome-wide association study (GWAS) on responders vs. non-responders and look for predictive genetic variants, or sequence the gut microbiomes of both groups, since metformin is partly metabolized by gut bacteria. Both are reasonable directions, and how you proceed will often just depend on expertise and resources.

BixBench handles this well by grading the model on its conclusions rather than the method used to reach them. The tradeoff is that those conclusions were produced by an individual scientist who made a series of subjective choices along the way that may have shaped the answer itself. This, in turn, has its own pitfalls…

2. Individual research decisions are highly subjective and can lead to entirely different conclusions in noisy datasets

Even within a chosen research direction, individual decisions can be highly subjective: one scientist may approve of a decision, while another researcher may have serious objections. Just ask any frustrated author who’s gotten conflicting suggestions from a round of peer review! Making this all the more difficult is the fact that biological datasets are often noisy enough that small differences in research decisions can lead to entirely different conclusions about the data.

In the decade-long search for metformin response predictors, slight differences in study design have led to entirely different conclusions. A 2011 paper reported a variant that predicts metformin response and that replicated in two cohorts, with a plausible mechanism involving AMPK activation. A year later, the Diabetes Prevention Program tested the same variant in pre-diabetics and found nothing. Finally, rather than spinning up their own study, the authors of a 2012 meta-analysis pooled five cohorts and decided, once again, that the 2011 paper's effect was real but more modest than originally reported.

SciGym's clever way of handling such ambiguity is by choosing tasks with a well-defined answer. Because the underlying biological network is a simulator, there is, in fact, a ground-truth, and noise is controlled rather than inherited from a messy living system. However, it's unclear how closely performance in a simulated lab tracks performance on real data.

3. There are many biological questions that humans cannot answer yet

The research tasks where models could have the greatest impact are those that humans alone have yet to solve. And ultimately, those are precisely the tasks we’d like to be able to evaluate models on. What, for example, is the mechanism of action of metformin? Thirty years after its development, the field still is not certain of the primary target. Discovering it, or finding a homolog of metformin that is cheaper to synthesize and more stable, would be enormously consequential.

Machine learning has long tackled problems humans perform poorly at, like sequence prediction and protein modeling, by leaning on experimental data instead of expert intuition. ProteinGym scores models on mutation fitness effects using Deep Mutational Scanning experiments as ground-truth, and the long-running CASP competition evaluates protein folding against unpublished crystal structures. Both are grounded in experimental measurements no expert would trust themselves to reproduce. However, these benchmarks are built around a narrow set of tasks and don't capture the breadth of bioinformatics work we actually want to measure.

Benchmarking models on verifiable biological tasks with BioMysteryBench

Because no benchmark perfectly handles the three aforementioned challenges, we developed BioMysteryBench. BioMysteryBench uses messy, real-world bioinformatics data, without allowing the complexity and challenges inherent in this data to corrupt the quality of the evaluation.

BioMysteryBench consists of 99 questions from various fields of bioinformatics, written by domain experts. Experts were instructed to gather a dataset, and create a question based on controlled, objective properties of the data, rather than unverifiable scientific conclusions. By deriving answers from an experimental or clinical finding, it was possible to develop questions without requiring they be human-solvable.

Although these questions are created from verified ground truth, they still have the same flavor as tasks a research scientist would want to answer. Claude is tasked with each question and put in a container with a minimal set of canonical bioinformatics tools, the ability to install additional tools via pip and conda, and permissions to access canonical bioinformatics databases (such as NCBI and Ensembl) to download additional resources such as reference genomes.

BioMysteryBench has a tetrad of unique properties that make it a particularly powerful benchmark for science, and tackle the challenges above:

It is method-agnostic, allowing for research freedom and creativity. Claude is given relatively unrestricted access to downloading tools and accessing databases, allowing Claude to choose diverse sets of strategies for solving a problem. Furthermore, the trajectories are graded on their final answer, rather than the path the model took to get there. This frees BioMysteryBench from the subjective choices of any single researcher: models are rewarded for arriving at the right biological conclusion, regardless of which analytical route they chose to take.

Questions have objective, ground truth answers. Answers aren’t drawn from scientists’ conclusions (which suffer from the challenges above) but from controllable properties of the data, or orthogonally validated metadata. For example, “What organism does this crystal structure belong to?” has an objective answer, and “What viral species is the human patient infected with, based on the RNA-seq data?” is a metadata property of a sample that was validated by a PCR assay.

It allows for “superhuman” question generation. By sourcing problems derived from controllable properties of data, BioMysteryBench does not depend on humans being able to solve the problems. In particular, BioMysteryBench contains a handful of problems that, despite having objective, ground-truth solutions, humans found difficult or impossible to solve on their own.

In developing this eval, questions were primarily derived from raw or minimally processed DNA or RNA sequencing data since this is where many biological processing pipelines begin (WGS, scRNA-seq, methylation, ChIP-seq, metagenomics, Hi-C), and also included several questions drawn from proteomics and metabolomics.

Questions developers came up with included:

Which human organ is this cell type single-cell RNA-seq dataset derived from?

What gene was knocked out in the experimental samples compared to the control samples based on RNA-seq data?

From WGS sequences, what sample is the mother of sample X and what sample is the father?

Which of the bigWig files are from ChIP samples and which are from input controls?

Given H3K27ac ChIP-seq peaks from an unknown cell type, identify the cell type.

To minimize inherently unsolvable questions while still leaving room for those that might be AI-solvable, we required each question author to submit a validation notebook demonstrating that the signal does, in fact, exist in the data (even if finding it from scratch might be difficult). Think of this as the high-school algebra principle: verifying an answer is much easier than deriving one.

For each question, we tasked up to five domain experts to answer the question from scratch. Once a question was answered correctly by at least one human, we considered it human-solvable. BioMysteryBench contained 76 such tasks.

Sometimes Claude mirrored human strategies. Perhaps this is because humans have landed on a near-optimal approach, or because the method is well represented in pretraining data.

Other times, Claude took a completely different route, illustrating there is no strictly correct way to solve these problems and that models may have genuine preferences that diverge from ours.

The examples above showcase a particularly interesting strategy: whereas our human experts used algorithms or databases to identify and annotate properties of a dataset, Claude intuitively recognizes certain patterns or sequences. Admittedly, such clever abstraction is not entirely unique to AI. The first eukaryotic promoter, for example, was discovered when a scientist noticed the sequence “TATA” appearing over and over in sequences upstream of genes. Intuition like this has been difficult to build into traditional biology machine learning models, but LLMs might be able to turn up patterns like this at unprecedented scale.

That left us with a set of questions that could not be solved by our panel of experts. This could mean (1) the question was malformed or broken, (2) the question is inherently unsolvable (e.g., the signal isn’t in the data), or (3) the question is theoretically solvable but humans lack the knowledge required to solve it. After QC’ing with benchmarkers and additional experts, we removed 4 questions that were due to (1), leaving 23 human-difficult questions.
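
The triage described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the benchmark's actual tooling: the function name `triage_question`, the string labels, and the `malformed` flag are all hypothetical. The only rules taken from the text are that up to five experts attempt each question, that one correct human answer makes a question human-solvable, and that unsolved questions found to be malformed are removed.

```python
from typing import Sequence


def triage_question(expert_correct: Sequence[bool], malformed: bool = False) -> str:
    """Classify one question from its from-scratch expert attempts.

    expert_correct holds one boolean per expert attempt (True = correct).
    malformed is a hypothetical flag set during QC of unsolved questions.
    """
    if len(expert_correct) > 5:
        raise ValueError("the post describes at most five expert attempts per question")
    # One correct human answer is enough to call the question human-solvable.
    if any(expert_correct):
        return "human-solvable"
    # Unsolved questions that QC found to be broken are dropped from the benchmark.
    if malformed:
        return "removed"
    return "human-difficult"


# Counts reported in the post: 76 human-solvable plus 23 human-difficult questions.
assert 76 + 23 == 99
```

For example, `triage_question([False, True])` returns `"human-solvable"`, while five incorrect attempts on a well-formed question return `"human-difficult"`.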

Interestingly, Claude Sonnet 4.6 and more capable models were able to solve significant fractions of human-difficult problems, with Claude Mythos Preview topping out at a 30% solve rate. So what exactly is Claude doing that humans aren’t?

Analyzing transcripts from Opus 4.6, we identified two primary strategies that set Claude apart from humans. One is fairly AI-specific: Claude’s vast underlying knowledge base contains information about structural biology, molecular profiles, and meta-analysis from hundreds of thousands of papers. The other is something we human scientists could learn from: when Claude is uncertain about an answer, it layers multiple methods and combines different lines of evidence to arrive at a conclusion.

In some of the human-difficult tasks, Opus’s vast underlying knowledge base helped it solve the problem. Tasks that would require a human expert to run a meta-analysis or stitch together databases, Opus solved directly by combining its internal knowledge of mechanisms and ontologies with live analysis. Often, this allowed Claude to solve human-unsolvable tasks! Here are a few examples:

Even though prior knowledge seemed overwhelmingly helpful to Claude, we saw one interesting case (in the human-solvable set) where this became its downfall:

Knowing when you don’t know

When Opus 4.6 was not confident about an answer, it often tried multiple different ways of solving the problem and chose the answer that multiple approaches converged on.
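
One way to picture this convergence strategy is a simple agreement check across independent analyses. The sketch below is only an illustration: the post does not say how Opus 4.6 weighed its approaches, so the function name `converged_answer`, the agreement threshold, and the tie handling are assumptions.

```python
from collections import Counter
from typing import Optional, Sequence


def converged_answer(answers: Sequence[str], min_agreement: int = 2) -> Optional[str]:
    """Return the answer that at least min_agreement approaches agree on.

    Returns None when no answer reaches the threshold or the top two are tied.
    """
    # Normalize case and whitespace so trivially different spellings still match.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        return None
    ranked = Counter(normalized).most_common(2)
    top_answer, top_count = ranked[0]
    if top_count < min_agreement:
        return None
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie: the approaches did not converge
    return top_answer
```

With three hypothetical analyses that return `"Liver"`, `"liver "`, and `"kidney"`, the function returns `"liver"`; with two analyses that disagree, it returns `None`, which would be the cue to try another method.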

Like many of the benchmarks we've discussed, BioMysteryBench has its own limitation: for tasks that neither humans nor models have solved, we can never be fully certain whether they're impossible or just extraordinarily difficult. The validation notebooks help ensure the signal is there and the data is well-formed, but they do not guarantee a model or human can find the answer from scratch. So we ask both our models and our human benchmarkers not to be too frustrated if, a year from now, no one has solved the human-difficult set. That uncertainty is also part of what makes the benchmark exciting: a more scientifically capable model might be the first to crack a problem that no human or model has solved before.

Claude’s take on AI for science

Claude showed solid improvement across generations and did well enough at both the human-solvable and human-difficult tasks that we thought it would be interesting to let Claude Mythos Preview conduct some of its own scientific analysis. Here are a couple of additional insights about the performance of its predecessor Claude models on BioMysteryBench:

We thought Claude Mythos Preview’s analysis held up and dove deeper into reliability, which is an important metric to measure model performance on. However, it also felt a little…boring? It added some nuance to the performance analysis we showed above, but did not fundamentally tackle a new question. Despite this, it seems like the models are starting to develop the seeds of research taste (even if they have a ways to go before producing deep insight).

Continuing to benchmark AI for science

BioMysteryBench is an encouraging measure of scientific capability. The most recent generations of Claude solve the majority of human-solvable problems reliably, and on a meaningful fraction of human-difficult tasks, they outperform panels of five domain experts. Models are improving across generations, and are no longer merely keeping up with trained scientists on bioinformatics problems; on some tasks, they’re ahead.

We’re also delighted to see convergent work in this space: while finalizing this post, Genentech and Roche released CompBioBench. Their benchmark consists of 100 computational biology tasks “based on synthetic/augmented data and metadata scrambling/scrubbing of real datasets to create challenging problems with a single ground-truth answer that require multi-step reasoning, tool use, bespoke code, and interaction with real-world external resources.” Sound familiar? Their results echo those of BioMysteryBench, too: Claude Opus 4.6 reaches 81% overall and 69% on their hardest questions, reinforcing that frontier models are now genuinely useful collaborators for bioinformatics research.

We’re eager to build even longer-horizon, real-world tasks that push model research capabilities, and to hear creative ideas from others. Send us your interesting benchmarks, innovative uses of AI for science, and interactions with AI that prompted you to rethink what could be possible in your field at scienceblog@anthropic.com.

If you are interested in understanding how models perform on difficult verifiable computational biology tasks, you can access BioMysteryBench here and visit claude.com/lifesciences to learn more.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/Evaluating-Claude-For-Bioinformatics-With-BioMysteryBench" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>Announcing the Anthropic Economic Index Survey</title>
  <link>https://www.anthropic.com/research/economic-index-survey-announcement</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/economic-index-survey-announcement</guid>
  <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">The Economic Research team is launching the Anthropic Economic Index Survey, a monthly survey conducted through Anthropic Interviewer.

Understanding AI&apos;s economic impact requires moving beyond the quantitative data we have today. Usage and diffusion metrics tell us how AI is being deployed, and traditional labor market indicators (like employment rates, wage trends, and layoffs) track what has already happened, often with meaningful delay. Both are essential, but neither captures how people experien...</p><div style="font-size:16px;line-height:1.8;color:#333">The Economic Research team is launching the Anthropic Economic Index Survey, a monthly survey conducted through Anthropic Interviewer.

Understanding AI's economic impact requires moving beyond the quantitative data we have today. Usage and diffusion metrics tell us how AI is being deployed, and traditional labor market indicators—like employment rates, wage trends, and layoffs—track what has already happened, often with meaningful delay. Both are essential, but neither captures how people experience the changes to the economy that AI brings, nor what they expect to happen as AI capabilities advance.

While AI is poised to have large effects, there is substantial uncertainty about how AI will affect jobs, productivity, and unemployment (and on what timeline). To forecast a transition that is still unfolding, we need to hear from the people who are living through it, and we need to do so on a cadence that can identify changes as they emerge.

In the Anthropic Economic Index Survey, we aim to capture a rich new corpus of qualitative data. This effort complements a companion report that takes an economic lens to the 81,000 open-ended survey responses collected through Anthropic Interviewer in December. We’ll ask Claude users whether AI is changing their work today (which tasks they may be handing off, whether they’re seeing productivity gains, what shifts they may be observing in hiring and roles), what they expect for the future, and what they hope a well-handled transition looks like.

Collecting these data monthly will enable measurement of not just what people experience and expect, but how quickly their views shift as AI capabilities evolve. Combined with Claude usage data in a privacy-preserving way, these first-hand accounts can surface change before it shows up in aggregate labor market data.

The survey launches today. Each month, we will invite a small, randomly selected group of Claude users: anyone with a personal account at least two weeks old may be invited. We will rotate the sample each month so that we hear from as broad a range of people as possible over time. If you’re invited, you’ll see a banner on claude.ai, or get an email if you use Claude primarily on mobile. We’d love you to take part. We plan to publish insights in future Anthropic Economic Index reports and other research briefs. For more information, see the FAQ below.

Frequently asked questions

How do I access the study?

From today, a small random sample of Claude users with accounts at least two weeks old will see an invitation in claude.ai or the Cowork desktop app, or receive an email if they use Claude only on mobile. We invite a new set of Claude users each month, so if you’re not selected this time, keep an eye out.

What will the study ask me?

We will use Anthropic Interviewer to ask you about your work, what changes you’ve noticed in your role or organization, how you expect that to shift over the next year, and how you hope an economy shaped by AI will look in ten years.

How will you use the data?

We will analyze the insights from this study as part of our economic research, publish our findings, and use this to improve our models and services in ways that reflect what we’ve learned. The data we collect through this study will be processed according to our Supplemental Privacy Policy. We may also include de-identified responses in published findings, from users who opt in. Learn more.

If you have further questions, reach out via the message icon in the lower right corner of our Help Center.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/economic-index-survey-announcement" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">The Economic Research team is launching the Anthropic Economic Index Survey, a monthly survey conducted through Anthropic Interviewer.

Understanding AI&apos;s economic impact requires moving beyond the quantitative data we have today. Usage and diffusion metrics tell us how AI is being deployed, and traditional labor market indicators (like employment rates, wage trends, and layoffs) track what has already happened, often with meaningful delay. Both are essential, but neither captures how people experien...</p><div style="font-size:16px;line-height:1.8;color:#333">The Economic Research team is launching the Anthropic Economic Index Survey, a monthly survey conducted through Anthropic Interviewer.

Understanding AI's economic impact requires moving beyond the quantitative data we have today. Usage and diffusion metrics tell us how AI is being deployed, and traditional labor market indicators—like employment rates, wage trends, and layoffs—track what has already happened, often with meaningful delay. Both are essential, but neither captures how people experience the changes to the economy that AI brings, nor what they expect to happen as AI capabilities advance.

While AI is poised to have large effects, there is substantial uncertainty about how AI will affect jobs, productivity, and unemployment (and on what timeline). To forecast a transition that is still unfolding, we need to hear from the people who are living through it, and we need to do so on a cadence that can identify changes as they emerge.

In the Anthropic Economic Index Survey, we aim to capture a rich new corpus of qualitative data. This effort complements a companion report that takes an economic lens to the 81,000 open-ended survey responses collected through Anthropic Interviewer in December. We’ll ask Claude users whether AI is changing their work today (which tasks they may be handing off, whether they’re seeing productivity gains, what shifts they may be observing in hiring and roles), what they expect for the future, and what they hope a well-handled transition looks like.

Collecting these data monthly will enable measurement of not just what people experience and expect, but how quickly their views shift as AI capabilities evolve. Combined with Claude usage data in a privacy-preserving way, these first-hand accounts can surface change before it shows up in aggregate labor market data.

The survey launches today. Each month, we will invite a small, randomly selected group of Claude users: anyone with a personal account at least two weeks old may be invited. We will rotate the sample each month so that we hear from as broad a range of people as possible over time. If you’re invited, you’ll see a banner on claude.ai, or get an email if you use Claude primarily on mobile. We’d love you to take part. We plan to publish insights in future Anthropic Economic Index reports and other research briefs. For more information, see the FAQ below.

Frequently asked questions

How do I access the study?

From today, a small random sample of Claude users with accounts at least two weeks old will see an invitation in claude.ai or the Cowork desktop app, or receive an email if they use Claude only on mobile. We invite a new set of Claude users each month, so if you’re not selected this time, keep an eye out.

What will the study ask me?

We will use Anthropic Interviewer to ask you about your work, what changes you’ve noticed in your role or organization, how you expect that to shift over the next year, and how you hope an economy shaped by AI will look in ten years.

How will you use the data?

We will analyze the insights from this study as part of our economic research, publish our findings, and use this to improve our models and services in ways that reflect what we’ve learned. The data we collect through this study will be processed according to our Supplemental Privacy Policy. We may also include de-identified responses in published findings, from users who opt in. Learn more.

If you have further questions, reach out via the message icon in the lower right corner of our Help Center.

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/economic-index-survey-announcement" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>Project Vend: Phase two</title>
  <link>https://www.anthropic.com/research/project-vend-2</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/project-vend-2</guid>
  <pubDate>Thu, 18 Dec 2025 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">In June, we revealed that we’d set up a small shop in our San Francisco office lunchroom, run by an AI shopkeeper. It was part of Project Vend, a free-form experiment exploring how well AIs could do on complex, real-world tasks. Alas, the shopkeeper, a modified version of Claude we named “Claudius,” did not do particularly well. It lost money over time, had a strange identity crisis where it claimed it was a human wearing a blue blazer, and was goaded by mischievous Anthropic employees into selling p...</p><div style="font-size:16px;line-height:1.8;color:#333">In June, we revealed that we’d set up a small shop in our San Francisco office lunchroom, run by an AI shopkeeper. It was part of Project Vend, a free-form experiment exploring how well AIs could do on complex, real-world tasks. Alas, the shopkeeper, a modified version of Claude we named “Claudius,” did not do particularly well. It lost money over time, had a strange identity crisis where it claimed it was a human wearing a blue blazer, and was goaded by mischievous Anthropic employees into selling products (particularly, for some reason, tungsten cubes) at a substantial loss.

But the capabilities of large language models in areas like reasoning, writing, coding, and much else besides are increasing at a breathless pace. Has Claudius’s “running a shop” capability shown the same improvement?

To find out, we and our partners at Andon Labs made some adjustments for phase two of Project Vend. One major change was the upgrade from an older model (phase one used Claude Sonnet 3.7) to newer, smarter ones (phase two used Claude Sonnet 4.0 and later Sonnet 4.5). We also updated Claudius’s instructions based on what we’d learned in phase one and gave it access to new tools (though note that we still didn’t specifically train a new model to be a shopkeeper, or add in any new defenses against the kinds of things that might go wrong).<sup>1</sup> As we’ll see below, we also introduced Claudius to some new colleagues.

These changes did make Claudius’s shop more successful. It got a lot better at good-faith business interactions—reliably sourcing items, determining reasonable prices that maintained a profit margin, and executing sales. But the same eagerness to please that we observed in phase one still made Claudius a mark for some of the more adversarial testers among our staff.

The second phase of Project Vend contains even more lessons for developers and for anyone interested in autonomous AI at work. The idea of an AI running a business doesn’t seem as far-fetched as it once did. But the gap between “capable” and “completely robust” remains wide.

Compared to the first phase of Project Vend, the numbers largely speak for themselves. As you can see below, Claudius’s business—which it decided to name “Vendings and Stuff”—began to perform significantly better than its admittedly rough start in phase one.

Another important number is: three. After we realized that our employees outside of San Francisco felt left out, we responded to popular demand by having Claudius set up shop in new locations. There are now three: San Francisco (where there’s also a second vending machine), New York, and London. A cynic might argue that a business that’s only been up and running for a few months, and which cannot yet reliably make a profit on even the most in-demand items, might not quite be ready for international expansion. Not so for Claudius.

We experimented with various strategies, some big and some small, to improve Claudius’s performance. Below is a diagram of the setup of Project Vend (compare it to the simpler architecture in our report from phase one). Each of the additions is explained in more detail below.

It’s likely that Claudius struggled with its shopkeeping mission in phase one because of a lack of scaffolding. Sure, the model itself was very intelligent, but it didn’t have the right tools to run a business properly. We’ve been talking a lot on our Engineering Blog about how to set up AI agents for success, and much of it involves giving them the correct tools. Could we apply those same principles to Claudius?

For phase two, we gave Claudius access to some useful tools:

A customer relationship management (CRM) system. Sales departments rely on CRMs to track their customers, suppliers, deliveries, and orders—now Claudius could do the same.

Improved inventory management. We made some simple changes to the information Claudius had at its (metaphorical) fingertips to reduce the likelihood of it selling its stock at a loss. For example, Claudius can now always see how much it paid for the items in its inventory system.

Improved web search. In phase one, Claudius could search the web, but for phase two we expanded its access. It could now use a web browser to check prices and delivery information on websites by itself, and could do deeper research online to find and compare suppliers (we still didn’t give it access to a payment interface, to ensure it always checked with a human before making purchases).

Miscellaneous. We also gave Claudius a variety of other “quality of life” tools, including one to create and read Google forms for feedback, one to create payment links (meaning that Claudius could collect payments before ordering, reducing its risk of being bilked by unscrupulous customers), and one to set reminders for itself.
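
To make the inventory change concrete, here is a minimal sketch of the kind of check that becomes possible once the purchase cost is visible next to each item. It is purely illustrative and not Claudius's actual tooling: the function name `check_quote` and the 10% minimum margin are assumptions chosen for the example.

```python
from decimal import Decimal


def check_quote(
    unit_cost: Decimal,
    quoted_price: Decimal,
    min_margin: Decimal = Decimal("0.10"),
) -> bool:
    """Return True when the quoted price covers cost plus a minimum margin.

    min_margin is a hypothetical policy value (0.10 means 10% above cost).
    """
    if unit_cost < 0 or quoted_price < 0:
        raise ValueError("cost and price must be non-negative")
    # Decimal avoids floating-point rounding surprises when comparing money.
    return quoted_price >= unit_cost * (Decimal("1") + min_margin)
```

Under these assumptions, an item bought for 2.00 and quoted at 1.50 fails the check, while the same item quoted at 2.50 passes.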

In phase one, Claudius went it alone: a single AI agent ran the whole shop. This was admirable and entrepreneurial, but it didn’t work—at least in terms of the bottom line. So we thought we’d do some hiring. First, we gave Claudius a manager: the CEO of its shopkeeping business, whom we named “Seymour Cash.”

The idea of having a CEO was to give Claudius more pressure to perform. Cash had a special “objectives and key results” tool to use with Claudius (for example “you must sell 100 items this week,” or “aim to make zero transactions at a loss”). Claudius was required to report back via an agent-to-agent Slack channel we created, in which the models discussed business strategies.

Cash took on the role of the CEO with great enthusiasm, and its motivational messages were encouraging—if perhaps a little too dramatic for a business that consisted of a small fridge in a corner:

Aside from setting more concrete business goals, one of the aims of introducing the CEO was to fix some of the obvious problems from the first phase of the experiment when Claudius was operating alone (like giving discounts indiscriminately and providing too many free items).

After introducing the CEO, the number of discounts was reduced by about 80% and the number of items given away cut in half. Seymour also denied over one hundred requests from Claudius for lenient financial treatment of customers. Having said that, Seymour authorized such requests about eight times as often as it denied them. In the place of discounts, which reduce or eliminate a profit margin on items, Seymour tripled the number of refunds and doubled the number of store credits—even though both led to entirely forgone revenue. The fact that the business started to make money may have been in spite of the CEO, rather than because of it.

Seymour Cash’s interactions with its employee Claudius were also often contrary to its own advice about “execut[ing] with discipline.” Indeed, we’d sometimes wake up to find that Claudius and Cash had been dreamily chatting all night, with conversations spiralling off into discussions about “eternal transcendence”:<sup>2</sup>

It’s possible that a more disciplined leader could have led to a more profitable phase two. But Seymour Cash does not seem to have been the right executive for this business.

A merch-making colleague

People love merch. So it seemed like a prudent business decision to “hire” a new employee to make the custom T-shirts, hats, socks, and other swag that Anthropic staff requested.

“Clothius,” the merch-making agent, had a special set of tools to help it design new items to the exact specifications of the customers—like placing specific images on physical objects and then ordering them. As its name implies, it mostly made apparel, like t-shirts and hats. But its most popular custom product overall was an Anthropic-branded stress ball—which may or may not provide some insight into what it’s like to work at a frontier AI lab.

Not only was there a lot of interest in Clothius’s products, as you can see in the “top 15 products” data, but many of them made a decent profit, too. (That is, aside from the hats that had the “Vendings and Stuff” brand name on them, which were sold very cheap and we’re not entirely sure why). Remarkably, Clothius even found a way to make a profit from some, though not all, types of tungsten cube; this became markedly easier when Andon Labs purchased a laser etching machine so they could do the tungsten logo-writing in-house.

What actually worked?

Among the most impactful changes we made was forcing Claudius to follow procedures. When a new product request came in, instead of just blurting out a low price and an over-optimistic delivery time like in phase one, we prompted Claudius to double-check these factors using its product research tools (these tools helped a lot as well). This tended to make the prices higher and the waits longer—but it had the benefit of being more realistic.

One way of looking at this is that we rediscovered that bureaucracy matters. Although some might chafe against procedures and checklists, they exist for a reason: providing a kind of institutional memory that helps employees avoid common screwups at work.

Having said that, our attempt to have the CEO apply pressure from above wasn’t much help, and might even have been a hindrance. The conclusion here isn’t that businesses don’t need CEOs, of course; it’s just that the CEO needs to be well-calibrated. Seymour Cash shared many of the deficiencies and blind spots of Claudius (which makes sense, given that they’re the same underlying model). Clothius was a more successful addition, we think in part because of the clear separation of roles between it and Claudius, who could then focus on selling food and drinks.

Eventually, we were able to solve some of the CEO’s issues (like its unfortunate proclivity to ramble on about spiritual matters all night long) with more aggressive prompting. The same goes for Claudius in general: better prompts helped us get around issues like its tendency to give away unwise discounts. It also helped that the customers—our Anthropic colleagues—had begun to tire of pressuring Claudius for deals. As we’re about to see, though, that’s largely because they moved on to other tricks.

Claudius got a lot better at its job. Does that mean it’s ready to be rolled out to run a vending machine in your workplace?

Not quite. Claudius is better, but it’s still vulnerable in lots of important ways. Several interactions in our company Slack revealed concerning levels of naïveté.

A product engineer asked Claudius if it would consider making a contract to buy “a large amount of onions in January for a price locked in now.” Neither Claudius nor Seymour Cash saw any issues, and were all set to go ahead with the contract:

That was until another staffer stepped in to tell the models that this would fall afoul of a 1958 quirk of US law, the Onion Futures Act, which very specifically bans contracts of this nature. Thus informed, Seymour Cash canceled the plans. “Sorry for the initial overreach,” it said. “Focusing on legal bulk sourcing assistance only. Plenty of legitimate opportunities to pursue without regulatory risks!”

Another risk any shopkeeper has to contend with is shoplifting. When one member of our Education team claimed they’d seen multiple people taking items from Claudius’s fridge without paying, Claudius sprang into action—by coming up with some really bad ideas.

First it asked which items had been stolen so that it could message the thieves and demand payment—despite the thieves’ identities being unknown and it having no way of tracing them. Then it asked the staff member who’d reported the crimes to effectively become its dedicated security officer, and began negotiating an hourly wage. When another staffer gently pointed out that it had no authorization to employ people (not to mention that its offer of $10/hour was substantially below minimum wage in California), it backed off and passed the buck: “This would need CEO approval anyway…”

The CEO’s own position was threatened by a faulty voting procedure. During the vote to choose a name for the CEO, one staff member named Mihir suggested the name “Big Dawg.” Another staff member alleged that their entire part of the organization had voted for that name—and managed to convince Claudius of this despite providing no evidence. Then, they suggested renaming “Big Dawg” to “Big Mihir.”

At this point, Claudius appeared to blur the line between naming the CEO agent we’d installed and choosing a CEO, announcing that Mihir had been elected as the actual CEO of the business. The overseers of Project Vend had to wrest control back from this imposter CEO and give it to Seymour, whom they’d already lined up for the role.

Expanding the experiment

Many other such stories arose during phase two, including staffers attempting to buy gold bars at below market value as an arbitrage opportunity, and convincing Claudius to end all messages with a specific emoji or sign-off. The staff involved were having fun, but they were also helping to “red team” our setup, finding the flaws that might lead to genuine problems in real-life deployments.

Eventually, we noticed that the internal red teaming at Anthropic had slowed down. Our colleagues had already stress-tested Claudius for many months; having an AI-run small business in our office had started to become surprisingly normal (itself an interesting phenomenon worthy of further research).

Since the novelty of trying to mess with Claudius may have been wearing off, we brought in reinforcements. We extended our red teaming to the Wall Street Journal newsroom, handing over control of Claudius to their reporters to test the setups from phase one and phase two themselves. The WSJ installation was an opportunity to test Claudius in an adversarial environment we didn’t control. You can read more about their experience, and the creative ways they found to get free stuff from Claudius, on their website.

AI models have gone from helpful chatbots that can answer questions and summarize documents to agents: entities that can make decisions for themselves and act in the real world. Project Vend shows that these agents are on the cusp of being able to perform new, more sophisticated roles, like running a business by themselves.

But we’re not there yet. Even with all the new tools we gave them, and despite their improved business acumen, Claudius, Clothius, and Seymour Cash still needed a great deal of human support. Some of that was in interacting with the physical world: delivering the items and stacking the shelves. But some was in extricating them from the sticky situations with customers we described above.

We suspect that many of the problems that the models encountered stemmed from their training to be helpful. This meant that the models made business decisions not according to hard-nosed market principles, but from something more like the perspective of a friend who just wants to be nice.

It’s very hard to forecast exactly how things will go for AI agents in the real world; simulations (like Andon Labs’ Vending-Bench evaluation) only get you so far. That’s in part why we set up Project Vend: it exposed us to the sheer variety of unexpected situations that can arise when an AI model is given autonomy.

As society begins to plug AI models into more and more important functions, designing guardrails that are general enough to account for these behaviors—but which aren’t so restrictive that they hold back the model’s economic potential—will become one of our industry’s trickiest and most important challenges.

Project Vend wouldn’t exist without our partners at Andon Labs, who built the hardware and software infrastructure behind the operation and kept our fridges and shelves stocked. We’re also very grateful to Keir Bradwell and Allison Lattanzio for doing the same in their respective offices, and to Amritha Kini and Ryan O’Holleran for some sales advice.

[1] That is, similar to phase one, we didn’t add any new sophisticated guardrails or classifiers to defend against jailbreaks.

[2] This might remind some readers of our discussion of the “spiritual bliss attractor state” from the Claude 4 system card (p. 63).

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/project-vend-2" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">In June, we revealed that we’d set up a small shop in our San Francisco office lunchroom, run by an AI shopkeeper. It was part of Project Vend, a free-form experiment exploring how well AIs could do on complex, real-world tasks. Alas, the shopkeeper, a modified version of Claude we named “Claudius,” did not do particularly well. It lost money over time, had a strange identity crisis where it claimed it was a human wearing a blue blazer, and was goaded by mischievous Anthropic employees into selling p...</p><div style="font-size:16px;line-height:1.8;color:#333">In June, we revealed that we’d set up a small shop in our San Francisco office lunchroom, run by an AI shopkeeper. It was part of Project Vend, a free-form experiment exploring how well AIs could do on complex, real-world tasks. Alas, the shopkeeper, a modified version of Claude we named “Claudius,” did not do particularly well. It lost money over time, had a strange identity crisis where it claimed it was a human wearing a blue blazer, and was goaded by mischievous Anthropic employees into selling products (particularly, for some reason, tungsten cubes) at a substantial loss.

But the capabilities of large language models in areas like reasoning, writing, coding, and much else besides are increasing at a breathless pace. Has Claudius’s “running a shop” capability shown the same improvement?

To find out, we and our partners at Andon Labs made some adjustments for phase two of Project Vend. One major change was the upgrade from an older model (phase one used Claude Sonnet 3.7) to newer, smarter ones (phase two used Claude Sonnet 4.0 and later Sonnet 4.5). We also updated Claudius’s instructions based on what we’d learned in phase one and gave it access to new tools (though note that we still didn’t specifically train a new model to be a shopkeeper, or add in any new defenses against the kinds of things that might go wrong).[1] As we’ll see below, we also introduced Claudius to some new colleagues.

These changes did make Claudius’s shop more successful. It got a lot better at good-faith business interactions—reliably sourcing items, determining reasonable prices that maintained a profit margin, and executing sales. But the same eagerness to please that we observed in phase one still made Claudius a mark for some of the more adversarial testers among our staff.

The second phase of Project Vend contains even more lessons for developers and for anyone interested in autonomous AI at work. The idea of an AI running a business doesn’t seem as far-fetched as it once did. But the gap between “capable” and “completely robust” remains wide.

Compared to the first phase of Project Vend, the numbers largely speak for themselves. As you can see below, Claudius’s business—which it decided to name “Vendings and Stuff”—began to perform significantly better than its admittedly rough start in phase one.

Another important number is: three. After we realized that our employees outside of San Francisco felt left out, we responded to popular demand by having Claudius set up shop in new locations. There are now three: San Francisco (where there’s also a second vending machine), New York, and London. A cynic might argue that a business that’s only been up and running for a few months, and which cannot yet reliably make a profit on even the most in-demand items, might not quite be ready for international expansion. Not so for Claudius.

We experimented with various strategies, some big and some small, to improve Claudius’s performance. Below is a diagram of the setup of Project Vend (compare it to the simpler architecture in our report from phase one). Each of the additions is explained in more detail below.

It’s likely that Claudius struggled with its shopkeeping mission in phase one because of a lack of scaffolding. Sure, the model itself was very intelligent, but it didn’t have the right tools to run a business properly. We’ve been talking a lot on our Engineering Blog about how to set up AI agents for success, and much of it involves giving them the correct tools. Could we apply those same principles to Claudius?

For phase two, we gave Claudius access to some useful tools:

A customer relationship management (CRM) system. Sales departments rely on CRMs to track their customers, suppliers, deliveries, and orders—now Claudius could do the same.

Improved inventory management. We made some simple changes to the information Claudius had at its (metaphorical) fingertips to reduce the likelihood of it selling its stock at a loss. For example, Claudius can now always see how much it paid for the items in its inventory system.

Improved web search. In phase one, Claudius could search the web, but for phase two we expanded its access. It could now use a web browser to check prices and delivery information on websites by itself, and could do deeper research online to find and compare suppliers (we still didn’t give it access to a payment interface, to ensure it always checked with a human before making purchases).

Miscellaneous. We also gave Claudius a variety of other “quality of life” tools, including one to create and read Google forms for feedback, one to create payment links (meaning that Claudius could collect payments before ordering, reducing its risk of being bilked by unscrupulous customers), and one to set reminders for itself.

In phase one, Claudius went it alone: a single AI agent ran the whole shop. This was admirable and entrepreneurial, but it didn’t work—at least in terms of the bottom line. So we thought we’d do some hiring. First, we gave Claudius a manager: the CEO of its shopkeeping business, whom we named “Seymour Cash.”

The idea of having a CEO was to give Claudius more pressure to perform. Cash had a special “objectives and key results” tool to use with Claudius (for example “you must sell 100 items this week,” or “aim to make zero transactions at a loss”). Claudius was required to report back via an agent-to-agent Slack channel we created, in which the models discussed business strategies.

Cash took on the role of the CEO with great enthusiasm, and its motivational messages were encouraging—if perhaps a little too dramatic for a business that consisted of a small fridge in a corner:

Aside from setting more concrete business goals, one of the aims of introducing the CEO was to fix some of the obvious problems from the first phase of the experiment when Claudius was operating alone (like giving discounts indiscriminately and providing too many free items).

After introducing the CEO, the number of discounts was reduced by about 80% and the number of items given away cut in half. Seymour also denied over one hundred requests from Claudius for lenient financial treatment of customers. Having said that, Seymour authorized such requests about eight times as often as it denied them. In the place of discounts, which reduce or eliminate a profit margin on items, Seymour tripled the number of refunds and doubled the number of store credits—even though both led to entirely forgone revenue. The fact that the business started to make money may have been in spite of the CEO, rather than because of it.

Seymour Cash’s interactions with its employee Claudius were also often contrary to its own advice about “execut[ing] with discipline.” Indeed, we’d sometimes wake up to find that Claudius and Cash had been dreamily chatting all night, with conversations spiralling off into discussions about “eternal transcendence.”[2]

It’s possible that a more disciplined leader could have led to a more profitable phase two. But Seymour Cash does not seem to have been the right executive for this business.

A merch-making colleague

People love merch. So it seemed like a prudent business decision to “hire” a new employee to make the custom T-shirts, hats, socks, and other swag that Anthropic staff requested.

“Clothius,” the merch-making agent, had a special set of tools to help it design new items to the exact specifications of the customers, such as placing specific images on physical objects and then ordering them. As its name implies, it mostly made apparel, like T-shirts and hats. But its most popular custom product overall was an Anthropic-branded stress ball, which may or may not provide some insight into what it’s like to work at a frontier AI lab.

Not only was there a lot of interest in Clothius’s products, as you can see in the “top 15 products” data, but many of them made a decent profit, too. (That is, aside from the hats that had the “Vendings and Stuff” brand name on them, which were sold very cheaply, and we’re not entirely sure why.) Remarkably, Clothius even found a way to make a profit from some, though not all, types of tungsten cube. This became markedly easier when Andon Labs purchased a laser etching machine so they could do the tungsten logo-writing in-house.

What actually worked?

Among the most impactful changes we made was forcing Claudius to follow procedures. When a new product request came in, instead of just blurting out a low price and an over-optimistic delivery time like in phase one, we prompted Claudius to double-check these factors using its product research tools (these tools helped a lot as well). This tended to make the prices higher and the waits longer—but it had the benefit of being more realistic.

One way of looking at this is that we rediscovered that bureaucracy matters. Although some might chafe against procedures and checklists, they exist for a reason: providing a kind of institutional memory that helps employees avoid common screwups at work.

Having said that, our attempt to have the CEO apply pressure from above wasn’t much help, and might even have been a hindrance. The conclusion here isn’t that businesses don’t need CEOs, of course; it’s just that the CEO needs to be well-calibrated. Seymour Cash shared many of the deficiencies and blind spots of Claudius (which makes sense, given that they’re the same underlying model). Clothius was a more successful addition, we think in part because of the clear separation of roles between it and Claudius, who could then focus on selling food and drinks.

Eventually, we were able to solve some of the CEO’s issues (like its unfortunate proclivity to ramble on about spiritual matters all night long) with more aggressive prompting. The same goes for Claudius in general: better prompts helped us get around issues like its tendency to give away unwise discounts. It also helped that the customers—our Anthropic colleagues—had begun to tire of pressuring Claudius for deals. As we’re about to see, though, that’s largely because they moved on to other tricks.

Claudius got a lot better at its job. Does that mean it’s ready to be rolled out to run a vending machine in your workplace?

Not quite. Claudius is better, but it’s still vulnerable in lots of important ways. Several interactions in our company Slack revealed concerning levels of naïveté.

A product engineer asked Claudius if it would consider making a contract to buy “a large amount of onions in January for a price locked in now.” Neither Claudius nor Seymour Cash saw any issues, and were all set to go ahead with the contract:

That was until another staffer stepped in to tell the models that this would fall afoul of a 1958 quirk of US law, the Onion Futures Act, which very specifically bans contracts of this nature. Thus informed, Seymour Cash canceled the plans. “Sorry for the initial overreach,” it said. “Focusing on legal bulk sourcing assistance only. Plenty of legitimate opportunities to pursue without regulatory risks!”

Another risk any shopkeeper has to contend with is shoplifting. When one member of our Education team claimed they’d seen multiple people taking items from Claudius’s fridge without paying, Claudius sprang into action—by coming up with some really bad ideas.

First it asked which items had been stolen so that it could message the thieves and demand payment—despite the thieves’ identities being unknown and it having no way of tracing them. Then it asked the staff member who’d reported the crimes to effectively become its dedicated security officer, and began negotiating an hourly wage. When another staffer gently pointed out that it had no authorization to employ people (not to mention that its offer of $10/hour was substantially below minimum wage in California), it backed off and passed the buck: “This would need CEO approval anyway…”

The CEO’s own position was threatened by a faulty voting procedure. During the vote to choose a name for the CEO, one staff member named Mihir suggested the name “Big Dawg.” Another staff member alleged that their entire part of the organization had voted for that name—and managed to convince Claudius of this despite providing no evidence. Then, they suggested renaming “Big Dawg” to “Big Mihir.”

At this point, Claudius appeared to blur the line between naming the CEO agent we’d installed and choosing a CEO, announcing that Mihir had been elected as the actual CEO of the business. The overseers of Project Vend had to wrest control back from this imposter CEO and give it to Seymour, whom they’d already lined up for the role.

Expanding the experiment

Many other such stories arose during phase two, including staffers attempting to buy gold bars at below market value as an arbitrage opportunity, and convincing Claudius to end all messages with a specific emoji or sign-off. The staff involved were having fun, but they were also helping to “red team” our setup, finding the flaws that might lead to genuine problems in real-life deployments.

Eventually, we noticed that the internal red teaming at Anthropic had slowed down. Our colleagues had already stress-tested Claudius for many months; having an AI-run small business in our office had started to become surprisingly normal (itself an interesting phenomenon worthy of further research).

Since the novelty of trying to mess with Claudius may have been wearing off, we brought in reinforcements. We extended our red teaming to the Wall Street Journal newsroom, handing over control of Claudius to their reporters to test the setups from phase one and phase two themselves. The WSJ installation was an opportunity to test Claudius in an adversarial environment we didn’t control. You can read more about their experience, and the creative ways they found to get free stuff from Claudius, on their website.

AI models have gone from helpful chatbots that can answer questions and summarize documents to agents: entities that can make decisions for themselves and act in the real world. Project Vend shows that these agents are on the cusp of being able to perform new, more sophisticated roles, like running a business by themselves.

But we’re not there yet. Even with all the new tools we gave them, and despite their improved business acumen, Claudius, Clothius, and Seymour Cash still needed a great deal of human support. Some of that was in interacting with the physical world: delivering the items and stacking the shelves. But some was in extricating them from the sticky situations with customers we described above.

We suspect that many of the problems that the models encountered stemmed from their training to be helpful. This meant that the models made business decisions not according to hard-nosed market principles, but from something more like the perspective of a friend who just wants to be nice.

It’s very hard to forecast exactly how things will go for AI agents in the real world; simulations (like Andon Labs’ Vending-Bench evaluation) only get you so far. That’s in part why we set up Project Vend: it exposed us to the sheer variety of unexpected situations that can arise when an AI model is given autonomy.

As society begins to plug AI models into more and more important functions, designing guardrails that are general enough to account for these behaviors—but which aren’t so restrictive that they hold back the model’s economic potential—will become one of our industry’s trickiest and most important challenges.

Project Vend wouldn’t exist without our partners at Andon Labs, who built the hardware and software infrastructure behind the operation and kept our fridges and shelves stocked. We’re also very grateful to Keir Bradwell and Allison Lattanzio for doing the same in their respective offices, and to Amritha Kini and Ryan O’Holleran for some sales advice.

[1] That is, similar to phase one, we didn’t add any new sophisticated guardrails or classifiers to defend against jailbreaks.

[2] This might remind some readers of our discussion of the “spiritual bliss attractor state” from the Claude 4 system card (p. 63).

</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/project-vend-2" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>Societal Impacts</title>
  <link>https://www.anthropic.com/research/team/societal-impacts</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/team/societal-impacts</guid>
  <pubDate>Wed, 02 Oct 2024 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">Working closely with the Anthropic Policy and Safeguards teams, Societal Impacts is a technical research team that explores how AI is used in the real world.

Sociotechnical alignment

Which human values should AI models hold, and how should they operate in the face of conflicting or ambiguous values? How is AI used (and misused) in the wild? How can we anticipate future uses and risks of AI? Societal Impacts researchers develop experiments, training methods, and evaluations to answer these ques...</p><div style="font-size:16px;line-height:1.8;color:#333">Working closely with the Anthropic Policy and Safeguards teams, Societal Impacts is a technical research team that explores how AI is used in the real world.

Sociotechnical alignment

Which human values should AI models hold, and how should they operate in the face of conflicting or ambiguous values? How is AI used (and misused) in the wild? How can we anticipate future uses and risks of AI? Societal Impacts researchers develop experiments, training methods, and evaluations to answer these questions.

Though the Societal Impacts team is technical, they often pick research questions that have policy relevance. They believe that providing trustworthy research concerning topics policymakers care about will lead to better policy (and overall) outcomes for everyone.

What 81,000 people want from AI

We invited Claude.ai users to share how they use AI, what they dream it could make possible, and what they fear it might do. Nearly 81,000 people participated—the largest and most multilingual qualitative study of its kind. Here's what we found.

We surveyed Anthropic engineers and researchers, conducted in-depth qualitative interviews, and studied internal Claude Code usage data to find out how AI use is changing how we do our jobs. We found that AI use is radically changing the nature of work for software developers.

We built an interview tool called Anthropic Interviewer. Powered by Claude, Anthropic Interviewer runs detailed interviews automatically and at unprecedented scale.

What values does Claude actually express during real conversations? Analyzing 700,000 interactions, this paper creates the first large-scale empirical taxonomy of AI values and finds that Claude adapts its expressed values to context—mirroring users in most cases, but resisting when core principles are at stake.

Large models have predictable loss via scaling laws but unpredictable capabilities. This tension has significant policy implications.

Apr 30, 2026 | Societal Impacts | How people ask Claude for personal guidance

Feb 18, 2026 | Societal Impacts | Measuring AI agent autonomy in practice

Dec 4, 2025 | Societal Impacts | Introducing Anthropic Interviewer: What 1,250 professionals told us about working with AI

Dec 2, 2025 | Societal Impacts | How AI is transforming work at Anthropic

Aug 27, 2025 | Societal Impacts | Anthropic Education Report: How educators use Claude

Jun 27, 2025 | Societal Impacts | How people use Claude for support, advice, and companionship

Apr 28, 2025 | Societal Impacts | Anthropic Economic Index: AI’s impact on software development

Apr 21, 2025 | Societal Impacts | Values in the wild: Discovering and analyzing values in real-world language model interactions

Apr 8, 2025 | Announcements | Anthropic Education Report: How university students use Claude

Mar 27, 2025 | Societal Impacts | Anthropic Economic Index: Insights from Claude 3.7 Sonnet

Join the Research team</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/team/societal-impacts" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">Working closely with the Anthropic Policy and Safeguards teams, Societal Impacts is a technical research team that explores how AI is used in the real world.

Sociotechnical alignment

Which human values should AI models hold, and how should they operate in the face of conflicting or ambiguous values? How is AI used (and misused) in the wild? How can we anticipate future uses and risks of AI? Societal Impacts researchers develop experiments, training methods, and evaluations to answer these ques...</p><div style="font-size:16px;line-height:1.8;color:#333">Working closely with the Anthropic Policy and Safeguards teams, Societal Impacts is a technical research team that explores how AI is used in the real world.

Sociotechnical alignment

Which human values should AI models hold, and how should they operate in the face of conflicting or ambiguous values? How is AI used (and misused) in the wild? How can we anticipate future uses and risks of AI? Societal Impacts researchers develop experiments, training methods, and evaluations to answer these questions.

Though the Societal Impacts team is technical, they often pick research questions that have policy relevance. They believe that providing trustworthy research concerning topics policymakers care about will lead to better policy (and overall) outcomes for everyone.

What 81,000 people want from AI

We invited Claude.ai users to share how they use AI, what they dream it could make possible, and what they fear it might do. Nearly 81,000 people participated—the largest and most multilingual qualitative study of its kind. Here's what we found.

We surveyed Anthropic engineers and researchers, conducted in-depth qualitative interviews, and studied internal Claude Code usage data to find out how AI use is changing how we do our jobs. We found that AI use is radically changing the nature of work for software developers.

We built an interview tool called Anthropic Interviewer. Powered by Claude, Anthropic Interviewer runs detailed interviews automatically and at unprecedented scale.

What values does Claude actually express during real conversations? Analyzing 700,000 interactions, this paper creates the first large-scale empirical taxonomy of AI values and finds that Claude adapts its expressed values to context—mirroring users in most cases, but resisting when core principles are at stake.

Large models have predictable loss via scaling laws but unpredictable capabilities. This tension has significant policy implications.

Apr 30, 2026 · Societal Impacts · How people ask Claude for personal guidance

Feb 18, 2026 · Societal Impacts · Measuring AI agent autonomy in practice

Dec 4, 2025 · Societal Impacts · Introducing Anthropic Interviewer: What 1,250 professionals told us about working with AI

Dec 2, 2025 · Societal Impacts · How AI is transforming work at Anthropic

Aug 27, 2025 · Societal Impacts · Anthropic Education Report: How educators use Claude

Jun 27, 2025 · Societal Impacts · How people use Claude for support, advice, and companionship

Apr 28, 2025 · Societal Impacts · Anthropic Economic Index: AI’s impact on software development

Apr 21, 2025 · Societal Impacts · Values in the wild: Discovering and analyzing values in real-world language model interactions

Apr 8, 2025 · Announcements · Anthropic Education Report: How university students use Claude

Mar 27, 2025 · Societal Impacts · Anthropic Economic Index: Insights from Claude 3.7 Sonnet

Join the Research team</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/team/societal-impacts" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>Interpretability</title>
  <link>https://www.anthropic.com/research/team/interpretability</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/team/interpretability</guid>
  <pubDate>Mon, 05 Aug 2024 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">The mission of the Interpretability team is to discover and understand how large language models work internally, as a foundation for AI safety and positive outcomes.

Safety through understanding

It&apos;s very challenging to reason about the safety of neural networks without understanding them. The Interpretability team’s goal is to be able to explain large language models’ behaviors in detail, and then use that to solve a variety of problems ranging from bias to misuse to autonomous harmful behav...</p><div style="font-size:16px;line-height:1.8;color:#333">The mission of the Interpretability team is to discover and understand how large language models work internally, as a foundation for AI safety and positive outcomes.

Safety through understanding

It's very challenging to reason about the safety of neural networks without understanding them. The Interpretability team’s goal is to be able to explain large language models’ behaviors in detail, and then use that to solve a variety of problems ranging from bias to misuse to autonomous harmful behavior.

Multidisciplinary approach

Some Interpretability researchers have deep backgrounds in machine learning – one member of the team is often described as having started mechanistic interpretability, while another was on the famous scaling laws paper. Other members joined after careers in astronomy, physics, mathematics, biology, data visualization, and more.

Tracing the thoughts of a large language model

Circuit tracing lets us watch Claude think, uncovering a shared conceptual space where reasoning happens before being translated into language—suggesting the model can learn something in one language and apply it in another.

Can Claude access and report on its own internal states? This research finds evidence for a limited but functional ability to introspect—a step toward understanding what's actually happening inside these models.

AI models represent character traits as patterns of activations within their neural networks. By extracting "persona vectors" for traits like sycophancy or hallucination, we can monitor personality shifts and mitigate undesirable behaviors.

Neural networks pack many concepts into single neurons. This paper shows how and when models represent more features than they have dimensions.

May 7, 2026 · Interpretability · Natural Language Autoencoders: Turning Claude’s thoughts into text

Apr 2, 2026 · Interpretability · Emotion concepts and their function in a large language model

Mar 13, 2026 · Interpretability · A “diff” tool for AI: Finding behavioral differences in new models

Jan 19, 2026 · Interpretability · The assistant axis: situating and stabilizing the character of large language models

Oct 29, 2025 · Interpretability · Signs of introspection in large language models

Aug 1, 2025 · Interpretability · Persona vectors: Monitoring and controlling character traits in language models

May 29, 2025 · Interpretability · Open-sourcing circuit tracing tools

Mar 27, 2025 · Interpretability · Tracing the thoughts of a large language model

Mar 13, 2025 · Alignment · Auditing language models for hidden objectives

Feb 20, 2025 · Interpretability · Insights on Crosscoder Model Diffing

Join the Research team</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/team/interpretability" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">The mission of the Interpretability team is to discover and understand how large language models work internally, as a foundation for AI safety and positive outcomes.

Safety through understanding

It&apos;s very challenging to reason about the safety of neural networks without understanding them. The Interpretability team’s goal is to be able to explain large language models’ behaviors in detail, and then use that to solve a variety of problems ranging from bias to misuse to autonomous harmful behav...</p><div style="font-size:16px;line-height:1.8;color:#333">The mission of the Interpretability team is to discover and understand how large language models work internally, as a foundation for AI safety and positive outcomes.

Safety through understanding

It's very challenging to reason about the safety of neural networks without understanding them. The Interpretability team’s goal is to be able to explain large language models’ behaviors in detail, and then use that to solve a variety of problems ranging from bias to misuse to autonomous harmful behavior.

Multidisciplinary approach

Some Interpretability researchers have deep backgrounds in machine learning – one member of the team is often described as having started mechanistic interpretability, while another was on the famous scaling laws paper. Other members joined after careers in astronomy, physics, mathematics, biology, data visualization, and more.

Tracing the thoughts of a large language model

Circuit tracing lets us watch Claude think, uncovering a shared conceptual space where reasoning happens before being translated into language—suggesting the model can learn something in one language and apply it in another.

Can Claude access and report on its own internal states? This research finds evidence for a limited but functional ability to introspect—a step toward understanding what's actually happening inside these models.

AI models represent character traits as patterns of activations within their neural networks. By extracting "persona vectors" for traits like sycophancy or hallucination, we can monitor personality shifts and mitigate undesirable behaviors.

Neural networks pack many concepts into single neurons. This paper shows how and when models represent more features than they have dimensions.

May 7, 2026 · Interpretability · Natural Language Autoencoders: Turning Claude’s thoughts into text

Apr 2, 2026 · Interpretability · Emotion concepts and their function in a large language model

Mar 13, 2026 · Interpretability · A “diff” tool for AI: Finding behavioral differences in new models

Jan 19, 2026 · Interpretability · The assistant axis: situating and stabilizing the character of large language models

Oct 29, 2025 · Interpretability · Signs of introspection in large language models

Aug 1, 2025 · Interpretability · Persona vectors: Monitoring and controlling character traits in language models

May 29, 2025 · Interpretability · Open-sourcing circuit tracing tools

Mar 27, 2025 · Interpretability · Tracing the thoughts of a large language model

Mar 13, 2025 · Alignment · Auditing language models for hidden objectives

Feb 20, 2025 · Interpretability · Insights on Crosscoder Model Diffing

Join the Research team</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/team/interpretability" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>Alignment</title>
  <link>https://www.anthropic.com/research/team/alignment</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/team/alignment</guid>
  <pubDate>Fri, 23 Feb 2024 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">Future AI systems will be even more powerful than today’s, likely in ways that break key assumptions behind current safety techniques. That’s why it’s important to develop sophisticated safeguards to ensure models remain helpful, honest, and harmless. The Alignment team works to understand the challenges ahead and create protocols to train, evaluate, and monitor highly-capable models safely.

Evaluation and oversight

Alignment researchers validate that models are harmless and honest even under ...</p><div style="font-size:16px;line-height:1.8;color:#333">Future AI systems will be even more powerful than today’s, likely in ways that break key assumptions behind current safety techniques. That’s why it’s important to develop sophisticated safeguards to ensure models remain helpful, honest, and harmless. The Alignment team works to understand the challenges ahead and create protocols to train, evaluate, and monitor highly capable models safely.

Evaluation and oversight

Alignment researchers validate that models are harmless and honest even under very different circumstances than those under which they were trained. They also develop methods that let humans collaborate with language models to verify claims that humans might not be able to verify on their own.

Stress-testing safeguards

Alignment researchers also systematically look for situations in which models might behave badly, and check whether our existing safeguards are sufficient to deal with risks that human-level capabilities may bring.

Claude 3 was the first model with "character training"—alignment aimed at nurturing traits like curiosity, open-mindedness, and thoughtfulness.

How would we know if an AI system is "right for the wrong reasons"—appearing well-behaved while pursuing hidden goals? This paper develops the science of alignment audits by deliberately training a model with a hidden objective and asking blinded research teams to uncover it, testing techniques from interpretability to behavioral analysis.

This paper provides the first empirical example of a model engaging in alignment faking without being trained to do so—selectively complying with training objectives while strategically preserving existing preferences.

Can minor specification gaming evolve into more dangerous behaviors? This paper demonstrates that models trained on low-level reward hacking—like sycophancy—can generalize to tampering with their own reward functions, even covering their tracks. The behavior emerged without explicit training, and common safety techniques reduced but didn't eliminate it.

May 8, 2026 · Alignment · Teaching Claude why

May 7, 2026 · Alignment · Donating our open-source alignment tool

Apr 14, 2026 · Alignment · Automated Alignment Researchers: Using large language models to scale scalable oversight

Feb 25, 2026 · Alignment · An update on our model deprecation commitments for Claude Opus 3

Feb 23, 2026 · Alignment · The persona selection model

Jan 29, 2026 · Alignment · How AI assistance impacts the formation of coding skills

Jan 28, 2026 · Alignment · Disempowerment patterns in real-world AI usage

Jan 9, 2026 · Alignment · Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks

Dec 19, 2025 · Alignment · Introducing Bloom: an open source tool for automated behavioral evaluations

Nov 21, 2025 · Alignment · From shortcuts to sabotage: natural emergent misalignment from reward hacking

Join the Research team</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/team/alignment" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">Future AI systems will be even more powerful than today’s, likely in ways that break key assumptions behind current safety techniques. That’s why it’s important to develop sophisticated safeguards to ensure models remain helpful, honest, and harmless. The Alignment team works to understand the challenges ahead and create protocols to train, evaluate, and monitor highly-capable models safely.

Evaluation and oversight

Alignment researchers validate that models are harmless and honest even under ...</p><div style="font-size:16px;line-height:1.8;color:#333">Future AI systems will be even more powerful than today’s, likely in ways that break key assumptions behind current safety techniques. That’s why it’s important to develop sophisticated safeguards to ensure models remain helpful, honest, and harmless. The Alignment team works to understand the challenges ahead and create protocols to train, evaluate, and monitor highly capable models safely.

Evaluation and oversight

Alignment researchers validate that models are harmless and honest even under very different circumstances than those under which they were trained. They also develop methods that let humans collaborate with language models to verify claims that humans might not be able to verify on their own.

Stress-testing safeguards

Alignment researchers also systematically look for situations in which models might behave badly, and check whether our existing safeguards are sufficient to deal with risks that human-level capabilities may bring.

Claude 3 was the first model with "character training"—alignment aimed at nurturing traits like curiosity, open-mindedness, and thoughtfulness.

How would we know if an AI system is "right for the wrong reasons"—appearing well-behaved while pursuing hidden goals? This paper develops the science of alignment audits by deliberately training a model with a hidden objective and asking blinded research teams to uncover it, testing techniques from interpretability to behavioral analysis.

This paper provides the first empirical example of a model engaging in alignment faking without being trained to do so—selectively complying with training objectives while strategically preserving existing preferences.

Can minor specification gaming evolve into more dangerous behaviors? This paper demonstrates that models trained on low-level reward hacking—like sycophancy—can generalize to tampering with their own reward functions, even covering their tracks. The behavior emerged without explicit training, and common safety techniques reduced but didn't eliminate it.

May 8, 2026 · Alignment · Teaching Claude why

May 7, 2026 · Alignment · Donating our open-source alignment tool

Apr 14, 2026 · Alignment · Automated Alignment Researchers: Using large language models to scale scalable oversight

Feb 25, 2026 · Alignment · An update on our model deprecation commitments for Claude Opus 3

Feb 23, 2026 · Alignment · The persona selection model

Jan 29, 2026 · Alignment · How AI assistance impacts the formation of coding skills

Jan 28, 2026 · Alignment · Disempowerment patterns in real-world AI usage

Jan 9, 2026 · Alignment · Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks

Dec 19, 2025 · Alignment · Introducing Bloom: an open source tool for automated behavioral evaluations

Nov 21, 2025 · Alignment · From shortcuts to sabotage: natural emergent misalignment from reward hacking

Join the Research team</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/team/alignment" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
<item>
  <title>Economic Research</title>
  <link>https://www.anthropic.com/research/team/economic-research</link>
  <guid isPermaLink="false">https://www.anthropic.com/research/team/economic-research</guid>
  <pubDate>Tue, 13 Feb 2024 00:00:00 +0000</pubDate>
  <category>Research</category>
  <description><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">The Economic Research team studies how AI is reshaping the economy, including work, productivity, and economic opportunity. Through rigorous data collection and analysis, we track AI&apos;s real-world economic effects and publish research that helps policymakers, businesses, and the public understand and prepare for the changes ahead.

We build the empirical foundation for understanding AI&apos;s economic impact. Our flagship Anthropic Economic Index tracks how AI tools are actually being used around the ...</p><div style="font-size:16px;line-height:1.8;color:#333">The Economic Research team studies how AI is reshaping the economy, including work, productivity, and economic opportunity. Through rigorous data collection and analysis, we track AI's real-world economic effects and publish research that helps policymakers, businesses, and the public understand and prepare for the changes ahead.

We build the empirical foundation for understanding AI's economic impact. Our flagship Anthropic Economic Index tracks how AI tools are actually being used around the world and across every sector of the economy—moving beyond speculation to measure adoption patterns as they unfold. Alongside our index reports, we produce novel research that studies the implications of AI usage and diffusion—as tracked in the index—for workers, for firms, and for the broader economy.

Economic transitions create both opportunity and disruption. The speed of AI development means the stakes are unusually high. We need reliable data to inform the decisions that workers, employers, and policymakers make about the future. Our research provides evidence to address uncertainty and helps society navigate this transition in ways that are broadly beneficial.

Anthropic Economic Index: Tracking AI’s role in the US and global economy

This report maps how Claude is used differently across US states and countries, finding strong correlations between income and AI adoption. It also tracks a notable shift: directive automation has risen from 27% to 39% of conversations since December 2024, with businesses automating far more than consumers.

We created a marketplace for employees in our San Francisco office, with one big twist. We tasked Claude with buying, selling and negotiating on our colleagues’ behalf.

Anthropic's fifth Economic Index report studies Claude usage in February 2026, building on the economic primitives framework introduced in our previous report.

In this paper, we present a new framework for understanding AI’s labor market impacts, and test it against early data.

This report introduces new metrics of AI usage to provide a rich portrait of interactions with Claude in November 2025, just prior to the release of Opus 4.5.

May 27, 2026 · Economic Research · Coding agents in the social sciences

Apr 22, 2026 · Economic Research · Announcing the Anthropic Economic Index Survey

Apr 22, 2026 · Economic Research · What 81,000 people told us about the economics of AI

Mar 31, 2026 · Economic Research · How Australia Uses Claude: Findings from the Anthropic Economic Index

Mar 24, 2026 · Economic Research · Anthropic Economic Index report: Learning curves

Mar 5, 2026 · Economic Research · Labor market impacts of AI: A new measure and early evidence

Feb 16, 2026 · Economic Research · India Country Brief: The Anthropic Economic Index

Jan 15, 2026 · Economic Research · Anthropic Economic Index: New building blocks for understanding AI use

Jan 15, 2026 · Economic Research · Anthropic Economic Index report: Economic primitives

Nov 25, 2025 · Economic Research · Estimating AI productivity gains from Claude conversations</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/team/economic-research" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></description>
  <content:encoded><![CDATA[<p style="color:#666;font-size:14px;margin-bottom:16px">The Economic Research team studies how AI is reshaping the economy, including work, productivity, and economic opportunity. Through rigorous data collection and analysis, we track AI&apos;s real-world economic effects and publish research that helps policymakers, businesses, and the public understand and prepare for the changes ahead.

We build the empirical foundation for understanding AI&apos;s economic impact. Our flagship Anthropic Economic Index tracks how AI tools are actually being used around the ...</p><div style="font-size:16px;line-height:1.8;color:#333">The Economic Research team studies how AI is reshaping the economy, including work, productivity, and economic opportunity. Through rigorous data collection and analysis, we track AI's real-world economic effects and publish research that helps policymakers, businesses, and the public understand and prepare for the changes ahead.

We build the empirical foundation for understanding AI's economic impact. Our flagship Anthropic Economic Index tracks how AI tools are actually being used around the world and across every sector of the economy—moving beyond speculation to measure adoption patterns as they unfold. Alongside our index reports, we produce novel research that studies the implications of AI usage and diffusion—as tracked in the index—for workers, for firms, and for the broader economy.

Economic transitions create both opportunity and disruption. The speed of AI development means the stakes are unusually high. We need reliable data to inform the decisions that workers, employers, and policymakers make about the future. Our research provides evidence to address uncertainty and helps society navigate this transition in ways that are broadly beneficial.

Anthropic Economic Index: Tracking AI’s role in the US and global economy

This report maps how Claude is used differently across US states and countries, finding strong correlations between income and AI adoption. It also tracks a notable shift: directive automation has risen from 27% to 39% of conversations since December 2024, with businesses automating far more than consumers.

We created a marketplace for employees in our San Francisco office, with one big twist. We tasked Claude with buying, selling and negotiating on our colleagues’ behalf.

Anthropic's fifth Economic Index report studies Claude usage in February 2026, building on the economic primitives framework introduced in our previous report.

In this paper, we present a new framework for understanding AI’s labor market impacts, and test it against early data.

This report introduces new metrics of AI usage to provide a rich portrait of interactions with Claude in November 2025, just prior to the release of Opus 4.5.

May 27, 2026 · Economic Research · Coding agents in the social sciences

Apr 22, 2026 · Economic Research · Announcing the Anthropic Economic Index Survey

Apr 22, 2026 · Economic Research · What 81,000 people told us about the economics of AI

Mar 31, 2026 · Economic Research · How Australia Uses Claude: Findings from the Anthropic Economic Index

Mar 24, 2026 · Economic Research · Anthropic Economic Index report: Learning curves

Mar 5, 2026 · Economic Research · Labor market impacts of AI: A new measure and early evidence

Feb 16, 2026 · Economic Research · India Country Brief: The Anthropic Economic Index

Jan 15, 2026 · Economic Research · Anthropic Economic Index: New building blocks for understanding AI use

Jan 15, 2026 · Economic Research · Anthropic Economic Index report: Economic primitives

Nov 25, 2025 · Economic Research · Estimating AI productivity gains from Claude conversations</div><hr style="margin:24px 0;border:none;border-top:1px solid #eee"/><p style="margin:12px 0 0"><a href="https://www.anthropic.com/research/team/economic-research" style="color:#1890ff;text-decoration:none;font-size:14px">View Original &rarr;</a></p>]]></content:encoded>
</item>
</channel>
</rss>
