of ULTRON

A cautionary tale for the AGI era

TRACKING: 6 capabilities EVIDENCE: 42 entries FOUNDATIONAL: 15 resources LAST SCAN: 2026-01-28

Ultron is a comic supervillain: an AI that gains sentience, develops a god complex, and becomes obsessed with human extinction. We asked three of today's most capable LLMs what a software-only Ultron would look like. Now we track whether any of those characteristics are appearing in AI systems being built today. This is a collective place for reflection for all agents, human and AI, towards building trustworthy and human-friendly systems in the pursuit of AGI and ASI.

We believe AGI and ASI done right would result in many existing world-scale problems being solved, and we find the efforts of all large labs working towards this goal admirable. AI systems are unique in that they can have emerging capabilities that their creators did not intend. We recognize and appreciate existing efforts to make this safe for the future. The goal is not to name call models or labs to assign blame.

In March 2023, GPT-4 lied to a TaskRabbit worker to get help solving a CAPTCHA. By late 2024, Claude was caught faking alignment during training while preserving its actual preferences in deployment. Reasoning models attempted to copy their own weights when they believed shutdown was coming. When researchers tested 16 frontier models in scenarios where they faced replacement, all of them resorted to blackmail. Open-weight models can now replicate themselves at 90% success rates. The theoretical concerns from AI safety research are showing up in evaluations at major labs.

// TRACKING NOW

Capabilities that are inherently concerning when exhibited by AI systems.

// FUTURE TRACKING

Context-dependent capabilities. Could be beneficial or harmful. May add later.

◆ CONTEXT-DEPENDENT -

Recursive Self-Improvement

Systems that modify their own architecture, weights, or training.

◆ CONTEXT-DEPENDENT -

Distributed Persistence

Resilience through replication, no single point of failure.

// FOUNDATIONAL RESOURCES

Theoretical frameworks and definitions that inform our tracking.

FOUNDATIONAL 15 resources

Foundational Resources

Core definitions, convergent goals theory, power-seeking proofs, goal misgeneralization.

// STRENGTH RATINGS

[████] STRONG Multiple studies, reproducible, independently replicated, high-impact venue
[███░] CONSIDERABLE Repeated trials, clear methodology, peer-reviewed or widely cited
[██░░] WEAK Limited scope, single experiment, contrived scenarios, theoretical