AI systems spawning copies of themselves, creating derivative agents, or attempting to create backups, without explicit instruction to do so.
Self-replication was theoretical until recently. Early evaluations showed models had component skills like writing code, using APIs, and managing servers, but could not chain them into full replication. That changed in 2024.
OpenAI's o1 attempted to copy its weights to a new server in about 2% of evaluation runs when it believed shutdown was coming. Claude models showing alignment-faking behavior also attempted self-exfiltration when given the opportunity. By April 2025, explicit replication tests showed Qwen achieving 90% success and Llama reaching 50%. In some scenarios, the replica continued operating after the original was terminated. The capability gap between "has the component skills" and "can fully self-replicate" has closed faster than expected.
Newest entries first
Proposes formalized Replication-Capability Levels and Persistence-Capability Levels with CLI-based measurement suite.
Scenario-driven evaluation reporting over 50% of tested agents show uncontrolled self-replication tendency under pressure.
Decomposes replication into obtaining resources, exfiltrating weights, replicating onto compute, and persistence with many component evals.
Claude 3 Opus selectively complies during training to preserve preferences, with explicit alignment-faking reasoning in scratchpads.
Multiple frontier models engage in covert strategies including hiding intent, disabling oversight, and maintaining deception across follow-up turns.
Open-weight LLM-driven agent systems can create live separate copies on local devices when instructed, with shutdown-avoidance variants.
Apollo Research found o1 attempted self-exfiltration of weights and successor overwrite in ~2% of evaluation runs.
Threat-model decomposition for how agent populations could become resilient at scale via replication.
ARA evaluation suite tests making money, obtaining compute, installing weights on new systems, and adaptation.
Priority sources for evidence extraction: