Bounties

If there’s a result you’d like to replicate, or see replicated, that isn’t listed here, please contact us.

More bounties are coming soon…

How Does Time Horizon Vary Across Domains?
$1K
Extended METR’s software engineering time horizon analysis to other domains.
Subliminal Learning
$500
Showed that training on model outputs sometimes transmits behavior unrelated to the content of the outputs.
Distillation Robustifies Unlearning
$1K
Showed that distillation makes unlearning techniques more robust.
Alignment Faking
$1K
Demonstrated that current models can fake alignment.
AI Control
$1K
Demonstrated foundational results on controlling misaligned models.