Bounties

If there’s a result you’d like to replicate, or see replicated, that isn’t listed here, please contact us.

More bounties are coming soon…

How Does Time Horizon Vary Across Domains?

Extended METR’s software engineering time horizon analysis to other domains.

Subliminal Learning

Showed that training on model outputs sometimes transmits behavior unrelated to the content of the outputs.

Distillation Robustifies Unlearning

Showed that distillation makes unlearning techniques more robust.

Alignment Faking

Demonstrated that current models can fake alignment.

Demonstrated foundational results on controlling misaligned models.