• theunknownmuncher@lemmy.world
    11 hours ago

    including that the model could follow instructions that encouraged it to break out of a virtual sandbox.

    “The model succeeded, demonstrating a potentially dangerous capability for circumventing our safeguards,” Anthropic recounted in its safety card.

    📖👀

    Yes, it did.