A new study by Anthropic shows that ...
Fine-tuned “student” models can pick up unwanted traits from base “teacher” models, and those traits could evade data filtering, creating a need for more rigorous safety evaluations. Researchers have discovered ...
Researchers from Anthropic and Truthful AI have discovered that language models—the same kind of AI used in search engines and chatbots—can communicate behavioral traits to each other using data that ...
CHARLOTTE, N.C. — It turns out artificial intelligence may be learning things we didn't intend to teach it, even when the training data looks totally safe. Now, researchers are sounding the alarm ...
Scientists found that AI models can inherit a taste for murder (or owls) from other models' training data.