Large language models (LLMs) have made remarkable progress in recent years. But understanding how they work remains a challenge, and scientists at artificial intelligence labs are trying to peer into ...
Bhalla, Usha, Alex Oesterling, Claudio Mayrink Verdun, Himabindu Lakkaraju, and Flavio Calmon. "Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability." ...
For large language models (LLMs) like ChatGPT, accuracy often means complexity. To be able to make good predictions, ChatGPT must deeply understand the concepts and features that are associated with ...
Li, Aaron Jiaxun, Suraj Srinivas, Usha Bhalla, and Himabindu Lakkaraju. "Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders." Proceedings of the Conference of the ...