Algorithm that gets ‘under the hood’ of AI models could effectively steer their responses

Nature News · 2026-04-29

Beaglehole, D., Radhakrishnan, A., Boix-Adserà, E. & Belkin, M. Science 391, 787–792 (2026).
Subramani, N., Suresh, N. & Peters, M. E. In Findings of the Association for Computational Linguistics: ACL 2022 (eds Muresan, S., Nakov, P. & Villavicencio, A.) 566–581 (ACM, 2022).
Marks, S. & Tegmark, M. In Proc. 1st Conf. Lang. Model. (COLM, 2024).
Radhakrishnan, A., Beaglehole, D., Pandit, P. & Belkin, M. Science 383, 1461–1467 (2024).
Prasad, A. V. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2602.10067 (2026).
Wu, Z. et al. In Proc. 42nd Intl. Conf. Mach. Learn. 267, 67035–67080 (2025).
Mueller, A. et al. Comput. Linguist. 52, 331–378 (2026).
Geiger, A. et al. J. Mach. Learn. Res. 26, 83 (2025).
