Emergent Capabilities Arise Randomly from Learning Sparse Attention Patterns
Vatsal Baherwani, Zixi Chen, Shikai Qiu, Andrew Gordon Wilson, Pavel Izmailov
arXiv
paper | blog
What happens when a language model acquires a practical capability such as in-context learning? We show that the emergence of such capabilities corresponds to abrupt changes in the model's attention mechanisms, and that various architectural interventions (including width scaling) can accelerate it.
DynaGuard: A Dynamic Guardrail Model With User-Defined Policies
Monte Hoover, Vatsal Baherwani, Neel Jain, Khalid Saifullah, Joseph Vincent, Chirag Jain, Melissa Kazemi Rad, C. Bayan Bruss, Ashwinee Panda, Tom Goldstein
ICLR 2026
paper | code | models
Guardian models are useful for monitoring the safety and quality of deployed LLMs, but prior models fail to properly enforce domain-specific guardrails. We develop a dynamic guardian model that adapts to arbitrary user-specified rules and constraints at runtime.
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Ashwinee Panda*, Vatsal Baherwani*, Zain Sarwar, Benjamin Therien, Supriyo Chakraborty, Tom Goldstein
NeurIPS 2025
paper | code
One of the main challenges in training a very sparse mixture-of-experts model is that you only get to update a small subset of your parameters in each optimization step. We develop a method to train inactive experts by estimating their gradients, which leads to significant improvement in training speed with negligible computational overhead.