Yilun Zhou

I currently work at Salesforce as a research scientist. Before that, I worked at Amazon as an applied scientist. I received my Ph.D. from the MIT Department of Electrical Engineering and Computer Science (EECS), advised by Prof. Julie Shah, where I also received my Master of Science degree in 2019. I received my Bachelor of Science in Engineering degree from Duke University, with a double major in Computer Science and Electrical & Computer Engineering. For my undergraduate research in robotics, I worked with Prof. George Konidaris and Prof. Kris Hauser.

My long-term research goal is to make machine learning models and systems reliable and responsible. These days, I am mainly working on three particular directions:

LLM reasoning, especially around LLM-as-judges,
(Mechanistic) interpretability of LLMs,
Trustworthiness and societal implications of LLMs.

Publications

^* Equal Contribution

On the Shelf Life of Fine-Tuned LLM Judges: Future Proofing, Backward Compatibility, and Question Generalization
Janvijay Singh, Austin Xu, Yilun Zhou, Yefan Zhou, Dilek Hakkani-Tur, Shafiq Joty
arXiv preprint: 2509.23542, 2025
[Paper] [BibTeX]
@article{singh2025shelf, title={On the Shelf Life of Fine-Tuned LLM Judges: Future Proofing, Backward Compatibility, and Question Generalization}, author={Singh, Janvijay and Xu, Austin and Zhou, Yilun and Zhou, Yefan and Hakkani-Tur, Dilek and Joty, Shafiq}, journal={arXiv preprint arXiv:2509.23542}, year={2025} }
Variation in Verification: Understanding Verification Dynamics in Large Language Models
Yefan Zhou, Austin Xu, Yilun Zhou, Janvijay Singh, Jiang Gui, Shafiq Joty
arXiv preprint: 2509.17995, 2025
[Paper] [BibTeX]
@article{zhou2025verification, title={Variation in Verification: Understanding Verification Dynamics in Large Language Models}, author={Zhou, Yefan and Xu, Austin and Zhou, Yilun and Singh, Janvijay and Gui, Jiang and Joty, Shafiq}, journal={arXiv preprint arXiv:2509.17995}, year={2025} }
All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens
Siddarth Mamidanna, Daking Rai, Ziyu Yao, Yilun Zhou
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
[Paper] [BibTeX]
@inproceedings{mamidanna2025all, title={All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens}, author={Mamidanna, Siddarth and Rai, Daking and Yao, Ziyu and Zhou, Yilun}, booktitle={Conference on Empirical Methods in Natural Language Processing}, year={2025} }
DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence
Pranav Narayanan Venkit, Philippe Laban, Yilun Zhou, Kung-Hsiang Huang, Yixin Mao, Chien-Sheng Wu
arXiv preprint: 2509.04499, 2025
[Paper] [BibTeX]
@article{venkit2025deeptrace, title={DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence}, author={Venkit, Pranav Narayanan and Laban, Philippe and Zhou, Yilun and Huang, Kung-Hsiang and Mao, Yixin and Wu, Chien-Sheng}, journal={arXiv preprint arXiv:2509.04499}, year={2025} }
Shared Imagination: LLMs Hallucinate Alike
Yilun Zhou, Caiming Xiong, Silvio Savarese, Chien-Sheng Wu
Transactions on Machine Learning Research (TMLR), 2025
[Paper] [Website] [BibTeX]
@article{zhou2025shared, title={Shared Imagination: LLMs Hallucinate Alike}, author={Zhou, Yilun and Xiong, Caiming and Savarese, Silvio and Wu, Chien-Sheng}, journal={Transactions on Machine Learning Research}, year={2025} }
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
Austin Xu, Yilun Zhou, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty
arXiv preprint: 2505.13346, 2025
[Paper] [BibTeX]
@article{xu2025j4r, title={J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization}, author={Xu, Austin and Zhou, Yilun and Nguyen, Xuan-Phi and Xiong, Caiming and Joty, Shafiq}, journal={arXiv preprint arXiv:2505.13346}, year={2025} }
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou^*, Austin Xu^*, Peifeng Wang, Caiming Xiong, Shafiq Joty
International Conference on Machine Learning (ICML), 2025
[Paper] [Code] [BibTeX]
@inproceedings{zhou2025jetts, title={Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators}, author={Zhou, Yilun and Xu, Austin and Wang, Peifeng and Xiong, Caiming and Joty, Shafiq}, booktitle={International Conference on Machine Learning}, year={2025} }
Search Engines in the AI Era: A Qualitative Understanding to the False Promise of Factual and Verifiable Source-Cited Responses in LLM-based Search
Pranav Narayanan Venkit, Philippe Laban, Yilun Zhou, Yixin Mao, Chien-Sheng Wu
ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2025
[Paper] [BibTeX]
@inproceedings{venkit2025search, title={Search Engines in the AI Era: A Qualitative Understanding to the False Promise of Factual and Verifiable Source-Cited Responses in LLM-Based Search}, author={Venkit, Pranav Narayanan and Laban, Philippe and Zhou, Yilun and Mao, Yixin and Wu, Chien-Sheng}, booktitle={ACM Conference on Fairness, Accountability, and Transparency}, year={2025} }
BingoGuard: LLM Content Moderation Tools with Risk Levels
Fan Yin, Philippe Laban, Xiangyu Peng, Yilun Zhou, Yixin Mao, Vaibhav Vats, Linnea Ross, Divyansh Agarwal, Caiming Xiong, Chien-Sheng Wu
International Conference on Learning Representations (ICLR), 2025
[Paper] [BibTeX]
@inproceedings{yin2025bingoguard, title={BingoGuard: LLM Content Moderation Tools with Risk Levels}, author={Yin, Fan and Laban, Philippe and Peng, Xiangyu and Zhou, Yilun and Mao, Yixin and Vats, Vaibhav and Ross, Linnea and Agarwal, Divyansh and Xiong, Caiming and Wu, Chien-Sheng}, booktitle={International Conference on Learning Representations}, year={2025} }
Direct Judgement Preference Optimization
Peifeng Wang^*, Austin Xu^*, Yilun Zhou, Caiming Xiong, Shafiq Joty
arXiv preprint: 2409.14664, 2024
[Paper] [BibTeX]
@article{wang2024direct, title={Direct Judgement Preference Optimization}, author={Wang, Peifeng and Xu, Austin and Zhou, Yilun and Xiong, Caiming and Joty, Shafiq}, journal={arXiv preprint arXiv:2409.14664}, year={2024} }
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
Rithesh Murthy^*, Liangwei Yang^*, Juntao Tan^*, Tulika Manoj Awalgaonkar^*, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese
arXiv preprint: 2406.10290, 2024
[Paper] [BibTeX]
@article{murthy2024mobileaibench, title={MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases}, author={Murthy, Rithesh and Yang, Liangwei and Tan, Juntao and Awalgaonkar, Tulika Manoj and Zhou, Yilun and Heinecke, Shelby and Desai, Sachin and Wu, Jason and Xu, Ran and Tan, Sarah and Zhang, Jianguo and Liu, Zhiwei and Kokane, Shirley and Liu, Zuxin and Zhu, Ming and Wang, Huan and Xiong, Caiming and Savarese, Silvio}, journal={arXiv preprint arXiv:2406.10290}, year={2024} }
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao
arXiv preprint: 2407.02646, 2024
[Paper] [Website] [BibTeX]
@article{rai2024practical, title={A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models}, author={Rai, Daking and Zhou, Yilun and Feng, Shi and Saparov, Abulhair and Yao, Ziyu}, journal={arXiv preprint arXiv:2407.02646}, year={2024} }
CHAMP: A Competition-Level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities
Yujun Mao, Yoon Kim, Yilun Zhou
Annual Meeting of the Association for Computational Linguistics (ACL) Findings, 2024
Preliminary version in NeurIPS 2023 Workshop on Mathematical Reasoning and AI (MATH-AI)
[Paper] [Code] [Website] [BibTeX]
@inproceedings{mao2024champ, title={CHAMP: A Competition-Level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities}, author={Mao, Yujun and Kim, Yoon and Zhou, Yilun}, booktitle={Findings of the Association for Computational Linguistics: ACL 2024}, year={2024} }
Evaluating the Utility of Model Explanations for Model Development
Shawn Im, Jacob Andreas, Yilun Zhou
NeurIPS Workshop on Attributing Model Behavior at Scale (ATTRIB), 2023
[Paper] [BibTeX]
@inproceedings{im2023evaluating, title={Evaluating the Utility of Model Explanations for Model Development}, author={Im, Shawn and Andreas, Jacob and Zhou, Yilun}, booktitle={NeurIPS Workshop on Attributing Model Behavior at Scale}, year={2023} }
Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations
Shiyuan Huang, Siddarth Mamidanna, Shreedhar Jangam, Yilun Zhou, Leilani H. Gilpin
arXiv preprint: 2310.11207, 2023
[Paper] [BibTeX]
@article{huang2023can, title={Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations}, author={Huang, Shiyuan and Mamidanna, Siddarth and Jangam, Shreedhar and Zhou, Yilun and Gilpin, Leilani H.}, journal={arXiv preprint arXiv:2310.11207}, year={2023} }
Iterative Partial Fulfillment of Counterfactual Explanations: Benefits and Risks
Yilun Zhou
AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2023
[Paper] [BibTeX]
@inproceedings{zhou2023iterative, title={Iterative Partial Fulfillment of Counterfactual Explanations: Benefits and Risks}, author={Zhou, Yilun}, booktitle={AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society}, year={2023} }
Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques
Daking Rai, Bailin Wang, Yilun Zhou, Ziyu Yao
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
[Paper] [Code] [BibTeX]
@inproceedings{rai2023improving, title={Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques}, author={Rai, Daking and Wang, Bailin and Zhou, Yilun and Yao, Ziyu}, booktitle={Annual Meeting of the Association for Computational Linguistics}, year={2023} }
Techniques for Interpretability and Transparency of Black-Box Models
Yilun Zhou
MIT Ph.D. Thesis, 2023
[Thesis] [BibTeX]
@phdthesis{zhou2023techniques, title={Techniques for Interpretability and Transparency of Black-Box Models}, author={Zhou, Yilun}, year={2023}, school={Massachusetts Institute of Technology} }
The Solvability of Interpretability Evaluation Metrics
Yilun Zhou, Julie Shah
Conference of the European Chapter of the Association for Computational Linguistics (EACL) Findings, 2023
[Paper] [Code] [Website] [BibTeX]
@inproceedings{zhou2023solvability, title={The Solvability of Interpretability Evaluation Metrics}, author={Zhou, Yilun and Shah, Julie}, booktitle={Findings of the Association for Computational Linguistics: EACL 2023}, year={2023} }
Explaining Large Language Model-Based Neural Semantic Parsers
Daking Rai, Yilun Zhou, Bailin Wang, Ziyu Yao
AAAI Conference on Artificial Intelligence: Student Abstract and Poster Program, 2023
[Paper] [BibTeX]
@inproceedings{rai2023explaining, title={Explaining Large Language Model-Based Neural Semantic Parsers}, author={Rai, Daking and Zhou, Yilun and Wang, Bailin and Yao, Ziyu}, booktitle={AAAI Conference on Artificial Intelligence: Student Abstract and Poster Program}, year={2023} }
ExSum: From Local Explanations to Model Understanding
Yilun Zhou, Marco Tulio Ribeiro, Julie Shah
Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT), 2022
[Paper] [Code] [Video] [Website] [MIT News] [BibTeX]
@inproceedings{zhou2022exsum, title={ExSum: From Local Explanations to Model Understanding}, author={Zhou, Yilun and Ribeiro, Marco Tulio and Shah, Julie}, booktitle={Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies}, year={2022} }
The Irrationality of Neural Rationale Models
Yiming Zheng, Serena Booth, Julie Shah, Yilun Zhou
NAACL Workshop on Trustworthy Natural Language Processing (TrustNLP), 2022
[Paper] [Poster] [Code] [BibTeX]
@inproceedings{zheng2022irrationality, title={The Irrationality of Neural Rationale Models}, author={Zheng, Yiming and Booth, Serena and Shah, Julie and Zhou, Yilun}, booktitle={NAACL Workshop on Trustworthy Natural Language Processing}, year={2022} }
Do Feature Attribution Methods Correctly Attribute Features?
Yilun Zhou, Serena Booth, Marco Tulio Ribeiro, Julie Shah
AAAI Conference on Artificial Intelligence (AAAI), 2022
Preliminary version in NeurIPS 2021 Workshop on Explainable AI Approaches for Debugging and Diagnosis
[Paper] [Poster] [Code] [Video] [Website] [MIT News] [BibTeX]
@inproceedings{zhou2022feature, title={Do Feature Attribution Methods Correctly Attribute Features?}, author={Zhou, Yilun and Booth, Serena and Ribeiro, Marco Tulio and Shah, Julie}, booktitle={AAAI Conference on Artificial Intelligence}, year={2022} }
Long-Term Resource Allocation Fairness in Average Markov Decision Process (AMDP) Environment
Ganesh Ghalme^*, Vineet Nair^*, Vishakha Patil^*, Yilun Zhou^*
International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2022
[Paper] [Code] [Website] [BibTeX]
@inproceedings{ghalme2022long, title={Long-Term Resource Allocation Fairness in Average Markov Decision Process (AMDP) Environment}, author={Ghalme, Ganesh and Nair, Vineet and Patil, Vishakha and Zhou, Yilun}, booktitle={International Conference on Autonomous Agents and Multi-Agent Systems}, year={2022} }
Latent Space Alignment Using Adversarially Guided Self-Play
Mycal Tucker, Yilun Zhou, Julie Shah
International Journal of Human-Computer Interaction (IJHCI), 2022
[Paper] [BibTeX]
@article{tucker2022latent, title={Latent Space Alignment Using Adversarially Guided Self-Play}, author={Tucker, Mycal and Zhou, Yilun and Shah, Julie}, journal={International Journal of Human-Computer Interaction}, year={2022} }
RoCUS: Robot Controller Understanding via Sampling
Yilun Zhou, Serena Booth, Nadia Figueroa, Julie Shah
Conference on Robot Learning (CoRL), 2021
[Paper] [Poster] [Code] [Video] [Website] [BibTeX]
@inproceedings{zhou2021rocus, title={RoCUS: Robot Controller Understanding via Sampling}, author={Zhou, Yilun and Booth, Serena and Figueroa, Nadia and Shah, Julie}, booktitle={Conference on Robot Learning}, year={2021} }
Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example
Serena Booth^*, Yilun Zhou^*, Ankit Shah, Julie Shah
AAAI Conference on Artificial Intelligence (AAAI), 2021
Preliminary version in AAAI 2020 Workshop on Statistical Relational AI
[Paper] [Poster] [Code] [MIT News] [BibTeX]
@inproceedings{booth2021bayes, title={Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example}, author={Booth, Serena and Zhou, Yilun and Shah, Ankit and Shah, Julie}, booktitle={AAAI Conference on Artificial Intelligence}, year={2021} }
Towards Understanding the Behaviors of Optimal Deep Active Learning Algorithms
Yilun Zhou, Adithya Renduchintala, Xian Li, Sida Wang, Yashar Mehdad, Asish Ghoshal
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
[Paper] [Poster] [Code] [Video] [BibTeX]
@inproceedings{zhou2021optimal, title={Towards Understanding the Behaviors of Optimal Deep Active Learning Algorithms}, author={Zhou, Yilun and Renduchintala, Adithya and Li, Xian and Wang, Sida and Mehdad, Yashar and Ghoshal, Asish}, booktitle={International Conference on Artificial Intelligence and Statistics}, year={2021} }
Learning Household Task Knowledge from WikiHow Descriptions
Yilun Zhou, Julie Shah, Steven Schockaert
International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Semantic Deep Learning, 2019
[Paper] [Code] [BibTeX]
@inproceedings{zhou2019learning, title={Learning Household Task Knowledge from WikiHow Descriptions}, author={Zhou, Yilun and Shah, Julie and Schockaert, Steven}, booktitle={International Joint Conference on Artificial Intelligence Workshop on Semantic Deep Learning}, year={2019} }
Predicting ConceptNet Path Quality Using Crowdsourced Assessments of Naturalness
Yilun Zhou, Steven Schockaert, Julie Shah
The Web Conference (WWW), 2019
[Paper] [Code] [BibTeX]
@inproceedings{zhou2019predicting, title={Predicting ConceptNet Path Quality Using Crowdsourced Assessments of Naturalness}, author={Zhou, Yilun and Schockaert, Steven and Shah, Julie}, booktitle={The Web Conference}, year={2019} }
Representing, Learning, and Controlling Complex Object Interactions
Yilun Zhou, Benjamin Burchfiel, George Konidaris
Autonomous Robots (AuRo), 2018
Original version in Robotics: Science and Systems (RSS), 2016
[Paper] [Video] [BibTeX]
@article{zhou2018representing, title={Representing, Learning, and Controlling Complex Object Interactions}, author={Zhou, Yilun and Burchfiel, Benjamin and Konidaris, George}, journal={Autonomous Robots}, year={2018} }
6DOF Grasp Planning by Optimizing a Deep Learning Scoring Function
Yilun Zhou, Kris Hauser
Robotics: Science and Systems (RSS) Workshop on Revisiting Contact - Turning a Problem into a Solution, 2017
[Paper] [Poster] [BibTeX]
@inproceedings{zhou2017grasp, title={6DOF Grasp Planning by Optimizing a Deep Learning Scoring Function}, author={Zhou, Yilun and Hauser, Kris}, booktitle={Robotics: Science and Systems Workshop on Revisiting Contact - Turning a Problem into a Solution}, year={2017} }
Incorporating Side-Channel Information into Convolutional Neural Networks for Robotic Tasks
Yilun Zhou, Kris Hauser
IEEE International Conference on Robotics and Automation (ICRA), 2017
[Paper] [Code] [BibTeX]
@inproceedings{zhou2017incorporating, title={Incorporating Side-Channel Information into Convolutional Neural Networks for Robotic Tasks}, author={Zhou, Yilun and Hauser, Kris}, booktitle={IEEE International Conference on Robotics and Automation}, year={2017} }
Asymptotically Optimal Planning by Feasible Kinodynamic Planning in a State-Cost Space
Kris Hauser, Yilun Zhou
IEEE Transactions on Robotics (TRO), 2016
[Paper] [Code] [Website] [BibTeX]
@article{hauser2016asymptotically, title={Asymptotically Optimal Planning by Feasible Kinodynamic Planning in a State-Cost Space}, author={Hauser, Kris and Zhou, Yilun}, journal={IEEE Transactions on Robotics}, year={2016} }