Boxin Wang

汪博欣

Research Scientist
Applied Deep Learning Research, NVIDIA

Contact: boxinw@nvidia.com
[Google Scholar] [GitHub] [LinkedIn]

I am a Research Scientist on the Applied Deep Learning Research (ADLR) team at NVIDIA. I obtained my Ph.D. from the Department of Computer Science at the University of Illinois Urbana-Champaign (UIUC), where I was advised by Prof. Bo Li.

My research vision is to develop practical and scalable large language models (LLMs) and to close the trustworthiness gap. My research interests include, but are not limited to:

  • Retrieval-augmented generation (RAG)
  • Trustworthiness and alignment
  • Multi-modal language modeling


News


Publications

2024
NVLM: Open Frontier-Class Multimodal LLMs Preprint 2024
Wenliang Dai*, Nayeon Lee*, Boxin Wang*, Zhuolin Yang*, Zihan Liu, Jon Barker, Tuomas Rintamaki, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping* (*equal contribution, listed alphabetically) [PDF][project]
RankRAG: Unifying Retrieval-Augmented Generation and Context Ranking in LLMs NeurIPS 2024
Yue Yu, Wei Ping, Zihan Liu, Boxin Wang, Jiaxuan You, Chao Zhang, Mohammad Shoeybi, Bryan Catanzaro [PDF]
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining ICML 2024
Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro. [PDF][code]
Can Public Large Language Models Help Private Cross-device Federated Learning? NAACL 2024 (Findings)
Boxin Wang, Yibo Jacky Zhang, Yuan Cao, Bo Li, H. Brendan McMahan, Sewoong Oh, Zheng Xu, Manzil Zaheer [PDF]
UNICORN: A Unified Causal Video-Oriented Language-Modeling Framework for Temporal Video-Language Tasks EMNLP 2024
Yuanhao Xiong, Yixin Nie, Haotian Liu, Boxin Wang, Jun Chen, Rong Jin, Cho-Jui Hsieh, Lorenzo Torresani, Jie Lei [PDF]
2023
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models NeurIPS 2023 (Outstanding Paper)
Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li. [PDF][website][code]
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study EMNLP 2023
Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro [PDF][code]
2022
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models NeurIPS 2022
Boxin Wang, Wei Ping, Chaowei Xiao, Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Bo Li, Anima Anandkumar, Bryan Catanzaro [PDF][code]
Improving Certified Robustness via Statistical Learning with Logical Reasoning NeurIPS 2022
Zhuolin Yang, Zhikuan Zhao, Boxin Wang, Jiawei Zhang, Linyi Li, Hengzhi Pei, Bojan Karlaš, Ji Liu, Heng Guo, Ce Zhang, Bo Li [PDF]
SemAttack: Natural Textual Attacks via Different Semantic Spaces NAACL 2022 (Findings)
Boxin Wang*, Chejian Xu*, Xiangyu Liu, Yu Cheng, Bo Li [PDF] [Code]
Certifying Out-of-Domain Generalization for Blackbox Functions ICML 2022
Maurice Weber, Linyi Li, Boxin Wang, Zhikuan Zhao, Bo Li, Ce Zhang [PDF]
2021
Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models NeurIPS 2021 (Oral)
Boxin Wang*, Chejian Xu*, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li [PDF] [Dataset]
G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators NeurIPS 2021
Yunhui Long*, Boxin Wang*, Zhuolin Yang, Bhavya Kailkhura, Aston Zhang, Carl A. Gunter, Bo Li [PDF] [Code]
DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation CCS 2021
Boxin Wang*, Fan Wu*, Yunhui Long*, Luka Rimanic, Ce Zhang, Bo Li [PDF] [Code]
Uncovering the Connections Between Adversarial Transferability and Knowledge Transferability ICML 2021
Kaizhao Liang*, Jacky Zhang*, Boxin Wang, Zhuolin Yang, Sanmi Koyejo, Bo Li [PDF] [Code]
Counterfactual Adversarial Learning with Representation Interpolation EMNLP 2021
Wei Wang, Boxin Wang, Ning Shi, Jinfeng Li, Bingyu Zhu, Xiangyu Liu, Rong Zhang [PDF] [Code]
Incorporating External POS Tagger for Punctuation Restoration Interspeech 2021
Ning Shi, Wei Wang, Boxin Wang, Jinfeng Li, Xiangyu Liu, Zhouhan Lin [PDF][Code]
InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective ICLR 2021
Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu [PDF][Code]
2020
T3: Tree-Autoencoder Regularized Adversarial Text Generation for Targeted Attack EMNLP 2020
Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li [PDF][Code]
Reinforcement-Learning based Portfolio Management with Augmented Asset Movement Prediction States AAAI 2020
Yunan Ye, Hengzhi Pei, Boxin Wang, Pin-Yu Chen, Yada Zhu, Jun Xiao, Bo Li [PDF]
2019
Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms PVLDB 2019
Ruoxi Jia*, David Dao*, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gürel, Bo Li, Ce Zhang, Costas J. Spanos, Dawn Song [PDF]
Towards Efficient Data Valuation Based on the Shapley Value AISTATS 2019
Ruoxi Jia*, David Dao*, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gürel, Bo Li, Ce Zhang, Dawn Song, Costas J. Spanos [PDF]

Experiences


Personal

I enjoy photography. I post my photos on 500px.