*: Equal Contributions; ✝: My Advisee
-
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Tianyu Zheng*✝, Ge Zhang*, Tianhao Shen*, Xueling Liu*, Bill Yuchen Lin, Jie Fu, Wenhu Chen, Xiang Yue
arXiv 2024
[Project Page] [Huggingface Dataset] [Huggingface Models] [GitHub]
-
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
CVPR 2024 (Oral: 90/11,532=0.8%)
[Project Page] [Huggingface Dataset] [Code]
-
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
Xiang Yue*, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen*
ICLR 2024 Spotlight
[Project Page] [Huggingface Models] [Huggingface Dataset] [Code]
-
Automatic Evaluation of Attribution by Large Language Models
Xiang Yue, Boshi Wang, Kai Zhang, Ziru Chen, Yu Su, Huan Sun
EMNLP 2023, Findings
[Huggingface Dataset] [Code]
-
Can ChatGPT Defend the Truth? Automatic Dialectical Evaluation Elicits LLMs’ Deficiencies in Reasoning
Boshi Wang, Xiang Yue, Huan Sun
EMNLP 2023, Findings
-
Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe
Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim
61st Annual Meeting of the Association for Computational Linguistics (ACL 2023 Main Conference)
(Best Paper Honorable Mention) [Code]
-
Synthetic Question Value Estimation for Domain Adaptation of Question Answering
Xiang Yue, Ziyu Yao, Huan Sun
60th Annual Meeting of the Association for Computational Linguistics (ACL 2022 Main Conference)
[arXiv version] [Code]
-
C-MORE: Pretraining to Answer Open-Domain Questions by Consulting Millions of References
Xiang Yue, Xiaoman Pan, Wenlin Yao, Dian Yu, Dong Yu, Jianshu Chen
60th Annual Meeting of the Association for Computational Linguistics (ACL 2022 Main Conference)
[arXiv version] [Code] [Dataset]
-
CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering
Xiang Yue*, Frederick Zhang*, Ziyu Yao, Simon Lin, Huan Sun
IEEE International Conference on Bioinformatics and Biomedicine 2021 (BIBM 2021)
(Best Paper Award) [arXiv version] [Code]
Poster Version in Machine Learning for Health Workshop at NeurIPS 2020
-
COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval
Frederick Zhang, Heming Sun, Xiang Yue, Simon Lin and Huan Sun
The 2021 Conference on Empirical Methods in Natural Language Processing
(EMNLP 2021)
[Dataset]
-
Differential Privacy for Text Analytics via Natural Text Sanitization
Xiang Yue*, Minxin Du*, Tianhao Wang, Yaliang Li, Huan Sun and Sherman S. M. Chow
The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
(ACL-IJCNLP 2021, Findings, Long Paper)
-
Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset
Xiang Yue, Bernal Jimenez Gutierrez and Huan Sun
The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)
[arXiv version] [Code] [Slides & Video]
-
Clinical Phrase Mining with Language Models
Kaushik Mani*, Xiang Yue*, Bernal Jimenez Gutierrez, Yungui Huang, Simon Lin, and Huan Sun
IEEE International Conference on Bioinformatics and Biomedicine 2020 (BIBM 2020)
[arXiv extended version] [Code]
-
PHICON: Improving Generalization of Clinical Text De-identification Models via Data Augmentation
Xiang Yue and Shuang Zhou
The 3rd Clinical Natural Language Processing Workshop at EMNLP 2020
[arXiv version] [Code]
-
Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations
Xiang Yue, Zhen Wang, Jingong Huang, Srinivasan Parthasarathy, Soheil Moosavinasab, Yungui Huang, Simon M. Lin, Wen Zhang, Ping Zhang and Huan Sun
Bioinformatics (Vol 36 Issue 4, 15 Feb 2020, Page 1241-1251) (Impact Factor: 4.531)
(ESI Highly Cited Paper: top 1% cited paper of its academic field)
[arXiv version] [Code & Datasets]
-
SurfCon: Synonym Discovery on Privacy-Aware Clinical Data
Zhen Wang, Xiang Yue, Soheil Moosavinasab, Yungui Huang, Simon Lin and Huan Sun
The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (KDD 2019, research track, acceptance rate: ~110/~1200=9.2%, oral)
[Code] [Slides]