Xiang Yue (岳 翔)

PhD student, The Ohio State University, OH, U.S.

Email: yue.149 AT osu DOT edu


Bio

I am currently a final-year PhD student working with Prof. Huan Sun in the Department of Computer Science and Engineering at The Ohio State University (OSU). I have broad research interests in Natural Language Processing (NLP). Specifically, my research aims to build safe, responsible, and reliable large language models (LLMs), which

  • ensure faithfulness to factual world knowledge and truth
  • generalize well to various unseen environments
  • safeguard user data privacy
I also have extensive experience building LLMs for different applications (e.g., Question Answering). I interned at Microsoft Research (Redmond) in summer 2022 and at Tencent AI Lab (Bellevue) in summer 2021.

I'm looking for full-time positions (I will graduate in Summer 2023)! Feel free to drop me an email if you have openings!


What's New

  • [May 2023] Check out our new preprint on Automatic Evaluation of Attribution by Large Language Models
  • [May 2023] I'm honored to receive two research awards: the 2023 CSE Graduate Research Award and the 2023 College of Engineering Exemplary Graduate Student Researcher
  • [May 2023] Our paper on Synthetic Text Generation with Differential Privacy was accepted to the ACL 2023 main conference
  • [June 2022] Our OSU TacoBot team earned the third-place honor ($50K) in the first Alexa Prize TaskBot Challenge! Ten teams were selected worldwide from 125 applications to participate in the challenge in May 2021, and five of them advanced to the finals in April 2022. We are the only US team among the top three performers! Check out our report here.
  • [Mar 2022] Two recent papers about question answering were accepted to the ACL 2022 main conference: "Synthetic Question Value Estimation for Domain Adaptation of Question Answering" and "C-MORE: Pretraining to Answer Open-Domain Questions by Consulting Millions of References"
  • [Mar 2022] I will join Microsoft Research to explore NLP+Privacy for my summer 2022 internship!
  • [Dec 2021] Our paper "CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering" has received the IEEE BIBM 2021 Best Paper Award!
  • [Aug 2021] Our short paper "COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval" has been accepted to the EMNLP 2021 main conference!
  • [May 2021] Our team has been selected for the Alexa Prize TaskBot Challenge as one of 10 teams out of 125 applications from 15 countries! We will build a smart dialogue system to help users complete cooking and DIY tasks.
  • [May 2021] Our long paper "Differential Privacy for Text Analytics via Natural Text Sanitization" has been accepted to Findings of ACL-IJCNLP 2021! We propose a privacy-preserving NLP pipeline consisting of DP-based text sanitization mechanisms and sanitization-aware language model pretraining and finetuning.
  • [Sept 2020] Our paper "PHICON: Improving Generalization of Clinical Text De-identification Models via Data Augmentation" has been accepted to the EMNLP'20 Clinical NLP Workshop!
  • [July 2020] Attended ACL 2020 and presented our Clinical Reading Comprehension work. Check out our slides and video.
  • [April 2020] Our paper "Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset" has been accepted to ACL 2020! We conduct a comprehensive study of the Clinical Reading Comprehension task based on the recently released emrQA dataset!

Last Updated: 08/2021