Biography

Nan Zhang (Chinese: 张楠) is a Ph.D. student in the College of Information Sciences and Technology at The Pennsylvania State University. He has broad interests in natural language processing (NLP), clinical NLP, and machine learning. He is advised by Dr. Rui Zhang and Dr. Prasenjit Mitra. He is currently working on LLM compression (e.g., pruning and quantization) and retrieval-augmented generation (RAG).

Before joining Penn State, he received his bachelor’s degree from Worcester Polytechnic Institute (WPI) in 2017 and his master’s degree from Georgia Institute of Technology in 2020.

Interests
  • LLM Compression
  • RAG
  • Natural Language Processing
  • Machine Learning
Education
  • PhD in Informatics, 2020 - Present

    The Pennsylvania State University

  • MS in Computational Science and Engineering, 2020

    Georgia Institute of Technology

  • BS in Computer Science & Industrial Engineering (double major), 2017

    Worcester Polytechnic Institute

Recent News

[Dec. 2024] Our paper on RAG indexing of similar and related corpus content, SiReRAG: Indexing Similar and Related Information for Multihop Reasoning, is now online. SiReRAG consistently outperforms existing indexing methods on multihop datasets!

[Sept. 2024] One paper on LLMs as paper reviewers and area chairs has been accepted to EMNLP 2024.

[Aug. 2024] One paper on self-correction of LLMs has been accepted to TACL 2024.

[July 2024] One paper on benchmarking LLMs at detecting errors in LLM responses has been accepted to COLM 2024.

[June 2024] Our survey paper on self-correction of LLMs, When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs, is now online. Feel free to check it out!

Publications

(2025). SiReRAG: Indexing Similar and Related Information for Multihop Reasoning. ICLR, 2025.

PDF

(2024). LLMs assist NLP Researchers: Critique Paper (Meta-) Reviewing. EMNLP, 2024.

PDF

(2024). When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs. TACL, 2024.

PDF

(2024). Evaluating LLMs at Detecting Errors in LLM Responses. COLM, 2024.

PDF Code Dataset

(2024). Pruning as a Domain-specific LLM Extractor. NAACL Findings, 2024.

PDF Code

(2024). Fair Abstractive Summarization of Diverse Perspectives. NAACL, 2024.

PDF Code

(2024). PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents. LREC-COLING, 2024.

PDF Code

(2024). Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models. Preprint, 2024.

PDF

(2023). FaMeSumm: Investigating and Improving Faithfulness of Medical Summarization. EMNLP, 2023.

PDF Code