Chaoyi Wu

Hi! I'm Chaoyi, an Assistant Professor at the School of Artificial Intelligence, Shanghai Jiao Tong University, specializing in AI for Medicine.

My current research focuses on advancing medical foundation models in both language and multimodal domains, and on designing agentic systems that push the boundaries of AI4Medicine.

I'm looking for enthusiastic Master's and PhD students to join our team! If you're passionate about medical AI and motivated to explore new frontiers, I'd love to hear from you and work together to shape the future.

Email  /  Scholar  /  GitHub


Latest Highlights❗
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang, Weidi Xie
Technical Report, 2025

In this paper, we propose an end-to-end reinforcement learning framework for training agentic RAG systems, evolving their action policies through large-scale training to achieve traceable diagnostic reasoning.

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
Weike Zhao*, Chaoyi Wu*, Yanjie Fan*, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie
Technical Report, 2025

We develop DeepRare, the first agentic AI system for rare disease diagnosis that integrates specialized tools and medical knowledge sources to provide traceable diagnostic reasoning with exceptional accuracy across multiple evaluation datasets.

ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification
Ziqing Fan, Cheng Liang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2025

We present ChestX-Reasoner, a radiology diagnosis reasoning MLLM. We demonstrate that, unlike general multimodal reasoning, which is difficult to define, radiology image analysis naturally embeds reasoning structures in daily clinical reports, from findings to impressions, providing invaluable process-level supervision for PRM-based reinforcement learning.

Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases
Pengcheng Qiu*, Chaoyi Wu*, Shuyu Liu, Weike Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2025

In this study, we quantitatively evaluate the free-text reasoning abilities of various state-of-the-art LLMs, such as DeepSeek-R1 and OpenAI-o3-mini, in assessment recommendation, diagnostic decision, and treatment planning.


Research

2025
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang, Weidi Xie
Technical Report, 2025
In this paper, we propose an end-to-end reinforcement learning framework for training agentic RAG systems, evolving their action policies through large-scale training to achieve traceable diagnostic reasoning.
An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
Weike Zhao*, Chaoyi Wu*, Yanjie Fan*, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie
Technical Report, 2025
We develop DeepRare, the first agentic AI system for rare disease diagnosis that integrates specialized tools and medical knowledge sources to provide traceable diagnostic reasoning with exceptional accuracy across multiple evaluation datasets.
ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification
Ziqing Fan, Cheng Liang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
arXiv preprint arXiv:2504.20930, 2025
We present ChestX-Reasoner, a radiology diagnosis reasoning MLLM that leverages naturally embedded reasoning structures in clinical reports to provide process-level supervision for PRM-based reinforcement learning, achieving significant improvements in diagnostic accuracy and reasoning ability.
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases
Pengcheng Qiu*, Chaoyi Wu*, Shuyu Liu, Weike Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2025
In this study, we quantitatively evaluate the free-text reasoning abilities of various state-of-the-art LLMs, such as DeepSeek-R1 and OpenAI-o3-mini, in assessment recommendation, diagnostic decision, and treatment planning.
Jinghao Feng*, Qiaoyu Zheng*, Chaoyi Wu, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie
MICCAI 2025 Workshop, Oral
In this study, we develop an automated machine learning agentic AI system for medical imaging analysis, aiming to equip medical agent systems with self-evolution capabilities.
Tengfei Zhang*, Ziheng Zhao*, Chaoyi Wu, Xiao Zhou, Ya Zhang, Yanfeng Wang, Weidi Xie
MICCAI 2025, Early Accept
In this study, we propose a novel medical image similarity ordering pipeline that operates at multiple granularities by effectively utilizing rich information extracted from dense radiology report annotations.
2024
Can Modern LLMs Act as Agent Cores in Radiology Environments?
Qiaoyu Zheng*, Chaoyi Wu*, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024
In this study, we systematically investigate a prerequisite question for building practical radiology agents: can modern LLMs act as agent cores in radiology environments? To this end, we build RadABench, a comprehensive benchmark for evaluating LLM-based agents in radiology.
Towards Evaluating and Building Versatile Large Language Models for Medicine
Chaoyi Wu*, Pengcheng Qiu*, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie
npj Digital Medicine, 2025
In this study, we present MedS-Bench, a comprehensive benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts beyond multiple-choice question answering. Moreover, we build a new comprehensive medical instruction dataset, termed MedS-Ins.
Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
Computerized Medical Imaging and Graphics (CMIG), 2024
In this paper, we propose a grounded report generation system for brain MRI that coordinates multiple sub-tools. In real clinical scenarios, the system can significantly improve radiologists' efficiency.
Weike Zhao, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
EMNLP 2024, Main
In this paper, we propose an entity-level assessment metric for radiology reports beyond chest X-ray, using NER and synonym-normalization models. Unlike LLM-based assessment pipelines, our metric is lightweight and objective, targeting large-scale automatic evaluation.
Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024
In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. It includes organ-level segmentation for 197 categories, 665K multi-granularity grounded reports, and 1.3M grounded VQA pairs.
Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
ECCV 2024, Oral
In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain specific knowledge in pathology.
Towards Building Multilingual Language Model for Medicine
Pengcheng Qiu*, Chaoyi Wu*, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie
Nature Communications, 2024
In this paper, we develop a multilingual medical corpus (MMedC), a benchmark (MMedBench), and an open-source multilingual language model for medicine (MMedLM), benefiting a wider, linguistically diverse audience across regions.
Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
npj Digital Medicine, 2025
In this paper, we build SAT, a universal medical segmentation model driven by text prompts.
Qiaoyu Zheng*, Weike Zhao*, Chaoyi Wu*, Xiaoman Zhang*, Ya Zhang, Yanfeng Wang, Weidi Xie
Nature Communications, 2024
In this paper, we collect a large-scale multi-modal, multi-scan, long-tailed multi-label diagnosis (classification) dataset. We further propose a vision encoder with a fusion module, enabling an arbitrary number of scans per case. In evaluation, our method achieves better results on our benchmark and can also serve as a pre-trained model for external datasets.
2023
Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis
Chaoyi Wu*, Jiayu Lei*, Qiaoyu Zheng*, Weike Zhao*, Weixiong Lin*, Xiaoman Zhang*, Xiao Zhou*, Ziheng Zhao*, Yanfeng Wang, Ya Zhang, Weidi Xie
Technical Report, 2023
We evaluate GPT-4V on 92 radiographic cases, 20 pathology cases, and 16 location cases across 17 medical systems covering 8 imaging modalities. Overall, as these cases show, GPT-4V is still far from clinical usage.
Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya Zhang, Yanfeng Wang
Computerized Medical Imaging and Graphics (CMIG), 2025
We release a knowledge-enhanced brain MRI foundation model pre-trained on image-report pairs, enabling zero-shot diagnosis of unseen brain diseases.
Chaoyi Wu*, Xiaoman Zhang*, Yanfeng Wang, Ya Zhang, Weidi Xie
Nature Communications
In this study, we initiate the development of a radiology foundation model, termed RadFM, and construct a large-scale medical multi-modal dataset, MedMD, consisting of 16M 2D and 3D medical scans.
Xiaoman Zhang*, Chaoyi Wu*, Weixiong Lin, Ziheng Zhao, Yanfeng Wang, Ya Zhang, Weidi Xie
Communications Medicine, 2024
In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA). We propose a generative medical VQA model, MedVInT, together with a large-scale MedVQA dataset, PMC-VQA.
PMC-LLaMA: Towards Building Open-source Language Models for Medicine
Chaoyi Wu, Xiaoman Zhang, Yanfeng Wang, Ya Zhang, Weidi Xie
Journal of the American Medical Informatics Association (JAMIA)
In this report, we introduce PMC-LLaMA, an open-source language model trained on a large medical corpus, surpassing ChatGPT on medical QA benchmarks.
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents
Weixiong Lin, Ziheng Zhao, Xiaoman Zhang, Chaoyi Wu, Yanfeng Wang, Ya Zhang, Weidi Xie
International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2023
We collect a biomedical dataset, PMC-OA, with 1.6M image-caption pairs from PubMed Central's Open Access subset.
Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images
Xiaoman Zhang, Chaoyi Wu, Yanfeng Wang, Ya Zhang, Weidi Xie
Nature Communications, 2023
Here, we propose a knowledge-enhanced vision-language pre-training approach for auto-diagnosis on chest X-ray images. It first trains a knowledge encoder based on an existing medical knowledge graph, then leverages the pre-trained knowledge encoder to guide visual representation learning.
Chaoyi Wu*, Xiaoman Zhang*, Yanfeng Wang, Ya Zhang, Weidi Xie
MICCAI 2023 Workshop, Oral
In this paper, we consider the problem of disease diagnosis. Unlike the conventional learning paradigm that treats labels independently, we propose a knowledge-enhanced framework that enables training visual representations under the guidance of medical domain knowledge.
Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
International Conference on Computer Vision (ICCV), 2023
We propose a medical knowledge-enhanced language-image pre-training method, significantly advancing the ability of pre-trained models to handle unseen diseases in zero-shot classification and grounding tasks.
2022
Feng Chang, Chaoyi Wu, Yanfeng Wang, Ya Zhang, Xin Chen, Qi Tian
International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2022
We propose Boundary-Enhanced Self-Supervised Learning (BE-SSL), leveraging supervoxel segmentation and registration as two related proxy tasks, enhancing brain structure segmentation.
Chaoyi Wu, Feng Chang, Xiao Su, Zhihan Wu, Yanfeng Wang, Ling Zhu, Ya Zhang
Computerized Medical Imaging and Graphics (CMIG), 2022
We are the first to leverage lymph node (LN) station information for metastatic LN detection, proposing metastatic LN station classification as a proxy task and adopting a GCN-based structure to model the mutual influence among LN stations.

Based on a template by Jon Barron.