Chaoyi Wu

Hi! I'm Chaoyi, an Assistant Professor at the School of Artificial Intelligence, Shanghai Jiao Tong University, specializing in AI for Medicine.

My current research focuses on advancing medical foundation models in both language and multimodal domains, and on designing agentic systems that push the boundaries of AI4Medicine.

I'm looking for enthusiastic Master's and PhD students to join our team! If you're passionate about medical AI and motivated to explore new frontiers, I'd love to hear from you and work together to shape the future.

Email  /  Scholar  /  GitHub


Latest Highlights❗
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang, Weidi Xie
Technical Report, 2025

In this paper, we propose an end-to-end reinforcement learning framework for training agentic RAG systems, evolving their action policies through large-scale training to achieve traceable diagnostic reasoning.

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
Weike Zhao*, Chaoyi Wu*, Yanjie Fan*, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie
Technical Report, 2025

We develop DeepRare, the first agentic AI system for rare disease diagnosis that integrates specialized tools and medical knowledge sources to provide traceable diagnostic reasoning with exceptional accuracy across multiple evaluation datasets.

ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification
Ziqing Fan, Cheng Liang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2025

We present ChestX-Reasoner, a radiology diagnosis reasoning MLLM. We demonstrate that, unlike general multimodal reasoning, which is difficult to define, radiology image analysis naturally embeds reasoning structures in daily clinical reports, from findings to impressions, providing invaluable process-level supervision for PRM-based reinforcement learning.

Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases
Pengcheng Qiu*, Chaoyi Wu*, Shuyu Liu, Weike Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2025

In this study, we quantitatively evaluate the free-text reasoning abilities of various state-of-the-art LLMs, such as DeepSeek-R1 and OpenAI-o3-mini, in assessment recommendation, diagnostic decision, and treatment planning.


Research

2025
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang, Weidi Xie
Technical Report, 2025
In this paper, we propose an end-to-end reinforcement learning framework for training agentic RAG systems, evolving their action policies through large-scale training to achieve traceable diagnostic reasoning.
An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
Weike Zhao*, Chaoyi Wu*, Yanjie Fan*, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie
Technical Report, 2025
We develop DeepRare, the first agentic AI system for rare disease diagnosis that integrates specialized tools and medical knowledge sources to provide traceable diagnostic reasoning with exceptional accuracy across multiple evaluation datasets.
ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification
Ziqing Fan, Cheng Liang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
arXiv preprint arXiv:2504.20930, 2025
We present ChestX-Reasoner, a radiology diagnosis reasoning MLLM that leverages naturally embedded reasoning structures in clinical reports to provide process-level supervision for PRM-based reinforcement learning, achieving significant improvements in diagnostic accuracy and reasoning ability.
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases
Pengcheng Qiu*, Chaoyi Wu*, Shuyu Liu, Weike Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2025
In this study, we quantitatively evaluate the free-text reasoning abilities of various state-of-the-art LLMs, such as DeepSeek-R1 and OpenAI-o3-mini, in assessment recommendation, diagnostic decision, and treatment planning.
Jinghao Feng*, Qiaoyu Zheng*, Chaoyi Wu, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie
MICCAI 2025 Workshop, Oral
In this study, we develop an automated machine learning agentic AI system for medical imaging analysis, aiming to equip medical agent systems with self-evolution capabilities.
Tengfei Zhang*, Ziheng Zhao*, Chaoyi Wu, Xiao Zhou, Ya Zhang, Yanfeng Wang, Weidi Xie
MICCAI 2025, Early Accept
In this study, we propose a novel medical image similarity ordering pipeline that operates at multiple granularities by effectively utilizing rich information extracted from dense radiology report annotations.
2024
Can Modern LLMs Act as Agent Cores in Radiology Environments?
Qiaoyu Zheng*, Chaoyi Wu*, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024
In this study, we systematically investigate a prerequisite question for building practical radiology agents: can modern LLMs act as agent cores in radiology environments? To this end, we build RadABench, a comprehensive benchmark for evaluating LLM-based agents in radiology.
Towards Evaluating and Building Versatile Large Language Models for Medicine
Chaoyi Wu*, Pengcheng Qiu*, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie
npj Digital Medicine, 2025
In this study, we present MedS-Bench, a comprehensive benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts beyond multiple-choice question answering. Moreover, we build a new comprehensive medical instruction dataset, termed MedS-Ins.
Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
Computerized Medical Imaging and Graphics (CMIG), 2024
In this paper, we propose a grounded report generation system for brain MRI that coordinates multiple sub-tools. In real clinical scenarios, the system can significantly improve radiologists' efficiency.
Weike Zhao, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
EMNLP 2024, Main
In this paper, we propose an entity-level assessment metric for radiology reports beyond chest X-ray, using NER and synonym-normalization models. Unlike LLM-based assessment pipelines, our metric is lightweight and objective, targeting large-scale automatic evaluation.
Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024
In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. It includes organ-level segmentation for 197 categories, 665K multi-granularity grounded reports, and 1.3M grounded VQA pairs.
Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
ECCV 2024, Oral
In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain specific knowledge in pathology.
Towards Building Multilingual Language Model for Medicine
Pengcheng Qiu*, Chaoyi Wu*, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie
Nature Communications, 2024
In this paper, we develop a multilingual medical corpus (MMedC), a benchmark (MMedBench), and an open-source multilingual language model for medicine (MMedLM), benefiting a wider, linguistically diverse audience across regions.
Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
npj Digital Medicine, 2025
In this paper, we build SAT, a universal medical segmentation model driven by text prompts.
Qiaoyu Zheng*, Weike Zhao*, Chaoyi Wu*, Xiaoman Zhang*, Ya Zhang, Yanfeng Wang, Weidi Xie
Nature Communications, 2024
In this paper, we collect a large-scale multi-modal, multi-scan, long-tailed multi-label diagnosis (classification) dataset. We further propose a vision encoder with a fusion module, enabling an arbitrary number of scans per case. In evaluation, our method achieves better results on our benchmark and can also serve as a pre-trained model for external datasets.
2023
Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis
Chaoyi Wu*, Jiayu Lei*, Qiaoyu Zheng*, Weike Zhao*, Weixiong Lin*, Xiaoman Zhang*, Xiao Zhou*, Ziheng Zhao*, Yanfeng Wang, Ya Zhang, Weidi Xie
Technical Report, 2023
We evaluate GPT-4V on 92 radiographic cases, 20 pathology cases, and 16 location cases across 17 medical systems covering 8 imaging modalities. Overall, as these cases show, GPT-4V is still far from clinical usage.
Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya Zhang, Yanfeng Wang
Computerized Medical Imaging and Graphics (CMIG), 2025
We release a knowledge-enhanced brain MRI foundation model pre-trained on image-report pairs, enabling zero-shot diagnosis of unseen brain diseases.
Chaoyi Wu*, Xiaoman Zhang*, Yanfeng Wang, Ya Zhang, Weidi Xie
Nature Communications
In this study, we initiate the development of a radiology foundation model, termed RadFM, and construct a large-scale medical multi-modal dataset, MedMD, consisting of 16M 2D and 3D medical scans.
Xiaoman Zhang*, Chaoyi Wu*, Weixiong Lin, Ziheng Zhao, Yanfeng Wang, Ya Zhang, Weidi Xie
Communications Medicine, 2024
In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA). We propose a generative medical VQA model, MedVInT, together with a large-scale MedVQA dataset, PMC-VQA.
PMC-LLaMA: Towards Building Open-source Language Models for Medicine
Chaoyi Wu, Xiaoman Zhang, Yanfeng Wang, Ya Zhang, Weidi Xie
Journal of the American Medical Informatics Association (JAMIA)
In this report, we introduce PMC-LLaMA, an open-source language model trained on a large medical corpus, surpassing ChatGPT on medical QA benchmarks.
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents
Weixiong Lin, Ziheng Zhao, Xiaoman Zhang, Chaoyi Wu, Yanfeng Wang, Ya Zhang, Weidi Xie
International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2023
We collect a biomedical dataset, PMC-OA, with 1.6M image-caption pairs from PubMed Central's Open Access subset.
Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images
Xiaoman Zhang, Chaoyi Wu, Yanfeng Wang, Ya Zhang, Weidi Xie
Nature Communications, 2023
Here, we propose a knowledge-enhanced vision-language pre-training approach for auto-diagnosis on chest X-ray images. It first trains a knowledge encoder based on an existing medical knowledge graph, then leverages the pre-trained knowledge encoder to guide visual representation learning.
Chaoyi Wu*, Xiaoman Zhang*, Yanfeng Wang, Ya Zhang, Weidi Xie
MICCAI 2023 Workshop, Oral
In this paper, we consider the problem of disease diagnosis. Unlike the conventional learning paradigm that treats labels independently, we propose a knowledge-enhanced framework that enables training visual representations under the guidance of medical domain knowledge.
Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
International Conference on Computer Vision (ICCV), 2023
We propose a medical knowledge-enhanced language-image pre-training method, significantly advancing the ability of pre-trained models to handle unseen diseases in zero-shot classification and grounding tasks.
2022
Feng Chang, Chaoyi Wu, Yanfeng Wang, Ya Zhang, Xin Chen, Qi Tian
International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2022
We propose Boundary-Enhanced Self-Supervised Learning (BE-SSL), leveraging supervoxel segmentation and registration as two related proxy tasks, enhancing brain structure segmentation.
Chaoyi Wu, Feng Chang, Xiao Su, Zhihan Wu, Yanfeng Wang, Ling Zhu, Ya Zhang
Computerized Medical Imaging and Graphics (CMIG), 2022
We are the first to leverage lymph node (LN) station information for metastatic LN detection, proposing metastatic LN station classification as a proxy task and adopting a GCN-based structure to model the mutual influence among LN stations.

Based on a template by Jon Barron.