Chaoyi Wu

I am a PhD candidate working on medical image analysis and machine learning at Shanghai Jiao Tong University.

My current research interest is in developing foundation models and agentic systems for medicine.

Email / Scholar / GitHub


Research

2024
Can Modern LLMs Act as Agent Cores in Radiology Environments?
Qiaoyu Zheng*, Chaoyi Wu*, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024.

In this study, we systematically investigate a prerequisite question for building concrete radiology agents: ‘Can modern LLMs act as agent cores in radiology environments?’ To this end, we build RadABench, a comprehensive LLM-based agent evaluation benchmark for radiology.

Towards Evaluating and Building Versatile Large Language Models for Medicine
Chaoyi Wu*, Pengcheng Qiu*, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024.

In this study, we present MedS-Bench, a comprehensive benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts beyond multiple-choice question answering. We also build a new comprehensive medical instruction dataset, termed MedS-Ins.

AutoRG-Brain: Grounded Report Generation for Brain MRI
Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024.

In this paper, we propose a grounded report generation system for brain MRI that leverages the cooperation of different sub-tools. In real clinical scenarios, the system can significantly improve the efficiency of radiologists.

RaTEScore: A Metric for Radiology Report Generation
Weike Zhao, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
EMNLP 2024, Main.

In this paper, we propose an entity-level assessment metric for radiological reports beyond chest X-ray, using NER and synonym normalization models. Unlike LLM-based assessment pipelines, our metric is more lightweight and objective, targeting large-scale automatic evaluation.

RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis
Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024.

In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. It includes organ-level segmentation for 197 categories, 665K multi-granularity grounded reports, and 1.3M grounded VQA pairs.

Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
ECCV 2024, Oral.

In this paper, we consider the problem of visual representation learning for computational pathology by exploiting large-scale image-text pairs gathered from public resources, along with domain-specific knowledge in pathology.

Towards Building Multilingual Language Model for Medicine
Pengcheng Qiu*, Chaoyi Wu*, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie
Nature Communications (Accepted In Principle)

In this paper, we aim to develop a multilingual corpus (MMedC), a benchmark (MMedBench), and an open-source multilingual language model (MMedLM) for medicine, to benefit a wider, linguistically diverse audience across different regions.

One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts
Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024.

In this paper, we build SAT, a universal medical segmentation model driven by text prompts.

Large-scale Long-tailed Disease Diagnosis on Radiology Images
Qiaoyu Zheng*, Weike Zhao*, Chaoyi Wu*, Xiaoman Zhang*, Ya Zhang, Yanfeng Wang, Weidi Xie
Technical Report, 2024.

In this paper, we collect a large-scale, multi-modal, multi-scan, long-tailed, multi-label diagnosis (classification) dataset. We further propose a vision encoder together with a fusion module, enabling an arbitrary number of input scans per case. In evaluation, our method achieves better results on our benchmark and can also serve as a pre-trained model for external datasets.

2023
Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis
Chaoyi Wu*, Jiayu Lei*, Qiaoyu Zheng*, Weike Zhao*, Weixiong Lin*, Xiaoman Zhang*, Xiao Zhou*, Ziheng Zhao*, Yanfeng Wang, Ya Zhang, Weidi Xie
Technical Report, 2023.

We evaluate GPT-4V on 92 radiographic cases, 20 pathology cases, and 16 location cases across 17 medical systems covering 8 imaging modalities. In general, as the cases show, GPT-4V is still far from clinical use.

UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training
Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya Zhang, Yanfeng Wang
Technical Report, 2023.

We release a new knowledge-enhanced pre-trained foundation model for brain MRI that leverages image-report pairs and enables zero-shot diagnosis of unseen brain diseases.

Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data
Chaoyi Wu*, Xiaoman Zhang*, Yanfeng Wang, Ya Zhang, Weidi Xie
Technical Report, 2023.

In this study, we aim to initiate the development of a Radiology Foundation Model, termed RadFM. We construct a large-scale Medical Multi-modal Dataset, MedMD, consisting of 16M 2D and 3D medical scans.

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
Xiaoman Zhang*, Chaoyi Wu*, Weixiong Lin, Ziheng Zhao, Yanfeng Wang, Ya Zhang, Weidi Xie
Technical Report, 2023.

In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA). We propose a generative medical VQA model, MedVInT, together with a large-scale MedVQA dataset, PMC-VQA.

PMC-LLaMA: Towards Building Open-source Language Models for Medicine
Chaoyi Wu, Xiaoman Zhang, Yanfeng Wang, Ya Zhang, Weidi Xie
Journal of the American Medical Informatics Association (JAMIA)

In this report, we introduce PMC-LLaMA, an open-source language model trained on a large medical corpus, surpassing ChatGPT on medical QA benchmarks.

PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents
Weixiong Lin, Ziheng Zhao, Xiaoman Zhang, Chaoyi Wu, Yanfeng Wang, Ya Zhang, Weidi Xie
International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2023.

We build a biomedical dataset, PMC-OA, with 1.6M image-caption pairs collected from PubMed Central's Open Access subset.

Knowledge-enhanced Pre-training for Auto-diagnosis of Chest Radiology Images
Xiaoman Zhang, Chaoyi Wu, Yanfeng Wang, Ya Zhang, Weidi Xie
Nature Communications, 2023.

Here, we propose a knowledge-enhanced vision-language pre-training approach for auto-diagnosis on chest X-ray images. The approach first trains a knowledge encoder based on an existing medical knowledge graph, then leverages the pre-trained knowledge encoder to guide visual representation learning.

K-Diag: Knowledge-enhanced Disease Diagnosis in Radiographic Imaging
Chaoyi Wu*, Xiaoman Zhang*, Yanfeng Wang, Ya Zhang, Weidi Xie
MICCAI-BTSD (workshop) 2023, Oral.

In this paper, we consider the problem of disease diagnosis. Unlike the conventional learning paradigm that treats labels independently, we propose a knowledge-enhanced framework that enables training visual representations with the guidance of medical domain knowledge.

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training
Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
International Conference on Computer Vision (ICCV), 2023.

We propose a language-image pre-training method enhanced with medical-specific knowledge, significantly advancing the ability of pre-trained models to handle unseen diseases on zero-shot classification and grounding tasks.

2022
Boundary-Enhanced Self-supervised Learning for Brain Structure Segmentation
Feng Chang, Chaoyi Wu, Yanfeng Wang, Ya Zhang, Xin Chen, Qi Tian
International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2022.

We propose Boundary-Enhanced Self-Supervised Learning (BE-SSL), which leverages supervoxel segmentation and registration as two related proxy tasks to enhance brain structure segmentation.

Integrating features from lymph node stations for metastatic lymph node detection
Chaoyi Wu, Feng Chang, Xiao Su, Zhihan Wu, Yanfeng Wang, Ling Zhu, Ya Zhang
Computerized Medical Imaging and Graphics (CMIG), 2022, 101: 102108.

We first leverage information from lymph node (LN) stations for metastatic LN detection. Metastatic LN station classification is proposed as a proxy task for metastatic LN detection. A GCN-based structure is adopted to model the mutual influence among LN stations.

