Medical Data Annotation Practices

May 14, 2025 at 14:00 CEST (UTC+2)

Medical data annotation is a crucial yet challenging aspect of developing reliable deep learning models for healthcare. Errors, biases, and inconsistencies frequently affect commonly used datasets, particularly in biomedical image analysis, where data is often limited, inter-rater variability is high, and annotation styles differ among experts. The lack of standardized methods for defining "ground truth" and the inherent ambiguity of medical data further complicate the annotation process.

This workshop will bring together researchers to explore the complexities of medical dataset annotation through lectures, case study presentations, and discussions. A key focus will be the importance of high-quality annotations, touching on issues such as label noise, confounders, and systematic biases that can compromise model performance, and on how improved annotation processes can help address them. By tackling these challenges, the workshop aims to advance best practices, foster collaboration, and drive improvements in medical AI development.


Dr. Veronika Cheplygina

IT University of Copenhagen

Title: Curious Findings about Medical Image Datasets

Bio: Dr. Veronika Cheplygina's research focuses on limited labeled scenarios in machine learning, in particular in medical image analysis, as well as meta-science in this field. She received her Ph.D. from Delft University of Technology in 2015. After a postdoc at the Erasmus Medical Center, in 2017 she started as an assistant professor at Eindhoven University of Technology. In 2020, failing to achieve various metrics, she left the tenure track in search of the next step where she could contribute to open and inclusive science. In 2021 she started as an associate professor at IT University of Copenhagen. Alongside research and teaching, Veronika blogs about academic life. She also loves cats, which you will often encounter in her work.

Abstract: It may seem intuitive that we need high-quality datasets to ensure robust algorithms for medical image classification. With the introduction of openly available, larger datasets, it might seem that the problem has been solved. However, this is far from the case: even these datasets suffer from issues like label noise, shortcuts, and confounders. Furthermore, there are behaviours in our research community that threaten the validity of published findings. In this talk I will discuss both types of issues with examples from recent papers, and how more work on annotation could help to address some of these problems.

Tim Rädsch

German Cancer Research Center (DKFZ)

Title: Getting it Right: Unlocking Better Annotation Data through Improved Instructions and QA

Bio: Tim is a PhD student at the German Cancer Research Center and Helmholtz Imaging. With experience in building a 25-person engineering team at a German AI startup and a master's degree from KIT, he rigorously validates ML algorithms by developing best-in-class training and testing pipelines. His accolades include the Anton Fink Science for AI Award 2023, a Falling Walls Science Breakthrough of the Year 2023 finalist nomination, a Hanns Seidel research fellowship, and a Best Flash Talk Award at the ELLIS Health conference in Heidelberg. His contributions are published in venues such as Nature Machine Intelligence, ECCV, Nature Methods, CVPR, and NeurIPS.

Abstract: Accurate biomedical image analysis relies heavily on high-quality annotations, yet the role of annotation instructions and quality assurance (QA) remains underexplored. This talk presents insights from two large-scale studies examining how improved labelling instructions and QA practices impact annotation performance. Practical strategies for optimizing instructions and allocating resources effectively will be discussed to maximize annotation performance — ultimately improving the reliability of biomedical image analysis pipelines.

Coen de Vente

qurAI, University of Amsterdam, Amsterdam UMC

Title: Lessons from the AIROGS Challenge: Collecting Reliable Annotations for Glaucoma Screening

Bio: Coen de Vente is a postdoctoral researcher in the qurAI group at the University of Amsterdam. His research focuses on advancing AI for medical imaging with an emphasis on model robustness, including uncertainty estimation, generative modeling, and efficient annotation. His work primarily spans ophthalmology. He obtained his PhD from the University of Amsterdam, where he developed deep learning techniques for the analysis of retinal diseases using optical coherence tomography and color fundus imaging. He also contributed to the development of CORADS-AI, a deep learning system for assessing COVID-19 in chest CT. He co-organized the AIROGS challenge (Artificial Intelligence for RObust Glaucoma Screening), which aims to drive AI innovation in glaucoma detection.


Register here!

The Zoom link to attend will be emailed to you upon completion of the event registration form.

Timetable

May 14, 2025 at 14:00 CEST (UTC+2)

Time Slot        Program
14:00 – 14:10    Welcome and Workshop Introduction
14:10 – 14:55    Curious Findings about Medical Image Datasets (Veronika Cheplygina)
14:55 – 15:15    Lessons from the AIROGS Challenge: Collecting Reliable Annotations for Glaucoma Screening (Coen de Vente)
15:15 – 15:30    Break
15:30 – 15:50    Scientific Talk (Speaker TBA)
15:50 – 16:35    Getting it Right: Unlocking Better Annotation Data through Improved Instructions and QA (Tim Rädsch)
16:35 – 16:45    Concluding Remarks

About the organizers

Dieuwertje Luitse (Faculty of Humanities) and Maria Galanty (Informatics Institute) are PhD students at the University of Amsterdam and Amsterdam UMC, working under the supervision of Professors Ivana Išgum, Clara I. Sanchez, Tobias Blanke and Alexander Vlaar in an interdisciplinary research project on AI for health decision-making. In their recent work, "Assessing the Documentation of Publicly Available Medical Image and Signal Datasets and Their Impact on Bias Using the BEAMRAD Tool," they investigated biases that can arise from inadequate dataset documentation and conducted an in-depth analysis of 37 publicly available medical image and signal datasets. This research emphasized the crucial role of the annotation process and the significant challenges in achieving high-quality annotations. To further explore this topic, they aim to bring together a community of researchers in medical image analysis by organizing this online workshop on medical data annotation practices.

For further information, please contact:

This workshop is supported by the University of Amsterdam Research Priority Area Artificial Intelligence for Health Decision-making: https://www.rpa-aiforhealth.nl/