Glucose-ML: A collection of longitudinal diabetes datasets for development of robust AI solutions

Temiloluwa Prioleau, Baiying Lu, Yanjun Cui

Published

Image Source: https://www.nutrisense.io/blog/normal-glucose-levels

Abstract

Artificial intelligence (AI) algorithms are a critical part of state-of-the-art digital health technology for diabetes management. Yet limited access to large, high-quality datasets impedes the development of robust AI solutions. To accelerate development of transparent, reproducible, and robust AI solutions, we present Glucose-ML, a collection of 10 publicly available diabetes datasets released within the last 7 years (i.e., 2018-2025). The Glucose-ML collection comprises over 300,000 days of continuous glucose monitor (CGM) data with a total of 38 million glucose samples collected from 2500+ people across 4 countries. Participants include persons living with type 1 diabetes, type 2 diabetes, prediabetes, and no diabetes. To support researchers and innovators in using this rich collection of diabetes datasets, we present a comparative analysis to guide algorithm developers in data selection. Additionally, we conduct a case study on blood glucose prediction, one of the most common AI tasks within the field. Through this case study, we provide a benchmark for short-term blood glucose prediction across all 10 publicly available diabetes datasets within the Glucose-ML collection. We show that the same algorithm can have significantly different prediction results when developed or evaluated with different datasets. Findings from this study are then used to inform recommendations for developing robust AI solutions within the diabetes or broader health domain. We provide direct links to each longitudinal diabetes dataset in the Glucose-ML collection and openly provide our code.

Direct link to paper

Related Publications

DiaTrend: A dataset from advanced diabetes technology to enable development of novel analytic solutions

Published, 2023

Temiloluwa Prioleau, Abigail Bartolome, Richard Comi, Catherine Stanger

This publication provides a detailed description of the DiaTrend dataset, which our team is making open access to the broader community, to accelerate development of novel data-driven solutions and robust computational tools for diabetes and beyond.

View publication