Sleep studies collect comprehensive physiological information on participants while they sleep overnight in a lab. However, each study collects far more information than a human being could feasibly analyze. A recent study from the Stanford School of Medicine found that machine learning models can make very accurate disease predictions from single-night sleep data.
A machine learning model known as SleepFM was trained in a self-supervised manner on four pre-existing datasets totaling 65,000 participants.
After learning associations from its training data, the model was then tested on 7,455 participants’ records that were purposefully excluded from the training.
The model was evaluated using the area under the receiver operating characteristic curve (AUROC), a standard measure of how well a model separates two outcomes.
It works by presenting the model with pairs of participants: one who became sick within six years and one who remained healthy.
The model must guess which of the two became sick. A score of 0.5 is what random guessing would achieve, while a score of 1.0 means the model always picks correctly.
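The pairwise comparison described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the study's evaluation code: the function name pairwise_auroc and the risk scores are invented for the example, and ties are assumed to count as half a correct guess.

```python
from itertools import product


def pairwise_auroc(scores_sick, scores_healthy):
    """Fraction of (sick, healthy) pairs in which the sick participant
    received the higher risk score. Ties count as half a correct guess."""
    correct = 0.0
    for sick_score, healthy_score in product(scores_sick, scores_healthy):
        if sick_score > healthy_score:
            correct += 1.0
        elif sick_score == healthy_score:
            correct += 0.5
    return correct / (len(scores_sick) * len(scores_healthy))


# Hypothetical risk scores, made up for illustration only.
sick = [0.9, 0.7, 0.4]          # participants who later became sick
healthy = [0.6, 0.3, 0.2, 0.1]  # participants who stayed healthy

auroc = pairwise_auroc(sick, healthy)
print(round(auroc, 3))  # 11 of the 12 pairs are ranked correctly
```

A model that gives every participant the same score would land at exactly 0.5 under this scheme, which is why 0.5 is treated as the baseline for random guessing.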
The study found that sleep data was less effective at predicting certain types of illnesses.
For example, sleep data was particularly ineffective at predicting a patient’s likelihood of having an infectious disease in six years’ time.
By contrast, the model was good at predicting various diseases that have previously been linked to sleep disturbances. These included prostate and breast cancer, along with Parkinson’s disease.
The study mentions a variety of limitations with the predictive model. One key limitation is sampling bias. Because the model was trained on data from people who were referred to sleep clinics, the training data does not represent the general population.
This means that the model is not trained to connect the nuances of sleep in regular, non-referred individuals to specific health outcomes. Furthermore, the model is trained on only a subset of illnesses.
The development team purposefully excluded very rare illnesses and grouped diagnoses into “buckets” of similar illnesses. This means that the model might struggle with unusual illnesses that require human intervention to diagnose.
However, the SleepFM model managed to integrate a variety of predictors to reach conclusions that would be prohibitively time-consuming for a human to work out.
Similar but more advanced models can assist doctors with assessing patients’ long-term health risks, without long-term monitoring.
The self-supervised training approach means that improved versions of these models can be developed without significant human effort to label data. Future models could therefore be trained on more representative data more easily, making them useful for the general public.
