Digital Epidemiology

Today I came across two interesting sessions given by Laure Gossec from France and William Dixon from the United Kingdom. They talked about digital epidemiology in a changing world, where there are approximately 15 billion connected devices and more than 80,000 medical apps.

In the past, the only players in epidemiology studies were rheumatologists and patients who voluntarily shared data. However, these roles are changing: patients are going from active to passive givers of information. Popular devices used to collect passive information include smartphones, cars, watches and activity trackers. Less commonly used sources include clothes, houses, UV patches and connected pill boxes. An important source of passive data is people looking up their symptoms in search engines like Google. They believe they are looking for answers, but they also end up giving data. For example, the onset of an influenza epidemic is correlated with Google searches for symptoms.
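To make the search example concrete, here is a minimal sketch of how one might check whether symptom searches run ahead of reported cases. The weekly numbers are made up for illustration and were not shown in the sessions; the function names `pearson` and `lagged_correlation` are my own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(searches, cases, lag):
    """Correlate searches in week t with cases in week t + lag."""
    if lag == 0:
        return pearson(searches, cases)
    return pearson(searches[:-lag], cases[lag:])

# Hypothetical weekly counts, invented for this example only.
weekly_searches = [120, 130, 180, 260, 400, 520, 480, 350, 220, 150]
weekly_cases = [10, 11, 12, 17, 25, 39, 51, 47, 34, 21]

same_week = lagged_correlation(weekly_searches, weekly_cases, lag=0)
one_week_later = lagged_correlation(weekly_searches, weekly_cases, lag=1)
print(f"same week: {same_week:.2f}, cases one week later: {one_week_later:.2f}")
```

If the correlation is higher at a one-week lag than in the same week, searches are rising before cases do, which is the kind of early signal the speakers were describing. Real data would of course be noisier than this toy series.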

Registries currently provide actively collected data. Linking data sources such as biobanks, claims and electronic medical data provides a rich resource. Genetic datasets containing large-scale sequences are now available. NIH collects data on 700 patients on a daily basis and has archives of 560,000 individual samples. All of these are considered “Big data”. Big data has three components:

1) Volume: the amount of data, in terms of number of patients or data points

2) Velocity: mainly the frequency of input

3) Variety: mainly the heterogeneity of data

Some unusual examples of “Big data” studies using sociological data are the neighborhood environment-wide association study (NE-WAS) and the use of social media disease reporting for qualitative analysis.

With all the available data, it is important to develop techniques to analyze it accurately, make better use of new prediction models and mitigate bias. Machine learning is the newest approach available for this and may have an advantage because it does not start from a causal hypothesis. A machine has the ability to learn without being explicitly programmed. It comes up with prediction models that work best after several iterations.
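As a rough illustration of a model improving over iterations, here is a small sketch of logistic regression trained by gradient descent. This is my own toy example, not a method presented in the sessions: the data (daily step counts from an activity tracker and whether the patient reported a flare) are invented, and the names `train` and `log_loss` are hypothetical.

```python
from math import exp, log

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def log_loss(w, b, xs, ys):
    """Average prediction error (cross-entropy) of the current model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * log(p) + (1 - y) * log(1 - p)
    return total / len(xs)

def train(xs, ys, steps=200, lr=0.1):
    """Fit weight w and intercept b by repeated small corrections."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # predicted risk minus outcome
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
        history.append(log_loss(w, b, xs, ys))
    return w, b, history

# Invented data: daily steps in thousands, centered on the group average,
# and whether a flare was reported (1) or not (0).
steps_centered = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]
flare = [1, 1, 1, 0, 1, 0, 0, 0]

w, b, history = train(steps_centered, flare)
print(f"loss after first pass: {history[0]:.3f}, after last pass: {history[-1]:.3f}")
```

Nobody tells the program that fewer steps go with more flares. It starts with no opinion (`w = 0`) and adjusts itself a little on each pass, and the loss shrinks as the passes accumulate. Note that this only finds an association in the data; it says nothing about cause.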

It is important to recognize the challenges that come with digitalization and “big data”:

- Ownership of data and privacy policies for data protection

- Commercial use of the data by pharmaceutical companies, corporate giants and insurance companies, which may be worrisome

- Losing sight of priorities, given so much available data

In an ideal digital epidemiology world, we could have “cradle to grave” records in order to predict and improve patient outcomes.