Three Important Skills for Data Scientists in Healthcare

Author: Steven Buechler

Four people look at a computer screen in a science lab.

Use of data has long been an important aspect of healthcare.  

Since the beginning of modern medicine, biostatisticians have analyzed data from clinical trials to enable rigorous, evidence-based decision making in treating patients.

However, with the integration of more data into clinical practice, the impact of data science on patient care has significantly expanded. 

For example, an article in The New Yorker describes how data analysis in Camden, New Jersey revealed that one percent of patients accounted for one-third of the city’s healthcare costs. High costs are not simply a burden on the healthcare system but are also indicators of poor care. Using this information, doctors, social workers, and others were able to construct a plan to focus resources on this comparatively small population of unhealthy persons to improve their lives.

The skills data scientists in healthcare need are not unique to that area, but ethics, communication, and teamwork are especially important in healthcare. The lack of such skills may limit bringing new tools into clinical use.

Understanding the Ethical Complexities of Using Health Data

Our data is increasingly being stored electronically, and healthcare is no exception. This data includes drug prescriptions, diagnostic test results, treatment notes, family medical history and insurance claims, all of which can be integrated with demographic and employment data to identify, using data science, opportunities to improve the wellness of individuals and populations.

While the potential value from data integration is high, the risk for abuse is just as high.

By its very nature, health data is intensely private, and divulging this data can lead to abuse. For example, should a company know that one of their new employees had cancer two years ago?

Concerns about privacy and the rights of individuals to make informed decisions about the use of their health data significantly constrain the use of medical data. The data scientist working in healthcare must understand and adhere to the policies and procedures put in place to protect people.

Data scientists should embrace the opportunity to use their skills and not take a back seat in discussions of rights to privacy. Their understanding of how data can be used may enable them to identify areas of health risk before others.

Data scientists may also identify ways to protect a subject’s privacy while sharing medical data. In order to be a full participant in medical research, the data scientist in healthcare must be well-versed in the ethical uses of data.

Working on Multidisciplinary Teams and Communicating with Healthcare Professionals

The goals of a data scientist working on a movie recommendation system are fairly clear. Surveying a few friends and relatives may yield a sufficient sample to produce a tool that serves the majority of users.

In contrast, understanding the problems faced by healthcare workers can be very difficult.

An oncologist planning treatment of a breast cancer patient bases a decision on results of multiple diagnostic tests and decades of medical research. Treatment of a patient in a large medical center by a team of nurses, radiologists, physical therapists, and others, may result from a complicated chain of referrals, with almost no one seeing the full picture. To have a positive impact on a healthcare process, a data scientist must understand the problem at hand. This typically requires reading medical literature and talking to doctors and other professionals.

The data scientist’s work is not done once they understand the problem and produce a solution. The next step is educating the doctors on how the product can change the way they work. This not only requires knowing how the doctors work but also requires knowing how to discuss findings with this medical audience.

In all stages of the data-intensive healthcare problem-solving process, communicating with the future users is essential to success.

How My Team Collaborates to Improve Breast and Colon Cancer Treatment

My own work in healthcare data science began around 2004. Before that time, my research in the Department of Mathematics at Notre Dame was in mathematical logic, far removed from applications. I eventually became interested in molecular biology, especially in the area of cancer. When I learned of all the opportunities available through computational and statistical analysis of molecular data, including those with a direct clinical application, I decided that would be my future.

Over the intervening years, I have invented and patented a number of molecular diagnostic tests for deciding on choices between different drug therapies in breast cancer and colon cancer. Early in this process, I formed a collaboration with a clinical pathologist and a molecular biologist at Indiana University School of Medicine. They have been invaluable in ensuring that my work focuses on diagnostic tests with clinical utility and communicating these results to the medical community.

To that end, we first present our results as posters at large international meetings of clinical oncologists. This gives us the opportunity to practice explaining the results to dozens of oncologists. Over these poster sessions, we often refine our “pitch” to match what seems to make the most sense to them. My colleagues and I have formed a company to continue moving these inventions towards clinical use.

The strength of my team of three collaborators lies in the special knowledge held by each member and our commitment to sharing this knowledge. This collaborative effort enables us to invent products that are beyond the skills of each individual. There are few instances in healthcare in which a single person knows enough to analyze a clinical problem, solve it using data, and market it to the clinical community. Teamwork is critical to the success of a data science effort in healthcare. The data scientist working in such a team must be dedicated to communicating with team members in a shared effort in order to be successful.

Preparing Data Scientists with the Skills Employers are Eagerly Seeking

The online master’s in data science program in the Department of Applied and Computational Mathematics and Statistics of the University of Notre Dame was designed to develop data scientists with the full range of skills needed to be successful in a wide variety of sectors, including healthcare.

Here, I focused on the importance of ethics, communication and teamwork in healthcare, however, virtually every company we interviewed when designing the program emphasized the importance of these skills in addition to technical expertise.

Bell Labs statistician John Tucky, considered one of the founders of data science, famously said, “The greatest thing about being a statistician is that you get to play in everyone’s backyard.” This statement is also true for data scientists. Working in the backyard of healthcare can be enormously satisfying for the data scientist who has developed all of the necessary skills.