Faculty Interview: Data Science Ethics & Policy

Author: Staff


Data Ethics Play a Critical Role in Protecting People & Society

As the world becomes increasingly digital and data-driven, the role of data scientists, and their commitment to data ethics, grows in importance and impact. This drives the University of Notre Dame’s mission to ensure that students understand and can apply data ethics to make the most positive impact possible in their careers.

“Automatic processes of action, recommendation, and knowledge are being embedded in systems all around us—on our phones, on our computers, etc. Handling this information is a very big responsibility. If it’s mismanaged, it can have big impacts on millions of end users,” says John Behrens, professor of the practice and concurrent professor of data science in the Department of Applied and Computational Mathematics and Statistics.

As he describes it, ethical data science means appropriately serving the end user and society at every stage of the data science process, from collecting and protecting data to applying the most appropriate data science techniques and communicating results in a way that leads to positive impacts.


Why Applied Data Science Ethics Are So Important Today

In the past, data science was primarily focused on learning from data and communicating findings to other humans. Today, computers are heavily involved in the process. As computing becomes ubiquitous, data science acts as the central point that determines how computers will act and respond to human actions.

Data science outputs are no longer only designed for humans to understand. Instead, outputs may recommend topics in your social media newsfeed or suggest products to purchase based on your interests and previous buying history.

“Using data science, computers are doing things not necessarily for insight but for action in the world,” Behrens says. “As a result, data scientists need to understand what the algorithms are doing, how they interact with the data, and what the societal impacts are. For example, what are the consequences if everyone receives different news, and everyone views what’s happening in the world differently as a result?”


Learning About Ethical Data Science Practices

In his Ethics and Policy in Data Science course, Behrens draws on his 20+ years of experience leading data science teams to prompt discussions about the philosophical basis for data science and data ethics, as well as the data science process.

“We teach data ethics in a very practical way. We go through the data science process and discuss relevant issues at each stage. It’s very hands-on and concrete while also focusing on larger societal contextual issues. We want to create leaders in data science and help students get beyond basic vocabulary to use real tools so they can become the best—and most ethical—data scientists they can be.”

-John Behrens, Professor of the Practice and Instructor for Data Science Ethics and Policy, Notre Dame

In class, Behrens encourages students to think about questions like:

  • What are the goals for the project, and what are the ethical implications of those goals?
  • What types of bias, limitations, or blind spots are you building into how your project is staffed and managed?
  • Is data being collected and secured following best practices?
  • Are the most appropriate statistical techniques being used?
  • What are the relevant legal and professional frameworks that apply to this work?

After covering project definition and organizational dynamics, the class turns to best practices and ethics in data collection and data management, followed by statistical and computing best practices. Conversations then shift to how information will be used and communicated, and what its effects will be.

Throughout the course, students create their own data ethics checklists at each stage of the data science process. By the end, they have a practical workbook they can use in their future work to guide decision-making on any project across all phases of the data science cycle—from the ethics of data collection to communicating results.

“To make it concrete, we read papers, have discussions, and talk a lot about what’s going on in the world—and with everyone’s work,” he explains. “We bring a societal lens to what’s learned in the classroom, and it empowers students to see the world in new ways.”

One case study he references frequently involves a large technology company that used predictive modeling to automate talent searches. Because the algorithms were trained on data with historical bias against hiring women for technical roles, women were often excluded from the lists of hiring recommendations even when they were highly qualified.

Behrens uses this as an example of what can happen when data scientists and organizational leaders are not thinking deeply about their processes, the meaning in the data, or data ethics principles, along with the statistical and societal implications.
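As a rough illustration of the kind of check that could surface a pattern like this, the minimal Python sketch below compares how often a model recommends candidates from each group. It is not the method used by the company in the case study or one taken from the course. The function names, the group labels, and the toy records are all hypothetical, and a low ratio would be a prompt for closer review, not proof of bias.

```python
from collections import defaultdict


def selection_rates(candidates):
    """Return the fraction of candidates recommended, per group.

    `candidates` is an iterable of (group_label, was_recommended) pairs.
    """
    totals = defaultdict(int)
    recommended = defaultdict(int)
    for group, was_recommended in candidates:
        totals[group] += 1
        if was_recommended:
            recommended[group] += 1
    return {group: recommended[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Return the lowest selection rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        # Nobody was recommended, so there is no gap between groups.
        return 1.0
    return min(rates.values()) / highest


# Made-up records for illustration only: (group label, recommended by the model?)
toy_candidates = [
    ("women", True), ("women", False), ("women", False), ("women", False),
    ("men", True), ("men", True), ("men", False), ("men", False),
]

rates = selection_rates(toy_candidates)
ratio = rate_ratio(rates)
print(rates, ratio)
```

A ratio close to 1 means the groups are recommended at similar rates. A much lower value suggests the training data and model deserve the kind of scrutiny the case study calls for.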

During the course, he also weaves in discussions about the Blueprint for an AI Bill of Rights, which was released in 2022 by the White House Office of Science and Technology Policy to guide the design, use, and deployment of automated systems through five protection principles:

  1. Protection from unsafe and ineffective systems
  2. Protection from algorithmic discrimination and assurance that systems will be designed and used in an equitable way
  3. Protection from abusive data practices
  4. Access to plain-language explanations of when automated systems are being used and how their outcomes could affect users
  5. The ability to opt out from automated systems and use human alternatives when appropriate


The World Needs Ethical Data Science Leaders

Careers are growing across the field of data science as organizations look to hire ethical data science professionals who can conduct system and process audits and create a framework for data ethics. As processes become more automated, there’s a greater need for people who understand data science and data science ethics at multiple levels and can think about both technical and societal issues.

“We want students to understand the responsibility and potential impacts and be able to articulate those,” explains Behrens. “In the future, if they’re in a meeting and someone says, ‘Let’s collect this data because we can,’ our students will know enough about the societal and legal implications to say, ‘Is this the best approach? Maybe we need to talk to legal counsel or product management first.’ This makes them more than just technical contributors—they become leaders.”

Learn more about Notre Dame’s online master’s in data science program and its focus on educating three-dimensional data scientists who are grounded in solid philosophical, ethical, and computational frameworks that support technical and organizational leadership excellence.