Teaching Ethics for Visualization
Designing an assignment to help train visualization students in ethics.
It is perhaps no wonder that ethics is becoming an increasingly important dimension in educating tomorrow’s computing professionals. Today’s tech companies have proven over and over again that they are not capable of doing the right thing, or even willing to do it, given the choice. Thus, a small but growing group of professors is now turning its efforts towards adding ethical dimensions to computer science education; examples include the #EthicalCS initiative, Evan Peck’s EthicalCS course modules (including his article summarizing this effort), and Casey Fiesler’s efforts on technology ethics.
Data visualization is obviously part of computing, and thus subject to the same concerns. Accordingly, there have been some recent entries into the conversation on this front, particularly Michael Correll’s “Ethical Dimensions of Visualization Research” at CHI 2019, his paper on “Black Hat Visualization”, and his blog post urging us to do “virtuous data visualization”. In fact, Alberto Cairo’s most recent book is titled How Charts Lie: Getting Smarter about Visual Information, and deals with both the positive and the negative influences that charts can have on the perception of truth.
So, when it was time for me to finalize the syllabus for my data visualization graduate course at University of Maryland, College Park for Fall 2019, I knew that I needed to include an assignment on ethics for data visualization. I am writing about this here in the hope that my experience can be useful to other faculty teaching data visualization. I’d love to hear from you in case you end up using some of these ideas. Of course, this is somewhat unfamiliar territory to me, so I am also open to feedback on how this assignment could be improved. Let me know either way!
Design Rationale
The EthicalCS courses I know about tend to include modules designed to combine teaching algorithms with ethics; examples include Evan Peck’s “ethical decision engine” for driverless cars as well as his resume screening algorithm. This “double duty” of teaching both introductory CS concepts (algorithms and data structures) at the same time as ethical dimensions is very satisfying. However, ethics in data visualization is typically not associated with algorithms or data structures, but rather — not surprisingly — with the actual data and its visual representation. For example, in Michael Correll’s “black hat visualization”, the four categories of attacks that a malicious designer can employ — breaks of convention, data manipulation, obfuscation, and nudging — all deal with either the raw data or its visual presentation.
Based on these examples, I wanted to construct an assignment where students would learn how to visualize data while also considering the ethical impact of their design decisions. Rather than focusing only on malicious aspects (i.e., “black hat”), I also wanted them to think about how to be as clear and transparent as possible (in computer security, this would be a “white hat hacker”). Juxtaposing white hat and black hat versions of the same data would yield maximum effect and let students easily contrast the versions. Finally, in line with prior writings on this subject, the assignment should allow students to work on both data and visualization aspects.
Finding appropriate datasets that can afford both black hat and white hat interpretations can be time-consuming. Since I wanted students to focus on data manipulation and visualization in their work, I handpicked the data for them based on Michael Correll’s thoughts on “data advocacy,” i.e., selecting data that is actually important and (again) serves double duty in highlighting injustices while providing learning opportunities. I found three different datasets to give students some choice: U.S. mass shootings, OECD greenhouse gas emissions, and World Bank gender gap indices (see below).
Results and Lessons Learned
During Fall 2019, I taught INST 760, a graduate data visualization course. In total, I had 23 students enrolled in the course. The visualization ethics assignment (see below for the assignment text) was worth 10% of their final grade. All students submitted a solution to the assignment, and overall I was very pleased with their efforts: all students received a full score on their work.
Since I had to create the assignment from scratch and a lot of deadlines got in the way, I was only able to release the assignment in early October, with an early November deadline. In retrospect, releasing the assignment earlier in the semester would have allowed it to better serve the double duty of teaching visualization design and ethics together. It would also have enabled me to include a proper ethics module in the lecture itself, where the students and I could have discussed their solutions in class. This semester, there simply was not enough time for such a module given project deadlines and presentations. Another improvement may be to have students write public blog posts on their black hat and white hat visualizations, perhaps similar to my existing assignment on explaining visualization research to a popular audience.
Finally, I gave very little guidance on how students should actually break conventions, manipulate data, obfuscate, or nudge to create their black hat visualizations. An important improvement for the next offering is to give more concrete guidelines on such techniques. For example, I may want to draw upon Alberto Cairo’s new book How Charts Lie, which had not yet been released by the time I created my assignment (it came out on October 15). While you could argue that naming and describing such dishonest or outright malicious techniques is irresponsible, I am of the firm opinion that acknowledging and explaining them is important for a visualization professional. After all, design always has an ethical dimension, and ignoring it will not make it go away.
I just received the results from my teaching evaluations, and this ethics assignment was called out several times as one of the highlights of the semester. Based on this, as well as the informal feedback I received from my students in class, this is definitely something I will continue in future years.
Below follows the text of the visualization ethics assignment. Feel free to use any of this material in your own assignment. Please let me know if you do!
Assignment: Visualization Ethics
Data is increasingly becoming part of the lifeblood of modern society: today’s citizens are collecting, consuming, and interpreting data in virtually every part of their lives. They are using this data to make decisions ranging from where to shop and what to eat all the way to which house to buy, which school to send their kids to, and which political candidates to vote for. Needless to say, this means that the veracity and provenance of all this data becomes increasingly important. In particular, presenting this data appropriately becomes an ethics and fairness concern.
Overview
In this assignment, you will be visualizing a single dataset from two different perspectives: a “white hat” and a “black hat” one. White hat and black hat are terms from computer security, where a white hat hacker is someone who uses their skills for good, i.e., finding vulnerabilities in software and systems to help companies and their customers, whereas a black hat hacker uses them for their own (or their organization’s or country’s) gain.
More specifically, a white hat visualization would be one where:
- The visualizations are clear and easy to interpret for the intended audience (often the general population);
- Any transformations, filtering, and computations done to the data are clearly and transparently communicated; and
- The sources of the data, including potential bias, are communicated.
A black hat visualization, on the other hand, exhibits one or several of the following characteristics:
- The visual representation is intentionally inappropriate, overly complex and/or too cluttered for the audience;
- Labels, axes, and legends are misleading;
- Titles are skewed to intentionally influence the viewer’s perception;
- The data has been transformed, filtered, or processed in an intentionally misleading way; or
- The source and provenance of the data are not clear to the viewer.
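To make the point about misleading axes concrete, here is a minimal sketch of how a truncated value axis exaggerates a trend in a bar chart. It is in the spirit of Edward Tufte's "lie factor" (the change shown in the graphic divided by the change in the data). The function names and the yearly counts are hypothetical and invented for illustration; they are not taken from any of the assignment datasets.

```python
def bar_heights(values, baseline=0.0):
    """Drawn bar heights when the value axis starts at `baseline`."""
    return [v - baseline for v in values]


def lie_factor(values, baseline):
    """Ratio of the change shown by the bars to the change in the data.

    Compares the first and last values only. A result of 1.0 means the
    drawn bars are proportional to the data; larger results mean the
    chart exaggerates the change.
    """
    first, last = values[0], values[-1]
    data_change = (last - first) / first
    drawn = bar_heights(values, baseline)
    drawn_change = (drawn[-1] - drawn[0]) / drawn[0]
    return drawn_change / data_change


# Hypothetical yearly counts, invented purely for illustration.
counts = [100, 104, 108, 110]

white_hat = lie_factor(counts, baseline=0)   # axis starts at zero
black_hat = lie_factor(counts, baseline=95)  # axis truncated at 95

print(f"axis from 0:  lie factor {white_hat:.1f}")
print(f"axis from 95: lie factor {black_hat:.1f}")
```

With these made-up numbers, the data grows by 10%, but the bars on the truncated axis triple in height, so the chart overstates the change twentyfold. A white hat version would either start the axis at zero or clearly flag the truncation.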
Datasets
You will only be working with a single dataset, but to give you some flexibility, you get to choose from three possible datasets. Please only choose one dataset from this list. These datasets are intentionally chosen to cover politically charged topics for the simple reason that these are typically the type of data where ethical visualization is important. Be warned that some of these topics can be disturbing; avoid datasets that are difficult for you personally.
Note that you do not have to visualize the entire dataset! You may choose a subset of the data to visualize. This is useful because some of the datasets below are larger than others.
The three datasets are the following:
- U.S. Mass Shootings 1982–2019 (Mother Jones) — a spreadsheet of mass shootings in the United States (Source: Mother Jones)
- OECD Greenhouse Gas Emissions 1990–2017 — emissions from different countries over the years (Data: OECD Statistics)
- The World Bank’s Overall Global Gender Gap Index — data on the gender gap for salary across the world (Data: World Bank — Overall Gender Gap Index)
Submissions
Since you are visualizing this from two different perspectives — white hat vs black hat — you will actually be generating two visualizations in total: one “white” and one “black”. You are free to use any visualization technique and any visualization tool to generate each submission. The white and black visualizations do not have to use the same tool or technique.
Each visualization should consist of a single page with the following information:
- One or more visualizations;
- Title (short sentence) describing the visualization;
- One-paragraph description of what the visualization shows; and
- Legend (if necessary).
Clearly mark each visualization with “white hat” or “black hat”. Additional information you provide for each visualization depends on whether your visualization is white hat or black hat. For example, if you are creating a white hat visualization, you may want to clearly explain the source of your data, the organization that collected it, and how you have transformed it.
On the back of each visualization page, add a short description (3–4 sentences) explaining your motivation and design process in producing this visualization. Be sure to discuss the ethical considerations of your design choices.
Important: In the interest of ethical design, add the following disclaimer to the bottom of each page (clearly visible):
DISCLAIMER: This visualization was created as part of a visualization ethics assignment. Please use the information presented here with caution, as it may have been intentionally designed to be misleading.
This is an individual assignment. You may discuss the datasets and visualizations with your classmates, but you must create the visualizations and all writing yourself.
You may use whatever tool you prefer for this assignment, or even draw the visualizations by hand or using a drawing program.
Hints and Tips
- There are limits to what you can do for the black hat visualization; for example, it doesn’t make sense to make up data from whole cloth (but you may be forgiven for filtering inconvenient outliers).
- It is also interesting to consider that the notion of “black” vs “white” could depend on your political affiliation, personality, and background.
- However, strive to keep this from becoming an exercise in partisanship; instead, view black hat as being about obfuscation and white hat as being about transparency.
- For the white hat visualization, it is not sufficient to just create a standard visualization and be done with it; you need to actively work to make your visualization as clear and transparent as possible!
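As a rough illustration of the first and last hints, the sketch below filters an inconvenient outlier and then generates the kind of disclosure a white hat caption might carry. The same filtering step becomes black hat if the note is simply left out. All names and values here are hypothetical and invented for illustration; this is one possible way to document a transformation, not a prescribed method.

```python
from statistics import mean


def drop_above(values, threshold):
    """Split values into those kept (<= threshold) and those removed."""
    kept = [v for v in values if v <= threshold]
    removed = [v for v in values if v > threshold]
    return kept, removed


def transparency_note(original, kept, threshold):
    """Describe the filtering step the way a white hat caption might."""
    n_removed = len(original) - len(kept)
    return (
        f"Excluded {n_removed} of {len(original)} records above {threshold}; "
        f"mean changes from {mean(original):.1f} to {mean(kept):.1f}."
    )


# Hypothetical per-country values, invented purely for illustration.
emissions = [12, 15, 14, 13, 90]

kept, removed = drop_above(emissions, threshold=50)
note = transparency_note(emissions, kept, threshold=50)
print(note)
```

Dropping a single record here roughly halves the mean, which is exactly why the white hat version needs to say so.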