Consider The Ethics of Your Data

By Tara Adiseshan, former Knight-Mozilla OpenNews Fellow with The Coral Project

A version of this piece was first published on The Coral Project blog in November 2015

As we think about metrics at The Coral Project, I’ve been thinking about what we can measure and how that might be different from what we should measure. Luckily, there are lots of folks in this space who have done interesting research. I’m going to start posting some of the questions that I’ve been thinking about and some of the work I’m inspired by.

How do we make sure that the metrics we collect don’t penalize newcomers?

It can be easier to figure out whether you trust someone after you've known them for a long time, and the same is frequently true of trust scores or reputation scores. The problem is that when metrics are based on long user histories on a site, newcomers have little or no history to be judged on, which makes it difficult to make fair decisions about them. I've really enjoyed reading Aaron Halfaker's work on the treatment of good-faith newcomers in the Wikipedia community. In particular, Aaron's work described how the tools used to maintain the quality of Wikipedia contributions also contributed to the community's decline.

How do we best respect the privacy and safety of the community members as we build these metrics?

As someone who gets very excited about data science and anti-surveillance efforts, I've been thinking a lot about how we can build metrics that respect the privacy and safety of community members. It was helpful to learn about Tor's usability research and the guidelines the Tor team follows as it works to make its browser more user-friendly.

How do we make sure that the metrics we collect are inclusive?

I'm still looking for more research on this, but I've had many conversations about it with folks working in the comments and moderation space. For example, what does it mean to use vocabulary, or adherence to punctuation and grammar rules, as a way to judge quality? It is very possible to write a perfectly punctuated comment that still attacks individuals or community members. It is also very possible for a comment with spelling and grammar mistakes to offer an incredibly thoughtful and meaningful perspective.

That's one example, but it points to a broader question: how could our metrics be used for unintentionally exclusionary or intentionally malicious purposes? The code we write is political, and I think it's an important part of the design and development process to acknowledge that. Emma Pierson's work on gender parity in The New York Times' comments sections was a great read.

What kinds of feedback loops are involved in the metrics we collect?

Moderation, community response, and reputation scores can all be important parts of feedback loops that shape online spaces. I’ve heard a lot about Riot Games’s rehabilitative moderation approach in the past. I’ve also been reading Justin Cheng’s research on how community feedback shapes behavior.

If there’s any work that you’ve also found interesting, let me know on Twitter!

Thank you to all the folks who pointed me towards some of these questions and work, including the Coral team and Nate Matias.