Avoid Collecting Too Much Data

By Martin Shelton

There seems to be no end to the creepy ways that companies want to use our data.

Advertisers want to use facial recognition software to identify us in YouTube videos and to predict our future behavior. Samsung has released a ‘smart’ television with the ability to eavesdrop on conversations and send them back to headquarters and undisclosed third parties.

With each headline comes a flash of public outrage, some journalistic scrutiny, and, just perhaps, some critical self-reflection from technologists. But what about journalistic institutions? What are they collecting about their readers, how are they communicating that collection to their readers, and how are they using the information they collect?

If journalism’s first loyalty is to citizens, we need to reflect on how to leverage user data on a news organization’s website while remaining transparent and respectful, and without creeping out the same people who are being asked to trust the reporting.

Monitoring the audience

Like companies in almost every other industry on the internet, news organizations use a variety of monitoring tools to better understand their audience. That means embedded browser-tracking cookies, and analytics tools such as Hootsuite, Chartbeat, Carebot, NewsLynx, Google Analytics, and Parse.ly that monitor on-site and social media behavior.

There are two main reasons for this. The first is engagement: news organizations care deeply about connecting with their audience and having it participate in their journalism. They typically do this through on-site forms, comment sections, publishing journalists’ email addresses, and talking with readers on social media.

The other reason is economic. Some cookies are embedded by on-site advertisers. Other analytics, when used appropriately, can track readership behavior, measure on-site activity to report to advertisers, and create opportunities to offer targeted subscriptions and other promotions to the audience.

But while news organizations argue over which metrics matter (page views vs. attention, bots vs. real users), the norms around the technologies used to track the audience are not well established.

When does it become creepy?

When confronted with the reality of what happens on the internet, people are often creeped out by the constant collection and analysis of their data.

Many studies have found that people sometimes consider tailored ads useful, but often feel uncomfortable with the lack of notice or choice when such ads are used. Highly targeted ads not only creep people out but may even backfire, making people less inclined to buy the advertised products. Just because we have data does not mean we should use it.

The perception of creepiness also depends on the context of data collection. For example, according to one study, users who trust the source collecting the data are generally less likely to be creeped out, and they may feel more comfortable still if the data are not shared outside the organization in question.

If news organizations want to be trusted and to avoid being seen as creepy, we have to ask what ‘creepy’ means. And here’s where we run into some problems.

There’s surprisingly little empirical research into what creepiness really is. Perhaps the only study on this topic asked 1,341 online survey participants to rate the creepiness of 44 personal attributes and behaviors, as well as several hobbies and careers. The researchers made a simple prediction: people are creeped out by situations they see as ambiguously threatening.

When the researchers measured responses to the various situations, they found that both men and women perceived men as creepier than women. Women were more likely to see sexual threat as creepy. People described certain physical characteristics as creepy, such as greasy hair or a peculiar smile. People also found it creepy when someone violates social norms and conventions, such as laughing at inappropriate times. Finally, participants were creeped out by certain unusual occupations (e.g., taxidermists, funeral directors); clowns were seen as the creepiest of all. All of these scenarios, the researchers argue, represent unclear threats that leave us unsure whether we should be truly concerned.

Unlike fear or disgust, the feeling of being creeped out may be strongest when a threat is uncertain. Think about the feeling of someone walking behind you on an empty street, or the feeling of being watched when you seem to be alone in the dark. (But are you?) Now transfer that feeling to the idea of handing over your personal data to someone standing behind you, whose face you can’t see. You don’t know how your data will be used or why that mysterious person even wants it, and you have no reason whatsoever to trust that they’ll act in your best interests. If we want people to feel okay about that kind of transaction, we need to clarify precisely how and why we want to collect and use their data.

Making policies on user data collection

It’s challenging to respond to collective, unclear feelings of creepiness with narrowly defined policy choices. Coming up with specific data policy decisions for advertising is not easy, especially as the current system is a key source of revenue for many news organizations. But if news organizations want to maintain the trust of their readers, then they have to consider greater transparency around how they collect and use user data. This also needs to be done in a context-specific way: policies on collecting user data for advertising are likely to be quite different from policies on inviting user submissions for publication. And explaining that difference in complicated legal language hidden behind a link in 5-point font at the bottom of the page isn’t transparency.

There are smart ways to leverage information from the audience respectfully, in a way that builds trust. Think about how ProPublica invited veterans to share their stories of surviving the Vietnam War and fighting to receive the benefits they were due. Think about how the LA Times opened a conversation with readers about their own and their families’ struggles with drugs. In these cases, people are encouraged to go out of their way to reveal personal information in order to participate in the news. This participation is contingent on trust in the news organization.

How can news organizations communicate to users how their data are likely to be used in a way that increases their trust in the organization?

I believe that journalistic institutions have an opportunity to take charge of the conversation on user data collection in the newsroom by asking some simple questions, and by bringing the audience into the conversation when crafting transparent data policies.

Here are some of the first questions for newsrooms to ask themselves when preparing to ask readers to share their private information:

What data will we collect?
What do readers understand about the kinds of data we’re collecting?
What are the circumstances under which that data will be gathered (e.g., through an explicit prompt, versus in the background)?
What are the circumstances under which the data are used (e.g., when we find a particularly interesting quote)?
How will the data be used and by whom?
How do we want to publish it (e.g., do we want to quote readers)?
How could it potentially be combined with other kinds of data?
Who, if anyone, do we share it with?
What inferences are likely to be drawn from the data alone about the people it describes?
What if, later on, we think of another use of the data that the users haven’t explicitly agreed to? How will we handle that?
What is the potential harm to users if the data were taken out of context?
How long do we want to keep the data?
Who will have access to the data?
What safeguards are we taking to protect the data?

It might not be possible to answer all of these clearly ahead of time. But to the extent that it is possible, by asking and answering questions such as these openly at the time of collection, newsrooms can demonstrate that they care about the audience’s continued involvement in their journalism, and that they care about the same things their readers do: transparency, ethical behavior, and above all, not being creepy.

North, M. (2015). Cool or creepy: Consumer comfort level with sentiment analytics. Issues in Information Systems, 16(3), 70-79.

Phelan, C. et al. (2016). It’s creepy but it doesn’t bother me. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 5240-5251.

Stevens, A. M. (2014). What is creepy? Towards understanding that eerie feeling when it seems the internet “knows” you. Academy of Management Proceedings, 2014.

Tene, O. & Polonetsky, J. (2013). A theory of creepy: Technology, privacy and shifting social norms. Yale Journal of Law & Technology 59.

Ur, B. et al. (2012). Smart, useful, scary, creepy: Perceptions of online behavioral advertising. SOUPS 2012, 4.

Zhang, H. et al. (2014). Creepy but inevitable? The evolution of social networking. Proceedings of Computer Supported Cooperative Work 2014, 368-378.

Photo by Barry Maas [CC BY-NC-ND 2.0]