By Nick Diakopoulos
A lot of research has used data analysis to identify different aspects of online behavior. Before I detail some of it below, I should add a note of caution: analytics are only as good as the decisions that end users make with them.
The complex interplay between computational tools and human actors in sociotechnical systems (such as online communities) means that great technology and analytics can still fall flat if the community policies aren’t “right”. Engagement editors and moderators need proper training, support and best practices in order to make useful decisions based on analytics.
And of course none of these papers has the definitive “answer” to any of these problems. Their findings are all worth studying, both as a basis for future study and also to test in different kinds of communities. They provide a vital orienting function for seeing what might be signal versus noise.
A constant feature of doing this academic work is the struggle to identify good data sets that we can study. If you run an online comment space and are interested in working with me or any of my colleagues, do get in touch.
A reading list on developing automated moderation tools for online communities
How to identify “trolls”
○ Justin Cheng, Cristian Danescu-Niculescu-Mizil, Jure Leskovec. Antisocial Behavior in Online Discussion Communities. Proc. ICWSM 2015. [This paper identifies antecedent behavior to someone being banned in news comments]
○ Eric Buckels et al. Trolls just want to have fun. Personality and Individual Differences. 2014.
○ So-Hyun Lee and Hee-Woong Kim. 2015. Why people post benevolent and malicious comments online. Communications of the ACM 58, 11 (October 2015), 74-79. DOI=http://dx.doi.org/10.1145/2739042
How to identify “toxic conversation” or other interesting events breaking out
○ Jing Wang, Clement T. Yu, Philip S. Yu, Bing Liu, and Weiyi Meng. 2012. Diversionary comments under political blog posts. In Proceedings of the 21st ACM international conference on Information and knowledge management (CIKM ’12). [Develops method to identify irrelevant or distracting comments that might derail a conversation]
○ Kushal Dave, Martin Wattenberg, Michael Muller. Flash Forums and ForumReader: Navigating a New Kind of Large-scale Online Discussion. Proc. CSCW. 2004. [Examines the use of visualization to orient users towards areas of interest within large conversation spaces]
○ Sara Sood, Judd Antin, and Elizabeth Churchill. 2012. Profanity use in online communities. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’12). ACM, New York, NY, USA, 1481-1490. DOI=http://dx.doi.org/10.1145/2207676.2208610
○ Sood, S.; Churchill, E.F.; Antin, J. Automatic Identification of Personal Insults on Social News Sites, Journal of the American Society for Information Science and Technology (JASIST), (2012)
○ Sood, S. and Churchill, E.F. Anger Management: Using Sentiment Analysis to Manage Online Communities. Grace Hopper Celebration of Women in Computing Conference, Sept 28th - Oct 2nd 2010.
How to score a discussion in terms of predicted length, overall quality, diversity, and readability
○ Length
■ Tae Yano, Noah A. Smith. What’s Worthy of Comment? Content and Comment Volume in Political Blogs. ICWSM 2010. [Uses topic modeling to improve the accuracy of predicting comment volume on blogs]
■ Manos Tsagkias, Wouter Weerkamp, and Maarten de Rijke. 2009. Predicting the volume of comments on online news stories. In Proceedings of the 18th ACM conference on Information and knowledge management (CIKM ’09). [Develops a classifier that’s pretty good at telling you if an article will get any comments, but less good at telling you how many]
■ Lars Backstrom, Jon Kleinberg, Lillian Lee, and Cristian Danescu-Niculescu-Mizil. 2013. Characterizing and curating conversation threads: expansion, focus, volume, re-entry. In Proceedings of the sixth ACM international conference on Web search and data mining (WSDM ’13). ACM, New York, NY, USA, 13-22. DOI=http://dx.doi.org/10.1145/2433396.2433401
○ Quality
■ Nicholas A. Diakopoulos. 2015. The Editor’s Eye: Curation and Comment Relevance on the New York Times. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW ’15). [Calculates article relevance and conversational relevance and shows that those scores correlate with NYT Picks status]
■ Nicholas Diakopoulos. Picking the NYT Picks: Editorial criteria and automation in the curation of online news comments. #ISOJ Journal (5), 1. 2015. [Scours the literature to expose 12 different factors that may be associated with editorial notions of “quality” comments. See references there to other literature related to quality]
■ D. Park, S. Sachar, N. Diakopoulos, and N. Elmqvist. Supporting Comment Moderators in Identifying High Quality Online News Comments. Proc. Conference on Human Factors in Computing Systems (CHI). 2016. [Synthesizes the above two papers into a visual analytic system for comment moderators]
■ Kevin Coe et al. Online and Uncivil? Patterns and Determinants of Incivility in Newspaper Website Comments. Journal of Communication. 2014. [How to define and measure incivility in online comments, which may be a good proxy for “quality”]
■ Chiao-Fang Hsu, Elham Khabiri, and James Caverlee. 2009. Ranking Comments on the Social Web. Proceedings of the 2009 International Conference on Computational Science and Engineering. [Develops several quantitative measures for ranking quality comments]
■ Stromer-Galley, J. (2007). Measuring deliberation’s content: A coding scheme. Journal of Public Deliberation, 3(1), Article 12. Retrieved from http://www.publicdeliberation.net/jpd/vol3/iss1/art12
■ Annie Louis and Ani Nenkova. What Makes Writing Great? First Experiments on Article Quality Prediction in the Science Journalism Domain. Proc. ACL 2014. [Develops textual features for quality writing]
○ Diversity
■ Emma Pierson. 2015. Outnumbered but Well-Spoken: Female Commenters in the New York Times. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW ’15).
■ Fiona Martin. Getting my two cents worth in: Access, interaction, participation and social inclusion in online news commenting. #ISOJ Journal (5), 1. 2015.
○ Readability
■ Daniel Oelke et al. Visual Readability Analysis: How to Make Your Writings Easier to Read. Proc. VAST 2010. [Examines many different metrics to measure readability including word length, vocab complexity, use of nominal forms, sentence structure complexity]
■ Emily Pitler and Ani Nenkova. Revisiting Readability: A Unified Framework for Predicting Text Quality. Proc. EMNLP 2008.
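As a rough illustration of the kind of surface metrics mentioned in the readability work above (word length, vocabulary complexity, sentence structure), here is a minimal Python sketch. It is not the method of any of the papers listed; the function name `readability_features` and the three features chosen are assumptions made for illustration, and real systems use far richer features.

```python
import re


def readability_features(text: str) -> dict:
    """Compute three crude readability proxies for a comment (illustrative only)."""
    # Split on runs of sentence-ending punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of letters and apostrophes as words, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return {
            "avg_word_length": 0.0,
            "avg_sentence_length": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        # Longer words are a rough proxy for harder vocabulary.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Words per sentence is a rough proxy for sentence complexity.
        "avg_sentence_length": len(words) / len(sentences),
        # Distinct words over total words is a rough proxy for vocabulary variety.
        "type_token_ratio": len(set(words)) / len(words),
    }


if __name__ == "__main__":
    print(readability_features("The cat sat. The cat ran!"))
```

Scores like these would only be a starting point: they would need to be validated against human judgments in a particular community before being used to rank or filter comments.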
Other analytic goals that may be useful:
Political affiliation. Identifying political worldview could help put moderation decisions in context, or enable end-users to sort effectively based on the positions they’re more likely to find appealing.
○ Souneil Park, Minsam Ko, Jungwoo Kim, Ying Liu, and Junehwa Song. 2011. The politics of comments: predicting political orientation of news stories with commenters’ sentiment patterns. In Proceedings of the ACM 2011 conference on Computer supported cooperative work (CSCW ’11).
○ Felix Ming Fai Wong et al. Quantifying Political Leaning from Tweets and Retweets. Proc. ICWSM 2013.
Opinion, subjectivity, and facticity. Identifying these facets would support users who want to focus on the more “emotional” or subjective reactions of a community, or, on the other hand, on the more objective comments.
○ Laura Dietz, Ziqi Wang, Samuel Huston, and W. Bruce Croft. 2013. Retrieving opinions from discussion forums. In Proceedings of the 22nd ACM international conference on Conference on information & knowledge management (CIKM ’13).
○ Prakhar Biyani, Cornelia Caragea, Amit Singh, and Prasenjit Mitra. 2012. I want what i need!: analyzing subjectivity of online forum threads. In Proceedings of the 21st ACM international conference on Information and knowledge management (CIKM ’12).
○ Naeemul Hassan et al. Detecting Check-worthy Factual Claims in Presidential Debates. Proc. CIKM 2015.
Criticality. Identifying critical commentary can help alert journalists to issues that need correction in the short term (e.g. immediate fix needed) or longer term (e.g. how framing might need to be updated in subsequent follow-on articles).
○ J. Hullman, N. Diakopoulos, E. Momeni, E. Adar. Content, Context, and Critique: Commenting on a Data Visualization Blog. Proc. Conference on Computer Supported Cooperative Work (CSCW). March, 2015.
Personal Experience. Identifying personal stories and anecdotes supports journalists looking for humanizing information.
○ D. Park, S. Sachar, N. Diakopoulos, and N. Elmqvist. Supporting Comment Moderators in Identifying High Quality Online News Comments. Proc. Conference on Human Factors in Computing Systems (CHI). 2016. [Also listed in the Quality section above]
Demographics. Inferring age and gender may help support diversity calculations on threads.
○ Schwartz HA, Eichstaedt JC, Kern ML, Dziurzynski L, Ramones SM, Agrawal M, et al. (2013) Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach. PLoS ONE 8(9)
Expertise. Identifying expertise levels in different topics could support journalistic sourcing of new content, and a contextual understanding of which comments may be worth highlighting in a particular section.
○ Ido Guy, Uri Avraham, David Carmel, Sigalit Ur, Michal Jacovi, and Inbal Ronn. 2013. Mining expertise and interests from social media. In Proceedings of the 22nd international conference on World Wide Web (WWW ’13). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 515-526.
○ Alexandru Lucian Ginsca and Adrian Popescu. 2013. User profiling for answer quality assessment in Q&A communities. In Proceedings of the 2013 workshop on Data-driven user behavioral modelling and mining from social media (DUBMOD ’13). ACM, New York, NY, USA, 25-28. DOI=http://dx.doi.org/10.1145/2513577.2513579
○ Aditya Pal. 2015. Discovering Experts across Multiple Domains. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’15). ACM, New York, NY, USA, 923-926. DOI=http://dx.doi.org/10.1145/2766462.2767774
○ Tyler Munger and Jiabin Zhao. 2015. Identifying Influential Users in On-line Support Forums using Topical Expertise and Social Network Analysis. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015 (ASONAM ’15), Jian Pei, Fabrizio Silvestri, and Jie Tang (Eds.). ACM, New York, NY, USA, 721-728. DOI=http://dx.doi.org/10.1145/2808797.2810059
Nick Diakopoulos is an Assistant Professor at the University of Maryland, College Park College of Journalism and a member of the UMD Human Computer Interaction Lab (HCIL).
The photo has unrestricted use license and was taken by imelenchon of morguefile.com.