{"id":26,"date":"2019-01-31T21:43:21","date_gmt":"2019-01-31T21:43:21","guid":{"rendered":"http:\/\/groups.cs.umass.edu\/equate\/?p=26"},"modified":"2019-04-03T17:21:23","modified_gmt":"2019-04-03T17:21:23","slug":"natural-language-processing","status":"publish","type":"post","link":"https:\/\/groups.cs.umass.edu\/equate\/research\/natural-language-processing","title":{"rendered":"Disparities in Natural Language Processing"},"content":{"rendered":"<p><span style=\"font-weight: 400\">Do language technologies equitably serve all groups of people? \u00a0The way we speak and write varies across demographics and social communities &#8212; but natural language processing models can be quite brittle to this variation. \u00a0If an NLP system, such as machine translation or opinion analysis, works well for some groups of people but not others, that impedes information access and keeps authors&#8217; voices from being heard, since media communication is now filtered through search and newsfeed relevance algorithms. <\/span><\/p>\n<p><span style=\"font-weight: 400\">We are pursuing an interdisciplinary project to analyze language models&#8217; disparities across social communities, focusing on African-American Vernacular English, a major dialect with marked differences from mainstream English. While it is used widely in oral and social media communication, it has very little presence in the well-edited texts that make up traditional NLP corpora. \u00a0We have constructed a corpus of informal AAE from publicly available social media posts and found that a variety of NLP systems perform worse on this text, and have developed more equitable models for analysis tasks such as language identification and parsing. 
By bringing together sociolinguistics and computer science, this work seeks to support social scientific analysis and to use social science insights to inform the construction of more effective and fairer language technologies.<\/span><\/p>\n<p><!--more--><\/p>\n<h4>Publications<\/h4>\n<ul>\n<li><span style=\"font-weight: 400\"><a href=\"https:\/\/aclweb.org\/anthology\/D16-1120\" target=\"_blank\" rel=\"noopener\">Demographic Dialectal Variation in Social Media: A Case Study of African-American English.<\/a>\u00a0<\/span><span style=\"font-weight: 400\">Su Lin Blodgett, Lisa Green, and Brendan O&#8217;Connor.\u00a0<\/span><span style=\"font-weight: 400\">Proceedings of EMNLP 2016.<\/span><\/li>\n<li><span style=\"font-weight: 400\"><a href=\"https:\/\/arxiv.org\/pdf\/1707.00061.pdf\" target=\"_blank\" rel=\"noopener\">Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English.<\/a>\u00a0<\/span><span style=\"font-weight: 400\">Su Lin Blodgett and Brendan O&#8217;Connor.\u00a0<\/span><span style=\"font-weight: 400\">Fairness, Accountability, and Transparency in Machine Learning (FAT\/ML) workshop at KDD 2017.<\/span><\/li>\n<li><span style=\"font-weight: 400\"><a href=\"https:\/\/noisy-text.github.io\/2017\/pdf\/WNUT08.pdf\" target=\"_blank\" rel=\"noopener\">A Dataset and Classifier for Recognizing Social Media English.<\/a>\u00a0<\/span><span style=\"font-weight: 400\">Su Lin Blodgett, Johnny Tian-Zheng Wei, and Brendan O&#8217;Connor.\u00a0<\/span><span style=\"font-weight: 400\">3rd Workshop on Noisy User-generated Text (WNUT) at EMNLP 2017. 
<\/span><\/li>\n<li><span style=\"font-weight: 400\"><a href=\"http:\/\/aclweb.org\/anthology\/P18-1131\" target=\"_blank\" rel=\"noopener\">Twitter Universal Dependency Parsing for African-American and Mainstream American English.<\/a>\u00a0<\/span><span style=\"font-weight: 400\">Su Lin Blodgett, Johnny Tian-Zheng Wei, and Brendan O&#8217;Connor.\u00a0<\/span><span style=\"font-weight: 400\">Proceedings of ACL 2018.<\/span><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Do language technologies equitably serve all groups of people? \u00a0The way we speak and write varies across demographics and social communities &#8212; but natural language processing models can be quite brittle to this variation. \u00a0If an NLP system, such as machine translation or opinion analysis, works well for some groups of people but not others, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-26","post","type-post","status-publish","format-standard","hentry","category-research"],"_links":{"self":[{"href":"https:\/\/groups.cs.umass.edu\/equate\/wp-json\/wp\/v2\/posts\/26","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/groups.cs.umass.edu\/equate\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/groups.cs.umass.edu\/equate\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/groups.cs.umass.edu\/equate\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/groups.cs.umass.edu\/equate\/wp-json\/wp\/v2\/comments?post=26"}],"version-history":[{"count":8,"href":"https:\/\/groups.cs.umass.edu\/equate\/wp-json\/wp\/v2\/posts\/26\/revisions"}],"predecessor-version":[{"id":115,"href":"https:\/\/groups.cs.umass.edu\/equate\/wp-json\/wp\/v2\/posts\/26\/revisions\/115"}],"wp:attachment":[{"href":"https:\/\/groups
.cs.umass.edu\/equate\/wp-json\/wp\/v2\/media?parent=26"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/groups.cs.umass.edu\/equate\/wp-json\/wp\/v2\/categories?post=26"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/groups.cs.umass.edu\/equate\/wp-json\/wp\/v2\/tags?post=26"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}