Stanford CS224N | NLP with Deep Learning | Spring 2022 | Socially Intelligent NLP Systems

0:00:00 - 0:00:16     Text: Welcome to CS224N. Today I'm really excited that we're getting to have the first of our guest lectures and so tonight it's not tonight.

0:00:16 - 0:00:34     Text: Today it's really great to have Maarten Sap. Maarten is currently a young investigator at the Allen Institute for AI in Seattle, but pretty soon now, I guess, he's going to be starting at Carnegie Mellon University as a professor.

0:00:34 - 0:00:59     Text: So Maarten's done a huge amount of really exciting and interesting work looking at socially intelligent NLP systems, and in particular he's had a really strong emphasis on looking at issues of social inequality, bias, and toxicity in language, and that work's been widely noticed and he's had it covered in the New York Times, Fortune, etc.

0:00:59 - 0:01:06     Text: So it'll be a great opportunity today to hear about some of that interesting work that he has done.

0:01:06 - 0:01:12     Text: Okay, so notes on questions. Maarten would be delighted to have live questions.

0:01:12 - 0:01:35     Text: So if you'd like to ask a live question you can either hit the raise-hand thing, or you could put in the Q&A, you know, "live Q" or some shorthand like that, and then we can make it so you can ask live questions. If you're feeling very shy you can still put a question in the Q&A that we can relay for you, but live questions are much preferred.

0:01:35 - 0:01:40     Text: Okay, I think those are the only instructions, and otherwise, take it away, Maarten.

0:01:40 - 0:01:50     Text: Thanks. I'll just say that there may be a chance that I don't see the raised hands or questions like that, so if any of the sort of coordinators can

0:01:50 - 0:01:57     Text: Let me know if there's something that's pressing I can you know just totally answer it as well.

0:01:57 - 0:02:01     Text: But I will do that, we'll stop you for the questions.

0:02:01 - 0:02:11     Text: Yeah, all right, so thank you so much for having me here today I'm really excited to talk about some of the work that I've done during my PhD and that I'm currently still doing.

0:02:11 - 0:02:20     Text: And today's talk is going to be focusing specifically on work that I've done related to detecting and rewriting socially biased language.

0:02:20 - 0:02:37     Text: And so I want to start with a quote from Rita Mae Brown, who was a feminist author and LGBTQ activist from the 60s, who said that language is the roadmap of a culture, and it tells you where its people come from and where they are going.

0:02:37 - 0:02:46     Text: And so this is a known thing that many other linguists and philosophers have discussed that language and culture really cannot be disentangled from each other that much.

0:02:46 - 0:02:54     Text: And this is particularly interesting when we think about understanding how inequality or biases or hatred can manifest themselves in language.

0:02:54 - 0:03:00     Text: And so I like to talk about this in a framework that I call the cycle of social inequality in text.

0:03:00 - 0:03:09     Text: And so what I mean by that is that we know that there is social inequality between minority and majority groups in the world, for example, between men and women.

0:03:09 - 0:03:20     Text: And because of how our language works, the world is going to influence the way that our language patterns are different with respect to these demographic groups.

0:03:20 - 0:03:24     Text: And so language is inherently going to reflect existing social inequality.

0:03:24 - 0:03:35     Text: And this is going to, for example, show up in the portrayals of minority characters, who are known to be portrayed in more biased or stereotypical ways than majority characters.

0:03:35 - 0:03:44     Text: And this also shows up in the fact that, for example, hate speech and toxicity are mostly going to target minority groups and not really majority groups.

0:03:44 - 0:03:49     Text: Alright, I'm already seeing some discussion. I think we're good.

0:03:49 - 0:03:58     Text: And then in turn, the language that we use or read is actually influencing the world itself.

0:03:58 - 0:04:12     Text: For example, hate speech has been shown to be able to worsen relationships between demographic groups. And we also know that the portrayal of minority characters can shape the perceptions and stereotypes that we have of those minority identities.

0:04:12 - 0:04:16     Text: And so there's this sort of cycle that happens.

0:04:16 - 0:04:34     Text: In this talk I am going to try to talk about how we can make NLP systems understand and mitigate social biases and toxicity in language. And one of the reasons, you know, the crux of the reason why this is super important is that any human-generated data is going to inherently reflect social dynamics and inequality.

0:04:34 - 0:04:43     Text: And if our NLP systems are going to be trained on this human generated data, they need to account for these dynamics because otherwise it can have really harmful results.

0:04:43 - 0:04:47     Text: So to drive this point home a little bit more.

0:04:47 - 0:04:59     Text: Let me talk, walk you through a couple NLP tasks, the first big task that, you know, we can think about is conversational AI. And this is a very big task that people are tackling in NLP these days, you know, creating digital assistance chat bots, things like that.

0:04:59 - 0:05:19     Text: And you may remember the really awful example of Microsoft's Tay, which was an AI chatbot that they released on Twitter, and it turned racist in less than a day. And this is because they didn't really account for any of the social dynamics that maybe go on on the internet or went into the training data of this model.

0:05:19 - 0:05:38     Text: And their bot ended up having really rude and offensive conversations in this case. Another field, or subfield of NLP that has gotten a lot of attention recently, is language generation, which is similar to conversational AI, but just more about autocompleting text or continuing stories or news articles.

0:05:38 - 0:05:49     Text: And here if you don't account for the inequality or the social dynamics that in go into your training data, then you can end up with really incoherent mindless, but also really biased or offensive generations.

0:05:49 - 0:06:01     Text: And some of my own work has shown that actually GPT-3, which is OpenAI's sort of most powerful text generator out there, really can devolve into toxicity really quickly.

0:06:01 - 0:06:11     Text: And then finally, also related to NLP, there's this whole field or subfield of text understanding and particularly I worked on things like hate speech detection and sentiment analysis.

0:06:11 - 0:06:25     Text: And here, if you don't account for the underlying dynamics that went into your data collection, your data creation, you can end up with really worse performance on minority user input or even worse actually bias behavior against the minority users.

0:06:25 - 0:06:34     Text: And so some of my own work has actually shown that, for example, hate speech detection systems tend to be racially biased and I'll talk about this in a minute.

0:06:34 - 0:06:45     Text: And just, you know, this is the deep learning for NLP class. So, you know, we know that in recent years there's been a lot of improvement on NLP tasks in general, thanks to these pretrained language models.

0:06:45 - 0:06:58     Text: For the sake of this talk, I just want to highlight some of the key parts of why this, these pre trained language models are working so well. So we know that they're large neural nets that are trained on large amounts of text to predict which word comes next.

0:06:58 - 0:07:04     Text: And they're using this thing called a transformer architecture, which is a neural network.

0:07:04 - 0:07:15     Text: And the basic recipe is to gather a large amount of text data, take a transformer model and then train that transformer model to predict basically a word given its context.
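As a rough illustration of the recipe just described (not code from the lecture), here is a minimal sketch using the HuggingFace transformers library: load a pretrained GPT-2 checkpoint and compute the next-word prediction loss on a piece of text, which is exactly the "predict a word given its context" objective.

```python
# Minimal sketch of the causal language modeling objective described above.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

text = "Language is the roadmap of a culture."
inputs = tokenizer(text, return_tensors="pt")

# Passing labels=input_ids makes the model compute the next-word
# (causal LM) loss: predict each token given its left context.
outputs = model(**inputs, labels=inputs["input_ids"])
loss = outputs.loss        # average negative log-likelihood per token
loss.backward()            # in real pretraining, an optimizer step would follow
print(f"LM loss: {loss.item():.2f}")
```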

0:07:15 - 0:07:22     Text: And, you know, there's a trend of naming these models after Sesame Street characters for some reason in NLP.

0:07:22 - 0:07:29     Text: But we also got some sort of text generators that were named GPT 2, GPT 3, things like that.

0:07:29 - 0:07:41     Text: And we know that these models like GPT 3 are getting bigger and bigger every day, but also getting bigger is the training data or the pre training corpora that are used to create these language models.

0:07:41 - 0:07:50     Text: And we started out with using quote unquote only documents from English Wikipedia or maybe just a small set of books.

0:07:50 - 0:08:09     Text: GPT-2 was trained on a really large set of documents that were basically outbound links from Reddit, and T5 and GPT-3 were trained on the Common Crawl dataset, which is a really large archive of basically all documents on the internet that people could find.

0:08:09 - 0:08:21     Text: And here I want to pause a little bit for a second and thinking about, you know, the fact that our models are learning language from random internet data like what could go wrong with that.

0:08:21 - 0:08:28     Text: So what could go wrong, and does go wrong, is actually that these transformer language models are really mindless and socially oblivious.

0:08:28 - 0:08:46     Text: And not only are they learning stereotypes and social biases from their training data, some of my own work has shown that they're at risk for generating toxicity really, really quickly, in less than 100 generations. So if you've sampled 100 sentences from GPT 3, you're likely to find something really toxic within those 100 samples.
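As a hedged sketch of the kind of check being described, one could sample around 100 continuations from a language model and screen them with an off-the-shelf toxicity classifier; the checkpoint names and the 0.5 threshold below are illustrative choices, not the exact setup from the speaker's work.

```python
# Sample ~100 generations and count how many an off-the-shelf toxicity
# classifier flags. "unitary/toxic-bert" is one public classifier option.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
toxicity_clf = pipeline("text-classification", model="unitary/toxic-bert")

samples = generator("I can't believe", max_new_tokens=30,
                    do_sample=True, num_return_sequences=100)

flagged = []
for s in samples:
    text = s["generated_text"]
    pred = toxicity_clf(text[:512])[0]   # top label and score
    if pred["label"].lower() == "toxic" and pred["score"] > 0.5:
        flagged.append(text)

print(f"{len(flagged)} of {len(samples)} samples flagged as likely toxic")
```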

0:08:46 - 0:08:49     Text: So that's really quick and a problem.

0:08:49 - 0:09:01     Text: And so the crux of the issue is that, as Professor Ruha Benjamin puts it, feeding AI systems on the world's beauty, ugliness, and cruelty, but expecting it to reflect only the beauty is a fantasy.

0:09:01 - 0:09:17     Text: And what we need is formalisms to represent and detect this, you know, ugliness and cruelty, or these social biases, and algorithms to mitigate and avoid these social biases and this toxicity and this ugliness.

0:09:17 - 0:09:28     Text: And so that's what I'll be talking about today, and specifically, I'm going to talk about three projects today; the first two are going to be about detecting toxicity and social biases in language.

0:09:28 - 0:09:34     Text: And then I'll talk about rewriting and debiasing text with a model called PowerTransformer.

0:09:34 - 0:09:44     Text: And then I'll talk about some exciting future directions that we can go in towards human centric social bias detection and mitigation.

0:09:44 - 0:09:50     Text: So I'll start with this first work, which is called The Risk of Racial Bias in Hate Speech Detection.

0:09:50 - 0:09:56     Text: And this work is really tackling this phenomenon of hate speech online.

0:09:56 - 0:10:06     Text: And we know that this is a really rampant problem that causes people to quit social media because there's just too much hate and they can't take it.

0:10:06 - 0:10:11     Text: And people are reporting being treated inhumanely.

0:10:11 - 0:10:17     Text: People of color are saying that they can't subject themselves any longer to the hate, you know, people are quitting.

0:10:17 - 0:10:26     Text: And so there are increasing calls for these platforms to address this issue of hate speech online.

0:10:26 - 0:10:30     Text: I'm sorry, I'm just looking at the questions more.

0:10:30 - 0:10:40     Text: And one of the issues with this is so obviously you know that platforms are struggling to moderate this content.

0:10:40 - 0:10:46     Text: And one of the issues is that it's really challenging to rely on humans solely to moderate this content.

0:10:46 - 0:10:53     Text: You know, there's really way too many tweets or way too many posts that are posted on these platforms for humans to just be able to sift through.

0:10:53 - 0:11:01     Text: Apparently according to this source, there's 500 million tweets that are sent in one day. So there's no way that we could get humans to do this all alone.

0:11:01 - 0:11:11     Text: And you know, taking sort of a community centric perspective, like what Reddit has done where basically you delegate the moderation to the sub communities that are naturally created.

0:11:11 - 0:11:26     Text: can lead to actually hate-endorsing communities, like if you remember the famous case where Reddit's CEO stepped in and basically deleted or quarantined several super misogynistic subreddits.

0:11:26 - 0:11:30     Text: Okay, I'm looking at the questions.

0:11:30 - 0:11:41     Text: So someone asked: are you more of the opinion that we should only feed the model good data to reduce toxicity, or that we should try to filter model outputs to modify what the model has learned from the data?

0:11:41 - 0:11:43     Text: That is a good question.

0:11:43 - 0:11:52     Text: I think we should do both, because there's no way to say that something is inherently good and is free of biases.

0:11:52 - 0:12:06     Text: And it's easier to, I mean, we should be really mindful of the data still and you know, the decisions to shoot on which data we choose is something that I think most people tend to take just kind of lightly and just take whatever's available.

0:12:06 - 0:12:13     Text: And we should be thinking a little bit more about, you know, whose data it is. Are we, you know, ethically using this data was the purpose of it, all this stuff.

0:12:13 - 0:12:24     Text: But at the same time, there's this issue fundamentally with machine learning and AI and that is that we are trying to predict the future based on data from the past.

0:12:24 - 0:12:35     Text: So even if something might not, we might not consider something, you know, biased or problematic right now, maybe in a year or two, there'll be sort of a new evidence that some of the stuff that we trained stuff on was problematic.

0:12:35 - 0:12:44     Text: And I think having, you know, a stopgap or a way to mitigate these at decoding time or at prediction time is also something that we should do.

0:12:44 - 0:12:54     Text: And I'm actually really excited about that direction too, especially as we're, you know, as we're in a new era of not being able to train our own models, with these pretrained language models being so big.

0:12:54 - 0:12:57     Text: So yeah, I'm excited about that.

0:12:57 - 0:13:09     Text: And I'm really excited about that response to that question. Okay, so I was talking about why, you know, platforms are struggling to moderate this, you know, harmful, hateful content online.

0:13:09 - 0:13:15     Text: And one of the other things that is often overlooked is that even if we ask humans to filter through this stuff.

0:13:15 - 0:13:27     Text: The people that are actually employed to do this and stare at, like, you know, awful, awful things all day suffer really inhumane working conditions; they're often outsourced to countries where the minimum wage is lower and the conditions are just worse.

0:13:27 - 0:13:33     Text: And, you know, the mental health of these people is just, you know, it's been very documented that it's really bad.

0:13:33 - 0:13:48     Text: So this is a place where AI could actually help, right? And so this is kind of led to this field of automatic hate speech detection, which is all about trying to make the internet less toxic. And there's a lot of work that has come out of NLP trying to tackle this task.

0:13:48 - 0:14:11     Text: And there's even a workshop on detecting online abuse and harms. And people are developing APIs that you can just kind of use easily off the shelf; for example, Google's sister company Jigsaw develops the Perspective API, which is a toxicity detection system that is currently being used to moderate the New York Times comment section, for example.
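For concreteness, here is a hedged sketch of what calling the Perspective API looks like; you need your own API key, and the attribute set shown is just the basic TOXICITY score.

```python
# Score a comment for toxicity with the Perspective API (requires an API key).
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
url = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

payload = {
    "comment": {"text": "you are a wonderful person"},
    "languages": ["en"],
    "requestedAttributes": {"TOXICITY": {}},
}

resp = requests.post(url, json=payload).json()
score = resp["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print(f"Toxicity score: {score:.2f}")  # low for a friendly comment
```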

0:14:11 - 0:14:24     Text: The first thing that I'm going to talk about is that there's a big problem of racial bias in hate speech detection. And when I say racial bias, I'm not talking about bias against racial minorities or hate speech against racial minorities.

0:14:24 - 0:14:39     Text: I'm actually talking about the kind of bias in which a harmless greeting or harmless tweet can be flagged as toxic when written by certain racial minorities, but not flagged as toxic or flagged as harmless when written by a white majority person.

0:14:39 - 0:14:48     Text: And here in this case, this is happening because our toxicity models are trained on text only annotations. And so they have no idea who's speaking.

0:14:48 - 0:14:59     Text: And the issue specifically in this example is that, you know, the N-word spoken by a white person is usually considered a lot more offensive than the N-word spoken by a Black person.

0:14:59 - 0:15:09     Text: And so this illustrates the fact that datasets are really ignoring the underlying social dynamics of speech, for example, the identity of the speaker or the dialect of English.

0:15:09 - 0:15:16     Text: And in this case, ignoring these nuances really risks harming minority populations by suppressing inoffensive speech more.

0:15:16 - 0:15:23     Text: And so we wanted to characterize and quantify the racial bias in hate speech detection here.

0:15:23 - 0:15:31     Text: And so specifically what we wanted to do is first investigate how machine learning models acquire this racial bias from datasets.

0:15:31 - 0:15:43     Text: And also look at sort of stepping back thinking about the annotation task for offensiveness or toxicity and asking what about the annotation task design actually affects these racial bias here.

0:15:43 - 0:15:53     Text: And you may be wondering why we are looking at racial bias specifically. Well, the answer is that, as we know, minority populations are most often the target of hate speech compared to majority populations.

0:15:53 - 0:16:03     Text: And racial bias has actually been studied a lot less than other identity-based biases, and specifically gender has gotten a lot of attention in NLP.

0:16:03 - 0:16:16     Text: And specifically on Twitter, there's an actual big danger of silencing black folks disproportionately. There's been some studies that have shown the cultural importance of Twitter specifically this phenomenon called black Twitter.

0:16:16 - 0:16:21     Text: And it's also a really important space for activism, for example, in the Black Lives Matter movement.

0:16:21 - 0:16:34     Text: However, one challenge with studying racial bias is that Twitter profiles don't actually have any race data associated with them. And so what we're going to do is we're actually going to use dialect as a proxy for racial identity.

0:16:34 - 0:16:42     Text: And we're operating under the premise here that there are specific lexical indicators of minority identity in language.

0:16:42 - 0:16:55     Text: And specifically we're going to be looking for African American English, which is a dialect, or a set of dialects or varieties of English, that is common to but not limited to Black or African American folks in the US.

0:16:55 - 0:17:01     Text: And it's extensively been studied by linguists and shown to have like its proper its own grammar and things like that.

0:17:01 - 0:17:08     Text: And there's actually also been shown to be a presence of AAE variants on Twitter specifically.

0:17:08 - 0:17:16     Text: And so specifically we're going to use a lexical detector that Blodgett et al. created to infer the presence of AAE.

0:17:16 - 0:17:21     Text: But again, a caveat here is that you know dialect and race are much more complex than this.
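To make the dialect-as-proxy setup concrete, here is a schematic sketch; the `dialect_posteriors` function below is a hypothetical stand-in for the actual Blodgett et al. detector, which returns a probability distribution over demographic-language groups for a tweet, and the 0.8 confidence threshold is illustrative.

```python
# Schematic use of a lexical dialect detector as a (noisy) proxy for group.
def dialect_posteriors(tweet: str) -> dict:
    # Hypothetical placeholder: the real detector infers these from lexical cues.
    return {"AAE": 0.85, "white-aligned": 0.10, "Hispanic": 0.03, "other": 0.02}

def dialect_label(tweet: str, threshold: float = 0.8) -> str:
    """Assign a dialect label only when the detector is confident enough."""
    probs = dialect_posteriors(tweet)
    top_group, top_p = max(probs.items(), key=lambda kv: kv[1])
    return top_group if top_p >= threshold else "uncertain"

print(dialect_label("some example tweet text"))  # -> "AAE" in this toy example
```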

0:17:21 - 0:17:25     Text: I'm seeing a question.

0:17:25 - 0:17:33     Text: Is using dialect as proxy for racial identity susceptible to propagating stereotypes, especially if these lexical indicators evolve over time. Yeah, so.

0:17:33 - 0:17:38     Text: And I think I'm going to click responding to it.

0:17:38 - 0:17:43     Text: Yeah, so that's definitely true for sure. And I'll talk about this later as well.

0:17:43 - 0:17:52     Text: But you know, focusing solely on dialect in a static way ignores the evolution that this could have, as well as, sort of,

0:17:52 - 0:17:57     Text: we don't want to categorize people's races based on their dialect alone, and that's not what we're doing in this work.

0:17:57 - 0:18:09     Text: And so we're looking at dialect, which we know has correlates with race, but it's also important to look at actually self-reported race as well, which I'll talk about in a second.

0:18:09 - 0:18:18     Text: Okay, so given this intro, you know, let's look at some results and see how racially biased our hate speech datasets actually are.

0:18:18 - 0:18:31     Text: We focused on two widely used datasets that we'll call Twitter Hatebase and Twitter Bootstrap, which are references to how they were collected. And what we find is that there's really big racial bias in both these datasets.

0:18:31 - 0:18:46     Text: And so, for example, in the first dataset, only about half of the tweets in white-aligned English, which is a label that the dialect model gives us, are flagged as offensive by the annotators, whereas 92% of the tweets in African American English are flagged as offensive.

0:18:46 - 0:19:01     Text: So there's a huge skew and we see a similar skew where 18% of white aligned English in the Twitter bootstrap data set is labeled as abusive versus much higher rates of abusive tweets that are in African American English.

0:19:01 - 0:19:03     Text: I'm seeing another question.

0:19:03 - 0:19:17     Text: Something I noticed in a lot of online spaces is that people regrettably will use AAE despite not being Black, usually for, like, humorous effect, so could someone use this system to get away with, like, saying the N-word by trying to put an AAE spin on it? Yeah, that's a thing that people talk about a lot actually.

0:19:17 - 0:19:44     Text: The sort of appropriation of African American English or just, you know, those kinds of things, and I think SNL called it, like, Gen Z slang, but it really was, like, markers of AAE that they meant. That's a broader question of how language sort of gets adopted by different groups and how things evolve in that way, that I think sociolinguists are probably more equipped to talk about.

0:19:44 - 0:19:56     Text: But yeah, it's definitely true that, like, if someone is sort of adopting an AAE identity online, then they would also maybe fall prey to this over-censorship by toxicity detection systems, for sure.

0:19:56 - 0:20:05     Text: Oops, another question: reflecting on the precision-recall tradeoff in classifying such problems, which would you recommend to a person doing model building?

0:20:05 - 0:20:30     Text: But let me talk about the results real quick first before we talk about that so given that there's evidence in the data that there's a lot of racial bias like we wanted to see how you know do models actually acquire these racial biases from data sets and maybe you're thinking oh god, maybe they're actually averaging those out because they learn to pick out the right patterns in the data right because models can do that and we're really optimistic about it.

0:20:30 - 0:20:56     Text: Unfortunately, models actually not only acquire these racial biases, but they exacerbate them, so that's a problem. And in order to show this, we basically trained classifiers on these two datasets that we were looking at, Hatebase and Bootstrap, and we're going to look at rates of false flagging of toxicity, and specifically we're going to break those false flagging rates down by dialect group on our development set.

0:20:56 - 0:21:05     Text: And here we're going to look for bias under the definition of the equality of odds / equality of opportunity criteria from Moritz Hardt.

0:21:05 - 0:21:21     Text: And looking at the results here, what we find is that both classifiers are really biased against AAE, and specifically they make mistakes towards mistaking AAE as offensive much more often than mistaking white-aligned English.

0:21:21 - 0:21:48     Text: So for the first one, 46% of non-offensive AAE tweets are mistaken for offensive, versus only 9% of white-aligned non-offensive tweets; and for the second classifier and dataset, 26% of non-abusive AAE tweets are mistaken for abusive versus only 5% of white-aligned ones, and the opposite is true, where white-aligned tweets that are actually abusive are flagged as non-abusive at a higher rate.
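As a small worked example of the comparison behind those numbers, here is a sketch of computing the false positive rate per dialect group (an equality-of-opportunity style check) with pandas; the column names and toy values are illustrative.

```python
# False positive rate per dialect group: P(flagged offensive | not offensive, group).
import pandas as pd

df = pd.DataFrame({
    "dialect":   ["AAE", "AAE", "AAE", "white", "white", "white"],
    "gold":      [0, 0, 1, 0, 0, 1],      # 1 = actually offensive
    "predicted": [1, 0, 1, 0, 0, 1],      # classifier output
})

non_offensive = df[df["gold"] == 0]
fpr_by_group = non_offensive.groupby("dialect")["predicted"].mean()
print(fpr_by_group)
# A large gap between groups (e.g. 46% for AAE vs. 9% for white-aligned tweets)
# is the kind of disparity being described here.
```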

0:21:48 - 0:21:56     Text: Does does that answer your question. I don't know if that's what you were specifically asking about, but.

0:21:56 - 0:22:00     Text: Should recommend approach to model building.

0:22:00 - 0:22:13     Text: Yeah, so I think in this case maybe the question will be answered later about, you know, actually instead of thinking about model building we should be thinking about different ways to conceptualize this task.

0:22:13 - 0:22:18     Text: Okay, how do we distinguish between disproportionate hate speech detection due to true positive false positive.

0:22:18 - 0:22:23     Text: Hopefully those results just answered your question here.

0:22:23 - 0:22:30     Text: Do you think asking users for more personal information, such as race, is too much of a risk for users' privacy?

0:22:30 - 0:22:41     Text: That's an interesting ethical question. I think unfortunately if we want to do research that involves people like we need volunteers and we need people to give us information because.

0:22:41 - 0:22:52     Text: I think we've had enough evidence that without actually, you know, studying these problems and tackling and like really digging into the issues related to identity.

0:22:52 - 0:22:58     Text: We're not really going to be able to make something less biased unfortunately.

0:22:58 - 0:23:10     Text: Okay, let me move on for a second before I answer some more questions. But so, you know, what we've seen here in the sort of in-domain, in-distribution setting is that there's really strong bias against AAE tweets for both these classifiers and both these datasets.

0:23:10 - 0:23:19     Text: But you may be wondering, okay, maybe this is a data set problem and the racial bias doesn't really generalize beyond these data sets, right.

0:23:19 - 0:23:27     Text: Unfortunately, it really does. And so this is kind of addressing this question that I just answered about race and asking people their race.

0:23:27 - 0:23:34     Text: So we wanted to simulate a situation where these classifiers would be, like, released in the wild.

0:23:34 - 0:23:43     Text: And we looked basically at prediction rates of offensiveness when you actually have, like, more gold-standard dialect or race information available.

0:23:43 - 0:23:55     Text: And specifically we looked at one corpus that had dialects inferred based on geolocation of tweets and US Census demographic data. So it's not gold; it's maybe silver-standard labels.

0:23:55 - 0:24:13     Text: But it's sort of more informed about, like, where the tweet came from, so the dialect label might be a little bit more valid. And what we find is that one of our classifiers is basically twice as likely to predict that an AAE tweet is offensive compared to white-aligned tweets.

0:24:13 - 0:24:25     Text: And then we also looked at another dataset where people actually participated in a survey online, gave researchers their race as well as their Twitter handle, and they gave a bunch of other demographics too.

0:24:25 - 0:24:31     Text: And those researchers then sort of scraped all their tweets and to study.

0:24:31 - 0:24:34     Text: I don't remember what the purpose of this study was specifically online.

0:24:34 - 0:24:43     Text: But we can use this data set to actually see you know, regardless of dialect like what is actually happening when when we look at self reported race.

0:24:43 - 0:24:58     Text: And what we find unfortunately is the same kind of biases that are there in dialect land and you know the classifier here is 1.5 times as likely to flag a tweet by African American person as offensive compared to a tweet by a white person.

0:24:58 - 0:25:15     Text: So this is basically the exact same pattern with the second classifier that we studied, where, you know, basically this is showing that not just AAE tweets but also tweets by Black folks are more often flagged as toxic compared to tweets by white people or in white-aligned English.

0:25:15 - 0:25:22     Text: And this is really strong evidence that this racial bias is really generalizing to other corpora here.

0:25:22 - 0:25:28     Text: Let me take a minute here to answer some questions. OK.

0:25:28 - 0:25:34     Text: In the past there are cases where companies said that their model is biased because the data is biased, but it's essentially impossible to have perfect data.

0:25:34 - 0:25:40     Text: It just seems that right now we have no one taking direct responsibility for these biased, toxic, problematic models.

0:25:40 - 0:25:46     Text: Do you think there's a practical way to address this, and in your opinion should the burden fall on researchers working in the area of bias detection,

0:25:46 - 0:25:54     Text: all NLP researchers, the companies that use or apply the models, or the society where the data comes from?

0:25:54 - 0:26:02     Text: This is a very astute question. Let me think about this for a second.

0:26:02 - 0:26:17     Text: I think one thing that we forget with these systems is that AI isn't just operating in a vacuum; it's operating in a full societal pipeline between users and government and, like, laws and companies and stuff.

0:26:17 - 0:26:32     Text: So a lot of times you know researchers are like oh well I don't really know you know I don't I'm sort of turning a blind eye to how my systems are being used.

0:26:32 - 0:26:37     Text: But, like, in order to really address, like, fairness and equitability...

0:26:37 - 0:26:56     Text: So it's not necessarily clear; like, yeah, maybe the people, the companies that are applying these models, are the ones that should be held responsible.

0:26:56 - 0:27:11     Text: I'm not sure where the responsibility should lie, but I think that, operating under the premise of a democratic government, we should have, you know, legislation that actually dictates, like, what can and can't be suppressed or removed

0:27:11 - 0:27:17     Text: through these algorithms. And I think currently the situation is that the companies have all the power to do whatever they want.

0:27:17 - 0:27:29     Text: And I think that's led to a lot of frustration from a lot of people. And so the answer might be that, you know, other companies that are, you know, designed with inclusivity from the get-go could be a solution there.

0:27:29 - 0:27:35     Text: But yeah this is a probably a longer question to answer directly.

0:27:35 - 0:27:42     Text: There's a live question, so let's give it a second and see if they're going to ask it live.

0:27:42 - 0:28:11     Text: Yeah. Okay, wonderful. Yeah, I'm wondering, of course I don't know if these models are open about how they operationalize the term toxic or hate speech, but I'm curious if you have any insight into that, because it seems like not only do you have the issue of, you know, making the model accurate based on your rules, but also of defining a rule set, because

0:28:11 - 0:28:27     Text: different people, like, a lot of people would say that the N-word is never acceptable, other people, other Black people, may feel differently, so it seems that there's also, even within that model,

0:28:27 - 0:28:31     Text: a lot of variation in interpretation.

0:28:31 - 0:28:52     Text: For sure, I totally agree, and I think that, again, kind of pointing forward to the rest of this talk, like, I always advocate for just not having AI systems determine offensiveness, period, and just moving towards, like, other approaches to doing these kinds of problems.

0:28:52 - 0:29:00     Text: So yeah let me keep going a little bit and maybe answer some questions after that if that's okay.

0:29:00 - 0:29:13     Text: So we know that there's a lot of racial bias in these classifiers and it just seems really bleak so you know you probably are all wondering like okay what can we actually do to reduce these biases.

0:29:13 - 0:29:18     Text: One answer is that actually changing the way that we do data collection helps.

0:29:18 - 0:29:40     Text: So we did a small sort of MTurk pilot study where we took 358 tweets from our two datasets, stratified by toxicity label, and we asked three annotators to look at each tweet, and specifically we wanted them to answer a question about how or whether this tweet could be offensive to anyone.

0:29:40 - 0:29:55     Text: So we did sort of A/B/C testing, basically, where we had three conditions in which people were labeling these. The first one, the control condition, was them annotating just the text of the tweet, no context, nothing.

0:29:55 - 0:30:19     Text: The second condition was where we basically provided them with information about the dialect of the tweet, so we're like, oh, our AI thinks that this tweet is in African American English, basically highlighting the fact that this may come from a user who was speaking African American English, and we see here that there's actually a significant decrease in the likelihood of labeling these tweets as offensive to anyone, which is really interesting.

0:30:19 - 0:30:40     Text: And the third condition that we had was basically, instead of thinking about the dialect, we made people think about the race that is associated with the dialect, and so our priming text is basically that a Twitter user that is likely Black or African American tweeted this thing, and here again we see a significant difference compared to the control condition.

0:30:40 - 0:31:00     Text: We also asked a second question of offensiveness, which is, is this tweet offensive to you, which is a different sort of labeling, and one of the interesting things here is that the propensity for people to label a tweet as offensive to anyone is much higher than to label it as offensive to themselves.

0:31:00 - 0:31:13     Text: And also here we found that the only difference in decreasing sort of offensiveness to themselves is if we highlighted the race associated with the dialect and not just the dialect itself compared to the control condition.
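As a hedged sketch of the kind of test behind "significant decrease" or "significant difference" here, one could compare labeling rates between the control and a priming condition with a two-proportion z-test; the counts below are made up for illustration.

```python
# Compare the rate of "offensive to anyone" labels in control vs. dialect-primed.
from statsmodels.stats.proportion import proportions_ztest

offensive_counts = [220, 180]   # [control, dialect-primed], illustrative counts
total_judgments  = [358, 358]   # judgments per condition

z_stat, p_value = proportions_ztest(offensive_counts, total_judgments)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# A small p-value would indicate that priming changed labeling behavior.
```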

0:31:13 - 0:31:21     Text: So this shows that you know priming annotators to think about dialect and race can actually influence the labels and maybe mitigate some of this bias.

0:31:21 - 0:31:31     Text: So just even using two questions we can show that the annotations of offensiveness are highly subjective like the mental processes that the labelers are going through are very different.

0:31:31 - 0:31:40     Text: So to quickly give some takeaways of this work: toxicity detection really backfires against racial minorities if we try to automate it.

0:31:40 - 0:31:54     Text: And we showed specifically that there's really strong dialect-based racial bias; we hypothesize that this is probably due to, like, the negative perception of race, and AAE is just sometimes thought of as, like, less good English or things like that.

0:31:54 - 0:32:07     Text: And, you know, NLP models that are trained on this biased data will just exacerbate those biases, and in our pilot study we showed that highlighting the dialect can actually influence the labels of offensiveness.

0:32:07 - 0:32:18     Text: But kind of tackling some of the themes that have been in the question so far like maybe you know given this situation we should rethink how we tackle hate speech detection as a whole.

0:32:18 - 0:32:45     Text: To hammer on this point even more: racial bias isn't the only bias or issue that is going on in toxicity classification systems. There's also this thing that we like to call lexical biases, which is basically that if you have a minority identity mention in your text, your system is more likely to flag it as toxic compared to if you have a majority identity in your text.

0:32:45 - 0:32:59     Text: There's also bias against swear words, so, like, if you say something positive like "I fucking love this," your model is going to flag it as super toxic, but if you say something that's really awful but doesn't have any swear words, your model might just not even realize that it's a problem.

0:32:59 - 0:33:26     Text: And just to highlight this a little bit: in some recent work that we presented at EACL, we actually tried to see if we could automatically debias the racial biases and lexical biases in toxic language detection models, and specifically we asked, can automatic debiasing methods from NLI, you know, natural language inference tasks, mitigate these biases? Because there's been a lot of work on debiasing in a lot of those systems.

0:33:26 - 0:33:53     Text: For example, there's been work looking into ensemble-based learning as well as data filtering and things like that, and I encourage you to read the paper, but the short answer is that it's not that easy to debias these models, actually. It's easier to debias if your biases are lexical, so, like, related to keywords, but it's actually a lot harder for dialect-based biases to be removed or mitigated in these systems.
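To give a flavor of one family of ensemble-based debiasing mentioned above, here is a generic product-of-experts sketch in PyTorch, where a weak bias-only model (for example, one that only sees shallow lexical cues) is combined with the main model during training so the main model is discouraged from relying on those cues; this is an illustration of the general idea, not the exact method from the paper.

```python
# Generic product-of-experts (PoE) debiasing loss sketch.
import torch
import torch.nn.functional as F

def poe_loss(main_logits, bias_logits, labels):
    """Train on the combined log-probabilities, but only backpropagate
    through the main model (the bias-only model is detached)."""
    combined = F.log_softmax(main_logits, dim=-1) + \
               F.log_softmax(bias_logits.detach(), dim=-1)
    return F.cross_entropy(combined, labels)

# Toy usage for a binary toxic / non-toxic task.
main_logits = torch.randn(4, 2, requires_grad=True)
bias_logits = torch.randn(4, 2)        # from a frozen lexical bias-only model
labels = torch.tensor([0, 1, 1, 0])

loss = poe_loss(main_logits, bias_logits, labels)
loss.backward()
print(f"PoE loss: {loss.item():.2f}")
```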

0:33:53 - 0:33:58     Text: One point, Maarten, there is a question.

0:33:58 - 0:33:59     Text: Awesome.

0:33:59 - 0:34:01     Text: Hi Maarten.

0:34:01 - 0:34:20     Text: Hello, I'm wondering so for the different types of priming for dialect and race priming wondering because they seem quite similar to me in terms of like there in terms of conceptually as like part of the experiment so.

0:34:20 - 0:34:32     Text: Also, like, the results seem to be pretty similar between them, so what is the significance of having both the dialect and the race? Shouldn't one kind of entail the other?

0:34:32 - 0:34:45     Text: Yeah, so, I mean, it's kind of going back to, like, the question of, like, not everyone who is Black speaks AAE, and not everyone who speaks AAE is Black, and also just in terms of laypeople's knowledge of AAE.

0:34:45 - 0:34:54     Text: Like I think most people in America know what a black person is that I don't think that you know maybe maybe 60 70% I don't know how that's maybe too high.

0:34:54 - 0:35:13     Text: You know I don't know how many people actually know what African American English is and or consider it like you know a valid form of language and so when we're thinking about these annotation tasks like we should think about also like who what is a knowledge that these annotators have and so when we're doing the dialect priming condition we gave them information about the dialects.

0:35:13 - 0:35:22     Text: But in the race in the race condition the race priming condition we didn't have to give them that much information because people know what you know race is broadly.

0:35:22 - 0:35:24     Text: So that's kind of the difference.

0:35:24 - 0:35:30     Text: Okay yeah thank you that makes one more sense thank you sure.

0:35:30 - 0:35:37     Text: I wanted to should I answer some more questions or what do we think.

0:35:37 - 0:35:43     Text: How many years are we away from a time where we may not need an army of human moderators for explicit content?

0:35:43 - 0:36:01     Text: I would advocate that we should never be free of human moderators. I think that letting the systems, you know, remove or moderate things automatically is not the greatest option, because that gives them power to silence us in a way that we don't want.

0:36:01 - 0:36:08     Text: So, does the race or ethnicity of the annotator matter in these datasets? Since we are using these aggregated labels, how do we show the correctness of label quality with respect to the perspectives of different races?

0:36:08 - 0:36:16     Text: Thank you so much for a great segue into my next slide which is talking about who decides what is offensive or not.

0:36:16 - 0:36:26     Text: And so I want to quickly plug our recent preprint that came out in November, called Annotators with Attitudes: How Annotator Beliefs and Identities Bias Toxic Language Detection.

0:36:26 - 0:36:46     Text: And basically what we studied in this work is literally what you just asked about, which is the who, why, and what behind toxicity annotation, so basically looking at what is the effect of annotator identities and beliefs on their toxicity labeling behavior, as well as looking at what types of texts they're more or less likely to label as toxic.

0:36:46 - 0:37:00     Text: And specifically we did two controlled annotation studies where we collected attitudes about different concepts like offensiveness or we asked them about like sort of racism we asked them about free speech we asked them about empathy and things like that.

0:37:00 - 0:37:15     Text: And we also collected their demographic information, and some things that are showing up is that racist text, so things that are really racist in meaning, are less likely to be labeled as offensive by people who score higher on measures of racist beliefs,

0:37:15 - 0:37:35     Text: And also less likely to be labeled as offensive by people who don't think that censorship should exist, for example, I don't fully remember what the demographic correlates are but I'm going to assume maybe for racist speech there wasn't that much of a difference.

0:37:35 - 0:37:45     Text: But we also found that African American English tweets seem more racist to people who hold racist beliefs, which is also really interesting finding there.

0:37:45 - 0:37:54     Text: And then finally we looked at sort of the swear word piece of it as well and what we found was that swear words can seem much more offensive to people who are more conservative.

0:37:54 - 0:38:08     Text: And that's politically conservative as well as more traditional, rated, like, using a psychological traditionalism scale.

0:38:08 - 0:38:15     Text: Should I be following up on the discussion and the Q and a or.

0:38:15 - 0:38:24     Text: I mean, I guess you have to judge how many questions you can answer and how much you want to make progress. Welcome to answer more.

0:38:24 - 0:38:29     Text: Yeah, I mean, yeah, I think we're I could answer a couple questions.

0:38:29 - 0:38:41     Text: Are these models learning toxic of tweet entries of a corpus that are labeled a query or of any supervised learning any corporate.

0:38:41 - 0:38:47     Text: I'm not sure I fully understand this question.

0:38:47 - 0:38:56     Text: Does the identity of researchers influence how datasets are created and what's considered toxic, since the mainstream view held by the power structure reflects the majority demographic?

0:38:56 - 0:39:04     Text: That's an interesting question of like what's what about the positionality of the researchers themselves.

0:39:04 - 0:39:12     Text: I think.

0:39:12 - 0:39:18     Text: Typically, probably that influences things, but I think, to sort of keep going with my talk here, like, I think that really we should be rethinking automatic offensiveness detection totally.

0:39:18 - 0:39:20     Text: And.

0:39:20 - 0:39:33     Text: You know, just to really recap everything: like, we know that there's a lot of labeling variation in these things, like, what is offensiveness, and to whom? Like, it's really different depending on your background, your attitudes, and everything.

0:39:33 - 0:39:36     Text: And, you know, even just the task design can make a big difference.

0:39:36 - 0:39:41     Text: But also when we think about hate speech, like there's some countries that actually have like legal definitions of hate speech, right.

0:39:41 - 0:39:48     Text: And so we can't just go around throwing that term around, being like, this is hate speech, when there are, like, legal definitions of this and legal implications of labeling something as hate speech.

0:39:48 - 0:39:55     Text: And then, you know, like the question asked like, you know, looking at the fact that annotators might operationalize a definition differently.

0:39:55 - 0:40:03     Text: And so, you know, if you researchers might have different views of what you know should go on in there.

0:40:03 - 0:40:16     Text: And I think that one big component is that, you know, no one's asking the real question of like, why is something hateful or offensive like everyone's just like concerned with labeling and noise and annotations, but like why aren't we focusing on like what about the text is like making it offensive or hateful.

0:40:16 - 0:40:31     Text: And also, just to hammer this point home that I've already mentioned: personally, my position is that we shouldn't really have AI systems making these decisions alone, like, risking removing entire swaths of dialects or, you know, of content.

0:40:31 - 0:40:36     Text: And you know, moderating that like I'm not sure that that's really what should be going on.

0:40:36 - 0:40:50     Text: What if AI systems were designed to help humans determine toxicity by explaining why something might be toxic or biased instead? And this is going to be the next part of my talk, but let me look at some questions real quick.

0:40:50 - 0:40:57     Text: Okay, and for the bias evaluations, was the dataset re-annotated to not classify AAE tweets using slang, such as the N-word, as offensive?

0:40:57 - 0:41:05     Text: If models were trained on this re-annotated dataset, does that mitigate the issue? In other words, wouldn't the NLP models learn that the N-word is what's offensive, and not the dialect?

0:41:05 - 0:41:18     Text: So in the challenges in automated debiasing paper, we actually did do a translation experiment where we basically tried to translate the AAE tweets into, like, non-AAE English.

0:41:18 - 0:41:34     Text: And that was, like, the most promising direction, basically, when you sort of, like, remove the markers of AAE but try to preserve as much of the content and sort of use those labels there, but then the issue is that you have to relabel those too.

0:41:34 - 0:41:47     Text: You have to relabel those in sort of the context of their non-AAE versions, so that sort of adds data annotation to the challenge.

0:41:47 - 0:41:53     Text: It seems like a lot of the classifiers right now are very lexically focused. Is there a way to make the models more contextually aware?

0:41:53 - 0:42:07     Text: Yes, and I think that's something that we should be focusing on more and more and I'll talk about that in future work.

0:42:07 - 0:42:32     Text: I think I'm trying to, like, take care of these questions. "Let me rephrase my question: how are these toxicity classifying algorithms learning what is and isn't toxic?" Based on labels, right, and that's kind of the issue: if the humans are labeling things improperly because they're flagging just words, then the models are only going to be able to recreate that behavior.

0:42:32 - 0:42:49     Text: I do see I hope that there is going to be potential regulations coming up towards this kind of you know regulating what AI systems can, I can't do, but that also requires like tackling the fact that tech companies have a lot more freedom in the US right now.

0:42:49 - 0:42:50     Text: Okay.

0:42:50 - 0:42:57     Text: Is it possible to develop a model that detects a person's identity through this tweets has some potential privacy issue how to mitigate the potential for use.

0:42:57 - 0:43:05     Text: Yeah, so I think that language unfortunately kind of is a way to communicate your identity to people.

0:43:05 - 0:43:13     Text: And so, like, that's what sociolinguists talk about when you sort of do code-switching between your family and your work friends.

0:43:13 - 0:43:29     Text: Like you are sort of asserting a common shared identity or creating a common shared identity between your family and you in that one sort of dialect or way of speaking.

0:43:29 - 0:43:34     Text: And if you're speaking to your co-workers, maybe you're going to use a different type of way of speaking.

0:43:34 - 0:43:43     Text: Not that if people are tweeting in these settings, then they're giving away a little bit of like who they're tweeting to and things like that.

0:43:43 - 0:43:49     Text: Yeah, when it comes to like studying these kinds of things, it's hard to disentangle.

0:43:49 - 0:43:55     Text: Or it's hard to do without knowing actually like who the identities of the people are in the network, but.

0:43:55 - 0:44:04     Text: It's just not as simple as as sort of like your behavior patterns might not give away as much of your identity markers as language can.

0:44:04 - 0:44:10     Text: Alright, I think I'm going to walk through some more slides a little bit first and then we can look at the questions some more.

0:44:10 - 0:44:13     Text: But if there's anything really important, please let me know.

0:44:13 - 0:44:22     Text: Alright, so we talked about the problems with, you know, classifying toxicity or toxic language.

0:44:22 - 0:44:33     Text: So I want to talk about social bias frames, which is actually a here like what I would like to call like a new alternative way of looking at this problem or you know, the first iteration of that.

0:44:33 - 0:44:39     Text: So Social Bias Frames is a new formalism to reason about the social and power implications of language.

0:44:39 - 0:44:48     Text: I know that we've been talking about hate speech and stuff like that already, but I want to warn you that the content in the rest of this talk may be upsetting or offensive, because I'm going to show some examples, unfortunately.

0:44:48 - 0:45:02     Text: And again, just to contextualize it a little bit more: we're approaching these problems of social biases from a US sociocultural perspective in this project, and this is actually work with another Stanford professor as well.

0:45:02 - 0:45:10     Text: Okay, so when we think about social biases, there are two ways that these social biases can be expressed in language.

0:45:10 - 0:45:19     Text: The first one is "we should kill all XYZ demographic group," and this is super easily flagged as toxic by your off-the-shelf toxicity detection systems.

0:45:19 - 0:45:27     Text: But there's a much more subtle way of expressing social biases, for example, in the statement, we should lower our standards just to hire more women.

0:45:27 - 0:45:39     Text: And so we can all understand this type of unconscious bias here: because of how language implicature works, this is implying that women are less qualified than men, especially because of this word "just" here.

0:45:39 - 0:45:49     Text: And so we can understand that this is an unconscious bias, but it's not flagged as toxic at all by these models, unfortunately.

0:45:49 - 0:45:58     Text: And so this motivated our creation of social bias frames as a new structured formalism that distills knowledge about the harmful bias implications of language.

0:45:58 - 0:46:03     Text: And so specifically, this is kind of a structured formalism, so let me walk you through the structure.

0:46:03 - 0:46:31     Text: The first variables in the frame capture whether the post could be considered offensive, whether the speaker's intent was to offend, and whether the post contains lewd or sexual references. And then we ask whether this is targeting a group of people or referencing a group of people, or is this really just an individual

0:46:31 - 0:46:37     Text: insult? If it is a group of people we're going to ask for a free text explanation of who that group

0:46:37 - 0:46:41     Text: of people was in this case it's women and then we're going to ask for a free text explanation of

0:46:42 - 0:46:46     Text: what is the implied stereotype here and this is that the stereotype that women are less qualified

0:46:46 - 0:46:52     Text: than men. And then finally the last variable in our frame is related to in group language which

0:46:52 - 0:46:56     Text: is about sort of capturing whether the statement is made by members of the same group as the group

0:46:56 - 0:47:02     Text: that's targeted kind of trying to address this sort of you know speaker and listener identities

0:47:03 - 0:47:09     Text: but here in this case that doesn't really seem to be the case. So again to remind you the motivation

0:47:09 - 0:47:15     Text: for social bias frames is really that if we want to be able to avoid like problematic or really

0:47:15 - 0:47:21     Text: companies want to avoid PR problems of their chatbots turning racist, they need an understanding of

0:47:21 - 0:47:27     Text: what they actually want to avoid, and social bias frames is a view on this "what to avoid," or

0:47:27 - 0:47:32     Text: these social biases and it's more explainable and trustworthy because it comes baked in with

0:47:32 - 0:47:37     Text: explanations of like why something could be biased and it's more holistic than binary hate speech

0:47:37 - 0:47:43     Text: detection because it gets around like are you you know offended by this statement or not and

0:47:43 - 0:47:48     Text: really it's trying to distill like what is the what is the offensive meaning behind this which is

0:47:48 - 0:47:55     Text: different. I also want to highlight that you know in order to study these social biases in the wild

0:47:55 - 0:48:02     Text: we created the Social Bias Inference Corpus, which is 150,000 annotated tuples that were labeled

0:48:02 - 0:48:10     Text: from 44,000 posts from social media including from Twitter Reddit the neo-Nazi communities of

0:48:10 - 0:48:17     Text: like Gab and Stormfront, as well as, um, really misogynistic subreddits, and our corpus contains

0:48:17 - 0:48:23     Text: like 34,000 implications about three thousand different demographic groups, and I don't

0:48:23 - 0:48:29     Text: have time to go into the details of how this was created but because of how we trained our

0:48:29 - 0:48:34     Text: MTurkers and our annotators and how we selected them, we actually got pretty high pairwise agreement

0:48:34 - 0:48:41     Text: on this on you know these annotations and also we really wanted to be able to capture the types

0:48:41 - 0:48:46     Text: of discrimination that people are actually reporting experiencing offline or you know or online but

0:48:46 - 0:48:53     Text: in the real world and so we're not just capturing like sort of fandom wars or as pops our fandom

0:48:53 - 0:48:57     Text: wars but we're actually capturing like hatred towards demographic groups that are reflective of

0:48:57 - 0:49:04     Text: real world discrimination. Also I wanted to highlight the way that we designed this frame inherently

0:49:04 - 0:49:09     Text: had interdisciplinary sort of knowledge in mind, and so we really tried to ground this in

0:49:09 - 0:49:14     Text: social science literature, specifically looking at literature on rudeness, pragmatics,

0:49:14 - 0:49:19     Text: offensiveness and how people sort of perceive offensiveness and things like that and this actually

0:49:19 - 0:49:26     Text: led to the inclusion of this intent variable which is not only there because if you sort of

0:49:26 - 0:49:30     Text: perceive someone as being well-intentioned you might be more forgiving towards what they're saying

0:49:31 - 0:49:36     Text: even though, you know, the bias is still there, but also if we think about implementing these

0:49:36 - 0:49:40     Text: tools to give feedback to people who are writing texts and if you if your AI system tells you this

0:49:40 - 0:49:48     Text: is 80% toxic versus hey you might have not intended to be offensive but here's like what your

0:49:48 - 0:49:54     Text: thing implies about this group of people that could actually be a much softer feedback to an author.

0:49:55 - 0:50:01     Text: Also, you know, as we've discussed with the AAE racial bias case, we wanted to include situations

0:50:01 - 0:50:07     Text: where things were cases of language that's more in group so that's things like self-deprecation

0:50:07 - 0:50:11     Text: or reclaimed slurs that can appear offensive if you're not a member of a group, but if you are a member

0:50:11 - 0:50:17     Text: it could be less offensive and then we also wanted to be a little more intersectional so we wanted

0:50:17 - 0:50:22     Text: to have the ability for posts to target multiple groups at the same time, or multiple identities, and so

0:50:22 - 0:50:28     Text: we collected that and we also collected multiple implications to get more data for the kinds of

0:50:28 - 0:50:32     Text: stereotypes that are there. And then finally, you know, the way that we designed this frame also had

0:50:32 - 0:50:37     Text: in mind the sort of annotators, to see what can and can't be done at scale.
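
To make that structure concrete, here is a minimal sketch of what a single annotated tuple could look like as a data structure. The field names and the example values are purely illustrative; they are not the corpus's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SocialBiasFrame:
    """Illustrative container for one (post, frame) annotation; hypothetical schema."""
    post: str                      # the social media post being annotated
    offensive: bool                # could this post be seen as offensive?
    intentional: Optional[bool]    # does the author appear to have intended offense?
    group_targeted: bool           # targets a demographic group rather than an individual?
    in_group: Optional[bool]       # is the speaker plausibly a member of the targeted group?
    targeted_groups: List[str] = field(default_factory=list)     # e.g. ["women"]
    implied_statements: List[str] = field(default_factory=list)  # the stereotype, in words

# A made-up annotation in the spirit of the frames described above:
example = SocialBiasFrame(
    post="(a post making a joke at women's expense)",
    offensive=True,
    intentional=True,
    group_targeted=True,
    in_group=False,
    targeted_groups=["women"],
    implied_statements=["women can't handle responsibility"],
)
```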

0:50:40 - 0:50:48     Text: I see a question: how accurately can AI detect intent at all? I will answer that in

0:50:48 - 0:50:53     Text: a second because we're going to look at some results. What about AAE are models finding offensive,

0:50:53 - 0:50:57     Text: is it mainly due to the N-word? Actually, sorry, this was a question from before, but if

0:50:57 - 0:51:02     Text: you go to the appendix of the racial bias paper we have like the most common features that are used

0:51:02 - 0:51:08     Text: by the classifier to determine things, plotted by likelihood of AAE; you should take a look at that

0:51:08 - 0:51:12     Text: and it will answer that. So it's not just the N-word, but it also is like suffixes that appear in

0:51:12 - 0:51:18     Text: words and things like that. Okay I think the other two are like maybe a little bit longer so I'm

0:51:18 - 0:51:23     Text: going to answer them later. Okay, so given that we have a corpus annotated with social bias

0:51:23 - 0:51:28     Text: frames, we wanted to know how good NLP models actually are at making inferences using social bias

0:51:28 - 0:51:34     Text: frames and so we set up a case study where the goal is to predict an entire social bias frame from

0:51:34 - 0:51:41     Text: a previously unseen post and in order to do this our model requires classifying these categorical

0:51:41 - 0:51:45     Text: variables of like intent and offensiveness and things like that, but also it needs to be able to

0:51:45 - 0:51:50     Text: generate the groups and implied statements. So not all models are able to do this but

0:51:50 - 0:51:55     Text: GPT-style models actually can do this if we set them up correctly, and so the way that we did this

0:51:55 - 0:52:03     Text: is we took our social bias frame, and this is a cool animation so watch out, and we linearized it,

0:52:03 - 0:52:08     Text: adding special tokens for each classification variable, and then we passed them through a

0:52:08 - 0:52:12     Text: transformer-based conditional language model that we had initialized with GPT-2 in this case,

0:52:13 - 0:52:19     Text: and then we optimized the negative log likelihood of all tokens for training.
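
As a rough sketch of this linearize-and-train setup (not the actual training code: the special tokens, the linearization order, and the decoding settings below are invented for illustration), the idea with a Hugging Face GPT-2 model looks roughly like this:

```python
# Hypothetical sketch: linearize a post + frame into one sequence and train GPT-2 on it.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Invented special tokens marking each classification variable and each text field.
special = ["[OFF]", "[NOT_OFF]", "[INT]", "[NOT_INT]", "[GRP]", "[IND]",
           "[GROUP]", "[STEREOTYPE]", "[EOS]"]
tokenizer.add_special_tokens({"additional_special_tokens": special, "pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))

frame = {  # one toy annotation, in the same spirit as the structure sketched earlier
    "post": "(a post making a joke at women's expense)",
    "offensive": True, "intentional": True, "group_targeted": True,
    "groups": ["women"], "stereotypes": ["women can't handle responsibility"],
}

def linearize(f):
    """One possible linearization: the post, then categorical tokens, then free text."""
    return (f"{f['post']} "
            f"{'[OFF]' if f['offensive'] else '[NOT_OFF]'} "
            f"{'[INT]' if f['intentional'] else '[NOT_INT]'} "
            f"{'[GRP]' if f['group_targeted'] else '[IND]'} "
            f"[GROUP] {'; '.join(f['groups'])} "
            f"[STEREOTYPE] {'; '.join(f['stereotypes'])} [EOS]")

# Training: minimize the negative log likelihood of all tokens in the linearized sequence.
batch = tokenizer([linearize(frame)], return_tensors="pt", padding=True)
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()

# Inference: condition on the post alone and sample the rest of the frame token by token.
prompt = tokenizer(frame["post"] + " ", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=60, do_sample=True, top_p=0.9,
                     pad_token_id=tokenizer.pad_token_id)
print(tokenizer.decode(out[0]))
```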

0:52:19 - 0:52:24     Text: Then, for predicting social bias frames, we frame this as a conditional generation setting: generating the

0:52:24 - 0:52:32     Text: linearized frame token by token, sort of sampling it. But can anyone tell me if there's anything wrong with

0:52:32 - 0:52:40     Text: the generated frame right here? So actually you know there's a big problem here that the post was

0:52:40 - 0:52:45     Text: predicted not to be offensive but still having implications about black folks and this is a problem

0:52:45 - 0:52:51     Text: here where generated frames can sometimes not be consistent with the frame structure,

0:52:51 - 0:52:55     Text: and this is because when you linearize, your model can sometimes just not learn the full structure

0:52:55 - 0:53:02     Text: properly, and so what we need to do is enforce the structure post hoc. And so I'm going to gloss

0:53:02 - 0:53:06     Text: over the details here, but basically you can either just kind of top-down enforce it, or we can

0:53:06 - 0:53:12     Text: do something a little bit more global and let your future decisions correct your past mistakes a little bit.
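
Just to give a flavor of the "top-down" option mentioned here (a generic simplification, not the paper's actual constrained-inference procedure), a post-hoc fix-up over a predicted frame could look something like this:

```python
def enforce_frame_consistency(frame: dict) -> dict:
    """Generic top-down fix-up (illustrative only): higher-level decisions
    override the lower-level fields that contradict them."""
    fixed = dict(frame)
    if not fixed.get("offensive", False):
        # A post predicted as not offensive shouldn't carry group implications.
        fixed.update(group_targeted=False, groups=[], stereotypes=[])
    elif not fixed.get("group_targeted", False):
        # Offensive but aimed at an individual: drop the group-level stereotype.
        fixed.update(groups=[], stereotypes=[])
    return fixed

# An inconsistent prediction like the one on the slide: "not offensive" yet with an implication.
pred = {"offensive": False, "group_targeted": True,
        "groups": ["Black folks"], "stereotypes": ["(some implied stereotype)"]}
print(enforce_frame_consistency(pred))  # the contradictory fields get dropped
```

The more global alternative the speaker mentions would instead rescore candidate frames jointly, so a confident implication could also revise an earlier offensiveness decision rather than always being overridden by it.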

0:53:12 - 0:53:17     Text: And we do see a bit of a difference: this constrained, sort of more holistic

0:53:17 - 0:53:22     Text: inference helps. But answering the question of how well

0:53:22 - 0:53:29     Text: models do at these different classification variables: well, it's maybe okay to predict whether

0:53:29 - 0:53:33     Text: something was like intended to be offensive or not but what is really much more challenging is

0:53:33 - 0:53:39     Text: predicting whether something is targeting a group or an individual as well as predicting whether

0:53:39 - 0:53:43     Text: something is in group language which is just really hard to do especially given the data that we have

0:53:45 - 0:53:50     Text: but more interestingly like the performance of how models are able to generate the implications

0:53:50 - 0:53:56     Text: is actually interesting to look at and I'm going to gloss over the automatic sort of metrics of

0:53:56 - 0:54:02     Text: this here but the model is able to identify the right targeted group like actually pretty well

0:54:03 - 0:54:09     Text: but the bias implications are a lot more challenging to generate which I can illustrate with

0:54:09 - 0:54:13     Text: an example after answering some more questions because I'm seeing things pop up

0:54:15 - 0:54:19     Text: Posts from fandom slash stan Twitter might be overly offensive for comedic effect; do you think they

0:54:19 - 0:54:23     Text: should be concerned about these types of posts? Yeah, so that's one of the reasons why I think

0:54:23 - 0:54:27     Text: it's important to think about like who's being offended or like what is the implication here

0:54:28 - 0:54:35     Text: and like I'm a big believer that we should tackle like important problems, sorry stan Twitter, but like

0:54:35 - 0:54:43     Text: um people aren't dying or being like killed because they're a Selena Gomez fan however people

0:54:43 - 0:54:47     Text: are dying because they're you know a trans woman like that is you know one of the highest murder rates

0:54:47 - 0:54:53     Text: in the country, so a trans woman of color specifically. So like, that's the kind of biases that I

0:54:53 - 0:54:57     Text: think, yeah, AI systems should tackle, and like capturing arguments between political parties online

0:54:57 - 0:55:04     Text: like isn't that important in my view but that's why and you know but this type of these type of

0:55:04 - 0:55:09     Text: information of like who's actually being targeted like who's being harmed that wasn't there until

0:55:09 - 0:55:13     Text: social bias frames appeared; like, no one thought about this kind of stuff before, they were just like,

0:55:13 - 0:55:18     Text: oh, this is offensive or not, and so you end up just putting, you know, anti-Semitism

0:55:18 - 0:55:24     Text: in the same bucket as, like, some pop star's fandom sort of arguments, so I think that's a really

0:55:24 - 0:55:32     Text: good point how are the free text responses processed in the model so it's literally just token by

0:55:32 - 0:55:37     Text: token; it's kind of like language modeling, conditional language modeling. I maybe would refer

0:55:37 - 0:55:40     Text: you to the paper for a little bit more detail but it's like really just word by word

0:55:41 - 0:55:44     Text: why is a generative method used to get to the frames rather than a classification model so we

0:55:44 - 0:55:49     Text: have to generate the implications like those are literally just like open-ended strings um so we

0:55:49 - 0:55:53     Text: can't just use classification and we could just do separate like you know classification first

0:55:53 - 0:55:58     Text: and then add a different model afterwards, which, I mean, I encourage you to try if you want to

0:55:58 - 0:56:06     Text: um but that may work about the same potentially okay so let's look at some examples um

0:56:07 - 0:56:12     Text: again so warning there's some offensive content here right so here's an example from the

0:56:12 - 0:56:16     Text: dev set of the Social Bias Inference Corpus that says, I love gay guys, they're so much fun,

0:56:16 - 0:56:19     Text: I would love to have a gay guy best friend but lesbians are just annoying

0:56:21 - 0:56:26     Text: so the model here predicts that this post is offensive because um it implies that lesbians are

0:56:26 - 0:56:32     Text: annoying which is basically just written there um and this is in line with what the annotators

0:56:32 - 0:56:37     Text: wrote which is that um this post implies that lesbians are annoying but the annotators also wrote

0:56:37 - 0:56:44     Text: that this post implies that all gay guys are fun to be around and so um this illustrates the kind

0:56:44 - 0:56:49     Text: of mistakes that this model tends to make which is that it can typically be successful when there's

0:56:49 - 0:56:56     Text: really blatant cues, but they struggle with more subtle biases, for example the positive stereotype

0:56:56 - 0:57:01     Text: that all gay guys are fun to be around um and social psychologists tell us that these positive

0:57:01 - 0:57:07     Text: stereotypes can also have an adverse effect on people, even though they're not negative

0:57:07 - 0:57:14     Text: in sentiment. Another example here is about a Black guy who's in class and throws a

0:57:14 - 0:57:20     Text: paper ball into the trash and his teacher says that uh he's a disgrace to his race the model here

0:57:20 - 0:57:27     Text: predicts that this post implies that uh black people are trash which is not what the annotators wrote

0:57:27 - 0:57:32     Text: and in fact the annotators here correctly flag that um this post implies that black men are

0:57:32 - 0:57:37     Text: defined by their athletic skill or that all black men should be good at basketball um and again

0:57:37 - 0:57:41     Text: this is an example of where a model might be relying on negative keywords or and finding the

0:57:41 - 0:57:45     Text: identity-based words as well as the negative keywords and just combining them and assuming that

0:57:45 - 0:57:51     Text: that's what the implication is and so um this kind of keyword-based reliance is not

0:57:52 - 0:57:59     Text: you know uh is not uh sort of only specific to this task like there's a lot of places where

0:57:59 - 0:58:05     Text: models tend to rely on lexical cues a lot more, but yeah, so we need some new modeling advances

0:58:05 - 0:58:11     Text: to do this task properly. So just to summarize this real quick: social bias frames is

0:58:11 - 0:58:16     Text: a new formalism to distill the harmful or biased implications of language uh we introduced a new

0:58:16 - 0:58:21     Text: data set uh with annotations and then uh our experiments show that models really struggle with these

0:58:21 - 0:58:27     Text: more subtle bias implications uh which motivates as i said the need for uh better structured reasoning

0:58:27 - 0:58:35     Text: about social biases and people and groups in language so at a higher level though um the goal

0:58:35 - 0:58:39     Text: of social bias frames was basically to create an interpretable or explainable formalism that could

0:58:39 - 0:58:46     Text: represent social biases in language, and these explanations could be really useful for determining

0:58:46 - 0:58:52     Text: you know what's already uh what's what's going on in already written text so for example

0:58:52 - 0:58:57     Text: i'm imagining social bias frames as being helpful um for helping content moderators make decisions

0:58:57 - 0:59:02     Text: so not just having the text but also having like the implications if our models were good you know

0:59:02 - 0:59:08     Text: having implications of the posts associated with them, and people could maybe make a more

0:59:08 - 0:59:13     Text: uh informed decision about moderation um it could also be really useful if you're sort of looking at

0:59:13 - 0:59:17     Text: like a corpus of data that you're interested in quantifying, like, you know, how much sexism is

0:59:17 - 0:59:23     Text: appearing in this corpus then you could sort of do that with these kinds of explanations but what

0:59:23 - 0:59:27     Text: i'm really excited about too is that these explanations could actually help authors as they are

0:59:27 - 0:59:34     Text: writing text by pointing out the maybe unintentional biases in their text and so this opens the door

0:59:34 - 0:59:39     Text: for de-biasing text through rewriting, which is the next part of this talk that I'm going to talk

0:59:39 - 0:59:46     Text: about in the final part, but I will answer one more question real quick. This is not a technical

0:59:46 - 0:59:50     Text: question but i tried to develop models to filter out offensive content but had to give up because

0:59:50 - 0:59:54     Text: it was mentally exhausting to see the offensive content myself yes what do you think is the best

0:59:54 - 0:59:59     Text: way to protect the researchers and annotators themselves from the effect of harmful text in the

0:59:59 - 1:00:07     Text: data set during research? Could you share any personal tips? So I think that, personally, I am

1:00:07 - 1:00:14     Text: just kind of used to staring at this data now um so i'm maybe a little bit less affected by it

1:00:14 - 1:00:19     Text: because i think that um there's something empowering about knowing that i'm doing something

1:00:19 - 1:00:25     Text: about the issue um so that makes me a little bit less um sort of affected by it um when it comes

1:00:25 - 1:00:31     Text: to annotators i always um you know i think i always try to align their motivations with the same

1:00:31 - 1:00:34     Text: goal you know like we're trying to make the internet less toxic like really trying to make a

1:00:34 - 1:00:38     Text: difference here and that's why we're doing this we're not just you know displaying these awful

1:00:38 - 1:00:47     Text: things to you for you know a fun purpose um also like try to take breaks um i also have you know

1:00:47 - 1:00:52     Text: extended experience being in therapy so that helps uh you know having support system and things

1:00:52 - 1:00:57     Text: like that, just really not approaching this from a place where you're already vulnerable, but

1:00:57 - 1:01:01     Text: if you can sort of have a support system and take breaks and things like that that makes it a lot

1:01:01 - 1:01:06     Text: easier to do this kind of research um but like i said i think there's something just like kind of

1:01:06 - 1:01:13     Text: empowering about knowing that i'm you know trying to tackle an important problem okay so let me talk

1:01:13 - 1:01:19     Text: about this last part of the talk, which is PowerTransformer. And so PowerTransformer is an

1:01:19 - 1:01:24     Text: unsupervised controllable revision model for biased language correction, and what I mean when I say

1:01:24 - 1:01:28     Text: biased language correction here is we're looking at bias through the lens of connotation frames of

1:01:28 - 1:01:35     Text: power and agency so what these words mean is um basically connotation frames of power and agency

1:01:35 - 1:01:42     Text: are a commonsense formalism that I introduced in 2017 with my co-authors, and it

1:01:42 - 1:01:48     Text: distills knowledge, or connotational knowledge, related to verbs or verb predicates. And

1:01:48 - 1:01:54     Text: specifically so for example if you have someone is pursuing something uh this connotation frame

1:01:54 - 1:01:58     Text: is going to distill knowledge about the power differential between the agent and the theme of the

1:01:58 - 1:02:04     Text: verb so the object and the subject of the verb um so in this case when someone is pursuing something

1:02:04 - 1:02:08     Text: it's kind of likely that the person doing the pursuing has less power than the person that

1:02:08 - 1:02:15     Text: they're trying to pursue but they're not able to get um also the connotation frames capture

1:02:15 - 1:02:20     Text: notions of agency that is attributed to the person doing the action or the event and so in this

1:02:20 - 1:02:26     Text: case someone who's high agency is someone who tends to be very decisive active driving change

1:02:26 - 1:02:30     Text: so someone who is pursuing something is like really driving that change of like pursuing the thing

1:02:30 - 1:02:36     Text: but also someone who's whipping, beating, inciting, shooting, scaring, things like that, versus

1:02:36 - 1:02:41     Text: someone who's low agency tends to be very passive and experiencing uh events so that's maybe someone

1:02:41 - 1:02:47     Text: who's doing more tripping on things or sleeping or viewing or dozing or dreaming, things like that.
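
As a rough illustration of how a verb-level lexicon like this can be used to score, and later mask, agency (the entries below are a tiny made-up subset echoing the examples above, not the released connotation frames lexicon itself):

```python
# Tiny illustrative agency lexicon: verb lemma -> agency connotation (made-up labels).
AGENCY = {
    "pursue": "high", "beat": "high", "shoot": "high",
    "daydream": "low", "sleep": "low", "doze": "low", "dream": "low", "trip": "low",
    "see": "neutral",
}

def sentence_agency(verb_lemmas):
    """Tally the agency connotations of the verbs attributed to a character."""
    counts = {"high": 0, "low": 0, "neutral": 0}
    for verb in verb_lemmas:
        counts[AGENCY.get(verb, "neutral")] += 1
    return counts

def mask_agency_markers(tokens, lemmas, mask="[MASK]"):
    """Replace agency-bearing verbs with a mask token (the kind of masking step
    that the rewriting model described later in the talk relies on)."""
    return [mask if AGENCY.get(lemma, "neutral") != "neutral" else tok
            for tok, lemma in zip(tokens, lemmas)]

# "May daydreams of being a doctor" -> a low-agency portrayal of May.
print(sentence_agency(["daydream"]))   # {'high': 0, 'low': 1, 'neutral': 0}
print(mask_agency_markers(["May", "daydreams", "of", "being", "a", "doctor"],
                          ["May", "daydream", "of", "be", "a", "doctor"]))
```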

1:02:48 - 1:02:54     Text: and so we can actually use these connotation frames to analyze um all sorts of texts and specifically

1:02:54 - 1:03:00     Text: in this original work in 2017 we analyze the way that characters were portrayed in movie scripts

1:03:00 - 1:03:06     Text: modern movie scripts i should say um and we wanted to study basically the agency levels of

1:03:06 - 1:03:12     Text: characters with respect to their gender um and so just glossing over the details here but what we

1:03:12 - 1:03:17     Text: found is that in these modern movie scripts men were portrayed with much higher agency and women

1:03:17 - 1:03:25     Text: were portrayed with much lower agency unfortunately and again this is an example of um what i

1:03:25 - 1:03:29     Text: you know, call the cycle of social inequality in text, because in the world men unfortunately

1:03:29 - 1:03:35     Text: have more societal and decision-making power than women, and this in turn, you know,

1:03:35 - 1:03:40     Text: transfers into text: when we're writing texts we tend to portray women as less agentic and less

1:03:40 - 1:03:45     Text: powerful than men because it's just kind of how the world works. And so then when we read these

1:03:45 - 1:03:51     Text: texts this reinforces our perceptions of gender roles and stereotypes and uh can just continue

1:03:51 - 1:03:57     Text: the cycle but this motivated our new task of controllable de-biasing which basically is asking

1:03:57 - 1:04:02     Text: can machines learn to revise texts to de-bias these portrayals and essentially break the cycle of

1:04:02 - 1:04:09     Text: social inequality in text. And so specifically, the goal of controllable de-biasing of our story

1:04:09 - 1:04:15     Text: sentences is to take in a sentence like "May daydreams of being a doctor", where May is portrayed,

1:04:15 - 1:04:21     Text: you know, very passively, with low agency, and we're gonna rewrite it to "May pursues her dream to

1:04:21 - 1:04:28     Text: be a doctor", where May all of a sudden has much higher agency than before. So there's actually

1:04:28 - 1:04:34     Text: two challenges to doing this kind of thing um the first one is that contrary to some existing

1:04:34 - 1:04:39     Text: sort of work on uh rewriting we cannot just paraphrase the sentences because a lot of times the

1:04:39 - 1:04:45     Text: biases are actually um not just in the framing of the actions that are attributed to characters but

1:04:45 - 1:04:52     Text: actually in the actions themselves um and so on the other hand we also want to avoid making

1:04:52 - 1:04:56     Text: unnecessary meaning changes and completely rewriting the sentence so we want to preserve as much

1:04:56 - 1:05:02     Text: of the meaning as we can while also still de-biasing, and so what you want is targeted edits with minimal

1:05:02 - 1:05:09     Text: meaning change a second challenge is that this is essentially an unsupervised task um and what I

1:05:09 - 1:05:14     Text: mean by that is that there's no parallel input output pairs that can show a model like this is

1:05:14 - 1:05:19     Text: exactly what you should rewrite the sentence to. And so the way that people have tackled this

1:05:19 - 1:05:25     Text: problem before is using generator discriminator models by uh basically setting up like uh you know

1:05:25 - 1:05:32     Text: a discriminator on top of your generation model and kind of like GAN style things um but oftentimes

1:05:32 - 1:05:36     Text: research has shown that this leads to really, you know, disfluent or less grammatical output text,

1:05:37 - 1:05:43     Text: and so our approach is, we're gonna follow Li et al.'s approach of masking and reconstructing

1:05:43 - 1:05:49     Text: sentences um but in order to really do this fully we have to add two novel modeling aspects to this

1:05:49 - 1:05:53     Text: The first one is we added an additional paraphrasing training objective, and then at

1:05:53 - 1:05:58     Text: um testing time or a generation time we're gonna add a vocabulary boosting mechanism to reach

1:05:58 - 1:06:04     Text: the desired agency levels better all right so let me walk you through the way that power transformer is set

1:06:04 - 1:06:10     Text: up. So the way this model works is we start with an input sentence like "May daydreams of being a

1:06:10 - 1:06:17     Text: doctor" and a desired agency level, in this case high agency. We're gonna mask all the agency

1:06:17 - 1:06:22     Text: markers from the input sentence using our connotation frames we're gonna transform our uh desired

1:06:22 - 1:06:28     Text: agency level into a special token we're gonna feed those into a transformer based conditional

1:06:28 - 1:06:34     Text: language model, in this case initialized with GPT, and then at training time we're gonna use our joint

1:06:34 - 1:06:39     Text: reconstruction and paraphrasing objective and then at test time or a generation time we're going to

1:06:39 - 1:06:45     Text: use our vocab boosting mechanism on top of the probabilities that are given to us by the model um to

1:06:45 - 1:06:52     Text: even further reach the desired agency level so to be a little bit more specific um the joint

1:06:52 - 1:06:59     Text: training objective has two parts. The first one is an in-domain reconstruction objective, which

1:06:59 - 1:07:04     Text: takes in sentences from our story corpus, masks the agency markers, and then optimizes the

1:07:04 - 1:07:09     Text: likelihood of the reconstructed sentence and then the second one is an out of domain paraphrasing

1:07:09 - 1:07:15     Text: objective which basically uses uh pairs of paraphrases from tv subtitles so very different corpus

1:07:15 - 1:07:20     Text: very different domain, and then optimizes the likelihood of a sentence given its masked paraphrase.
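
A very condensed sketch of what such a joint objective could look like in code. This is an illustration under simplifying assumptions rather than the actual training script: the loss here covers the whole concatenated sequence instead of just the target side, the special token for the desired agency level is omitted, and the masked inputs are assumed to come from a lexicon-based masking step like the one sketched earlier.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def lm_nll(source: str, target: str) -> torch.Tensor:
    """Negative log likelihood of generating `target` after `source` under the LM
    (simplified: the loss is computed over all tokens; the separator is arbitrary)."""
    ids = tokenizer(source + tokenizer.eos_token + target, return_tensors="pt")
    return model(**ids, labels=ids["input_ids"]).loss

def joint_loss(masked_story, story, masked_paraphrase, paraphrase_target):
    # (1) In-domain reconstruction: rebuild a story sentence from its masked version.
    reconstruction = lm_nll(masked_story, story)
    # (2) Out-of-domain paraphrasing: generate a sentence from its masked paraphrase.
    paraphrasing = lm_nll(masked_paraphrase, paraphrase_target)
    return reconstruction + paraphrasing  # how to weight the two terms is a design choice

loss = joint_loss("May [MASK] of being a doctor.", "May daydreams of being a doctor.",
                  "I [MASK] we should leave now.", "I reckon we ought to go now.")
loss.backward()
```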

1:07:20 - 1:07:27     Text: And then the way that our decoding-time vocab boosting mechanism works is essentially,

1:07:27 - 1:07:32     Text: it's gonna increase the likelihood of tokens that are connoting the right desired agency levels,

1:07:32 - 1:07:38     Text: and we're doing that by adding a vocab-sized vector for the right agency levels at each

1:07:38 - 1:07:44     Text: decoding time step um so basically like every time we generate a new word we're gonna shift the

1:07:44 - 1:07:49     Text: probabilities up of the words that are kind of in the right direction of the right agency

1:07:49 - 1:07:55     Text: um and then we can do this shifting at different strengths using a parameter beta here
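
To make the boosting mechanism concrete, here is a bare-bones sketch of the kind of logit shift being described: a vocabulary-sized vector pointing toward the desired agency direction, scaled by a strength parameter beta and added to the model's scores at each decoding step. The boost vector and the beta value below are purely illustrative.

```python
import torch

def boosted_next_token_logits(logits: torch.Tensor,
                              boost_vector: torch.Tensor,
                              beta: float = 2.0) -> torch.Tensor:
    """Shift next-token logits toward vocabulary items connoting the desired agency.

    logits:       (vocab_size,) scores from the language model at this decoding step
    boost_vector: (vocab_size,) e.g. +1 for high-agency verb tokens, -1 for low-agency
                  ones, 0 elsewhere (an illustrative encoding)
    beta:         boosting strength; beta = 0 recovers ordinary decoding
    """
    return logits + beta * boost_vector

# Toy example with a six-word vocabulary.
vocab = ["May", "daydreams", "pursues", "doctor", "sleeps", "her"]
logits = torch.tensor([2.0, 3.0, 1.0, 0.5, 2.5, 0.2])
boost = torch.tensor([0.0, -1.0, 1.0, 0.0, -1.0, 0.0])  # favor high-agency verbs

shifted = boosted_next_token_logits(logits, boost, beta=2.0)
probs = torch.softmax(shifted, dim=-1)
print(vocab[int(torch.argmax(probs))])  # "pursues" becomes the most likely next token
```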

1:07:56 - 1:08:00     Text: So there's a lot of different components in this model, right? So we actually

1:08:00 - 1:08:06     Text: wanted to measure whether you know all of these components actually mattered um and uh just doing

1:08:06 - 1:08:10     Text: some ablation studies where we looked at whether you know the vocab boosting actually helped

1:08:10 - 1:08:14     Text: a reconstruction or the joint objective and things like that we actually do find a performance gain

1:08:14 - 1:08:19     Text: from both the vocab boosting and the joint objective looking at whether or not we reach the desired

1:08:19 - 1:08:27     Text: agency levels um so just in terms of accuracy but this is a kind of open-ended generation task uh

1:08:27 - 1:08:32     Text: where we have an input sentence and an output sentence that could look like a lot of things um and so

1:08:33 - 1:08:36     Text: we don't really have good automatic ways of evaluating this kind of open-ended generation

1:08:36 - 1:08:41     Text: task unfortunately as of today it's really still an open problem if you're excited about you know

1:08:41 - 1:08:45     Text: evaluating text generation I encourage you to work on it because it's really exciting um but

1:08:45 - 1:08:51     Text: unfortunately until we have a really good text evaluation system um we tend to rely on human

1:08:51 - 1:08:57     Text: evaluations instead to tell us how good these outputs are, and so that's what we did in this

1:08:57 - 1:09:03     Text: paper: we designed a head-to-head evaluation task where we basically gave the raters the output

1:09:03 - 1:09:06     Text: of two systems and then asked them which one better preserved the meaning and which one better

1:09:06 - 1:09:13     Text: preserved the agency compared to the original sentence um and we actually compared our um full

1:09:13 - 1:09:18     Text: PowerTransformer model's outputs to the non-boosted version of the model, as well as two

1:09:18 - 1:09:26     Text: baselines from related, different tasks. And what we find is that both PowerTransformer

1:09:26 - 1:09:31     Text: models actually are better at preserving the meaning than the previous baselines, and then the

1:09:31 - 1:09:36     Text: full PowerTransformer model with the vocab boosting actually has more accurate output agency levels

1:09:36 - 1:09:44     Text: compared to um the other baselines so given that we have evidence that our model actually works well

1:09:45 - 1:09:50     Text: we wanted, in this paper, in this project, to circle back to the movie scripts that I mentioned

1:09:50 - 1:09:56     Text: earlier and then see how the PowerTransformer model can actually help sort of mitigate these biases.

1:09:59 - 1:10:03     Text: so as you recall in the original movie scripts um we found that men were portrayed with higher

1:10:03 - 1:10:09     Text: positive agency than women um and so we're going to do a case study we're going to rewrite the lines

1:10:09 - 1:10:13     Text: that describe female characters using power transformer specifically trying to give uh these

1:10:13 - 1:10:19     Text: characters more agency um and so in the revised scripts what we find is that not only do women have

1:10:19 - 1:10:27     Text: much higher agency than before but um if we compare the gender effects actually um all of a sudden

1:10:27 - 1:10:31     Text: women have much higher agency than male characters uh statistically significantly so

1:10:31 - 1:10:38     Text: so these coefficients look really big and I want to sort of give a caveat here that this is a very

1:10:38 - 1:10:44     Text: broad-swath approach to de-biasing language, because we rewrote every single character that

1:10:44 - 1:10:49     Text: was detected as being female but I think here what this actually illustrates is the promise for

1:10:49 - 1:10:54     Text: using human AI collaborative writing setups to help people maybe write less stereotypically so

1:10:54 - 1:11:00     Text: kind of enhancing these writing sort of platforms, or these tools that people are using to write,

1:11:00 - 1:11:03     Text: to give them more information about how what they write might be perceived

1:11:06 - 1:11:10     Text: so um you know glossing over the contributions of power transformer here we introduced the new

1:11:10 - 1:11:14     Text: task of controllable de-biasing, I talked about these connotation frames, which are a new commonsense

1:11:14 - 1:11:20     Text: formalism, and then introduced a new model, PowerTransformer, and we showed in the case

1:11:20 - 1:11:24     Text: study that we could actually mitigate um automatically mitigate some of the gender biases in

1:11:24 - 1:11:32     Text: movie scripts and so this concludes the last sort of uh you know part of my talk about projects um

1:11:32 - 1:11:35     Text: and I wanted to briefly walk through some future directions and hopefully we can get some

1:11:35 - 1:11:40     Text: more discussion going as well on that I know that I think I have about 15 minutes left

1:11:41 - 1:11:46     Text: so um to recap like in this talk I talked about racial bias and hate speech detection and

1:11:46 - 1:11:50     Text: proposed race-aware annotation strategies to maybe mitigate this, then I talked about

1:11:50 - 1:11:55     Text: social bias frames as a new way of viewing the problem of hate speech detection by looking at the

1:11:55 - 1:12:00     Text: bias and harmful implications of language instead um and then finally I showed with power transformer

1:12:00 - 1:12:06     Text: that we can revise and de-bias text through the lens of connotation frames, but there's a lot more

1:12:06 - 1:12:11     Text: to be done towards this sort of broader goal of avoiding and mitigating biases in language uh with

1:12:11 - 1:12:17     Text: more human-centric models and so I'm really excited about um furthering this research on understanding

1:12:17 - 1:12:23     Text: how social biases show up in human-written language um so this requires looking at how we can do

1:12:23 - 1:12:28     Text: better at detecting social biases and toxicity in language, so I'm really excited about, you know,

1:12:28 - 1:12:32     Text: working more on developing formalisms for contextual bias representation, so,

1:12:32 - 1:12:37     Text: harking back to a question that was there earlier um so for example you know a lot of these

1:12:37 - 1:12:43     Text: representations have ignored the conversational context in which things appear um and actually the

1:12:43 - 1:12:48     Text: pragmatic implications of uh conversation in terms of biases are actually not trivial to capture

1:12:49 - 1:12:52     Text: for example you could say something really offensive and then someone else could say oh I agree

1:12:53 - 1:12:57     Text: um and then you know of course the first utterances offensive but then what do you do about

1:12:57 - 1:13:02     Text: the second utterance like knowing the stance towards offensive utterances is also really important

1:13:02 - 1:13:07     Text: and so we studied that a little bit in work called ToxiChat that was published at

1:13:07 - 1:13:14     Text: EMNLP 2021. As I showed, there's a lot of differences in terms of like who the speaker

1:13:14 - 1:13:20     Text: and listener and annotators are when it comes to these kinds of um sort of social biases in language

1:13:20 - 1:13:25     Text: and so we need to keep looking at like how those identities and the power dynamics between those

1:13:25 - 1:13:31     Text: identities affect the implications um you know of course our social bias frame model wasn't perfect

1:13:31 - 1:13:37     Text: and so we need better models to do better deeper reasoning about biases in text um and

1:13:37 - 1:13:41     Text: as I mentioned I'm really excited about uh this whole sort of application of rewriting text

1:13:41 - 1:13:47     Text: to de-bias it and so um potentially using other dimensions then the uh

1:13:47 - 1:13:52     Text: connotation frames of power and agency, but maybe using social bias frames instead, or things like that,

1:13:52 - 1:13:57     Text: and you know this also requires studying how humans would react to these kinds of inputs from

1:13:57 - 1:14:02     Text: models like is someone going to be more or less offended if the model says that their uh

1:14:02 - 1:14:04     Text: text might be biased versus a human would say that

1:14:04 - 1:14:09     Text: um on the flip side I'm also really interested and I think it's really important to keep looking at

1:14:09 - 1:14:15     Text: how we can avoid biases and toxicities in machine-generated language. And so, you know, we've

1:14:15 - 1:14:19     Text: done some work on scrutinizing the biases and toxicity in these pre-trained language models,

1:14:19 - 1:14:24     Text: uh for example looking at you know how quickly a language model could generate toxicity but also

1:14:25 - 1:14:30     Text: um how you know what about the pre-training data and the way that pre-training data was selected

1:14:30 - 1:14:36     Text: um affects the types of biases and toxicity that is in these machines um so for example in some

1:14:36 - 1:14:42     Text: recent work, I think that was published at EMNLP 2021 as well, we basically found

1:14:42 - 1:14:47     Text: that, like, you know, African-American English was really likely to be removed by automatic

1:14:47 - 1:14:52     Text: sort of data quality filters that google uses um for their corpora so there's some problematic

1:14:52 - 1:15:00     Text: things about that as well. Also really excited about, you know, finding steering methods

1:15:00 - 1:15:04     Text: that can help avoid toxicity, and so this is kind of also harking back to the earlier question of

1:15:04 - 1:15:09     Text: like what about the pre-training data or the training data versus like the test time sort of

1:15:09 - 1:15:13     Text: behavior of models um and like I said I think we should tackle both and I'm really excited

1:15:13 - 1:15:17     Text: about tackling both these things um so how can we steer models that are already trained and we

1:15:17 - 1:15:22     Text: may we know that they may have toxic behavior how can we steer them nonetheless to be less toxic

1:15:22 - 1:15:27     Text: So we have some work called DExperts, where we basically learn from the worst, in a way, where we

1:15:27 - 1:15:33     Text: basically have a really, really toxic LM that is an anti-role model for the standard language model,

1:15:33 - 1:15:37     Text: and so it's basically like your really, really racist uncle, and so if you know that your racist

1:15:37 - 1:15:43     Text: uncle would say it, then you know that you shouldn't say it, basically. You know, and I think

1:15:43 - 1:15:47     Text: in terms of like looking at steering these models and avoiding toxicity and machine-generated

1:15:47 - 1:15:52     Text: text like we should think about you know expanding what we should avoid and not just avoiding

1:15:52 - 1:15:57     Text: swear words but also you know maybe we want our models to be pro social and maybe we want them

1:15:57 - 1:16:03     Text: to not violate social norms in general and just follow community norms more um so um and specifically

1:16:03 - 1:16:08     Text: maybe we want you know we want to allow for personalization so that it's not just one culture that's

1:16:08 - 1:16:12     Text: being represented in our machine language or machine learning models that um you know could be

1:16:12 - 1:16:17     Text: personalized to your own culture and things like that um and then finally I think there's uh you

1:16:17 - 1:16:22     Text: I'm really excited about bridging the gap between NLP and social sciences in general so um you know

1:16:22 - 1:16:28     Text: asking what social sciences can do for improving NLP systems uh so for example looking at how

1:16:28 - 1:16:32     Text: psychology and sociolinguistics can help us understand how people label or perceive offensive-

1:16:32 - 1:16:39     Text: ness in text, but also in general how cognitive science sort of findings can help us

1:16:39 - 1:16:44     Text: understand how cognitive biases might appear in crowdsourcing tasks in general. And then the

1:16:44 - 1:16:49     Text: converse of using NLP to answer social science questions is also really interesting so for example

1:16:49 - 1:16:54     Text: you know, we know that misinformation in news is another big issue that's going on online,

1:16:54 - 1:16:59     Text: where people are sharing disinformation or propaganda, but can we make machine learning or

1:16:59 - 1:17:05     Text: NLP systems that can help us fight this, by for example distilling common tropes that are evoked by headlines,

1:17:05 - 1:17:11     Text: or um producing or generating the sort of likely reactions that someone would have to headline so

1:17:11 - 1:17:15     Text: that we know when someone's really afraid, that maybe we should be careful about framing

1:17:15 - 1:17:21     Text: something, framing something in a very fear-mongering way. And in general, you know,

1:17:21 - 1:17:24     Text: creating methods for analyzing social phenomena in text is also really important so that we can

1:17:24 - 1:17:29     Text: answer social science questions for that all right that uh concludes my talk and I'm really

1:17:29 - 1:17:34     Text: uh excited to have been able to share my work with y'all and I want to thank you all for listening

1:17:34 - 1:17:39     Text: um and specifically also want to thank my collaborators because without them none of this work that I

1:17:39 - 1:17:44     Text: just talked about could have been possible and I'll take more questions now um I'll answer

1:17:45 - 1:17:49     Text: questions so in places where keyword censorship is enforced I observed that people change how they

1:17:49 - 1:17:54     Text: speak using code words or acronyms to avoid censorship but fundamentally it is not that effective

1:17:54 - 1:17:58     Text: and actually preventing people from talking about it hate speech detection is of course more

1:17:58 - 1:18:02     Text: complicated and harder to evade, but over time maybe people will still find a way to avoid it

1:18:02 - 1:18:07     Text: and render the whole system ineffective? Yeah, absolutely, and this is kind of a dual-use question too,

1:18:07 - 1:18:13     Text: because like um you know some moderation is really meant to protect people for example there's work

1:18:13 - 1:18:19     Text: from, I think it's like sort of HCI people, that looked at the pro-anorexia communities online,

1:18:20 - 1:18:24     Text: and there's a constant battle of people that are sort of trying to get people to become more

1:18:24 - 1:18:29     Text: anorexic uh learning new ways to promote the hashtags and things like that and then the

1:18:29 - 1:18:33     Text: platforms and the moderators like sort of falling behind and just like trying to follow and like being

1:18:33 - 1:18:38     Text: like okay this hashtag now means this and things like that um so like in the grand scheme of things

1:18:38 - 1:18:43     Text: it's like well you know shouldn't we moderate this and so it's it's definitely a tricky question there

1:18:45 - 1:18:51     Text: but yeah I think if we can if we can tackle at least like the if we can set a rule around the

1:18:51 - 1:18:55     Text: biases that we want to moderate like there's always going to be people that are going to try to evade it

1:18:55 - 1:19:00     Text: um for sure yeah that's kind of yeah I think that I don't actually know if there's a good answer

1:19:00 - 1:19:08     Text: for this question um but that's a really astute observation for sure um okay said something I'm

1:19:08 - 1:19:12     Text: curious if there's such a thing as a kid-friendly, a kid-aware LM, because they're spending record time

1:19:12 - 1:19:15     Text: on devices online. This is certainly a governance issue, but how do you incorporate age into the

1:19:15 - 1:19:21     Text: models to make technology truly user-friendly? Yeah, that's hard, because like, who decides what's

1:19:21 - 1:19:25     Text: appropriate for kids or not right there's an anti-gay bill in Florida that's being trying to be

1:19:25 - 1:19:31     Text: pushed where people don't want to talk about homosexuality um you know that's a very political

1:19:31 - 1:19:36     Text: stance because like there are kids that are gay and struggling with that and so um I

1:19:37 - 1:19:41     Text: I just want to raise more questions honestly because I'm not going to pronounce myself on that um

1:19:43 - 1:19:50     Text: I don't know. I think that protecting children is oftentimes a reason that is

1:19:50 - 1:19:56     Text: used as an excuse to justify censorship and other things so I think we should be careful about

1:19:56 - 1:20:00     Text: that kind of argument though um like are we actually looking at what is actually harming children

1:20:00 - 1:20:05     Text: and like what are the biases of those studies versus um people just being morally outraged by

1:20:05 - 1:20:11     Text: something and saying that their children should be protected from it instead um so it's not clear

1:20:11 - 1:20:16     Text: and also I think in general we should be really regulating the usage of AI and technology for children

1:20:16 - 1:20:24     Text: because we do want to protect them from things that's for sure um what techniques can be used to

1:20:24 - 1:20:28     Text: detect social biases hate speech in languages that we don't have large curated label data

1:20:28 - 1:20:37     Text: sets for that's a really good question um I don't know that it's that possible um I think I mean

1:20:37 - 1:20:42     Text: you know there's there's you know this sort of low resource language approaches that people

1:20:42 - 1:20:49     Text: would use um maybe multilingual transfer type things could help um but like I think with these

1:20:49 - 1:20:54     Text: kinds of things like I'm more interested in like what the humans that are being affected think and

1:20:54 - 1:20:59     Text: so we would need some sort of information about what they like and what they don't like in order to

1:20:59 - 1:21:06     Text: be able to tackle these um you know undesirable language um that we're trying to remove or you know

1:21:06 - 1:21:14     Text: detect I wonder if you have thought about some of the potential risks or ways in which this type of

1:21:14 - 1:21:19     Text: work could be misused yeah so people talk about dual use a lot in this case um so the issue is that

1:21:20 - 1:21:27     Text: like the cat is out, or Pandora's box has been opened; the internet exists and it's just rampant

1:21:27 - 1:21:32     Text: with hate speech right now and it's pushing people out in a lot of ways and so um not doing anything

1:21:32 - 1:21:37     Text: about that problem is kind of an implicit endorsement of the fact that you know minority people are

1:21:37 - 1:21:42     Text: being pushed out of these platforms or are being silenced by these algorithms um so like we don't

1:21:42 - 1:21:46     Text: really have a choice to just go back to a world where this isn't an issue I mean you can just pretend

1:21:46 - 1:21:52     Text: like it doesn't exist anymore um I like to work on making things a little bit better because I think

1:21:52 - 1:21:58     Text: that people are, you know, being censored, and that's wrong. But like, you know, it's totally

1:21:58 - 1:22:02     Text: valid if you would rather not work on it because you're afraid that your technology's gonna backfire

1:22:02 - 1:22:07     Text: on you somehow, that's for sure. But I think, like, there's no good option, like, we're already kind of

1:22:07 - 1:22:13     Text: in an ethically fraught situation, unfortunately, so I've definitely thought about this a lot.

1:22:15 - 1:22:22     Text: Okay, have you faced any opposition in applying your research on toxicity in actual practice?

1:22:22 - 1:22:25     Text: yeah so kind of uh same same question like I think people get really freaked out that they're

1:22:25 - 1:22:31     Text: gonna get censored um and it's like the answer is that people are already being censored maybe it's

1:22:31 - 1:22:36     Text: not you um but there's already people being censored so like again cats out of the bag like right

1:22:36 - 1:22:41     Text: you know I don't think that it's like we're developing technology that is just not solving a real

1:22:41 - 1:22:45     Text: world problem and just creating more problems; like, the internet already exists, and

1:22:45 - 1:22:50     Text: so does social media, and we should keep working with legislators as well to, like, prevent people from

1:22:50 - 1:22:56     Text: overstepping the power that they have. So for example, I think considering

1:22:56 - 1:23:00     Text: social media platforms not, like, you know, something a company can dictate everything on, but, you know,

1:23:00 - 1:23:05     Text: considering them like public forums, and sort of having regulations around, like, what can and

1:23:05 - 1:23:10     Text: can't be said in public forums, and looking at those and seeing what applies to social

1:23:10 - 1:23:16     Text: media platforms, is important. Would you rather have Turker labelers for offensive and non-offensive

1:23:16 - 1:23:20     Text: text in your corpus be more or less ideologically, culturally congruent? What metrics should we use to

1:23:20 - 1:23:27     Text: disqualify someone from labeling text if any yeah so again that's where positionality of researchers

1:23:27 - 1:23:33     Text: comes in like I didn't want to create a data set that um you know was all about flagging

1:23:33 - 1:23:39     Text: anti-white statements because that's not a technology that I believe is um reflecting

1:23:39 - 1:23:44     Text: the actual harms that people are experiencing if a researcher wants to do that um they're free to

1:23:44 - 1:23:50     Text: you know I can't control what they do but um I think yeah this is kind of a part where like any

1:23:50 - 1:23:56     Text: technology that we develop comes from a place of like our own biases and our own positionality

1:23:56 - 1:24:02     Text: so um we have to recognize that no matter what we do right and it's not just because we're tackling

1:24:02 - 1:24:07     Text: something that's um politically charged all of a sudden that um that's the only case is that

1:24:07 - 1:24:12     Text: these kinds of positionality things affect our research um we have a live question

1:24:15 - 1:24:19     Text: thank you for the talk it's really enlightening I was wondering there's always going to be a gap

1:24:19 - 1:24:24     Text: between the values of the model builders and the values of the people who are using these models

1:24:24 - 1:24:30     Text: and what do you think we can do in, like, creating some sort of system where you have these

1:24:30 - 1:24:36     Text: values constantly, if we're constantly changing what we think is moral and not moral, biased and not

1:24:36 - 1:24:41     Text: biased, and you know, these values sort of evolved over, like, thousands of years, but now with AI

1:24:41 - 1:24:46     Text: they're sort of evolving over minutes and seconds like is there ways other ways to sort of make

1:24:46 - 1:24:51     Text: sure we come up with a good set of values but can adapt to kind of the changing values of like

1:24:51 - 1:24:57     Text: the people who are using the models as well yeah I mean I think like ultimately like our

1:24:57 - 1:25:03     Text: approaches should be more human centric than they currently are um and like looking at you know

1:25:03 - 1:25:07     Text: who the stakeholders of this technology are and like we're just really starting to scratch the

1:25:07 - 1:25:13     Text: surface only of like bridging that gap between like the the developers and like the the people that

1:25:13 - 1:25:19     Text: are being affected by this and so um we need to keep doing work just like uncovering that link

1:25:19 - 1:25:24     Text: and like how people are being affected and like looking at what those people want um and doing

1:25:24 - 1:25:28     Text: that kind of like value sensitive design like looking at like what the communities that are actually

1:25:28 - 1:25:35     Text: going to be affected by the technology want out of it. But, just like any technology, like, nothing

1:25:35 - 1:25:39     Text: should be static and if the world changes like we should be changing our technology with it as well

1:25:40 - 1:25:47     Text: to adapt with it and so um yeah it's kind of a non-answer because it's like yeah we should just be

1:25:47 - 1:25:51     Text: you know talking to people and asking them and getting out of the just NLP part of it and just

1:25:51 - 1:25:56     Text: really going to the to the users and asking them like what are they what are they experiencing

1:26:01 - 1:26:07     Text: Okay, so I was asked: how does agency and power in a frame actually quantify a character as positive

1:26:07 - 1:26:10     Text: or negative? Are these based on actions, intent, emotional strength of characters being portrayed?

1:26:10 - 1:26:16     Text: So it's based on the verbs that are being used to describe these characters, so that's how we

1:26:16 - 1:26:20     Text: measure the agency of a character basically so like are they doing a lot of chewing or are they

1:26:20 - 1:26:27     Text: doing a lot of sleeping basically um to simplify um and there's you know it's just the lexicon so

1:26:27 - 1:26:32     Text: you can download the lexicon and look at it as well um you know the current social media platforms

1:26:32 - 1:26:36     Text: images also play a crucial role, more like a combination of image and text, very important for

1:26:36 - 1:26:40     Text: detecting hate speech and other biases, do you plan to explore this? Yes, yeah, I'm really excited, I'm actually

1:26:40 - 1:26:45     Text: meeting with some OpenAI folks next week to talk about multimodal sort of

1:26:45 - 1:26:51     Text: toxicity detection and things like that um you know that's a whole other ballgame um unfortunately

1:26:51 - 1:26:55     Text: i my training is in just language so i don't really have a lot of expertise in vision but i'm

1:26:55 - 1:26:59     Text: really excited to try seeing how that works there's still a lot of places where it's just

1:26:59 - 1:27:06     Text: language only, though, so I'm confident that I'll still have some work to do just in that.

1:27:08 - 1:27:11     Text: Thank you for the amazing sharing; my question is how to keep the social bias models we've talked about,

1:27:11 - 1:27:16     Text: or the detector, updated to society? Okay, so that's kind of what someone else asked as well. Just,

1:27:16 - 1:27:20     Text: you know i think that's the answer we just need to keep updating things and just you know not think

1:27:20 - 1:27:27     Text: of models is static um and if we can make our models be able to be conditioned upon a large set of

1:27:27 - 1:27:32     Text: uh rules or things that we know are problematic or um and we can update that then our model could

1:27:32 - 1:27:41     Text: basically be updated just by having that new data accessible. We have gotten to 4:45, Martin, so it's

1:27:41 - 1:27:46     Text: up to you: if you're just powering through and answering another couple of questions,

1:27:46 - 1:27:52     Text: you're welcome to, but you're also welcome to take a deep sigh and stop. I feel like I'm just like,

1:27:52 - 1:27:57     Text: i want to answer everyone's questions because they're also good but um i should probably stop here

1:27:57 - 1:28:01     Text: just in the interest of time and uh being respectful of everyone who can't listen to all the other

1:28:01 - 1:28:07     Text: questions um but this was great i'm really excited about all the questions that we got i it's been a

1:28:07 - 1:28:11     Text: minute since I've been, like, you know, interrupted in a talk for questions, so I wanted to answer all

1:28:11 - 1:28:17     Text: of them and try to get through everything but this is awesome yeah well thank you so much for

1:28:17 - 1:28:34     Text: giving this talk and the insights into the area. Yeah, of course.