Stanford CS224N - NLP w⧸ DL | Winter 2021 | Lecture 4 - Syntactic Structure and Dependency Parsing

0:00:00 - 0:00:12     Text: Okay, so for today, we're actually going to take a bit of a change of pace from what

0:00:12 - 0:00:19     Text: the last couple of lectures are have been about and we're going to focus much more on linguistics

0:00:19 - 0:00:25     Text: and natural language processing and so in particular, we're going to start looking at the topic

0:00:25 - 0:00:35     Text: of dependency parsing. And so this is the plan of what to go about through today. So I'm going

0:00:35 - 0:00:40     Text: to start out by going through some ideas that have been used in the syntactic structure of languages

0:00:40 - 0:00:48     Text: of constituency and dependency and introduce those and then focusing in more independent

0:00:48 - 0:00:55     Text: dependency structure, I'm then going to look at dependency grammars and dependency tree banks

0:00:55 - 0:01:00     Text: and then having done that, we're then going to move back into thinking about how to build

0:01:00 - 0:01:05     Text: natural language processing systems and so I'm going to introduce the idea of transition-based

0:01:05 - 0:01:12     Text: dependency parsing and then in particular having developed that idea, I'm going to talk about

0:01:12 - 0:01:18     Text: a way to build a simple but highly effective neural dependency parser. And so this simple

0:01:18 - 0:01:23     Text: highly effective neural dependency parser is essentially what we'll be asking you to build

0:01:23 - 0:01:29     Text: in the third assignment. So in some sense, we're getting a little bit ahead of ourselves here

0:01:29 - 0:01:36     Text: because in week two of the class, we teach you how to do both assignments two and three, but all

0:01:36 - 0:01:42     Text: of this material will come in really useful. Before I get underway, just a couple of announcements.

0:01:44 - 0:01:51     Text: So for a site, again for assignment two, you don't yet need to use the PyTorch framework,

0:01:51 - 0:01:58     Text: but now's a good time to work on getting PyTorch installed for your Python programming.

0:01:59 - 0:02:05     Text: Assignment three is in part also and in production using PyTorch, it's got a lot of scaffolding

0:02:05 - 0:02:13     Text: included in the assignment, but beyond that, this Friday, we've got a PyTorch tutorial and thoroughly

0:02:13 - 0:02:21     Text: encourage you to come along to that as well, look for it under the Zoom tab. And in the second half

0:02:21 - 0:02:28     Text: of the first day of week four, we have an explicit class that partly focuses on the final projects

0:02:28 - 0:02:34     Text: and what the choices are for those, but it's never too late to start thinking about the final project

0:02:34 - 0:02:40     Text: and what kind of things you want to do for the final project. So do come meet with people,

0:02:40 - 0:02:46     Text: there are sort of resources on the course pages about what different TAs know about. I've also

0:02:46 - 0:02:50     Text: talked to a number of people about final projects, but clearly I can't talk to everybody.

0:02:50 - 0:02:54     Text: So I encourage you to also be thinking about what you want to do for final projects.

0:02:56 - 0:03:05     Text: Okay, so what I wanted to do today was introduce how people think about the structure of sentences

0:03:06 - 0:03:12     Text: and put structure on top of them to explain how human language conveys meaning.

0:03:13 - 0:03:19     Text: And so our starting point for meaning and essentially what we've dealt with with word vectors

0:03:19 - 0:03:26     Text: up until now is we have words. And words are obviously an important part of the

0:03:26 - 0:03:35     Text: meaning of human languages. But for words in human languages, there's more that we can do with them

0:03:35 - 0:03:42     Text: in thinking about how to structure sentences. So in particular, the first most basic way that we

0:03:42 - 0:03:49     Text: think about words when we are thinking about how sentences are structured is we give to them what's

0:03:49 - 0:03:58     Text: called a part of speech. We can say that cat is a noun, buy is a preposition, doors and

0:03:58 - 0:04:07     Text: other noun, cuddly is an adjective. And then for the word, if it was given a different part of

0:04:07 - 0:04:12     Text: speech, if you saw any parts of speech in school, it was probably your told it was an article.

0:04:12 - 0:04:20     Text: Sometimes that is just put into the class of adjectives in modern linguistics and what you'll see

0:04:20 - 0:04:26     Text: in the resources that we use, words like that are referred to as determiners. And the idea is

0:04:26 - 0:04:31     Text: that there's a bunch of words includes art and art, but also other words like this and that,

0:04:33 - 0:04:42     Text: or even every which are words that occur at the beginning of something like the cuddly cat

0:04:42 - 0:04:49     Text: which have a determinative function of sort of picking out which cats that they're referring to.

0:04:49 - 0:04:55     Text: And so we refer to those as determiners. But it's not the case that when we want to communicate with

0:04:55 - 0:05:03     Text: language that we just have this word salad where we say a bunch of words, we just say you know whatever

0:05:04 - 0:05:12     Text: leaking kitchen tap and let the other person put it together, we put words together in a particular

0:05:12 - 0:05:19     Text: way to express meanings. And so therefore, languages have larger units of putting meaning together.

0:05:19 - 0:05:30     Text: And the question is how we represent and think about those. Now in modern work and particular

0:05:30 - 0:05:37     Text: in modern United States linguistics or even what you see in computer science classes when thinking

0:05:37 - 0:05:44     Text: about formal languages, the most common way to approach this is with the idea of context-free

0:05:44 - 0:05:50     Text: grammars which you see at least a little bit of in 103 if you've done 103, what a linguist would

0:05:50 - 0:05:57     Text: often refer to as free structure grammars. And the idea there is to say well there are bigger

0:05:57 - 0:06:05     Text: units in languages that we refer to as phrases. So something like the cuddly cat is a cat

0:06:05 - 0:06:13     Text: with some other words modifying it. And so we refer to that as a noun phrase. But then we have

0:06:14 - 0:06:24     Text: ways in which phrases can get larger by building things inside phrases. So the door here is also a

0:06:24 - 0:06:30     Text: noun phrase. But then we can build something bigger around it with a preposition. So this is a

0:06:30 - 0:06:37     Text: preposition. And then we have a prepositional phrase. And in general we can keep going. So we

0:06:37 - 0:06:43     Text: can then make something like the cuddly cat by the door. And then the door is a noun phrase.

0:06:43 - 0:06:51     Text: The cuddly cat is a noun phrase by the door is a prepositional phrase. But then when we put it

0:06:51 - 0:06:58     Text: all together the whole of this thing becomes a bigger noun phrase. And so it's working with these

0:06:58 - 0:07:07     Text: ideas of nested phrases, what in context free grammar terms you would refer to as non-terminals.

0:07:08 - 0:07:13     Text: So noun phrase and prepositional phrase would be non-terminals in the context free grammar.

0:07:13 - 0:07:20     Text: We can build up a bigger structure of human languages. So let's just do that for a little bit

0:07:20 - 0:07:29     Text: to review what happens here. So we start off saying okay you can say the cat and a dog. And so those

0:07:29 - 0:07:35     Text: are noun phrases. And so we want a rule that can explain those. So we could say a noun phrase goes

0:07:35 - 0:07:43     Text: to the termina noun. And then somewhere over the side we'd have a lexicon. And in our lexicon

0:07:43 - 0:07:54     Text: we'd say that dog is a noun and cat is a noun and is a determiner and that is a determiner.

0:07:55 - 0:08:02     Text: Okay so then we notice you can do a bit more than that. So you can say things like the large cat

0:08:03 - 0:08:11     Text: a barking dog. So that suggests we can have a noun phrase after the determiner.

0:08:11 - 0:08:17     Text: There can optionally be an adjective and then there's the noun and that can explain some things we

0:08:17 - 0:08:26     Text: can say. But we can also say the cat by the door or a barking dog in a crate. And so we can also

0:08:26 - 0:08:34     Text: put a prepositional phrase at the end and that's optional. But you can combine it together with an

0:08:34 - 0:08:42     Text: adjective for the example I gave like a barking dog on the table. And so that a grids grammar can

0:08:42 - 0:08:51     Text: handle that. So then we'll keep on and say well actually you can use multiple adjectives so you

0:08:51 - 0:09:00     Text: can say a large barking dog or a large barking cuddly cat. No maybe not. Well sentences like that.

0:09:00 - 0:09:05     Text: So we have any number of adjectives which we can represent with a star. Pots referred to as the

0:09:05 - 0:09:14     Text: cleanie star. So that's good. But I forgot a bit actually. For by the door I have to have a rule

0:09:14 - 0:09:21     Text: for producing by the door. So I also need a rule that's a prepositional phrase goes to a preposition

0:09:21 - 0:09:29     Text: followed by a noun phrase. And so then I also have to have prepositions and that can be in or on

0:09:29 - 0:09:37     Text: or by. Okay. And I can make other sentences of course with this as well like the large crate on

0:09:37 - 0:09:45     Text: the table or something like that or the large crate on the large table. Okay. So I chug along

0:09:46 - 0:09:52     Text: and then well I could have something like talk to the cat. And so now I need more stuff. So talk

0:09:52 - 0:10:00     Text: is a verb and two is still looks like a preposition. So I need to be able to make up something

0:10:01 - 0:10:11     Text: with that as well. Okay. So what I can do is say I can also have a rule for a verb phrase that

0:10:11 - 0:10:19     Text: goes to a verb and then after that for something like talk to the cat that it can take a prepositional

0:10:19 - 0:10:32     Text: phrase after it. And then I can say that the verb goes to talk or walked. Okay. Then I can pass

0:10:32 - 0:10:40     Text: and then I can cover those sentences. Oops. Okay. So that's that's the end of what I have here.

0:10:40 - 0:10:49     Text: So in this sort of a way I'm handwriting a grammar. So here is now I have this grammar

0:10:50 - 0:10:59     Text: and a lexicon. And for the examples that I've written down here, this grammar and this lexicon

0:10:59 - 0:11:10     Text: is sufficient to pause these sort of fragments of showing expansion that I just wrote down. I mean,

0:11:10 - 0:11:16     Text: of course there's a lot more to English than what you see here. Right. So if I have something like

0:11:17 - 0:11:29     Text: the cat walked behind the dog, then I need some more grammar rules. So it seems then I need a rule

0:11:29 - 0:11:34     Text: that says I can have a sentence that goes to a noun phrase followed by a verb phrase.

0:11:35 - 0:11:46     Text: And I can keep on doing things of this sort. Let's see one question that Ruth Ann asked was about

0:11:46 - 0:11:53     Text: what do the brackets mean and is the first np different from the second.

0:11:53 - 0:12:03     Text: And so for this notation on the brackets here, I mean, this is actually a common notation that's

0:12:03 - 0:12:10     Text: used in linguistics. It's sort of in some sense a little bit different to traditional computer

0:12:10 - 0:12:18     Text: science notation since the star is used in both to mean zero or more of something. So you could have

0:12:18 - 0:12:25     Text: zero one two three four five adjectives. Somehow it's usual in linguistics that when you're using

0:12:25 - 0:12:32     Text: the star, you also put parentheses around it to mean it's optional. So sort of parentheses and

0:12:32 - 0:12:38     Text: star are used together to mean any number of something. When it's parentheses just by themselves,

0:12:38 - 0:12:49     Text: that's then meaning zero or one. And then four are these two noun phrases different? No, they're both

0:12:49 - 0:12:56     Text: noun phrase rules. And so in our grammar, we can have multiple rules that expand noun phrase in

0:12:56 - 0:13:04     Text: different ways. But, you know, actually in my example here, my second rule because I wrote it

0:13:04 - 0:13:11     Text: quite generally, it actually covers the first rule as well. So actually at that point, I can cross out

0:13:11 - 0:13:17     Text: this first rule because I don't actually need it in my grammar. But in general, you know, you have

0:13:17 - 0:13:25     Text: a choice between writing multiple rules for noun phrase goes to categories, which effectively gives

0:13:25 - 0:13:35     Text: your disjunction or working out by various syntactic conventions how to compress them together. Okay.

0:13:36 - 0:13:42     Text: So that was what gets referred to in natural language processing as constituency grammars,

0:13:43 - 0:13:50     Text: where the standard form of constituency grammar is a context-free grammar of the sort that

0:13:50 - 0:13:56     Text: I trust you saw at least a teeny bit of either in CS 103 or something like a programming language

0:13:56 - 0:14:03     Text: as compilers formal languages class. There are other forms of grammars that also pick out constituency.

0:14:04 - 0:14:09     Text: There are things like tree adjoining grammars, but I'm not going to really talk about any of those now.

0:14:09 - 0:14:15     Text: What I actually want to present is a somewhat different way of looking at grammar, which is referred to

0:14:15 - 0:14:24     Text: as dependency grammar, which puts a dependency structure over sentences. Now actually,

0:14:24 - 0:14:30     Text: it's not that these two ways of looking at grammar have nothing to do with each other. I mean,

0:14:30 - 0:14:36     Text: there's a whole formal theory about the relationships between different kinds of grammars,

0:14:36 - 0:14:44     Text: and you can very precisely state relationships and isomorphisms between different grammars of

0:14:44 - 0:14:50     Text: different kinds. But on the surface, these two kinds of grammars look sort of different and

0:14:50 - 0:15:00     Text: emphasize different things. And for reasons of their sort of closeness to picking out relationships

0:15:00 - 0:15:09     Text: and sentences and their ease of use, it turns out that in modern natural language processing,

0:15:09 - 0:15:17     Text: starting, I guess, around 2000, sort of really in the last 20 years, NLP people have really swung

0:15:17 - 0:15:23     Text: behind dependency grammars. So if you look around now where people are using grammars in NLP,

0:15:23 - 0:15:29     Text: by far the most common thing that's being used is dependency grammars. So I'm going to teach us

0:15:29 - 0:15:36     Text: today a bit about those. And for what we're going to build in assignment three is building

0:15:36 - 0:15:44     Text: using supervised learning and neural dependency parser. So the idea of dependency grammar is that

0:15:44 - 0:15:52     Text: when we have a sentence, what we're going to do is we're going to say for each word, what other

0:15:52 - 0:16:00     Text: words modify it. So what we're going to do is when we say the large crate, we're going to say,

0:16:00 - 0:16:09     Text: okay, well, large is modifying crate and that is modifying crate in the kitchen. That is modifying

0:16:09 - 0:16:18     Text: kitchen by the door. That is modifying door. And so I'm showing modification, a dependency or

0:16:18 - 0:16:25     Text: an attachment relationship by drawing an arrow from the head to what's referred to

0:16:25 - 0:16:33     Text: in dependency grammar as the dependent. The thing that modifies further specifies or attaches

0:16:34 - 0:16:44     Text: to the head. Okay, so that's the start of this. Well, another dependency that is that, well,

0:16:46 - 0:16:52     Text: looking in the large crate, that where you're looking is in the large crate. So you're going to

0:16:52 - 0:17:00     Text: want to have the large in the large crate as being a dependent of look. And so that's also going

0:17:00 - 0:17:08     Text: to be a dependency relationship here. And then there's one final bit that might seem a little bit

0:17:08 - 0:17:15     Text: confusing to people. And that's actually when we have these prepositions, there are two ways that

0:17:15 - 0:17:23     Text: you can think that this might work. So if it was something like look in the crate,

0:17:25 - 0:17:33     Text: that seems like that is a dependent of crate, but you could think that you want to say look in

0:17:33 - 0:17:39     Text: and it's in the crate and give this dependency relationship with the sort of preposition

0:17:39 - 0:17:46     Text: as sort of thinking of it as the head of what was before our prepositional phrase. And that's

0:17:46 - 0:17:55     Text: a possible strategy in the dependency grammar. But what I'm going to show you today and what you're

0:17:55 - 0:18:02     Text: going to use in this assignment is dependency grammars that follow the representation of universal

0:18:02 - 0:18:09     Text: dependencies. And universal dependencies is a framework which actually I was involved in creating,

0:18:09 - 0:18:15     Text: which was set up to try and give a common dependency grammar over many different human languages.

0:18:15 - 0:18:23     Text: And in the design decisions that were made in the context of designing universal dependencies,

0:18:24 - 0:18:32     Text: what we decided was that for what in some languages you use prepositions, lots of other

0:18:32 - 0:18:39     Text: languages make much more use of case marking. So if you've seen something like German, you've seen

0:18:39 - 0:18:48     Text: more case markings like genitive and date of cases. And in other languages like Latin or Finnish,

0:18:49 - 0:18:55     Text: lots of Native American languages, you have many more case markings again, which cover most of

0:18:55 - 0:19:03     Text: the role of prepositions. So in universal dependencies, essentially in the crate is treated like a

0:19:03 - 0:19:12     Text: case marked noun. And so what we say is that the in is also a dependent of crate and then you're

0:19:12 - 0:19:21     Text: looking in the crate. So in the structure we adopt in as dependent of crate, this in as a

0:19:21 - 0:19:30     Text: dependent of kitchen, this by as a dependent of door. And then we have these prepositional phrases

0:19:30 - 0:19:37     Text: in the kitchen by the door and we want to work out well what they modify. Well in the kitchen

0:19:38 - 0:19:43     Text: is modifying crate right because it's a crate in the kitchen. So we're going to say that it's

0:19:43 - 0:19:52     Text: this piece is a dependent of crate. And then well what about by the door? Well it's not really

0:19:52 - 0:19:59     Text: meaning that's a kitchen by the door and it's not meaning to look by the door. Again it's a crate

0:19:59 - 0:20:06     Text: by the door. And so what we're going to have is the crate also has door as a dependent. And so

0:20:06 - 0:20:20     Text: that gives us our full dependency structure of this sentence. Okay. And so that's a teeny introduction

0:20:20 - 0:20:27     Text: to syntactic structure. I'm going to say a bit more about it and give a few more examples.

0:20:27 - 0:20:33     Text: But let me just for a moment sort of say a little bit about why are we interested in syntactic

0:20:33 - 0:20:41     Text: structure? Why do we need to know the structure of sentences? And this gets into how does human

0:20:41 - 0:20:50     Text: languages work? So human languages can can communicate very complex ideas. I mean in fact you know

0:20:50 - 0:20:57     Text: anything that humans know how to communicate to one another they communicate pretty much by

0:20:57 - 0:21:05     Text: using words. So we can structure and communicate very complex ideas. But we can't communicate a

0:21:05 - 0:21:14     Text: really complex idea by one word. We can't just you know choose a word like you know empathy and say

0:21:14 - 0:21:19     Text: it with a lot of meaning and say empathy and the other person's meant to understand everything

0:21:19 - 0:21:26     Text: about what that means. Right. We have to compose a complex meaning that explains things by putting

0:21:26 - 0:21:34     Text: words together into bigger units. And the syntax of a language allows us to put words together

0:21:34 - 0:21:42     Text: into bigger units where we can build up and convey to other people a complex meaning. And so

0:21:42 - 0:21:48     Text: then the listener doesn't get this syntactic structure. Right. The syntactic structure of the

0:21:48 - 0:21:56     Text: sentence is hidden from the listener. All the listener gets is a sequence of words one after another

0:21:56 - 0:22:03     Text: bang bang bang. So the listener has to be able to do what I was just trying to do in this example

0:22:03 - 0:22:11     Text: that as the sequence of words comes in that the listener works out which words modify which

0:22:11 - 0:22:18     Text: are the words and therefore can construct the structure of the sentence and hence the meaning of

0:22:18 - 0:22:27     Text: the sentence. And so in the same way if we want to build clever neural net models that can understand

0:22:27 - 0:22:34     Text: the meaning of sentences those clever neural net models also have to understand what is this

0:22:34 - 0:22:39     Text: structure of the sentence so that they can interpret the language correctly. And we'll go through

0:22:39 - 0:22:46     Text: some examples and see more of that. Okay. So the fundamental point that we're going to sort of

0:22:46 - 0:22:55     Text: spend a bit more time on is that these choices of how you build up the structure of a language

0:22:56 - 0:23:04     Text: change the interpretation of the language and a human listener or equally a natural language

0:23:04 - 0:23:13     Text: understanding program has to make in a sort of probabilistic fashion choices as to which words

0:23:13 - 0:23:19     Text: modify I depend upon which other words so that they're coming up with the interpretation of the

0:23:19 - 0:23:28     Text: sentence that they think was intended by the person who said it. Okay. So to get a sense of this

0:23:28 - 0:23:36     Text: and how sentence structure is interesting and difficult what I'm going to go through now is a

0:23:36 - 0:23:44     Text: few examples of different ambiguities that you find in natural language and I've got some funny

0:23:44 - 0:23:52     Text: examples from newspaper headlines but these are all real natural language ambiguities that you find

0:23:52 - 0:23:59     Text: throughout natural language. Well at this point I should say this is where I'm being guilty of

0:23:59 - 0:24:07     Text: saying natural language but I'm meaning in English. Some of these ambiguities you find in lots of

0:24:07 - 0:24:14     Text: other languages as well but which ambiguities that are for syntactic structure partly depend on the

0:24:14 - 0:24:21     Text: details of the language. So different languages have different syntactic constructions, different

0:24:21 - 0:24:29     Text: word orders, different amounts of words having different forms of words like case markings. And so

0:24:29 - 0:24:37     Text: depending on those details there might be different ambiguities. So here's one ambiguity which is

0:24:37 - 0:24:45     Text: one of the commentest ambiguities in English. So San Jose cops kill man with knife. So this sentence

0:24:45 - 0:24:55     Text: has two meanings either it's the San Jose cops who are killing a man and they're killing a man

0:24:55 - 0:25:03     Text: with a knife. And so that corresponds to a dependency structure where the San Jose cops

0:25:03 - 0:25:13     Text: are the subject of killing the man is the object of killing and then the knife is then the instrument

0:25:13 - 0:25:21     Text: with which they're doing the killing so that the knife is an oblique modifier for the instrument

0:25:21 - 0:25:28     Text: of killing. And so that's one possible structure for this sentence but it's probably not the right one.

0:25:29 - 0:25:37     Text: So what it actually probably was was that it was a man with a knife and the San Jose cops killed

0:25:37 - 0:25:49     Text: the man. So that corresponds to the knife then being a noun modifier of the man and then kill

0:25:49 - 0:25:55     Text: is still killing the man. So the man is the object of killing and the cops are still the subject.

0:25:57 - 0:26:05     Text: And so whenever you have a prepositional phrase like this that's coming further on in a sentence

0:26:06 - 0:26:13     Text: there's a choice of how to interpret it. It could be either interpreted as modifying a noun

0:26:13 - 0:26:20     Text: phrase that comes before it or it can be interpreted as modifying a verb that comes before it.

0:26:20 - 0:26:26     Text: So systematically in English you get these prepositional phrase attachment ambiguities

0:26:26 - 0:26:34     Text: throughout all of our sentences but you know to give two further observations on that you know

0:26:34 - 0:26:43     Text: the first observation is you know you encounter sentences with prepositional phrase attachment

0:26:44 - 0:26:51     Text: ambiguities every time you read a newspaper article every time you talk to somebody but most of

0:26:51 - 0:26:58     Text: the time you never notice them and that's because our human brains are incredibly good at considering

0:26:58 - 0:27:05     Text: the possible interpretations and going with the one that makes sense according to context.

0:27:07 - 0:27:14     Text: The second comment as I said different human languages expose different ambiguities. So for

0:27:14 - 0:27:20     Text: example this is an ambiguity that you normally don't get in Chinese because in Chinese

0:27:21 - 0:27:28     Text: prepositional phrases modifying a verb are normally placed before the verb and so there you

0:27:28 - 0:27:35     Text: don't standedly get this ambiguity but you know there are different other ambiguities that you find

0:27:35 - 0:27:43     Text: commonly in Chinese sentences. Okay so this ambiguity you find everywhere because prepositional

0:27:43 - 0:27:49     Text: phrases are really common at the right ends of sentences so here's another one scientist count

0:27:49 - 0:27:56     Text: whales from space so that gives us these two possible interpretations that there are whales from

0:27:56 - 0:28:04     Text: space and scientists accounting them and then the other one is how the scientists accounting the

0:28:04 - 0:28:11     Text: whales is that they're counting them from space and they're using satellites to count the

0:28:11 - 0:28:18     Text: sales which is the correct interpretation that the newspaper hopes that you're getting.

0:28:18 - 0:28:30     Text: And this problem gets much much more complex because many sentences in English have prepositional

0:28:30 - 0:28:37     Text: phrases all over the place so here's the kind of boring sentence that you find in the financial

0:28:37 - 0:28:45     Text: news the board approved its acquisition by Royal Trust Co Ltd of Toronto for $27 a share at its

0:28:45 - 0:28:51     Text: monthly meeting and while if you look at the structure of this sentence what we find is you know

0:28:51 - 0:29:01     Text: here's a verb then here's the object noun phrase so we've got the object noun phrase here and then

0:29:01 - 0:29:08     Text: after that what do we find well we find a prepositional phrase another prepositional phrase another

0:29:08 - 0:29:15     Text: prepositional phrase and another prepositional phrase and how to attach each of these is then

0:29:15 - 0:29:22     Text: ambiguous so the basic rule of how you can attach them is you can attach them to things to the left

0:29:23 - 0:29:31     Text: providing you don't create crossing attachments so in principle by Royal Trust Co Ltd

0:29:31 - 0:29:39     Text: could be attached to either approved or acquisition but in this case by Royal Trust Co Ltd it is

0:29:39 - 0:29:52     Text: the acquirer so it's a modifier of the acquisition okay so then we have of Toronto so of Toronto

0:29:52 - 0:29:59     Text: could be modifying Royal Trust Co Ltd it could be modifying the acquisition or it can be modifying

0:29:59 - 0:30:06     Text: the approved and in this case the of Toronto is telling you more about the company and so

0:30:06 - 0:30:15     Text: it's a modifier of Royal Trust Co Ltd okay so then the next one is for $27 a share and that could

0:30:15 - 0:30:23     Text: be modifying Toronto Royal Trust Co Ltd the acquisition or the approving and well in this case

0:30:25 - 0:30:32     Text: that's talking about the price of the acquisition so this one is mod go jumps back and this is now

0:30:32 - 0:30:41     Text: prepositional phrase that's modifying the acquisition and then at the end at its monthly meeting

0:30:42 - 0:30:49     Text: well that's where the approval is happening by the by the board so rather than any of these

0:30:49 - 0:30:58     Text: preceding four noun phrases at its monthly meeting is modifying the approval and so it

0:30:58 - 0:31:05     Text: attaches right back there and this example is kind of too big and so I couldn't fit it in one line

0:31:05 - 0:31:11     Text: but as I think maybe you can see that you know none of these dependencies cross each other

0:31:11 - 0:31:19     Text: and they connect at different places ambiguously so because we can chain these prepositions like this

0:31:19 - 0:31:26     Text: and attach them at different places like this human language sentences are actually extremely

0:31:26 - 0:31:37     Text: ambiguous so the number if you have a sentence with K prepositional phrases at the end of

0:31:37 - 0:31:44     Text: earth where here we have K equals four the number of parses this sentence has the number of different

0:31:44 - 0:31:50     Text: ways you can make these attachments is given by the cutler numbers so the cutler numbers are

0:31:50 - 0:31:58     Text: an exponentially growing series which arises in many tree like context so if you're doing something

0:31:58 - 0:32:04     Text: like triangulations of a polygon you get cutler numbers if you're doing triangulation and graphical

0:32:04 - 0:32:11     Text: models in CS228 you get cutler numbers but we don't need to worry about the details here the central

0:32:11 - 0:32:18     Text: point is this is an exponential series and so you're getting an exponential number of parses in terms

0:32:18 - 0:32:24     Text: of the number of prepositional phrases and so in general you know the number of parses human

0:32:24 - 0:32:31     Text: languages have is exponential in their length which is kind of bad news because if you're then trying

0:32:31 - 0:32:39     Text: to enumerate all the parses it you might fear that you really have to do a ton of work the thing to

0:32:39 - 0:32:48     Text: notice about structures like these prepositional phrase attachment ambiguities is that there's nothing

0:32:48 - 0:32:57     Text: that resolves these ambiguities in terms of the structure of the sentence so if you've done something

0:32:57 - 0:33:03     Text: like looked at the kind of grammars that are used in compilers that the grammars used in compilings

0:33:03 - 0:33:11     Text: and compilers for programming languages are mainly made to be unambiguous and to the extent that

0:33:11 - 0:33:18     Text: there are any ambiguities there are default rules that are used to say choose this one particular

0:33:19 - 0:33:27     Text: parse tree for your piece of a programming language and human languages just aren't like that

0:33:27 - 0:33:33     Text: they're globally ambiguous and the listening human is just meant to be smart enough to figure out

0:33:33 - 0:33:44     Text: what was intended so the analogy would be that you know in programming languages when you're working

0:33:44 - 0:33:54     Text: out what does an else clause modify well you've got the answer that you can either look at the

0:33:54 - 0:34:00     Text: curly braces to work out what the else clause modifies or if you're using Python you look at the

0:34:00 - 0:34:07     Text: indentation and it tells you what the else clause modifies where by contrast for human languages

0:34:08 - 0:34:16     Text: the it would be just write down else something doesn't matter how you do it you don't need parentheses

0:34:16 - 0:34:21     Text: you don't need indentation the human being will just figure out what the else clause is meant to

0:34:21 - 0:34:31     Text: pair up with okay lots of other forms of ambiguities in human languages so let's look at a few others

0:34:31 - 0:34:38     Text: another one that is very common over all sorts of languages is coordination scope ambiguities

0:34:39 - 0:34:44     Text: so here's a sentence shuttle veteran and long time that's your executive Fred Gregory appointed

0:34:44 - 0:34:53     Text: to board well this is an ambiguous sentence there are two possible readings of this one reading

0:34:53 - 0:34:59     Text: is that there are two people there's a shuttle veteran and there's a long time that's your

0:34:59 - 0:35:08     Text: executive Fred Gregory and they were both appointed to the board two people and the other possibility

0:35:08 - 0:35:17     Text: is there's someone named Fred Gregory who's a shuttle veteran and long time that's your executive

0:35:17 - 0:35:26     Text: and they're appointed to the verb one person and these two interpretations again correspond to having

0:35:26 - 0:35:36     Text: different paths structures so in one structure we've got a coordination of the shuttle veteran

0:35:36 - 0:35:43     Text: and the long time that's your executive Fred Gregory coordinated together in one case these

0:35:43 - 0:35:53     Text: are coordinated and then Fred Gregory specifies the name of the Nassar executive so it's then

0:35:54 - 0:36:01     Text: specifying who that executive is where the what in the other one the shuttle veteran and long time

0:36:01 - 0:36:09     Text: Nassar executive all together is then something that is a modifier of Fred Gregory

0:36:12 - 0:36:20     Text: okay so one time this is the unit that modifies Fred Gregory in the other one up here just long time

0:36:20 - 0:36:27     Text: Nassar executive modifies Fred Gregory and then that's can join together with the shuttle veteran

0:36:27 - 0:36:35     Text: and so that also gives different interpretations so this is a slightly reduced example of the

0:36:35 - 0:36:44     Text: I mean in newspaper headlines tend to be more ambiguous than many other pieces of text because

0:36:44 - 0:36:50     Text: they're written in this short and formed get things to fit and this isn't especially short and

0:36:50 - 0:36:59     Text: form whereas actually left out in explicit conjunction but this headline says doctor no heart cognitive

0:36:59 - 0:37:06     Text: issues and this was after I guess one of Trump it was after Trump's first physical and while this

0:37:06 - 0:37:12     Text: is an ambiguity because there are two ways that you can read this you can either read this as saying

0:37:12 - 0:37:22     Text: doctor no heart and cognitive issues which gives you one interpretation instead of that the way we

0:37:22 - 0:37:32     Text: should read it is that it's heart or cognitive and so it's then saying no heart or cognitive issues

0:37:32 - 0:37:41     Text: and we have a different narrower scope of the coordination and then we get a different reading.

0:37:43 - 0:37:51     Text: Okay I want to give a couple more examples of different kinds of ambiguities another one you see

0:37:51 - 0:37:57     Text: quite a bit is when you have modifiers that are adjectives and adverbs that there are different

0:37:57 - 0:38:04     Text: ways that you don't have things modifying other things this example is a little bit not safe for

0:38:04 - 0:38:14     Text: work but here goes students get first hand job experience so this is an ambiguous sentence and again

0:38:14 - 0:38:22     Text: we can think of it as a syntactic ambiguity in terms of which things modify which other things

0:38:22 - 0:38:34     Text: so the nice polite way to render this sentence is that first is modifying hand so we've got first hand

0:38:34 - 0:38:43     Text: it's job experience so job is a compound now modifying experience and it's first hand experience

0:38:43 - 0:38:53     Text: so first hand is then modifying experience and then get is the object of our first hand job

0:38:53 - 0:39:02     Text: experience is the object of get and the students are the subject of get but if you have a smarty

0:39:02 - 0:39:13     Text: a mind you can interpret this a different way and in the alternative interpretation you then have hand

0:39:13 - 0:39:24     Text: going together with job and the first is then a modifier of experience and job is still a

0:39:24 - 0:39:30     Text: modifier of experience and so then you get this different power structure and different interpretation

0:39:30 - 0:39:40     Text: there okay one more example in a way this example similar to the previous one it's sort of having

0:39:41 - 0:39:47     Text: modifier pieces that can modify different things but rather than just being with individual adjectives

0:39:47 - 0:39:56     Text: or individual adverbs is then much larger units such as verb phrases can often have attachment

0:39:56 - 0:40:04     Text: ambiguities so this sentence headline is mutilated body washes up on Rio Beach to be used for

0:40:04 - 0:40:11     Text: Olympics Beach volleyball so we have this big verb phrase here of to be used for Olympics Beach

0:40:11 - 0:40:21     Text: volleyball and then again we have this attachment decision that we could either say that that

0:40:21 - 0:40:33     Text: big verb phrase is modifying i is attached to the Rio Beach or we could say no no the to be used

0:40:33 - 0:40:43     Text: for Olympics Beach volleyball that that is modifying the mutilated body and it's a body that's

0:40:43 - 0:40:50     Text: to be used for the Olympics Beach volleyball which gives the funny reading yeah so I hope that's

0:40:50 - 0:40:58     Text: giving you at least a little bit of a sense of how human language syntactic structure is complex

0:40:58 - 0:41:06     Text: and big u.s and to work out the intended interpretations you need to know something about that structure

0:41:07 - 0:41:14     Text: in terms of how much you need to understand i mean you know this is under linguistics class if

0:41:14 - 0:41:19     Text: you'd like to learn more about human language structure you can go off and do a syntax class

0:41:19 - 0:41:26     Text: but you know we're not really going to spend a lot of time working through language structure

0:41:26 - 0:41:31     Text: but there will be some questions on this in the assignment and so we're expecting that you can

0:41:31 - 0:41:38     Text: be at the level that you can have sort of some intuitions as to which words and phrases are

0:41:38 - 0:41:44     Text: modifying other words and phrases and therefore you could choose between two dependency analyses

0:41:44 - 0:41:53     Text: which ones correct okay i've spent quite a bit of time on that so better keep going okay so

0:41:54 - 0:42:01     Text: the general idea is that knowing this sort of syntactic structure of a sentence can help us

0:42:01 - 0:42:07     Text: with semantic interpretation i mean as well as just generally saying we can understand language

0:42:07 - 0:42:13     Text: it's also used in many cases for simple practical forms of semantic extraction so people

0:42:13 - 0:42:19     Text: such as in biomedical informatics often want to get out particular relations such as protein

0:42:19 - 0:42:25     Text: protein interactions and while here's a sentence the results demonstrated that kai c interacts

0:42:25 - 0:42:35     Text: rhythmically with sasa kai and kai b and commonly that people can get out those kind of relationships

0:42:35 - 0:42:42     Text: by looking at patterns of dependency relations with particular verbs so for the interacts verb

0:42:42 - 0:42:48     Text: if you have a pattern of something being the subject and something else being the noun modifier

0:42:48 - 0:42:53     Text: of interacts well that's an interaction relationship but it gets a bit more complicated than that

0:42:53 - 0:42:59     Text: as in this example because often there are conjunctions so you also have another pattern

0:42:59 - 0:43:07     Text: where you have also interactions between the subject and the noun modifiers conjunct

0:43:07 - 0:43:17     Text: which will allow us to also find the kai and kai b examples okay um so i've sort of given an informal

0:43:17 - 0:43:24     Text: tour of dependency grammar to just try and uh quickly um say a little bit more about formally

0:43:24 - 0:43:33     Text: what a dependency grammar is so in dependency syntax what we say is that the syntactic structure

0:43:33 - 0:43:42     Text: of a sentence consists of relations between pairs of words um and it's a binary asymmetric relation

0:43:42 - 0:43:50     Text: i we draw arrows between pairs of words which we call dependencies now normally dependency

0:43:50 - 0:43:56     Text: grammars then type those grammatical relation type those arrows to express what kind of

0:43:56 - 0:44:02     Text: relation that there is and so that they have some kind of taxonomy of grammatical relation so we

0:44:02 - 0:44:08     Text: might have a subject grammatical relation of verbal auxiliary grammatical relation and a bleak

0:44:08 - 0:44:16     Text: modifier grammatical relation we have some kind of typology of grammatical relations um so and we

0:44:16 - 0:44:25     Text: refer to the arrows going between the head is the head here and something that is a dependent of

0:44:25 - 0:44:34     Text: it so the subject of a verb is the dependent of the verb or when you have a noun modifier like

0:44:34 - 0:44:45     Text: our sort of cuddly cat we say that um cuddly is a dependent of cat and so cat is the head of cuddly

0:44:45 - 0:44:55     Text: cat and so normally um dependencies like in these examples form a tree which is formal it so

0:44:55 - 0:45:04     Text: it's not just any graph with arrows we have an graph which is connected a cyclic and has a single

0:45:04 - 0:45:13     Text: root so here's the root of the graph um and so that gives us a dependency tree analysis um dependency

0:45:13 - 0:45:23     Text: grammars have a really really long history um so the famous first linguist um was panini um who

0:45:23 - 0:45:30     Text: wrote about the structure of Sanskrit um and mainly he worked on the sound system of Sanskrit

0:45:30 - 0:45:36     Text: and how sounds change in various contexts which what linguists call phonology and the different

0:45:36 - 0:45:43     Text: forms of Sanskrit words Sanskrit has rich morphology of inflecting nouns and verbs for different

0:45:43 - 0:45:50     Text: cases and forms um but he also worked a little on the syntactic structure of Sanskrit censors

0:45:50 - 0:45:58     Text: and essentially what he proposed was the dependency grammar over Sanskrit sentences and it turns out

0:45:58 - 0:46:05     Text: that sort of from most of recorded history when then when people have then um gone on and tried to

0:46:05 - 0:46:13     Text: put structures over human sentences um what they have used is dependency grammars um so there was a

0:46:13 - 0:46:20     Text: lot of work in the first millennium by Arabic grammarians of trying to work out the grammar um

0:46:20 - 0:46:26     Text: structure of sentences and effectively what they used was but you know kind what I've just presented

0:46:26 - 0:46:35     Text: as a dependency grammar so compared to you know 2500 years of history the ideas of having context

0:46:35 - 0:46:41     Text: free grammars and having constituency grammars is actually a really really recent invention so it

0:46:41 - 0:46:48     Text: was really sort of in the middle of the 20th century that the ideas of um constituency grammar and

0:46:48 - 0:46:54     Text: context free grammars would develop first by wells in the forties and then by known chomsky in the

0:46:54 - 0:47:01     Text: early 50s leading to things like the chomsky hierarchy that you might see um CS 103 or formal

0:47:01 - 0:47:10     Text: languages class um so for modern work on dependency grammar using kind of the terminology and um

0:47:10 - 0:47:16     Text: notation that I've just introduced that's normally attributed to Lucian Tania who was a French

0:47:16 - 0:47:24     Text: linguist um in around the sort of middle of the 20th century as well um dependency grammar was

0:47:24 - 0:47:31     Text: widely used in the 20th century um in a number of places I mean in particular it tends to be

0:47:31 - 0:47:37     Text: sort of much more natural and easier to think about for languages that have a lot of different

0:47:37 - 0:47:44     Text: case markings on nouns like nomad of accused of genitive data of instrumental kind of cases like

0:47:44 - 0:47:49     Text: you get in the language like Latin or Russian and a lot of those languages have much

0:47:49 - 0:47:55     Text: free word order than English so the subject or objective you know in English the subject has to

0:47:55 - 0:48:00     Text: be before the verb and the object has to be after the verb but lots of other languages have much

0:48:00 - 0:48:07     Text: free word order and instead use different forms of nouns to show you what's the subject or the

0:48:07 - 0:48:13     Text: object of the sentence and dependency grammars can often seem much more natural for those kinds of

0:48:13 - 0:48:19     Text: languages dependency grammars were also prominent at the very beginnings of computational linguistics so

0:48:20 - 0:48:27     Text: one of the first people working computational linguistics in the US was David Hayes so the

0:48:27 - 0:48:32     Text: professional society for computational linguistics is called the association for computational linguistics

0:48:32 - 0:48:37     Text: and he was actually one of the founders of the association for computational linguistics

0:48:37 - 0:48:44     Text: and he published in the early 1960s and early perhaps the first dependency grammar past how

0:48:44 - 0:48:54     Text: you dependency parser okay yeah a little teeny note just in case you see other things when

0:48:54 - 0:49:00     Text: when you have these arrows you can draw them in either direction you either draw arrows from their

0:49:00 - 0:49:06     Text: head or to the dependent or from the dependent to the head and actually different people have

0:49:06 - 0:49:13     Text: done one and the other right so the way ten year drew them was to draw them from the head to the

0:49:13 - 0:49:18     Text: the dependent and we're following that convention but you know if you're looking at something that

0:49:18 - 0:49:24     Text: somebody else has written with dependency arrows the first thing you have to work out is are they

0:49:24 - 0:49:31     Text: using the arrow heads at the heads or the dependence now and not one other thing here is that

0:49:31 - 0:49:39     Text: we a sentence is seen as having the overall head word of the sentence which every other word of

0:49:39 - 0:49:46     Text: the sentence hangs off it's a common convention to add this sort of fake route to every sentence

0:49:46 - 0:49:54     Text: that then points to the head word of the whole sentence here completed that just tends to make

0:49:54 - 0:50:01     Text: the algorithmic stuff easier because then you can say that every word of the sentence is dependent

0:50:01 - 0:50:08     Text: on precisely one other node where what you can be dependent on is either another word on the

0:50:08 - 0:50:14     Text: sentence or the fake route of the sentence and when we build our parsers we will introduce that

0:50:14 - 0:50:27     Text: fake route okay so that's sort of dependency grammars and dependency structure I now want to

0:50:27 - 0:50:36     Text: get us back to natural language processing and starting to build parsers for dependency grammars

0:50:36 - 0:50:45     Text: but before doing that I just want to say yeah where do we get our data from and that's actually

0:50:45 - 0:50:56     Text: an interesting story in some sense so the answer to that is well what we do is get

0:50:56 - 0:51:03     Text: human beings commonly linguists or other people who are actually interested in the structure

0:51:03 - 0:51:11     Text: of human sentences and we get them to sit around and hand parse sentences and give them dependency

0:51:11 - 0:51:22     Text: structures and we collect a lot of those parsers and we call that a tree bank and so this is

0:51:22 - 0:51:30     Text: something that really only started happening in the late 80s and took off in a big away in the 90s

0:51:30 - 0:51:36     Text: until then no one had attempted to build tree banks lots of people had attempted to build parsers

0:51:36 - 0:51:44     Text: and it seemed like well if you want to build a parser the efficient way to do it is to start writing

0:51:44 - 0:51:50     Text: a grammar so you start writing some grammar rules and you start writing a lexicon with words and

0:51:50 - 0:51:57     Text: parts of speech and you sit around working on your grammar when I was a PhD student one of my first

0:51:57 - 0:52:04     Text: summer jobs was spending the summer handwriting a grammar and it sort of seems like writing a

0:52:04 - 0:52:09     Text: grammar is more efficient because you're writing this one general thing that tells you the structure

0:52:09 - 0:52:15     Text: of a human language but there's just been this massive sea change partly driven by the adoption

0:52:15 - 0:52:22     Text: of machine learning techniques where it's now seen as axiomatic that the way to make progress

0:52:22 - 0:52:31     Text: is to have annotated data namely here a tree bank that shows you the structure of sentences

0:52:31 - 0:52:38     Text: and so what I'm showing here is a teeny extract from a universal dependencies tree bank and so that's

0:52:38 - 0:52:44     Text: what I mentioned earlier that this has been this effort to try and have a common dependency

0:52:44 - 0:52:49     Text: grammar representation that you can apply to lots of different human languages and so you can go

0:52:49 - 0:52:55     Text: over to this URL and see that there's about 60 different languages at the moment which have universal

0:52:55 - 0:53:05     Text: dependencies tree banks. So why are tree banks good? I mean it sort of seems like it's bad news if

0:53:05 - 0:53:12     Text: you have to have people sitting around for weeks and months hand-posing sentences it seems a lot

0:53:12 - 0:53:20     Text: slower and actually a lot less useful than having somebody writing a grammar which just has

0:53:21 - 0:53:30     Text: you know a much bigger multiply factor in the utility of their effort. It turns out that although

0:53:30 - 0:53:36     Text: that initial feeling seems sort of valid that in practice there's just a lot more you can do with

0:53:36 - 0:53:46     Text: the tree bank. So why are tree banks great? You know one reason is the tree banks are highly reusable

0:53:46 - 0:53:53     Text: so typically when people have written grammars they've written grammars for you know one particular

0:53:53 - 0:54:00     Text: parser and the only thing it was ever used in is that one particular parser but when you build a

0:54:00 - 0:54:09     Text: tree bank that's just a useful data resource and people use it for all kinds of things. So the

0:54:09 - 0:54:16     Text: well-known tree banks have been used by hundreds and hundreds of people and although all tree banks

0:54:16 - 0:54:22     Text: were initially built for the purposes of hey let's help natural language processing systems

0:54:22 - 0:54:28     Text: it turns out that people have actually been able to do lots of other things with tree banks.

0:54:28 - 0:54:34     Text: So for example these days psycho-linguists commonly use tree banks to get various kinds of

0:54:34 - 0:54:41     Text: statistics about data for thinking about psycho-linguistic models. Linguists use tree banks for

0:54:41 - 0:54:47     Text: looking at patterns of different syntactic constructions that occur that there's just been a lot

0:54:47 - 0:54:55     Text: of reuse of this data for all kinds of purposes but they have other advantages that I mentioned here

0:54:55 - 0:55:00     Text: you know when people are just sitting around saying oh what sentences are good they tend to

0:55:00 - 0:55:06     Text: only think of the core of language where lots of weird things happen in language and so if you

0:55:06 - 0:55:12     Text: actually just have some sentences and you have to go off and parse them then you actually have to

0:55:12 - 0:55:19     Text: deal with the totality of language. Since you're parsing actual sentences you get statistics so

0:55:19 - 0:55:25     Text: you naturally get the kind of statistics that are useful to machine learning systems by

0:55:25 - 0:55:31     Text: constructing a tree bank where you don't get them for free if you handwrite a grammar but then a

0:55:31 - 0:55:41     Text: final way which is perhaps the most important of all is if you actually want to be able to do

0:55:43 - 0:55:49     Text: science of building systems you need a way to evaluate these NLP systems.

0:55:49 - 0:55:59     Text: I mean it seems hard to believe now but you know back in the 90s 80s when people built NLP

0:55:59 - 0:56:07     Text: parsers it was literally the case that the way they were evaluated was you said to your friend

0:56:07 - 0:56:12     Text: oh I built this parser type in a sentence on the terminal and see what it gives you back it's

0:56:12 - 0:56:19     Text: pretty good hey and that was just the way business was done whereas what we'd like to know is well

0:56:19 - 0:56:25     Text: as I showed you earlier English sentences can have lots of different parsers commonly can this

0:56:27 - 0:56:33     Text: system choose the right parsers for particular sentences and therefore have the basis of

0:56:34 - 0:56:40     Text: interpreting them as a human being would and well we can only systematically do that evaluation

0:56:40 - 0:56:46     Text: if we have a whole bunch of sentences that have been handparsed by humans with their correct

0:56:46 - 0:56:54     Text: interpretations so the rise of tree banks turned parser building into an empirical science where people

0:56:54 - 0:57:02     Text: could then compete rigorously on the basis of look my parser has 2% higher accuracy than your parser

0:57:02 - 0:57:10     Text: in choosing the correct parsers for sentences. Okay so well how do we build a parser

0:57:10 - 0:57:16     Text: once we've got dependencies so there's sort of a bunch of sources of information that you could

0:57:16 - 0:57:25     Text: hope to use so one source of information is looking at the words on either end of the dependency

0:57:25 - 0:57:33     Text: so discussing issues that seems a reasonable thing to say and so it's likely that issues

0:57:33 - 0:57:43     Text: could be the object of discussing whereas if it was some other word right if you were thinking of

0:57:43 - 0:57:50     Text: making you know outstanding the object of discussion discussing outstanding that doesn't sound right

0:57:50 - 0:57:58     Text: so that wouldn't be so good. A second source of information is distance so most dependencies are

0:57:58 - 0:58:05     Text: relatively short distance some of them aren't some of long distance dependencies but they're

0:58:05 - 0:58:12     Text: relatively rare the vast majority of dependencies nearby and another source of information is the

0:58:12 - 0:58:23     Text: intervening material so there are certain things that dependencies rarely span so clauses and sentences

0:58:23 - 0:58:32     Text: are normally organized around verbs and so dependencies rarely span across intervening verbs.

0:58:33 - 0:58:39     Text: We can also use punctuation in written language, things like commas, which can give some indication

0:58:39 - 0:58:47     Text: of the structure and so punctuation may also indicate bad places to have long distance dependencies

0:58:47 - 0:58:56     Text: over and there's one final source of information which is what's referred to as valency which is

0:58:56 - 0:59:03     Text: for a head, what kinds of dependents does it usually have around it, so if you have a noun

0:59:05 - 0:59:11     Text: there are things that you just know about what kinds of dependents nouns normally have so it's

0:59:11 - 0:59:21     Text: common that it will have a determiner to the left, 'the cat', on the other hand it's not going to be the

0:59:21 - 0:59:26     Text: case that there's a determiner to the right, 'cat the', that's just not what you get in English

0:59:28 - 0:59:34     Text: on the left you're also likely to have an adjectival modifier, that's where you'd have 'cuddly'

0:59:34 - 0:59:42     Text: but again it's not so likely you're going to have the adjectival modifier over on the right,

0:59:42 - 0:59:49     Text: 'cat cuddly', so there are sort of facts about what things different kinds of words take on the left

0:59:49 - 0:59:55     Text: and the right and so that's the valency of the heads and that's also a useful source of information

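As a rough illustration of how these information sources might be combined, here is a small Python sketch. It is not from the lecture: the affinity table, the weights, and the tag names are made-up placeholders for whatever a real parser would learn from a treebank.

def score_candidate_arc(words, tags, head, dep, affinity):
    # Toy scorer combining the information sources just described.
    # 'affinity' is a hypothetical dict of bilexical plausibilities,
    # e.g. {('discussing', 'issues'): 2.0}; all weights are arbitrary.
    score = affinity.get((words[head], words[dep]), 0.0)        # bilexical affinity
    score -= 0.1 * abs(head - dep)                              # prefer short dependencies
    lo, hi = sorted((head, dep))
    if any(t == 'VERB' for t in tags[lo + 1:hi]):               # dependencies rarely span verbs
        score -= 1.0
    if ',' in words[lo + 1:hi]:                                 # or intervening punctuation
        score -= 0.5
    if tags[dep] == 'DET' and dep > head and tags[head] == 'NOUN':
        score -= 2.0                                            # valency: no "cat the" in English
    return score

A real parser learns these weights from a treebank rather than setting them by hand, but the kinds of evidence it draws on are exactly the ones listed above.
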
0:59:56 - 1:00:04     Text: okay so what do we need to do using that information to build a parser well effectively

1:00:04 - 1:00:10     Text: what we do is have a sentence, I'll give a talk tomorrow on neural networks, and what we have to do

1:00:10 - 1:00:17     Text: is say for every word in that sentence we have to choose some other word that it's a dependent of

1:00:17 - 1:00:25     Text: where one possibility is it's a dependent of root so we're giving it a structure where we're

1:00:25 - 1:00:33     Text: saying okay for this word I've decided that it's a dependent of networks and then for this word

1:00:33 - 1:00:44     Text: it's also a dependent of networks and for this word it's a dependent of give so we're choosing

1:00:45 - 1:00:53     Text: one for each word and there are usually a few constraints so only one word is a dependent of root

1:00:53 - 1:01:00     Text: so that we have a tree, and we don't want cycles, so we don't want to say that word a is dependent on word b and

1:01:00 - 1:01:11     Text: word b is dependent on word a and then there's one final issue which is whether arrows can cross

1:01:11 - 1:01:18     Text: or not so in this particular sentence we actually have these crossing dependencies you can see there

1:01:18 - 1:01:25     Text: I'll give a talk tomorrow on neural networks and this is the correct dependency parse for this

1:01:25 - 1:01:32     Text: sentence because what we have here is that it's a talk and it's a talk on neural networks so the

1:01:32 - 1:01:39     Text: on neural networks modifies the talk, which leads to these crossing dependencies I didn't have to

1:01:39 - 1:01:46     Text: say it like that I could have said I'll give a talk on neural networks tomorrow and then on neural

1:01:46 - 1:01:55     Text: networks would be next to the talk so most of the time in languages dependencies are projective,

1:01:55 - 1:02:01     Text: the things stay together so the dependencies have a kind of a nesting structure of the kind that

1:02:01 - 1:02:08     Text: you also see in context free grammars but most languages have at least a few phenomena where you

1:02:08 - 1:02:16     Text: end up with this ability for phrases to be split apart which leads to non-projective dependencies

1:02:16 - 1:02:23     Text: so in particular one of them in English is that you can take modifying phrases and clauses like

1:02:23 - 1:02:29     Text: the on neural networks here and shift them right towards the end of the sentence and get I'll give

1:02:29 - 1:02:35     Text: a talk tomorrow on neural networks and that then leads to non-projective sentences

1:02:37 - 1:02:43     Text: so a parse is projective if there are no crossing dependency arcs when the words are laid out

1:02:43 - 1:02:50     Text: in their linear order with all arcs above the words and if you have a dependency parse that

1:02:50 - 1:02:55     Text: corresponds to a context free grammar tree it actually has to be projective because context free

1:02:55 - 1:03:01     Text: grammars necessarily have this sort of nested tree structure following the linear order

1:03:02 - 1:03:08     Text: but dependency grammars normally allow non-projective structures to account for

1:03:08 - 1:03:14     Text: displaced constituents and you can't easily get the semantics of certain

1:03:14 - 1:03:20     Text: constructions right without these non-projective dependencies so here's another example in English

1:03:20 - 1:03:28     Text: with question formation with what's called preposition stranding so the sentence is who did

1:03:28 - 1:03:34     Text: Bill buy the coffee from yesterday, there's another way I could have said this, it's less natural in

1:03:34 - 1:03:45     Text: English, but I could have said from who did Bill buy the coffee yesterday, in many languages of the

1:03:45 - 1:03:53     Text: world that's the only way you could have said it and when you do that 'from who' is kept together

1:03:53 - 1:03:59     Text: and you have a projective parse for the sentence but English allows and indeed much prefers

1:03:59 - 1:04:07     Text: you to do what is referred to as preposition stranding where you move the who but you just leave

1:04:07 - 1:04:14     Text: the preposition behind and so you get who did Bill buy the coffee from yesterday and so then

1:04:14 - 1:04:19     Text: we're ending up with this non-projective dependency structure as I've shown there

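To make the notion of crossing arcs concrete, here is a small check of my own (not from the lecture) that tests whether a head assignment is projective; heads[i] gives the index of word i's head, with 0 standing for the pseudo-root.

def is_projective(heads):
    # heads[i] = head index of word i (words are numbered 1..n, 0 is the root)
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a1, b1 in arcs:
        for a2, b2 in arcs:
            # two arcs cross if one arc starts strictly inside the other's
            # span and ends strictly outside it
            if a1 < a2 < b1 < b2:
                return False
    return True

# "I ate fish": I -> ate, ate -> root, fish -> ate
print(is_projective([2, 0, 2]))                    # True
# Roughly the heads for "I 'll give a talk tomorrow on neural networks",
# with "on ... networks" attached to "talk": the talk -> networks arc
# crosses the give -> tomorrow arc, so this comes out non-projective.
print(is_projective([3, 3, 0, 5, 3, 3, 9, 9, 5]))  # False
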
1:04:21 - 1:04:28     Text: okay I'll come back to non-projectivity in a little bit how do we go about building

1:04:28 - 1:04:36     Text: dependency parsers well there are a whole bunch of ways that you can build dependency parsers

1:04:36 - 1:04:42     Text: very quickly I'll just say a few names and I'll tell you about one of them so you can use dynamic

1:04:42 - 1:04:48     Text: programming methods to build dependency parsers so I showed earlier that you can have an exponential

1:04:48 - 1:04:53     Text: number of parses for a sentence and that sounds like really bad news for building a system

1:04:53 - 1:04:58     Text: well it turns out that you can be clever and you can work out a way to dynamic program over

1:04:58 - 1:05:05     Text: that exponential number of parses and then you can have an O(n cubed) algorithm so you could do that

1:05:07 - 1:05:13     Text: you can use graph algorithms and I'll say a bit about that later but that may spill into next time

1:05:14 - 1:05:22     Text: so you can see since we're wanting to kind of connect up all the words into a tree using

1:05:22 - 1:05:27     Text: graph edges that you could think of doing that using a minimum spanning tree algorithm of

1:05:27 - 1:05:34     Text: the sort that you hopefully saw in CS 161 and so that idea has been used for parsing

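Just to give a flavour of that graph-based idea, here is a rough sketch of my own: score every possible head-to-dependent edge with some learned function, then take the highest-scoring directed spanning tree rooted at the pseudo-root. It assumes networkx's maximum_spanning_arborescence, an implementation of the Chu-Liu/Edmonds algorithm.

import networkx as nx

def mst_parse(n_words, arc_score):
    # arc_score(h, d) is some learned score for making word d a dependent of
    # word h; words are numbered 1..n_words and 0 is the pseudo-root.
    G = nx.DiGraph()
    for h in range(0, n_words + 1):
        for d in range(1, n_words + 1):        # no edges into the root
            if h != d:
                G.add_edge(h, d, weight=arc_score(h, d))
    tree = nx.maximum_spanning_arborescence(G, attr="weight")
    return sorted(tree.edges())                # (head, dependent) pairs

Because no edges point into the root, the best arborescence is necessarily rooted there, and nothing in this construction forces the result to be projective, which is one attraction of graph-based parsing.
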
1:05:34 - 1:05:41     Text: constraint satisfaction ideas that you might have seen in CS 221 have been used for dependency parsing

1:05:43 - 1:05:48     Text: but the way I'm going to show now is transition based parsing or sometimes referred to as

1:05:48 - 1:05:57     Text: deterministic dependency parsing and the idea of this is one is going to use a transition system

1:05:57 - 1:06:04     Text: so that's like shift reduce parsing if you've seen shift reduce parsing in something like a

1:06:04 - 1:06:11     Text: compilers class or formal languages class, with shift and reduce transition steps, and so use

1:06:11 - 1:06:20     Text: a transition system to guide the construction of parsers and so let me just explain about that

1:06:21 - 1:06:33     Text: so let's see so this was an idea that was made prominent by Joakim Nivre who's a Swedish

1:06:33 - 1:06:43     Text: computational linguist who introduced this idea of greedy transition based parsing so his idea is

1:06:43 - 1:06:50     Text: well what we're going to do for dependency parsing is we're going to be able to parse sentences

1:06:50 - 1:06:57     Text: by having a set of transitions which are kind of like a shift reduce parser and it's going to just

1:06:57 - 1:07:06     Text: work left to right bottom up and parse a sentence so we're going to say we have a stack sigma

1:07:07 - 1:07:13     Text: a buffer beta of the words that we have to process and we're going to build up a set of dependency

1:07:13 - 1:07:20     Text: arcs by using actions which are shift and reduce actions and putting those together this will give

1:07:20 - 1:07:27     Text: us the ability to put parse structures over sentences and let me go through the details of

1:07:27 - 1:07:34     Text: this and this is a little bit hairy when you first see it but it's not so complex really and

1:07:35 - 1:07:44     Text: it's this kind of transition based dependency parser that we'll use in assignment 3 so what we

1:07:44 - 1:07:51     Text: have so this is our transition system we have a starting point where we start with a stack that

1:07:51 - 1:07:57     Text: just has the root symbol on it and a buffer that has the sentence that we're about

1:07:57 - 1:08:07     Text: to parse and so far we haven't built any dependency arcs and so at each point in time we can choose one

1:08:07 - 1:08:19     Text: of three actions we can shift which moves the next word onto the stack we can then do actions

1:08:19 - 1:08:26     Text: that are the reduce actions so there are two reduce actions to make it a dependency grammar we

1:08:26 - 1:08:34     Text: can either do a left arc reduce or a right arc reduce so when we do either of those we take

1:08:34 - 1:08:42     Text: the top two items on the stack and we make one of them a dependent of the other one so we can

1:08:42 - 1:08:50     Text: either say okay let's make wi a dependent of wj or else we can say okay let's make wj a dependent

1:08:50 - 1:09:00     Text: of wi and so the result of when we do that is the one that's the dependent disappears from the stack

1:09:00 - 1:09:07     Text: and so in the stacks over here there's one less item but then we add a dependency arc to our

1:09:07 - 1:09:14     Text: arc set so that we say that we've got either a dependency from j to i or a dependency from i to j

1:09:15 - 1:09:22     Text: and commonly when we do this we actually also specify what grammatical relation connects the two

1:09:22 - 1:09:31     Text: such as subject, object, noun modifier, and so we also have here a relation r. That's probably

1:09:31 - 1:09:40     Text: still very abstract so let's go through an example so this is how a simple transition based dependency

1:09:40 - 1:09:46     Text: parser, what's referred to as an arc standard transition based dependency parser, would parse up 'I ate

1:09:46 - 1:09:52     Text: fish', so remember these are the different operations that we can apply so to start off with we

1:09:52 - 1:09:59     Text: have root on the stack and the sentence in the buffer and we have no dependency arcs constructed

1:09:59 - 1:10:05     Text: so we have to choose one of the three actions and when there's only one thing on the stack the only

1:10:05 - 1:10:13     Text: thing we can do is shift so we shift now the stack looks like this so now we have to take another

1:10:13 - 1:10:20     Text: action and at this point we have a choice because we could immediately reduce so you know we could

1:10:20 - 1:10:28     Text: say okay let's just make 'I' a dependent of root and we'd get a stack size of one again but that

1:10:28 - 1:10:36     Text: would be the wrong thing to do because 'I' isn't the head of the sentence so what we should instead do

1:10:36 - 1:10:44     Text: is shift again and get 'I' and 'ate' on the stack and 'fish' still in the buffer well at that point we keep

1:10:44 - 1:10:53     Text: on parsing a bit further and so now what we can do is say well wait a minute now 'I' is a dependent

1:10:53 - 1:11:02     Text: of 'ate' and so we can do a left arc reduce and so 'I' disappears from the stack so here's our new stack

1:11:02 - 1:11:10     Text: but we add to our set of arcs that 'I' is the subject of 'ate' okay well after that

1:11:11 - 1:11:16     Text: we could reduce again because there are still two things on the stack but that'd be the

1:11:16 - 1:11:24     Text: wrong thing to do, the right thing to do next would be to shift 'fish' onto the stack and then at that

1:11:24 - 1:11:35     Text: point we can do a right arc reduce saying that 'fish' is the object of 'ate' and add a new dependency

1:11:35 - 1:11:44     Text: to our dependency set and then we can one more time do a right arc reduce to say that 'ate' is the

1:11:44 - 1:11:51     Text: root of the whole sentence and add in that extra root relation with our pseudo root and at that

1:11:51 - 1:11:58     Text: point we reach the end condition so the end condition was the buffer was empty and there's one thing

1:11:58 - 1:12:06     Text: the root on the stack and at that point we can finish so this little transition machine does the

1:12:06 - 1:12:16     Text: parsing up of the sentence but there's one thing that's left to explain still here which is how do

1:12:16 - 1:12:22     Text: you choose the next action so as soon as you have two things or more on the stack what you do next

1:12:23 - 1:12:28     Text: you've always got a choice you could keep shifting at least if there's still things on the buffer

1:12:28 - 1:12:34     Text: or you can do a left arc or you can do a right arc and how do you know what choice is correct

1:12:34 - 1:12:40     Text: and well one answer to that is to say well you don't know what choice is correct and that's why

1:12:40 - 1:12:47     Text: parsing is hard and sentences are ambiguous you can do any of those things you have to explore

1:12:47 - 1:12:54     Text: all of them and well if you naively explore all of them then you do an exponential amount of work

1:12:54 - 1:13:05     Text: to parse the sentence and you know that's essentially

1:13:05 - 1:13:14     Text: what people had done in the 80s and 90s, explore every path, but in the early 2000s Joakim

1:13:14 - 1:13:22     Text: Nivre's essential observation was but wait a minute we know about machine learning now so why don't

1:13:22 - 1:13:31     Text: I try and train a classifier which predicts what the next action I should take is given this stack

1:13:31 - 1:13:40     Text: and buffer configuration because if I can write a machine learning classifier which can nearly

1:13:40 - 1:13:49     Text: always correctly predict the next action given a stack and buffer then I'm in a really good position

1:13:49 - 1:13:56     Text: because then I can build what's referred to as a greedy dependency parser which just goes

1:13:56 - 1:14:04     Text: bang bang bang word at a time okay here's the next thing run classifier choose next action run

1:14:04 - 1:14:10     Text: classifier choose next action run classifier choose next action so that the amount of work that

1:14:10 - 1:14:19     Text: we're doing becomes linear in the length of the sentence rather than that being cubic in the length

1:14:19 - 1:14:24     Text: of the sentence using dynamic programming or exponential in the length of the sentence if you

1:14:24 - 1:14:32     Text: don't use dynamic programming so at each step we predict the next action using some

1:14:32 - 1:14:38     Text: discriminative classifier so starting off he was using things like support vector machines

1:14:38 - 1:14:43     Text: but it can be anything at all like a softmax classifier that's closer to our neural networks

1:14:43 - 1:14:50     Text: and there are, for what I presented, three classes if you're just thinking of the two

1:14:50 - 1:14:55     Text: reduces and the shift, or if you're thinking that you're also assigning a relation and you have a set

1:14:55 - 1:15:03     Text: of R relations, like 20 relations, then that'd be sort of 41 moves (2R + 1) that you could decide on at each

1:15:03 - 1:15:10     Text: point and the features are effectively the configurations I was showing before what's the top of the

1:15:10 - 1:15:15     Text: stack word, what part of speech is it, what's the first word in the buffer, what's that word's part

1:15:15 - 1:15:21     Text: of speech etc and so in the simplest way of doing this you're now doing no search at all you

1:15:21 - 1:15:28     Text: would just sort of take each configuration in turn, decide the most likely next move, and make

1:15:28 - 1:15:35     Text: it and that's a greedy dependency parser which is widely used you can do better if you want to do

1:15:35 - 1:15:42     Text: a lot more work so you can do what's called a beam search where you maintain a number of fairly

1:15:42 - 1:15:49     Text: good parse prefixes at each step and you can extend them out further and then you can evaluate

1:15:49 - 1:15:56     Text: later on which of those seems to be the best and so beam search is one technique to improve dependency

1:15:56 - 1:16:06     Text: parsing by doing a lot of work and it turns out that although these greedy transition based parsers

1:16:07 - 1:16:14     Text: are a fraction worse than the best possible ways known to parse sentences, they actually work

1:16:14 - 1:16:23     Text: very accurately almost as well and they have this wonderful advantage that they give you linear time

1:16:23 - 1:16:31     Text: parsing in terms of the length of your sentences and text and so if you want to do a huge amount of

1:16:31 - 1:16:39     Text: parsing they're just a fantastic thing to use because you've then got an algorithm that scales to

1:16:39 - 1:16:48     Text: the size of the web okay so I'm kind of a little bit behind so I guess I'm not going to get through all

1:16:48 - 1:16:54     Text: these slides today and we'll have to finish out the final slides tomorrow but just to push a teeny

1:16:54 - 1:17:03     Text: bit further I'll just say a couple more things on what Nivre did for dependency parsing and

1:17:03 - 1:17:09     Text: then I'll sort of introduce the neural form of that in the next class so conventionally you had this

1:17:09 - 1:17:15     Text: sort of stack and buffer configuration and you wanted to build a machine learning classifier

1:17:15 - 1:17:24     Text: and so the way that was done was by using symbolic features of this configuration and what kind of

1:17:25 - 1:17:33     Text: symbolic features did you use? You used these indicator features that picked out a small subset, normally

1:17:33 - 1:17:39     Text: one to three elements of the configuration so you'd have a feature that could be something like

1:17:40 - 1:17:45     Text: the thing on the top of the stack is the word good which is an adjective or it could be

1:17:45 - 1:17:50     Text: the thing on the top of the stack is an adjective and the thing that's first in the buffer is a

1:17:50 - 1:17:56     Text: noun or it could just be looking at one thing and saying the first thing in the buffer is a verb

1:17:57 - 1:18:04     Text: so you'd have all of these features and because these features commonly involved words and commonly

1:18:04 - 1:18:12     Text: involved conjunctions of several conditions you had a lot of features and you know having

1:18:12 - 1:18:19     Text: mentions of words and conjunctions of conditions definitely helped to make these parsers work better

1:18:20 - 1:18:26     Text: but nevertheless because you had all of these sort of one-zero symbolic features, you had a

1:18:26 - 1:18:34     Text: ton of such features so commonly these parsers were built using something like you know a million

1:18:34 - 1:18:41     Text: to ten million different features of sentences and I mentioned already the importance of evaluation

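Putting the pieces together, here is a compact sketch of the greedy arc-standard loop just described. It is a simplification of my own: predict_action stands in for whatever trained classifier you would really use, whether an SVM over indicator features or the neural classifier of assignment 3, and toy_predict is a hypothetical stand-in that happens to produce the right actions for "I ate fish".

def greedy_arc_standard_parse(n_words, predict_action):
    # Words are numbered 1..n_words; 0 is the pseudo-root.
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), []
    while buffer or len(stack) > 1:
        action = predict_action(stack, buffer)
        if action == 'SHIFT':            # move the next word onto the stack
            stack.append(buffer.pop(0))
        elif action == 'LEFT-ARC':       # second-from-top becomes a dependent of the top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        else:                            # RIGHT-ARC: top becomes a dependent of second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs                          # (head, dependent) pairs

def toy_predict(stack, buffer):
    # Stand-in for a real classifier over configuration features such as
    # "top-of-stack word = ate" or "first-of-buffer part of speech = noun".
    gold_heads = {1: 2, 2: 0, 3: 2}      # I <- ate, ate <- root, fish <- ate
    if len(stack) >= 2:
        top, second = stack[-1], stack[-2]
        if gold_heads.get(second) == top:
            return 'LEFT-ARC'
        if gold_heads.get(top) == second and all(gold_heads[w] != top for w in buffer):
            return 'RIGHT-ARC'
    return 'SHIFT'

print(greedy_arc_standard_parse(3, toy_predict))   # [(2, 1), (2, 3), (0, 2)]

Each word enters the stack once and is reduced once, so the parser makes a number of classifier calls linear in the sentence length, which is where the speed comes from.
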
1:18:41 - 1:18:50     Text: let me just sort of quickly say how these parsers were evaluated so to evaluate a

1:18:51 - 1:18:59     Text: parser for a particular sentence, our test set was hand-parsed in the tree banks so we

1:18:59 - 1:19:05     Text: have gold dependencies, what the human thought were right, and so we can write those

1:19:05 - 1:19:13     Text: dependencies down as statements saying the first word is the dependent of the second word

1:19:13 - 1:19:21     Text: via a subject dependency and then the parser is also going to make similar claims as to what's

1:19:21 - 1:19:28     Text: a dependent on what and so there are two common metrics that are used one is just are you getting

1:19:28 - 1:19:35     Text: these dependency facts right so both of these dependency facts match and so that's referred to

1:19:35 - 1:19:42     Text: as the unlabeled attachment score where we're just sort of measuring accuracy, which is, of all

1:19:42 - 1:19:49     Text: of the dependencies in the gold sentence and remember we have one dependency per word in the

1:19:49 - 1:19:55     Text: sentence so here we have five, how many of them are correct, and that's our unlabeled attachment

1:19:55 - 1:20:03     Text: score of 80 percent but a slightly more rigorous evaluation is to say well no we're also going

1:20:03 - 1:20:09     Text: to label them and we're going to say that this is the subject that's actually called the root

1:20:09 - 1:20:19     Text: this one's the object so these dependencies have labels and you also need to get the grammatical

1:20:19 - 1:20:26     Text: relation label right and so that's then referred to as the labeled attachment score and although I got

1:20:26 - 1:20:34     Text: those two right, for that, hmm, I guess according to this example actually this is wrong, it looks

1:20:34 - 1:20:41     Text: like I got, oh no, this one's wrong there, sorry, that one's wrong there, okay so I only got two of the

1:20:42 - 1:20:48     Text: dependencies correct in the sense that I got both what depends on what and the label correct

1:20:48 - 1:20:55     Text: and so my labeled attachment score is only 40 percent okay so I'll stop there now for the

1:20:55 - 1:21:03     Text: introduction for dependency parsing and I still have an IOU which is how we can then bring

1:21:03 - 1:21:09     Text: neural nets into this picture and how they can be used to improve dependency parsing so I'll

1:21:09 - 1:21:26     Text: do that at the start of next time before then proceeding further into neural language models
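
As a quick recap of those two metrics, here is a small sketch of my own showing how unlabeled and labeled attachment scores would be computed from gold and predicted (head, label) pairs.

def attachment_scores(gold, pred):
    # gold and pred both map each word index to its (head, label) pair
    n = len(gold)
    uas = sum(pred[w][0] == head for w, (head, _) in gold.items()) / n
    las = sum(pred[w] == gold[w] for w in gold) / n
    return uas, las

# For a five-word sentence where the parser gets 4/5 heads right but only
# 2/5 (head, label) pairs right, this gives UAS = 0.8 and LAS = 0.4,
# matching the lecture example.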