0:00:00 - 0:00:12 Text: Okay, so for today, we're actually going to take a bit of a change of pace from what
0:00:12 - 0:00:19 Text: the last couple of lectures have been about, and we're going to focus much more on linguistics
0:00:19 - 0:00:25 Text: and natural language processing and so in particular, we're going to start looking at the topic
0:00:25 - 0:00:35 Text: of dependency parsing. And so this is the plan of what we'll go through today. So I'm going
0:00:35 - 0:00:40 Text: to start out by going through some ideas that have been used in the syntactic structure of languages
0:00:40 - 0:00:48 Text: of constituency and dependency and introduce those, and then, focusing in more on
0:00:48 - 0:00:55 Text: dependency structure, I'm going to look at dependency grammars and dependency treebanks,
0:00:55 - 0:01:00 Text: and then having done that, we're then going to move back into thinking about how to build
0:01:00 - 0:01:05 Text: natural language processing systems and so I'm going to introduce the idea of transition-based
0:01:05 - 0:01:12 Text: dependency parsing and then in particular having developed that idea, I'm going to talk about
0:01:12 - 0:01:18 Text: a way to build a simple but highly effective neural dependency parser. And so this simple
0:01:18 - 0:01:23 Text: highly effective neural dependency parser is essentially what we'll be asking you to build
0:01:23 - 0:01:29 Text: in the third assignment. So in some sense, we're getting a little bit ahead of ourselves here
0:01:29 - 0:01:36 Text: because in week two of the class, we teach you how to do both assignments two and three, but all
0:01:36 - 0:01:42 Text: of this material will come in really useful. Before I get underway, just a couple of announcements.
0:01:44 - 0:01:51 Text: So first off, again for assignment two, you don't yet need to use the PyTorch framework,
0:01:51 - 0:01:58 Text: but now's a good time to work on getting PyTorch installed for your Python programming.
0:01:59 - 0:02:05 Text: Assignment three is in part also an introduction to using PyTorch; it's got a lot of scaffolding
0:02:05 - 0:02:13 Text: included in the assignment. But beyond that, this Friday we've got a PyTorch tutorial, and I thoroughly
0:02:13 - 0:02:21 Text: encourage you to come along to that as well; look for it under the Zoom tab. And in the second half
0:02:21 - 0:02:28 Text: of the first day of week four, we have an explicit class that partly focuses on the final projects
0:02:28 - 0:02:34 Text: and what the choices are for those, but it's never too late to start thinking about the final project
0:02:34 - 0:02:40 Text: and what kind of things you want to do for the final project. So do come meet with people,
0:02:40 - 0:02:46 Text: there are sort of resources on the course pages about what different TAs know about. I've also
0:02:46 - 0:02:50 Text: talked to a number of people about final projects, but clearly I can't talk to everybody.
0:02:50 - 0:02:54 Text: So I encourage you to also be thinking about what you want to do for final projects.
0:02:56 - 0:03:05 Text: Okay, so what I wanted to do today was introduce how people think about the structure of sentences
0:03:06 - 0:03:12 Text: and put structure on top of them to explain how human language conveys meaning.
0:03:13 - 0:03:19 Text: And so our starting point for meaning and essentially what we've dealt with with word vectors
0:03:19 - 0:03:26 Text: up until now is we have words. And words are obviously an important part of the
0:03:26 - 0:03:35 Text: meaning of human languages. But for words in human languages, there's more that we can do with them
0:03:35 - 0:03:42 Text: in thinking about how to structure sentences. So in particular, the first most basic way that we
0:03:42 - 0:03:49 Text: think about words when we are thinking about how sentences are structured is we give to them what's
0:03:49 - 0:03:58 Text: called a part of speech. We can say that cat is a noun, by is a preposition, door is
0:03:58 - 0:04:07 Text: another noun, cuddly is an adjective. And then the word "the" is given a different part of
0:04:07 - 0:04:12 Text: speech; if you saw any parts of speech in school, you were probably told it was an article.
0:04:12 - 0:04:20 Text: Sometimes that is just put into the class of adjectives, but in modern linguistics, and in what you'll see
0:04:20 - 0:04:26 Text: in the resources that we use, words like that are referred to as determiners. And the idea is
0:04:26 - 0:04:31 Text: that there's a bunch of words, including "a" and "the", but also other words like "this" and "that",
0:04:33 - 0:04:42 Text: or even every which are words that occur at the beginning of something like the cuddly cat
0:04:42 - 0:04:49 Text: which have a determinative function of sort of picking out which cat they're referring to.
0:04:49 - 0:04:55 Text: And so we refer to those as determiners. But it's not the case that when we want to communicate with
0:04:55 - 0:05:03 Text: language that we just have this word salad where we say a bunch of words, we just say you know whatever
0:05:04 - 0:05:12 Text: leaking kitchen tap and let the other person put it together, we put words together in a particular
0:05:12 - 0:05:19 Text: way to express meanings. And so therefore, languages have larger units of putting meaning together.
0:05:19 - 0:05:30 Text: And the question is how we represent and think about those. Now, in modern work, and in particular
0:05:30 - 0:05:37 Text: in modern United States linguistics or even what you see in computer science classes when thinking
0:05:37 - 0:05:44 Text: about formal languages, the most common way to approach this is with the idea of context-free
0:05:44 - 0:05:50 Text: grammars which you see at least a little bit of in 103 if you've done 103, what a linguist would
0:05:50 - 0:05:57 Text: often refer to as phrase structure grammars. And the idea there is to say, well, there are bigger
0:05:57 - 0:06:05 Text: units in languages that we refer to as phrases. So something like the cuddly cat is a cat
0:06:05 - 0:06:13 Text: with some other words modifying it. And so we refer to that as a noun phrase. But then we have
0:06:14 - 0:06:24 Text: ways in which phrases can get larger by building things inside phrases. So the door here is also a
0:06:24 - 0:06:30 Text: noun phrase. But then we can build something bigger around it with a preposition. So this is a
0:06:30 - 0:06:37 Text: preposition. And then we have a prepositional phrase. And in general we can keep going. So we
0:06:37 - 0:06:43 Text: can then make something like the cuddly cat by the door. And then the door is a noun phrase.
0:06:43 - 0:06:51 Text: The cuddly cat is a noun phrase by the door is a prepositional phrase. But then when we put it
0:06:51 - 0:06:58 Text: all together the whole of this thing becomes a bigger noun phrase. And so it's working with these
0:06:58 - 0:07:07 Text: ideas of nested phrases, what in context free grammar terms you would refer to as non-terminals.
0:07:08 - 0:07:13 Text: So noun phrase and prepositional phrase would be non-terminals in the context free grammar.
0:07:13 - 0:07:20 Text: We can build up a bigger structure of human languages. So let's just do that for a little bit
0:07:20 - 0:07:29 Text: to review what happens here. So we start off saying okay you can say the cat and a dog. And so those
0:07:29 - 0:07:35 Text: are noun phrases. And so we want a rule that can explain those. So we could say a noun phrase goes
0:07:35 - 0:07:43 Text: to a determiner followed by a noun. And then somewhere over to the side we'd have a lexicon. And in our lexicon
0:07:43 - 0:07:54 Text: we'd say that dog is a noun and cat is a noun, and "the" is a determiner and "a" is a determiner.
0:07:55 - 0:08:02 Text: Okay so then we notice you can do a bit more than that. So you can say things like the large cat
0:08:03 - 0:08:11 Text: a barking dog. So that suggests that in a noun phrase, after the determiner,
0:08:11 - 0:08:17 Text: there can optionally be an adjective, and then there's the noun, and that can explain some things we
0:08:17 - 0:08:26 Text: can say. But we can also say the cat by the door or a barking dog in a crate. And so we can also
0:08:26 - 0:08:34 Text: put a prepositional phrase at the end and that's optional. But you can combine it together with an
0:08:34 - 0:08:42 Text: adjective, for the example I gave, like a barking dog on the table. And so our grammar can
0:08:42 - 0:08:51 Text: handle that. So then we'll keep on and say well actually you can use multiple adjectives so you
0:08:51 - 0:09:00 Text: can say a large barking dog, or a large barking cuddly cat. No, maybe not. Well, phrases like that.
0:09:00 - 0:09:05 Text: So we can have any number of adjectives, which we can represent with a star. That's referred to as the
0:09:05 - 0:09:14 Text: Kleene star. So that's good. But I forgot a bit, actually. For "by the door" I have to have a rule
0:09:14 - 0:09:21 Text: for producing by the door. So I also need a rule that's a prepositional phrase goes to a preposition
0:09:21 - 0:09:29 Text: followed by a noun phrase. And so then I also have to have prepositions and that can be in or on
0:09:29 - 0:09:37 Text: or by. Okay. And I can make other sentences of course with this as well like the large crate on
0:09:37 - 0:09:45 Text: the table or something like that or the large crate on the large table. Okay. So I chug along
0:09:46 - 0:09:52 Text: and then well I could have something like talk to the cat. And so now I need more stuff. So talk
0:09:52 - 0:10:00 Text: is a verb and "to" still looks like a preposition. So I need to be able to make up something
0:10:01 - 0:10:11 Text: with that as well. Okay. So what I can do is say I can also have a rule for a verb phrase that
0:10:11 - 0:10:19 Text: goes to a verb, and then after that, for something like "talk to the cat", it can take a prepositional
0:10:19 - 0:10:32 Text: phrase after it. And then I can say that the verb goes to talk or walked. Okay. Then I can parse,
0:10:32 - 0:10:40 Text: and then I can cover those sentences. Oops. Okay. So that's the end of what I have here.
0:10:40 - 0:10:49 Text: So in this sort of a way, I'm handwriting a grammar. So here, now, I have this grammar
0:10:50 - 0:10:59 Text: and a lexicon. And for the examples that I've written down here, this grammar and this lexicon
0:10:59 - 0:11:10 Text: is sufficient to parse these sorts of fragments, showing the expansions that I just wrote down. I mean,
0:11:10 - 0:11:16 Text: of course there's a lot more to English than what you see here. Right. So if I have something like
0:11:17 - 0:11:29 Text: the cat walked behind the dog, then I need some more grammar rules. So it seems then I need a rule
0:11:29 - 0:11:34 Text: that says I can have a sentence that goes to a noun phrase followed by a verb phrase.
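The grammar fragment built up to this point can be sketched as a tiny recognizer. This is my own illustrative code, not anything from the course materials; it implements S → NP VP, NP → Det (Adj)* N (PP)*, PP → P NP, VP → V (PP) over the lecture's toy lexicon, tracking sets of possible end positions so that ambiguous attachments all coexist.

```python
# A sketch of the toy grammar from the lecture, as a tiny recognizer.
# Function and variable names here are my own, not from the course code.
LEXICON = {
    "the": "Det", "a": "Det",
    "large": "Adj", "barking": "Adj", "cuddly": "Adj",
    "cat": "N", "dog": "N", "door": "N", "crate": "N", "table": "N", "kitchen": "N",
    "in": "P", "on": "P", "by": "P", "to": "P", "behind": "P",
    "talk": "V", "walked": "V",
}

def word(tag):
    # Terminal: match one token with the given part of speech.
    def parse(toks, i):
        return {i + 1} if i < len(toks) and LEXICON.get(toks[i]) == tag else set()
    return parse

def star(parse, toks, i):
    # Kleene star: zero or more occurrences, collecting every possible end position.
    ends, frontier = {i}, {i}
    while frontier:
        frontier = {j for e in frontier for j in parse(toks, e)} - ends
        ends |= frontier
    return ends

def np(toks, i):
    # NP -> Det (Adj)* N (PP)*
    ends = word("Det")(toks, i)
    ends = {j for e in ends for j in star(word("Adj"), toks, e)}
    ends = {j for e in ends for j in word("N")(toks, e)}
    return {j for e in ends for j in star(pp, toks, e)}

def pp(toks, i):
    # PP -> P NP
    return {j for e in word("P")(toks, i) for j in np(toks, e)}

def vp(toks, i):
    # VP -> V (PP): the PP is optional.
    ends = word("V")(toks, i)
    return ends | {j for e in ends for j in pp(toks, e)}

def recognize(sentence):
    # S -> NP VP; accept if some derivation consumes the whole sentence.
    toks = sentence.lower().split()
    return len(toks) in {j for e in np(toks, 0) for j in vp(toks, e)}
```

For example, recognize("the cat walked behind the dog") succeeds, while a scrambled string like "cat the walked dog" is rejected.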
0:11:35 - 0:11:46 Text: And I can keep on doing things of this sort. Let's see, one question that Ruth Ann asked was about
0:11:46 - 0:11:53 Text: what the brackets mean, and whether the first NP is different from the second.
0:11:53 - 0:12:03 Text: And so for this notation on the brackets here, I mean, this is actually a common notation that's
0:12:03 - 0:12:10 Text: used in linguistics. It's sort of in some sense a little bit different to traditional computer
0:12:10 - 0:12:18 Text: science notation since the star is used in both to mean zero or more of something. So you could have
0:12:18 - 0:12:25 Text: zero one two three four five adjectives. Somehow it's usual in linguistics that when you're using
0:12:25 - 0:12:32 Text: the star, you also put parentheses around it to mean it's optional. So sort of parentheses and
0:12:32 - 0:12:38 Text: star are used together to mean any number of something. When it's parentheses just by themselves,
0:12:38 - 0:12:49 Text: that's then meaning zero or one. And then, for "are these two noun phrases different?": no, they're both
0:12:49 - 0:12:56 Text: noun phrase rules. And so in our grammar, we can have multiple rules that expand noun phrase in
0:12:56 - 0:13:04 Text: different ways. But, you know, actually in my example here, my second rule because I wrote it
0:13:04 - 0:13:11 Text: quite generally, it actually covers the first rule as well. So actually at that point, I can cross out
0:13:11 - 0:13:17 Text: this first rule because I don't actually need it in my grammar. But in general, you know, you have
0:13:17 - 0:13:25 Text: a choice between writing multiple rules for what a noun phrase goes to, which effectively gives
0:13:25 - 0:13:35 Text: you a disjunction, or working out by various syntactic conventions how to compress them together. Okay.
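As an aside on the notation point: the (Adj)* and (PP)* patterns map directly onto regular expressions over part-of-speech tags, where * by itself means zero-or-more and ? means optional, so the extra parentheses linguists write are not needed. A hypothetical sketch (my own names; the recursive PP rule is flattened, since a regex cannot express unbounded nesting in general):

```python
import re

# The lecture's NP rule, NP -> Det (Adj)* N (PP)*, written as a regular
# expression over part-of-speech tag strings, with PP flattened to P Det (Adj)* N.
NP_PATTERN = re.compile(r"Det (?:Adj )*N(?: P Det (?:Adj )*N)*")

def is_np(tag_string):
    # True if the whole tag sequence matches the (flattened) NP rule.
    return NP_PATTERN.fullmatch(tag_string) is not None
```

So "Det Adj Adj N P Det N" (e.g., "the large barking dog on the table") matches, while "Adj N" does not, because the determiner is obligatory in this rule.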
0:13:36 - 0:13:42 Text: So that was what gets referred to in natural language processing as constituency grammars,
0:13:43 - 0:13:50 Text: where the standard form of constituency grammar is a context-free grammar of the sort that
0:13:50 - 0:13:56 Text: I trust you saw at least a teeny bit of, either in CS 103 or something like a programming languages,
0:13:56 - 0:14:03 Text: compilers, or formal languages class. There are other forms of grammars that also pick out constituency.
0:14:04 - 0:14:09 Text: There are things like tree adjoining grammars, but I'm not going to really talk about any of those now.
0:14:09 - 0:14:15 Text: What I actually want to present is a somewhat different way of looking at grammar, which is referred to
0:14:15 - 0:14:24 Text: as dependency grammar, which puts a dependency structure over sentences. Now actually,
0:14:24 - 0:14:30 Text: it's not that these two ways of looking at grammar have nothing to do with each other. I mean,
0:14:30 - 0:14:36 Text: there's a whole formal theory about the relationships between different kinds of grammars,
0:14:36 - 0:14:44 Text: and you can very precisely state relationships and isomorphisms between different grammars of
0:14:44 - 0:14:50 Text: different kinds. But on the surface, these two kinds of grammars look sort of different and
0:14:50 - 0:15:00 Text: emphasize different things. And for reasons of their sort of closeness to picking out relationships
0:15:00 - 0:15:09 Text: in sentences and their ease of use, it turns out that in modern natural language processing,
0:15:09 - 0:15:17 Text: starting, I guess, around 2000, sort of really in the last 20 years, NLP people have really swung
0:15:17 - 0:15:23 Text: behind dependency grammars. So if you look around now where people are using grammars in NLP,
0:15:23 - 0:15:29 Text: by far the most common thing that's being used is dependency grammars. So I'm going to teach us
0:15:29 - 0:15:36 Text: today a bit about those. And what we're going to build in assignment three, using supervised
0:15:36 - 0:15:44 Text: learning, is a neural dependency parser. So the idea of dependency grammar is that
0:15:44 - 0:15:52 Text: when we have a sentence, what we're going to do is we're going to say for each word, what other
0:15:52 - 0:16:00 Text: words modify it. So what we're going to do is when we say the large crate, we're going to say,
0:16:00 - 0:16:09 Text: okay, well, large is modifying crate and "the" is modifying crate; in "the kitchen", "the" is modifying
0:16:09 - 0:16:18 Text: kitchen; in "by the door", "the" is modifying door. And so I'm showing modification, a dependency or
0:16:18 - 0:16:25 Text: an attachment relationship by drawing an arrow from the head to what's referred to
0:16:25 - 0:16:33 Text: in dependency grammar as the dependent. The thing that modifies further specifies or attaches
0:16:25 - 0:16:33 Text: to the head. Okay, so that's the start of this. Well, another dependency is that,
0:16:46 - 0:16:52 Text: looking in the large crate, where you're looking is in the large crate. So you're going to
0:16:52 - 0:17:00 Text: want to have the crate in "in the large crate" as being a dependent of look. And so that's also going
0:17:00 - 0:17:08 Text: to be a dependency relationship here. And then there's one final bit that might seem a little bit
0:17:08 - 0:17:15 Text: confusing to people. And that's actually when we have these prepositions, there are two ways that
0:17:15 - 0:17:23 Text: you can think that this might work. So if it was something like look in the crate,
0:17:25 - 0:17:33 Text: it seems like "in" is a dependent of crate, but you could think that you want to say "look in",
0:17:33 - 0:17:39 Text: and then it's "in the crate", and give this dependency relationship with the preposition,
0:17:39 - 0:17:46 Text: sort of thinking of it as the head of what was before our prepositional phrase. And that's
0:17:46 - 0:17:55 Text: a possible strategy in the dependency grammar. But what I'm going to show you today and what you're
0:17:55 - 0:18:02 Text: going to use in this assignment is dependency grammars that follow the representation of universal
0:18:02 - 0:18:09 Text: dependencies. And universal dependencies is a framework which actually I was involved in creating,
0:18:09 - 0:18:15 Text: which was set up to try and give a common dependency grammar over many different human languages.
0:18:15 - 0:18:23 Text: And in the design decisions that were made in the context of designing universal dependencies,
0:18:24 - 0:18:32 Text: what we decided was that for what in some languages you use prepositions, lots of other
0:18:32 - 0:18:39 Text: languages make much more use of case marking. So if you've seen something like German, you've seen
0:18:39 - 0:18:48 Text: more case markings, like genitive and dative cases. And in other languages, like Latin or Finnish, or
0:18:49 - 0:18:55 Text: lots of Native American languages, you have many more case markings again, which cover most of
0:18:55 - 0:19:03 Text: the role of prepositions. So in universal dependencies, essentially in the crate is treated like a
0:19:03 - 0:19:12 Text: case marked noun. And so what we say is that the in is also a dependent of crate and then you're
0:19:12 - 0:19:21 Text: looking in the crate. So in the structure we adopt, this "in" is a dependent of crate, this "in" is a
0:19:21 - 0:19:30 Text: dependent of kitchen, and this "by" is a dependent of door. And then we have these prepositional phrases,
0:19:30 - 0:19:37 Text: in the kitchen by the door and we want to work out well what they modify. Well in the kitchen
0:19:38 - 0:19:43 Text: is modifying crate right because it's a crate in the kitchen. So we're going to say that it's
0:19:43 - 0:19:52 Text: this piece is a dependent of crate. And then well what about by the door? Well it's not really
0:19:52 - 0:19:59 Text: meaning that it's a kitchen by the door, and it's not meaning to look by the door. Again, it's a crate
0:19:59 - 0:20:06 Text: by the door. And so what we're going to have is the crate also has door as a dependent. And so
0:20:06 - 0:20:20 Text: that gives us our full dependency structure of this sentence. Okay. And so that's a teeny introduction
0:20:20 - 0:20:27 Text: to syntactic structure. I'm going to say a bit more about it and give a few more examples.
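The dependency structure just described is typically stored in treebanks as an array of head indices, one per word (1-based positions, with 0 standing for ROOT, as in CoNLL-style formats). Here is a sketch of that encoding for the "Look in the large crate in the kitchen by the door" example, with my own helper names, plus the two sanity checks such an analysis should satisfy:

```python
# The lecture's dependency analysis, encoded the way treebanks usually do it:
# one head index per word (1-based positions, 0 = ROOT). Helper names are mine.
WORDS = ["Look", "in", "the", "large", "crate", "in", "the", "kitchen", "by", "the", "door"]
HEADS = [0, 5, 5, 5, 1, 8, 8, 5, 11, 11, 5]

def is_tree(heads):
    # Every word must reach ROOT (0) without revisiting a node, and exactly
    # one word may attach directly to ROOT.
    for i in range(1, len(heads) + 1):
        seen, h = set(), i
        while h != 0:
            if h in seen:
                return False  # cycle
            seen.add(h)
            h = heads[h - 1]
    return heads.count(0) == 1

def is_projective(heads):
    # Projective = no two dependency arcs cross when drawn above the sentence.
    arcs = [(min(i + 1, h), max(i + 1, h)) for i, h in enumerate(heads)]
    return not any(a < c < b < d for a, b in arcs for c, d in arcs)

print(is_tree(HEADS), is_projective(HEADS))  # True True
```

The projectivity check is worth noting: the transition-based parsers the lecture builds toward produce exactly these non-crossing trees.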
0:20:27 - 0:20:33 Text: But let me just for a moment sort of say a little bit about why are we interested in syntactic
0:20:33 - 0:20:41 Text: structure? Why do we need to know the structure of sentences? And this gets into how does human
0:20:41 - 0:20:50 Text: languages work? So human languages can communicate very complex ideas. I mean, in fact, you know,
0:20:50 - 0:20:57 Text: anything that humans know how to communicate to one another they communicate pretty much by
0:20:57 - 0:21:05 Text: using words. So we can structure and communicate very complex ideas. But we can't communicate a
0:21:05 - 0:21:14 Text: really complex idea by one word. We can't just you know choose a word like you know empathy and say
0:21:14 - 0:21:19 Text: it with a lot of meaning and say empathy and the other person's meant to understand everything
0:21:19 - 0:21:26 Text: about what that means. Right. We have to compose a complex meaning that explains things by putting
0:21:26 - 0:21:34 Text: words together into bigger units. And the syntax of a language allows us to put words together
0:21:34 - 0:21:42 Text: into bigger units where we can build up and convey to other people a complex meaning. And so
0:21:42 - 0:21:48 Text: then the listener doesn't get this syntactic structure. Right. The syntactic structure of the
0:21:48 - 0:21:56 Text: sentence is hidden from the listener. All the listener gets is a sequence of words one after another
0:21:56 - 0:22:03 Text: bang bang bang. So the listener has to be able to do what I was just trying to do in this example
0:22:03 - 0:22:11 Text: that as the sequence of words comes in that the listener works out which words modify which
0:22:11 - 0:22:18 Text: are the words and therefore can construct the structure of the sentence and hence the meaning of
0:22:18 - 0:22:27 Text: the sentence. And so in the same way if we want to build clever neural net models that can understand
0:22:27 - 0:22:34 Text: the meaning of sentences those clever neural net models also have to understand what is this
0:22:34 - 0:22:39 Text: structure of the sentence so that they can interpret the language correctly. And we'll go through
0:22:39 - 0:22:46 Text: some examples and see more of that. Okay. So the fundamental point that we're going to sort of
0:22:46 - 0:22:55 Text: spend a bit more time on is that these choices of how you build up the structure of a language
0:22:56 - 0:23:04 Text: change the interpretation of the language and a human listener or equally a natural language
0:23:04 - 0:23:13 Text: understanding program has to make in a sort of probabilistic fashion choices as to which words
0:23:04 - 0:23:13 Text: modify, i.e. depend upon, which other words, so that they're coming up with the interpretation of the
0:23:19 - 0:23:28 Text: sentence that they think was intended by the person who said it. Okay. So to get a sense of this
0:23:28 - 0:23:36 Text: and how sentence structure is interesting and difficult what I'm going to go through now is a
0:23:36 - 0:23:44 Text: few examples of different ambiguities that you find in natural language and I've got some funny
0:23:44 - 0:23:52 Text: examples from newspaper headlines but these are all real natural language ambiguities that you find
0:23:52 - 0:23:59 Text: throughout natural language. Well at this point I should say this is where I'm being guilty of
0:23:59 - 0:24:07 Text: saying "natural language" but meaning English. Some of these ambiguities you find in lots of
0:24:07 - 0:24:14 Text: other languages as well but which ambiguities that are for syntactic structure partly depend on the
0:24:14 - 0:24:21 Text: details of the language. So different languages have different syntactic constructions, different
0:24:21 - 0:24:29 Text: word orders, and different amounts of words having different forms, like case markings. And so
0:24:29 - 0:24:37 Text: depending on those details there might be different ambiguities. So here's one ambiguity which is
0:24:37 - 0:24:45 Text: one of the most common ambiguities in English: "San Jose cops kill man with knife." So this sentence
0:24:45 - 0:24:55 Text: has two meanings either it's the San Jose cops who are killing a man and they're killing a man
0:24:55 - 0:25:03 Text: with a knife. And so that corresponds to a dependency structure where the San Jose cops
0:25:03 - 0:25:13 Text: are the subject of killing the man is the object of killing and then the knife is then the instrument
0:25:13 - 0:25:21 Text: with which they're doing the killing so that the knife is an oblique modifier for the instrument
0:25:21 - 0:25:28 Text: of killing. And so that's one possible structure for this sentence but it's probably not the right one.
0:25:29 - 0:25:37 Text: So what it actually probably was was that it was a man with a knife and the San Jose cops killed
0:25:37 - 0:25:49 Text: the man. So that corresponds to the knife then being a noun modifier of the man and then kill
0:25:49 - 0:25:55 Text: is still killing the man. So the man is the object of killing and the cops are still the subject.
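The two readings can be written down concretely as two head-index arrays (1-based word positions, 0 for ROOT) that differ only in where "knife" attaches. Treating "San" → "Jose" → "cops" as a name compound is one plausible convention here, not necessarily the lecture's, and the helper is my own:

```python
# The two readings of "San Jose cops kill man with knife", as head indices
# (1-based word positions, 0 = ROOT). The name-compound convention for
# "San Jose" and the helper below are illustrative, not course code.
WORDS = ["San", "Jose", "cops", "kill", "man", "with", "knife"]

HEADS_INSTRUMENT = [2, 3, 4, 0, 4, 7, 4]  # "knife" attaches to the verb "kill"
HEADS_MODIFIER   = [2, 3, 4, 0, 4, 7, 5]  # "knife" attaches to the noun "man"

def attachment_diff(h1, h2, words):
    # List every word whose head differs between the two analyses,
    # as (dependent, head in reading 1, head in reading 2).
    return [(words[i], words[h1[i] - 1], words[h2[i] - 1])
            for i in range(len(words)) if h1[i] != h2[i]]
```

Running the diff shows the whole ambiguity comes down to a single attachment decision for "knife": verb or noun.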
0:25:57 - 0:26:05 Text: And so whenever you have a prepositional phrase like this that's coming further on in a sentence
0:26:06 - 0:26:13 Text: there's a choice of how to interpret it. It could be either interpreted as modifying a noun
0:26:13 - 0:26:20 Text: phrase that comes before it or it can be interpreted as modifying a verb that comes before it.
0:26:20 - 0:26:26 Text: So systematically in English you get these prepositional phrase attachment ambiguities
0:26:26 - 0:26:34 Text: throughout all of our sentences but you know to give two further observations on that you know
0:26:34 - 0:26:43 Text: the first observation is you know you encounter sentences with prepositional phrase attachment
0:26:44 - 0:26:51 Text: ambiguities every time you read a newspaper article every time you talk to somebody but most of
0:26:51 - 0:26:58 Text: the time you never notice them and that's because our human brains are incredibly good at considering
0:26:58 - 0:27:05 Text: the possible interpretations and going with the one that makes sense according to context.
0:27:07 - 0:27:14 Text: The second comment as I said different human languages expose different ambiguities. So for
0:27:14 - 0:27:20 Text: example this is an ambiguity that you normally don't get in Chinese because in Chinese
0:27:21 - 0:27:28 Text: prepositional phrases modifying a verb are normally placed before the verb and so there you
0:27:28 - 0:27:35 Text: don't standardly get this ambiguity, but you know, there are different other ambiguities that you find
0:27:35 - 0:27:43 Text: commonly in Chinese sentences. Okay so this ambiguity you find everywhere because prepositional
0:27:43 - 0:27:49 Text: phrases are really common at the right ends of sentences. So here's another one: "Scientists count
0:27:49 - 0:27:56 Text: whales from space." So that gives us these two possible interpretations: that there are whales from
0:27:56 - 0:28:04 Text: space and scientists are counting them; and then the other one is that how the scientists are counting the
0:28:04 - 0:28:11 Text: whales is that they're counting them from space, and they're using satellites to count the
0:28:11 - 0:28:18 Text: whales, which is the correct interpretation that the newspaper hopes that you're getting.
0:28:18 - 0:28:30 Text: And this problem gets much much more complex because many sentences in English have prepositional
0:28:30 - 0:28:37 Text: phrases all over the place so here's the kind of boring sentence that you find in the financial
0:28:37 - 0:28:45 Text: news the board approved its acquisition by Royal Trust Co Ltd of Toronto for $27 a share at its
0:28:45 - 0:28:51 Text: monthly meeting and while if you look at the structure of this sentence what we find is you know
0:28:51 - 0:29:01 Text: here's a verb then here's the object noun phrase so we've got the object noun phrase here and then
0:29:01 - 0:29:08 Text: after that what do we find well we find a prepositional phrase another prepositional phrase another
0:29:08 - 0:29:15 Text: prepositional phrase and another prepositional phrase and how to attach each of these is then
0:29:15 - 0:29:22 Text: ambiguous so the basic rule of how you can attach them is you can attach them to things to the left
0:29:23 - 0:29:31 Text: providing you don't create crossing attachments so in principle by Royal Trust Co Ltd
0:29:31 - 0:29:39 Text: could be attached to either approved or acquisition, but in this case, Royal Trust Co Ltd is
0:29:39 - 0:29:52 Text: the acquirer, so it's a modifier of the acquisition. Okay, so then we have "of Toronto". So "of Toronto"
0:29:52 - 0:29:59 Text: could be modifying Royal Trust Co Ltd it could be modifying the acquisition or it can be modifying
0:29:59 - 0:30:06 Text: the approved and in this case the of Toronto is telling you more about the company and so
0:30:06 - 0:30:15 Text: it's a modifier of Royal Trust Co Ltd okay so then the next one is for $27 a share and that could
0:30:15 - 0:30:23 Text: be modifying Toronto Royal Trust Co Ltd the acquisition or the approving and well in this case
0:30:25 - 0:30:32 Text: that's talking about the price of the acquisition, so this modifier jumps back, and this is now a
0:30:32 - 0:30:41 Text: prepositional phrase that's modifying the acquisition and then at the end at its monthly meeting
0:30:42 - 0:30:49 Text: well, that's where the approval is happening, by the board. So rather than any of these
0:30:49 - 0:30:58 Text: preceding four noun phrases at its monthly meeting is modifying the approval and so it
0:30:58 - 0:31:05 Text: attaches right back there and this example is kind of too big and so I couldn't fit it in one line
0:31:05 - 0:31:11 Text: but as I think maybe you can see that you know none of these dependencies cross each other
0:31:11 - 0:31:19 Text: and they connect at different places ambiguously so because we can chain these prepositions like this
0:31:19 - 0:31:26 Text: and attach them at different places like this human language sentences are actually extremely
0:31:26 - 0:31:37 Text: ambiguous. So, if you have a sentence with K prepositional phrases at the end of
0:31:37 - 0:31:44 Text: it, where here we have K equals four, the number of parses this sentence has, the number of different
0:31:44 - 0:31:50 Text: ways you can make these attachments, is given by the Catalan numbers. So the Catalan numbers are
0:31:50 - 0:31:58 Text: an exponentially growing series which arises in many tree-like contexts. So if you're doing something
0:31:58 - 0:32:04 Text: like triangulations of a polygon, you get Catalan numbers; if you're doing triangulation in graphical
0:32:04 - 0:32:11 Text: models in CS228, you get Catalan numbers. But we don't need to worry about the details here; the central
0:32:11 - 0:32:18 Text: point is this is an exponential series and so you're getting an exponential number of parses in terms
0:32:18 - 0:32:24 Text: of the number of prepositional phrases and so in general you know the number of parses human
0:32:24 - 0:32:31 Text: languages have is exponential in their length which is kind of bad news because if you're then trying
0:32:31 - 0:32:39 Text: to enumerate all the parses, you might fear that you really have to do a ton of work. The thing to
0:32:39 - 0:32:48 Text: notice about structures like these prepositional phrase attachment ambiguities is that there's nothing
0:32:48 - 0:32:57 Text: that resolves these ambiguities in terms of the structure of the sentence so if you've done something
0:32:57 - 0:33:03 Text: like looked at the kind of grammars that are used in compilers: the grammars used in compiling
0:33:03 - 0:33:11 Text: programming languages are mainly made to be unambiguous, and to the extent that
0:33:11 - 0:33:18 Text: there are any ambiguities there are default rules that are used to say choose this one particular
0:33:19 - 0:33:27 Text: parse tree for your piece of a programming language and human languages just aren't like that
0:33:27 - 0:33:33 Text: they're globally ambiguous and the listening human is just meant to be smart enough to figure out
0:33:33 - 0:33:44 Text: what was intended so the analogy would be that you know in programming languages when you're working
0:33:44 - 0:33:54 Text: out what does an else clause modify well you've got the answer that you can either look at the
0:33:54 - 0:34:00 Text: curly braces to work out what the else clause modifies or if you're using Python you look at the
0:34:00 - 0:34:07 Text: indentation and it tells you what the else clause modifies where by contrast for human languages
0:34:08 - 0:34:16 Text: you would just write down "else something"; it doesn't matter how you do it, you don't need parentheses,
0:34:16 - 0:34:21 Text: you don't need indentation the human being will just figure out what the else clause is meant to
0:34:21 - 0:34:31 Text: pair up with. Okay, there are lots of other forms of ambiguities in human languages, so let's look at a few others.
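Before moving on: the series governing that prepositional-phrase attachment blow-up is the Catalan numbers, which are easy to compute from the closed form C(2k, k)/(k+1). A quick standard-library sketch (names mine):

```python
from math import comb

def catalan(k):
    # k-th Catalan number via the closed form C(2k, k) / (k + 1).
    return comb(2 * k, k) // (k + 1)

# The first few terms of the exponentially growing series:
print([catalan(k) for k in range(1, 7)])  # [1, 2, 5, 14, 42, 132]
```

So a handful of sentence-final prepositional phrases already admits on the order of dozens of analyses, and the count grows roughly fourfold with each additional phrase.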
0:34:31 - 0:34:38 Text: another one that is very common over all sorts of languages is coordination scope ambiguities
0:34:39 - 0:34:44 Text: so here's a sentence: shuttle veteran and longtime NASA executive Fred Gregory appointed
0:34:44 - 0:34:53 Text: to board. Well, this is an ambiguous sentence, there are two possible readings of this. One reading
0:34:53 - 0:34:59 Text: is that there are two people, there's a shuttle veteran and there's a longtime NASA
0:34:59 - 0:35:08 Text: executive Fred Gregory and they were both appointed to the board, two people, and the other possibility
0:35:08 - 0:35:17 Text: is there's someone named Fred Gregory who's a shuttle veteran and longtime NASA executive
0:35:17 - 0:35:26 Text: and they're appointed to the board, one person, and these two interpretations again correspond to having
0:35:26 - 0:35:36 Text: different parse structures so in one structure we've got a coordination of the shuttle veteran
0:35:36 - 0:35:43 Text: and the longtime NASA executive Fred Gregory coordinated together, in one case these
0:35:43 - 0:35:53 Text: are coordinated and then Fred Gregory specifies the name of the NASA executive so it's then
0:35:54 - 0:36:01 Text: specifying who that executive is, whereas in the other one the shuttle veteran and longtime
0:36:01 - 0:36:09 Text: NASA executive all together is then something that is a modifier of Fred Gregory
0:36:12 - 0:36:20 Text: okay so in one this is the unit that modifies Fred Gregory, in the other one up here just longtime
0:36:20 - 0:36:27 Text: NASA executive modifies Fred Gregory and then that is conjoined together with the shuttle veteran
0:36:27 - 0:36:35 Text: and so that also gives different interpretations. So this is a slightly reduced example.
0:36:35 - 0:36:44 Text: I mean, newspaper headlines tend to be more ambiguous than many other pieces of text because
0:36:44 - 0:36:50 Text: they're written in this shortened form to get things to fit, and this one's in especially shortened
0:36:50 - 0:36:59 Text: form where it's actually left out an explicit conjunction, but this headline says doctor: no heart, cognitive
0:36:59 - 0:37:06 Text: issues and this was, I guess, after Trump's first physical, and this
0:37:06 - 0:37:12 Text: is an ambiguity because there are two ways that you can read this: you can either read this as saying
0:37:12 - 0:37:22 Text: doctor: no heart, and cognitive issues, which gives you one interpretation. Instead of that, the way we
0:37:22 - 0:37:32 Text: should read it is that it's heart or cognitive and so it's then saying no heart or cognitive issues
0:37:32 - 0:37:41 Text: and we have a different narrower scope of the coordination and then we get a different reading.
0:37:43 - 0:37:51 Text: Okay I want to give a couple more examples of different kinds of ambiguities another one you see
0:37:51 - 0:37:57 Text: quite a bit is when you have modifiers that are adjectives and adverbs that there are different
0:37:57 - 0:38:04 Text: ways that you can have things modifying other things. This example is a little bit not safe for
0:38:04 - 0:38:14 Text: work but here goes students get first hand job experience so this is an ambiguous sentence and again
0:38:14 - 0:38:22 Text: we can think of it as a syntactic ambiguity in terms of which things modify which other things
0:38:22 - 0:38:34 Text: so the nice polite way to render this sentence is that first is modifying hand so we've got first hand
0:38:34 - 0:38:43 Text: it's job experience, so job is a compound noun modifying experience and it's first hand experience
0:38:43 - 0:38:53 Text: so first hand is then modifying experience, and then for get, our first hand job
0:38:53 - 0:39:02 Text: experience is the object of get and the students are the subject of get. But if you have a smuttier
0:39:02 - 0:39:13 Text: mind you can interpret this a different way and in the alternative interpretation you then have hand
0:39:13 - 0:39:24 Text: going together with job and the first is then a modifier of experience and job is still a
0:39:24 - 0:39:30 Text: modifier of experience and so then you get this different parse structure and different interpretation
0:39:30 - 0:39:40 Text: there. Okay, one more example. In a way this example is similar to the previous one in sort of having
0:39:41 - 0:39:47 Text: modifier pieces that can modify different things but rather than just being with individual adjectives
0:39:47 - 0:39:56 Text: or individual adverbs, it's then that much larger units such as verb phrases can often have attachment
0:39:56 - 0:40:04 Text: ambiguities so this sentence headline is mutilated body washes up on Rio Beach to be used for
0:40:04 - 0:40:11 Text: Olympics Beach volleyball so we have this big verb phrase here of to be used for Olympics Beach
0:40:11 - 0:40:21 Text: volleyball and then again we have this attachment decision that we could either say that that
0:40:21 - 0:40:33 Text: big verb phrase is modifying, i.e. is attached to, the Rio Beach, or we could say no no, the to be used
0:40:33 - 0:40:43 Text: for Olympics Beach volleyball that that is modifying the mutilated body and it's a body that's
0:40:43 - 0:40:50 Text: to be used for the Olympics Beach volleyball which gives the funny reading yeah so I hope that's
0:40:50 - 0:40:58 Text: giving you at least a little bit of a sense of how human language syntactic structure is complex
0:40:58 - 0:41:06 Text: and ambiguous, and to work out the intended interpretations you need to know something about that structure
0:41:07 - 0:41:14 Text: in terms of how much you need to understand, I mean, you know, this isn't a linguistics class, if
0:41:14 - 0:41:19 Text: you'd like to learn more about human language structure you can go off and do a syntax class
0:41:19 - 0:41:26 Text: but you know we're not really going to spend a lot of time working through language structure
0:41:26 - 0:41:31 Text: but there will be some questions on this in the assignment and so we're expecting that you can
0:41:31 - 0:41:38 Text: be at the level that you can have sort of some intuitions as to which words and phrases are
0:41:38 - 0:41:44 Text: modifying other words and phrases and therefore you could choose between two dependency analyses
0:41:44 - 0:41:53 Text: which one is correct. Okay, I've spent quite a bit of time on that so better keep going. Okay so
0:41:54 - 0:42:01 Text: the general idea is that knowing this sort of syntactic structure of a sentence can help us
0:42:01 - 0:42:07 Text: with semantic interpretation i mean as well as just generally saying we can understand language
0:42:07 - 0:42:13 Text: it's also used in many cases for simple practical forms of semantic extraction so people
0:42:13 - 0:42:19 Text: such as in biomedical informatics often want to get out particular relations such as protein
0:42:19 - 0:42:25 Text: protein interactions and well here's a sentence: the results demonstrated that KaiC interacts
0:42:25 - 0:42:35 Text: rhythmically with SasA, KaiA and KaiB, and commonly people can get out those kinds of relationships
0:42:35 - 0:42:42 Text: by looking at patterns of dependency relations with particular verbs so for the interacts verb
0:42:42 - 0:42:48 Text: if you have a pattern of something being the subject and something else being the noun modifier
0:42:48 - 0:42:53 Text: of interacts well that's an interaction relationship but it gets a bit more complicated than that
0:42:53 - 0:42:59 Text: as in this example because often there are conjunctions so you also have another pattern
0:42:59 - 0:43:07 Text: where you have also interactions between the subject and the noun modifiers conjunct
0:43:07 - 0:43:17 Text: which will allow us to also find the KaiA and KaiB examples. Okay, so I've sort of given an informal
0:43:17 - 0:43:24 Text: tour of dependency grammar, so let me just try and quickly say a little bit more about formally
0:43:24 - 0:43:33 Text: what a dependency grammar is so in dependency syntax what we say is that the syntactic structure
0:43:33 - 0:43:42 Text: of a sentence consists of relations between pairs of words um and it's a binary asymmetric relation
0:43:42 - 0:43:50 Text: i.e. we draw arrows between pairs of words which we call dependencies. Now normally dependency
0:43:50 - 0:43:56 Text: grammars then type those arrows to express what kind of
0:43:56 - 0:44:02 Text: relation there is, and so they have some kind of taxonomy of grammatical relations so we
0:44:02 - 0:44:08 Text: might have a subject grammatical relation, a verbal auxiliary grammatical relation, and an oblique
0:44:08 - 0:44:16 Text: modifier grammatical relation, we have some kind of typology of grammatical relations. So we
0:44:16 - 0:44:25 Text: refer to the arrows as going between the head, this is the head here, and something that is a dependent of
0:44:25 - 0:44:34 Text: it so the subject of a verb is the dependent of the verb or when you have a noun modifier like
0:44:34 - 0:44:45 Text: our sort of cuddly cat we say that um cuddly is a dependent of cat and so cat is the head of cuddly
0:44:45 - 0:44:55 Text: cat and so normally dependencies like in these examples form a tree, which formally means
0:44:55 - 0:45:04 Text: it's not just any graph with arrows: we have a graph which is connected, acyclic, and has a single
0:45:04 - 0:45:13 Text: root, so here's the root of the graph, and so that gives us a dependency tree analysis. Dependency
0:45:13 - 0:45:23 Text: grammars have a really really long history, so the famous first linguist was Pāṇini, who
0:45:23 - 0:45:30 Text: wrote about the structure of Sanskrit um and mainly he worked on the sound system of Sanskrit
0:45:30 - 0:45:36 Text: and how sounds change in various contexts, which is what linguists call phonology, and the different
0:45:36 - 0:45:43 Text: forms of Sanskrit words Sanskrit has rich morphology of inflecting nouns and verbs for different
0:45:43 - 0:45:50 Text: cases and forms, but he also worked a little on the syntactic structure of Sanskrit sentences
0:45:50 - 0:45:58 Text: and essentially what he proposed was the dependency grammar over Sanskrit sentences and it turns out
0:45:58 - 0:46:05 Text: that sort of for most of recorded history, when people have then gone on and tried to
0:46:05 - 0:46:13 Text: put structures over human sentences um what they have used is dependency grammars um so there was a
0:46:13 - 0:46:20 Text: lot of work in the first millennium by Arabic grammarians of trying to work out the grammar um
0:46:20 - 0:46:26 Text: structure of sentences and effectively what they used was, you know, the kind of thing I've just presented
0:46:26 - 0:46:35 Text: as a dependency grammar so compared to you know 2500 years of history the ideas of having context
0:46:35 - 0:46:41 Text: free grammars and having constituency grammars is actually a really really recent invention so it
0:46:41 - 0:46:48 Text: was really sort of in the middle of the 20th century that the ideas of um constituency grammar and
0:46:48 - 0:46:54 Text: context free grammars were developed, first by Wells in the forties and then by Noam Chomsky in the
0:46:54 - 0:47:01 Text: early 50s, leading to things like the Chomsky hierarchy that you might see in CS 103 or a formal
0:47:01 - 0:47:10 Text: languages class um so for modern work on dependency grammar using kind of the terminology and um
0:47:10 - 0:47:16 Text: notation that I've just introduced, that's normally attributed to Lucien Tesnière, who was a French
0:47:16 - 0:47:24 Text: linguist um in around the sort of middle of the 20th century as well um dependency grammar was
0:47:24 - 0:47:31 Text: widely used in the 20th century um in a number of places I mean in particular it tends to be
0:47:31 - 0:47:37 Text: sort of much more natural and easier to think about for languages that have a lot of different
0:47:37 - 0:47:44 Text: case markings on nouns like nominative, accusative, genitive, dative, instrumental kinds of cases like
0:47:44 - 0:47:49 Text: you get in the language like Latin or Russian and a lot of those languages have much
0:47:49 - 0:47:55 Text: freer word order than English so for the subject or object, you know, in English the subject has to
0:47:55 - 0:48:00 Text: be before the verb and the object has to be after the verb but lots of other languages have much
0:48:00 - 0:48:07 Text: free word order and instead use different forms of nouns to show you what's the subject or the
0:48:07 - 0:48:13 Text: object of the sentence and dependency grammars can often seem much more natural for those kinds of
0:48:13 - 0:48:19 Text: languages dependency grammars were also prominent at the very beginnings of computational linguistics so
0:48:20 - 0:48:27 Text: one of the first people working in computational linguistics in the US was David Hays so the
0:48:27 - 0:48:32 Text: professional society for computational linguistics is called the association for computational linguistics
0:48:32 - 0:48:37 Text: and he was actually one of the founders of the association for computational linguistics
0:48:37 - 0:48:44 Text: and he published in the early 1960s perhaps the first
0:48:44 - 0:48:54 Text: dependency parser. Okay, yeah, a little teeny note just in case you see other things:
0:48:54 - 0:49:00 Text: when you have these arrows you can draw them in either direction, you either draw arrows from the
0:49:00 - 0:49:06 Text: head to the dependent or from the dependent to the head and actually different people have
0:49:06 - 0:49:13 Text: done one and the other, right, so the way Tesnière drew them was to draw them from the head to
0:49:13 - 0:49:18 Text: the dependent and we're following that convention but you know if you're looking at something that
0:49:18 - 0:49:24 Text: somebody else has written with dependency arrows the first thing you have to work out is are they
0:49:24 - 0:49:31 Text: using the arrowheads at the heads or the dependents. Now, one other thing here is that
0:49:31 - 0:49:39 Text: a sentence is seen as having the overall head word of the sentence, which every other word of
0:49:39 - 0:49:46 Text: the sentence hangs off. It's a common convention to add this sort of fake ROOT to every sentence
0:49:46 - 0:49:54 Text: that then points to the head word of the whole sentence, here completed. That just tends to make
0:49:54 - 0:50:01 Text: the algorithmic stuff easier because then you can say that every word of the sentence is dependent
0:50:01 - 0:50:08 Text: on precisely one other node where what you can be dependent on is either another word on the
0:50:08 - 0:50:14 Text: sentence or the fake route of the sentence and when we build our parsers we will introduce that
0:50:14 - 0:50:27 Text: fake route okay so that's sort of dependency grammars and dependency structure I now want to
0:50:27 - 0:50:36 Text: get us back to natural language processing and starting to build parsers for dependency grammars
0:50:36 - 0:50:45 Text: but before doing that I just want to say yeah where do we get our data from and that's actually
0:50:45 - 0:50:56 Text: an interesting story in some sense so the answer to that is well what we do is get
0:50:56 - 0:51:03 Text: human beings commonly linguists or other people who are actually interested in the structure
0:51:03 - 0:51:11 Text: of human sentences and we get them to sit around and hand parse sentences and give them dependency
0:51:11 - 0:51:22 Text: structures and we collect a lot of those parses and we call that a tree bank and so this is
0:51:22 - 0:51:30 Text: something that really only started happening in the late 80s and took off in a big way in the 90s
0:51:30 - 0:51:36 Text: until then no one had attempted to build tree banks lots of people had attempted to build parsers
0:51:36 - 0:51:44 Text: and it seemed like well if you want to build a parser the efficient way to do it is to start writing
0:51:44 - 0:51:50 Text: a grammar so you start writing some grammar rules and you start writing a lexicon with words and
0:51:50 - 0:51:57 Text: parts of speech and you sit around working on your grammar when I was a PhD student one of my first
0:51:57 - 0:52:04 Text: summer jobs was spending the summer handwriting a grammar and it sort of seems like writing a
0:52:04 - 0:52:09 Text: grammar is more efficient because you're writing this one general thing that tells you the structure
0:52:09 - 0:52:15 Text: of a human language but there's just been this massive sea change partly driven by the adoption
0:52:15 - 0:52:22 Text: of machine learning techniques where it's now seen as axiomatic that the way to make progress
0:52:22 - 0:52:31 Text: is to have annotated data namely here a tree bank that shows you the structure of sentences
0:52:31 - 0:52:38 Text: and so what I'm showing here is a teeny extract from a universal dependencies tree bank and so that's
0:52:38 - 0:52:44 Text: what I mentioned earlier that this has been this effort to try and have a common dependency
0:52:44 - 0:52:49 Text: grammar representation that you can apply to lots of different human languages and so you can go
0:52:49 - 0:52:55 Text: over to this URL and see that there's about 60 different languages at the moment which have universal
0:52:55 - 0:53:05 Text: dependencies tree banks. So why are tree banks good? I mean it sort of seems like it's bad news if
0:53:05 - 0:53:12 Text: you have to have people sitting around for weeks and months hand-parsing sentences, it seems a lot
0:53:12 - 0:53:20 Text: slower and actually a lot less useful than having somebody writing a grammar which just has
0:53:21 - 0:53:30 Text: you know, a much bigger multiplier factor in the utility of their effort. It turns out that although
0:53:30 - 0:53:36 Text: that initial feeling seems sort of valid that in practice there's just a lot more you can do with
0:53:36 - 0:53:46 Text: the tree bank. So why are tree banks great? You know one reason is the tree banks are highly reusable
0:53:46 - 0:53:53 Text: so typically when people have written grammars they've written grammars for you know one particular
0:53:53 - 0:54:00 Text: parser and the only thing it was ever used in is that one particular parser but when you build a
0:54:00 - 0:54:09 Text: tree bank that's just a useful data resource and people use it for all kinds of things. So the
0:54:09 - 0:54:16 Text: well-known tree banks have been used by hundreds and hundreds of people and although all tree banks
0:54:16 - 0:54:22 Text: were initially built for the purposes of hey let's help natural language processing systems
0:54:22 - 0:54:28 Text: it turns out that people have actually been able to do lots of other things with tree banks.
0:54:28 - 0:54:34 Text: So for example these days psycho-linguists commonly use tree banks to get various kinds of
0:54:34 - 0:54:41 Text: statistics about data for thinking about psycho-linguistic models. Linguists use tree banks for
0:54:41 - 0:54:47 Text: looking at patterns of different syntactic constructions that occur that there's just been a lot
0:54:47 - 0:54:55 Text: of reuse of this data for all kinds of purposes but they have other advantages that I mentioned here
0:54:55 - 0:55:00 Text: you know when people are just sitting around saying oh what sentences are good they tend to
0:55:00 - 0:55:06 Text: only think of the core of language, whereas lots of weird things happen in language, and so if you
0:55:06 - 0:55:12 Text: actually just have some sentences and you have to go off and parse them then you actually have to
0:55:12 - 0:55:19 Text: deal with the totality of language. Since you're parsing actual sentences you get statistics so
0:55:19 - 0:55:25 Text: you naturally get the kind of statistics that are useful to machine learning systems by
0:55:25 - 0:55:31 Text: constructing a tree bank where you don't get them for free if you handwrite a grammar but then a
0:55:31 - 0:55:41 Text: final way which is perhaps the most important of all is if you actually want to be able to do
0:55:43 - 0:55:49 Text: science of building systems you need a way to evaluate these NLP systems.
0:55:49 - 0:55:59 Text: I mean it seems hard to believe now but you know back in the 80s and 90s when people built NLP
0:55:59 - 0:56:07 Text: parsers it was literally the case that the way they were evaluated was you said to your friend
0:56:07 - 0:56:12 Text: oh I built this parser type in a sentence on the terminal and see what it gives you back it's
0:56:12 - 0:56:19 Text: pretty good hey and that was just the way business was done whereas what we'd like to know is well
0:56:19 - 0:56:25 Text: as I showed you earlier English sentences can have lots of different parses, commonly: can this
0:56:27 - 0:56:33 Text: system choose the right parse for particular sentences and therefore have the basis of
0:56:34 - 0:56:40 Text: interpreting them as a human being would and well we can only systematically do that evaluation
0:56:40 - 0:56:46 Text: if we have a whole bunch of sentences that have been handparsed by humans with their correct
0:56:46 - 0:56:54 Text: interpretations so the rise of tree banks turned parser building into an empirical science where people
0:56:54 - 0:57:02 Text: could then compete rigorously on the basis of look my parser has 2% higher accuracy than your parser
0:57:02 - 0:57:10 Text: in choosing the correct parsers for sentences. Okay so well how do we build a parser
0:57:10 - 0:57:16 Text: once we've got dependencies so there's sort of a bunch of sources of information that you could
0:57:16 - 0:57:25 Text: hope to use so one source of information is looking at the words on either end of the dependency
0:57:25 - 0:57:33 Text: so discussing issues that seems a reasonable thing to say and so it's likely that issues
0:57:33 - 0:57:43 Text: could be the object of discussing whereas if it was some other word right if you were thinking of
0:57:43 - 0:57:50 Text: making, you know, outstanding the object of discussing, discussing outstanding, that doesn't sound right
0:57:50 - 0:57:58 Text: so that wouldn't be so good. A second source of information is distance so most dependencies are
0:57:58 - 0:58:05 Text: relatively short distance, some of them aren't, some are long distance dependencies, but they're
0:58:05 - 0:58:12 Text: relatively rare, the vast majority of dependencies are nearby, and another source of information is the
0:58:12 - 0:58:23 Text: intervening material so there are certain things that dependencies rarely span so clauses and sentences
0:58:23 - 0:58:32 Text: are normally organized around verbs and so dependencies rarely span across intervening verbs.
0:58:33 - 0:58:39 Text: We can also use punctuation and written language things like commas which can give some indication
0:58:39 - 0:58:47 Text: of the structure and so punctuation may also indicate bad places to have long distance dependencies
0:58:47 - 0:58:56 Text: over and there's one final source of information which is what's referred to as valency which is
0:58:56 - 0:59:03 Text: for a head, what kind of dependents does it usually have around it, so if you have a noun
0:59:05 - 0:59:11 Text: there are things that you just know about what kinds of dependence nouns normally have so it's
0:59:11 - 0:59:21 Text: common that it will have a determiner to the left, the cat, on the other hand it's not going to be the
0:59:21 - 0:59:26 Text: case that there's a determiner to the right, cat the, that's just not what you get in English
0:59:28 - 0:59:34 Text: on the left you're also likely to have an adjectival modifier, that's where we had cuddly
0:59:34 - 0:59:42 Text: but again it's not so likely you're going to have the adjective or modifier over on the right
0:59:42 - 0:59:49 Text: for cuddly so there are sort of facts about what things different kinds of words take on the left
0:59:49 - 0:59:55 Text: and the right and so that's the valency of the heads and that's also a useful source of information
0:59:56 - 1:00:04 Text: okay so what do we need to do using that information to build a parser well effectively
1:00:04 - 1:00:10 Text: what we do is have a sentence, I'll give a talk tomorrow on neural networks, and what we have to do
1:00:10 - 1:00:17 Text: is say for every word in that sentence we have to choose some other word that it's a dependent of
1:00:17 - 1:00:25 Text: where one possibility is it's a dependent of root so we're giving it a structure where we're
1:00:25 - 1:00:33 Text: saying okay for this word I've decided that it's a dependent on networks and then for this word
1:00:33 - 1:00:44 Text: it's also a dependent on networks and for this word it's a dependent on give so we're choosing
1:00:45 - 1:00:53 Text: one for each word and there are usually a few constraints so only one word is a dependent of root
1:00:53 - 1:01:00 Text: we have a tree we don't want cycles so we don't want to say that word a is dependent on word b and
1:01:00 - 1:01:11 Text: word b is dependent on word a and then there's one final issue which is whether arrows can cross
1:01:11 - 1:01:18 Text: or not so in this particular sentence we actually have these crossing dependencies you can see there
1:01:18 - 1:01:25 Text: I'll give a talk tomorrow on neural networks and this is the correct dependency parse for this
1:01:25 - 1:01:32 Text: sentence because what we have here is that it's a talk and it's a talk on neural networks so the
1:01:32 - 1:01:39 Text: on neural networks modifies the talk but which leads to these crossing dependencies I didn't have to
1:01:39 - 1:01:46 Text: say it like that I could have said I'll give a talk on neural networks tomorrow and then on neural
1:01:46 - 1:01:55 Text: networks would be next to the talk so most of the time in languages dependencies are projective,
1:01:55 - 1:02:01 Text: the things stay together so the dependencies have a kind of a nesting structure of the kind that
1:02:01 - 1:02:08 Text: you also see in context free grammars but most languages have at least a few phenomena where you
1:02:08 - 1:02:16 Text: end up with this ability for phrases to be split apart which leads to non-projective dependencies
1:02:16 - 1:02:23 Text: so in particular one of them in English is that you can take modifying phrases and clauses like
1:02:23 - 1:02:29 Text: the on neural networks here and shift them right towards the end of the sentence and get I'll give
1:02:29 - 1:02:35 Text: a talk tomorrow on neural networks and that then leads to non-projective sentences
1:02:37 - 1:02:43 Text: so a parse is projective if there are no crossing dependency arcs when the words are laid out
1:02:43 - 1:02:50 Text: in their linear order with all arcs above the words and if you have a dependency parse that
1:02:50 - 1:02:55 Text: corresponds to a context free grammar tree it actually has to be projective because context free
1:02:55 - 1:03:01 Text: grammars necessarily have this sort of nested tree structure following the linear order
1:03:02 - 1:03:08 Text: but dependency grammars normally allow non-projective structures to account for
1:03:08 - 1:03:14 Text: displaced constituents and you can't easily get the semantics of certain
1:03:14 - 1:03:20 Text: constructions right without these non-projective dependencies so here's another example in English
1:03:20 - 1:03:28 Text: with question formation with what's called preposition stranding so the sentence is who did
1:03:28 - 1:03:34 Text: Bill buy the coffee from yesterday, there's another way I could have said this, it's less natural in
1:03:34 - 1:03:45 Text: English but I could have said from who did Bill buy the coffee yesterday, in many languages of the
1:03:45 - 1:03:53 Text: world that's the only way you could have said it and when you do that from who is kept together
1:03:53 - 1:03:59 Text: and you have a projective parse for the sentence but English allows and indeed much prefers
1:03:59 - 1:04:07 Text: you to do what is referred to as preposition stranding where you move the who but you just leave
1:04:07 - 1:04:14 Text: the preposition behind and so you get who did Bill buy the coffee from yesterday and so then
1:04:14 - 1:04:19 Text: we're ending up with this non-projective dependency structure as I've shown there
1:04:21 - 1:04:28 Text: okay I'll come back to non-projectivity in a little bit how do we go about building
1:04:28 - 1:04:36 Text: dependency parsers well there are a whole bunch of ways that you can build dependency parsers
1:04:36 - 1:04:42 Text: very quickly I'll just say a few names and I'll tell you about one of them so you can use dynamic
1:04:42 - 1:04:48 Text: programming methods to build dependency parsers so I showed earlier that you can have an exponential
1:04:48 - 1:04:53 Text: number of parses for a sentence and that sounds like really bad news for building a system
1:04:53 - 1:04:58 Text: well it turns out that you can be clever and you can work out a way to dynamic program finding
1:04:58 - 1:05:05 Text: that exponential number of parses and then you can have an O(n³) algorithm so you could do that
1:05:07 - 1:05:13 Text: you can use graph algorithms and I'll say a bit about that later but that may spill into next time
1:05:14 - 1:05:22 Text: so you can see since we're wanting to kind of connect up all the words into a tree using
1:05:22 - 1:05:27 Text: graph edges that you could think of doing that using a minimum spanning tree algorithm of
1:05:27 - 1:05:34 Text: the sort that you hopefully saw in CS 161 and so that idea has been used for parsing
1:05:34 - 1:05:41 Text: constraint satisfaction ideas that you might have seen in CS 221 have been used for dependency parsing
1:05:43 - 1:05:48 Text: but the way I'm going to show now is transition based parsing or sometimes referred to as
1:05:48 - 1:05:57 Text: deterministic dependency parsing, and the idea of this is one's going to use a transition system
1:05:57 - 1:06:04 Text: so that's like shift reduce parsing if you've seen shift reduce parsing in something like a
1:06:04 - 1:06:11 Text: compilers class or formal languages class, with its shift and reduce transition steps, and so use
1:06:11 - 1:06:20 Text: a transition system to guide the construction of parsers and so let me just explain about that
1:06:21 - 1:06:33 Text: so let's see, so this was an idea that was made prominent by Joakim Nivre, who's a Swedish
1:06:33 - 1:06:43 Text: computational linguist who introduced this idea of greedy transition based parsing so his idea is
1:06:43 - 1:06:50 Text: well what we're going to do for dependency parsing is we're going to be able to parse sentences
1:06:50 - 1:06:57 Text: by having a set of transitions which are kind of like a shift reduce parser's and it's going to just
1:06:57 - 1:07:06 Text: work left to right bottom up and parse a sentence so we're going to say we have a stack sigma
1:07:07 - 1:07:13 Text: a buffer beta of the words that we have to process and we're going to build up a set of dependency
1:07:13 - 1:07:20 Text: arcs by using actions which are shift and reduce actions and putting those together this will give
1:07:20 - 1:07:27 Text: us the ability to put parse structures over sentences and let me go through the details of
1:07:27 - 1:07:34 Text: this, and this is a little bit hairy when you first see it but it's not so complex really, and
1:07:35 - 1:07:44 Text: it's this kind of transition based dependency parser is what we'll use in assignment 3 so what we
1:07:44 - 1:07:51 Text: have so this is our transition system we have a starting point where we start with a stack that
1:07:51 - 1:07:57 Text: just has the root symbol on it and a buffer that has the sentence we're about
1:07:57 - 1:08:07 Text: to parse and so far we haven't built any dependency arcs and so at each point in time we can choose one
1:08:07 - 1:08:19 Text: of three actions we can shift which moves the next word onto the stack we can then do actions
1:08:19 - 1:08:26 Text: that are the reduce actions so there are two reduce actions to make it a dependency grammar we
1:08:26 - 1:08:34 Text: can either do a left arc reduce or a right arc reduce so when we do either of those we take
1:08:34 - 1:08:42 Text: the top two items on the stack and we make one of them a dependent of the other one so we can
1:08:42 - 1:08:50 Text: either say okay let's make wi a dependent of wj or else we can say okay let's make wj a dependent
1:08:50 - 1:09:00 Text: of wi and so the result when we do that is that the one that's the dependent disappears from the stack
1:09:00 - 1:09:07 Text: and so in the stacks over here there's one less item but then we add a dependency arc to our
1:09:07 - 1:09:14 Text: arc set so that we say that we've got either a dependency from j to i or a dependency from i to j
1:09:15 - 1:09:22 Text: and commonly when we do this we actually also specify what grammatical relation connects the two
1:09:22 - 1:09:31 Text: such as subject, object or noun modifier and so we also add here a relation but this is probably
1:09:31 - 1:09:40 Text: still very abstract so let's go through an example so this is how a simple transition based dependency
1:09:40 - 1:09:46 Text: parser what's referred to as an arc standard transition based dependency parser would parse up I ate
1:09:46 - 1:09:52 Text: fish so remember these are the different operations that we can apply so to start off with we
1:09:52 - 1:09:59 Text: have root on the stack and the sentence in the buffer and we have no dependency arcs constructed
1:09:59 - 1:10:05 Text: so we have to choose one of the three actions and when there's only one thing on the stack the only
1:10:05 - 1:10:13 Text: thing we can do is shift so we shift now the stack looks like this so now we have to take another
1:10:13 - 1:10:20 Text: action and at this point we have a choice because we could immediately reduce so you know we could
1:10:20 - 1:10:28 Text: say okay let's just make I a dependent of root and we'd get a stack size of one again but that
1:10:28 - 1:10:36 Text: would be the wrong thing to do because I isn't the head of the sentence so what we should instead do
1:10:36 - 1:10:44 Text: is shift again and get I and ate on the stack and fish still in the buffer well at that point we keep
1:10:44 - 1:10:53 Text: on parsing a bit further and so now what we can do is say well wait a minute now I is a dependent
1:10:53 - 1:11:02 Text: of ate and so we can do a left arc reduce and so I disappears from the stack so here's our new stack
1:11:02 - 1:11:10 Text: but we add to the set of arcs that we've built that I is the subject of ate okay well after that
1:11:11 - 1:11:16 Text: we could reduce again because there's still two things on the stack but that'd be the
1:11:16 - 1:11:24 Text: wrong thing to do the right thing to do next would be to shift fish onto the stack and then at that
1:11:24 - 1:11:35 Text: point we can do a right arc reduce saying that fish is the object of ate and add a new dependency
1:11:35 - 1:11:44 Text: to our dependency set and then we can one more time do a right arc reduce to say that ate is the
1:11:44 - 1:11:51 Text: root of the whole sentence and add in that extra root relation with our pseudo root and at that
1:11:51 - 1:11:58 Text: point we reach the end condition so the end condition is that the buffer is empty and there's one thing,
1:11:58 - 1:12:06 Text: the root, on the stack and at that point we can finish so this little transition machine does the
1:12:06 - 1:12:16 Text: parsing up of the sentence but there's one thing that's left to explain still here which is how do
1:12:16 - 1:12:22 Text: you choose the next action so as soon as you have two things or more on the stack then for what you do next
1:12:23 - 1:12:28 Text: you've always got a choice you could keep shifting at least if there's still things on the buffer
1:12:28 - 1:12:34 Text: or you can do a left arc or you can do a right arc and how do you know what choice is correct
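The arc-standard machine and the "I ate fish" derivation can be sketched concretely in a few lines of Python; this is my own toy illustration (the `parse` function and the action encoding are invented for this sketch, not from any parser library), fed the correct action sequence by hand:

```python
def parse(words, actions):
    stack, buffer = [0], list(range(1, len(words) + 1))  # 0 is the pseudo root
    arcs = set()                                          # (head, dependent, label)
    for act in actions:
        if act == "SHIFT":                                # move next word onto the stack
            stack.append(buffer.pop(0))
        else:
            kind, label = act
            top, second = stack[-1], stack[-2]
            if kind == "LEFT-ARC":                        # second-from-top becomes dependent of top
                arcs.add((top, second, label))
                stack.pop(-2)
            else:                                         # RIGHT-ARC: top becomes dependent of second
                arcs.add((second, top, label))
                stack.pop()
    assert stack == [0] and not buffer                    # end condition: only root left, buffer empty
    return arcs

# The oracle action sequence from the lecture's example
arcs = parse(["I", "ate", "fish"],
             ["SHIFT", "SHIFT", ("LEFT-ARC", "nsubj"), "SHIFT",
              ("RIGHT-ARC", "obj"), ("RIGHT-ARC", "root")])
print(sorted(arcs))  # [(0, 2, 'root'), (2, 1, 'nsubj'), (2, 3, 'obj')]
```

With the oracle sequence supplied, the machine terminates with just the root on the stack, an empty buffer, and exactly the three dependency arcs from the worked example.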
1:12:34 - 1:12:40 Text: and well one answer to that is to say well you don't know what choice is correct and that's why
1:12:40 - 1:12:47 Text: parsing is hard and sentences are ambiguous you can do any of those things you have to explore
1:12:47 - 1:12:54 Text: all of them and well if you naively explore all of them then you do an exponential amount of work
1:12:54 - 1:13:05 Text: to parse the sentence and you know that's essentially
1:13:05 - 1:13:14 Text: what people had done in the 80s and 90s is explore every path but in the early 2000s Joakim Nivre's
1:13:14 - 1:13:22 Text: essential observation was but wait a minute we know about machine learning now so why don't
1:13:22 - 1:13:31 Text: I try and train a classifier which predicts what the next action I should take is given this stack
1:13:31 - 1:13:40 Text: and buffer configuration because if I can write a machine learning classifier which can nearly
1:13:40 - 1:13:49 Text: always correctly predict the next action given a stack and buffer then I'm in a really good position
1:13:49 - 1:13:56 Text: because then I can build what's referred to as a greedy dependency parser which just goes
1:13:56 - 1:14:04 Text: bang bang bang word at a time okay here's the next thing run classifier choose next action run
1:14:04 - 1:14:10 Text: classifier choose next action run classifier choose next action so that the amount of work that
1:14:10 - 1:14:19 Text: we're doing becomes linear in the length of the sentence rather than that being cubic in the length
1:14:19 - 1:14:24 Text: of the sentence using dynamic programming or exponential in the length of the sentence if you
1:14:24 - 1:14:32 Text: don't use dynamic programming so at each step we predict the next action using some
1:14:32 - 1:14:38 Text: discriminative classifier so starting off he was using things like support vector machines
1:14:38 - 1:14:43 Text: but it can be anything at all like a softmax classifier that's closer to our neural networks
1:14:43 - 1:14:50 Text: and there are for what I presented either three classes if you're just thinking of the two
1:14:50 - 1:14:55 Text: reduces and the shift or if you're also assigning a relation and you have a set
1:14:55 - 1:15:03 Text: of R relations like 20 relations then that'd be sort of 41 moves that you could decide on at each
1:15:03 - 1:15:10 Text: point and the features are effectively the configurations I was showing before what's the top of the
1:15:10 - 1:15:15 Text: stack word what part of speech is it what's the first word in the buffer what's that word's part
1:15:15 - 1:15:21 Text: of speech etc and so in the simplest way of doing this you're now doing no search at all you
1:15:21 - 1:15:28 Text: would just sort of take each configuration in turn and decide the most likely next move and you make
1:15:28 - 1:15:35 Text: it and that's a greedy dependency parser which is widely used you can do better if you want to do
1:15:35 - 1:15:42 Text: a lot more work so you can do what's called a beam search where you maintain a number of fairly
1:15:42 - 1:15:49 Text: good parse prefixes at each step and you can extend them out further and then you can evaluate
1:15:49 - 1:15:56 Text: later on which of those seems to be the best and so beam search is one technique to improve dependency
1:15:56 - 1:16:06 Text: parsing by doing a lot of work and it turns out that although these greedy transition based parsers
1:16:07 - 1:16:14 Text: are a fraction worse than the best possible ways known to parse sentences they actually work
1:16:14 - 1:16:23 Text: very accurately almost as well and they have this wonderful advantage that they give you linear time
1:16:23 - 1:16:31 Text: parsing in terms of the length of your sentences and text and so if you want to do a huge amount of
1:16:31 - 1:16:39 Text: parsing they're just a fantastic thing to use because you've then got an algorithm that scales to
1:16:39 - 1:16:48 Text: the size of the web okay so I'm kind of a little bit behind so I guess I'm not going to get through all
1:16:48 - 1:16:54 Text: these slides today and we'll have to finish out the final slides tomorrow but just to push a teeny
1:16:54 - 1:17:03 Text: bit further I'll just say a couple more words on the sort of thing Nivre did for dependency parsing and
1:17:03 - 1:17:09 Text: then I'll sort of introduce the neural form of that in the next class so conventionally you had this
1:17:09 - 1:17:15 Text: sort of stack and buffer configuration and you wanted to build a machine learning classifier
1:17:15 - 1:17:24 Text: and so the way that was done was by using symbolic features of this configuration and what kind of
1:17:25 - 1:17:33 Text: symbolic features did you use? You used these indicator features that picked out a small subset normally
1:17:33 - 1:17:39 Text: one to three elements of the configuration so you'd have a feature that could be something like
1:17:40 - 1:17:45 Text: the thing on the top of the stack is the word good which is an adjective or it could be
1:17:45 - 1:17:50 Text: the thing on the top of the stack is an adjective and the thing that's first in the buffer is a
1:17:50 - 1:17:56 Text: noun or it could just be looking at one thing and saying the first thing in the buffer is a verb
1:17:57 - 1:18:04 Text: so you'd have all of these features and because these features commonly involved words and commonly
1:18:04 - 1:18:12 Text: involved conjunctions of several conditions you had a lot of features and you know having
1:18:12 - 1:18:19 Text: mentions of words and conjunctions of conditions definitely helped to make these parsers work better
1:18:20 - 1:18:26 Text: but nevertheless because you had all of these sort of one-zero symbolic features you had a
1:18:26 - 1:18:34 Text: ton of such features so commonly these parsers were built using something like you know a million
1:18:34 - 1:18:41 Text: to ten million different features of sentences and I mentioned already the importance of evaluation
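As a rough sketch of what those symbolic indicator features looked like (the feature-name syntax and the `indicator_features` helper are my own illustration of the templates just described, not from any specific parser):

```python
def indicator_features(config):
    # config holds (word, part-of-speech) pairs for the stack and buffer
    s0_word, s0_pos = config["stack"][-1]      # top of stack
    b0_word, b0_pos = config["buffer"][0]      # first thing in the buffer
    return [
        f"s0.word={s0_word}&s0.pos={s0_pos}",  # e.g. top of stack is the word "good", an adjective
        f"s0.pos={s0_pos}&b0.pos={b0_pos}",    # conjunction of two conditions
        f"b0.pos={b0_pos}",                    # a single condition
    ]

feats = indicator_features({
    "stack": [("ROOT", "ROOT"), ("good", "JJ")],
    "buffer": [("movie", "NN")],
})
print(feats)
# ['s0.word=good&s0.pos=JJ', 's0.pos=JJ&b0.pos=NN', 'b0.pos=NN']
```

Each such string names one dimension of a huge sparse 0/1 feature vector, and because the templates mention specific words and conjunctions of conditions, the total feature count blows up into the millions.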
1:18:41 - 1:18:50 Text: let me just sort of quickly say how these parsers were evaluated so to evaluate
1:18:51 - 1:18:59 Text: a parser for a particular sentence our test set was hand-parsed in the tree banks so we
1:18:59 - 1:19:05 Text: have gold dependencies what the human thought were right and so we can write those
1:19:05 - 1:19:13 Text: dependencies down as statements saying the first word is the dependent of the second word
1:19:13 - 1:19:21 Text: via a subject dependency and then the parser is also going to make similar claims as to what's
1:19:21 - 1:19:28 Text: a dependent on what and so there are two common metrics that are used one is just are you getting
1:19:28 - 1:19:35 Text: these dependency facts right so both of these dependency facts match and so that's referred to
1:19:35 - 1:19:42 Text: as the unlabeled accuracy score where we're just sort of measuring accuracy which is of all
1:19:42 - 1:19:49 Text: of the dependencies in the gold sentence and remember we have one dependency per word in the
1:19:49 - 1:19:55 Text: sentence so here we have five how many of them are correct and that's our unlabeled accuracy
1:19:55 - 1:20:03 Text: score of 80 percent but a slightly more rigorous evaluation is to say well no we're also going
1:20:03 - 1:20:09 Text: to label them and we're going to say that this is the subject that's actually called the root
1:20:09 - 1:20:19 Text: this one's the object so these dependencies have labels and you also need to get the grammatical
1:20:19 - 1:20:26 Text: relation label right and so that's then referred to as the labeled accuracy score and although I got
1:20:26 - 1:20:34 Text: those two right hmm I guess according to this example actually this one is wrong and it looks
1:20:34 - 1:20:41 Text: like oh no that one's wrong there sorry that one's wrong there okay so I only got two of the
1:20:42 - 1:20:48 Text: dependencies correct in the sense that I both got what depends on what and the label correct
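The two metrics amount to a few lines of Python; the toy gold and predicted trees below are my own invention, arranged only to reproduce the 80% unlabeled / 40% labeled scores of the example being discussed:

```python
def uas_las(gold, pred):
    # gold/pred map word index -> (head index, relation label); one dependency per word
    n = len(gold)
    uas = sum(pred[w][0] == gold[w][0] for w in gold) / n  # head right
    las = sum(pred[w] == gold[w] for w in gold) / n        # head AND label right
    return uas, las

gold = {1: (2, "nsubj"), 2: (0, "root"), 3: (5, "det"),
        4: (5, "amod"), 5: (2, "obj")}
pred = {1: (2, "nsubj"), 2: (0, "root"),
        3: (5, "amod"),                    # head right, label wrong
        4: (5, "dep"),                     # head right, label wrong
        5: (3, "obj")}                     # head wrong

print(uas_las(gold, pred))  # (0.8, 0.4)
```

Four of the five heads are right, so the unlabeled accuracy score is 80 percent, but only two dependencies have both the head and the label right, giving a labeled accuracy score of 40 percent.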
1:20:48 - 1:20:55 Text: and so my labeled accuracy score is only 40 percent okay so I'll stop there now for the
1:20:55 - 1:21:03 Text: introduction for dependency parsing and I still have an IOU which is how we can then bring
1:21:03 - 1:21:09 Text: neural nets into this picture and how they can be used to improve dependency parsing so I'll
1:21:09 - 1:21:26 Text: do that at the start of next time before then proceeding further into neural language models