Problem 1: Bigrams (3pts)
(a) Compute the bigram count table, C(w2|w1), for the following two sentences: "I saw Susie sitting in a shoe shine shop. Where Susie sits Susie shines, and where Susie shines Susie sits." Put w1 in the left-hand column and w2 in the top row. Include punctuation, clitics, and sentence start and end markers as individual tokens, and index words using their lemmatized forms. (1pt)
(b) Compute the bigram probability table, P(w2|w1), for the sentences above, assuming the following overall unigram counts: C(I)=2, C(see)=25, C(Susie)=10, C(sit)=20, C(in)=80, C(a)=90, C(shoe)=50, C(shine)=20, C(shop)=20, C(Where)=60, C(,)=80, C(and)=100. Assume there are 75 sentences in the corpus, and they all end with a period. (1pt)
(c) Compute the probability and perplexity of the first sentence in (a) using the bigram approximation. (1pt)
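As a rough illustration of the mechanics in Problem 1, the sketch below counts bigrams, forms maximum-likelihood estimates P(w2|w1) = C(w1 w2) / C(w1), and computes a sentence probability and perplexity. It deliberately uses a made-up toy sentence rather than the assignment text. The function names, the toy data, and the choice to take unigram counts from the toy sentence itself are all assumptions for illustration; in the assignment the unigram counts are given separately. It also assumes perplexity is normalized by the number of predicted tokens (every token after the start marker), which is one common convention.

```python
import math
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent token pairs (w1, w2)."""
    return Counter(zip(tokens, tokens[1:]))


def bigram_prob(w1, w2, bigrams, unigrams):
    """Maximum-likelihood estimate P(w2 | w1) = C(w1 w2) / C(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]


def sentence_prob_and_perplexity(tokens, bigrams, unigrams):
    """Return (probability, perplexity) under the bigram approximation.

    Assumption: N is the number of predicted tokens, i.e. every token
    except the initial start marker.
    """
    n = len(tokens) - 1
    log_prob = 0.0
    for w1, w2 in zip(tokens, tokens[1:]):
        p = bigram_prob(w1, w2, bigrams, unigrams)
        if p == 0.0:
            # An unseen bigram gives probability 0 and infinite perplexity.
            return 0.0, math.inf
        log_prob += math.log(p)
    return math.exp(log_prob), math.exp(-log_prob / n)


# Hypothetical toy sentence, already tokenized with start/end markers.
toy = ["<s>", "the", "cat", "saw", "the", "dog", "</s>"]
bigrams = bigram_counts(toy)
unigrams = Counter(toy)  # assumption: unigram counts come from the toy data

prob, ppl = sentence_prob_and_perplexity(toy, bigrams, unigrams)
print(f"probability = {prob:.4f}, perplexity = {ppl:.4f}")
```

Working in log space avoids underflow when many small probabilities are multiplied; for a single short sentence a direct product would work equally well.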
Problem 2: Smoothing (3pts)
(a) Smooth the count table you calculated in 1(a) using Laplace smoothing, and recalculate the probability table as well. Assume V=30. (2pts)
(b) Recalculate the probability and perplexity of the first sentence in 1(a) using the smoothed table. (1pt)
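The add-one computation in Problem 2 can be sketched as follows, again on a made-up toy sentence rather than the assignment data. The helper names and the toy vocabulary size are assumptions for illustration. The smoothed probability is (C(w1 w2) + 1) / (C(w1) + V); the second helper shows the reconstituted count c* = (C(w1 w2) + 1) * C(w1) / (C(w1) + V), which some textbooks report alongside the smoothed probabilities.

```python
from collections import Counter


def laplace_prob(w1, w2, bigrams, unigrams, vocab_size):
    """Add-one estimate P(w2 | w1) = (C(w1 w2) + 1) / (C(w1) + V)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)


def reconstituted_count(w1, w2, bigrams, unigrams, vocab_size):
    """Smoothed count c* = (C(w1 w2) + 1) * C(w1) / (C(w1) + V)."""
    c1 = unigrams[w1]
    return (bigrams[(w1, w2)] + 1) * c1 / (c1 + vocab_size)


# Hypothetical toy data (not the assignment sentences).
toy = ["<s>", "the", "cat", "saw", "the", "dog", "</s>"]
bigrams = Counter(zip(toy, toy[1:]))
unigrams = Counter(toy)
V = len(unigrams)  # assumption: V = number of distinct tokens in the toy data (6)

seen = laplace_prob("the", "cat", bigrams, unigrams, V)    # (1 + 1) / (2 + 6)
unseen = laplace_prob("the", "saw", bigrams, unigrams, V)  # (0 + 1) / (2 + 6)
print(seen, unseen, reconstituted_count("the", "cat", bigrams, unigrams, V))
```

Note that the unseen bigram now receives nonzero probability, while the seen bigram drops from its unsmoothed value of 0.5 to 0.25; with a small corpus and a comparatively large V, add-one smoothing shifts a lot of mass to unseen events.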
Problem 3: POS Tagging (2pts). Use the Penn Treebank tags to tag each word in the following sentences. Remember to tag punctuation.
1. There is a stubbornness about me that never can bear to be frightened at the will of others. My courage always rises at every attempt to intimidate me. 
2. Conventionality is not morality. 
3. We must have ideals and try to live up to them, even if we never quite succeed. Life would be a sorry business without them. With them it's grand and great. 
4. If neurotic is wanting two mutually exclusive things at one and the same time, then I'm neurotic as hell. I'll be flying back and forth between one mutually exclusive thing and another for the rest of my days. 
Problem 4: Brill Tagging (2pts).
(a) Consider the following sentence and two different taggings:
Most likely tags: John/NNP made/VBD up/IN the/DT story/NN ./.
Correct tags: John/NNP made/VBD up/RP the/DT story/NN ./.
Given this data, instantiate each of Brill’s templates that refer to the “before” or “preceding” context (Figure 5.20 in the book) to generate 6 transformations; do not instantiate the templates that use “following” or “after”. (1pt)
(b) Considering examples 5.4-5.6 in the book (reproduced below), which of these transformations do you think will be most effective on a large corpus, and why? (1pt)
· Mrs/NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG
· All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN
· Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD
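To make the idea of an instantiated template concrete, here is a minimal sketch of applying one transformation of the form "change tag a to tag b when the preceding word is tagged z". The function name and the example sentence are assumptions for illustration, and the rule shown (NN to VB after TO) is a standard textbook example, not one of the six transformations asked for in part (a). The sketch tests each condition against the original tags rather than tags already changed earlier in the same pass; this is one possible design choice, not necessarily the one Brill's tagger uses.

```python
def apply_prev_tag_rule(tagged, from_tag, to_tag, prev_tag):
    """Change from_tag to to_tag when the preceding word is tagged prev_tag.

    `tagged` is a list of (word, tag) pairs. Conditions are checked against
    the input tags, so a change made at position i does not affect the test
    at position i + 1 (an assumption of this sketch).
    """
    out = list(tagged)
    for i in range(1, len(tagged)):
        word, tag = tagged[i]
        if tag == from_tag and tagged[i - 1][1] == prev_tag:
            out[i] = (word, to_tag)
    return out


# Hypothetical initial tagging in which "race" received its most likely tag, NN.
sentence = [("expected", "VBN"), ("to", "TO"), ("race", "NN"), ("tomorrow", "NN")]
fixed = apply_prev_tag_rule(sentence, "NN", "VB", "TO")
print(" ".join(f"{w}/{t}" for w, t in fixed))
```

Only "race" is retagged here: "tomorrow" is also NN, but its preceding tag in the input is NN, not TO, so the rule does not fire.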