Statistical Analysis of Linguistic Data
This is a special topics course on linguistic data. More and more data
these days have linguistic content--so this class will investigate what
it takes to drop such linguistic data into a statistical model.
The main readings will be:
I've posted the homework from previous years (.pdf, .Rnw, and .html for a quick
view). I'll update this file as the semester goes along. So
don't print it out! Just keep checking the web.
- Homework 1 is due Sept 24th.
- Sept 10: Regular expressions (.pdf)
- Sept 12: N-grams (.pdf)
- N-grams (Chapter 4 of JM)
- Sept 13 at noon: Justin Rising and Josh Magarick are running a session called
"Python for Statisticians". Lunch will be served!
- Oct 1: Backoff and information theory.
- Oct 3:
- Oct 29: No class: rain day
- Oct 31: Streaming methods
- Nov 5: The power of large blocks
- Nov 7: Parsing
- read chapters 13 and 14 of JM.
- Nov 12: Statistical parsing
- Nov 14: notes
- Nov 26: Machine translation (chapter 25)
- Nov 28: CCA
- slides for today's lecture
- paper with
- CCA goes back to the 1930s, so there should be plenty of
web material to look over. I won't put it up. But if you find
something nice, email it to me and I'll post it.
- Dec 3: Disambiguation
- Dec 5: Hadamard transformations?
Last modified: Wed Dec 5 14:23:59 EST 2012