Statistical NLP Book

First, an explanation: a few weeks ago I had a revelation that computational linguistics is the field for me. So I found the best book I could on the subject, which seems to be Manning and Schuetze's Foundations of Statistical Natural Language Processing (FSNLP). Just reading the introduction on Google Books was enough to convince me to drop some hard cash. The writing is brilliant. I'm only through Chapter One, but it's already worth it.

Manning and Schuetze - Foundations of Statistical Natural Language Processing

I did all the exercises for the first chapter. My solutions:

Because this blog needs more pictures, here's a plot of word frequency versus rank (in terms of frequency) for randomly generated text using a three-character alphabet:

Zipf's Law - log-log frequency-vs-rank plot for random text with a three-character alphabet

This log-log frequency-vs-rank plot shows an approximately straight line, which means frequency falls off roughly as a power of rank. That is consistent with Zipf's law, i.e., that frequency is (roughly) inversely proportional to rank.
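
For anyone who wants to try the random-text experiment themselves, here is a minimal Python sketch. It is not the script behind the plot above, just one way to set it up. It assumes that a space is added to the three-letter alphabet as a fourth, equally likely symbol and used as the word separator; the function names (`random_text`, `rank_frequency`, `loglog_slope`) and the text length are made up for illustration.

```python
import math
import random
from collections import Counter


def random_text(n_chars, alphabet="abc", seed=0):
    """Draw n_chars symbols uniformly from the alphabet plus a space.

    Assumption: the space acts as the word separator and is as likely
    as any letter.
    """
    rng = random.Random(seed)
    symbols = alphabet + " "
    return "".join(rng.choice(symbols) for _ in range(n_chars))


def rank_frequency(text):
    """Return word counts sorted from most to least frequent.

    The count at index i belongs to the word of rank i + 1.
    """
    counts = Counter(text.split())
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


freqs = rank_frequency(random_text(200_000))
slope = loglog_slope(freqs)
print(f"{len(freqs)} distinct words, log-log slope = {slope:.2f}")
```

A negative slope with points that hug the fitted line is what "approximately straight" means here. Plotting `freqs` against `range(1, len(freqs) + 1)` on log-log axes with any plotting library should give a picture similar to the one above, though the exact slope depends on the alphabet size and on how the word separator is chosen.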

Hopefully I'll have the stamina to do the same for Chapters Two+...


Originally published on Quasiphysics.