2011

December 19, 2011
in linguistics
2 min read

The Genetic and Lexical Stacks

Many complex phenomena may be decomposed using a stack. For example, one might decompose contemporary scientific theory into a stack as follows: physics -- chemistry -- biology -- psychology -- sociology.

December 18, 2011
in math
2 min read

Extending the Knuth Operator

I just learned about the Knuth up-arrow notation yesterday. Basically, Knuth's up-arrow is the answer to the question "What comes next in the sequence \((+, \times, \wedge)\)?" You could call it iterated exponentiation. Later operators in the sequence are called "higher-order", and may be defined in terms of the previous order function.
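
As a minimal sketch of how each higher-order operator can be defined in terms of the previous one, here is the standard recursive definition of the up-arrow written in Python. The function name `up_arrow` and its argument order are illustrative assumptions, not something taken from the post; `n` is the number of arrows, so `n = 1` is ordinary exponentiation.

```python
def up_arrow(a: int, n: int, b: int) -> int:
    """Compute a (up-arrow^n) b for integers n >= 1 and b >= 0.

    Hypothetical helper for illustration only; values grow extremely
    fast, so it is practical only for very small arguments.
    """
    if n == 1:
        # One arrow is plain exponentiation.
        return a ** b
    if b == 0:
        # Base case of the recursion for two or more arrows.
        return 1
    # n arrows applied b times reduce to (n - 1) arrows applied to
    # the result of n arrows applied (b - 1) times.
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))


print(up_arrow(2, 1, 3))  # 2^3 = 8
print(up_arrow(2, 2, 3))  # 2^(2^2) = 16
print(up_arrow(3, 2, 2))  # 3^3 = 27
```

With two arrows the operator is a right-associated tower of exponents, which is why `up_arrow(2, 2, 3)` evaluates to 2^(2^2) rather than (2^2)^2; for base 2 and height 3 the two happen to coincide at 16, but they differ for larger towers.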

December 13, 2011
in ml
8 min read

Visualizing KNN Regression

K-nearest neighbor (KNN) regression is a popular machine learning algorithm. However, without visualization, one might not be aware of some quirks that are often present in the regression. Below I give a visualization of KNN regression which shows this quirkiness.
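
For readers who have not seen the algorithm, here is a rough sketch of one common form of KNN regression: the prediction at a query point is the unweighted mean of the targets of the k closest training points. The function name `knn_regress`, the one-dimensional inputs, and the toy data are all assumptions made for illustration; this is not the code behind the visualization.

```python
def knn_regress(train_x, train_y, query, k=3):
    """Predict y at `query` as the mean target of the k nearest training points.

    Hypothetical helper: 1-D inputs, absolute distance, unweighted mean.
    """
    # Rank training indices by their distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    # Average the targets of the k closest points.
    return sum(train_y[i] for i in nearest) / len(nearest)


# Toy data (assumed): y = x^2 sampled at five points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]

print(knn_regress(xs, ys, 2.1, k=1))  # nearest point is x = 2.0, so 4.0
print(knn_regress(xs, ys, 2.1, k=3))  # mean of the targets at x = 1, 2, 3
```

In this unweighted form the prediction changes only when the set of k nearest neighbors changes, so sweeping `query` across a range and plotting the result is one simple way to inspect the shape of the fitted curve.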

December 9, 2011
in linguistics
2 min read

A Theory of Language Evolution

My interests in evolutionary algorithms on one hand and language on the other have led me to ponder the evolution of language.

November 28, 2011
in linguistics
2 min read

Optimizing HSK Study with MaxRank

Short Version

I've put together a PDF containing the revised HSK vocab for levels 1--6, sorted so as to maximize the word learning rate. The list was sourced from Lingomi, with sorting applied using the MaxRank method. I have found this particular presentation of the list to be especially useful, so I put it here in hopes that others can also benefit.

November 22, 2011
in linguistics
3 min read

Thematic Chinese Vocabulary

Learning vocabulary in thematic groups is an effective way to learn. However, as is often the case, it is challenging to find good learning materials. For thematic vocabulary, we want sources which simultaneously do the following:

contain a sufficient quantity of vocabulary in the desired fields (i.e., have both breadth and depth)
organize words and phrases by theme (i.e., are thematic)
give some example usages (i.e., provide context)

Specifically for Chinese, I've found two excellent resources thus far.

November 16, 2011
in math, linguistics
2 min read

Sinograms in Mathematics

While taking notes for ai-class, I found myself in a conundrum that any amateur mathematician can relate to: I ran out of appropriate letters in the Roman and Greek alphabets.

November 11, 2011
in linguistics
1 min read

A Very Brief History of English

The history of English fascinates me. Here follows my very brief but hopefully reasonably factual account.

August 17, 2011
in nlp
3 min read

Clustering Jane Austen

Motivation

I'm curious about unsupervised word sense disambiguation, and unsupervised machine learning in general. For that, Manning and Schuetze tell us we need clustering. I jumped ahead to Chapter 14 to experiment with clustering algorithms.

July 20, 2011
in nlp
3 min read

Automated Annotation Tool

The other day I picked up the Chinese copy of Alice in Wonderland that I got in Beijing last year. My intention was to lie in the sun by the lake until I had finished the first page, using the dictionary as needed to achieve basic comprehension. The result was a bad sunburn and only two of four paragraphs finished. What went wrong?