CS 322: Natural Language Processing

Document Classification Redux

You have tinkered your document classification system into shape using your original pair of authors. Now I want you to try your system on a new pair of authors: Louisa May Alcott and P.G. Wodehouse. (I chose these two because they are both English-language writers with lots of work on gutenberg.org, and they are separated by genre, gender, an ocean, and nearly a century.)

The key to this exercise is to leave your system entirely unchanged (to the extent possible) and to see whether the system itself works well for new authors, or whether you just tinkered it into working nicely for your original pair of authors.

Send me a summary, via e-mail, containing the following information:

Be honest about this. As before, I'm not grading on accuracy. Instead, I'm grading on your selection of techniques, the quality of your reporting, your insights into the success or failure of your methods, your understanding of what you're doing, and so on. This follow-up assignment will be worth about a quarter of the points of the original assignment.

Good luck.