After researching several approaches to topic modeling, we decided that Latent Dirichlet Allocation, or LDA, was the best algorithm to implement for our purposes. LDA is unsupervised: the topics are latent, meaning they are inferred from the text rather than supplied in advance. The user can still influence how many distinct words carry weight within a topic, and how many distinct topics carry weight within a document, by adjusting a set of hyperparameters, the Dirichlet priors.
The output of the algorithm is groups of words, called "topics," that are presumed to be semantically linked in the text.
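To make the role of the Dirichlet priors more concrete, here is a minimal sketch of how a symmetric prior value shapes the distributions drawn from it. It is not LDA itself and not Loquela's code; the names `sample_dirichlet` and `mean_max_weight` are hypothetical helpers written for illustration, and only the Python standard library is assumed. A small prior value tends to put most of the weight on a few entries (a document dominated by a few topics, or a topic dominated by a few words), while a large value spreads the weight more evenly.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) with `size` components.

    A Dirichlet draw can be built by normalizing independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_max_weight(alpha, size=10, trials=2000, seed=0):
    """Average weight of the largest component over many draws.

    Values near 1.0 mean one component dominates (sparse mixtures);
    values near 1/size mean the weight is spread evenly.
    """
    rng = random.Random(seed)
    total = sum(max(sample_dirichlet(alpha, size, rng)) for _ in range(trials))
    return total / trials


# Illustrative prior values only; these are assumptions, not Loquela defaults.
sparse = mean_max_weight(alpha=0.1)   # small prior: few dominant components
smooth = mean_max_weight(alpha=10.0)  # large prior: weight spread evenly

print(f"alpha=0.1  -> mean largest weight {sparse:.2f}")
print(f"alpha=10.0 -> mean largest weight {smooth:.2f}")
```

Reading the two printed numbers side by side gives a rough feel for what adjusting a prior does: the same ten components, viewed as topics within a document or as words within a topic, go from being dominated by one or two entries to being nearly uniform.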
The Loquela application allows users to explore the results of their generated topic model. In addition to basic metadata, Loquela includes three visualization tools: word clouds, heat maps, and an annotated text.
These tools work together to place the topic model back into the context of the original corpus, allowing comparison of both words within a topic and topics within a document. With Loquela, the user can easily interact with and inspect the data yielded by LDA.
Youngest, most mathematical, most enamored of colorful chalk.
Computer Science/English Major. Enjoys digital humanities projects and short-ish walks on the beach.
Filmmaker, fanatic of text and audio generators, music collector. Enjoys photography and digital manipulation.
Latinist, Musician, Mead-drinker
Medieval and Renaissance Studies minor
air-speed velocity (unladen): unknown
CS/Music/French. Likes Python, piano, pooches, parakeets, interrupting alliteration, participles, parmesan.