The Project

Topic Modeling of Latin Text

Topic modeling is a statistical method for identifying abstract topics that run through a set of documents by grouping together words that tend to occur together in those documents. For our Computer Science Comps, several members of the Carleton College History Department asked us to create a topic modeling algorithm that would work on Latin texts. The result was Loquela.

Loquela has two parts: the algorithm and the application. The algorithm runs topic modeling on any Latin or English corpus and gives the user a great deal of freedom to build a model with parameters appropriate to that corpus. The application lets the user explore the results of the algorithm in the context of the corpus through several data visualization tools.


The Algorithm

Latent Dirichlet Allocation

After researching several different approaches to topic modeling, we decided that Latent Dirichlet Allocation, or LDA, was the best algorithm to implement for our purposes. LDA is unsupervised: the topics are latent, meaning they are inferred from the text rather than supplied as labels. The user can still influence how many distinct words dominate a topic, and how many distinct topics appear in a document, through a set of hyperparameters known as Dirichlet priors.
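To see how a Dirichlet prior shapes these distributions, here is a minimal sketch in Python. It is not taken from Loquela; the function names `sample_dirichlet` and `mean_largest_share` are made up for illustration. It draws topic proportions from a symmetric Dirichlet and reports how large the biggest share tends to be. A small prior value concentrates most of the weight on one or two topics, while a large one spreads the weight more evenly.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw one probability vector from a symmetric Dirichlet(alpha)."""
    # A Dirichlet draw is a set of independent Gamma(alpha, 1) draws,
    # divided by their sum so that the result adds up to 1.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_share(alpha, size=5, trials=500, seed=0):
    """Average size of the largest component over many draws."""
    rng = random.Random(seed)
    largest = [max(sample_dirichlet(alpha, size, rng)) for _ in range(trials)]
    return sum(largest) / trials


if __name__ == "__main__":
    # Illustrative prior values only; these are not Loquela's defaults.
    for alpha in (0.1, 1.0, 10.0):
        print(alpha, round(mean_largest_share(alpha), 2))
```

The same reasoning applies to the prior over words within a topic: a smaller value favors topics dominated by a few words.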

The algorithm outputs groups of words, called "topics," that are presumed to be semantically linked in the text.
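As a rough illustration of how such word groups can be produced, the sketch below fits LDA with collapsed Gibbs sampling, one common inference method. This page does not say which inference method Loquela uses, so treat this as a stand-in rather than the project's implementation. The function names `fit_lda` and `top_words`, the default parameter values, and the tiny Latin word lists are all assumptions made for the example, and no preprocessing such as lemmatization is shown.

```python
import random
from collections import defaultdict


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]       # tokens per (doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics                     # tokens per topic
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        topics = []
        for word in doc:
            k = rng.randrange(num_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(topics)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Weight each topic by how common it is in this document
                # and how often it has produced this word elsewhere.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    result = []
    for counts in topic_word:
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        result.append([word for word, count in ranked[:n] if count > 0])
    return result


if __name__ == "__main__":
    # Made-up toy corpus, already tokenized.
    docs = [
        ["bellum", "miles", "gladius", "bellum"],
        ["amor", "rosa", "puella", "amor"],
        ["miles", "gladius", "bellum"],
        ["rosa", "amor", "puella"],
    ]
    topic_word, doc_topic = fit_lda(docs, num_topics=2)
    print(top_words(topic_word))
```

Here `alpha` and `beta` play the role of the Dirichlet priors over topics per document and words per topic. On a corpus this small the resulting groups depend heavily on the random seed, so the printed topics should not be read as meaningful.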


The App

Visualizing Topic Models

The Loquela application allows users to explore the results of their generated topic model. In addition to basic metadata, Loquela includes three visualization tools: word clouds, heat maps, and an annotated text.

These tools work together to place the topic model back into the context of the original corpus, allowing comparison both of words within a topic and of topics within a document. With Loquela, the user can easily interact with and inspect the data yielded by LDA.


The Team

digital humanists in action

Estelle Bayer

Youngest, most mathematical, most enamored of colorful chalk.

Martha Durrett

Computer Science/English Major. Enjoys digital humanities projects and short-ish walks on the beach.

Brendan Friesen

Filmmaker, fanatic of text and audio generators, music collector. Enjoys photography and digital manipulation.

Adam Klein

Tabletop gaming enthusiast and overall nerd. Enjoys coding in JavaScript for some reason.

Bard Swallow

Latinist, Musician, Mead-drinker
Medieval and Renaissance Studies minor
air-speed velocity (unladen): unknown

Sam Wiseman

CS/Music/French. Likes Python, piano, pooches, parakeets, interrupting alliteration, participles, parmesan.

The Code

Get Started