Latent Dirichlet Allocation

Assign a new topic to a word based on the existing topic assignments in the text

A corpus can be stored as a nested array: an outer array of documents, each holding its words in the order in which they occur in the text.
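The nested-array layout can be sketched as follows. This is a minimal illustration, not a prescribed format: the toy documents, the name NUM_TOPICS, and the random starting assignments are all assumptions made for the example.

```python
import random

# Toy corpus (hypothetical data): one inner list per document,
# with words kept in the order they occur in the text.
corpus = [
    ["apple", "banana", "apple", "cherry"],
    ["goal", "team", "goal", "match", "team"],
]

NUM_TOPICS = 2  # assumed number of topics for this sketch

# Assumption: every word starts with a randomly chosen topic.
# assignments[d][i] is the topic currently held by corpus[d][i].
rng = random.Random(0)
assignments = [[rng.randrange(NUM_TOPICS) for _ in doc] for doc in corpus]
```

Keeping the assignments in a second nested array of the same shape means a word and its topic share the same pair of indices.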

Topic De-Assignment

The topic assignment is removed from one word at a time. The probability that the word belongs to each topic is then calculated from the assignments that remain. Here, the probability of assignment is represented as an area on a 2D graph.
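The text does not give the formula used for this calculation. The sketch below assumes the standard collapsed Gibbs sampling update, in which the weight of a topic is the product of a document factor and a word factor. The count structures (doc_topic, topic_word, topic_total), the function names, and the smoothing values alpha and beta are hypothetical choices for illustration.

```python
from collections import Counter

def deassign(d, word, topic, doc_topic, topic_word, topic_total):
    """Remove one word's topic assignment from the running counts."""
    doc_topic[d][topic] -= 1
    topic_word[topic][word] -= 1
    topic_total[topic] -= 1

def topic_weights(d, word, doc_topic, topic_word, topic_total,
                  vocab_size, alpha=0.1, beta=0.01):
    """Unnormalized probability of each topic for a de-assigned word."""
    weights = []
    for k in range(len(topic_total)):
        # How common topic k is in document d (smoothed by alpha).
        doc_factor = doc_topic[d][k] + alpha
        # How often this word carries topic k across the corpus (smoothed by beta).
        word_factor = (topic_word[k][word] + beta) / (topic_total[k] + vocab_size * beta)
        weights.append(doc_factor * word_factor)
    return weights

# Hand-made counts: one document with three words, two topics.
doc_topic = [[2, 1]]
topic_word = [Counter({"apple": 2}), Counter({"banana": 1})]
topic_total = [2, 1]

# De-assign one "apple" from topic 0, then score both topics for it.
deassign(0, "apple", 0, doc_topic, topic_word, topic_total)
weights = topic_weights(0, "apple", doc_topic, topic_word, topic_total, vocab_size=2)
```

The weights are not normalized. Dividing each one by their sum gives proper probabilities, although sampling only needs the relative sizes.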

Probabilistic Selection

Once the probability of assignment to every topic has been calculated, a new topic is randomly selected, with each topic's chance of selection proportional to its probability.
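One common way to make this weighted random choice is to pick a point along the cumulative sum of the weights. The function name sample_topic and its parameters are hypothetical; the sketch assumes the weights are non-negative and not all zero.

```python
import random

def sample_topic(weights, rng=random):
    """Pick index k with probability weights[k] / sum(weights)."""
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for k, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return k
    # Guard against floating-point round-off on the last step.
    return len(weights) - 1

# Example: topic 1 is three times as likely as topic 0.
new_topic = sample_topic([1.0, 3.0], random.Random(0))
```

The standard library's random.choices(population, weights=...) performs the same kind of draw and could be used instead.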

Topic Updating

The word is then assigned the newly selected topic and re-integrated into the corpus. The variables that track topic assignments are updated to match, and the next word in the corpus is de-assigned in turn.
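Putting the three steps together, one full pass over the corpus might look like the sketch below. It assumes the standard collapsed Gibbs sampling weights and hypothetical names throughout (gibbs_sweep, doc_topic, topic_word, topic_total, alpha, beta). For brevity it rebuilds the tracking counts at the start of each pass, whereas a practical implementation would keep them between passes.

```python
import random
from collections import Counter

def gibbs_sweep(corpus, assignments, num_topics, alpha=0.1, beta=0.01, rng=random):
    """Resample the topic of every word once, in corpus order (in place)."""
    vocab_size = len({word for doc in corpus for word in doc})

    # Tracking variables, built from the current assignments.
    doc_topic = [[0] * num_topics for _ in corpus]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(corpus):
        for i, word in enumerate(doc):
            k = assignments[d][i]
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1

    for d, doc in enumerate(corpus):
        for i, word in enumerate(doc):
            # 1. De-assign: take this word out of the counts.
            old = assignments[d][i]
            doc_topic[d][old] -= 1
            topic_word[old][word] -= 1
            topic_total[old] -= 1

            # 2. Score every topic from the remaining assignments.
            weights = [
                (doc_topic[d][k] + alpha)
                * (topic_word[k][word] + beta)
                / (topic_total[k] + vocab_size * beta)
                for k in range(num_topics)
            ]

            # 3. Select a topic at random, in proportion to the weights.
            new = rng.choices(range(num_topics), weights=weights)[0]

            # 4. Re-assign and update the tracking variables.
            assignments[d][i] = new
            doc_topic[d][new] += 1
            topic_word[new][word] += 1
            topic_total[new] += 1

    return doc_topic, topic_word, topic_total

# Toy data (hypothetical) with arbitrary starting topics.
corpus = [["apple", "banana", "apple"], ["goal", "team", "goal", "match"]]
assignments = [[0, 1, 0], [1, 0, 1, 0]]
doc_topic, topic_word, topic_total = gibbs_sweep(
    corpus, assignments, num_topics=2, rng=random.Random(0)
)
```

Repeating the pass many times lets the assignments settle toward topics in which words that co-occur in documents tend to share a label.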