Visual Correlation for Exploring Paradigmatic Language Change

Goals and approach

Paradigmatic language change occurs when paradigmatically related words with similar usage rise or fall together. Such change is the rule rather than the exception: words rarely increase or decrease in frequency in isolation, but together with similar words. In the short term, this is usually due to thematic change; in the longer term, grammatical preferences change as well.

This visualization supports the exploration of paradigmatic change by correlating the two main factors involved: frequency change and the distributional semantics of words.

Frequency change is visualized by color, ranging from violet for decreasing frequency to red for increasing frequency. To this end, we fit logistic growth curves to the word frequencies observed in fixed intervals (e.g. years or decades) and map the slope of each growth curve to the color range, so that words with similar slopes receive similar colors.
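The mapping from slope to color can be sketched as follows. This is only an illustration under stated assumptions, not the tool's actual code: it reads "logistic growth curve" in its simplest form, where the logit of the relative frequency is linear in time, and estimates the slope by ordinary least squares. The function names, the clipping range `max_abs_slope`, and the use of HSV hue to go from violet to red are all hypothetical choices made for this example.

```python
import colorsys
import math


def logit(p):
    """Log-odds of a relative frequency p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic_slope(rel_freqs):
    """Least-squares slope of logit(p) against the interval index.

    A logistic curve p(t) = 1 / (1 + exp(-(a + b*t))) is a straight
    line a + b*t on the logit scale, so b serves as the growth slope.
    """
    n = len(rel_freqs)
    xs = range(n)
    ys = [logit(p) for p in rel_freqs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def slope_to_rgb(slope, max_abs_slope=1.0):
    """Map a slope to an RGB triple: violet (falling) to red (rising).

    max_abs_slope is an assumed clipping bound, not a value from the tool.
    """
    clipped = max(-max_abs_slope, min(max_abs_slope, slope))
    position = (clipped + max_abs_slope) / (2 * max_abs_slope)  # 0 .. 1
    hue = 0.75 * (1.0 - position)  # HSV hue: 0.75 is violet, 0.0 is red
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return tuple(round(255 * c) for c in (r, g, b))
```

For example, `slope_to_rgb(logistic_slope([0.001, 0.002, 0.004, 0.008]))` yields a color towards the red end, because the word doubles in relative frequency from one interval to the next.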

The semantics of words is visualized by positioning them in two dimensions such that words with similar usage contexts lie close together. This is accomplished in two steps. First, word embeddings are computed for each interval: the embeddings for the first interval are initialized randomly, and the embeddings for each subsequent interval are initialized with those of the previous interval. This keeps the embeddings comparable across time. Second, the 100-200 dimensions resulting from the first step are reduced to two dimensions using t-Distributed Stochastic Neighbour Embedding (t-SNE).
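The chained initialization of the first step can be sketched as below. The sketch only illustrates how each interval starts from the previous interval's vectors. `toy_train` is a hypothetical stand-in for a real embedding trainer, and the treatment of words that first appear in a later interval (random initialization) is an assumption, as are all names and parameter values.

```python
import random


def init_embeddings(vocab, dim, previous=None, rng=None):
    """Copy the previous interval's vectors where available.

    Words without a previous vector get small random values
    (an assumption made for this sketch).
    """
    rng = rng or random.Random(0)
    previous = previous or {}
    return {
        w: list(previous[w]) if w in previous
        else [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        for w in vocab
    }


def toy_train(embeddings, pairs, lr=0.1, epochs=5):
    """Stand-in for a real trainer: pulls the vectors of co-occurring
    word pairs slightly towards each other, in place."""
    for _ in range(epochs):
        for a, b in pairs:
            va, vb = embeddings[a], embeddings[b]
            for i in range(len(va)):
                delta = lr * (vb[i] - va[i])
                va[i] += delta
                vb[i] -= delta
    return embeddings


def train_over_time(pairs_by_interval, dim=4, seed=0):
    """Return one embedding table per interval, chained by initialization."""
    rng = random.Random(seed)
    tables, previous = [], None
    for pairs in pairs_by_interval:
        vocab = sorted({w for pair in pairs for w in pair})
        current = toy_train(init_embeddings(vocab, dim, previous, rng), pairs)
        tables.append(current)
        previous = current
    return tables
```

Because every interval continues from where the previous one ended, the coordinate systems of consecutive intervals stay roughly aligned, which is what makes a word's trajectory over time meaningful. In this simplified version, a word that is absent from one interval loses its vector and is re-initialized randomly when it reappears.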

A quick guide

The visualization consists of two main areas. To the left, a bubble chart represents the color-encoded semantic space of words, with the size of each bubble proportional to the square root of the word's relative frequency in the chosen interval. Paradigmatic change shows up as islands of words with similar color. Clicking on a word shows its frequency change and, optionally, its semantic trajectory over time and its concordance for the current interval.
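As a small illustration of the bubble sizing, the following sketch scales a radius with the square root of the relative frequency. It assumes that "size" refers to the radius, so that bubble area grows linearly with frequency. The function name and the `max_radius` value are hypothetical.

```python
import math


def bubble_radius(rel_freq, max_rel_freq, max_radius=40.0):
    """Radius proportional to sqrt(rel_freq), scaled so that the most
    frequent word in the interval gets max_radius (an assumed value)."""
    return max_radius * math.sqrt(rel_freq / max_rel_freq)
```

With this scaling, a word that is four times as frequent as another gets a bubble with twice the radius, which keeps very frequent words from dominating the chart.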

To the right, the frequency change of individual words is represented by simple line charts showing the second-order polynomials fitted to the logit-transformed relative frequencies. The line chart also doubles as a selector for individual intervals.
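The curve behind such a line chart can be sketched as a least-squares fit of a second-order polynomial to the logit-transformed relative frequencies. The sketch below assumes equally spaced intervals indexed 0, 1, 2, ... and solves the normal equations directly. All function names are hypothetical, and whether the tool plots the curve on the logit scale or transforms it back to relative frequencies is not stated in the text; `fitted_curve` transforms back as one possible choice.

```python
import math


def logit(p):
    """Log-odds of a relative frequency p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of logit: maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def solve_linear(a, b):
    """Solve a small linear system a*x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares coefficients [c0, c1, c2] of c0 + c1*x + c2*x**2."""
    # Normal equations: sums of x**(i+j) on the left, sums of y*x**i on the right.
    a = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    return solve_linear(a, b)


def fitted_curve(rel_freqs):
    """Fit on the logit scale, then map the fitted values back to (0, 1)."""
    xs = list(range(len(rel_freqs)))
    c0, c1, c2 = fit_quadratic(xs, [logit(p) for p in rel_freqs])
    return [inv_logit(c0 + c1 * x + c2 * x * x) for x in xs]
```

Fitting on the logit scale guarantees that the back-transformed curve stays strictly between 0 and 1. The fit needs at least three intervals; with fewer, the normal equations are singular.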

The buttons at the top are used for searching and navigating:

The buttons at the bottom change the visualization settings:

Corpora

The visualization is currently available for the following corpora:

Further reading

For more information, see:

Contact

Peter Fankhauser. fankhauser at ids-mannheim.de