Crossword Letter Frequencies

I scraped NYT crossword puzzle answers and plotted the frequencies of the letters. It turns out that the actual data is a bit taboo, with many past projects having had to remove it from the internet (see here or here). I used a site like this one, which has been aggregating answers for a few years.

The data consisted of a total of 85687 answers; some answers were pruned because they were numerical or showed other oddities coming from the data source. The tools used were Python for wrangling and Seaborn for plotting.
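For illustration, the counting step could look roughly like the sketch below. It is not the original script: the function name letter_frequencies, the sample answers, and the exact pruning rule (drop any answer containing a character outside A-Z) are assumptions made for the example.

```python
from collections import Counter
import string


def letter_frequencies(answers):
    """Return {letter: relative frequency} over purely alphabetic answers."""
    counts = Counter()
    for answer in answers:
        word = answer.strip().upper()
        # Assumed pruning rule: skip empty, numeric, or otherwise odd answers,
        # keeping only entries made up entirely of the letters A-Z.
        if not word or any(ch not in string.ascii_uppercase for ch in word):
            continue
        counts.update(word)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Letters that never occur get a frequency of 0.
    return {letter: counts[letter] / total for letter in string.ascii_uppercase}


# Hypothetical sample; the real input was the scraped answer list.
sample_answers = ["AREA", "ERA", "1984", "OREO", "ESS"]
freqs = letter_frequencies(sample_answers)
```

The resulting dictionary can then be handed to a plotting library; with Seaborn, something like sns.barplot(x=list(freqs), y=list(freqs.values())) would draw one bar per letter.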

It’s clear that there is an abundance of A, E and S in crosswords, while T and H are comparatively more frequent in the English language (probably because of the word “the”). I would love to have cleaner data and see how the frequencies change from Monday through Sunday, similar to this and this.

 

 
