Finding common bigrams
Bigrams are two-letter combinations. When designing a keyboard layout, it’s common to optimize for comfort and speed by analyzing which bigrams occur most often. Here’s a simple shell script to do a quick-and-dirty bigram analysis.
Start with a corpus

Download Shai’s corpus for Colemak:
cd ~/Downloads
curl -fsSLo corpus.txt.xz https://colemak.com/pub/corpus/iweb-corpus-samples-cleaned.txt.xz
Extract the .txt file:
unxz corpus.txt.xz

Split into individual words

Separate that corpus into individual words, one per line, in all lowercase letters:
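One way to do this step, as a minimal sketch using only standard tools: lowercase the text with tr, then turn every run of non-letters into a single newline. The function name split_words and the output file words.txt are illustrative assumptions, not names from the original script. The sketch treats only ASCII a-z as letters, so contractions are split at the apostrophe (don't becomes don and t) and accented letters are dropped.

```shell
# split_words: hypothetical helper, reads text on stdin, writes one lowercase word per line.
split_words() {
  # 1. Lowercase everything.
  # 2. Replace each run of characters outside a-z with a single newline
  #    (-c complements the set, -s squeezes repeated newlines).
  # 3. Drop the empty line left behind when the input starts with a non-letter.
  tr '[:upper:]' '[:lower:]' | LC_ALL=C tr -cs 'a-z' '\n' | sed '/^$/d'
}

# Quick check on a small sample; prints the, quick, brown, fox on separate lines.
printf 'The quick, brown Fox!\n' | split_words

# On the real corpus (assumes corpus.txt is in the current directory):
# split_words < corpus.txt > words.txt
```

Forcing LC_ALL=C on the second tr keeps the a-z range limited to the 26 ASCII letters regardless of the system locale.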