
Finding common bigrams

Bigrams are pairs of adjacent letters, such as "th" or "he". When designing a keyboard layout, it's common to optimize for comfort and speed by analyzing how often each bigram occurs. Here's a simple shell script to do a quick-and-dirty bigram analysis.

Start with a corpus

Download Shai's corpus for Colemak:

    cd ~/Downloads
    curl -fsSLo corpus.txt.xz https://colemak.com/pub/corpus/iweb-corpus-samples-cleaned.txt.xz

Extract the .txt file:

    unxz corpus.txt.xz

Split into individual words

Separate that corpus into individual words, one per line, in all lowercase letters.
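One possible way to do this split, and then count bigrams from the result, is sketched below. It is a minimal sketch, not necessarily the script this post uses. It assumes standard POSIX `tr`, `awk`, `sort`, and `uniq`, and the function names `words` and `bigrams` are hypothetical.

```shell
#!/bin/sh
# A minimal sketch, not necessarily this post's script. Assumes POSIX
# tr, awk, sort, and uniq; the names words and bigrams are hypothetical.
# LC_ALL=C keeps the A-Z/a-z ranges ASCII-only, so any non-ASCII byte
# is treated as a word separator.

# words: read text on stdin, print one lowercase word per line.
# tr -cs replaces each run of non-letters with a single newline.
words() {
  LC_ALL=C tr -cs 'A-Za-z' '\n' | LC_ALL=C tr 'A-Z' 'a-z'
}

# bigrams: print "count bigram" lines, most frequent first.
# awk emits every adjacent letter pair inside each word; pairs never
# span two words because each word is on its own line.
bigrams() {
  words |
    awk '{ for (i = 1; i < length($0); i++) print substr($0, i, 2) }' |
    LC_ALL=C sort | uniq -c | sort -rn
}
```

For example, `bigrams < corpus.txt | head -n 20` would list the 20 most frequent bigrams in the extracted corpus. Keeping one word per line is what stops the count from including pairs that straddle a space.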