Thursday, 27 August 2020

Some maths with 4-letter words

 

Have you ever done those puzzles where you have to get from one word to another by changing one letter at a time, with the rule that each change results in a proper word?  Usually this game is played with 4-letter words.  For example, to change LEAD to GOLD:  LEAD, LOAD, GOAD, GOLD.  Just 3 steps.  Of course you cannot do any better here, because 3 letters differ between the two words and you can only change one at a time.  But sometimes you need more steps.  I tried WOLF to LAMB and the best I could do was 7 steps: WOLF, WOLD, SOLD, SOLE, SALE, SAME, LAME, LAMB.

Thinking about this a bit, it occurred to me that there are probably some “isolated” words that cannot be turned into any other word.  Two examples I can think of are EVIL and EXAM.  I would imagine that most words can be reached from one another, but can we prove that?  And are there smaller groups of words that can be reached from each other, but not from any other words?  And, within the main group (if there is one), which pair takes the longest minimum number of steps to get from one word to the other?

It turns out that this is a wonderful application of the branch of mathematics known as Graph Theory.  In graph theory, a graph is not the notorious x-y plot of GCSE Maths, but a network of points (vertices) joined by lines (edges).  One problem solved in graph theory is to find the shortest path between two points, which we all now use when we turn on our satnav or use Google Maps (where “shortest” usually relates to time, rather than distance).  To find the shortest path, we use the Dijkstra algorithm (or one of its later improvements). Another less useful but mathematically very interesting application is in proving the four-colour map theorem, which says that you only need four colours to colour any map of regions so that adjacent regions have different colours. 

How is graph theory applied to our 4-letter words?  Each word becomes a vertex of a graph, and two vertices are joined by an edge if the corresponding words differ by just one letter, for example MIKE and MINE.  Our game is then played by applying the Dijkstra algorithm to find the shortest path between two vertices (words).  The longest such path, which is called the diameter of the graph, will answer our question about the most difficult pair of words.  Graphs are analysed and classified by looking at connected components, which are groups of vertices that can be reached from one another via edges but cannot reach any other vertices in the graph.  In a graph representing a country where the vertices are towns and the edges are roads, then islands (without bridges) would be connected components.  Such an analysis would answer our questions about isolated words and small groups.

So I thought it would be fun to apply some graph theory tools to our word game.  I started with the Official Scrabble Words list which has 5,638 4-letter words (some of which are admittedly rather obscure!).  It turns out that 5,563 of them are in one huge connected component, so can be reached one from another.  Of the 75 remaining words, 59 are completely isolated, and the other 16 are in tiny islands of no more than 3 words, the first of which is AMMO-AMBO-UMBO.

And what is the diameter of the graph (or more correctly, the subgraph consisting of the huge connected component?  It turns out to be 15, with many examples, but the first in my list was UNAU to YEOW (are these really words?):  UNAU, UNAI, UNCI, UNCE, ONCE, ONIE, OWIE, OWSE, OOSE, MOSE, MESE, MENE, MENU, MEOU, MEOW, YEOW.  And showing the limits of my vocabulary, the best answer for WOLF to LAMB is 6 steps, one better than I could manage: WOLF, WOLD, WALD, WALE, WAME, LAME, LAMB.

The fun continues: Another interesting question is to find the “best-placed” word – one whose worst pairing has the shortest number of hops.  This is the (Jordan) centre of the graph.  (If you think about it, the physical centre of a disc is the point whose longest distance to any point in the disc is the shortest).  If you’re still with me, that shortest longest distance (sic), called the radius of the graph, must be at least 8.  Why?  Because, if it were 7, you could get from any word to the centre in 7 steps, and from the centre to any other word in 7 steps, making 14 in total, and we know there are some pairs that need 15 steps.

Well, it turns out that the radius is 8, and is achieved by just one word, AIAS (an aia is a maid or nurse, in India).  So there you have it – you can get between any two four-letter words in 15 steps or fewer, and you can even guarantee 16 or fewer by visiting the word AIAS on the way.

One last result: the average number of steps between two 4-letter words is about 4.8.  This is not a million miles from the average "distance" between Facebook friends, which on research.facebook.com is reported as 3.57.

UPDATE for 2, 3 and 5 letter words

 So I had to repeat this work for other word lengths of course!  The results are in this table: