Have you ever done those puzzles where you have to get from
one word to another by changing one letter at a time, with the rule that each
change results in a proper word? Usually
this game is played with 4-letter words.
For example, to change LEAD to GOLD:
LEAD, LOAD, GOAD, GOLD. Just 3
steps. Of course you cannot do any better
here, because 3 letters differ between the two words and you can only change
one at a time. But sometimes you need
more steps. I tried WOLF to LAMB and the
best I could do was 7 steps: WOLF, WOLD, SOLD, SOLE, SALE, SAME, LAME, LAMB.
Thinking about this a bit, it occurred to me that there are
probably some “isolated” words that cannot be turned into any other word. Two examples I can think of are EVIL and
EXAM. I would imagine that most words
can be reached from one another, but can we prove that? And are there smaller groups of words that
can be reached from each other, but not from any other words? And, within the main group (if there is one),
which pair takes the longest minimum number of steps to get from one word to
the other?
It turns out that this is a wonderful application of the
branch of mathematics known as Graph Theory. In graph theory, a graph is not the notorious
x-y plot of GCSE Maths, but a network of points (vertices) joined by lines
(edges). One problem solved in
graph theory is to find the shortest path between two points, which we all now
use when we turn on our satnav or use Google Maps (where “shortest” usually relates
to time, rather than distance). To find
the shortest path, we use the Dijkstra algorithm (or one of its later
improvements). Another less useful but mathematically very
interesting application is in proving the four-colour map theorem, which says
that you only need four colours to colour any map of regions so that adjacent regions
have different colours.
How is graph theory applied to our 4-letter words? Each word becomes a vertex of a graph, and
two vertices are joined by an edge if the corresponding words differ by just
one letter, for example MIKE and MINE.
Our game is then played by applying the Dijkstra algorithm to find the shortest
path between two vertices (words). The
longest such path, which is called the diameter of the graph, will
answer our question about the most difficult pair of words. Graphs are analysed and classified by
looking at connected components, which are groups of vertices that can
be reached from one another via edges but cannot reach any other vertices in
the graph. In a graph representing a
country where the vertices are towns and the edges are roads, then islands
(without bridges) would be connected components. Such an analysis would answer our questions
about isolated words and small groups.
So I thought it would be fun to apply some graph theory tools
to our word game. I started with the Official
Scrabble Words list which has 5,638 4-letter words (some of which are
admittedly rather obscure!). It turns
out that 5,563 of them are in one huge connected component, so can be reached
one from another. Of the 75 remaining
words, 59 are completely isolated, and the other 16 are in tiny islands of no
more than 3 words, the first of which is AMMO-AMBO-UMBO.
And what is the diameter of the graph (or more correctly,
the subgraph consisting of the huge connected component? It turns out to be 15, with many examples,
but the first in my list was UNAU to YEOW (are these really words?): UNAU, UNAI, UNCI, UNCE, ONCE, ONIE, OWIE, OWSE,
OOSE, MOSE, MESE, MENE, MENU, MEOU, MEOW, YEOW.
And showing the limits of my vocabulary, the best answer for WOLF to LAMB
is 6 steps, one better than I could manage: WOLF, WOLD, WALD, WALE, WAME, LAME,
LAMB.
The fun continues: Another interesting question is to find
the “best-placed” word – one whose worst pairing has the shortest number of
hops. This is the (Jordan)
centre of the graph. (If you think
about it, the physical centre of a disc is the point whose longest distance to any
point in the disc is the shortest). If
you’re still with me, that shortest longest distance (sic), called the radius
of the graph, must be at least 8. Why? Because, if it were 7, you could get from any
word to the centre in 7 steps, and from the centre to any other word in 7 steps, making 14 in total,
and we know there are some pairs that need 15 steps.
Well, it turns out that the radius is 8, and is
achieved by just one word, AIAS (an aia is a maid or nurse, in India). So there you have it – you can get between
any two four-letter words in 15 steps or fewer, and you can even guarantee 16
or fewer by visiting the word AIAS on the way.
One last result: the average number of steps between two 4-letter words is
about 4.8. This is not a million miles from the average "distance" between Facebook friends, which on research.facebook.com is reported as 3.57.
UPDATE for 2, 3 and 5 letter words
So I had to repeat this work for other word lengths of course! The results are in this table: