Addlestone, Addlestone golf links, Albany street, Albert hall, Albert road, Albert terrace, Aldershot, Asia, Astronomical exchange, Atlantic, Baker street, Banstead, Barnes, Barnet, Belsize road, Berkshire, Berlin, Birmingham, Bishopsgate street, Blackfriars bridge, Blackwater, British museum, Broadstairs, Brompton, Brompton road, Bushey park, Byfleet, Byfleet golf links, Byfleet road, Cannon street, Cardigan, Castle hill, Chalk farm, English Channel, Chatham, Chelmsford, Chertsey, Chipping barnet, Chipping ongar, Chobham, Chobham road, Clacton, Clapham, Clapham junction, Colchester, College arms – pub, Coombe, Crewe, Crouch, Crystal palace, Deal, Derby, Ditton, Dublin, Ealing, East barnet, East end, East ham, Edgware, Edgware road, Edinburgh, Earth, England, Epping, Epsom, Epsom downs, Epsom high street, Esher, Essex, Euston, Exhibition road, Eybridge, Fleet street, Foulness, Foundling hospital, France, Fulham, Fulham road, Germany, Great north road, Guildford, Hadley, Haggerston, Halliford, Hamburg, Hammersmith, Hampstead, Hampsted, Hampton, Hampton court, Hanwell, Harrow road, Harwich, Haverstock hill, High barnet, Highbury, Highgate, Horse guards, Horsell, Horsell bridge, Horsell common, Hounslow, Hoxton, Hyde park, Imperial institute, Inkerman, Inkerman barracks, Irish sea, Isleworth, Kensington, Kensington gardens, Kew, Kew lodge, Kilburn, Kingston, Kingston hill, Knaphill, Laleham, Lambeth, Langham, Langham hotel, Leatherhead, Lick observatory, Limehouse, Lisbon, Liverpool street, London, Malden, Maldon, Manchester, Marble arch, Marylebone, Marylebone road, Mauritius, Mars, Maybury, Maybury bridge, Maybury hill, Maybury inn, Merrow, Middlesex, Midland railway company, Mole, Molesey, Mortlake, Moscow, Natural history museum, Naze, Neasden, New barnet, New malden, Newhaven, Nice, Ockham, Old woking, Oriental college, Oriental terrace, Ostend, Ottershaw, Oxford street, Painshill, Painshill park, Paris, Park road, Park terraces, Parliament, Perrotin, Petersham, Pinner, Pompeii, Pool, Portland place, Portman square, Portsmouth, Primrose hill, Putney, Putney bridge, Putney common, Putney hill, Pyrford, Pyrford church, Regent street, Regent’s canal, Regent’s park, Richmond, Richmond bridge, Richmond hill, Richmond park, Ripley, Ripley street, Roehampton, Royal academy of arts, Send, Serpentine, Sheen, Shepperton, Shepperton church, Shepperton lock, Shoebury, Shoeburyness, Shoreditch, Sodom and gomorrah, South kensington, Southampton, Southend, Spotted dog – pub, St. albans, St. edmund’s terrace, St. george’s hill, St. haggerston, St. john’s wood, St. martin’s-le-grand, St. pancras, St. paul’s cathedral, Staines, Stanmore, Strand, Street cobham, Sunbury, Surrey, Thames, Thames valley, Thames-side, The mole, The wandle, Tillingham, Tower bridge, Trafalgar square, Twickenham, Upper halliford, Venus, Victoria, Virginia water, Walham green, Waltham abbey, Waltham abbey powder mills, Walton, Wandsworth, Waterloo, Waterloo bridge, Waterloo road, Wellington street, West hill, West surrey, Westbourne park, Westminster, Westminster bridge, Weybridge, Wimbledon, Wimbledon common, Winchester, Windsor, Woking, Woolwich, Zoological gardens.
You may be confused at first. Please read on.
As a kid, I was first brought to the attention of “The War of the Worlds” through Jeff Wayne’s musical adaption, and naturally it freaked me out! (Seriously it gave me nightmares and my fear of the dark)
I have since watched the film 2005 film, and again more recently.
Although the two adaptations had scenes that crossed over, there were differences between them and so I set upon finding which version was more true.
(In short we are lucky to have so many great versions of the story, including the novel itself)
The original story was written by H. G. Wells which was first published in 1897. I was surprised at the incredible depth and detail Wells takes the trouble to recount as the lonesome protagonist during the martians’ escapade.
The thing that struck me most about the novel are the endless places that Wells mentions in the book to really help the reader feel like they were in amongst the chaos. Not much thought was given to the distance between two places until you remind yourself that 1897 was a car-less time and infact travelling by foot or carriage, there were sometimes imense distances involved. I wanted to create something that would be more visual to a reader in modern times, a map like the sort that J. R. R. Tolkein included in his fiction novels.
The Map Idea
Wouldn’t it be cool to piece together a map based on the places that feature in the book? Luckily we have a great platform in the form of Google Maps that will allow us to place pins on places of interest. Before this stage however there is work to be done.
The age of the novel allows anyone to gain access to the text in many different formats for either free or for a small fee. I downloaded my copy at Gutenberg.org.
I downloaded the most basic format (Plaintext UTF-8) which will allow me to manipulate the novel without the need for an advanced parser.
Wells capitalizes all the places that he mentions in the book, which helps greatly when trying to identify them. I split the entire book into a list of words which would have been seperated by the space character.
From this, I created a “shortlist” of words that were capitalized. During this process, I made sure to keep consecutive capitalized words together incase they made up a place name.
The shortlist was rather large, so I got hold of Ordnance Survey data and created a list of all unique places and roads that appear throughout the UK. I then ran the shortlist through the OS data, yet again creating two new lists;
- Shortlist strings that match with the OS data
- Shortlist strings that didn’t match the OS data
Both the lists had duplicate items in them. The problem stemmed from the fact that in order to keep place names like “St. Paul’s” together, I had to ignore all punctuation. So we might have 2 similar entries within the list, such like “St. Paul’s,” and “St. Paul’s.”.
This became a bigger issue then I first thought, with sometimes 5 duplicates for the same place. However, after having a total of 1500 strings to sort through, I wasn’t too troubled with refining it further by hand.
I went through the list manually, checking the “more probable” list first against the novel. I’m glad that I did this method; yes it seems a little long winded, but there are some corner cases that this resolves.
For example, Wells writes about St. Pauls Cathedral, but simply refers to it as “St. Paul’s”. There isn’t a place simply called St. Paul’s however in context, you can understand that the Cathedral was exactly what he was talking about.
There is another word he uses such as “Naze”. In the text, he refers to it as “the Naze”, but in context, he might be refering to either Walton-on-the Naze or a place just north of Walton called The Naze. I am unsure as to the exact location of this place as of yet, but I have left it in the final list. (I have a feeling it might be the latter, although Walton-on-the-Naze railway station was opened in 1867, fairly recent in relation to the publish date)
Amongst the “less probable” list, I found some interesting places such as planets, continents, and altogether different countries. He even refers to an observatory in America which now seems to be a historic site of interest.
The final list is mostly a direct copy of places from the book, shown exactly as found from the text. As stated above, there are changes that I have made, that might not be directly found in the novel, but rather (with 100% certainty) of the place being refered to in the book.
So to round this off; although this endeavor didn’t exactly perform as planned, I am happier with the final list then I might have been simply allowing a simple algorithm choose what it thinks to be a place and discard the rest. Some manual work has gone into this to make a nice comprehensive list to work with when deciding where to pin these places to a map.
After refining the list to the one that is mostly complete, I may now have something to work with. I could pick out the whole sentences that contain the place names and try and create a weighing algorithm to more closeley identify places. I wonder how many similar words are used in sentences where other place names reside.
Also I wonder how many place names contain similar sequences of letters that may also be an identifier in trying to decide whether a word is a noun or not.