Posted on Friday April 5, 2013

How to Write a Spelling Corrector in C#

GitHub repository - feel free to send improvements

Peter Norvig’s spelling corrector is fairly famous in nerd circles because it describes the first steps in creating a Google-style spelling corrector: one that takes something like “Speling”, recognises the closest known word and replies “Did you mean Spelling?”.

His original is a few years old now, and is only 21 lines of compact Python. Below is my attempt to convert it to C#. There are already some links to C# conversions on Peter Norvig’s page, but I wanted one that was closer to C# and didn’t rely on a third-party library for collection helpers, as Frederic Torres’s does. The other C# version was a 404 last time I looked.

Hopefully there are no obvious errors, but feel free to reply if there are. I am a Python newbie, and I got a lot of help from the Java conversion, from trawling through a few Python tutorials on its powerful-but-hard-to-read (but admittedly really concise) set syntax, and from my colleague Simon, who pointed out some glaring errors. I haven’t gone for brevity (it’s 140+ lines of code) or for efficiency. Peter Norvig describes some speed-ups on the original page. One obvious one is to load a standard dictionary file into a Bloom filter, store the trained words in the same dictionary format, and check the Bloom filter as a second measure.
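To make the Bloom filter idea a bit more concrete, here is a minimal sketch of one in C#. It is not part of the code in this post or the repository: the class name BloomFilter, its Add and MightContain methods, the FNV-1a-style hashing and the sizes used are all my own assumptions, picked only for illustration. A Bloom filter can report a word as present when it is not (a false positive), but it never misses a word that was added, which is why it only makes sense as a cheap extra check rather than the sole source of truth.

```csharp
using System;
using System.Collections;

// Hypothetical helper for illustration only; not from the original post.
public sealed class BloomFilter
{
    private readonly BitArray _bits;
    private readonly int _hashCount;

    // bitCount: size of the bit array; hashCount: bits set per word.
    public BloomFilter(int bitCount, int hashCount)
    {
        if (bitCount <= 0) throw new ArgumentOutOfRangeException("bitCount");
        if (hashCount <= 0) throw new ArgumentOutOfRangeException("hashCount");
        _bits = new BitArray(bitCount);
        _hashCount = hashCount;
    }

    // Sets hashCount bits derived from the word.
    public void Add(string word)
    {
        uint h1 = Hash(word, 2166136261u);
        uint h2 = Hash(word, 0x9747b28cu) | 1u; // force odd so the step is never zero
        for (int i = 0; i < _hashCount; i++)
        {
            _bits[IndexFor(h1, h2, i)] = true;
        }
    }

    // False means "definitely not added"; true means "probably added".
    public bool MightContain(string word)
    {
        uint h1 = Hash(word, 2166136261u);
        uint h2 = Hash(word, 0x9747b28cu) | 1u;
        for (int i = 0; i < _hashCount; i++)
        {
            if (!_bits[IndexFor(h1, h2, i)]) return false;
        }
        return true;
    }

    // Double hashing: derive the i-th bit position from two base hashes.
    private int IndexFor(uint h1, uint h2, int i)
    {
        ulong combined = h1 + (ulong)i * h2;
        return (int)(combined % (ulong)_bits.Length);
    }

    // FNV-1a-style hash over the UTF-16 characters, seeded so that
    // two different seeds give two different hash functions.
    private static uint Hash(string text, uint seed)
    {
        uint hash = seed;
        foreach (char c in text)
        {
            unchecked
            {
                hash ^= c;
                hash *= 16777619u;
            }
        }
        return hash;
    }
}
```

In use, you would call Add for every word in the dictionary file once at start-up, then call MightContain on each candidate correction before doing anything more expensive with it. The bit count and hash count need tuning to the size of the dictionary; the smaller the bit array relative to the word count, the more false positives you get.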

The dictionary big.txt can be found here.