Introduction to Genetic Algorithms in C#

A long time ago I mentioned in this post that I was planning on writing up some notes I made at university about Genetic Algorithms (GAs from now on) and my version of a very simple example in C#. Years later…here it is! C# isn’t the most popular choice for artificial or natural intelligence programming; that job is largely the domain of Java or other more academic-friendly languages. This means there aren’t many C# examples out there for neural networks, search, and genetic algorithms and programming.



Installing MVC3 on Mono with Ubuntu

I managed to get Roadkill, the wiki engine I spend a lot of my spare time writing, working on Ubuntu with Mono this weekend. Unfortunately for me, a lot of the documentation is patchy, which meant it took a few hours to get it up and running by scouring Stack Overflow, blogs and newsgroups. It is in fact very simple to get MVC3 working with Apache on Linux, provided you have the right Apache config settings and are willing to add a few hacks into your code to cater for the gaps (NotImplementedExceptions) in the Mono framework.

Below are two snippets for getting Mono and MVC3 going with Ubuntu. The majority of the credit goes to:

Copy the second snippet into /var/www/default.txt and save the first snippet as “install.sh” into your user directory, running it using “sudo sh install.sh”. The full Roadkill bash script can be found here. I started making EC2 AMIs for the installation, and then found a few issues that made me eventually abandon my dreams of a cheap $5 Roadkill Ubuntu server, as it would be a lot of effort maintaining both Windows and Mono tests for Roadkill.

Don’t take that as criticism of Mono + MVC3 though – Roadkill does a lot with the framework that most apps wouldn’t. If you are running data-driven MVC apps, then using a database like MongoDB would work very well with Mono on Ubuntu.



Object, Donut, OutputCache and Browser Caching in ASP.NET MVC

I’ve spent the last week sorting out the caching in Roadkill, as prior to 1.6 it relied on NHibernate’s second-level in-memory cache and some incorrect 304s.

Roadkill now has 3 levels of caching:

Outputcache/donut caching

This is caching the view result, aka the HTML output, and returning it to save the overhead of the view engine running. I’ve permanently disabled this for now, as it is a bit of a premature optimisation until Roadkill is installed on large sites. There’s an MVC donut caching package which allows you to decorate actions with an attribute to set them as cacheable, but unfortunately it returns straight away from the action, which meant it couldn’t be used with the other existing attributes in Roadkill.

The Razor view engine can already handle 1000s of requests per second on a standard server, so it won’t be a major bottleneck. There are a number of other optimisations that could also be done, such as disabling the other view engines, but in reality browser caching combined with modern fast connections makes this a bit redundant until Roadkill hits Wikipedia’s page view count, which will probably be never.

Object caching

The new ORM I’ve switched to, Lightspeed, has second-level caching built in, but I’ve put an in-memory cache layer after it in the ‘service’ layer, so that when you’re not logged in all pages have their markdown transformed into HTML and then thrown into the cache. The cache uses the System.Runtime.Caching namespace, so it can be scaled up to a distributed cache like Azure or Memcached at any point.
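
As a rough illustration of that service-layer cache (not Roadkill’s actual code), here is a minimal sketch. The class name PageHtmlCache, the key format and the markdownToHtml delegate are all assumptions made for the example, and it needs a reference to the System.Runtime.Caching assembly. Taking the abstract ObjectCache in the constructor is what would let a distributed implementation be swapped in later.

```csharp
using System;
using System.Runtime.Caching;

// Hypothetical sketch: caches the rendered HTML of a page for anonymous users.
public class PageHtmlCache
{
    private readonly ObjectCache _cache;
    private readonly Func<string, string> _markdownToHtml; // stand-in for the real markdown converter

    public PageHtmlCache(ObjectCache cache, Func<string, string> markdownToHtml)
    {
        _cache = cache;
        _markdownToHtml = markdownToHtml;
    }

    public string GetHtml(int pageId, string markdown, bool isLoggedIn)
    {
        // Logged-in users bypass the cache and always get a fresh render.
        if (isLoggedIn)
            return _markdownToHtml(markdown);

        string key = "page-html-" + pageId; // key format is an assumption
        string html = _cache.Get(key) as string;
        if (html == null)
        {
            html = _markdownToHtml(markdown);
            _cache.Set(key, html, new CacheItemPolicy()); // no expiry in this sketch
        }
        return html;
    }
}
```

In use you would pass MemoryCache.Default (or a named MemoryCache) as the first argument, and remove the key whenever the page is edited.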

Browser Caching


This is where I learnt a lot about client-side caching, specifically in conjunction with MVC. Roadkill version 1.5 and earlier added an If-Modified-Since header check, but I discovered that the way it was doing it was completely wrong. There are basically 3 variations of browser caching you can do, which aren’t mutually exclusive:

Url based with expires

This is where you add a ‘thumbprint’ to the file or resource, for example appending a timestamp to the end of a filename or onto a querystring after the file. This is then combined with a strict cache header.
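
A minimal sketch of the querystring flavour, assuming the file’s last-write time is used as the thumbprint. The UrlThumbprint class and the “v” parameter name are made up for illustration:

```csharp
using System;
using System.IO;

// Hypothetical helper: appends a timestamp 'thumbprint' to a resource URL.
public static class UrlThumbprint
{
    // Adds the last-write time (in ticks) as a "v" querystring value,
    // so the URL changes whenever the file changes.
    public static string Append(string url, DateTime lastWriteTimeUtc)
    {
        string separator = url.Contains("?") ? "&" : "?";
        return url + separator + "v=" + lastWriteTimeUtc.Ticks;
    }

    // Convenience overload that reads the timestamp from the file on disk.
    public static string AppendForFile(string url, string physicalPath)
    {
        return Append(url, File.GetLastWriteTimeUtc(physicalPath));
    }
}
```

Because the URL changes when the file does, the old URL can safely be given a far-future expiry.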

Time/date based expiry (strict header)

To make url based caching have any effect, your pages or files need to set one of the two possible expiry headers: max-age or Expires. These are known as strict cache headers, as the browser is meant to obey them or risk being put in the naughty corner. Expires is a date in the future at which the resource is due to expire, in the HTTP date format (for example “Thu, 01 Dec 1994 16:00:00 GMT”) rather than an ISO format. Max-age is an alternative, set via the Cache-Control header, and is a number of seconds in the future. You’re not meant to set an expiry date of over a year, and as far as I can tell modern browsers support both equally well.
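
To make the two header values concrete, here is a small sketch that only builds the strings; how you attach them to the response depends on your framework. The StrictCacheHeaders class name and the choice of “public” are assumptions for the example.

```csharp
using System;
using System.Globalization;

// Hypothetical helper that formats the two strict cache header values.
public static class StrictCacheHeaders
{
    // Expires: an absolute date in GMT. "R" is .NET's RFC 1123 pattern,
    // so pass a UTC DateTime (the format string does not convert for you).
    public static string Expires(DateTime expiresUtc)
    {
        return expiresUtc.ToString("R", CultureInfo.InvariantCulture);
    }

    // Cache-Control: max-age is a lifetime in whole seconds from now.
    public static string MaxAge(TimeSpan lifetime)
    {
        long seconds = (long)lifetime.TotalSeconds;
        return "public, max-age=" + seconds.ToString(CultureInfo.InvariantCulture);
    }
}
```

For example, StrictCacheHeaders.MaxAge(TimeSpan.FromDays(30)) gives “public, max-age=2592000”.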

Etag and If modified since (loose caching)

Obviously once you set a resource to expire in the future, you can’t then force the browser to refresh without asking the person using the browser to clear their cache. Using a thumbprint in your url is one way around this but is only really practical if you can dynamically generate file names, and Google reports that it also causes 404s and cache problems with older versions of Squid, which is the most popular cache server that a lot of ISPs use.

The smart alternative is to set your Expires header to a date in the past (or max-age to 0), and check the resource date on the server on every request. This is how Roadkill does it, by returning a 304 HTTP status code if the content hasn’t changed. One caveat to doing this is to make sure you also set the content length to zero, or the entire file content (payload) is sent back anyway. In MVC the HttpStatusCodeResult does this for you.
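
The server-side decision can be sketched roughly like this. It is an illustration of the idea rather than Roadkill’s attribute: the ConditionalGet class is hypothetical, and it only understands the standard RFC 1123 date format, falling back to a full 200 response for anything else.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: decides between 304 Not Modified and 200 OK.
public static class ConditionalGet
{
    // ifModifiedSince is the raw If-Modified-Since request header (may be null).
    public static int StatusFor(string ifModifiedSince, DateTime lastModifiedUtc)
    {
        DateTime browserCopy;
        bool parsed = DateTime.TryParseExact(
            ifModifiedSince,
            "R", // RFC 1123, e.g. "Thu, 01 Dec 1994 16:00:00 GMT"
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out browserCopy);

        if (!parsed)
            return 200; // missing or unreadable header: send the full content

        // HTTP dates only have one-second precision, so drop the sub-second part
        // before comparing, otherwise the content would always look newer.
        long wholeSecondTicks = lastModifiedUtc.Ticks - (lastModifiedUtc.Ticks % TimeSpan.TicksPerSecond);

        return wholeSecondTicks <= browserCopy.Ticks ? 304 : 200;
    }
}
```

The response should still carry a Last-Modified header when you return 200, so the browser has a date to send back next time.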

There are two ways of letting the browser check the freshness of the resources: using etags and using last modified headers. The Google page linked at the bottom has more info on this, but in summary the etag technique requires a unique hash for the resource which would change each time the file changes, making a date slightly more efficient for larger files and pages as you’re not having to calculate any type of MD5.
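
For comparison, an etag check might look something like the sketch below, which hashes the content with MD5 as described above. The ETagHelper class is hypothetical, and the match is deliberately simplistic: it ignores weak etags and comma-separated lists in If-None-Match.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: builds and compares etags based on an MD5 of the content.
public static class ETagHelper
{
    // Returns a quoted etag, e.g. "d41d8cd98f00b204e9800998ecf8427e" for empty content.
    public static string FromContent(string content)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(content));
            string hex = BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
            return "\"" + hex + "\"";
        }
    }

    // True when the If-None-Match request header carries the current etag,
    // meaning a 304 can be returned instead of the full content.
    public static bool Matches(string ifNoneMatch, string currentETag)
    {
        return ifNoneMatch != null && ifNoneMatch.Trim() == currentETag;
    }
}
```

The hash has to be computed over the whole content on every request (or cached itself), which is the extra cost compared with a date check.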

The overhead of this 304 method is that the browser still has to make server requests; however, the server only sends back a few bytes of data (just the headers) rather than the full content. You get greater control and ensure your site is always fresh at the expense of more requests from the browser, which is exactly what’s needed for lower page view wikis and CMSs.

If you’re guaranteed not to change the site more than once a day, week or month, then obviously the strict caching methods are better suited. One feature I’m hoping to add to Roadkill in the future is for users to be able to set expiry dates on pages, as an advanced feature you can switch on.

If you’d like to see the caching in action in Roadkill, the source code files are here and a usage of the attribute here.

Most of the information on browser caching is from this excellent Google guide, and the HTTP RFC.


How to Write a Spelling Corrector in C#

Peter Norvig’s spelling corrector is fairly famous in nerd-circles as it describes the first steps in creating a Google-style spelling corrector that will take something like “Speling”, recognise its closest word and reply “Did you mean Spelling?”.

His original is a few years old now, and only 21 lines of compact Python. Below is my attempt to convert it to C#. There are already some links to C# conversions on Peter Norvig’s page, however I wanted one that was closer to C# and didn’t rely on a 3rd party library for collection helpers, as Frederic Torres’s does. The other C# version was a 404 last time I looked.

Hopefully there are no obvious errors, but feel free to shout idjut if there are – I am a Python newbie and got a lot of help from the Java conversion and from trawling through a few Python tutorials on its crazy (but admittedly really concise) set syntax, and also from my colleague Simon pointing out some glaring errors. I haven’t gone for brevity, as it’s 140+ lines of code, nor efficiency. Peter Norvig describes some speed-ups you can perform on the original page, and one obvious one is to use a standard dictionary file and store this in a Bloom filter, with the trained words stored in the same dictionary format, looking through the Bloom filter as a second measure.

If you want the full solution it’s here, including the “big.txt” dictionary file.



Moving away from NHibernate in Roadkill

In the next version of Roadkill (1.6) I’ve moved away from NHibernate, the ORM that has been powering it for two years since version 1, to a commercially supported ORM called Lightspeed.

Unlike the Umbraco drama that emerged last year, this wasn’t for performance reasons or because of a badly engineered business layer. I give myself a big pat on the back in fact, as it was really simple to switch firstly to a MongoDB repository and then over to Lightspeed. Nor was it from lack of NH experience, as I’ve used NH day to day for 3 or so years.



UdpTraceListener – a UDP TraceListener compatible with log4net/log4j

This class is a TraceListener implementation that uses the log4j XML format and sends the XML to a UDP socket. This means you can configure a trace listener to send all your logs to something like http://log2console.codeplex.com. There is also Harvester, which has a TraceListener implementation for streaming over a network.


Just configure a new UDP receiver in Log2Console and the messages will start to stream through. This works nicely with LightSpeed, where you can configure it to send its SQL statements to a TraceListener.
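
To give a feel for the approach, here is a much-reduced sketch of such a listener. It is not the class described above: the SketchUdpTraceListener name, the fixed “Trace” logger, “INFO” level and “main” thread values are placeholders, and only the basic event and message elements of the log4j XML layout are emitted.

```csharp
using System;
using System.Diagnostics;
using System.Net.Sockets;
using System.Security;
using System.Text;

// Hypothetical, cut-down listener: one UDP datagram per trace message.
public class SketchUdpTraceListener : TraceListener
{
    private static readonly DateTime Epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
    private readonly UdpClient _client = new UdpClient();
    private readonly string _host;
    private readonly int _port;

    public SketchUdpTraceListener(string host, int port)
    {
        _host = host;
        _port = port;
    }

    // Builds a log4j-style event; the timestamp is milliseconds since 1970 (UTC).
    public static string ToLog4jXml(string message, string level, DateTime utcNow)
    {
        long millis = (long)(utcNow - Epoch).TotalMilliseconds;
        return "<log4j:event logger=\"Trace\" level=\"" + level + "\" timestamp=\"" + millis + "\" thread=\"main\">"
            + "<log4j:message>" + SecurityElement.Escape(message) + "</log4j:message>"
            + "</log4j:event>";
    }

    public override void Write(string message)
    {
        WriteLine(message);
    }

    public override void WriteLine(string message)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(ToLog4jXml(message, "INFO", DateTime.UtcNow));
        _client.Send(bytes, bytes.Length, _host, _port);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
            _client.Close();
        base.Dispose(disposing);
    }
}
```

You would register it with Trace.Listeners.Add(new SketchUdpTraceListener("localhost", 7071)), where the port is whatever you configured on the receiver (7071 here is just an example).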

Feel free to include the source wherever you need it, it’s under an MIT licence.



Creating an instance from a string or type name in StructureMap

As part of the refactor I’m doing for Roadkill, I’m loading custom types from the config file as default instances, via StructureMap. The types are defined as strings in the config file, and in future more plugins will be loaded this way.

This took me around 4 hours to figure out over the past few days, partly because of lots of false leads out there but also from missing the blindingly obvious ObjectFactory.Model property. The way you can achieve it is fairly straightforward in 2.5+. I have a base UserManager type, and a concrete type that implements it.

ObjectFactory.Initialize(x =>
{
	x.Scan(scanner =>
	{
		scanner.TheCallingAssembly();
		scanner.SingleImplementationsOfInterface();
		scanner.WithDefaultConventions();

		// Plugin UserManagers
		if (Directory.Exists(pluginPath))
			scanner.AssembliesFromPath(pluginPath);
		
		// etc.
	});
});
ObjectFactory.Configure(x =>
{
	string pluginName = "Roadkill.SomePlugin.MyCustomUserManager";
	InstanceRef userManagerRef = ObjectFactory.Model.InstancesOf<UserManager>().FirstOrDefault(t => t.ConcreteType.FullName == pluginName);
	x.For<UserManager>().HybridHttpOrThreadLocalScoped().TheDefault.Is.OfConcreteType(userManagerRef.ConcreteType);
});

Three things worth noting if you want to do this:

  1. You want to get the InstancesOf and not instances – you don’t want it to create an instance of your plugin type before everything else is registered.
  2. You may need to perform this inside Configure(), as Initialize scans for types.
  3. An obsolete message shows for TheDefault.Is.OfConcreteType, but the method it tells you to use doesn’t work with a Type.

The full implementation is in the Roadkill repository on Bitbucket under

Roadkill / src / Roadkill.Core / IoC / IocSetup.cs


Moqs versus Stubs (2013 edition)

This is an age-old debate which I’ll chip in on with my opinion. According to Gojko Adzic there are two types of TDD people in the world: classic stubbers and the new(ish)-school Moqers (or Mockito in the Java world). Moq is an amazing tool, but consider the following test code:

_mockRepository = new Mock<IRepository>();
_mockRepository.Setup(x => x.GetPageById(It.IsAny<int>())).Returns<int>(x => _pageList.FirstOrDefault(p => p.Id == x));
_mockRepository.Setup(x => x.FindPagesContainingTag(It.IsAny<string>())).Returns<string>(x => _pageList.Where(p => p.Tags.ToLower().Contains(x.ToLower())));

Eeek. Is that easy to read? Does FindPagesContainingTag return a string? No, an IEnumerable of Page objects. Compare this to a stub implementation (the classical way of mocking) of the IRepository:

public class RepositoryStub : IRepository
{
    // _pageList is the in-memory list of test pages; other IRepository members are omitted here.
    public IEnumerable<Page> FindPagesContainingTag(string tag)
    {
        return _pageList.Where(p => p.Tags.ToLower().Contains(tag.ToLower()));
    }
}

The other problem I have with the Moq example is that you essentially have the stub class’s source in your [Setup] method most of the time. The advice Bob Martin gives for tests is that they’re meant to be very easy to read for the next author; there should be no confusion at all. It’s fairly clear Moq doesn’t promote this, or at least not the messy way I write the mocks. So I’m probably in the classical camp, but is there no half-way house for the two in .NET?


From Squarespace to WordPress

The site is now powered by WordPress. Having tried it out a few years ago and rejected it, I’ve come to like it and its plugin architecture – it’s fairly slick now and supports everything I needed. It has a wealth of themes, which Squarespace doesn’t have, is far more flexible and costs about the same.

The move was frustrating, as Squarespace exports its content in Typepad format, a terrible non-structured export format which meant I lost all the formatting of the snippets. I ended up having to reformat all 130 or so blog post snippets (around 200 chunks of C#/XML blocks in total), putting the majority of them on GitHub, which retains the formatting.

I also moved away from Disqus at the same time, to keep the comments in one place. So, stick to WordPress for code blogs would be my advice to any other dev thinking of starting up a repository of code snippets.


Fixing “No response from server for url http://localhost:7055/hub/session/” problems with Selenium

If you’re using Windows Server 2008 R2 for your CI builds and tests, you may run into issues running Selenium tests that Server 2003 didn’t have. It took me a good half a day of trial and error to fix the problem, but it’s so obscure I thought I’d share. The error you’ll see from Selenium is usually this one:

“No response from server for url http://localhost:7055/hub/session/…”


