blogs.conchango.com


Anthony Steele's Blog

Don't Repeat Yourself, part 2

I have written a small utility that I call DuplicateFinder, since it finds duplicate lines of code in files. 

I spent a year in the late 1990s working in a small company. We asked prospective hires to submit code samples, and I was one of the people looking at the code. There are problems with this method, in that it's too easy to submit someone else's code as yours, but I learned a lot from it; mainly that the bar is quite low.

A distressing percentage of the code (around 50%) was code that I would not be proud of - mostly it exhibited the symptom of very long blocks of code in the form's event handler methods. Often enough, an event handler method would contain code very similar to the code in the next event handler. This led me to conclude that the learning most needed was not design patterns or other fancy OO concepts but simple, basic procedural decomposition: elimination of duplication by factoring code into reusable, flexible, well-named, appropriately scoped methods. A lot follows from the coding principle of Don't Repeat Yourself.
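To make the point concrete, here is a minimal sketch of that kind of factoring. It is written in Python for brevity, and everything in it (the handler names, the form fields, the formatting rule) is hypothetical and invented for illustration. It is not taken from any of the code samples described above.

```python
# Hypothetical example: two event handlers that would otherwise each
# contain their own copy of the same trim/validate/format logic.

def format_customer_name(first: str, last: str) -> str:
    """Shared helper holding the logic both handlers need."""
    first, last = first.strip(), last.strip()
    if not first or not last:
        raise ValueError("both names are required")
    return f"{last.upper()}, {first.title()}"


def on_save_clicked(form: dict) -> str:
    # The handler stays short: it delegates to the well-named helper.
    return "Saved " + format_customer_name(form["first"], form["last"])


def on_print_clicked(form: dict) -> str:
    # Same helper, so a fix to the formatting rule is made in one place.
    return "Printing " + format_customer_name(form["first"], form["last"])
```

With the shared logic in one method, each handler shrinks to a line or two, and a change to the rule cannot be applied to one handler and forgotten in the other.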

Recently I wrote about this subject, noting that there were programs to detect duplicate code: Simian and the Duplicate detector function in Eclipse. Howard wrote about Simian before me. David Höhn offered to whip up a Perl script to do the same task. This spurred me on to think about doing it in C#, and I soon came up with a simple (but probably not very efficient) approach.
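For readers who want a feel for what a simple but inefficient approach can look like, here is a rough sketch in Python. It is one naive way to find runs of identical lines shared by two files, and it should not be read as the algorithm DuplicateFinder actually uses. The function name, the `min_length` parameter and the choice to compare only across different files (never within one file) are all assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

# (run length, (file A, 1-based start line), (file B, 1-based start line))
Duplicate = Tuple[int, Tuple[str, int], Tuple[str, int]]


def find_duplicates(files: Dict[str, List[str]], min_length: int = 4) -> List[Duplicate]:
    """Brute-force search for runs of matching lines between pairs of files."""
    # Ignore leading and trailing whitespace when comparing lines.
    items = [(name, [line.strip() for line in lines]) for name, lines in files.items()]
    results: List[Duplicate] = []
    for i, (name_a, a) in enumerate(items):
        for name_b, b in items[i + 1:]:
            for x in range(len(a)):
                for y in range(len(b)):
                    # If the previous lines also match, this position is the
                    # middle of a run that was already measured from its start.
                    if x > 0 and y > 0 and a[x - 1] == b[y - 1]:
                        continue
                    # Extend the run for as long as the lines keep matching.
                    n = 0
                    while x + n < len(a) and y + n < len(b) and a[x + n] == b[y + n]:
                        n += 1
                    if n >= min_length:
                        results.append((n, (name_a, x + 1), (name_b, y + 1)))
    return results
```

The cost grows with the product of the two file lengths for every pair of files, which is acceptable for a handful of source files and poor for a large code base. Note also that blank lines match each other in this sketch, so a real tool would probably want to treat them specially.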

Implementing this, I found that the details were quite hairy for what was a small algorithm, but some test cases eventually whipped it into shape. Then I had to put it to one side; the long train commute during which I was tinkering with this code gave way to a car commute, and I was studying for a Microsoft certification.

However, the MS exam is behind me (I passed!) and I am now pleased to say that version 1.0 of DuplicateFinder is available on CodePlex here:
http://www.codeplex.com/DuplicateFinder


It's a simple C# utility, usable from the command line, but all the hard work is done in a library that could be hooked up to a different UI or integrated into other tools.

Source, executables and more details on the program are behind the link. Here's a sample session:

>DupFinder.exe -t4 test5*.txt
Processing in C:\Code\DuplicateFinder\TestData
2 files read
Duplicate of length 5 at:
 Line 2-6 in C:\Code\DuplicateFinder\TestData\Test5Lines1.txt
 Line 2-6 in C:\Code\DuplicateFinder\TestData\Test5Lines2.txt
1 duplicate found

I hope that you like it!

Published 12 June 2007 20:42 by Anthony.Steele


About Anthony.Steele

Programmer in c# for Conchango