A while ago (June last year) I wrote a utility to detect runs of duplicate lines in files, which is useful for looking for repetitive code that should be refactored. Then I stopped work on it, since it was done. The original blog post is here and the project is up on CodePlex here.
This year I have revisited it with two new features which I think make it much more usable.
The first feature is an MSBuild task wrapper to complement the command-line interface. This means that instead of using a command line like:
DuplicateFinder -r -t8 -eAssemblyInfo *.cs
you can now also use an equivalent MSBuild script:
<Project DefaultTargets="RunTest"
         xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <UsingTask TaskName="DuplicateFinder"
             AssemblyFile="$(MSBuildExtensionsPath)\DuplicateFinder.Tasks.dll" />
  <ItemGroup>
    <TestFiles
        Include="..\..\**\*.cs"
        Exclude="..\..\**\AssemblyInfo.cs"
    />
  </ItemGroup>
  <Target Name="RunTest">
    <DuplicateFinder Files="@(TestFiles)"
                     DuplicateThreshold="8"
    />
  </Target>
</Project>
This may be more verbose, but it has the advantage that you can run it as part of your automated build process. In case you don't always look at the build output, you can use the CountThresholdForError and LengthThresholdForError options to the MSBuild task to fail the build if the duplicates in the source are too numerous or too long.
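To make the two error thresholds concrete, here is a minimal Python sketch of the kind of check they imply. It is not the task's actual code: the function name `should_fail_build`, its parameters, and the choice of `>=` rather than `>` for the comparison are all assumptions made for illustration.

```python
def should_fail_build(duplicate_lengths, count_threshold=None, length_threshold=None):
    """Decide whether a set of duplicates should fail the build.

    duplicate_lengths: one integer per duplicate found (its length in lines).
    A threshold of None means "do not check this condition".
    Assumption: a threshold is tripped when the value reaches it (>=).
    """
    # Too many duplicates overall.
    if count_threshold is not None and len(duplicate_lengths) >= count_threshold:
        return True
    # At least one duplicate is too long.
    if length_threshold is not None and any(
        length >= length_threshold for length in duplicate_lengths
    ):
        return True
    return False


# Three duplicates of 8, 9 and 20 lines, checked against a length limit of 15.
print(should_fail_build([8, 9, 20], length_threshold=15))
```

With both thresholds left unset the function never fails the build, which mirrors running the task purely for its report.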
The second new feature cuts down on, or eliminates, false positives. Whole files could already be excluded, but it is also useful to exclude duplicates whose first line starts with a particular prefix. I'll show you the reason for this.
If we run the duplicate finder on its own source code (excluding the generated AssemblyInfo.cs files, which we know will all look much the same), we get the following:
> DupFinder.exe -r -t7 -eAssemblyInfo.cs *.cs
Processing in C:\Temp\DuplicateFinder\DuplicateFinderLib
6 files read
Duplicate of length 7 at:
Line 1-7 in C:\Temp\DuplicateFinder\DuplicateFinderLib\DuplicateEventArgs.cs
Line 1-7 in C:\Temp\DuplicateFinder\DuplicateFinderLib\LineItem.cs
Line 1-7 in C:\Temp\DuplicateFinder\DuplicateFinderLib\LineItemList.cs
1 duplicate found
The duplicate, lines 1-7 of three different files, consists of these lines:
using System;
using System.Collections.Generic;
using System.Text;

namespace DuplicateFinderLib
{
    /// <summary>
While these lines may be the same, I don't regard them as "bad" or "cut and paste" code. So I exclude duplicates where the first line starts with "using", like so:
DupFinder.exe -r -t7 -eAssemblyInfo.cs -xusing *.cs
Processing in C:\Temp\DuplicateFinder\DuplicateFinderLib
6 files read
0 duplicates found
You can also do this in the MSBuild file. More than one prefix can be excluded.
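The idea behind prefix exclusion can be sketched in a few lines of Python. This is an illustration only, not the utility's real implementation: `find_duplicate_windows` and its parameters are hypothetical names, and the sketch compares fixed-size windows of trimmed lines rather than finding maximal runs.

```python
from collections import defaultdict


def find_duplicate_windows(files, threshold, exclude_prefixes=()):
    """Find runs of `threshold` identical lines that occur in more than one place.

    files: dict mapping a file name to its list of lines.
    exclude_prefixes: a duplicate is dropped if its first line starts with
    any of these prefixes.
    Returns a list of (window, locations) pairs, where locations holds
    (file name, 1-based start line) tuples.
    """
    seen = defaultdict(list)
    for name, lines in files.items():
        stripped = [line.strip() for line in lines]
        # Record where each window of `threshold` consecutive lines occurs.
        for start in range(len(stripped) - threshold + 1):
            window = tuple(stripped[start:start + threshold])
            seen[window].append((name, start + 1))

    prefixes = tuple(exclude_prefixes)
    duplicates = []
    for window, locations in seen.items():
        if len(locations) < 2:
            continue  # appears only once, so not a duplicate
        if window[0].startswith(prefixes):
            continue  # first line matches an excluded prefix
        duplicates.append((window, locations))
    return duplicates


header = ["using System;", "using System.Text;", "", "namespace Demo", "{"]
files = {
    "A.cs": header + ["class A {}", "}"],
    "B.cs": header + ["class B {}", "}"],
}
print(len(find_duplicate_windows(files, 5)))                              # 1
print(len(find_duplicate_windows(files, 5, exclude_prefixes=["using"])))  # 0
```

Only the first line of each duplicate is tested against the prefixes, matching the behaviour described above, so a repeated block that merely contains a "using" line further down would still be reported.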