DupeFinder is a simple application for locating, moving, renaming and deleting duplicate files in a directory structure. It's perfect both for users who haven't kept their hard drives very well organized and need to do some cleaning to free space and for users who like to keep lots of backup copies of important data "just in case" something bad should happen.
An application designed for only occasional maintenance such as this needs to be easy to learn so that users don't spend more time figuring it out than actually getting work done with it. DupeFinder sports a very clean and simple interface that stays out of the way and lets you concentrate on what's important: your data.
News
- 2008.05.11
- DupeFinder 1.1.0 is available. This version removes the need for an external md5sum command line utility. This improves performance calculating MD5 digests for small files and eliminates a cumbersome dependency for Windows users.
- 2006.06.13
- DupeFinder 1.0.2 is available. This is a bugfix version to fix problems with new versions of Qt/PyQt and to avoid problems with symbolic links.
- 2005.06.23
- DupeFinder 1.0.1 is available. This is a bugfix version to fix a problem where the interface code did not work correctly with newer versions of PyQt (3.14 or newer). Only people with problems running the 1.0 version need to download this update.
- 2004.05.12
- DupeFinder 1.0 released.
Features
Although DupeFinder is quite a small application, it should have all of the features you will need to remove and reorganize large directories full of duplicate files:
- Well-designed graphical interface with full tooltip and "What's This?" question button support, useful in an application which you probably won't need to use frequently
- Quick processing by eliminating analysis of unwanted data through file extension filtering
- View files in external applications by double-clicking
- Rename files in place or move to new locations
- Default settings disallow deletion of all copies of duplicate files to prevent accidental data loss
- Generate simple reports identifying groups of duplicate files for later processing
While everything works pretty well in most cases, there are a few issues with DupeFinder to be aware of. I hope to fix most of the following bugs sometime soon:
- May crash if files containing "~" or ":" characters are encountered*
- Zero byte files cannot be deleted*
- May not be able to delete files with Unicode characters in filename
- Only one file viewer can be selected at a time
- Display does not update if identified duplicates are moved, renamed or modified external to DupeFinder
* These bugs were previously reported but no longer appear to occur; they likely affect only older versions of Python, Qt, or PyQt.
Requirements
DupeFinder is built on two primary tools: the Python language and the Qt application toolkit. A Python interpreter and the Qt libraries are included in most desktop Linux, BSD and UNIX distributions. Mac OS X (at least the newer versions) includes Python, and Qt is also available for free, though it is not part of a standard install.
Qt is primarily a C++ toolkit, so the PyQt bindings for Python are also required. These are not part of the standard install on many Linux and similar distributions, though they are available for all of the systems mentioned.
Versions prior to 1.1.0 require the md5sum utility. This utility is standard on Linux and similar systems, though I've read that on Mac OS X it goes by the name md5 instead. I have not confirmed this, but if so, simply change the single occurrence of md5sum in FindDupFiles.py to md5 to run those versions on a Mac.
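For readers curious what computing an MD5 digest without an external utility can look like, here is a minimal sketch using Python's standard hashlib module. It is not taken from the DupeFinder source; the function name md5_of_file and the chunk size are assumptions made for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`.

    Hypothetical helper for illustration only, not DupeFinder's own code.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Hashing inside the Python process avoids starting a separate program for every file, which is where most of the time goes when the files are small.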
Running DupeFinder on Windows should be possible but probably isn't worth the effort, unless most of the components are already in place for other applications. Qt and PyQt for Windows are only available with a commercial license (this will change when Qt 4 is released). Python is a separate install. Alternatively it is probably possible to satisfy all of the dependencies through X11 on Cygwin.
One more thing: although DupeFinder is intended to be run graphically and interactively, the FindDupFiles.py script can be run standalone from the console. It takes a root search directory followed by any number of file extension filters as command line arguments and outputs the identified duplicate file groups (in no particular order) to STDOUT. This output can be piped to a pager such as less for immediate inspection or redirected straight to a text file using the ">" shell operator (on UNIX-like systems) for logging/reporting.
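To make the console workflow concrete, the following is a rough sketch of how a script of this kind might work: walk a root directory, keep only files matching the given extensions, group them by size, and then confirm duplicates by MD5 digest. It is not the actual FindDupFiles.py code. The function name find_duplicates, the size-then-digest strategy, and the assumption that the filters name extensions to include are all illustrative guesses.

```python
import hashlib
import os
import sys
from collections import defaultdict


def find_duplicates(root, extensions=()):
    """Return lists of paths under `root` whose contents are identical.

    Hypothetical sketch; assumes `extensions` lists the types to INCLUDE.
    """
    # Accept both "txt" and ".txt" as filters (an assumption).
    wanted = tuple("." + ext.lstrip(".").lower() for ext in extensions)

    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if wanted and not name.lower().endswith(wanted):
                continue  # filtered out by extension
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symbolic links and special files
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size has no duplicate
        by_digest = defaultdict(list)
        for path in paths:
            digest = hashlib.md5()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print each group of duplicates followed by a blank line.
    for group in find_duplicates(sys.argv[1], sys.argv[2:]):
        print("\n".join(group))
        print()
```

Saved under a hypothetical name such as dupes_sketch.py, it could be run as "python dupes_sketch.py ~/photos jpg png > report.txt" to write the groups to a text file.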
Screenshots
Here are a couple of images showing DupeFinder in action. There's not much more to it than what you see here; the actions to choose directories and move or rename files use standard Qt dialogs.
DupeFinder 1.0 Start Dialog
DupeFinder 1.0 Results Dialog
License
DupeFinder is Free Software, and is licensed under the GPL (GNU General Public License) version 2.0.
Downloads
DupeFinder is currently available only as Python source. Standalone binaries may be made available in the future.
- DupeFinder 1.1.0 Source
Everything needed to run the app, assuming Python, Qt and PyQt are installed ;-)
This package contains four source files in a self-contained directory. Simply extract the data from the archive to any desired location, then run the application by executing
python DupeFinder.py
inside the directory from your system's command line.
Also included are the Qt Designer *.ui files for the two dialog classes. Neither is necessary for running the program (two of the *.py files are these interfaces compiled directly into Python code), but any developers who want to modify DupeFinder will need them, and they're small enough to not warrant separate downloads.
Older versions available here.
- DupeFinder Test Data
A small directory structure containing files which can be used to test the capabilities of DupeFinder in a risk-free manner. File names identify the file size and content, and identical files all have the same main name but may differ in extension, e.g. file 2a is identical to 2a.ext, and is the same size as file 2b but contains different data.
Contact me at arkaein@monsterden.net with any questions, suggestions, bug reports or patches for DupeFinder.