Sunday, November 25, 2012

Python: Finding Duplicate Files

I've begun to learn a little Python because I'd like to write a script (one that works on Windows or Linux) for finding duplicate files, whether or not the filenames match. This will mostly be useful for cleaning up my music library: I suspect that playing around with different programs has created more duplicates of my music files than I have found so far.

Yes, I realize that someone out there has probably already written a script for such a task, but where would be the fun in using that? This is how I practice my love for programming.

duplicates.py

After toying with Python for a little while (and realizing that I will have to spend more time on this later), here is the current state of my code:

For the syntax highlighting above, my thanks to the individual referenced upon clicking the green '?' in the upper-right.

Next steps

The following is what I have left to do:
  • Write better comments for my code
  • Choose one of two (or both!) routes for completion:
    • only duplicates within a folder count
    • duplicates count even if they are not in the same folder (this may be slightly harder)
  • Add the ability to recurse into subdirectories
  • Output a file with a list of folders from which duplicates should be manually removed
  • Tidy up the script by removing unnecessary debugging print-statements
I should also like to get this code up on Bitbucket at some point, but I may have to wait until winter break. Here is my Mercurial Python repository. Please feel free to leave comments if I'm doing something completely stupid!
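
To make the remaining steps a little more concrete, here is a minimal sketch of how the recursion, the cross-folder comparison, and the output file might fit together. It is not my current script. It assumes Python 3, and it assumes that two files count as duplicates when their contents produce the same SHA-256 digest, regardless of filename. The names file_digest, find_duplicates, and write_folder_report are made up for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large music files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root):
    """Return {digest: [paths]} for every group of 2+ files with equal content."""
    # First pass: recurse into subdirectories and group files by size.
    # Files with different sizes cannot be identical, so they are never hashed.
    by_size = defaultdict(list)
    for folder, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Second pass: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[file_digest(path)].append(path)

    return {digest: sorted(paths)
            for digest, paths in by_digest.items()
            if len(paths) > 1}


def write_folder_report(duplicates, report_path):
    """Write one folder per line for every folder that holds a duplicate."""
    folders = sorted({os.path.dirname(path)
                      for paths in duplicates.values()
                      for path in paths})
    with open(report_path, "w", encoding="utf-8") as report:
        for folder in folders:
            report.write(folder + "\n")
    return folders
```

Calling find_duplicates on the top folder of a music library and passing the result to write_folder_report would produce the list of folders to clean up by hand. To count only duplicates within a single folder, the same grouping could be keyed on the folder together with the digest instead of the digest alone.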

Mother-Approved Explanation

At the request of my mother, I add this section below. "If people are going to read your thoughts, you're going to have to make them simpler or they'll never return to your blog..." in her own words, more or less. The short explanation of this post is as follows:

I'm teaching myself a high-level programming language (Python) for fun. I'm using it to search out duplicates within my music library. I still have to comment my code better (so other programmers can understand it), and I still have more code to write before it is finished and tidied up. Once I'm done, I will hopefully have this code published online for others (and their mothers) to use.
