Merging directories with the magic of Python.

We finally got the last projects out of that monstrosity ‘Final Cut Server’, but one project at the end was a nightmare to export, and we weren’t sure which files from the end actually were in a different version of the project that we already had.

We essentially needed to merge two different versions of projects directories, making sure not to lose any files, and we didn’t want to lose the organization of the files.

Here’s a quick python script I wipped up to make it quicker.

With the 4000 odd files in the project, it took under a second to run, and it turned out we only had about 20 files which hadn’t already been merged.  Much simpler to sort out.

The script took about 10 minutes to write and test. This is why you should learn to program.  Hacking stuff like this up is easy, and saves *so* much time.

(Yes, you probably could do this with a couple of lines of perl or BASH, but what the heck.)

#!/usr/bin/env python

from subprocess import Popen, PIPE
from os import stat
from os.path import basename, abspath

def run(*command):
    found = Popen(command, stdout=PIPE)
    return found.communicate()[0]

def files_in(dirname):
    return [x for x in run('find', abspath(dirname), '-type','f', '-print0').split(chr(0)) if x]

if __name__ == '__main__':
    from sys import argv

    try:
        sourcedir = files_in(argv[1])
        destdir = files_in(argv[2])
    except IndexError:
        print 'Usage:'
        print argv[0], '  '
        print 'Where you want to check if files in  are also in '
        print '(but perhaps with a different relative path)'
        exit(1)

    print '---------------------------------------------------'
    print '{0} files in {1}'.format(len(sourcedir), abspath(argv[1]))
    print '{0} files in {1}'.format(len(destdir), abspath(argv[2]))
    print '---------------------------------------------------'

    destnames = {}

    for destfile in destdir:
        destnames[basename(destfile)] = {'size': stat(destfile).st_size,
                                         'path': destfile }

    for newfile in sourcedir:
        base = basename(newfile)
        if base not in destnames:
            print newfile, 'is NOT in the new dir'
        else:
            destfile = destnames[base]
            if stat(newfile).st_size != destfile['size']:
                print '{0}({1}) differs from {2}({3})'.format(
                      newfile, stat(newfile).st_size,
                      destfile['path'], destfile['size'])