One of the main features of Git is that it keeps track of everything that has been committed… everything.
This can become a problem if, at some point in the past, someone committed some huge files. Even if these files were removed later on, they are still kept in the repository history.
Over time, all these old files can significantly slow down many Git operations, such as cloning and fetching.
In this little tutorial we will see how we can find these files and delete them properly by rewriting the Git history.
Step 0: Prepare your Git repository
Before doing the cleanup, make sure that you have cloned the whole repository, including a local copy of every branch.
You can use this small bash script that I found here to help you.
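#!/bin/bash
# Create a local tracking branch for every remote branch, skipping the
# symbolic HEAD pointer and master (it already exists locally after the clone).
for branch in $(git branch -a | grep '^ *remotes/' | grep -v HEAD | grep -v '/master$'); do
    # Keep everything after "remotes/<remote>/" so names such as feature/login stay whole
    git branch --track "${branch#remotes/*/}" "$branch"
done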
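Remote branch names can contain slashes (feature/login, for example). The expansion ${branch##*/} removes everything up to the last slash, which would turn remotes/origin/feature/login into a local branch called login, while ${branch#remotes/*/} removes only the remotes/<remote>/ prefix. Here is a tiny sketch comparing the two, assuming entries shaped like the remotes/origin/... lines printed by git branch -a; the function names local_name and last_component are hypothetical.

```sh
# Hypothetical helper: "remotes/origin/feature/login" -> "feature/login".
# "#remotes/*/" removes the SHORTEST matching prefix, i.e. only
# "remotes/<remote>/", so slashes inside the branch name survive.
local_name() {
    printf '%s\n' "${1#remotes/*/}"
}

# Hypothetical helper for comparison: "##*/" removes the LONGEST prefix
# ending in "/", so only the last path component is kept.
last_component() {
    printf '%s\n' "${1##*/}"
}
```

For example, local_name remotes/origin/feature/login prints feature/login, whereas last_component prints only login.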
Step 1: Export and filter the big files
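From the root of the Git repository that you want to clean up, we will first export a list of all the objects stored in its pack files, which include every version of every file ever committed (objects that are still loose are not listed, so running git gc beforehand is a good idea). Don't worry if this takes a while to process.
$ git verify-pack -v .git/objects/pack/*.idx > files.txt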
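Each object line in files.txt has the form SHA-1, type, size, size-in-packfile, offset-in-packfile, with two extra columns (depth and base SHA-1) for deltified objects, and the output ends with a few summary lines. Besides file contents (blobs), the list also contains commits and trees. A minimal sketch of a hypothetical helper, biggest_blobs, that keeps only the blob rows and prints the N largest by uncompressed size (column 3), assuming files.txt was produced by the command above:

```sh
# Hypothetical helper: print the N largest blobs (default 5) listed in a
# `git verify-pack -v` output file, smallest first.
# Keeping only rows whose 2nd column is "blob" drops commits, trees,
# and the summary lines at the end of the verify-pack output.
biggest_blobs() {
    file=$1
    count=${2:-5}
    awk '$2 == "blob"' "$file" | sort -k3,3n | tail -n "$count"
}
```

Usage: biggest_blobs files.txt 5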
From this list we will then be able to filter and find what we are looking for. Here is the magic command.
$ join -o "1.1 1.2 2.3" <(git rev-list --objects --all | sort) <(sort -k3 -n files.txt | tail -5 | sort) | sort -k3 -n
This command extracts the 5 biggest files from the Git repository history, along with their SHA-1, their paths and their (uncompressed) sizes in bytes. The output will look something like this:
bca72b793ab3db0e423a1865ee7cae7e273eca94 Assets/Art/Textures/Big_File.psd 258713292
633db8bdc72d227ca2e054fd006dac4091078a2d Assets/Art/Environment/Textures/Im_So_Big.psd 260855564
208e445678928260fbac309ee3ba522e3fd84f50 Assets/Art/Textures/What_A_Big_File.psd 290092325
9b2bffb966216587ee14fa24e74e663fa0eff5de Assets/Art/Environment/Textures/Wow_Im_So_Huge.psd 301903493
47895d134b5d228f97fb9b279aafe3d1346a4a20 Assets/Art/Environment/Textures/Im_Bigger_Than_You_Think.psd 353411556
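The magic command joins two lists on the SHA-1: git rev-list --objects --all prints every object reachable from any ref, followed by its path when it has one, while files.txt provides each object's size. Here is a minimal sketch of the same pipeline split into commented steps, wrapped in a hypothetical function largest_paths that takes both lists as files plus the number of results. It uses temporary files instead of bash process substitution, so it also runs in a plain POSIX sh.

```sh
# Hypothetical helper: join `git rev-list --objects --all` output ($1)
# with `git verify-pack -v` output ($2) and print "SHA-1 path size"
# for the N largest objects ($3, default 5), smallest first.
largest_paths() {
    objects=$1
    packinfo=$2
    count=${3:-5}
    tmpdir=$(mktemp -d)

    # join needs both inputs sorted on the join field (the SHA-1).
    sort -k1,1 "$objects" > "$tmpdir/objects.sorted"

    # Keep the N largest rows by column 3 (uncompressed size in bytes),
    # then re-sort them by SHA-1 so join can match them.
    sort -k3,3n "$packinfo" | tail -n "$count" | sort -k1,1 > "$tmpdir/big.sorted"

    # Output SHA-1 (1.1), path (1.2) and size (2.3), ordered by size.
    join -o "1.1 1.2 2.3" "$tmpdir/objects.sorted" "$tmpdir/big.sorted" | sort -k3,3n

    rm -rf "$tmpdir"
}
```

Usage:
$ git rev-list --objects --all > objects.txt
$ largest_paths objects.txt files.txt 5
Like the original one-liner, it splits lines on blanks, so a path containing spaces is cut after its first word.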
Step 2: Remove the files
If you are sure that these files are no longer relevant and can safely be removed from the repository, let's nuke them one by one!
$ git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all
Replace filename with the path of the file (for example Assets/Art/Textures/Big_File.psd) and wait for Git to rewrite the whole history.
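Nuking the files one by one means one full history rewrite per file. Since git rm accepts several paths at once, a single pass can remove all of them. Below is a minimal sketch of a hypothetical helper, rm_filter, that builds the --index-filter command for any number of paths; it single-quotes each path (escaping embedded quotes) because filter-branch evaluates the filter string with the shell.

```sh
# Hypothetical helper: print an --index-filter command that removes every
# given path from the index in one pass.
rm_filter() {
    cmd='git rm -r --cached --ignore-unmatch --'
    for target in "$@"; do
        # Turn each ' into '\'' so the path survives inside single quotes.
        escaped=$(printf '%s' "$target" | sed "s/'/'\\\\''/g")
        cmd="$cmd '$escaped'"
    done
    printf '%s\n' "$cmd"
}
```

For example, with two of the files from the listing above:
$ git filter-branch --tag-name-filter cat --index-filter "$(rm_filter Assets/Art/Textures/Big_File.psd Assets/Art/Textures/What_A_Big_File.psd)" --prune-empty -f -- --all
Git still interprets each path as a pathspec, so characters such as * act as wildcards.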
Step 3: Push and tell everyone about your changes
It's time to push your changes!
$ git push origin --force --all
$ git push origin --force --tags
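Pushing the rewritten history does not shrink your local repository by itself: filter-branch keeps a backup of the old refs under refs/original/, and the reflog still points at the old commits, so the big objects stay reachable. The git-filter-branch documentation lists the steps to discard them; here is a minimal sketch wrapping those steps in a hypothetical function, purge_old_history. Only run it once you are sure you no longer need the old history, because it cannot be undone.

```sh
# Hypothetical helper: permanently discard the history rewritten away by
# git filter-branch, following the "shrinking a repository" checklist in
# the git-filter-branch documentation. Run it from inside the repository.
purge_old_history() {
    # 1. Delete the backup refs filter-branch left under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
    # 2. Expire every reflog entry, including those for the old commits.
    git reflog expire --expire=now --all
    # 3. Repack and prune all objects that are now unreachable.
    git gc --prune=now
}
```

Running git count-objects -vH before and after purge_old_history shows how much space was reclaimed.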
Now you have to tell all the contributors to rebase their local copies or to clone a fresh copy of the repository. That is the tricky part, but it's mandatory if you want everyone to work on the cleaned-up version ;) Otherwise, the next push of a branch based on the old history will bring the big files back.