Removing a Large File from History
You accidentally committed a 500 MB video, a compiled binary, or a dataset to your repository. Even after you delete the file and commit again, the file still lives inside Git's object database — every clone will download it. This page shows you how to permanently erase large files from the entire commit history.
Why deleting the file is not enough
The file is gone from HEAD but not from history
# You deleted the file and committed
git rm dataset.csv
git commit -m "Remove large dataset"
git push origin main

# But the repo is still huge
git count-objects -vH
# size-pack: 487.3 MiB   ← the file is still in .git/objects

# And anyone who clones gets all 487 MB
git clone https://github.com/user/repo.git   # slow!
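You can reproduce this in a scratch repository. The sketch below commits a file, removes it with `git rm`, and then asks Git whether the blob still exists. The helper name `demo_deleted_blob_survives` and the 2 MB file of zeros standing in for a real asset are illustrative assumptions. Only standard Git commands are used.

```shell
#!/bin/sh
# Minimal sketch: show that `git rm` + commit leaves the blob in .git/objects.
# demo_deleted_blob_survives is a hypothetical helper name. It works in a new
# temporary repository and leaves that repository in place for inspection.
demo_deleted_blob_survives() (
  set -e
  repo=$(mktemp -d)
  cd "$repo"
  git init -q .

  # 2 MB stand-in for a large file
  head -c 2000000 /dev/zero > big.bin
  git add big.bin
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "Add big file"
  blob=$(git rev-parse HEAD:big.bin)   # object ID of the committed blob

  git rm -q big.bin
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "Remove big file"

  # The file is no longer in the latest commit ...
  if git cat-file -e HEAD:big.bin 2>/dev/null; then
    echo "still in HEAD"
  else
    echo "gone from HEAD"
  fi
  # ... but the first commit still references it, so the blob is kept.
  git cat-file -e "$blob"
  echo "blob $blob is still in the object database ($repo)"
)
```

Running `demo_deleted_blob_survives` prints "gone from HEAD" followed by the blob ID that Git is still storing.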
Step 1: Find the large files
Find the top 10 largest objects in the repo
# List the 10 largest blobs across all of history
git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | sed -n 's/^blob //p' \
  | sort --numeric-sort --key=2 --reverse \
  | head -n 10 \
  | awk '{ printf "%-60s %s\n", $3, $2 }' \
  | numfmt --field=2 --to=iec
# numfmt is part of GNU coreutils and is not installed on macOS by default;
# `brew install coreutils` provides it as gnumfmt.

# Example output:
# videos/demo.mp4              487M
# data/training-set.csv         34M
# assets/background-hd.png      12M
Simpler: find the commit that added a specific large file
git log --all --diff-filter=A --format='%h %ad %s' --date=short -- videos/demo.mp4
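The two checks can be combined. The sketch below uses a hypothetical helper named `largest_blobs_with_commit`. It lists the N largest blobs in raw bytes, together with the abbreviated hash of the oldest commit that added each path. Like the pipeline above, it assumes paths contain no whitespace.

```shell
#!/bin/sh
# Sketch: N largest blobs in all history, each with the oldest commit that
# added its path. Hypothetical helper; assumes paths contain no whitespace.
largest_blobs_with_commit() {
  n=${1:-10}
  # Keep "size path" for blobs, sort by size (largest first), take the first N.
  # awk 'NR <= n' reads all of its input, so upstream commands never get SIGPIPE.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" && NF >= 4 { print $3, $4 }' |
    sort -k1,1nr |
    awk -v n="$n" 'NR <= n' |
    while read -r size path; do
      # --diff-filter=A lists commits that added the path, newest first;
      # tail keeps the oldest one.
      commit=$(git log --all --diff-filter=A --format=%h -- "$path" | tail -n 1)
      printf '%12s  %-50s  %s\n' "$size" "$path" "$commit"
    done
}
```

For example, `largest_blobs_with_commit 5` inside the repository prints five lines of size, path and commit hash.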
Step 2: Remove with git filter-repo
git filter-repo is the recommended tool. Git's own documentation for git filter-branch warns against that command and points to filter-repo instead. filter-repo is significantly faster and safer.
Install git filter-repo
# macOS
brew install git-filter-repo

# pip (any platform)
pip install git-filter-repo

# Check the version
git filter-repo --version
Remove a specific file from all history
# IMPORTANT: work on a fresh clone (filter-repo refuses to rewrite
# a repository that does not look like a fresh clone)
git clone https://github.com/user/repo.git repo-clean
cd repo-clean

# Remove the file from every commit in history
git filter-repo --path videos/demo.mp4 --invert-paths
# Rewrite 1432 commits... 100%
# Ref 'refs/heads/main' was rewritten

# Verify the file is gone
git log --all --full-history -- videos/demo.mp4
# (no output: the file is gone from history)

# Check the new repo size
git count-objects -vH
# size-pack: 2.1 MiB   ← down from 487 MB
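`git log --all --full-history -- <path>` confirms that no commit touches the path any more. A stricter check is to confirm that the blob objects themselves are gone. This also catches the case where identical content survives under another file name. The sketch below uses two hypothetical helpers, `blob_ids_for_path` and `blobs_still_present`, built only on `git log --raw` and `git cat-file -e`. Changes made only inside merge commits are not listed.

```shell
#!/bin/sh
# Sketch (hypothetical helper names). Run blob_ids_for_path BEFORE the rewrite
# and save its output; feed that list to blobs_still_present AFTER it.

# Print every blob ID ever stored at PATH on any ref, one per line, no duplicates.
blob_ids_for_path() {
  # --raw lines look like ":100644 100644 <old> <new> M<TAB>path";
  # field 4 is the blob after the change (all zeros for a deletion).
  git log --all --full-history --format= --raw --no-abbrev -- "$1" |
    awk '$1 ~ /^:/ && $4 !~ /^0+$/ { print $4 }' |
    sort -u
}

# Read blob IDs on stdin and print those that still exist in the object database.
# Exit status: 0 if none remain (the purge worked), 1 otherwise.
blobs_still_present() {
  remaining=0
  while read -r id; do
    if git cat-file -e "$id" 2>/dev/null; then
      echo "$id"
      remaining=1
    fi
  done
  return "$remaining"
}
```

For example, run `blob_ids_for_path videos/demo.mp4 > demo-blobs.txt` in the fresh clone before `git filter-repo`. Afterwards, once old objects have been garbage-collected, run `blobs_still_present < demo-blobs.txt`. Empty output and exit status 0 mean no version of the file remains.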
Remove multiple files or a whole directory
# Remove multiple specific files
git filter-repo --path videos/demo.mp4 --path data/training-set.csv --invert-paths

# Remove an entire directory
git filter-repo --path videos/ --invert-paths

# Remove by glob pattern (--path-glob takes shell-style globs;
# use --path-regex for Python regular expressions)
git filter-repo --path-glob '*.mp4' --invert-paths
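If you would rather cut by size than by path, both git filter-repo and BFG accept `--strip-blobs-bigger-than` (for example `50M`). Before running a size-based rewrite, it can help to preview which paths it would affect. The sketch below uses a hypothetical helper, `paths_over_size`, which takes a threshold in bytes. It is an approximation: `git rev-list --objects` reports each blob under only one of the paths it appeared at.

```shell
#!/bin/sh
# Sketch (hypothetical helper name): list every path whose history contains a
# blob larger than THRESHOLD bytes, roughly what a size-based strip would touch.
# Each path is printed once, sorted.
paths_over_size() {
  threshold=$1
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    # Keep blobs over the threshold; strip "blob <size> " to leave the path
    # (paths containing spaces are preserved).
    awk -v t="$threshold" '$1 == "blob" && $2 > t + 0 { sub(/^[^ ]+ [^ ]+ /, ""); print }' |
    sort -u
}
```

For example, `paths_over_size 52428800` lists the paths with blobs over 50 MiB.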
Step 3: Force-push the rewritten history
Push all rewritten refs to the remote
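# filter-repo removes the 'origin' remote as a safety measure; re-add it
git remote add origin https://github.com/user/repo.git

# Force-push all branches and tags.
# Plain --force-with-lease is rejected here ("stale info"): the re-added remote
# has no remote-tracking branches, so there is nothing to compare the lease against.
# Make sure nobody pushes to the repository while you do this.
git push origin --all --force
git push origin --tags --force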
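To confirm the force-push landed, you can compare each local branch with its counterpart on the remote. The sketch below uses only `git for-each-ref` and `git ls-remote`. The helper name `remote_matches_local` is hypothetical.

```shell
#!/bin/sh
# Sketch: after a force-push, confirm every local branch matches the branch of
# the same name on the remote. remote_matches_local is a hypothetical helper.
# Prints the refs that differ and returns 1 if there are any.
remote_matches_local() {
  remote=${1:-origin}
  mismatches=$(
    git for-each-ref --format='%(objectname) %(refname)' refs/heads/ |
      while read -r local_sha ref; do
        # ls-remote prints "<sha><TAB><ref>"; empty if the remote lacks the branch
        remote_sha=$(git ls-remote "$remote" "$ref" | cut -f1)
        [ "$local_sha" = "$remote_sha" ] || echo "$ref"
      done
  )
  if [ -n "$mismatches" ]; then
    printf 'differs from %s:\n%s\n' "$remote" "$mismatches" >&2
    return 1
  fi
}
```

Run it in the rewritten clone right after the push, e.g. `remote_matches_local origin && echo "remote is up to date"`.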
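Step 4: Clean up leftover objects
Even after force-pushing, hosting platforms may keep serving old objects from caches for a period (GitHub caches for up to 90 days). If a public repository exposed sensitive files, contact your host's support for immediate removal. Locally, any clone that still has old reflog entries keeps the old objects until those entries are expired and the repository is garbage-collected.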
Force garbage collection locally
git reflog expire --expire=now --all
git gc --prune=now --aggressive
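Expiring the reflog matters because reflog entries count as references. A commit that is no longer on any branch is still kept alive by them. The sketch below runs the two commands above and then reports whether a given object ID survived. The helper name `purge_and_check` is hypothetical. Only run it when you are sure you no longer need the old history.

```shell
#!/bin/sh
# Sketch (hypothetical helper name): expire every reflog entry, prune all
# unreachable objects, then report whether object ID $1 is still stored.
# Returns 0 if it was purged, 1 if it survived, 2 if the cleanup itself failed.
purge_and_check() {
  git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive --quiet || return 2
  if git cat-file -e "$1" 2>/dev/null; then
    echo "still present: $1"
    return 1
  fi
  echo "purged: $1"
}
```

If the object is still reachable from a branch or tag, it survives. That means the history rewrite missed a ref.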
Alternative: BFG Repo Cleaner
BFG is a Java-based tool with simple built-in options for common cases such as "remove all files larger than 50 MB".
Using BFG
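# Download the BFG jar from https://rtyley.github.io/bfg-repo-cleaner/ (requires Java)
# and save it as bfg.jar next to the repo.git directory created below.
# BFG's instructions use a mirror (bare) clone
git clone --mirror https://github.com/user/repo.git repo.git
cd repo.git

# By default BFG does not touch files in your latest (HEAD) commit:
# delete the file there in a normal commit and push before running BFG.

# Remove all blobs larger than 50 MB
java -jar ../bfg.jar --strip-blobs-bigger-than 50M

# Or: remove a specific file (matched by file name, not path) from history
java -jar ../bfg.jar --delete-files demo.mp4

# Clean up and push (a mirror clone pushes all refs)
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push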
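BFG leaves the files in the HEAD commit alone by default, and `--delete-files` matches file names rather than paths. A quick pre-flight check is therefore to list any file in HEAD with the name you are about to delete. The sketch below uses a hypothetical helper, `head_has_file_named`, that works in a normal clone or in the mirror clone. It assumes file names without special characters, which `git ls-tree` would quote.

```shell
#!/bin/sh
# Sketch (hypothetical helper name): print every file in the HEAD commit whose
# file name (last path component) equals $1. Exit status 0 if any were found,
# meaning BFG would leave that copy in place; 1 if HEAD is already clean.
head_has_file_named() {
  git ls-tree -r --name-only HEAD |
    awk -v name="$1" -F/ '$NF == name { print; found = 1 } END { exit !found }'
}
```

If `head_has_file_named demo.mp4` prints anything, remove that file in a regular commit and push before running BFG.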
Comparison: git filter-repo vs BFG
| Feature | git filter-repo | BFG Repo Cleaner |
|---|---|---|
| Language | Python | Java / Scala |
| Speed | Very fast | Fast |
| Flexibility | Very high (custom callbacks, regex) | Good for common cases |
| Officially recommended | Yes (replaces filter-branch) | Popular but unofficial |
| Remove by size | Built-in: --strip-blobs-bigger-than 50M | Built-in: --strip-blobs-bigger-than 50M |
| Remove sensitive text | Yes: --replace-text | Yes: --replace-text |
| Installation | pip install git-filter-repo | Download JAR, needs Java |
What teammates need to do after a history rewrite
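1. Save any unpushed work, then delete their local clone of the repository.
2. Re-clone from the remote:
   git clone https://github.com/user/repo.git
3. Recreate any branches they were working on from the rewritten remote branches:
   git checkout -b my-branch origin/my-branch
Do not merge branches from an old clone into the rewritten history. That would bring the old commits, and the large file, back. Re-apply such work on top of the new branches instead (for example as patches).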
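Before deleting an old clone, a teammate can check whether it holds work that exists nowhere else. `git log --branches --not --remotes` lists commits that are reachable from local branches but from no remote-tracking branch. `git stash list` shows stashes, which are also lost with the clone. The wrapper below is a hypothetical convenience around those two commands.

```shell
#!/bin/sh
# Sketch (hypothetical helper name): report work that would be lost by deleting
# this clone. Exit status 0 means no unpushed commits and no stashes were found.
unpushed_work() {
  commits=$(git log --branches --not --remotes --oneline)
  stashes=$(git stash list)
  [ -z "$commits$stashes" ] && return 0
  [ -n "$commits" ] && printf 'Unpushed commits:\n%s\n' "$commits"
  [ -n "$stashes" ] && printf 'Stashes:\n%s\n' "$stashes"
  return 1
}
```

Run `unpushed_work` in the old clone. If it reports anything, save that work before deleting the directory.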
Prevention: stop large files before they are committed
Add common large file types to .gitignore
# .gitignore
*.mp4
*.avi
*.mov
*.zip
*.tar.gz
*.csv
*.parquet
*.pkl
*.bin
*.exe
*.dmg
node_modules/
dist/
build/
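`.gitignore` only stops untracked files from being added. Files that are already committed stay tracked. `git ls-files --cached --ignored --exclude-standard` lists tracked files that match your ignore rules. `git check-ignore -v <path>` shows which rule matches an untracked path. A thin wrapper follows; the name `tracked_but_ignored` is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper name): list files that are already tracked but
# match an ignore rule. .gitignore does not untrack these; each one needs
# `git rm --cached <path>`, and a large one also needs the history rewrite above.
tracked_but_ignored() {
  git ls-files --cached --ignored --exclude-standard
}
```

Run it after editing `.gitignore` to see which existing files the new rules will not catch.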
Use Git LFS for files that legitimately belong in the repo
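# One-time setup per user account (the git-lfs package must already be installed)
git lfs install

# Track specific file types with LFS
git lfs track "*.mp4"
git lfs track "*.psd"

# This creates (or updates) .gitattributes; commit it
git add .gitattributes
git commit -m "Configure Git LFS for large files"

# New matching files are now stored in LFS; the Git object database only holds small pointer files.
# If *.mp4 is also listed in .gitignore, remove that line first (or git add refuses the file).
git add videos/demo.mp4
git commit -m "Add demo video (stored in LFS)"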
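Neither `.gitignore` nor LFS helps when someone stages a large file of an unexpected type. A client-side pre-commit hook can serve as a last line of defense. The sketch below uses a hypothetical helper, `check_staged_sizes`, with an arbitrary 10 MiB default limit. It measures the staged blob, which is what will actually be committed. Files tracked by LFS are staged as small pointer files, so they pass.

```shell
#!/bin/sh
# Sketch of a size check for a pre-commit hook. check_staged_sizes is a
# hypothetical helper; the limit in bytes is its first argument.
# It inspects the staged blob (what will be committed), not the work-tree file.
check_staged_sizes() {
  max_bytes=${1:-10485760}   # default 10 MiB, an arbitrary choice
  # Added, copied or modified paths in the index. Paths with unusual characters
  # are quoted by Git and are not handled here.
  git diff --cached --name-only --diff-filter=ACM |
    (
      status=0
      while IFS= read -r path; do
        size=$(git cat-file -s ":$path")
        if [ "$size" -gt "$max_bytes" ]; then
          echo "pre-commit: $path is $size bytes (limit $max_bytes)" >&2
          status=1
        fi
      done
      exit "$status"
    )
}
```

To use it as a hook, put the function into `.git/hooks/pre-commit` after a `#!/bin/sh` line and end the file with `check_staged_sizes 10485760`. Then make the file executable with `chmod +x .git/hooks/pre-commit`. Git aborts the commit when the hook exits non-zero. `git commit --no-verify` skips it, and `git clone` does not copy `.git/hooks`, so each teammate has to install the hook themselves.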