Removing a Large File from History

You accidentally committed a 500 MB video, a compiled binary, or a dataset to your repository. Even after you delete the file and commit again, it still lives in Git's object database, and every clone downloads it. This page shows how to permanently erase large files from the entire commit history.

Why deleting the file is not enough

The file is gone from HEAD but not from history

Bash
# You deleted the file and committed
git rm dataset.csv
git commit -m "Remove large dataset"
git push origin main

# But the repo is still huge
git count-objects -vH
# size-pack: 487.3 MiB   ← the file is still in .git/objects

# And anyone who clones gets all 487 MB
git clone https://github.com/user/repo.git  # slow!
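
To see this for yourself, the sketch below builds a throwaway repository, commits a file, deletes it, and then checks where the file's blob can still be found. The temporary directory, the 1 MiB stand-in file, and the demo identity are assumptions made for illustration; nothing here touches a real repository.

```shell
#!/bin/sh
# Sketch: show that a deleted file's blob is still stored in history.
# The repo location, file size, and identity are made up for this demo.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit a 1 MiB stand-in for the "large" file, then delete it again
head -c 1048576 /dev/zero > dataset.csv
git add dataset.csv
git commit -q -m "Add dataset"
blob=$(git rev-parse HEAD:dataset.csv)

git rm -q dataset.csv
git commit -q -m "Remove large dataset"

# The path is gone from the tree of HEAD ...
in_head=$(git ls-tree -r --name-only HEAD | grep -c '^dataset.csv$' || true)

# ... but the blob is still reachable through the older commit
in_history=$(git rev-list --objects --all | grep -c ' dataset.csv$' || true)
blob_size=$(git cat-file -s "$blob")

echo "matching paths in HEAD:    $in_head"
echo "matching paths in history: $in_history"
echo "blob $blob still stored, $blob_size bytes"
```

The deletion commit only changes what HEAD points to. The earlier commit still references the blob, so Git has to keep it and ship it with every clone.
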
Step 1: Find the large files

Find the top 10 largest objects in the repo

Bash
# List the 10 largest blobs in history, biggest first (numfmt requires GNU coreutils)
git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | sed -n 's/^blob //p' \
  | sort --numeric-sort --key=2 --reverse \
  | head -10 \
  | awk '{ printf "%-60s %s\n", $3, $2 }' \
  | numfmt --field=2 --to=iec

# Example output:
# videos/demo.mp4                                              487M
# data/training-set.csv                                        34M
# assets/background-hd.png                                    12M

Simpler: check which commits added large files

Bash
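# Note: this ranks commits by the number of files they changed, which is only a rough proxy for size
git log --all --stat | grep -E '^[[:space:]]+[0-9]+ file' | sort -rn | head -5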
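
If numfmt is not available, or you want a size threshold rather than a top 10, the same idea can be sketched with awk alone. The helper names list_big_blobs and first_commit_adding are hypothetical, chosen for this example; both are meant to be run from inside the repository you are inspecting.

```shell
#!/bin/sh
# Sketch: threshold-based search for large blobs, plus a lookup of the commit
# that introduced a path. Both function names are hypothetical helpers.

# Print "<size-in-bytes> <path>" for every blob in history of at least $1 bytes
list_big_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v min="$1" '$1 == "blob" && $2 + 0 >= min + 0 { print substr($0, 6) }' |
        sort -rn
}

# Print the hash of each commit (on any ref) that added the path $1
first_commit_adding() {
    git log --all --diff-filter=A --format='%H' -- "$1"
}
```

For example, list_big_blobs 10485760 lists every blob of 10 MiB or more, and first_commit_adding videos/demo.mp4 shows where that file first entered history.
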
Step 2: Remove with git filter-repo

git filter-repo is the recommended tool: Git's own documentation for git filter-branch points users to it instead. It is significantly faster and safer than filter-branch.

Install git filter-repo

Bash
# macOS
brew install git-filter-repo

# pip (any platform)
pip install git-filter-repo

# Check the version
git filter-repo --version
Warning
`git filter-repo` rewrites history: every commit that contained the removed file, and every commit that descends from one, gets a new SHA hash. Every teammate must re-clone or carefully reset their local copy. Coordinate with your team before running this.
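
Because the rewrite is hard to undo once the old objects are pruned, it can help to take a snapshot first. A minimal sketch, assuming a hypothetical helper named backup_repo and a bundle path of your choosing: git bundle packs every ref into a single file that can later be cloned from like a remote.

```shell
#!/bin/sh
# Sketch: snapshot all refs of the current repository into one bundle file.
# backup_repo is a hypothetical helper name; $1 is the bundle file to write.
backup_repo() {
    bundle=$1
    # Pack every branch and tag, then check that the bundle is usable
    git bundle create "$bundle" --all >/dev/null 2>&1 &&
        git bundle verify "$bundle" >/dev/null 2>&1
}
```

Run it from the original repository before rewriting, for example backup_repo ../repo-before-rewrite.bundle. If something goes wrong, git clone ../repo-before-rewrite.bundle restored brings the old history back.
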

Remove a specific file from all history

Bash
# IMPORTANT: work on a fresh clone (filter-repo requires a clean state)
git clone https://github.com/user/repo.git repo-clean
cd repo-clean

# Remove the file from every commit in history
git filter-repo --path videos/demo.mp4 --invert-paths
# Rewrite 1432 commits... 100%
# Ref 'refs/heads/main' was rewritten

# Verify the file is gone
git log --all --full-history -- videos/demo.mp4
# (no output — it is gone from history)

# Check the new repo size
git count-objects -vH
# size-pack: 2.1 MiB   ← down from 487 MB

Remove multiple files or a whole directory

Bash
# Remove multiple specific files
git filter-repo --path videos/demo.mp4 --path data/training-set.csv --invert-paths

# Remove an entire directory
git filter-repo --path videos/ --invert-paths

# Remove by glob pattern (for regular expressions, use --path-regex instead)
git filter-repo --path-glob '*.mp4' --invert-paths
Step 3: Force-push the rewritten history

Push all rewritten refs to the remote

Bash
# filter-repo removes the 'origin' remote as a safety measure
# Re-add it
git remote add origin https://github.com/user/repo.git

# Force-push all branches and tags
# (plain --force: the re-added remote has no remote-tracking refs for --force-with-lease to check against)
git push origin --all --force
git push origin --tags --force
Warning
Force-pushing rewrites the remote history. Open pull requests still point at the old commits, so expect to close them and recreate them from the rewritten branches.
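
After the push it is worth confirming that the remote holds exactly the rewritten refs. The sketch below compares local branches and tags with what the remote advertises; remote_matches_local is a hypothetical helper name, and the remote argument can be a name such as origin or a URL.

```shell
#!/bin/sh
# Sketch: succeed only if the remote's branches and tags match the local ones.
# remote_matches_local is a hypothetical helper, not a Git command.
remote_matches_local() {
    remote=$1
    # "<hash> <refname>" for every local branch and tag
    local_refs=$(git for-each-ref --format='%(objectname) %(refname)' refs/heads refs/tags | sort)
    # Same format from the remote; drop the peeled "^{}" lines of annotated tags
    remote_refs=$(git ls-remote --heads --tags "$remote" | tr '\t' ' ' | grep -v -F '^{}' | sort)
    [ "$local_refs" = "$remote_refs" ]
}
```

For example: remote_matches_local origin && echo "remote is up to date". A failure means a branch or tag on the remote still points at a different commit than your rewritten clone, or exists on only one side.
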
Step 4: Prune the remote cache

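Even after force-pushing, hosting platforms can keep the old, now-unreachable objects for a period (GitHub reportedly for up to 90 days), and old commits may remain reachable by their SHA or through cached pull request views. Contact support if you need immediate removal of sensitive files from a public repository.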

Force garbage collection locally

Bash
git reflog expire --expire=now --all
git gc --prune=now --aggressive
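
Both commands are needed because the reflog keeps otherwise unreachable commits alive. The following sketch demonstrates this in a throwaway repository: a plain gc leaves the blob in place, and it only disappears once the reflog has been expired. The temporary directory, branch name, and file are assumptions for the demo.

```shell
#!/bin/sh
# Sketch: an abandoned blob survives "git gc" until the reflog is expired.
# Everything happens in a temporary demo repository.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "keep me" > README
git add README
git commit -q -m "Initial commit"

# Commit a 1 MiB file on a side branch, then delete that branch
git checkout -q -b scratch
head -c 1048576 /dev/urandom > big.bin
git add big.bin
git commit -q -m "Add big file"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q -
git branch -q -D scratch

# No branch references the blob any more, but HEAD's reflog still does
git gc -q --prune=now
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi

echo "after plain gc:           $before"
echo "after reflog expire + gc: $after"
```
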
Alternative: BFG Repo Cleaner

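BFG is a Java-based tool that is convenient for simple use cases like "remove all files larger than 50 MB". By default it does not touch the contents of your latest commit, so delete the unwanted files and commit before running it.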

Using BFG

Bash
# Download BFG jar from https://rtyley.github.io/bfg-repo-cleaner/
# Requires Java

# Clone a bare repo (required by BFG)
git clone --mirror https://github.com/user/repo.git repo.git
cd repo.git

# Remove all blobs larger than 50 MB
java -jar bfg.jar --strip-blobs-bigger-than 50M

# Remove a specific file from history
java -jar bfg.jar --delete-files demo.mp4

# Clean up and push
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push
Comparison: git filter-repo vs BFG

| Feature | git filter-repo | BFG Repo Cleaner |
|---|---|---|
| Language | Python | Java / Scala |
| Speed | Very fast | Fast |
| Flexibility | Very high (custom callbacks, regex) | Good for common cases |
| Recommended by Git's documentation | Yes (in place of filter-branch) | No (popular third-party tool) |
| Remove by size | Built-in: --strip-blobs-bigger-than 50M | Built-in: --strip-blobs-bigger-than 50M |
| Remove sensitive text | Yes: --replace-text | Yes: --replace-text |
| Installation | pip install git-filter-repo | Download JAR, needs Java |

What teammates need to do after a history rewrite
  1. Delete their local clone of the repository.

  2. Re-clone from the remote: git clone https://github.com/user/repo.git.

  3. Re-apply any local branches they had: git checkout -b my-branch origin/my-branch.

Warning
Teammates should NOT run `git pull` after a history rewrite: it merges the old history back into the rewritten one, which reintroduces the removed file and leaves a diverged history that is difficult to untangle. A fresh clone is the simplest safe path.
Prevention: stop large files before they are committed

Add common large file types to .gitignore

Bash
# .gitignore
*.mp4
*.avi
*.mov
*.zip
*.tar.gz
*.csv
*.parquet
*.pkl
*.bin
*.exe
*.dmg
node_modules/
dist/
build/
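
To confirm that a pattern actually catches a given path, git check-ignore can be wrapped in a small test. This is a sketch; is_ignored is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if the current .gitignore rules would ignore the path $1.
# is_ignored is a hypothetical helper; git check-ignore exits 0 for ignored paths.
is_ignored() {
    git check-ignore -q -- "$1"
}
```

For example: is_ignored videos/demo.mp4 && echo "ignored". Keep in mind that .gitignore only affects untracked files; a file that is already tracked stays tracked until you run git rm --cached on it.
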

Use Git LFS for files that legitimately belong in the repo

Bash
# Install Git LFS
git lfs install

# Track specific file types with LFS
git lfs track "*.mp4"
git lfs track "*.psd"

# This creates a .gitattributes file
git add .gitattributes
git commit -m "Configure Git LFS for large files"

# Now large files are stored in LFS, not in the Git object database
git add videos/demo.mp4
git commit -m "Add demo video (stored in LFS)"
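
Before committing, you can check that a path really is routed through LFS by asking Git which filter attribute applies to it. A minimal sketch, assuming a hypothetical helper named is_lfs_tracked; it only reads .gitattributes and works even where the git-lfs binary is not installed.

```shell
#!/bin/sh
# Sketch: succeed if .gitattributes assigns the LFS filter to the path $1.
# is_lfs_tracked is a hypothetical helper name.
is_lfs_tracked() {
    # git check-attr prints "<path>: filter: <value>"
    [ "$(git check-attr filter -- "$1")" = "$1: filter: lfs" ]
}
```

For example: is_lfs_tracked videos/demo.mp4 || echo "not in LFS, would be stored as a regular blob". Note that tracking a pattern only affects new commits; files already committed as regular blobs stay in history until it is rewritten.
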
Tip
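Set up a pre-commit hook that rejects files over a size threshold before they ever reach the repository. For example, store the limit with `git config hooks.maxfilesize 5242880` (5 MiB) and have the hook script read it and check the size of each staged file. `hooks.maxfilesize` is not a built-in Git setting; it only has an effect if your hook reads it.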
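
One way such a hook could look is sketched below. The installer function install_size_hook is a hypothetical helper, hooks.maxfilesize is a custom key that only this script reads, and the 5 MiB default is an assumption; file names containing unusual characters are not handled.

```shell
#!/bin/sh
# Sketch: install a pre-commit hook that rejects oversized staged files.
# install_size_hook is a hypothetical helper; run it inside the repository.
install_size_hook() {
    hook=$(git rev-parse --git-path hooks/pre-commit)
    mkdir -p "$(dirname "$hook")"
    cat > "$hook" <<'EOF'
#!/bin/sh
# Reject staged files larger than hooks.maxfilesize bytes (default 5 MiB).
limit=$(git config --get hooks.maxfilesize || echo 5242880)
git diff --cached --name-only --diff-filter=ACM | {
    bad=0
    while IFS= read -r path; do
        # Size of the staged version; skip entries that are not blobs
        size=$(git cat-file -s ":$path" 2>/dev/null) || continue
        if [ "$size" -gt "$limit" ]; then
            echo "error: $path is $size bytes, limit is $limit" >&2
            bad=1
        fi
    done
    exit "$bad"
}
EOF
    chmod +x "$hook"
}
```

Hooks live in each clone's own Git directory and are not shared by cloning, so every teammate has to install the hook themselves. A contributor can still bypass it with git commit --no-verify.
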