
Managing Repository Size

A Git repository that started at a few megabytes can balloon to gigabytes over time. Large repositories slow down clones and fetches, which must transfer more data, and maintenance operations such as git gc, which must process every object. Understanding why repositories grow, and how to diagnose and fix the problem, is an important skill for any team maintaining a long-lived codebase.

Why Repositories Grow
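  • Binary files committed without LFS: images, videos, compiled assets, design files. Each version is stored as a separate object, and binary formats rarely compress or delta well, so every edit adds close to the file's full size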

  • Build artifacts committed accidentally: node_modules/, dist/, .class files, compiled binaries

  • Secrets and credential files: .env files committed before .gitignore was configured

  • Large files deleted in a later commit — the file is gone from the working tree but still lives in history

  • Log files, database dumps, or large data files committed for convenience

  • Merged feature branches with large temporary files that were never cleaned up
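The point about deleted files is easy to verify in a throwaway repository. The sketch below assumes a POSIX shell with /dev/urandom available (Linux or macOS); the function name demo_deleted_file_persists and the 5 MB file size are illustrative choices. It commits a large random file, deletes it in the next commit, repacks, and prints the pack size, which still includes the deleted file.

```shell
# Hypothetical demo: creates and deletes its own temporary repository.
# Runs in a subshell "( ... )" so the cd does not affect your shell.
demo_deleted_file_persists() (
  dir=$(mktemp -d) || exit 1
  cd "$dir" || exit 1
  git init -q . || exit 1
  git config user.name "demo"
  git config user.email "demo@example.com"

  # Commit about 5 MB of incompressible random data...
  head -c 5242880 /dev/urandom > big.bin
  git add big.bin && git commit -q -m "add big.bin"
  # ...then delete it in the next commit.
  git rm -q big.bin && git commit -q -m "remove big.bin"

  # Repack: the blob is still reachable from the first commit, so it is kept.
  git gc -q
  [ -e big.bin ] || echo "big.bin is no longer in the working tree"
  # Without -H, size-pack is reported in KiB (expect roughly 5120 here).
  git count-objects -v | awk '/^size-pack:/ { print "size-pack (KiB):", $2 }'

  cd / && rm -rf "$dir"
)
```

After sourcing it, run demo_deleted_file_persists from any directory; it never touches the repository you are working in.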

Diagnosing Repository Size

Check total object count and size

Bash
git count-objects -vH

Example output

Text
count: 0
size: 0 bytes
in-pack: 24891
packs: 1
size-pack: 1.24 GiB   ← disk space used by packfiles (the bulk of the repo)
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

Total .git directory size

Bash
du -sh .git
Finding the Biggest Objects in History

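The most reliable way to identify what is bloating your repository is to combine git rev-list (to enumerate every object reachable from any ref) with git cat-file (to look up each object's type and size). The pipeline works whether objects are loose or packed, and lists the largest blobs (file contents) in your entire history. Note that %(objectsize) is the uncompressed size; the space an object occupies inside a packfile is often smaller.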

Find the 20 largest objects in git history

Bash
# cut -f3- keeps the whole path, even when it contains spaces
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  grep '^blob ' |
  sort -k3,3 -n -r |
  head -20 |
  cut -d' ' -f3-

Example output — showing culprits

Text
156293120 videos/product-demo.mp4
 89478485 design/mockups-v3.psd
 52428800 build/app-release.apk
 41943040 data/dump-2023-01-15.sql
 31457280 node_modules.tar.gz
 24117248 assets/hero-video.mov
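A variant of the same pipeline can also report how much space each blob takes inside the repository, using the %(objectsize:disk) format atom of git cat-file. Compressed and delta-encoded sizes are closer to what clones and disk usage actually pay for, so a large text file that compresses well may matter less than its raw size suggests. This is a sketch; the function name largest_blobs is hypothetical.

```shell
# Hypothetical helper: print the N largest blobs in history (default 20) as
#   <uncompressed size> <on-disk size> <path>
# %(objectsize:disk) is the stored size; for a delta-compressed object it is
# the size of the delta, so treat that column as an approximation.
largest_blobs() {
  n=${1:-20}
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(objectsize:disk) %(rest)' |
    awk '$1 == "blob"' |
    sort -k3,3 -n -r |
    head -n "$n" |
    cut -d' ' -f3-
}
```

For example, largest_blobs 10 lists the ten largest blobs; a file whose second column is far below its first compresses well and is a lower-priority cleanup target.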
Tip
Install `git-sizer` (GitHub's open-source tool) for an even more detailed analysis. It identifies not just large objects but oversized commits, deeply nested trees, and other repository health metrics in a single report: `brew install git-sizer`.
Finding When a File Was Added

Find all commits that ever touched a specific file

Bash
git log --all --full-history -- "videos/product-demo.mp4"

# Show the commit that first added the file
git log --all --full-history --diff-filter=A -- "videos/product-demo.mp4"

Example output

Text
commit 7d3e5f2a1b9c4d6e8f0a2b4c6d8e0f1a3b5c7d9e
Author: Jane Smith <jane@example.com>
Date:   Tue Mar 14 11:23:45 2023

    feat: add product demo video to assets
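Paths are not the only handle. If you keep the %(objectname) column from the size pipeline, git log --find-object (available in Git 2.16 and later) finds every commit that added or removed that exact blob, regardless of which path it was stored under. A minimal sketch; the helper name commits_touching_blob is hypothetical:

```shell
# Hypothetical helper: list commits on any ref that added or removed the given
# blob, with the matching path shown by --name-status (A = added, D = deleted).
# $1 = full blob object ID, e.g. from the %(objectname) column.
commits_touching_blob() {
  git log --all --format='%h %ad %s' --date=short --name-status --find-object="$1"
}
```

Usage: commits_touching_blob <blob-id>, where <blob-id> is the object name reported for the file you are investigating.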
GitHub Size Guidelines

| Threshold | Behavior |
|---|---|
| < 1 GB total | Ideal: fast to clone, no issues |
| 1 GB total | GitHub's recommended upper limit; start investigating |
| 5 GB total | GitHub will email a warning about repository size |
| > 5 GB total | May experience degraded performance and clone failures |
| 50 MB per file | GitHub shows a warning when pushing files above this size |
| 100 MB per file | GitHub hard limit on individual file pushes (enforced by a pre-receive hook) |
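To keep an eye on these thresholds in a script or CI job, you can read the numbers from git count-objects -v, which reports sizes in KiB when -H is omitted. The sketch below is illustrative: the function name repo_size_status is hypothetical, it adds loose and packed object sizes, and it treats 1 GB and 5 GB as binary units (GiB), which is an assumption. It measures the local clone, which can differ from the size GitHub reports for the remote.

```shell
# Hypothetical size check. Without -H, git count-objects -v reports
# "size" (loose objects) and "size-pack" (packfiles) in KiB.
repo_size_status() {
  stats=$(git count-objects -v) || return 1
  loose_kib=$(printf '%s\n' "$stats" | awk '/^size:/ { print $2 }')
  pack_kib=$(printf '%s\n' "$stats" | awk '/^size-pack:/ { print $2 }')
  total_kib=$((loose_kib + pack_kib))

  # Assumption: 1 GB and 5 GB are treated as GiB (1024 * 1024 KiB each).
  if [ "$total_kib" -ge $((5 * 1024 * 1024)) ]; then
    echo "5 GB or more ($total_kib KiB): expect degraded performance"
  elif [ "$total_kib" -ge $((1024 * 1024)) ]; then
    echo "1 GB or more ($total_kib KiB): start investigating"
  else
    echo "under 1 GB ($total_kib KiB): OK"
  fi
}
```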

Option 1: git filter-repo (Recommended)

git filter-repo is a fast, modern replacement for git filter-branch, whose own documentation warns about its pitfalls and recommends git filter-repo instead. It rewrites history to completely remove specific files or paths from every commit they ever appeared in.

Install git filter-repo

Bash
pip install git-filter-repo
# or
brew install git-filter-repo

Remove a specific file from all history

Bash
# IMPORTANT: work on a fresh mirror clone; filter-repo refuses to rewrite
# a repository that does not look freshly cloned (unless you pass --force)
git clone --mirror https://github.com/user/repo.git
cd repo.git

# Remove the large file from all history
git filter-repo --path videos/product-demo.mp4 --invert-paths

# Or remove a whole directory
git filter-repo --path node_modules --invert-paths

# Or remove several paths in a single run
git filter-repo --path videos/product-demo.mp4 --path data/dump.sql --invert-paths
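Before force-pushing, it is worth confirming that the rewrite actually removed what you intended. A minimal sketch, assuming it is run inside the rewritten clone; the function name path_gone_from_history is hypothetical. It uses the same git log --all --full-history form shown earlier to look for any commit that still touches the path:

```shell
# Hypothetical post-rewrite check: exit status 0 only if no commit reachable
# from any ref still touches the given path (file or directory).
path_gone_from_history() {
  # --full-history keeps side branches from being skipped by history simplification.
  first=$(git log --all --full-history --format=%H -- "$1" | head -n 1)
  if [ -n "$first" ]; then
    echo "still in history (e.g. commit $first): $1" >&2
    return 1
  fi
  echo "not found in any commit: $1"
}
```

For example, path_gone_from_history videos/product-demo.mp4 && path_gone_from_history node_modules succeeds only when both paths are gone.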
Warning
Removing a file from history with `git filter-repo` rewrites every commit that contained that file — all commit SHAs change. This is a **destructive, irreversible operation** on the repository history. After running it, you must force-push to the remote, and every collaborator must delete their local clone and re-clone from scratch. Old clones with the original history should not be pushed. Coordinate this with your entire team before proceeding.

Force push rewritten history

Bash
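# After filter-repo, push all branches and tags.
# filter-repo may remove the 'origin' remote; re-add it if it is missing.
git remote get-url origin >/dev/null 2>&1 || git remote add origin https://github.com/user/repo.git
git push --force --all origin
git push --force --tags origin

# Optional: reclaim local disk space (filter-repo normally repacks already).
# The remote's storage only shrinks once the host runs its own garbage collection.
git gc --prune=now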
Option 2: Git LFS for Future Large Files

For new large files going forward, use Git LFS. Git then stores only a small pointer file in the repository while the file contents live on a separate LFS server, so binary data stays out of the Git object store but is still versioned. See the Git LFS page for full details.

Prevention is Better than Cleanup
  • Add a comprehensive .gitignore before the first commit — include node_modules/, dist/, build/, *.log, *.env

  • Use .env.example (committed) and .env (gitignored) for environment variables

  • Run git lfs track "*.psd" "*.mp4" "*.ai" before adding any binary assets, and commit the .gitattributes file it writes

  • Add a pre-commit hook that rejects files over a size limit (e.g., 10 MB)

  • Review git status and git diff --stat before every commit

  • Set up GitHub's secret scanning and size warnings on your organization
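The .gitignore and LFS rules from the list above can be checked before anything is committed. git check-ignore exits with status 0 when a path is ignored, and git check-attr prints "<path>: filter: lfs" when an LFS pattern in .gitattributes matches. The helper names check_ignored and check_lfs are hypothetical:

```shell
# Hypothetical helpers to verify .gitignore and Git LFS rules.
# Run them from inside the repository's working tree; paths need not exist.
check_ignored() {
  for p in "$@"; do
    # -q: no output; exit status 0 means the path is ignored.
    if git check-ignore -q -- "$p"; then
      echo "ignored:     $p"
    else
      echo "NOT ignored: $p"
    fi
  done
}

check_lfs() {
  for p in "$@"; do
    # check-attr prints "<path>: filter: <value>"; LFS patterns set filter=lfs.
    if git check-attr filter -- "$p" | grep -q ': filter: lfs$'; then
      echo "LFS:     $p"
    else
      echo "not LFS: $p"
    fi
  done
}
```

Usage: check_ignored .env debug.log and check_lfs design/mockups-v3.psd. By default git check-ignore does not report files that are already tracked, so a .env file committed before the ignore rule existed shows up as NOT ignored.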

Pre-commit hook to reject large files

Bash
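#!/bin/sh
# Save as .git/hooks/pre-commit and make it executable (chmod +x)
LIMIT=10485760  # 10 MB in bytes

# Staged paths, excluding deletions; core.quotePath=false keeps non-ASCII
# names unquoted, and reading line by line keeps names with spaces intact.
git -c core.quotePath=false diff --cached --name-only --diff-filter=d |
while IFS= read -r FILE; do
  # Size of the staged blob (":path" is the index version, not the working
  # tree copy). LFS-tracked files are staged as small pointers, so they pass.
  SIZE=$(git cat-file -s ":$FILE" 2>/dev/null) || continue
  if [ "$SIZE" -gt "$LIMIT" ]; then
    echo "Error: $FILE is $((SIZE / 1048576)) MB, over the 10 MB limit." >&2
    echo "Track it with Git LFS (git lfs track '$FILE'), then git add it again." >&2
    exit 1  # ends the loop; the pipeline's non-zero status aborts the commit
  fi
done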
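Because the hook only runs on real commits, a quick way to check it is a scratch repository. The sketch below assumes the hook has been saved to a file whose path is the function's only argument; the function name smoke_test_size_hook and the 11 MB test file are illustrative. It stages a file just over the 10 MB limit and reports whether the commit was rejected:

```shell
# Hypothetical smoke test for a size-limit pre-commit hook.
# $1 = path to the hook script. Runs in a subshell so the cd does not leak.
smoke_test_size_hook() (
  # Resolve the hook to an absolute path before changing directory.
  hook="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
  dir=$(mktemp -d) || exit 1
  cd "$dir" && git init -q . || exit 1
  git config user.name "hook test"
  git config user.email "hook-test@example.com"
  git config commit.gpgsign false

  # Install the hook; pin core.hooksPath so a global setting cannot bypass it.
  mkdir -p .git/hooks
  cp "$hook" .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit
  git config core.hooksPath "$PWD/.git/hooks"

  # Stage 11 MB of zeros, just over the 10 MB limit, and try to commit.
  head -c 11534336 /dev/zero > big.bin
  git add big.bin
  if git commit -q -m "should be rejected" >/dev/null 2>&1; then
    result="NOT rejected"
  else
    result="rejected"
  fi

  cd / && rm -rf "$dir"
  echo "$result"
)
```

For example, smoke_test_size_hook .git/hooks/pre-commit should print rejected.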