Git Internals: Under the Hood
Most developers use Git as a black box: type git commit and something magical happens. But Git has a beautifully simple internal model. Understanding it transforms you from someone who memorizes commands into someone who can reason about what Git is doing, debug unfamiliar situations, and use advanced features with confidence. Everything in Git is built on a small number of concepts.
The .git Directory
Explore the .git directory
ls -la .git/
.git directory structure
.git/
├── HEAD ← which branch you are on (or detached commit)
├── config ← repository-local git config
├── description ← used by GitWeb, ignored locally
├── COMMIT_EDITMSG ← message from the last commit
├── index ← the staging area (binary file)
├── packed-refs ← many refs packed into one text file
├── hooks/ ← hook scripts (pre-commit, post-merge, etc.)
│   ├── pre-commit.sample
│   └── ...
├── info/
│   └── exclude ← like .gitignore but not tracked
├── logs/
│   ├── HEAD ← reflog for HEAD
│   └── refs/
│       └── heads/
│           └── main ← reflog for main branch
├── objects/ ← all Git objects (blobs, trees, commits, tags)
│   ├── info/
│   ├── pack/ ← packfiles for efficiency
│   ├── a3/ ← loose objects (first 2 hex chars = directory)
│   │   └── f1c2d... ← object file (remaining 38 chars = filename)
│   └── ...
└── refs/ ← human-readable pointers to commits
    ├── heads/
    │   ├── main ← file containing: a3f1c2d...
    │   └── feature/auth
    ├── tags/
    │   └── v1.0.0
    └── remotes/
        └── origin/
            └── main
Git is a Content-Addressable Key-Value Store
At its core, Git is a database. The key is the hash (SHA-1 by default) of the content, prefixed with a short header that records the object's type and size. The value is that same data, compressed with zlib. You put content in and get a hash back. You give Git a hash and get the content back. That's it. Everything else (branches, history, merging) is built on top of this simple foundation.
Git as a key-value store: the plumbing view
# Store any content in Git's object database
hash=$(echo "Hello, Git internals!" | git hash-object --stdin -w)
echo "$hash"
# prints a 40-character object ID, for example:
# 8f14e0b9bbd2c8fc72c99b35d0a5b61e07b19b3c
# Retrieve it back using the hash
git cat-file -p "$hash"
# Hello, Git internals!
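To make the "key is a hash of the content" idea concrete, here is a minimal sketch that reproduces a blob's object ID by hand. It assumes a repository using the default SHA-1 object format and that the sha1sum utility is available (on macOS, shasum would be the likely substitute); the variable names are illustrative only.

```shell
# Work in a scratch directory so no existing repository is involved.
cd "$(mktemp -d)"

content='hello world'

# What Git reports for the 12 bytes "hello world\n" (nothing is stored without -w).
git_sha=$(printf '%s\n' "$content" | git hash-object --stdin)

# Manual equivalent: SHA-1 over the header "blob <size>\0" followed by the content.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')
manual_sha=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)

echo "git:    $git_sha"
echo "manual: $manual_sha"
```

Both lines should print the same ID, which suggests that the object ID depends only on the object type, the size, and the bytes, and not on the filename or timestamp.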
The Four Object Types
| Object Type | What It Stores | Points To |
|---|---|---|
| blob | Raw file content (bytes) | Nothing |
| tree | Directory listing: filename + mode + hash | blobs and other trees |
| commit | Author + committer + message + timestamps + tree hash + parent hashes | One tree, zero or more parent commits |
| tag | Target hash + tagger + message + optional GPG signature | Usually a commit |
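The four types in the table can be observed directly with git cat-file -t. The following is a small sketch in a throwaway repository; the identity values, file name, and tag name are placeholders chosen for illustration.

```shell
# Throwaway repository with a placeholder identity.
cd "$(mktemp -d)" && git init -q .
export GIT_AUTHOR_NAME=Jane GIT_AUTHOR_EMAIL=jane@example.com
export GIT_COMMITTER_NAME=Jane GIT_COMMITTER_EMAIL=jane@example.com

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"
git tag -a v1.0.0 -m "First release"   # annotated tag: creates a tag object

# Ask Git for the type of each object.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:README.md)
tag_type=$(git cat-file -t v1.0.0)
echo "$commit_type $tree_type $blob_type $tag_type"

# Pretty-print the commit and its tree to see what each one stores.
git cat-file -p HEAD
git cat-file -p 'HEAD^{tree}'
```

Note that a lightweight tag (git tag without -a) would not create a tag object; it is only a ref that points straight at the commit.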
The Object Graph
ASCII diagram: how objects relate
commit a3f1c2d
│
├─ author: Jane <jane@example.com>
├─ message: "Add auth module"
├─ parent ──────────────────────→ commit 9e2b0f1 (previous commit)
│
└─ tree 7c4d8a3
   │
   ├── README.md ──→ blob 1a2b3c4 (file content)
   ├── package.json ──→ blob 5d6e7f8 (file content)
   └── src/ ──→ tree 9a0b1c2
       │
       ├── index.ts ──→ blob 2b3c4d5
       └── auth.ts ──→ blob 6e7f8a9
Every commit points to exactly one tree (a snapshot of the entire project). The tree points to blobs (file contents) and sub-trees (subdirectories). The commit also points to its parent commit(s), forming the history chain. A merge commit has two or more parents; the first commit in a repository has none.
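The graph described above can be walked with a few plumbing commands. This sketch builds a two-commit repository loosely modeled on the diagram (the file contents and identity are made up) and follows the pointers from commit to tree to sub-tree and from commit to parent.

```shell
cd "$(mktemp -d)" && git init -q .
export GIT_AUTHOR_NAME=Jane GIT_AUTHOR_EMAIL=jane@example.com
export GIT_COMMITTER_NAME=Jane GIT_COMMITTER_EMAIL=jane@example.com

mkdir src
echo "# demo" > README.md
echo "export {}" > src/index.ts
git add . && git commit -q -m "First commit"
echo "export const auth = true" > src/auth.ts
git add . && git commit -q -m "Add auth module"

# commit -> tree: the snapshot this commit points to
root_tree=$(git rev-parse 'HEAD^{tree}')
git ls-tree "$root_tree"            # README.md is a blob, src is a tree

# tree -> sub-tree: src/ is itself a tree object
src_type=$(git cat-file -t HEAD:src)
git ls-tree HEAD:src

# commit -> parent: the "parent" header names the previous commit
parent=$(git rev-parse 'HEAD^')
parent_line=$(git cat-file -p HEAD | grep '^parent ')

# README.md did not change, so both commits point at the same blob
readme_now=$(git rev-parse HEAD:README.md)
readme_before=$(git rev-parse 'HEAD^:README.md')
```

The last two variables hold the same ID, which illustrates that a new commit reuses the existing objects for everything that did not change.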
How git add Works Internally
When you run git add src/auth.ts, Git conceptually performs these steps:
What git add does under the hood
# What git add src/auth.ts actually does:
# Step 1: Read the file content and compute its SHA-1 hash
sha=$(git hash-object src/auth.ts)
echo "$sha"
# Illustrative output: 6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f
# Step 2: Compress and store the blob in .git/objects/
# It is stored at: .git/objects/6e/7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f
# Step 3: Add the path + hash to the index (staging area)
# .git/index now knows: "src/auth.ts" = blob 6e7f8a9b...
# The plumbing equivalent of git add:
git hash-object -w src/auth.ts        # store the blob
git update-index --add src/auth.ts    # update the index
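One way to check these steps is to look for the loose object file before and after staging. The sketch below assumes a freshly initialized repository with the default on-disk layout; the file name and contents are placeholders.

```shell
cd "$(mktemp -d)" && git init -q .
echo "export const auth = true" > auth.ts

# Compute the blob ID without storing anything (no -w).
sha=$(git hash-object auth.ts)
dir=$(printf '%s' "$sha" | cut -c1-2)     # first 2 hex chars = directory
file=$(printf '%s' "$sha" | cut -c3-)     # remaining chars = filename

before=absent
if [ -f ".git/objects/$dir/$file" ]; then before=present; fi

git add auth.ts

after=absent
if [ -f ".git/objects/$dir/$file" ]; then after=present; fi

# The index now maps the path to that blob (mode, hash, stage, path).
index_entry=$(git ls-files --stage auth.ts)

echo "object before add: $before, after add: $after"
echo "$index_entry"
```

Nothing has been committed at this point: the blob exists in the object database and the index references it, but no tree or commit object has been written yet.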
How git commit Works Internally
What git commit does under the hood
# Step 1: Write a tree object from the current index
TREE=$(git write-tree)
echo "Tree hash: $TREE"
# Tree hash: 7c4d8a3b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f
# Step 2: Create a commit object pointing to the tree
# (assumes HEAD already points at a commit)
PARENT=$(git rev-parse HEAD)
COMMIT=$(git commit-tree "$TREE" -p "$PARENT" -m "Add auth module")
echo "Commit hash: $COMMIT"
# Commit hash: a3f1c2d83e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b
# Step 3: Update HEAD (the current branch) to point to the new commit
git update-ref HEAD "$COMMIT"
# Those three steps are the core of git commit.
# The high-level command adds editor invocation and hook execution on top.
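The steps above assume a parent commit already exists. As a complementary sketch, the same plumbing can create the very first commit of a repository, where commit-tree is called without -p. The identity and file contents here are placeholders.

```shell
cd "$(mktemp -d)" && git init -q .
export GIT_AUTHOR_NAME=Jane GIT_AUTHOR_EMAIL=jane@example.com
export GIT_COMMITTER_NAME=Jane GIT_COMMITTER_EMAIL=jane@example.com

echo "hello" > README.md
git update-index --add README.md      # stage the file (also writes the blob)

tree=$(git write-tree)                                     # index -> tree object
commit=$(git commit-tree "$tree" -m "Initial commit")      # no -p: root commit
git update-ref HEAD "$commit"                              # point the current branch at it

git log --oneline
head=$(git rev-parse HEAD)
subject=$(git log -1 --format=%s)
status=$(git status --porcelain)
```

After these commands, git log shows one commit and git status reports a clean working tree, the same state a plain git add followed by git commit would be expected to leave behind.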
Plumbing vs Porcelain Commands
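Git has two layers of commands: porcelain (user-friendly, output may change) and plumbing (stable, script-friendly, low-level). Conceptually, every porcelain command can be expressed in terms of plumbing; the table below gives rough equivalents rather than exact implementations.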
| Porcelain (user-facing) | Plumbing (low-level equivalent) |
|---|---|
| git add | git hash-object -w + git update-index |
| git commit | git write-tree + git commit-tree + git update-ref |
| git checkout | git read-tree + git checkout-index |
| git status | git diff-index HEAD + git ls-files |
| git log | git rev-list + git cat-file |
| git push | git send-pack + git update-ref |
| git fetch | git fetch-pack + git update-ref |
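As an illustration of one row in the table, the sketch below approximates git log with git rev-list and git cat-file. It is only a rough equivalent under simple assumptions (linear history, unsigned commits, placeholder identity), not how git log is implemented.

```shell
cd "$(mktemp -d)" && git init -q .
export GIT_AUTHOR_NAME=Jane GIT_AUTHOR_EMAIL=jane@example.com
export GIT_COMMITTER_NAME=Jane GIT_COMMITTER_EMAIL=jane@example.com

git commit -q --allow-empty -m "First commit"
git commit -q --allow-empty -m "Second commit"

# Porcelain: list commit IDs reachable from HEAD, newest first.
porcelain=$(git log --format=%H)

# Plumbing: rev-list walks the parent pointers and prints the same IDs.
plumbing=$(git rev-list HEAD)

# For each commit, read the raw object and take the first line after the headers.
for commit in $plumbing; do
  subject=$(git cat-file -p "$commit" | sed '1,/^$/d' | head -n 1)
  echo "$commit $subject"
done
```

The loop prints one line per commit, similar in spirit to git log --oneline but with full-length IDs.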
Content Deduplication
Because objects are identified by their content hash, identical content is stored only once. If two different files happen to have the same bytes, they share a single blob object. If you copy a file across directories, Git does not store it twice. This is one reason Git repositories stay surprisingly compact; packfiles, which compress similar objects against each other, are another.
Demonstrating deduplication
# Create two files with identical content
echo "same content" > file1.txt
echo "same content" > file2.txt
git add file1.txt file2.txt
git commit -m "Two files, same content"
# Inspect the index to see their blob hashes
git ls-files --stage
# 100644 b14df6442... 0 file1.txt
# 100644 b14df6442... 0 file2.txt
#        ^^^^^^^^^^^
# SAME hash! Only one blob object stored.
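Deduplication applies to trees as well as blobs. In this sketch, two directories with identical contents end up sharing one tree object. The directory and file names are arbitrary, and the object count at the end relies on git cat-file --batch-all-objects, which is assumed to be available (it was added in Git 2.6).

```shell
cd "$(mktemp -d)" && git init -q .
export GIT_AUTHOR_NAME=Jane GIT_AUTHOR_EMAIL=jane@example.com
export GIT_COMMITTER_NAME=Jane GIT_COMMITTER_EMAIL=jane@example.com

mkdir a b
echo "same content" > a/file.txt
cp a/file.txt b/file.txt
git add . && git commit -q -m "Two directories, same content"

# Both directories resolve to the same tree object.
tree_a=$(git rev-parse HEAD:a)
tree_b=$(git rev-parse HEAD:b)
echo "a/ -> $tree_a"
echo "b/ -> $tree_b"

# Count the blobs actually stored in the object database.
blob_count=$(git cat-file --batch-all-objects --batch-check | grep -c ' blob ')
echo "blobs stored: $blob_count"
```

Two files and two directories in the working tree are represented by one blob and one sub-tree, plus the root tree and the commit.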
Why This Makes You a Better Git User
Detached HEAD makes sense: HEAD is just a pointer; when it holds a commit hash directly instead of a branch name, that is "detached".
Rebasing is not scary: it creates new commit objects with new parents; the original commits still exist until they are garbage collected.
Fast-forward merges are trivial: Git just moves a branch pointer forward; no new commit object is created.
git reset is predictable: you are moving the current branch pointer to a different commit, optionally updating the index and working tree.
Merge conflicts are structural: they happen when two trees cannot be combined without ambiguity; Git is merging tree objects.
git reflog is your safety net: the reflog records every movement of HEAD and branch pointers, so "lost" commits stay recoverable until their reflog entries expire (90 days by default, 30 days for unreachable commits) and garbage collection removes them.
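Two of the points above, the reflog as a safety net and HEAD as a plain pointer, can be tried out safely in a scratch repository. This sketch assumes the default files-based ref storage (so .git/HEAD is a readable text file) and uses a placeholder identity.

```shell
cd "$(mktemp -d)" && git init -q .
export GIT_AUTHOR_NAME=Jane GIT_AUTHOR_EMAIL=jane@example.com
export GIT_COMMITTER_NAME=Jane GIT_COMMITTER_EMAIL=jane@example.com

git commit -q --allow-empty -m "First commit"
git commit -q --allow-empty -m "Second commit"
lost=$(git rev-parse HEAD)

# Move the branch pointer back; the second commit now looks "lost".
git reset -q --hard HEAD~1
after_reset=$(git rev-parse HEAD)

# The reflog still remembers where HEAD was before the reset.
git reflog -n 3
recovered=$(git rev-parse 'HEAD@{1}')
git reset -q --hard "$recovered"
restored=$(git rev-parse HEAD)

# HEAD as a pointer: a symbolic ref on a branch, a raw hash when detached.
head_on_branch=$(cat .git/HEAD)
git checkout -q --detach
head_detached=$(cat .git/HEAD)
echo "on branch: $head_on_branch"
echo "detached:  $head_detached"
```

The reset never deleted the second commit; it only moved a pointer, and the reflog entry made it possible to move the pointer back.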