
Git Internals: Under the Hood

Most developers use Git as a black box: type git commit and something magical happens. But Git's internal model is small and simple. Understanding it lets you reason about what Git is doing instead of memorizing commands, which makes unfamiliar situations easier to debug and advanced features less intimidating. Almost everything in Git is built on a handful of concepts.

The .git Directory

Every Git repository is an ordinary directory with a `.git` subdirectory that holds all of Git's data. The working tree you see is a checkout of one snapshot from that data. Delete the working tree files and you can recreate any committed version from `.git` (uncommitted changes are lost). Delete `.git` and you lose the entire local history.

Explore the .git directory

Bash

ls -la .git/

.git directory structure

Text

.git/
├── HEAD              ← which branch you are on (or detached commit)
├── config            ← repository-local git config
├── description       ← used by GitWeb, ignored locally
├── COMMIT_EDITMSG    ← message from the last commit
├── index             ← the staging area (binary file)
├── packed-refs       ← many refs packed into one plain-text file
├── hooks/            ← hook scripts (pre-commit, post-merge, etc.)
│   ├── pre-commit.sample
│   └── ...
├── info/
│   └── exclude       ← like .gitignore but not tracked
├── logs/
│   ├── HEAD          ← reflog for HEAD
│   └── refs/
│       └── heads/
│           └── main  ← reflog for main branch
├── objects/          ← all Git objects (blobs, trees, commits, tags)
│   ├── info/
│   ├── pack/         ← packfiles for efficiency
│   ├── a3/           ← loose objects (first 2 hex chars = directory)
│   │   └── f1c2d...  ← object file (remaining 38 chars = filename)
│   └── ...
└── refs/             ← human-readable pointers to commits
    ├── heads/
    │   ├── main      ← file containing: a3f1c2d...
    │   └── feature/auth
    ├── tags/
    │   └── v1.0.0
    └── remotes/
        └── origin/
            └── main
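To check that a few of these files hold what the diagram says, the sketch below builds a throwaway repository and reads HEAD and the branch ref directly. It assumes Git's default files-based ref storage; the scratch directory, the branch name main, and the identity values are placeholders chosen for illustration.

```shell
# Scratch repository in a temporary directory (hypothetical location).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # force the branch name, whatever the local default is
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

# HEAD is a plain-text file naming the current branch.
head_content=$(cat .git/HEAD)
echo "$head_content"
# ref: refs/heads/main

# The branch ref is a plain-text file holding the commit hash.
ref_content=$(cat .git/refs/heads/main)
echo "$ref_content"

# The new commit is a loose object: first 2 hex chars = directory, the rest = filename.
dir=$(echo "$ref_content" | cut -c1-2)
file=$(echo "$ref_content" | cut -c3-)
ls ".git/objects/$dir/$file"
```

After a `git gc` the ref may move into packed-refs and the object into a packfile, so the last two reads are only reliable on a fresh repository like this one.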

Git is a Content-Addressable Key-Value Store

At its core, Git is a database. The key is a hash (SHA-1 by default) of the content prefixed with a short header giving the object's type and size. The value is that header plus the content, compressed with zlib. You put content in and get a hash back. You give Git a hash and get the content back. That's it. Everything else (branches, history, merging) is built on top of this simple foundation.

Git as a key-value store: the plumbing view

Bash

# Store any content in Git's object database and capture the key
hash=$(echo "Hello, Git internals!" | git hash-object --stdin -w)
echo "$hash"
# prints the 40-character hex hash of the new blob

# Retrieve it back using the hash
git cat-file -p "$hash"
# Hello, Git internals!
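The key can also be reproduced without Git. The sketch below hashes the header `blob <size>` plus a NUL byte plus the content and compares the result with `git hash-object`. It assumes the default SHA-1 object format and that a `sha1sum` binary is available (on macOS, `shasum` is the usual substitute).

```shell
content='Hello, Git internals!'

# Hash computed by Git (no -w, so nothing is stored).
git_hash=$(printf '%s\n' "$content" | git hash-object --stdin)

# Size in bytes of exactly what Git sees on stdin, including the trailing newline.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Manual hash: SHA-1 over "blob <size>\0" followed by the content.
manual_hash=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } | sha1sum | cut -d' ' -f1 )

echo "git:    $git_hash"
echo "manual: $manual_hash"
```

Because the header is part of the hashed data, a blob and a tree with the same bytes would still get different keys.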

The Four Object Types

Object Type	What It Stores	Points To
blob	Raw file content (bytes)	Nothing
tree	Directory listing: filename + mode + hash	blobs and other trees
commit	Author + committer + timestamps + message + tree hash + parent hashes	One tree, zero or more parent commits
tag	Target hash + tagger + message + optional GPG signature (annotated tags only)	Usually a commit
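As a quick check of the table, `git cat-file -t` reports the type of any object. The sketch below creates one object of each kind in a scratch repository; the file name, tag name, and identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

mkdir src
echo "export {}" > src/auth.ts
git add src/auth.ts
git commit -q -m "Add auth module"
git tag -a v1.0.0 -m "First release"     # an annotated tag creates a tag object

commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:src/auth.ts)
tag_type=$(git cat-file -t v1.0.0)

echo "$commit_type $tree_type $blob_type $tag_type"
# commit tree blob tag
```

A lightweight tag (created without `-a`) is only a ref pointing at a commit, so `git cat-file -t` on it would report commit instead of tag.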

The Object Graph

ASCII diagram: how objects relate

Text

commit a3f1c2d
  │
  ├─ author: Jane <jane@example.com>
  ├─ message: "Add auth module"
  ├─ parent ──────────────────────→ commit 9e2b0f1 (previous commit)
  │
  └─ tree 7c4d8a3
       │
       ├── README.md    ──→ blob 1a2b3c4  (file content)
       ├── package.json ──→ blob 5d6e7f8  (file content)
       └── src/         ──→ tree 9a0b1c2
                              │
                              ├── index.ts   ──→ blob 2b3c4d5
                              └── auth.ts    ──→ blob 6e7f8a9

Every commit points to exactly one tree (a snapshot of the entire project). The tree points to blobs (file contents) and sub-trees (subdirectories). The commit also points to its parent commit(s), forming the history chain. A merge commit has two or more parents; the first commit in a repository has none.
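The tree and parent pointers are visible in the raw commit object. The sketch below makes two commits in a scratch repository and reads the first two header lines of the second one; file names and identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "v1" > README.md
git add README.md
git commit -q -m "First"
first=$(git rev-parse HEAD)

echo "v2" > README.md
git commit -q -am "Second"

# Raw commit object: "tree ..." first, then one "parent ..." line per parent.
raw=$(git cat-file -p HEAD)
printf '%s\n' "$raw"

tree_line=$(printf '%s\n' "$raw" | sed -n '1p')
parent_line=$(printf '%s\n' "$raw" | sed -n '2p')
```

Running `git cat-file -p "$first"` instead shows no parent line at all, since that is the root commit.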

How git add Works Internally

When you run git add src/auth.ts, Git in effect performs these steps:

What git add does under the hood

Bash

# What git add src/auth.ts actually does:

# Step 1: Read the file content and compute its SHA-1 hash
sha=$(git hash-object src/auth.ts)
# $sha now holds e.g. 6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f (nothing is printed)

# Step 2: Compress and store the blob in .git/objects/
# It is stored at: .git/objects/6e/7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f

# Step 3: Add the path + hash to the index (staging area)
# .git/index now knows: "src/auth.ts" = blob 6e7f8a9b...

# A plumbing equivalent of git add:
sha=$(git hash-object -w src/auth.ts)                           # store the blob, capture its hash
git update-index --add --cacheinfo 100644,"$sha",src/auth.ts    # record mode + hash + path in the index
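The sketch below runs the plumbing route in a scratch repository and then confirms that the index records the same blob hash. The file content is a placeholder, and the mode 100644 assumes a regular non-executable file.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir src
echo "export const login = () => true;" > src/auth.ts

# Store the blob, then register it in the index by mode, hash, and path.
sha=$(git hash-object -w src/auth.ts)
git update-index --add --cacheinfo 100644,"$sha",src/auth.ts

# ls-files --stage prints: <mode> <hash> <stage>  <path>
staged=$(git ls-files --stage src/auth.ts)
echo "$staged"
index_sha=$(echo "$staged" | awk '{print $2}')
```

From here `git status` reports src/auth.ts as a new file staged for commit, the same state `git add` would leave behind.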

How git commit Works Internally

What git commit does under the hood

Bash

# Step 1: Write a tree object from the current index
TREE=$(git write-tree)
echo "Tree hash: $TREE"
# Tree hash: 7c4d8a3b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f

# Step 2: Create a commit object pointing to the tree
PARENT=$(git rev-parse HEAD)    # assumes at least one commit already exists
COMMIT=$(git commit-tree "$TREE" -p "$PARENT" -m "Add auth module")
echo "Commit hash: $COMMIT"
# Commit hash: a3f1c2d83e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b

# Step 3: Update HEAD (the current branch) to point to the new commit
git update-ref HEAD "$COMMIT"

# That is the core of what git commit does.
# The high-level command wraps these three plumbing steps with hook execution and editor invocation.
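The same three steps also work for the very first commit, where there is no parent to pass. The sketch below builds a root commit with plumbing only in a scratch repository; the branch name main and the identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # force the branch name, whatever the local default is
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "hello" > README.md
git add README.md

# Step 1: snapshot the index as a tree object.
tree=$(git write-tree)

# Step 2: root commit, so no -p flag.
commit=$(git commit-tree "$tree" -m "Initial commit")

# Step 3: HEAD is a symbolic ref, so this creates refs/heads/main.
git update-ref HEAD "$commit"

subject=$(git log -1 --format=%s)
echo "$subject"
# Initial commit
```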

Plumbing vs Porcelain Commands

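Git has two layers of commands: porcelain (user-friendly; output may change between versions) and plumbing (low-level, stable, script-friendly). Conceptually, every porcelain command can be expressed in terms of plumbing, although modern Git implements most porcelain directly in C rather than by invoking plumbing commands.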

Porcelain (user-facing)	Plumbing (approximate low-level equivalent)
git add	git hash-object -w + git update-index
git commit	git write-tree + git commit-tree + git update-ref
git checkout	git read-tree + git checkout-index + git symbolic-ref
git status	git diff-index HEAD + git diff-files + git ls-files --others
git log	git rev-list + git cat-file
git push	git send-pack + git update-ref
git fetch	git fetch-pack + git update-ref
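To make one row of the table concrete, the sketch below reproduces the subject lines of `git log` using only `git rev-list` and `git cat-file`. It is an approximation under simple assumptions (unsigned commits, single-line subjects), not how git log is implemented.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First"
echo "v2" > notes.txt
git commit -q -am "Second"

# Porcelain: subjects, newest first.
porcelain=$(git log --format=%s)

# Plumbing: rev-list walks the history, cat-file prints each raw commit.
# The subject is the first line after the blank line that ends the headers.
plumbing=$(git rev-list HEAD | while read -r c; do
  git cat-file -p "$c" | sed -n '/^$/{n;p;q;}'
done)

printf '%s\n' "$plumbing"
# Second
# First
```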

Content Deduplication

Because objects are identified by their content hash, identical content is stored only once. If two different files happen to have the same bytes, they share a single blob object. If you copy a file across directories, Git does not store it twice. This is one reason Git repositories stay compact; packfiles add delta compression between similar objects on top of it.

Demonstrating deduplication

Bash

# Create two files with identical content
echo "same content" > file1.txt
echo "same content" > file2.txt

git add file1.txt file2.txt
git commit -m "Two files, same content"

# Inspect the index to see their blob hashes
git ls-files --stage
# 100644 b14df6442... 0  file1.txt
# 100644 b14df6442... 0  file2.txt
#        ^^^^^^^^^^^
#        SAME hash! Only one blob object stored.
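The object count confirms it. In a fresh scratch repository, the sketch below commits two identical files and counts loose objects: one blob, one tree, and one commit. It assumes no garbage collection has run, which holds for a repository this small.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "same content" > file1.txt
echo "same content" > file2.txt
git add file1.txt file2.txt
git commit -q -m "Two files, same content"

# Both paths resolve to the same blob.
hash1=$(git rev-parse HEAD:file1.txt)
hash2=$(git rev-parse HEAD:file2.txt)

# count-objects prints "<n> objects, <size> kilobytes"; keep only <n>.
count=$(git count-objects | cut -d' ' -f1)
echo "$count"
# 3
```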

Why This Makes You a Better Git User

Detached HEAD makes sense: HEAD is just a pointer; when it points to a commit hash instead of a branch, that is "detached".
Rebasing is not scary: it just creates new commit objects with new parents; the original commits still exist until they are garbage collected.
Fast-forward merges are trivial: Git just moves a branch pointer forward; no new commit object is created.
git reset is predictable: you are just moving a branch pointer to a different commit, optionally updating the index and working tree.
Merge conflicts are structural: Git does a three-way merge of two trees against their common ancestor, and a conflict means both sides changed the same part of a file in different ways.
git reflog is your safety net: the reflog records every pointer movement, so "lost" commits stay recoverable until their reflog entries expire (90 days by default, 30 for commits no longer reachable from the branch tip).
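The reset and reflog points can be tried safely in a scratch repository. The sketch below discards a commit with `git reset --hard` and then restores it from the reflog; file names and identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"
echo "two" > file.txt
git commit -q -am "Second"
lost=$(git rev-parse HEAD)

# Move the branch pointer back; "Second" is no longer reachable from the branch.
git reset -q --hard HEAD~1

# The commit object still exists, and the reflog remembers where HEAD was one move ago.
previous=$(git rev-parse 'HEAD@{1}')

# Move the pointer forward again to recover the commit.
git reset -q --hard "$previous"
recovered=$(git rev-parse HEAD)
cat file.txt
# two
```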

Tip

Run `git cat-file -p HEAD` to see the raw commit object for your current commit. Then run `git cat-file -p 'HEAD^{tree}'` to see its tree (the quotes keep shells such as zsh from interpreting the caret and braces). Then pick a blob hash from that listing and run `git cat-file -p <blob-hash>` to see the raw file content. This three-step exploration shows exactly how your code is stored in Git.