How Git Works

You can use Git productively without knowing how it works under the hood, but a small mental model goes a very long way. Most of the confusion beginners have with Git evaporates once they understand four ideas: snapshots, the three trees, the commit graph, and references (branches and HEAD).

1. Git stores snapshots, not diffs

Most older version control systems store changes as diffs: “file A had these lines added and these removed.” Git is different.

Every commit in Git is a full snapshot of every tracked file at that moment. If a file did not change between two commits, Git stores a pointer to the previous copy instead of duplicating it. If it did change, Git stores the new version.

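That sounds wasteful but is brilliantly efficient in practice. Git zlib-compresses every object, shares identical objects between snapshots, and, when it packs objects into packfiles, stores similar objects as deltas against each other. As a result, a Git repository is often smaller on disk than the same history in older diff-based systems.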
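A minimal sketch for checking the sharing yourself, assuming Git is installed and the script runs in a throwaway directory created with mktemp (the file names and committer identity are placeholders). It commits two files, changes only one, and asks Git which blob each snapshot records for each path using the <commit>:<path> syntax.

```bash
# Throwaway repository with a placeholder identity.
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Snapshot 1: two files.
echo "unchanged" > a.txt
echo "version 1" > b.txt
git add a.txt b.txt
git commit -q -m "First snapshot"

# Snapshot 2: only b.txt changes.
echo "version 2" > b.txt
git commit -q -am "Second snapshot"

# <commit>:<path> resolves to the blob that snapshot records for that path.
a_old=$(git rev-parse HEAD~1:a.txt)
a_new=$(git rev-parse HEAD:a.txt)
b_old=$(git rev-parse HEAD~1:b.txt)
b_new=$(git rev-parse HEAD:b.txt)

echo "a.txt: $a_old -> $a_new"   # same hash: one stored copy, referenced twice
echo "b.txt: $b_old -> $b_new"   # different hashes: a new blob was stored
```

If the two a.txt hashes match, both commits' trees point at the same stored copy of that file.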

Why snapshots matter

Because every commit is a full picture, Git can do operations like “show me this file two years ago” or “restore the entire project to commit X” extremely fast — there is no chain of diffs to walk. It just reads the snapshot.
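A sketch of both operations from the paragraph above, assuming Git 2.23 or later (for git restore) and a throwaway repository; notes.txt and the identity are made-up placeholders. Reading a file from an old snapshot leaves the working directory alone, while git restore --source copies an old snapshot's files into it without moving any branch.

```bash
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "draft" > notes.txt
git add notes.txt
git commit -q -m "Draft"
echo "final" > notes.txt
git commit -q -am "Final"

# Read one file out of the previous snapshot; the working directory is untouched.
old=$(git show HEAD~1:notes.txt)
echo "$old"                       # prints: draft

# Restore every tracked path in the working directory from that snapshot.
git restore --source=HEAD~1 -- .
cat notes.txt                     # prints: draft
```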

2. The three trees: working directory, staging, repository

At any moment, your project exists in three different places at once. Git operations move content between them.

The three trees

```text
┌────────────────────┐    git add    ┌──────────────┐   git commit  ┌────────────┐
│ Working Directory  │ ────────────▶ │ Staging Area │ ────────────▶ │ Repository │
│ (your real files)  │               │   (index)    │               │  (.git)    │
└────────────────────┘ ◀──────────── └──────────────┘ ◀──────────── └────────────┘
                       git restore                    git restore --staged
                       git checkout
```

Working directory: the files you can see and edit in your editor right now.
Staging area (also called the index): a holding pen where you collect the changes you want in the next commit. It lets you commit only some of your changes rather than everything at once.
Repository (the .git folder): the permanent history. Every committed snapshot lives here, content-addressed by its hash.

Almost every Git command is about moving content between these three places. git add moves changes from the working directory to the staging area. git commit moves staged changes into the repository. git restore and friends move things the other way.
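The worked example at the end of this article moves content forward through the trees. The sketch below moves it backwards, assuming Git 2.23 or later and a throwaway repository with a hypothetical app.txt: git restore --staged copies the last commit's version into the staging area, and git restore copies the staging area's version into the working directory.

```bash
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "v1"

echo "v2" > app.txt                              # edit in the working directory
git add app.txt                                  # working directory -> staging area
staged_before=$(git diff --staged --name-only)   # app.txt

git restore --staged app.txt                     # last commit -> staging area (unstage)
staged_after=$(git diff --staged --name-only)    # empty
worktree_after=$(cat app.txt)                    # still v2: working directory untouched

git restore app.txt                              # staging area -> working directory
final=$(cat app.txt)                             # v1: the edit is discarded
git status --short                               # prints nothing: all three trees agree
```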

3. The commit graph

Each commit in Git records three things and is named by a fourth:

A snapshot of the project (technically, a pointer to a tree object).
The parent commit(s): the commit(s) this one came from. Usually one parent; the very first (root) commit has none, and a merge commit usually has two.
Metadata: author, committer, timestamps, and the commit message.
A SHA-1 hash that identifies this commit. The hash is not stored inside the commit: Git computes it from everything above, so any tiny change produces a totally different hash.
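To look at these parts directly, the sketch below (a throwaway repository with a placeholder identity) prints a raw commit object with git cat-file, then re-hashes those raw bytes with git hash-object, which should reproduce the commit's own hash.

```bash
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > f.txt && git add f.txt && git commit -q -m "First"
echo "two" > f.txt && git commit -q -am "Second"

# Raw commit object: "tree", "parent", "author" and "committer" lines,
# then a blank line and the message.
git cat-file -p HEAD

git rev-parse 'HEAD^{tree}'   # the snapshot (tree object) this commit points to
git rev-parse HEAD~1          # its parent commit
git rev-parse HEAD            # the commit's own hash

# Re-hashing the raw commit bytes reproduces the commit's hash exactly.
git cat-file commit HEAD | git hash-object -t commit --stdin
```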

Because each commit points to its parent, commits form a directed acyclic graph (DAG). Most of the time the DAG looks like a tree of branches off a main line:

A small commit graph

```text
           A───B───C   ◀── main
                    \
                     D───E   ◀── feature-x
```

A, B, C are commits on main.
D and E were committed on feature-x, which was created at C.
Each letter stands for a 40-character SHA-1 hash in real life.
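A sketch that rebuilds this graph with empty commits, assuming Git 2.28 or later (for git init -b main) and a throwaway repository. One detail the drawing hides: because main has not moved since feature-x was created at C, git log --graph draws a single straight line, and only the branch labels show where main and feature-x point.

```bash
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

# Empty commits are enough to build a graph.
for c in A B C; do git commit -q --allow-empty -m "$c"; done
git switch -q -c feature-x    # new branch pointer at C; HEAD follows it
for c in D E; do git commit -q --allow-empty -m "$c"; done

# Every branch, with labels showing where each pointer sits.
git log --graph --oneline --decorate --all
```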

Merging brings two lines back together as a new commit with two parents:

After merging feature-x into main

```text
           A───B───C───────M   ◀── main
                    \     /
                     D───E       ◀── feature-x (still here)
```

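M is the merge commit. Its parents are C and E. (Strictly, because main has not moved since feature-x was created, a plain git merge feature-x would simply fast-forward main to E without creating M; git merge --no-ff feature-x forces the merge commit drawn here.)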
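A sketch of the merge above under the same assumptions (empty commits, Git 2.28 or later, a throwaway repository). It passes --no-ff so that Git records a merge commit M instead of fast-forwarding main to E; HEAD^1 and HEAD^2 then name M's first and second parents.

```bash
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
for c in A B C; do git commit -q --allow-empty -m "$c"; done
git switch -q -c feature-x
for c in D E; do git commit -q --allow-empty -m "$c"; done

git switch -q main
# --no-ff records a merge commit even though main could fast-forward to E.
git merge -q --no-ff -m "M" feature-x

git rev-parse HEAD^1    # first parent: C, where main was
git rev-parse HEAD^2    # second parent: E, the tip of feature-x
git log --graph --oneline --decorate --all
```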

4. Branches are just pointers

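This is the single most important thing to understand about Git. A branch is not a copy of the files. A branch is a tiny file (under .git/refs/heads/) containing the 40-character hash of one commit plus a newline: 41 bytes. That is literally all. (Git may later consolidate many such files into a single .git/packed-refs file, but a branch is still just a name mapped to one commit hash.)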

When you create a branch, Git writes one new tiny file.
When you commit on a branch, Git just moves that pointer forward to the new commit.
When you delete a branch, Git deletes the pointer — the commits themselves are still in the repo until garbage collection runs.

That is why Git branching is so fast. In some older version control systems, creating a branch meant copying the project's files; in Git it means writing 41 bytes.
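A sketch that reads the branch file itself, assuming Git 2.28 or later, a fresh throwaway repository, and the default files-based ref storage. A repository using the newer reftable format, or one whose refs have already been packed into .git/packed-refs, has no per-branch file to read.

```bash
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "First"

git branch feature-x                 # creating a branch writes one small file
cat .git/refs/heads/feature-x        # the 40-character hash of the current commit
wc -c < .git/refs/heads/feature-x    # 41: 40 hex characters plus a newline

git commit -q --allow-empty -m "Second"   # HEAD is on main, so only main moves
git rev-parse main feature-x              # two different hashes now
```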

5. HEAD: the “you are here” pointer

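HEAD is a special pointer that says which branch you are currently on. When you make a commit, Git follows HEAD to find which branch pointer to move forward. HEAD can also point directly at a commit instead of a branch; Git calls this a “detached HEAD”, and commits made there belong to no branch.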

HEAD points to a branch which points to a commit

```text
HEAD ──▶ main ──▶ C (commit)
                  A───B───C   ◀── main, HEAD here
                           \
                            D───E   ◀── feature-x
```
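A sketch of HEAD in action, assuming Git 2.28 or later (for git init -b and git switch), the default files-based ref storage, and a throwaway repository. .git/HEAD normally holds a line naming a branch; after git switch --detach it holds a bare commit hash instead, which is the state Git calls detached HEAD.

```bash
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "C"
git branch feature-x

cat .git/HEAD                        # prints: ref: refs/heads/main
git switch -q feature-x
cat .git/HEAD                        # prints: ref: refs/heads/feature-x
git commit -q --allow-empty -m "D"   # Git follows HEAD and moves feature-x, not main

git switch -q --detach main          # point HEAD at main's commit itself
cat .git/HEAD                        # a bare commit hash, with no "ref:" prefix
```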

6. Content is addressed by hash

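Every object Git stores (every file blob, every directory tree, every commit) is identified by the SHA-1 hash of its content, prefixed with a short header giving the object's type and size. Two files with identical content always produce the same hash, so Git never stores them twice. Conversely, if you change even one byte of a file, its hash changes completely. And because each commit's hash covers its tree and its parents' hashes, altering anything in an old commit changes the hash of every commit after it. That is what makes Git tamper-evident.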
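A sketch of content addressing, assuming Git and GNU coreutils' sha1sum are installed (on macOS, shasum plays the same role). It lets Git hash a short string, then reproduces that hash by hand: for file content, the hashed bytes are the header "blob <size>", a NUL byte, and then the content itself.

```bash
# Let Git hash some content without storing it.
git_hash=$(printf 'hello\n' | git hash-object --stdin)

# Reproduce the hash by hand: SHA-1 over "blob <size>", a NUL byte, then the content.
# "hello\n" is 6 bytes long.
manual_hash=$(printf 'blob 6\0hello\n' | sha1sum | cut -d ' ' -f 1)
echo "$git_hash"
echo "$manual_hash"        # same as the line above

# Change one byte and the hash changes completely.
changed_hash=$(printf 'hellp\n' | git hash-object --stdin)
echo "$changed_hash"
```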

Tip

When you see those scary 40-character strings in `git log`, those are SHA-1 hashes. You rarely need to type the whole thing: any unambiguous prefix works, and the first 7 characters are almost always enough in small and medium repositories. In larger ones, Git automatically lengthens the abbreviations it prints.
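A sketch in a throwaway repository (placeholder identity) that prints a commit's full hash and Git's own abbreviation of it, then uses the abbreviation where a full hash would normally go.

```bash
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "First"

full=$(git rev-parse HEAD)            # all 40 characters
short=$(git rev-parse --short HEAD)   # Git's own abbreviation, at least 7 characters
echo "$full"
echo "$short"
git log --oneline                     # log prints the abbreviated form as well

# An unambiguous prefix works anywhere Git expects a commit.
git show --no-patch --format=%s "$short"   # prints: First
```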

Putting it all together

Here is the entire mental model in one sentence:

The whole model in one sentence

Git is a content-addressed snapshot database in which commits form a graph, branches are pointers to commits, HEAD points to the current branch, and you move content between three “trees” (working directory, staging area, repository).

Almost every Git command you will use is some variation of:

Move content between the three trees (add, commit, restore, reset).
Move a branch pointer to a different commit (merge, rebase, reset).
Move HEAD to a different branch (switch, checkout).
Sync the local commit graph with another copy of it (fetch, pull, push, clone).
Inspect the graph and trees (status, log, diff, show, blame).
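As one example of the second category, the sketch below (a throwaway repository; f.txt and the identity are placeholders, and git init -b assumes Git 2.28 or later) uses git reset --soft to move main back one commit. Only the branch pointer moves: the staging area and working directory still hold the newer content, ready to be committed again.

```bash
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > f.txt && git add f.txt && git commit -q -m "v1"
echo "v2" > f.txt && git commit -q -am "v2"

# Move the main pointer back one commit; --soft leaves staging and working dir alone.
git reset -q --soft HEAD~1
git log --oneline          # only the "v1" commit is reachable from main now
git diff --staged          # the v1 -> v2 change is still staged
cat f.txt                  # prints: v2
```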

A worked example

Watch the trees move

```bash
# Start in a new, empty directory and turn it into a repository.
git init
# Create hello.txt in the working directory.
echo "hello" > hello.txt
git status
# → hello.txt is listed under "Untracked files"

git add hello.txt
git status
# → hello.txt is listed under "Changes to be committed" (it is staged)

git commit -m "Add hello.txt"
git status
# → "nothing to commit, working tree clean": everything is in the repo

# Now modify the file
echo "hello again" >> hello.txt
git status
# → hello.txt is listed under "Changes not staged for commit" (modified in the working dir only)

git diff
# Shows the diff between the working dir and staging

git add hello.txt
git diff
# Now shows nothing: the working dir and staging match
git diff --staged
# Shows the diff between staging and the last commit (HEAD)
```

Once these pieces click together, almost every Git command makes intuitive sense. You stop typing commands by rote and start thinking in terms of where the content is and where you want it to go.