How Git Works
You can use Git productively without knowing how it works under the hood, but a small mental model goes a very long way. Most of the confusion beginners have with Git evaporates once they understand a handful of ideas: snapshots, the three trees, the commit graph, references (branches and HEAD), and content addressing.
1. Git stores snapshots, not diffs
Most older version control systems store changes as diffs: “file A had these lines added and these removed.” Git is different.
Every commit in Git is a full snapshot of every tracked file at that moment. If a file did not change between two commits, Git stores a pointer to the previous copy instead of duplicating it. If it did change, Git stores the new version.
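That sounds wasteful, but it is efficient in practice: Git zlib-compresses every object, reuses unchanged files across snapshots, and periodically packs objects into packfiles that store similar objects as deltas against each other. As a result, a Git repository is often smaller on disk than the same history kept in an older diff-based system.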
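To make the storage-sharing idea concrete, here is a toy model in Python, not Git's real on-disk format: each snapshot maps file paths to the hash of the file's content, and a shared object store keeps each distinct content exactly once. The names `ObjectStore` and `take_snapshot` are hypothetical, and the hashing is simplified (plain SHA-1 of the content, without the header Git adds).

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store (hypothetical): each distinct content is kept once."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Storing content that is already present is a no-op.
        self.objects.setdefault(key, data)
        return key


def take_snapshot(store: ObjectStore, files: dict[str, bytes]) -> dict[str, str]:
    """Record every tracked file; unchanged files reuse their existing stored copy."""
    return {path: store.put(content) for path, content in files.items()}


store = ObjectStore()
v1 = take_snapshot(store, {"README": b"hi\n", "main.c": b"int main(void) { return 0; }\n"})
v2 = take_snapshot(store, {"README": b"hi\n", "main.c": b"int main(void) { return 1; }\n"})

print(len(store.objects))            # 3: README is shared, main.c has two versions
print(v1["README"] == v2["README"])  # True: both snapshots point at the same copy
```

Both snapshots are complete (each lists every file), yet the unchanged README costs nothing extra the second time.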
2. The three trees: working directory, staging, repository
At any moment, your project exists in three different places at once. Git operations move content between them.
The three trees
┌────────────────────┐    git add     ┌──────────────┐   git commit   ┌────────────┐
│ Working Directory  │ ─────────────▶ │ Staging Area │ ─────────────▶ │ Repository │
│ (your real files)  │                │   (index)    │                │   (.git)   │
└────────────────────┘ ◀───────────── └──────────────┘ ◀───────────── └────────────┘
                        git restore                  git restore --staged
                        git checkout
Working directory: the files you can see and edit in your editor right now.
Staging area (also called the index): a holding pen where you collect the changes you want in the next commit. It lets you commit only some of your changes rather than everything.
Repository (the .git folder): the permanent history. Every committed snapshot is stored here, addressed by the hash of its content.
Almost every Git command is about moving content between these three places. git add moves changes from the working directory to the staging area. git commit moves staged changes into the repository. git restore and friends move things the other way.
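The movements above can be sketched as a toy Python model in which each tree is a dictionary from path to content. The class `ThreeTrees` and its method names are hypothetical stand-ins for the commands named in the comments; real Git tracks more states (a file can be staged and modified at the same time, for example), so `status` here is deliberately simplified.

```python
from dataclasses import dataclass, field


@dataclass
class ThreeTrees:
    """Hypothetical model of Git's three trees; each maps path -> file content."""
    working: dict[str, str] = field(default_factory=dict)        # files on disk
    index: dict[str, str] = field(default_factory=dict)          # staging area
    commits: list[dict[str, str]] = field(default_factory=list)  # repository history

    def head(self) -> dict[str, str]:
        return self.commits[-1] if self.commits else {}

    def add(self, path: str) -> None:
        # like git add: working directory -> staging area
        self.index[path] = self.working[path]

    def commit(self) -> None:
        # like git commit: the whole staging area becomes a new snapshot
        self.commits.append(dict(self.index))

    def restore(self, path: str) -> None:
        # like git restore <path>: staging area -> working directory
        self.working[path] = self.index[path]

    def restore_staged(self, path: str) -> None:
        # like git restore --staged <path>: last commit -> staging area
        if path in self.head():
            self.index[path] = self.head()[path]
        else:
            self.index.pop(path, None)

    def status(self, path: str) -> str:
        if path not in self.index:
            return "untracked"
        if self.index[path] != self.head().get(path):
            return "staged"
        if self.working.get(path) != self.index[path]:
            return "modified"
        return "clean"


repo = ThreeTrees()
repo.working["hello.txt"] = "hello\n"
print(repo.status("hello.txt"))  # untracked
repo.add("hello.txt")
print(repo.status("hello.txt"))  # staged
repo.commit()
print(repo.status("hello.txt"))  # clean
repo.working["hello.txt"] += "hello again\n"
print(repo.status("hello.txt"))  # modified
repo.restore("hello.txt")
print(repo.status("hello.txt"))  # clean
```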
3. The commit graph
Each commit in Git records three things and is identified by a fourth:
A snapshot of the project (technically a tree object).
The parent commit(s): the commit(s) this one came from. Usually there is one parent; a merge commit has two (or more), and the very first commit has none.
Metadata: author, committer, their timestamps, and the commit message.
A SHA-1 hash that uniquely identifies this commit. The hash is computed from everything above, so any tiny change produces a totally different hash.
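Here is a sketch of how a commit's hash is derived from its contents. It assumes a SHA-1 repository and writes only the minimal headers (tree, parents, author, committer); optional headers such as signatures are omitted, and the identity and timestamp are made up. `commit_hash` is a hypothetical helper, not a Git API.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's ID for an empty tree


def commit_hash(tree: str, parents: list[str], person: str, timestamp: int, message: str) -> str:
    """Serialize a minimal commit object and hash it (SHA-1 object format)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    lines.append(f"author {person} {timestamp} +0000")
    lines.append(f"committer {person} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    # Git hashes "<type> <size>\0" followed by the object body.
    return hashlib.sha1(f"commit {len(body)}\0".encode() + body).hexdigest()


who = "Ada Lovelace <ada@example.com>"  # hypothetical identity
a = commit_hash(EMPTY_TREE, [], who, 1700000000, "First commit")
b = commit_hash(EMPTY_TREE, [a], who, 1700000000, "Second commit")
b2 = commit_hash(EMPTY_TREE, [a], who, 1700000000, "Second commit.")
print(b == b2)  # False: one extra character changes the whole hash
```

Because the parent's hash is part of the child's body, changing anything in `a` would also change `b`.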
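Because each commit points to its parent, commits form a directed acyclic graph (DAG). The graph is acyclic because a commit's hash covers its parents' hashes, so a commit can only point at commits that already existed when it was made. Most of the time the DAG looks like a tree of branches off a main line: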
A small commit graph
A───B───C ◀── main
         \
          D───E ◀── feature-x
A, B, C are commits on main.
D and E were branched off C onto feature-x (A, B, and C are reachable from feature-x as well).
Each letter stands for a 40-character hexadecimal SHA-1 hash in real life.
Merging brings two lines back together as a new commit with two parents:
After merging feature-x into main
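A───B───C───────M ◀── main
         \     /
          D───E ◀── feature-x (still here)
M is the merge commit. Its parents are C and E. Strictly, main has no new commits after C here, so a plain git merge feature-x would simply fast-forward main to E; the picture shows the result of git merge --no-ff feature-x, which always records a merge commit.
4. Branches are just pointers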
This is the single most important thing to understand about Git. A branch is not a copy of the files. In a typical repository, a branch is a 41-byte file under .git/refs/heads/ holding the 40-character hash of one commit plus a newline. That is literally all.
When you create a branch, Git writes one new tiny file.
When you commit on a branch, Git just moves that pointer forward to the new commit.
When you delete a branch, Git deletes the pointer. The commits themselves stay in the repository until garbage collection removes them, which happens only once nothing refers to them any more, not even the reflog.
That is why Git branching is so fast. In some older systems, creating a branch meant copying the project's directory tree; in Git it means writing 41 bytes.
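The three branch operations above can be modeled in a few lines of Python. `BranchModel` is hypothetical, and its commit IDs are stand-ins derived from the parent and message rather than real Git commit hashes; the point is that creating, advancing, and deleting a branch only ever touches one name-to-hash entry.

```python
import hashlib


class BranchModel:
    """Hypothetical model: commits are immutable; a branch is just name -> hash."""

    def __init__(self) -> None:
        self.parents: dict[str, list[str]] = {}  # commit hash -> parent hashes
        self.branches: dict[str, str] = {}       # branch name -> commit hash

    def commit(self, branch: str, message: str) -> str:
        tip = self.branches.get(branch)
        parents = [tip] if tip is not None else []
        # Stand-in ID derived from parents and message (not Git's real commit format).
        cid = hashlib.sha1(repr((parents, message)).encode()).hexdigest()
        self.parents[cid] = parents
        self.branches[branch] = cid  # committing only moves the pointer forward
        return cid

    def create_branch(self, name: str, start: str) -> None:
        self.branches[name] = self.branches[start]  # copies one hash, no files

    def delete_branch(self, name: str) -> None:
        del self.branches[name]  # the commits stay in self.parents


repo = BranchModel()
a = repo.commit("main", "A")
repo.create_branch("feature-x", "main")
d = repo.commit("feature-x", "D")
print(repo.branches["main"] == a)  # True: main did not move
repo.delete_branch("feature-x")
print(d in repo.parents)           # True: commit D is still stored
```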
5. HEAD: the “you are here” pointer
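HEAD is a special pointer that records where you are. Normally it points to a branch, the one you are currently on, and when you make a commit, Git uses HEAD to know which branch to move forward. If you check out a commit directly instead of a branch, HEAD points at that commit itself; this is called a “detached HEAD”.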
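On a branch, the file .git/HEAD typically contains a symbolic reference such as `ref: refs/heads/main`; when HEAD is detached it holds a commit hash instead. The sketch below models both cases with a plain dictionary of refs. The helpers `resolve_head` and `record_commit` are hypothetical, and real Git also handles packed refs and other cases this ignores.

```python
def resolve_head(head_file: str, refs: dict[str, str]) -> str:
    """Return the commit hash that HEAD currently designates (simplified model)."""
    content = head_file.strip()
    if content.startswith("ref: "):
        # Symbolic ref: HEAD names a branch, and the branch names a commit.
        return refs[content[len("ref: "):]]
    # Detached HEAD: the file holds a commit hash directly.
    return content


def record_commit(head_file: str, refs: dict[str, str], new_commit: str) -> str:
    """Point whatever HEAD designates at new_commit; return HEAD's new file content."""
    content = head_file.strip()
    if content.startswith("ref: "):
        # On a branch: move the branch pointer forward; HEAD itself is unchanged.
        refs[content[len("ref: "):]] = new_commit
        return head_file
    # Detached: there is no branch to move, so HEAD itself now holds the new hash.
    return new_commit + "\n"


# Hypothetical hashes, written as repeated letters for readability.
refs = {"refs/heads/main": "c" * 40, "refs/heads/feature-x": "e" * 40}
head = "ref: refs/heads/main\n"  # what .git/HEAD typically contains on main

print(resolve_head(head, refs) == "c" * 40)  # True
head = record_commit(head, refs, "f" * 40)
print(refs["refs/heads/main"] == "f" * 40)   # True: main moved, HEAD still names main
```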
HEAD points to a branch which points to a commit
HEAD ──▶ main ──▶ C (commit)
A───B───C ◀── main, HEAD here
         \
          D───E ◀── feature-x
6. Content is addressed by hash
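Every object Git stores (every file blob, every directory tree, every commit) is identified by the SHA-1 hash of its content, prefixed with a short header giving the object's type and size. Two files with identical content always produce the same hash, so Git never stores them twice. Conversely, if you change even one byte of a file, its hash changes completely, and so do the hashes of every tree and commit that contain it. That is what makes Git history tamper-evident: altering an old file would change the IDs of all the commits that follow it.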
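For file contents (blobs), the object ID can be reproduced with the standard library alone. This sketch assumes a repository using the default SHA-1 object format; repositories created with the SHA-256 format hash the same header and content with SHA-256 instead. `git_blob_hash` is a hypothetical helper name.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Object ID Git gives a file's content in a SHA-1 repository."""
    # Git hashes a header ("blob", a space, the byte length, a NUL) plus the content.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


print(git_blob_hash(b"test content\n"))
# → d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(git_blob_hash(b"test content!\n"))  # a completely different hash
```

You can cross-check the first value with `echo 'test content' | git hash-object --stdin`.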
Putting it all together
Here is the entire mental model in one list. Almost every Git command you will use is some variation of:
Move content between the three trees (add, commit, restore, reset).
Move a branch pointer to a different commit (merge, rebase, reset).
Move HEAD to a different branch (switch, checkout).
Sync the local commit graph with another copy of it (fetch, pull, push, clone).
Inspect the graph and trees (status, log, diff, show, blame).
A worked example
Watch the trees move
# Start: a fresh, empty repository (nothing committed, nothing staged)
git init hello-demo
cd hello-demo
echo "hello" > hello.txt
git status          # → hello.txt is "untracked"
git add hello.txt
git status          # → hello.txt is "staged for commit"
git commit -m "Add hello.txt"
git status          # → working tree clean: everything is in the repo
# Now modify the file
echo "hello again" >> hello.txt
git status          # → hello.txt is "modified" (changes in working dir, not staged)
git diff            # Shows the diff between working dir and staging
git add hello.txt
git diff            # Now shows nothing: the working dir and staging match
git diff --staged   # Shows the diff between staging and the last commit (HEAD)
Once these pieces click together, almost every Git command makes intuitive sense. You stop typing commands by rote and start thinking in terms of where the content is and where you want it to go.