SHA-1 Hashes — Git

SHA Hashes in Git

Every object in Git is identified by a SHA hash — a 40-character hexadecimal string that serves as both the name and the integrity check of that object. Understanding how these hashes work explains why Git history is immutable, how Git detects corruption, and why changing any byte in any commit changes every subsequent commit hash.

What SHA-1 Is

SHA-1 (Secure Hash Algorithm 1) is a cryptographic hash function that takes any input and produces a fixed 160-bit (20-byte, 40 hex character) output. The same input always produces the same output. Different inputs (almost certainly) produce different outputs. Crucially, you cannot reverse the process — given a hash, you cannot reconstruct the input without trying every possibility.

SHA-1 demo: same input, same hash

Bash

# Same content always produces the same hash
echo "hello world" | git hash-object --stdin
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

echo "hello world" | git hash-object --stdin
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad  (identical)

# One character change → completely different hash
echo "hello World" | git hash-object --stdin
# (prints an unrelated 40-character hash)
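
The same properties can be checked on raw SHA-1 with Python's standard hashlib module. This is a minimal sketch: it hashes the bytes directly, without Git's object header, so the digests will not match the `git hash-object` output above. The helper names `sha1_hex` and `bit_difference` are made up for illustration.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hex characters."""
    return hashlib.sha1(data).hexdigest()

def bit_difference(hex_a: str, hex_b: str) -> int:
    """Count how many of the 160 output bits differ between two digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

first = sha1_hex(b"hello world\n")
second = sha1_hex(b"hello world\n")
changed = sha1_hex(b"hello World\n")

print(len(first))         # 40 hex characters = 160 bits
print(first == second)    # True: same input, same digest
print(first == changed)   # False: one changed character, different digest
# For a good hash function, roughly half of the 160 bits are expected to flip.
print(bit_difference(first, changed))
```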

How Git Computes the Hash

Git does not hash just the raw content. It prepends a header containing the object type, a space, the content size in bytes, and a terminating NUL byte, then hashes the combination. The format is exactly:

Hash input format

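Text

<type> <size in bytes>\0<content>

Here \0 is a single NUL byte and \n is a newline.

For a blob containing "hello\n" (6 bytes):
  "blob 6\0hello\n"

For a commit (the body uses real line breaks):
  "commit <size>\0tree <hash>
parent <hash>
author ...
committer ...

<message>"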

Manually reproduce Git's hash computation

Bash

# Git computes: SHA1("blob " + length + "\0" + content)
# Let's verify this manually with Python:
python3 -c "
import hashlib
content = b'hello\n'
# Header: object type, a space, the size in bytes, then a NUL byte
header = f'blob {len(content)}\0'.encode()
sha1 = hashlib.sha1(header + content).hexdigest()
print(sha1)
"
# ce013625030ba8dba906f756967f9e9ca394464a

# Verify Git produces the same hash (echo appends the trailing newline):
echo "hello" | git hash-object --stdin
# ce013625030ba8dba906f756967f9e9ca394464a  ← matches!
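
The same header rule applies to every object type, not only blobs. As a minimal sketch, the function below (the name `git_object_id` is hypothetical) generalizes the computation and checks it against two well-known IDs: the empty blob and the empty tree.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """Hash the object type, a space, the size, a NUL byte, then the content."""
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()

print(git_object_id("blob", b"hello\n"))
# ce013625030ba8dba906f756967f9e9ca394464a
print(git_object_id("blob", b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391  (the empty blob)
print(git_object_id("tree", b""))
# 4b825dc642cb6eb9a060e54bf8d69288fbee4904  (the empty tree)
```

Because the type is part of the hashed bytes, an empty blob and an empty tree get different IDs even though their content is identical.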

Reading the HEAD Hash

Common ways to see and use commit hashes

Bash

# Full 40-character hash of HEAD
git rev-parse HEAD
# a3f1c2d83e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b

# Short hash (Git picks minimum unique length, usually 7-8 chars)
git rev-parse --short HEAD
# a3f1c2d

# Short hash with custom length
git rev-parse --short=10 HEAD
# a3f1c2d83e

# Hash of a specific file at HEAD
git rev-parse HEAD:README.md
# 8f14e0b9bbd2c8fc72c99b35d0a5b61e07b19b3c

# Hash of a branch tip
git rev-parse main
# a3f1c2d83e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b

# Hash of HEAD~3 (3 commits ago)
git rev-parse HEAD~3
# 9e2b0f1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a

Short Hashes

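Git allows you to use abbreviated hashes as long as they are unambiguous within the repository. Older versions default to 7 characters; since Git 2.11 the default length is chosen automatically from the number of objects in the repository, and Git lengthens an abbreviation whenever the shorter form would be ambiguous. For very large repositories (like the Linux kernel with millions of objects), 12 characters may be needed.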

Short hash behavior

Bash

# 7 chars is usually enough for small-medium repos
git show a3f1c2d

# Git finds the full object even with just 7 chars
git cat-file -t a3f1c2d
# commit

# If ambiguous (rare), Git tells you (exact wording varies by version):
# error: short object ID a3f1c2d is ambiguous
# hint: The candidates are:
# hint:   a3f1c2d1 blob
# hint:   a3f1c2d8 commit
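
To see why an abbreviation has to grow as objects accumulate, here is a small sketch that finds the shortest prefix length keeping a set of IDs distinct. It illustrates the idea only and is not Git's actual algorithm; the function name and the three object IDs are invented for the example.

```python
def shortest_unique_length(object_ids, minimum=7):
    """Return the smallest prefix length >= minimum that keeps all IDs distinct.

    Expects a non-empty collection of hex strings.
    """
    ids = set(object_ids)
    longest = max(len(oid) for oid in ids)
    for length in range(minimum, longest + 1):
        prefixes = {oid[:length] for oid in ids}
        if len(prefixes) == len(ids):
            return length
    return longest

# Hypothetical IDs: the first two share their first 7 characters.
ids = [
    "a3f1c2d83e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b",
    "a3f1c2d1b0c9d8e7f6a5b4c3d2e1f0a9b8c7d6e5",
    "9e2b0f1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a",
]

print(shortest_unique_length(ids))                # 8
print(shortest_unique_length([ids[0], ids[2]]))   # 7
```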

Configure minimum abbreviation length

Bash

# Force 12-character short hashes globally (for large repos)
git config --global core.abbrev 12

# Or per-repository
git config core.abbrev 10

Why Changing Any Byte Changes All Downstream Hashes

This is the key property that makes Git history tamper-evident and effectively immutable. A commit's hash covers the parent commit's hash. If you change a commit, you actually create a new commit with a new hash. Every child commit recorded the old parent hash, so each child must be rewritten to point at the new parent, which changes its hash too, cascading through all descendants.

Hash cascade diagram

Text

Original history:
  A (hash: a1b2c3) ← B (hash: d4e5f6, parent=a1b2c3) ← C (hash: 0718a9, parent=d4e5f6)

Change commit A's message:
  A' (hash: 9f8e7d) ← different hash because content changed

Now B must update its parent reference:
  B' (hash: 6c5b4a, parent=9f8e7d) ← B's hash changes too

And C must update:
  C' (hash: 3d2e1f, parent=6c5b4a) ← C's hash changes too

Every commit downstream of A has a new hash.
This is why "amending" a published commit requires force-push.
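
The cascade can be reproduced with a toy model. The sketch below is a simplification: a real commit also records a tree, an author, and a committer, all omitted here, and the function names are hypothetical. Each toy commit hashes only its parent's ID and its message.

```python
import hashlib

def toy_commit_id(message: str, parent_id: str = "") -> str:
    """Hash a simplified commit body containing the parent ID and message."""
    body = f"parent {parent_id}\n\n{message}\n".encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def build_chain(messages):
    """Return the commit IDs for a linear history with the given messages."""
    ids, parent = [], ""
    for message in messages:
        parent = toy_commit_id(message, parent)
        ids.append(parent)
    return ids

original = build_chain(["Add A", "Add B", "Add C"])
rewritten = build_chain(["Add A (reworded)", "Add B", "Add C"])

# Only the first message changed, yet every ID in the chain differs.
for old, new in zip(original, rewritten):
    print(old[:7], "->", new[:7], "changed" if old != new else "same")
```

Changing only the last message leaves the earlier IDs untouched, which is why rewriting the tip of a branch is far less disruptive than rewriting an old commit.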

Collision Concerns and the SHAttered Attack

In 2017, researchers from CWI Amsterdam and Google produced the first known SHA-1 collision: two different PDF files with the same SHA-1 hash (the SHAttered attack). This raised concerns about Git's integrity. However, Git's response was pragmatic: because Git's hash input includes a type prefix and length, and because forging a meaningful Git object collision is vastly harder than forging a general SHA-1 collision, the practical risk remains very low.

Check if your Git version has collision detection

Bash

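# Git 2.13+ ships a collision-detecting SHA-1 implementation by default
git version
# git version 2.43.0

# Git refuses to hash input that shows the signature of a known collision
# attack, failing with an error along the lines of:
# fatal: SHA-1 appears to be part of a collision attack: <hash>
# Note: because of the object header, the two SHAttered PDFs get different
# blob hashes from `git hash-object`, so they do not collide as Git objects.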

SHA-256 Migration

Git 2.29 (released 2020) introduced experimental support for SHA-256 as the hashing algorithm, producing 64-character hashes. SHA-256 has vastly better collision resistance and a longer expected lifespan. Migration is opt-in and not yet widely deployed.

Initialize a new repo with SHA-256

Bash

# Create a new repo using SHA-256 (Git 2.29+)
git init --object-format=sha256 my-sha256-repo
cd my-sha256-repo

echo "hello" > test.txt
git add test.txt
git commit -m "Add test file"   # HEAD does not exist until the first commit
git rev-parse HEAD
# prints a 64-character SHA-256 hash

Note

SHA-256 repositories are not compatible with SHA-1 repositories. You cannot push a SHA-256 repo to GitHub yet (as of 2024). Migration tooling is still maturing.
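
The incompatibility follows directly from how object IDs are derived. The sketch below assumes that a SHA-256 repository hashes the same type-and-size header as a SHA-1 repository and only swaps the hash function, which is my understanding of Git's hash-function-transition design; the function name `object_id` is hypothetical.

```python
import hashlib

def object_id(obj_type: str, content: bytes, algorithm: str = "sha1") -> str:
    """Hash the Git-style header plus content with the chosen algorithm."""
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.new(algorithm, header + content).hexdigest()

sha1_id = object_id("blob", b"hello\n", "sha1")
sha256_id = object_id("blob", b"hello\n", "sha256")

print(len(sha1_id), len(sha256_id))  # 40 64
print(sha1_id)
# ce013625030ba8dba906f756967f9e9ca394464a
```

The same file gets an unrelated name under each algorithm, so every tree and commit that refers to it would also need a new ID.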

The Birthday Problem and Practical Risk

The birthday problem asks: how many items do you need before a collision becomes likely? For SHA-1's 160-bit space, you need about 2^80 objects before an accidental collision becomes probable. The Linux kernel, one of the largest public Git repositories, has on the order of 10 million objects, which is roughly 2^23. You would need about 2^57 times as many objects before a random collision becomes likely. For practical purposes, an accidental SHA-1 collision in a Git repository is essentially impossible in any project you will ever work on.
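
The estimate can be checked with the standard birthday approximation, p ≈ n² / 2^(b+1) for n objects and a b-bit hash, which holds only while p is tiny. The sketch below works in log2 to avoid floating-point underflow; the function names are made up for illustration, and it models accidental collisions only, not crafted attacks.

```python
import math

def log2_collision_probability(num_objects: int, hash_bits: int) -> float:
    """Approximate log2 of P(collision) using p ~ n^2 / 2^(bits + 1).

    Only meaningful when the result is far below 0, i.e. p is tiny.
    """
    return 2 * math.log2(num_objects) - (hash_bits + 1)

def log2_objects_for_even_odds(hash_bits: int) -> float:
    """Approximate log2 of the object count that gives a 50% collision chance."""
    # n ~ sqrt(2 * ln(2) * 2^bits)
    return 0.5 * (math.log2(2 * math.log(2)) + hash_bits)

# 10 million objects in a 160-bit space: p is about 2^-114.5
print(round(log2_collision_probability(10_000_000, 160), 1))  # -114.5

# Even odds of a collision need about 2^80.24 objects
print(round(log2_objects_for_even_odds(160), 2))              # 80.24
```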

SHA-1 vs SHA-256 in Git

Property	SHA-1 (default)	SHA-256 (experimental)
Hash length	160 bits (40 hex chars)	256 bits (64 hex chars)
First available in Git	Git 1.0 (2005)	Git 2.29 (2020)
GitHub support	Full	Not yet (as of 2024)
Known collisions	Yes (SHAttered, 2017)	None (computationally infeasible)
Collision resistance (practical)	Very high (crafted attacks possible)	Extremely high
Backward compatibility	Universal	Incompatible with SHA-1 repos
Recommended for new repos	Yes (pragmatic choice)	Experimental only

Integrity Verification

Verify repository integrity using SHA hashes

Bash

# Check that all objects are valid and uncorrupted
git fsck
# Checking object directories: 100% (256/256), done.
# Checking connectivity: done.

# Verbose output showing all objects checked
git fsck --verbose 2>&1 | head -20

# Check a specific object
git cat-file -e a3f1c2d  # exits 0 if the object exists and is valid, non-zero otherwise
echo $?   # 0 = valid

Tip

Run `git fsck` periodically on important repositories, especially after system crashes or storage failures. It verifies every object's hash matches its content, catching any bit-rot or corruption early.
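
The core of that hash check can be sketched for a single loose object. This is a simplified illustration, not what `git fsck` actually runs: it handles only loose (unpacked) SHA-1 objects, skips all connectivity checks, and uses hypothetical function names. It demonstrates on a temporary directory instead of a real repository.

```python
import hashlib
import os
import tempfile
import zlib

def loose_object_path(objects_dir: str, object_id: str) -> str:
    """Loose objects live at <objects_dir>/<first 2 hex chars>/<remaining chars>."""
    return os.path.join(objects_dir, object_id[:2], object_id[2:])

def verify_loose_object(objects_dir: str, object_id: str) -> bool:
    """Decompress a loose object and check that its SHA-1 matches its name."""
    with open(loose_object_path(objects_dir, object_id), "rb") as handle:
        raw = zlib.decompress(handle.read())
    return hashlib.sha1(raw).hexdigest() == object_id

# Demonstration on a throwaway directory rather than a real .git/objects.
with tempfile.TemporaryDirectory() as objects_dir:
    raw = b"blob 6\0hello\n"
    oid = hashlib.sha1(raw).hexdigest()
    path = loose_object_path(objects_dir, oid)
    os.makedirs(os.path.dirname(path))
    with open(path, "wb") as handle:
        handle.write(zlib.compress(raw))
    intact = verify_loose_object(objects_dir, oid)

    # Simulate corruption by storing different content under the same name.
    with open(path, "wb") as handle:
        handle.write(zlib.compress(b"blob 6\0hellp\n"))
    corrupted = verify_loose_object(objects_dir, oid)

print(intact, corrupted)  # True False
```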

Warning

Never reference Git objects by anything other than their full hash in scripts. Short hashes can become ambiguous as a repository grows. Use `git rev-parse <ref>` to resolve any name to a full hash (40 characters for SHA-1, 64 for SHA-256) before storing or comparing.