JR Trove
Security · May 31, 2026 · 9 min read · Jay Rajput

Hash Functions Demystified: MD5, SHA-1, SHA-256 for Non-Cryptographers

What hash functions actually do, why MD5 is dead for security but fine for checksums, when to use SHA-256 vs SHA-3 vs BLAKE3, and the password-hashing trap that catches every junior developer.

Every working developer touches hash functions: verifying file downloads, generating cache keys, checking duplicates, storing passwords, signing JWTs, building Git commits. And yet a depressingly common root cause of production breaches is still "stored passwords with MD5" or "stored passwords with SHA-256 without salt". The fundamentals matter, and most online explanations either oversimplify ("hashes are one-way") or dive into cryptographic mathematics that few developers need.

This guide is the working developer's reference. What hash functions do, which ones to use in 2026 for which purpose, why MD5 is dead for some uses but perfectly fine for others, and the password-hashing trap that catches every junior developer.

What a hash function does

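A hash function takes any input (a string, a file, a number) and produces a fixed-size output that acts as a fingerprint of that input: unique in practice, although collisions must exist in principle because possible inputs outnumber possible outputs.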

Three core properties:

  1. Deterministic: same input always produces same output. md5("hello") is always 5d41402abc4b2a76b9719d911017c592.

  2. Fixed-size output: regardless of input length, the output is always the same length. MD5 produces 128 bits (32 hex chars). SHA-256 produces 256 bits (64 hex chars). SHA-512 produces 512 bits.

  3. Avalanche effect: changing one bit of input changes ~50% of output bits. md5("hello") and md5("Hello") look completely unrelated.
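
These three properties are easy to observe directly. The following is a minimal sketch using Node's built-in crypto module; the helper names hexDigest and differingBits are invented for this illustration.

```javascript
const crypto = require('node:crypto');

// Hex digest of a UTF-8 string under the named algorithm.
function hexDigest(algorithm, text) {
  return crypto.createHash(algorithm).update(text, 'utf8').digest('hex');
}

// Count how many bits differ between two equal-length hex digests.
function differingBits(hexA, hexB) {
  const a = Buffer.from(hexA, 'hex');
  const b = Buffer.from(hexB, 'hex');
  let count = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) {
      count += x & 1;
      x >>= 1;
    }
  }
  return count;
}

// 1. Deterministic: hashing the same input twice gives the same digest.
console.log(hexDigest('md5', 'hello') === hexDigest('md5', 'hello'));

// 2. Fixed size: 1 byte or 100,000 bytes in, 64 hex characters out.
console.log(hexDigest('sha256', 'a').length);
console.log(hexDigest('sha256', 'a'.repeat(100000)).length);

// 3. Avalanche: flipping the case of one letter changes about half the bits.
const changed = differingBits(hexDigest('sha256', 'hello'), hexDigest('sha256', 'Hello'));
console.log(`${changed} of 256 bits differ`);
```

The exact number of differing bits varies by input, but for a well-behaved hash it should hover around half of the output length.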

A cryptographic hash function adds three more properties:

  4. Pre-image resistance: given a hash, it's computationally infeasible to find an input that produces it.

  5. Second pre-image resistance: given an input and its hash, it's infeasible to find a different input with the same hash.

  6. Collision resistance: it's infeasible to find ANY two different inputs that produce the same hash.

When a hash function fails property 6 (collision resistance), meaning two inputs with the same hash can be found efficiently, it's "broken". MD5 was broken in 2004. SHA-1 was broken in practice in 2017. SHA-256 is still solid.

Which hash function for which purpose

The single biggest source of confusion: "broken" depends on use case. Let's break it down.

File integrity checks (download verification)

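You downloaded a file. The website lists its SHA-256 hash. You compute the hash of your download. If the two match, the file wasn't corrupted in transit, and, provided the published hash came from a channel you trust (HTTPS, a signed release), it wasn't tampered with either.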

Use: SHA-256. Universally trusted, fast.

Avoid: MD5 for security-sensitive downloads. Acceptable for casual "did the file transfer correctly" checks (low-risk, accidental corruption only). Many Linux distros still list MD5 alongside SHA-256 for this reason.
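
As a sketch of the verification step, the following hashes a file in fixed-size chunks so that large downloads never have to fit in memory. The function names sha256File and verifyDownload, and the file name in the usage comment, are hypothetical; only Node's built-in modules are used.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Compute the SHA-256 of a file by reading it in 1 MiB chunks.
function sha256File(path) {
  const hash = crypto.createHash('sha256');
  const buffer = Buffer.alloc(1024 * 1024);
  const fd = fs.openSync(path, 'r');
  try {
    let bytesRead;
    // readSync returns 0 at end of file.
    while ((bytesRead = fs.readSync(fd, buffer, 0, buffer.length, null)) > 0) {
      hash.update(buffer.subarray(0, bytesRead));
    }
  } finally {
    fs.closeSync(fd);
  }
  return hash.digest('hex');
}

// Compare against the hash published by the download site.
// Published hashes are often upper-case or padded with whitespace, so normalise first.
function verifyDownload(path, expectedHex) {
  return sha256File(path) === expectedHex.trim().toLowerCase();
}

// Usage (hypothetical file name and hash):
// verifyDownload('./installer.iso', '<hash copied from the download page>');
```

A plain string comparison is acceptable here because the expected hash is public; timing-safe comparison only matters when the expected value is a secret.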

Cache keys / content addressing

You want to deduplicate identical content or generate consistent cache keys.

Use: SHA-256 or BLAKE3 (BLAKE3 is faster). MD5 is also fine here — collisions are theoretical for non-adversarial inputs.

Don't worry about: cryptographic strength. You're not defending against attackers, you're indexing content.
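
To make the deduplication idea concrete, here is a small in-memory sketch in which the hash of the content is the storage key. The DedupStore class and contentKey helper are invented for this example and stand in for whatever cache or blob store you actually use.

```javascript
const crypto = require('node:crypto');

// The key is derived purely from the bytes, so identical content maps to one key.
function contentKey(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Minimal in-memory store that keeps one copy per distinct content.
class DedupStore {
  constructor() {
    this.blobs = new Map();
  }

  // Store the data if it is new and return its key either way.
  put(data) {
    const key = contentKey(data);
    if (!this.blobs.has(key)) {
      this.blobs.set(key, Buffer.from(data));
    }
    return key;
  }

  get(key) {
    return this.blobs.get(key);
  }

  get size() {
    return this.blobs.size;
  }
}

const store = new DedupStore();
const first = store.put('same bytes');
const second = store.put('same bytes');
console.log(first === second, store.size); // one key, one stored copy
```

The same pattern works for cache keys: hash the inputs that determine the result and use the digest as the key.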

Password storage

The big one. The use case where most production breaches originate.

Use: bcrypt, scrypt, Argon2. These are deliberately slow — adaptive cost factors that take 100ms+ per hash, making brute-force attacks infeasible.

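NEVER use: MD5, SHA-1, SHA-256, SHA-3, or any general-purpose hash directly for password storage. They're optimised for speed. A modern GPU computes tens of billions of SHA-256 hashes per second. At that rate, every 8-character password built from printable ASCII (about 6.6 × 10^15 candidates) falls within days, and dictionary words or reused passwords fall almost instantly.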

Why this is confused: "SHA-256 is cryptographically strong." True. But "cryptographically strong" doesn't mean "suitable for password storage." The properties needed for passwords (slow, adaptive, salted) are deliberately absent from general-purpose hashes.

Use a maintained bcrypt library for password hashes in your application. Or use your framework's auth system, which typically gets this right by default.

Digital signatures (JWT, SSH keys, code signing)

You need to prove a message came from someone holding a private key.

Use: SHA-256 (in 2026) or SHA-3. These pair with RSA or ECDSA signing.

Avoid: SHA-1 (broken for collisions since 2017, signature forgery feasible).

Blockchain / Git commits / Merkle trees

Content-addressed storage where the hash IS the identifier.

Use: Whatever the protocol specifies. Git uses SHA-1 (with SHA-256 transition in progress). Bitcoin uses double-SHA-256. Most modern systems use SHA-256 or BLAKE3.
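
Two of these identifiers are simple enough to reproduce by hand. The sketch below assumes Git's SHA-1 object format, where a blob's ID is the SHA-1 of the header "blob <byte length>\0" followed by the file contents, and Bitcoin-style double hashing, where SHA-256 is applied to the raw bytes of a first SHA-256 digest. The function names are illustrative.

```javascript
const crypto = require('node:crypto');

// Object ID Git assigns to file contents (a "blob") under its SHA-1 object format.
function gitBlobId(content) {
  const body = Buffer.isBuffer(content) ? content : Buffer.from(content, 'utf8');
  const header = Buffer.from(`blob ${body.length}\0`, 'utf8');
  return crypto.createHash('sha1').update(header).update(body).digest('hex');
}

// Double SHA-256: hash the input, then hash the raw 32-byte digest again.
function doubleSha256(data) {
  const firstPass = crypto.createHash('sha256').update(data).digest();
  return crypto.createHash('sha256').update(firstPass).digest('hex');
}

// Should match the output of: printf 'hello\n' | git hash-object --stdin
console.log(gitBlobId('hello\n'));
console.log(doubleSha256('hello'));
```

Because the identifier is derived from the content, any change to the content produces a different identifier, which is what makes Merkle trees and commit histories tamper-evident.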

HMAC (message authentication)

Combining a secret key with a hash function to verify message integrity AND authenticity.

Use: HMAC-SHA256 (industry standard). For higher performance, BLAKE3's built-in keyed-hash mode serves the same purpose without the HMAC construction.
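
A minimal sketch of signing and verifying with HMAC-SHA256 in Node follows. The function names signMessage and verifyMessage are hypothetical, and the secret and message are placeholders; in a real system the key would come from your secret store.

```javascript
const crypto = require('node:crypto');

// Produce an authentication tag for a message under a shared secret.
function signMessage(secret, message) {
  return crypto.createHmac('sha256', secret).update(message).digest('hex');
}

// Recompute the tag and compare in constant time.
// timingSafeEqual throws on length mismatch, so check lengths first.
function verifyMessage(secret, message, tagHex) {
  const expected = Buffer.from(signMessage(secret, message), 'hex');
  const given = Buffer.from(tagHex, 'hex');
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}

const tag = signMessage('shared-secret', 'amount=100&to=alice');
console.log(verifyMessage('shared-secret', 'amount=100&to=alice', tag)); // true
console.log(verifyMessage('shared-secret', 'amount=900&to=alice', tag)); // false
```

Without the secret an attacker cannot produce a valid tag for a modified message, which is the difference between an HMAC and a bare hash.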

The hash function family tree

In rough order of adoption:

  • MD5 (1992): 128-bit output. Broken for collisions since 2004. Still used for non-security checksums where speed matters.
  • SHA-1 (1995): 160-bit output. Broken for collisions since 2017 (SHAttered attack). Phased out of TLS and code signing; Git is still migrating away from it.
  • SHA-2 family (2001): SHA-224, SHA-256, SHA-384, SHA-512. Currently the universal recommendation for new systems. No known collision attacks.
  • SHA-3 / Keccak (2015): designed by a different team through a NIST competition, with a totally different internal structure from SHA-2. Adoption is slow because SHA-2 is fine.
  • BLAKE2 (2012) / BLAKE3 (2020): faster than SHA-256 in software (BLAKE3 can be several times faster, and more with multi-threading on large inputs), with a comparable security level. Modern alternative.
  • bcrypt (1999) / scrypt (2009) / Argon2 (2015): password-specific, deliberately slow. Argon2 won the Password Hashing Competition in 2015.

For new code in 2026: SHA-256 for general hashing, BLAKE3 if performance matters and you control both sides, Argon2id for passwords.

The password hashing trap

The single most common security mistake in junior code:

// WRONG
const passwordHash = sha256(password);

Why this is broken:

  1. Speed: SHA-256 is designed to be FAST. A consumer GPU does tens of billions per second. An 8-character lowercase password cracks in seconds, and the full 8-character printable-ASCII space in days.

  2. No salt: two users with the same password get the same hash. Attackers compute "rainbow tables" once, use forever.

  3. No iteration: even with salt, single-pass hash means one GPU operation per guess.
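
The speed argument is plain arithmetic, sketched below. The guess rate of 20 billion hashes per second is an assumed round figure for a single GPU running unsalted SHA-256, not a measurement; substitute a benchmark for your own threat model.

```javascript
// Worst-case time to try every password of a given length.
function secondsToExhaust(alphabetSize, length, guessesPerSecond) {
  return Math.pow(alphabetSize, length) / guessesPerSecond;
}

const ASSUMED_GPU_RATE = 20e9; // assumption: 20 billion SHA-256 guesses per second
const SECONDS_PER_DAY = 86400;

// 8 lowercase letters: 26^8 candidates, gone in seconds.
const lowercase8 = secondsToExhaust(26, 8, ASSUMED_GPU_RATE);
console.log(`8 lowercase chars: ${lowercase8.toFixed(1)} seconds`);

// 8 printable-ASCII characters: 95^8 candidates, gone in days.
const printable8 = secondsToExhaust(95, 8, ASSUMED_GPU_RATE) / SECONDS_PER_DAY;
console.log(`8 printable chars: ${printable8.toFixed(1)} days`);

// The same search against a hash that allows only 4 guesses per second.
const slowHashYears = secondsToExhaust(26, 8, 4) / (SECONDS_PER_DAY * 365);
console.log(`8 lowercase chars at 4 guesses/sec: ${slowHashYears.toFixed(0)} years`);
```

Real attacks try dictionaries and leaked passwords first, so these worst-case figures overstate how long a typical password survives.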

The correct pattern:

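// CORRECT
import bcrypt from 'bcrypt';
// `password` is the plaintext received from the signup form.
// bcrypt generates a random salt for you; 12 is the cost factor.
const passwordHash = await bcrypt.hash(password, 12);
// Stored as: $2b$12$NjQUx9...XK9zJpKL.salt-and-hash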

The bcrypt output includes algorithm version, cost factor, and salt — all in one self-contained string. Verification:

const isValid = await bcrypt.compare(password, storedHash);

In 2026, the recommended cost factors:

  • bcrypt: cost 12 (about 250ms per hash on modern server CPUs).
  • scrypt: N=16384, r=8, p=1 (default in most libraries).
  • Argon2id: memory 19456 KB, iterations 2, parallelism 1.

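On typical server hardware these settings cost from tens to a few hundred milliseconds per hash, which is fine for human login flows and prohibitively slow for brute-force. Benchmark on your own hardware and raise the cost as far as your login flow can tolerate.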
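
If you cannot add a dependency, Node ships scrypt in its standard library. The following sketch uses the scrypt parameters listed above; the storage format (fields joined with "$") and the function names are choices made for this example, not a standard.

```javascript
const crypto = require('node:crypto');

const SCRYPT_PARAMS = { N: 16384, r: 8, p: 1 };
const KEY_LENGTH = 32;

// Hash a password with a fresh random salt and return one self-contained string.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const derived = crypto.scryptSync(password, salt, KEY_LENGTH, SCRYPT_PARAMS);
  return [
    'scrypt',
    SCRYPT_PARAMS.N,
    SCRYPT_PARAMS.r,
    SCRYPT_PARAMS.p,
    salt.toString('base64'),
    derived.toString('base64'),
  ].join('$');
}

// Re-derive with the stored salt and parameters, then compare in constant time.
function verifyPassword(password, stored) {
  const [scheme, N, r, p, saltB64, hashB64] = stored.split('$');
  if (scheme !== 'scrypt') return false;
  const salt = Buffer.from(saltB64, 'base64');
  const expected = Buffer.from(hashB64, 'base64');
  const actual = crypto.scryptSync(password, salt, expected.length, {
    N: Number(N),
    r: Number(r),
    p: Number(p),
  });
  return crypto.timingSafeEqual(actual, expected);
}

const record = hashPassword('correct horse battery staple');
console.log(verifyPassword('correct horse battery staple', record)); // true
console.log(verifyPassword('wrong guess', record)); // false
```

Storing the parameters next to the hash lets you raise the cost later and still verify older records.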

Salt and pepper

Salt is random data added to each password before hashing. It prevents:

  • Rainbow table attacks (precomputed hash → password lookups).
  • Two users with the same password getting identical hashes.

Modern password hashes (bcrypt, scrypt, Argon2) generate and store the salt automatically. You don't need to manage it separately.

Pepper is a secret added to all passwords before hashing, stored separately from the database (typically in environment variables or HSM). It protects against database leaks: even if attackers steal the password hash table, they can't crack passwords without also having the pepper.

Pepper is optional and adds operational complexity. Most teams skip it; mature security-conscious teams add it.
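
One common way to apply a pepper, sketched here under assumptions, is to HMAC the password with the pepper first and feed the result into the slow hash. The environment variable name PASSWORD_PEPPER and the function names are hypothetical, and scrypt stands in for whichever password hash you use.

```javascript
const crypto = require('node:crypto');

// Mix the secret pepper into the password, then run the slow, salted hash.
function hashWithPepper(password, pepper, salt = crypto.randomBytes(16)) {
  const peppered = crypto.createHmac('sha256', pepper).update(password, 'utf8').digest();
  const derived = crypto.scryptSync(peppered, salt, 32, { N: 16384, r: 8, p: 1 });
  return { salt, derived };
}

// Verification repeats the same steps with the stored salt.
function verifyWithPepper(password, pepper, record) {
  const { derived } = hashWithPepper(password, pepper, record.salt);
  return crypto.timingSafeEqual(derived, record.derived);
}

// Usage sketch: the pepper lives outside the database, for example
//   const pepper = process.env.PASSWORD_PEPPER; // hypothetical variable name
//   const record = hashWithPepper(password, pepper);
```

The salt and derived hash go in the database as usual; a stolen database alone is then not enough to start guessing passwords.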

How fast is "fast" really?

Approximate hashing speeds (single thread unless noted, server CPU, 2026):

  • MD5: ~600 MB/s
  • SHA-256 (CPU): ~400 MB/s
  • SHA-256 (with SHA-NI / hardware SHA extensions): ~3 GB/s
  • BLAKE3 (multi-threaded): ~10 GB/s
  • bcrypt (cost 12): ~4 hashes/sec (intentionally slow)
  • Argon2id (default params): ~3 hashes/sec (intentionally slow)

For large-file hashing, BLAKE3's advantage is dramatic: at the rates above, a 10 GB file hashes in about 1 second instead of roughly 25, assuming storage can deliver the data that fast. For password verification at login, slowness is the feature, not a bug.
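
Throughput depends heavily on CPU and runtime, so it is worth measuring rather than trusting any table. This rough single-threaded sketch covers only algorithms in Node's built-in crypto module; BLAKE3 is not among them and would need a third-party package. The function name throughputMBps is invented for this example.

```javascript
const crypto = require('node:crypto');

// Hash `totalMB` mebibytes of in-memory data and report MiB per second.
function throughputMBps(algorithm, totalMB = 64) {
  const chunk = Buffer.alloc(1024 * 1024, 0x61); // 1 MiB of the letter "a"
  const hash = crypto.createHash(algorithm);
  const start = process.hrtime.bigint();
  for (let i = 0; i < totalMB; i++) {
    hash.update(chunk);
  }
  hash.digest();
  const seconds = Number(process.hrtime.bigint() - start) / 1e9;
  return totalMB / seconds;
}

for (const algorithm of ['md5', 'sha1', 'sha256', 'sha512']) {
  console.log(algorithm, Math.round(throughputMBps(algorithm)), 'MB/s');
}
```

Treat the output as a ballpark: a single short run is sensitive to warm-up and to whatever else the machine is doing.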

Common hash function mistakes

After reviewing thousands of codebases:

  1. Using MD5 for security. Use SHA-256.
  2. Using SHA-256 for passwords. Use bcrypt/Argon2.
  3. Implementing your own salt management. Use the library's built-in salt.
  4. Storing the salt in plain text in a config file. The salt should be per-password, generated and stored with the hash.
  5. Comparing hashes with ==. Use timing-safe comparison (crypto.timingSafeEqual in Node) to prevent timing attacks.
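  6. Truncating hashes to save space. A hash truncated to 64 bits reaches a 50% chance of an accidental collision at around 2^32 (a few billion) items, and an attacker can find one deliberately with about that much work.
  7. Hashing concatenated values without separators. hash(user + password) is ambiguous: ("ab", "c") and ("a", "bc") hash identically. hash(user + ":" + password) is better, but only if ":" can never appear inside a field; length-prefixing each field removes the ambiguity entirely. When the goal is authentication, hmac(secret, ...) over the unambiguous encoding is best.
  8. Treating the URL-safe Base64 encoding of a hash as if it were the hash. The encoded form is text-friendly, but compare hashes in one canonical encoding (raw bytes or lowercase hex), never across encodings.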
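
The concatenation mistake is the least intuitive item on the list, so here is a small demonstration. All function names are made up for this sketch, and the length-prefix layout (4-byte big-endian length before each field) is one reasonable choice among several.

```javascript
const crypto = require('node:crypto');

function sha256Hex(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Mistake: plain concatenation loses the boundary between fields.
function naiveHash(user, password) {
  return sha256Hex(user + password);
}

// Better, but still ambiguous if a field may contain ":".
function separatorHash(user, password) {
  return sha256Hex(user + ':' + password);
}

// Unambiguous: each field is preceded by its byte length.
function lengthPrefixedHash(...fields) {
  const hash = crypto.createHash('sha256');
  for (const field of fields) {
    const bytes = Buffer.from(field, 'utf8');
    const length = Buffer.alloc(4);
    length.writeUInt32BE(bytes.length, 0);
    hash.update(length).update(bytes);
  }
  return hash.digest('hex');
}

console.log(naiveHash('ab', 'c') === naiveHash('a', 'bc')); // true: collision
console.log(separatorHash('a:', 'b') === separatorHash('a', ':b')); // true: collision
console.log(lengthPrefixedHash('a:', 'b') === lengthPrefixedHash('a', ':b')); // false
```

The same encoding can be fed to an HMAC instead of a bare hash when the result needs to be unforgeable.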

Hash size: does it matter?

In 2026, the recommendations:

  • 128-bit output (MD5): too small for security. Acceptable only for non-adversarial uses (cache keys, checksums where attackers can't influence input).
  • 160-bit output (SHA-1): deprecated. Avoid for any new use.
  • 256-bit output (SHA-256, BLAKE3 default): current security minimum. Finding a collision takes about 2^128 work (the birthday bound); finding a pre-image takes about 2^256.
  • 384-bit / 512-bit output: overkill for most uses but cheap to use. Pick when interoperating with standards that require it.

For most applications, SHA-256 (or BLAKE3 in 256-bit mode) is the right choice. Bigger isn't meaningfully more secure given current and projected attacker capabilities.
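
The link between output size and collision risk follows from the birthday bound. As a sketch, the standard approximation below estimates how many random items you can hash before the chance of at least one accidental collision reaches 50%; the function name is invented for this example.

```javascript
// Birthday-bound approximation: n ≈ sqrt(2 · ln 2 · 2^bits) items
// give a 50% chance that at least two of them share a hash.
function itemsForHalfCollisionChance(bits) {
  return Math.sqrt(2 * Math.LN2 * Math.pow(2, bits));
}

for (const bits of [64, 128, 160, 256]) {
  const items = itemsForHalfCollisionChance(bits);
  console.log(`${bits}-bit output: about ${items.toExponential(1)} items`);
}
```

For 64 bits the answer is roughly five billion items, which large systems do reach; for 256 bits it is far beyond anything that can be stored or computed.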

The bottom line

Hash functions are simple in theory and dangerous in practice. The crux: "is this hash function strong enough?" depends entirely on what you're using it for.

For passwords: never use MD5/SHA/anything general-purpose. Use bcrypt, scrypt, or Argon2id.

For everything else (file integrity, cache keys, digital signatures, content addressing): SHA-256 is the universal default in 2026. BLAKE3 if you need speed.

MD5 is fine for non-adversarial uses (checksumming a download for accidental corruption, building a cache key from a URL).

MD5 is broken for anything an attacker can influence (signing, password storage, content addressing in adversarial contexts).

Match the tool to the job and you avoid 99% of the production breaches that this article exists to prevent.