JSON Formatting and Validation: The Developer's Complete Guide for 2026
Everything modern developers need to know about JSON: formatting standards, validation, the spec gotchas, JSON5/JSONC, schema validation, performance with large files, and the tools that actually help.
JSON Formatting and Validation: The Developer's Complete Guide for 2026
JSON has been the universal data interchange format for two decades. Every API speaks it. Every config file ships it. Every log line aggregates into it. And yet — most engineers I work with carry small, persistent confusions about the spec edges: trailing commas, comment support, NaN handling, big integers, date encoding. This guide is the no-nonsense reference for working with JSON in 2026.
What JSON actually is (and isn't)
JSON — JavaScript Object Notation — was standardised as ECMA-404 in 2013 and refined as RFC 8259 in 2017. The full grammar fits on one page. The data types are exactly six:
- String — UTF-8 sequence in double quotes.
- Number — IEEE 754 double-precision float. No integer/float distinction in the spec.
- Boolean —
trueorfalse(lowercase, no quotes). - Null —
null(lowercase, no quotes). - Array — ordered list in square brackets.
- Object — unordered key-value pairs in curly braces. Keys must be strings.
Everything you can express in JSON is built from those six types. There is no native date, no native binary, no native bigint, no native NaN/Infinity. Anyone telling you JSON "supports" any of those is using a non-standard extension.
What JSON doesn't support — the omissions that surprise developers:
- No comments. The original spec author (Douglas Crockford) removed them explicitly. Files calling themselves
.jsonwith "// like this" are invalid JSON; they're JSON-with-comments (JSONC) or JSON5. - No trailing commas. Adding a comma after the last element is the most common manual-edit error.
- No single quotes. Only double quotes work for strings.
- No unquoted keys. That's JavaScript object syntax, not JSON.
- No undefined. Use
nullfor missing values. - No functions, dates, regexes, classes. No way to serialise computation.
These restrictions are the point — JSON's simplicity is what made it win over XML.
Formatting: pretty-print vs minified
Two canonical representations: pretty-printed (with indentation, multi-line — for humans) and minified (single line, no whitespace — for machines).
When to use which:
- Minified for API responses, network payloads, log lines. Saves 30–60% bytes. Modern HTTP gzip/brotli compression closes most of that gap but every byte still matters at scale.
- Pretty-printed for config files, fixtures, anything a human will edit. Indentation reveals structure.
- Inline JSON in HTML/scripts should be minified — it's machine-bound.
Indentation conventions: 2 spaces for web/JS ecosystems, 4 spaces for Java/Python/.NET. Tabs cause display inconsistencies — avoid.
Use JSON formatter to switch between formats with one click.
Validation: what "valid JSON" actually means
Validation has three layers, often confused.
Layer 1: Syntactic validity
The document parses cleanly. No unclosed brackets, no trailing commas, no smart-quote mistakes. Test with JSON validator or in code with try/catch around JSON.parse.
Most common syntactic errors: trailing comma after last array/object element, single quotes instead of double, missing quotes around keys, unescaped double quote inside string value, BOM (byte order mark) at file start.
Layer 2: Structural validity (against a schema)
The document parses and matches a contract. JSON Schema (current draft: 2020-12) handles this — define required fields, types, min/max, enum values, formats (email, URL, date) — then validate any document against the schema.
Validate with Ajv (JavaScript), jsonschema (Python), or at API gateway level. Schema validation catches errors that pure syntactic parsing misses.
Layer 3: Semantic validity (business rules)
"Purchase date is not in the future." "Total equals sum of line items." These aren't expressible in JSON Schema reliably — handle in application code.
The big integer problem
JSON's spec says all numbers are IEEE 754 doubles. That means integers above 2^53 (9,007,199,254,740,992) lose precision when parsed in JavaScript. JSON.parse of a number larger than that silently corrupts it.
This bites you with Twitter/X user IDs (64-bit since 2010), database BIGINT columns, cryptographic IDs, timestamps in microseconds since epoch.
Solutions:
- Encode big numbers as strings in your JSON — universal compatibility, mild ugliness.
- Use BigInt in JavaScript with a reviver function — fragile, doesn't survive re-serialisation.
- Use json-bigint package in Node — handles bigints cleanly but breaks JSON.stringify roundtrip.
For new APIs in 2026, encode large IDs as strings. It's the only safe option.
Date encoding
JSON has no native date type. Three common encodings, in declining order of preference:
- ISO 8601 string: "2026-05-31T15:30:00Z" — readable, unambiguous, sortable as string.
- Unix epoch number: 1748706600 — compact, ambiguous (seconds or milliseconds?), not human-readable.
- Custom format string like "31-May-2026" — locale-dependent, ambiguous. Avoid.
Always use ISO 8601 with explicit timezone (Z for UTC, or +05:30 for offset). Never serialise dates in local time without timezone.
JSON5, JSONC, and the comment problem
When you need comments and trailing commas (config files, especially), choose:
- JSON5 — superset with comments, trailing commas, unquoted keys, single quotes, hex numbers, NaN/Infinity. Used by Babel, Rollup config files.
- JSONC — JSON with Comments only. Used by VS Code settings, TypeScript tsconfig.json.
Both require non-standard parsers. Don't ship JSON5/JSONC to APIs that expect standard JSON — convert at the boundary.
For human-edited config that absolutely needs comments, prefer YAML or TOML. They were designed for the job. JSON-with-extensions always feels like a half-solution.
Performance with large files
JSON's simplicity comes at a cost: parsing speed and memory for large documents.
Benchmarks on a 100 MB JSON file with deeply nested arrays:
- JSON.parse (V8 Node 22): ~1.2 seconds, ~400 MB peak memory. Synchronous — blocks event loop.
- stream-json library with event-based parser: ~3.5 seconds, ~50 MB peak memory. Async, doesn't block.
- simdjson (SIMD-accelerated C++): ~180 ms, ~250 MB memory. Fastest cross-platform option.
- yyjson (C library with Rust/Node bindings): ~120 ms, ~280 MB. Currently the fastest known.
When to use which: under 10 MB use JSON.parse, 10–500 MB use streaming parser, over 500 MB or hot path use simdjson or yyjson — or switch to a more efficient format (protobuf, MessagePack, Arrow).
Pro tip: if you're processing millions of small JSON documents (one per log line), use NDJSON (newline-delimited JSON) — one JSON document per line, streamable, append-friendly.
Common JSON anti-patterns
After reviewing hundreds of APIs, the same mistakes repeat:
- Wrapping single values: returning {"data": 42} when just 42 would do.
- Stringifying booleans: "active": "true" instead of "active": true. Forces every client to do string comparison.
- Mixing types in arrays: [1, "two", true] is valid JSON but a code smell. Each array element should be the same shape.
- Null vs absent vs empty: null, "", and "key not present" should mean three different things. Pick conventions and document them.
- Embedding stringified JSON inside JSON — double-encoded, forces clients to double-parse. Always a sign of a pipeline shortcut.
- Recursive references that crash naive parsers. Schema systems (Avro, Protobuf) handle this; JSON does not.
JSON-to-something conversions
Common tasks: JSON to CSV flattens nested objects into columns. JSON to YAML for converting API responses into editable config. JSON to TypeScript generates type definitions from sample JSON (quicktype is the gold standard).
A pre-deploy checklist for JSON APIs
Before shipping a new JSON API:
- Document the response shape with JSON Schema.
- Validate every outgoing response in dev/staging.
- Use ISO 8601 for all dates with explicit timezone.
- Encode integers above 2^53 as strings.
- Null vs empty string vs missing-key — pick conventions, document them.
- Set Content-Type: application/json; charset=utf-8 header.
- Enable HTTP compression (gzip or brotli) — reduces JSON payload by 70–90%.
- For arrays of records, consider NDJSON if streaming is a concern.
Tools to use
- JSON Formatter — pretty-print, minify, syntax-highlight.
- JSON Validator — error messages pinpoint the exact line and column.
- JSON to CSV — flat or pivot rendering for nested arrays.
- JSON to YAML — converter preserving key order.
- JSON Diff — semantic diff that ignores key order changes.
The bottom line
JSON's win was simplicity. Twenty years in, that simplicity remains its strongest feature — and its sharpest edges. Master the spec gotchas (no comments, no trailing commas, no bigint, no native dates), use JSON Schema to encode contracts, pick the right tool for the file size you're working with, and avoid the common anti-patterns.
JSON isn't glamorous. But it's the format every system in your stack speaks. Knowing it cold pays dividends every working day.