Newline Delimited JSON (NDJSON) is a specialized data format that provides a simple yet powerful way to handle streaming JSON data. While standard JSON is excellent for representing structured data, NDJSON addresses specific use cases where processing large datasets and streaming data is required.
This article takes a quick look at NDJSON and how it differs from regular JSON.
Understanding NDJSON
NDJSON is essentially a format where each line is a valid JSON value, typically an object or array, followed by a newline character (\n). Unlike standard JSON, which requires the entire document to be parsed as a single entity, NDJSON can be processed one line at a time, making it ideal for streaming and large-scale data processing.
While there isn’t an RFC for NDJSON at the time of writing (as there is for JSON), you can check out the NDJSON project page on GitHub.
Example of NDJSON
Here’s how NDJSON looks in practice:
{"name": "Ashley", "timestamp": "2024-01-02T10:00:00Z", "action": "login"}
{"name": "Briar", "timestamp": "2024-01-02T10:01:15Z", "action": "purchase"}
{"name": "Cate", "timestamp": "2024-01-02T10:02:30Z", "action": "logout"}
Each line is a complete, valid JSON object, and the entire file can be processed one line at a time.
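To make that concrete, here is a minimal Python sketch that parses the sample above one line at a time; the field names (name, action) come from the example records.

import json

ndjson_text = (
    '{"name": "Ashley", "timestamp": "2024-01-02T10:00:00Z", "action": "login"}\n'
    '{"name": "Briar", "timestamp": "2024-01-02T10:01:15Z", "action": "purchase"}\n'
    '{"name": "Cate", "timestamp": "2024-01-02T10:02:30Z", "action": "logout"}\n'
)

# Each line is parsed on its own; the full document never needs to be
# loaded as a single JSON value.
for line in ndjson_text.splitlines():
    record = json.loads(line)
    print(record["name"], record["action"])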
Example of Regular JSON
By contrast, here’s a JSON document:
[
{"name": "Ashley", "timestamp": "2024-01-02T10:00:00Z", "action": "login"},
{"name": "Briar", "timestamp": "2024-01-02T10:01:15Z", "action": "purchase"},
{"name": "Cate", "timestamp": "2024-01-02T10:02:30Z", "action": "logout"}
]
The only differences are the square brackets at the start and end (which wrap every object into a single array) and the commas separating the elements. NDJSON simply drops those brackets and commas from the file.
The result is that an NDJSON file is not itself valid JSON: it would fail JSON validation because it contains multiple JSON documents. Each individual line, however, must contain valid JSON, and the lines must be separated by newline characters (\n).
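As a rough illustration of that relationship, the Python sketch below round-trips between the two forms; the data is a trimmed version of the earlier sample.

import json

json_array = '''[
  {"name": "Ashley", "action": "login"},
  {"name": "Briar", "action": "purchase"}
]'''

# JSON array -> NDJSON: serialize each element compactly, one per line.
ndjson_text = "\n".join(json.dumps(obj) for obj in json.loads(json_array)) + "\n"

# NDJSON -> JSON array: parse each non-empty line back into a list.
records = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]

print(ndjson_text)
print(records)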
Benefits of NDJSON
The NDJSON format provides a number of benefits, including:
Streaming-Friendly
NDJSON’s line-by-line structure makes it perfect for streaming applications. Each line can be processed independently, allowing for efficient memory usage when handling large datasets. This is particularly valuable in scenarios where you need to process data in real-time or work with datasets too large to fit in memory.
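As one possible illustration, the sketch below consumes an NDJSON stream over HTTP using the third-party requests library; the URL and field name are placeholders, not part of any real service.

import json
import requests  # third-party: pip install requests

# Hypothetical endpoint that streams NDJSON; substitute a real URL.
url = "https://example.com/events.ndjson"

with requests.get(url, stream=True) as response:
    response.raise_for_status()
    # iter_lines yields the stream one line at a time, so memory use stays
    # flat no matter how large the stream is.
    for line in response.iter_lines(decode_unicode=True):
        if not line:  # skip keep-alive blank lines
            continue
        event = json.loads(line)
        print(event.get("action"))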
Easy to Generate and Parse
The format’s simplicity means it’s straightforward to generate and parse. Any tool that can handle text files line by line can work with NDJSON. This makes it an excellent choice for logs, data pipelines, and inter-process communication.
Human-Readable
Like regular JSON, NDJSON remains human-readable while maintaining machine-parseability. This dual nature makes it valuable for debugging and development purposes.
Common Use Cases
Data Streaming
NDJSON excels in scenarios where data needs to be transmitted or processed continuously:
- Log aggregation systems
- Real-time analytics pipelines
- Data migration tasks
- ETL (Extract, Transform, Load) processes
Large Dataset Processing
When working with massive datasets, NDJSON offers several advantages:
- Allows for line-by-line processing
- Enables parallel processing (see the sketch after this list)
- Reduces memory overhead
- Supports resume-ability in case of failures
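A rough sketch of the parallel-processing point, assuming a local file named events.ndjson (a placeholder) and Python's standard multiprocessing module:

import json
from multiprocessing import Pool

def parse(line):
    # Each worker parses one line independently of all the others.
    return json.loads(line)

if __name__ == "__main__":
    # "events.ndjson" is a placeholder filename.
    with open("events.ndjson", "r", encoding="utf-8") as f, Pool() as pool:
        # chunksize batches lines together to reduce inter-process overhead.
        for record in pool.imap(parse, f, chunksize=1000):
            pass  # replace with real per-record processing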
Implementation Considerations
Writing NDJSON
When generating NDJSON (a short example follows this list):
- Ensure each line is a valid JSON value
- Use proper JSON escaping for special characters
- Add a newline character after each JSON value
- Avoid pretty-printing or formatting that spans multiple lines
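Putting those guidelines together, a minimal writer might look like this (the filename is a placeholder):

import json

records = [
    {"name": "Ashley", "timestamp": "2024-01-02T10:00:00Z", "action": "login"},
    {"name": "Briar", "timestamp": "2024-01-02T10:01:15Z", "action": "purchase"},
]

with open("events.ndjson", "w", encoding="utf-8") as f:
    for record in records:
        # json.dumps handles escaping; omitting the indent argument keeps
        # each value on a single line, and the trailing "\n" terminates it.
        f.write(json.dumps(record) + "\n")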
Reading NDJSON
When consuming NDJSON (a short example follows this list):
- Process the file line by line
- Parse each line as independent JSON
- Handle malformed lines gracefully
- Consider implementing error recovery mechanisms
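A minimal reader along those lines, again with a placeholder filename, might look like this:

import json
import logging

def read_ndjson(path):
    """Yield parsed objects from an NDJSON file, skipping malformed lines."""
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Recover by logging and moving on rather than aborting.
                logging.warning("Skipping malformed line %d: %s", line_number, exc)

for event in read_ndjson("events.ndjson"):
    print(event.get("action"))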
Summary
NDJSON represents a pragmatic solution for handling streaming JSON data. It’s basically just a whole lot of JSON documents, each on its own line. Its simplicity, efficiency, and compatibility with existing tools make it an excellent choice for many modern data processing needs.