What is NDJSON?

Newline Delimited JSON (NDJSON) is a specialized data format that provides a simple yet powerful way to handle streaming JSON data. While standard JSON is excellent for representing structured data, NDJSON addresses use cases that involve streaming data or processing large datasets.

This article takes a quick look at NDJSON and how it differs from regular JSON.

Understanding NDJSON

NDJSON is essentially a format where each line is a valid JSON value, typically an object or array, followed by a newline character (\n). Unlike standard JSON, which requires the entire document to be parsed as a single entity, NDJSON can be processed one line at a time, making it ideal for streaming and large-scale data processing.

While there isn’t an RFC for NDJSON at the time of writing (as there is for JSON), you can check out the NDJSON project page on GitHub.

Example of NDJSON

Here’s how NDJSON looks in practice:

{"name": "Ashley", "timestamp": "2024-01-02T10:00:00Z", "action": "login"}
{"name": "Briar", "timestamp": "2024-01-02T10:01:15Z", "action": "purchase"}
{"name": "Cate", "timestamp": "2024-01-02T10:02:30Z", "action": "logout"}

Each line is a complete, valid JSON object, and the entire file can be processed one line at a time.

Example of Regular JSON

By contrast, here’s the same data as a single regular JSON document:

[
    {"name": "Ashley", "timestamp": "2024-01-02T10:00:00Z", "action": "login"},
    {"name": "Briar", "timestamp": "2024-01-02T10:01:15Z", "action": "purchase"},
    {"name": "Cate", "timestamp": "2024-01-02T10:02:30Z", "action": "logout"}
]

The only differences are the square brackets that wrap the whole array and the commas separating the objects. NDJSON simply drops those brackets and commas and puts each object on its own line.

As a result, an NDJSON file is not itself valid JSON: it would fail JSON validation because it contains multiple top-level JSON values. Each individual line, however, must be valid JSON, and the lines must be separated by a newline character (\n).
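
To make the relationship concrete, here is a minimal Python sketch that converts a JSON array file into NDJSON and back again; the file names events.json and events.ndjson are invented for the example:

import json

# JSON array -> NDJSON: write each element on its own line
with open("events.json") as src, open("events.ndjson", "w") as dst:
    for record in json.load(src):
        dst.write(json.dumps(record) + "\n")

# NDJSON -> JSON array: collect the lines back into a list
with open("events.ndjson") as src:
    records = [json.loads(line) for line in src if line.strip()]

with open("events_roundtrip.json", "w") as dst:
    json.dump(records, dst, indent=4)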

Benefits of NDJSON

The NDJSON format provides a number of benefits, including:

Streaming-Friendly

NDJSON’s line-by-line structure makes it perfect for streaming applications. Each line can be processed independently, allowing for efficient memory usage when handling large datasets. This is particularly valuable in scenarios where you need to process data in real-time or work with datasets too large to fit in memory.
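
As a minimal sketch of that idea (assuming a hypothetical events.ndjson file like the example above), a Python generator can yield one parsed record at a time, so only the current line ever sits in memory:

import json

def stream_records(path):
    # Yield one parsed record per line; the file is never loaded all at once
    with open(path) as f:
        for line in f:
            if line.strip():        # skip blank lines
                yield json.loads(line)

for record in stream_records("events.ndjson"):
    if record.get("action") == "purchase":
        print(record["name"], record["timestamp"])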

Easy to Generate and Parse

The format’s simplicity means it’s straightforward to generate and parse. Any tool that can handle text files line by line can work with NDJSON. This makes it an excellent choice for logs, data pipelines, and inter-process communication.
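
As a small illustration of that simplicity, this sketch parses an NDJSON payload held in an ordinary string using nothing beyond line splitting and the standard json module:

import json

payload = (
    '{"name": "Ashley", "action": "login"}\n'
    '{"name": "Briar", "action": "purchase"}\n'
)

# Any line-oriented tool can split this; plain str.splitlines() is enough
records = [json.loads(line) for line in payload.splitlines() if line]
print(records[0]["name"])   # -> Ashley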

Human-Readable

Like regular JSON, NDJSON remains human-readable while maintaining machine-parseability. This dual nature makes it valuable for debugging and development purposes.

Common Use Cases

Data Streaming

NDJSON excels in scenarios where data needs to be transmitted or processed continuously:

  • Log aggregation systems
  • Real-time analytics pipelines
  • Data migration tasks
  • ETL (Extract, Transform, Load) processes

Large Dataset Processing

When working with massive datasets, NDJSON offers several advantages:

  • Allows for line-by-line processing
  • Enables parallel processing (see the sketch after this list)
  • Reduces memory overhead
  • Supports resume-ability in case of failures
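
To illustrate the parallel-processing point, here is a hedged Python sketch that splits a hypothetical big-export.ndjson file into batches of lines and parses each batch in a separate process:

import json
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

def parse_chunk(lines):
    # Each worker parses its own batch of lines independently
    return [json.loads(line) for line in lines if line.strip()]

def chunks(path, size=10_000):
    # Read the file in the parent process and hand out batches of raw lines
    with open(path) as f:
        while True:
            batch = list(islice(f, size))
            if not batch:
                break
            yield batch

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for parsed in pool.map(parse_chunk, chunks("big-export.ndjson")):
            print(f"parsed {len(parsed)} records in this batch")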

Implementation Considerations

Writing NDJSON

When generating NDJSON (see the sketch after this list):

  • Ensure each line is a valid JSON value
  • Use proper JSON escaping for special characters
  • Add a newline character after each JSON value
  • Avoid pretty-printing or formatting that spans multiple lines
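
Putting those points together, here is a minimal Python sketch of an NDJSON writer; the records list and the output file name are invented for the example:

import json

records = [
    {"name": "Ashley", "timestamp": "2024-01-02T10:00:00Z", "action": "login"},
    {"name": "Briar", "timestamp": "2024-01-02T10:01:15Z", "action": "purchase"},
]

with open("events.ndjson", "w", encoding="utf-8") as f:
    for record in records:
        # json.dumps handles escaping; keeping the default (no indent)
        # guarantees each value stays on a single line
        f.write(json.dumps(record, ensure_ascii=False) + "\n")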

Reading NDJSON

When consuming NDJSON (see the sketch after this list):

  • Process the file line by line
  • Parse each line as independent JSON
  • Handle malformed lines gracefully
  • Consider implementing error recovery mechanisms
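
A matching reader might look like the sketch below; handle_record is a placeholder for whatever your application actually does with each record:

import json
import logging

def handle_record(record):
    # Placeholder for real per-record processing
    print(record.get("name"), record.get("action"))

with open("events.ndjson", encoding="utf-8") as f:
    for line_number, line in enumerate(f, start=1):
        if not line.strip():
            continue                      # ignore blank lines
        try:
            handle_record(json.loads(line))
        except json.JSONDecodeError as err:
            # A malformed line shouldn't abort the whole run
            logging.warning("Skipping line %d: %s", line_number, err)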

Summary

NDJSON represents a pragmatic solution for handling streaming JSON data. It’s basically just a whole lot of JSON documents, each on its own line. Its simplicity, efficiency, and compatibility with existing tools make it an excellent choice for many modern data processing needs.