JSONL for Developers

Introduction

A complete guide to the JSON Lines format for developers, including technical specifications, implementation examples, and real-world use cases. JSONL (JSON Lines) stores one JSON value per line of plain text, which makes it a natural fit for streaming data, large datasets, and real-time processing.

What is JSONL?

JSONL (JSON Lines) is a text format in which each line contains exactly one valid JSON value, in practice almost always an object. Unlike traditional JSON files that wrap all records in a single array or object, JSONL files consist of multiple independent records, each on its own line.

Example of JSONL:

example.jsonl
{"name": "Alice", "age": 30, "city": "New York"}
{"name": "Bob", "age": 25, "city": "San Francisco"}
{"name": "Charlie", "age": 35, "city": "Chicago"}

Why Developers Choose JSONL

Memory Efficiency

Traditional JSON requires loading the entire file into memory. A 1GB JSON file needs 1GB+ of RAM. JSONL allows processing line by line, using only the memory needed for one record.
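
As a minimal sketch (the file name and record handling are illustrative), a generator yields one parsed record at a time, so memory use stays flat however large the file grows:

read_stream.py
import json

def read_jsonl(path):
    """Yield one parsed record per line; only one record is in memory at a time."""
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:                      # skip blank lines
                yield json.loads(line)

for record in read_jsonl('data.jsonl'):
    print(record['name'])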

Streaming and Real-time Processing

JSONL is designed for streaming. You can process data as it arrives, making it perfect for real-time analytics, log processing, and API streaming.
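
For example, a consumer can read JSONL from standard input and act on each event the moment it arrives; this sketch keeps a running count per log level and assumes nothing about the producer:

stream_stdin.py
import json
import sys
from collections import Counter

levels = Counter()

# e.g. `tail -f app.jsonl | python stream_stdin.py`
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    event = json.loads(line)
    levels[event.get('level', 'UNKNOWN')] += 1
    print(dict(levels))          # running tally, updated per event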

Fault Tolerance

In traditional JSON, one syntax error breaks the entire file. JSONL allows you to skip invalid lines and continue processing.
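
A sketch of the skip-and-continue pattern: catch the decode error per line, remember where it happened, and keep going.

tolerant_reader.py
import json

records, failures = [], []

with open('data.jsonl', 'r', encoding='utf-8') as f:
    for lineno, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            # One bad line does not abort the run.
            failures.append((lineno, str(err)))

print(f"parsed {len(records)} records, skipped {len(failures)} invalid lines")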

Parallel Processing

Each line in JSONL is independent, making it ideal for distributed computing, MapReduce operations, and machine learning training.
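
Because every line stands alone, lines can be fanned out to worker processes with nothing more than the standard library; handle_line below is a stand-in for real per-record work:

parallel.py
import json
from multiprocessing import Pool

def handle_line(line):
    line = line.strip()
    if not line:
        return None
    record = json.loads(line)
    # Stand-in for real per-record work (scoring, enrichment, ...).
    return record.get('name')

if __name__ == '__main__':
    with open('data.jsonl', 'r', encoding='utf-8') as f, Pool() as pool:
        # imap_unordered streams lines to workers without loading the file.
        for result in pool.imap_unordered(handle_line, f, chunksize=64):
            if result is not None:
                print(result)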

Technical Specifications

MIME Types

  • Primary: application/jsonl (the most widely used; an official IANA registration has been proposed)
  • Common alternative: application/x-ndjson (used by NDJSON-speaking tools such as the Elasticsearch bulk API)
  • Also seen: application/x-jsonl, text/jsonl

File Extensions

  • .jsonl (most common)
  • .ndjson (Newline Delimited JSON)
  • .jsonlines (less common)

Character Encoding

  • Required: UTF-8 (the JSON Lines convention, and the encoding JSON itself expects for interchange)
  • Line endings: \n (Unix/Linux) is the canonical separator; \r\n (Windows) also works because trailing whitespace is ignored when each line is parsed (see the reading sketch below)
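
A reading sketch that is explicit about both points: open the file as UTF-8 and strip either line ending before parsing (Python's text mode would normalize \r\n anyway, but the rstrip makes the intent visible):

portable_read.py
import json

with open('data.jsonl', 'r', encoding='utf-8') as f:
    for raw in f:
        line = raw.rstrip('\r\n')   # handles both \n and \r\n
        if line:
            record = json.loads(line)
            print(record)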

Implementation Examples

Python

example.py
import json

# Reading JSONL: parse one record per line
with open('data.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if not line:              # tolerate blank lines
            continue
        record = json.loads(line)
        process_record(record)    # your per-record handler

# Writing JSONL: one json.dumps() per line, newline-terminated
records = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
with open('output.jsonl', 'w', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record) + '\n')

JavaScript/Node.js

example.js
const fs = require('fs');
const readline = require('readline');

// Reading JSONL
const fileStream = fs.createReadStream('data.jsonl');
const rl = readline.createInterface({
  input: fileStream,
  crlfDelay: Infinity
});

rl.on('line', (line) => {
  if (!line.trim()) return;       // tolerate blank lines
  const record = JSON.parse(line);
  processRecord(record);          // your per-record handler
});

// Writing JSONL: one JSON.stringify() per line, with a trailing newline
const records = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}];
const jsonl = records.map(record => JSON.stringify(record)).join('\n') + '\n';
fs.writeFileSync('output.jsonl', jsonl);

PHP

example.php
<?php
// Reading JSONL
$file = fopen('data.jsonl', 'r');
while (($line = fgets($file)) !== false) {
    $line = trim($line);
    if ($line === '') {
        continue; // tolerate blank lines
    }
    $record = json_decode($line, true);
    processRecord($record); // your per-record handler
}
fclose($file);

// Writing JSONL: one json_encode() per line, newline-terminated
$records = [["id" => 1, "name" => "Alice"], ["id" => 2, "name" => "Bob"]];
$jsonl = implode("\n", array_map('json_encode', $records)) . "\n";
file_put_contents('output.jsonl', $jsonl);

Common Use Cases

Log File Analysis

Instead of massive JSON log files that crash log viewers, JSONL enables efficient streaming:

logs.jsonl
{"timestamp": "2024-01-15T10:30:00Z", "level": "INFO", "message": "User login", "user_id": 123}
{"timestamp": "2024-01-15T10:30:01Z", "level": "ERROR", "message": "Database connection failed", "error_code": "DB001"}
{"timestamp": "2024-01-15T10:30:02Z", "level": "INFO", "message": "Retry successful", "user_id": 123}

Machine Learning Datasets

Many ML tools and fine-tuning APIs expect training data as JSONL:

training_data.jsonl
{"text": "This is a positive review", "label": "positive", "confidence": 0.95}
{"text": "This product is terrible", "label": "negative", "confidence": 0.88}
{"text": "It's okay, nothing special", "label": "neutral", "confidence": 0.72}

API Streaming

Many modern APIs use JSONL for real-time data delivery:

events.jsonl
{"event": "user_joined", "user_id": 456, "timestamp": 1642248000}
{"event": "message_sent", "user_id": 456, "message": "Hello world", "timestamp": 1642248001}
{"event": "user_left", "user_id": 456, "timestamp": 1642248002}

Performance Considerations

File Size Optimization

  • Compression: JSONL compresses well with gzip (often 70-80% reduction) and can still be read as a stream; see the sketch after this list
  • No whitespace: Remove unnecessary spaces to reduce file size
  • Efficient encoding: Use UTF-8 for optimal compatibility
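
Because gzip is itself a stream format, a compressed JSONL file can still be read line by line; a standard-library sketch (data.jsonl.gz is an assumed file name):

read_gzip.py
import gzip
import json

# 'rt' opens the gzip stream in text mode, so iteration stays line-by-line.
with gzip.open('data.jsonl.gz', 'rt', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line:
            record = json.loads(line)
            print(record)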

Processing Speed

  • Line-by-line reading: processing starts immediately; there is no up-front full-file parse
  • Parallel processing: each line can be processed independently
  • Memory usage: constant, regardless of file size

Best Practices

What Not to Do

  • Don't wrap the entire file in an array
  • Don't use trailing commas
  • Don't mix line endings
  • Don't assume a default encoding; read and write UTF-8 explicitly

What to Do

  • Validate each line as valid JSON (a validator sketch follows this list)
  • Handle errors gracefully
  • Use consistent formatting
  • Consider compression for large files
  • Document your schema
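
A small validator sketch covering the first two items: check every line, collect the line numbers that fail, and report instead of crashing.

validate.py
import json
import sys

def validate_jsonl(path):
    """Return a list of 'line N: reason' strings for every invalid line."""
    errors = []
    with open(path, 'r', encoding='utf-8') as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                errors.append(f"line {lineno}: {err}")
    return errors

if __name__ == '__main__':
    problems = validate_jsonl(sys.argv[1])
    print('\n'.join(problems) if problems else 'all lines valid')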

When to Use JSONL vs Other Formats

Use JSONL when:

  • Processing large datasets
  • Streaming data in real-time
  • Building fault-tolerant systems
  • Working with machine learning datasets
  • Creating log files
  • Building data pipelines

Use traditional JSON when:

  • Small datasets that fit in memory
  • API responses that need to be parsed as a single unit
  • Configuration files
  • Data that needs to be validated as a complete structure

Getting Started

  1. Convert existing data: Take a CSV or JSON array and convert it to JSONL (a converter sketch follows this list)
  2. Build a simple parser: Write a basic JSONL processor in your preferred language
  3. Experiment with streaming: Try processing a large JSONL file line by line
  4. Test error handling: Create files with invalid lines and see how your code handles them
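
For step 1, the standard library is enough; input.csv and output.jsonl are placeholder names:

csv_to_jsonl.py
import csv
import json

with open('input.csv', newline='', encoding='utf-8') as src, \
     open('output.jsonl', 'w', encoding='utf-8') as dst:
    # Each CSV row becomes one JSON object on its own line.
    for row in csv.DictReader(src):
        dst.write(json.dumps(row) + '\n')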

Ready to Get Started?

Try our free online tools to convert, validate, and work with JSONL data.