GitHub - Asmod4n/mruby-fast-json: JSON Parsing and dumping for mruby, fast.

mruby‑fast‑json

A high‑performance JSON parser and encoder for MRuby, powered by simdjson (https://simdjson.org).

mruby-fast-json provides:

Ultra‑fast JSON.parse using simdjson’s DOM parser
Strict error reporting mapped to Ruby exception classes
Full UTF‑8 validation on both parse and dump
Optional symbolized keys (symbolize_names: true)
JSON.dump with correct escaping and Unicode handling
Round‑trip safety for all supported types
Big integer support (uint64 → MRuby integer)
Precise error classes for malformed JSON

This gem is designed to be a drop‑in replacement for JSON.parse and JSON.dump in MRuby environments where performance and correctness matter.

Features

✔ Fast, SIMD‑accelerated parsing

Backed by simdjson, parsing is extremely fast even for large documents.

✔ Symbolized keys

JSON.parse('{"name":"Alice"}', symbolize_names: true)
# => { :name => "Alice" }

✔ Full UTF‑8 validation

Invalid UTF‑8 sequences raise JSON::UTF8Error.

✔ Correct JSON escaping

Quotes, backslashes, and all C0 control characters (U+0000 to U+001F) are escaped according to the JSON spec.

✔ Big integer support

Integers larger than INT64_MAX (up to the uint64 maximum) become MRuby integers, not floats.
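The integer boundaries involved can be written down in plain Ruby. The sketch below is not the gem's code. It classifies an integer literal by the 64-bit range it fits in. It assumes that anything outside both ranges is the case that raises JSON::BigIntError. That assumption comes from the uint64 support listed above and the 20000-digit example later in this README. The helper name integer_range is hypothetical.

```ruby
INT64_MIN  = -(2**63)
INT64_MAX  = 2**63 - 1
UINT64_MAX = 2**64 - 1

# Hypothetical helper: classifies a JSON integer literal by the 64-bit range
# it fits in. :too_big is the (assumed) JSON::BigIntError case.
def integer_range(literal)
  n = Integer(literal, 10)
  if n.between?(INT64_MIN, INT64_MAX)
    :int64     # fits a signed 64-bit integer
  elsif n.between?(0, UINT64_MAX)
    :uint64    # above INT64_MAX but still an unsigned 64-bit value
  else
    :too_big   # needs more than 64 bits
  end
end

p integer_range("9223372036854775807")   # => :int64  (INT64_MAX)
p integer_range("18446744073709551615")  # => :uint64 (UINT64_MAX)
p integer_range("9" * 20000)             # => :too_big
```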

✔ Detailed error classes

Malformed JSON raises specific exceptions such as:

JSON::TapeError
JSON::StringError
JSON::UnclosedStringError
JSON::DepthError
JSON::NumberError
JSON::BigIntError
JSON::UnescapedCharsError
…and many more

Usage

Parsing JSON

obj = JSON.parse('{"name":"Alice","age":30}')
obj["name"]  # => "Alice"
obj["age"]   # => 30

Symbolized keys

obj = JSON.parse('{"name":"Alice"}', symbolize_names: true)
obj[:name]   # => "Alice"
obj["name"]  # => nil

Nested structures

obj = JSON.parse('{"user":{"id":1,"name":"Bob"}}')
obj["user"]  # => { "id" => 1, "name" => "Bob" }

Arrays

arr = JSON.parse('[true, null, 42, "hi"]')
# => [true, nil, 42, "hi"]

Dumping JSON

JSON.dump({ "x" => 1, "y" => "z" })
# => '{"x":1,"y":"z"}'

Arrays

JSON.dump([true, nil, "text"])
# => '[true,null,"text"]'

UTF‑8 round‑trip

obj = { "emoji" => "😀😃😄" }
json = JSON.dump(obj)
JSON.parse(json)  # => same structure

OnDemand JSON API (Lazy Parsing)

A high‑performance, zero‑copy, streaming JSON interface for MRuby, powered by simdjson’s OnDemand parser.

The OnDemand API provides:

Lazy parsing — fields are parsed only when accessed
Zero‑copy string access when possible
Fast field lookup (doc["key"], doc.at(index))
JSON Pointer support (doc.at_pointer("/a/b/0"))
Streaming iteration over arrays and objects
Deterministic error handling mapped to Ruby exceptions
Native deserialization into Ruby objects via native_ext_deserialize

This API is ideal for large JSON documents, streaming workloads, or performance‑critical environments.

Quick Start

json = '{"user":{"id":1,"name":"Alice"},"tags":[1,2,3]}'
doc  = JSON.parse_lazy(json)

doc["user"]["name"]   # => "Alice"
doc["tags"][1]        # => 2

Unlike JSON.parse, this does not build a full Ruby object tree. Values are parsed on demand, directly from the underlying buffer.

Zero‑Copy Parsing

When JSON.zero_copy_parsing is enabled and the input string has enough spare capacity for simdjson’s padding, the parser uses the string’s buffer directly:

JSON.zero_copy_parsing = true
doc = JSON.parse_lazy(str)

If not, the string is resized and frozen, or a padded buffer is allocated.

JSON::Document API

A JSON::Document represents a lazily parsed JSON value. It supports:

Field Lookup

String keys

doc["name"]        # => value or nil
doc.fetch("name")  # => value or raises KeyError

Array indexing

doc.at(0)          # => value or nil
doc.fetch(0)       # => value or raises IndexError

You may call .at only once per array. To visit several elements, use the Iteration APIs below.

JSON Pointer

doc.at_pointer("/user/name")   # => "Alice"
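JSON Pointer syntax is defined by RFC 6901. Each /-separated token selects an object key or an array index. Inside a token, ~1 stands for / and ~0 stands for ~.

Here is a plain-Ruby sketch of those rules. The helper resolve_pointer is hypothetical. It walks an already parsed Hash/Array tree rather than a lazy JSON::Document. Misses return nil, matching what doc.at_pointer is documented to do.

```ruby
# Hypothetical helper (not part of the gem): resolves an RFC 6901 JSON Pointer
# against an already parsed Hash/Array tree. Misses return nil.
def resolve_pointer(root, pointer)
  return root if pointer.empty?   # "" addresses the whole document
  raise ArgumentError, "pointer must start with '/'" unless pointer.start_with?("/")

  pointer.split("/", -1).drop(1).reduce(root) do |node, raw|
    token = raw.gsub("~1", "/").gsub("~0", "~")   # unescape ~1 first, then ~0
    case node
    when Hash
      return nil unless node.key?(token)
      node[token]
    when Array
      # array tokens must be a non-negative integer without leading zeros
      return nil unless token.match?(/\A(?:0|[1-9][0-9]*)\z/)
      index = token.to_i
      return nil if index >= node.size
      node[index]
    else
      return nil   # cannot descend into a scalar
    end
  end
end

tree = { "user" => { "name" => "Alice" }, "tags" => [1, 2, 3] }
p resolve_pointer(tree, "/user/name")  # => "Alice"
p resolve_pointer(tree, "/tags/1")     # => 2
p resolve_pointer(tree, "/user/age")   # => nil
```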

JSON Path (simdjson extension)

doc.at_path("$.user.id")       # => 1

Wildcards

doc.at_path_with_wildcard("$.items[*].id")
# => [1, 2, 3]

Or with a block:

doc.at_path_with_wildcard("$.items[*].id") do |id|
  puts id
end

Iteration

Arrays

doc.array_each do |value|
  puts value
end

Or return an array:

doc.array_each
# => [ ... ]

Objects

doc.object_each do |key, value|
  puts "#{key} = #{value}"
end

Error Handling

All simdjson errors are mapped to Ruby exceptions:

JSON::NoSuchFieldError
JSON::OutOfBoundsError
JSON::TapeError
JSON::DepthError
JSON::UTF8Error
JSON::NumberError
JSON::BigIntError
JSON::UnescapedCharsError
JSON::OndemandParserInUseError
…and many more

Lookup misses (NO_SUCH_FIELD, INDEX_OUT_OF_BOUNDS, etc.) return nil for:

doc["key"]
doc.find_field
doc.find_field_unordered
doc.at
doc.at_pointer
doc.at_path

But raise for:

doc.fetch
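This nil-versus-raise split follows core Ruby's own convention:

- Hash#[] and Array#at return nil on a miss.
- Hash#fetch raises KeyError.
- Array#fetch raises IndexError.

The snippet below uses only a plain Hash, not the gem, to show the behaviour the methods above mirror.

```ruby
# Core Ruby only: the same nil-vs-raise convention on a plain Hash and Array.
data = { "name" => "Alice", "tags" => [1, 2, 3] }

p data["missing"]      # lookup miss => nil, like doc["key"]
p data["tags"].at(9)   # index miss  => nil, like doc.at

begin
  data.fetch("missing")
rescue KeyError => e   # like doc.fetch("key")
  puts "KeyError: #{e.message}"
end

begin
  data["tags"].fetch(9)
rescue IndexError => e # like doc.fetch(index)
  puts "IndexError: #{e.message}"
end
```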

Native Deserialization (Zero‑Magic, Explicit Contracts)

You can define a Ruby class with a schema:

class Foo
  attr_accessor :foo
  native_ext_deserialize :@foo, JSON::Type::String
end

Then deserialize directly from an OnDemand document:

doc = JSON.parse_lazy('{"foo":"hello"}')
foo = doc.into(Foo.new)

foo.foo   # => "hello"

How it works

Each class stores a hidden schema hash: :@ivar => JSON::Type::X
The C++ layer iterates the schema and attempts to:
- find the field
- check the JSON type
- convert the value
- assign the ivar
No fallback, no coercion, no guessing
If at least one field matches → success
If none match → JSON::IncorrectTypeError
If simdjson reports an error → raised immediately

This is a deterministic, explicit, zero‑magic deserialization pipeline.
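To make the steps above concrete, here is a toy pure-Ruby model of the pipeline. Everything in it is hypothetical:

- ToySchema, toy_deserialize and ToySchema.into stand in for the C++ layer, native_ext_deserialize and doc.into.
- JSON types are modelled as Ruby predicates.
- The input is an already parsed Hash, not a lazy document.
- A field is assumed to be looked up by the ivar name without its leading @, as :@foo maps to "foo" in the example above.
- Missing or wrongly typed fields are assumed to be skipped, so the error fires only when nothing matched.

```ruby
module ToySchema
  class IncorrectTypeError < StandardError; end

  # Stand-ins for JSON::Type::*: each predicate says whether a parsed Ruby
  # value has that JSON type. No coercion is ever attempted.
  TYPES = {
    array:   ->(v) { v.is_a?(Array) },
    object:  ->(v) { v.is_a?(Hash) },
    number:  ->(v) { v.is_a?(Integer) || v.is_a?(Float) },
    string:  ->(v) { v.is_a?(String) },
    boolean: ->(v) { v == true || v == false },
    null:    ->(v) { v.nil? }
  }.freeze

  # Class macro in the role of native_ext_deserialize:
  # records ivar => type in a per-class schema hash.
  def toy_deserialize(ivar, type)
    raise ArgumentError, "unknown type #{type.inspect}" unless TYPES.key?(type)
    toy_schema[ivar] = type
  end

  def toy_schema
    @toy_schema ||= {}
  end

  # In the role of doc.into(obj), with an already parsed Hash as input.
  def self.into(obj, hash)
    matched = 0
    obj.class.toy_schema.each do |ivar, type|
      key = ivar.to_s.delete_prefix("@")         # find the field: :@id -> "id"
      next unless hash.key?(key)
      value = hash[key]
      next unless TYPES.fetch(type).call(value)  # check the JSON type
      # "convert the value" is a no-op here: the input is already Ruby objects
      obj.instance_variable_set(ivar, value)     # assign the ivar
      matched += 1
    end
    raise IncorrectTypeError, "no schema field matched" if matched.zero?
    obj
  end
end

class ToyUser
  extend ToySchema
  attr_accessor :id, :name
  toy_deserialize :@id,   :number
  toy_deserialize :@name, :string
end

user = ToySchema.into(ToyUser.new, { "id" => 1, "name" => "Alice" })
puts "#{user.id} #{user.name}"   # prints "1 Alice"
```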

Supported Types

JSON::Type::Array
JSON::Type::Object
JSON::Type::Number
JSON::Type::String
JSON::Type::Boolean
JSON::Type::Null

Performance Notes

OnDemand parsing is streaming: fields are parsed only when accessed.
Fields must be accessed in document order; otherwise an error is raised. To start again from the beginning of the document, call .rewind on the JSON::Document.
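The ordering rule can be pictured as a cursor that only moves forward. The pure-Ruby class below is a hypothetical toy that models just the two sentences above:

- Reading a field that the cursor has already passed raises.
- rewind moves the cursor back to the first field.

It is not how simdjson or the gem implements OnDemand, and the exact error class the gem raises is not modelled.

```ruby
# Toy model (not the gem's code) of forward-only field access with rewind.
class ToyForwardCursor
  class OutOfOrderError < StandardError; end

  def initialize(pairs)
    @pairs = pairs # [[key, value], ...] in document order
    @pos = 0       # index of the next unread field
  end

  def [](key)
    ahead = (@pos...@pairs.size).find { |i| @pairs[i][0] == key }
    if ahead
      @pos = ahead + 1           # move the cursor past the field just read
      return @pairs[ahead][1]
    end
    if @pairs.first(@pos).any? { |k, _| k == key }
      raise OutOfOrderError, "#{key.inspect} was already passed; call rewind first"
    end
    nil                          # a genuine miss
  end

  def rewind
    @pos = 0
    self
  end
end

cur = ToyForwardCursor.new([["id", 1], ["name", "Alice"]])
p cur["name"]          # => "Alice"
p cur.rewind["id"]     # => 1
```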

When to Use OnDemand

Use OnDemand when:

You parse large JSON documents
You only need a subset of fields
You want maximum performance
You want deterministic, schema‑driven deserialization
You want to avoid building full Ruby objects

Use DOM (JSON.parse) when:

You need a complete Ruby object tree
You want to modify the parsed structure
You prefer simplicity over performance

Example: High‑Performance Pipeline

class User
  attr_accessor :id, :name

  native_ext_deserialize :@id,   JSON::Type::Number
  native_ext_deserialize :@name, JSON::Type::String
end

doc = JSON.load_lazy("users.json")

users = []
doc.array_each do |user_doc|
  u = User.new
  users << user_doc.into(u)
end

This avoids building any intermediate Ruby hashes or arrays.

Lazy File Loading (`JSON.load_lazy`)

JSON.load_lazy loads a JSON file into a simdjson::padded_string and returns a lazily parsed JSON::Document. It is the most efficient way to process large JSON files with this gem.

Unlike JSON.parse(File.read(...)), this API:

avoids allocating a Ruby string for the entire file
uses simdjson’s padded_string::load for optimal I/O
parses lazily — fields are parsed only when accessed
supports zero‑copy access to string values
keeps the underlying buffer alive automatically

Usage

doc = JSON.load_lazy("data.json")

doc["user"]["name"]   # parsed on demand
doc.array_each do |item|
  puts item["id"]
end

How it works

JSON.load_lazy(path) performs:

1. Load the file into a simdjson::padded_string. This ensures correct padding and optimal memory layout.
2. Wrap it in a Ruby JSON::PaddedString. This object owns the buffer and ensures lifetime safety.
3. Create a JSON::PaddedStringView, a lightweight view into the padded buffer.
4. Create a JSON::OndemandParser, if none is provided.
5. Create a JSON::Document bound to the view and parser.

The result is a fully lazy, streaming JSON document.

Example: Streaming a Large File

doc = JSON.load_lazy("big.json")

doc.array_each do |record|
  puts record["id"]
end

This avoids building a giant Ruby array and keeps memory usage minimal.

With a Reusable Parser

You can reuse a parser across multiple files:

parser = JSON::OndemandParser.new

doc1 = JSON.load_lazy("file1.json", parser)
doc2 = JSON.load_lazy("file2.json", parser)

This reduces allocations and improves throughput.

Error Handling

All simdjson errors are mapped to Ruby exceptions. Lookup misses return nil:

doc["missing"]         # => nil

But strict methods raise:

doc.fetch("missing")   # raises KeyError

Integration with native_ext_deserialize

Lazy documents can be deserialized directly into Ruby objects:

class User
  attr_reader :id, :name
  native_ext_deserialize :@id,   JSON::Type::Number
  native_ext_deserialize :@name, JSON::Type::String
end

doc = JSON.load_lazy("user.json")
user = User.new
doc.into(user)

This avoids building intermediate Ruby hashes entirely.

When to Use `load_lazy`

Use it when:

You’re loading large JSON files
You want streaming access
You want minimal memory overhead
You want to deserialize directly into Ruby objects
You want simdjson’s full performance without DOM overhead

Error Handling

Malformed JSON raises specific exceptions:

JSON.parse('{"a":1,}')        # => JSON::OndemandParserError
JSON.parse('"unterminated')   # => JSON::UnclosedStringError
JSON.parse('tru')             # => JSON::TAtomError
JSON.parse('"\xC0"')          # => JSON::StringError
JSON.parse('{"x":12.3.4}')    # => JSON::NumberError
JSON.parse('')                # => JSON::EmptyInputError

Invalid UTF‑8 inside strings:

JSON.parse("\"\xC0\xAF\"")
# => JSON::UTF8Error
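That input fails because 0xC0 0xAF is an overlong two-byte encoding of "/" (U+002F), which UTF-8 forbids.

A quick way to check such bytes outside the gem is core Ruby's String#valid_encoding?. It is a separate validator from simdjson, so treat agreement on every edge case as an assumption. The helper name utf8_valid? is hypothetical.

```ruby
# Core Ruby only: String#valid_encoding? flags malformed UTF-8 byte sequences.
def utf8_valid?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

samples = {
  "overlong encoding of '/' (C0 AF)"     => "\xC0\xAF",
  "truncated 4-byte sequence (F0 9F 98)" => "\xF0\x9F\x98",
  "U+1F600 (F0 9F 98 80)"                => "\xF0\x9F\x98\x80"
}

samples.each do |label, bytes|
  puts "#{label}: #{utf8_valid?(bytes) ? 'valid' : 'invalid'}"
end
```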

Huge integers:

JSON.parse('{"x":' + '9' * 20000 + '}')
# => JSON::BigIntError

Escaping Rules

JSON.dump escapes strings according to the JSON spec:

Printable ASCII → unchanged
Quotes and backslashes → escaped
Backspace, form feed, newline, carriage return, tab → \b \f \n \r \t
Other C0 controls → \u00XX
Valid UTF‑8 → preserved

Example:

JSON.dump("\"\bλ😀\n")
# => "\"\\\"\\bλ😀\\n\""
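The rules above translate almost line for line into a plain-Ruby escaper. This is a sketch of the rules, not the gem's C++ encoder:

- The name escape_json_string is hypothetical.
- Invalid UTF-8 is rejected with ArgumentError; the gem's own error class for that case is not modelled.
- Lowercase hex in \u00xx is an arbitrary choice, since JSON accepts either case.

```ruby
# Sketch of the escaping table above (not the gem's encoder).
SHORT_ESCAPES = {
  '"'  => '\\"',   # quote
  '\\' => '\\\\',  # backslash
  "\b" => '\\b',   # backspace
  "\f" => '\\f',   # form feed
  "\n" => '\\n',   # newline
  "\r" => '\\r',   # carriage return
  "\t" => '\\t'    # tab
}.freeze

def escape_json_string(str)
  unless str.encoding == Encoding::UTF_8 && str.valid_encoding?
    raise ArgumentError, "input is not valid UTF-8"
  end

  out = '"'.dup
  str.each_char do |ch|
    if (esc = SHORT_ESCAPES[ch])
      out << esc                         # quotes, backslashes, \b \f \n \r \t
    elsif ch.ord < 0x20
      out << format('\\u%04x', ch.ord)   # other C0 controls -> \u00xx
    else
      out << ch                          # printable ASCII and valid UTF-8 as-is
    end
  end
  out << '"'
end

puts escape_json_string("\"\bλ😀\n")   # prints "\"\bλ😀\n"
```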

Development & Testing

The test suite covers:

Parsing primitives
Symbolized keys
Nested structures
UTF‑8 correctness
Error conditions
Escaping rules
Big integer handling
Round‑trip stability

Run tests with:

Requirements

You need at least a C++20 compatible compiler.

License

Apache-2.0
