NDL Language
NDL (nested data language) is a data serialization and config language for representing nested structures.
// line comment
/* block comment /* with nesting */ */
scene {
	size { x 1920 y 1080 }
	camera.type "orthographic"
	layers [ {
		name "background"
		textures [ `background.png` `mask.png` ]
		scale { x 1.2 y 1.0 }
	} {
		name "foreground"
		enabled false
	} ]
}
Design principles
Goals:
- Minimal syntax, no elements that can be removed without changing semantics
- Sane set of primitive types that maps well to common programming languages
- Orthogonal grammar that makes it easy to tell the type of any token without looking at the context
- Documents with deep nesting levels stay readable and editable by hand
- Unambiguous representation of paths to values that allows easy access to arbitrary data
Non-goals:
- Fastest possible wire format - if performance is a bottleneck, use something like protobuf instead
- Programmable config language - NDL instead has a minimal set of features for convenient static configuration
Specification
Keys are arbitrary strings that are represented either:
- as is, if the key fully matches [a-zA-Z_][a-zA-Z0-9_-]* and is not one of the reserved keywords null, true, false, inf, nan
- in single quotes '' in any other case; the contents of a quoted key use the same escape rules as interpreted string values (see below)
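The key rules above can be sketched in Python as follows. This is only an illustration, not a reference implementation: `format_key` is a hypothetical helper name, and the set of characters escaped inside quotes is an assumption limited to the escapes listed for interpreted strings.

```python
import re

# Pattern and reserved words are copied from the key rules above.
BARE_KEY = re.compile(r"[a-zA-Z_][a-zA-Z0-9_-]*")
RESERVED = {"null", "true", "false", "inf", "nan"}


def format_key(key: str) -> str:
    """Return the NDL spelling of a map key (hypothetical helper)."""
    if BARE_KEY.fullmatch(key) and key not in RESERVED:
        return key  # written as is
    # Assumption: escaping backslash, single quote, newline and tab is enough
    # for this sketch; a real writer may need to escape more.
    escaped = (
        key.replace("\\", "\\\\")
        .replace("'", "\\'")
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )
    return "'" + escaped + "'"
```

With this sketch, `format_key("name")` stays `name`, while `format_key("weird key")` and `format_key("true")` come back quoted as `'weird key'` and `'true'`.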
There are the following types of values:
- map - a sequence of key-value pairs, enclosed in {} brackets, separated by whitespace
- array - a sequence of values, enclosed in [] brackets, separated by whitespace
- string - a sequence of UTF-8 characters, may be written in 2 forms:
  - raw string - enclosed in back quotes ``, can contain any character except `, characters are written as is
  - interpreted string - enclosed in double quotes "", escape sequences \n, \t, \u{...}, \', \", \\ are supported, may be multiline, physical newlines are preserved
- int - arbitrary size integer, written in either:
  - decimal format, fully matching -?(0|[1-9][0-9]*) (e.g. 12, 0, but not 01)
  - hexadecimal format, fully matching -?0x[0-9A-Fa-f]+ (e.g. 0xFF)
  - binary format, fully matching -?0b[01]+ (e.g. 0b1011)
- real - real number, either:
  - decimal dot real, fully matching -?(0|[1-9][0-9]*)\.[0-9]+ (e.g. 12.3, -0.1, but not 01.2)
  - in e notation, fully matching -?(0|[1-9][0-9]*)(\.[0-9]+)?[eE]-?[0-9]+ (e.g. 1.2e-3, -1e9)
  - inf, -inf, nan (negative nan is invalid)
- bool - true or false
- null type - null
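To show how the patterns above let a reader tell the type of a scalar token without context, here is a minimal Python sketch. The function name `classify_scalar` is hypothetical, the token is assumed to be already split out of the document, and interpreted strings are recognized only by their surrounding quotes (escape sequences are not validated).

```python
import re
from typing import Optional

# Patterns are copied from the type list above.
INT_PATTERNS = [
    re.compile(r"-?(0|[1-9][0-9]*)"),  # decimal
    re.compile(r"-?0x[0-9A-Fa-f]+"),   # hexadecimal
    re.compile(r"-?0b[01]+"),          # binary
]
REAL_PATTERNS = [
    re.compile(r"-?(0|[1-9][0-9]*)\.[0-9]+"),                # decimal dot
    re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?[eE]-?[0-9]+"),  # e notation
]
REAL_KEYWORDS = {"inf", "-inf", "nan"}  # "-nan" is deliberately absent


def classify_scalar(token: str) -> Optional[str]:
    """Return the scalar type name of a token, or None if it is not a scalar."""
    if token == "null":
        return "null"
    if token in ("true", "false"):
        return "bool"
    if token in REAL_KEYWORDS:
        return "real"
    if any(p.fullmatch(token) for p in INT_PATTERNS):
        return "int"
    if any(p.fullmatch(token) for p in REAL_PATTERNS):
        return "real"
    if len(token) >= 2 and token[0] == token[-1] == "`" and "`" not in token[1:-1]:
        return "raw string"
    # Assumption: escape sequences inside the quotes are not checked here.
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return "interpreted string"
    return None
```

In this sketch `classify_scalar("0xFF")` gives `"int"` and `classify_scalar("1.2e-3")` gives `"real"`, while `01`, `01.2` and `-nan` are rejected with `None`.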
A document contains exactly one value of any type. If that value is a map, its {} brackets must be omitted. An empty (or comment-only) document is an empty map. The NDL document file extension is .ndl.
A key in a map may be written as a dot-separated path, e.g. k1.k2.kN. In this case, nested maps are created for each path level, e.g. k1 {}, k1 { k2 {} }, etc. No whitespace is allowed between path parts and dots.
After key paths are expanded, maps that end up at the same path are merged recursively. Map/non-map and non-map/non-map conflicts are invalid (NDL does not allow value overriding).
// speaking less formally, this is allowed
category.sub1 { key1 "val1" key2 "val2" }
category.sub2 { key1 "val1" key2 "val2" }
category { key "val" }
// and is interpreted as
category {
	sub1 { key1 "val1" key2 "val2" }
	sub2 { key1 "val1" key2 "val2" }
	key "val"
}
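The expansion and merge rules can be sketched in Python as below. This is an assumption-laden illustration rather than a parser: `build_map` and `merge_value` are hypothetical names, entries are assumed to arrive as (list of path parts, value) pairs, maps are modeled as `dict`, and nested maps passed as values are assumed to contain no dotted keys of their own.

```python
def merge_value(target: dict, key: str, value, where: list) -> None:
    """Merge one key-value pair into target, rejecting any override."""
    if isinstance(value, dict):
        existing = target.setdefault(key, {})
        if not isinstance(existing, dict):
            # map/non-map conflict
            raise ValueError("conflict at " + ".".join(where))
        for child_key, child_value in value.items():
            merge_value(existing, child_key, child_value, where + [child_key])
    elif key in target:
        # non-map/map or non-map/non-map conflict
        raise ValueError("conflict at " + ".".join(where))
    else:
        target[key] = value


def build_map(entries) -> dict:
    """Expand dotted key paths and merge maps that share a path."""
    root: dict = {}
    for path, value in entries:
        # k1.k2.kN value  ->  k1 { k2 { kN value } }
        for key in reversed(path[1:]):
            value = {key: value}
        merge_value(root, path[0], value, [path[0]])
    return root
```

Fed with the three `category` entries from the example above, `build_map` produces a single `category` map holding `sub1`, `sub2` and `key`; giving the same path two non-map values raises `ValueError`.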
Dotted paths
Maps and arrays may contain values of any type, including other maps and arrays. For any value, it is possible to write a path to it - a sequence of map keys and array indices. NDL semantics allow the path to any value to be represented unambiguously as a string, constructed as follows:
- Start with an empty string and iterate over the path elements from the root to the value
- If the current path element is a map key, append a dot followed by the key, written according to the key representation rules (see above)
- If the current path element is an array index, append a dot followed by the 0-based index
- Omit the dot at the beginning of the path
category { array [ { 'weird key' "val" } ] }
// a path to the value "val" here can be
// represented as category.array.0.'weird key'
This is a convenient way to refer to a value by its path, and libraries are encouraged to implement it. However, dotted paths with array indices cannot be written in NDL code itself, because of the ambiguity this would cause.
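A library could build such path strings roughly as in the Python sketch below. The names `format_key` and `format_path` are hypothetical, path elements are assumed to be `str` for map keys and `int` for array indices, and the quoting of keys escapes only a small assumed set of characters.

```python
import re

BARE_KEY = re.compile(r"[a-zA-Z_][a-zA-Z0-9_-]*")
RESERVED = {"null", "true", "false", "inf", "nan"}


def format_key(key: str) -> str:
    """Write a key as is when allowed, otherwise in single quotes."""
    if BARE_KEY.fullmatch(key) and key not in RESERVED:
        return key
    # Assumption: only backslash and single quote are escaped in this sketch.
    return "'" + key.replace("\\", "\\\\").replace("'", "\\'") + "'"


def format_path(elements) -> str:
    """Join map keys (str) and array indices (int) into a dotted path."""
    parts = []
    for element in elements:
        if isinstance(element, int):
            parts.append(str(element))  # 0-based array index
        else:
            parts.append(format_key(element))
    # Joining with dots is the same as prefixing each element with a dot
    # and dropping the leading one.
    return ".".join(parts)
```

For the example above, `format_path(["category", "array", 0, "weird key"])` returns `category.array.0.'weird key'`. A map key that looks like a number is always quoted, so the key `"0"` is written `'0'` and cannot be confused with the array index `0`.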
Canonical format
It is recommended to format NDL documents as follows:
- All tokens must be separated by either one ASCII space or a line break followed by indentation tabs (except empty [] and {}, which are written without whitespace).
- If an array or map contains no nested arrays, maps, comments, or multiline strings, and its length (number of Unicode characters in canonical inline format) does not exceed 60 characters, it is written on a single line
- Root map is always written on multiple lines
- In other cases, it is written on multiple lines, adding one level of tab indentation. In maps, each key-value pair is written on a separate line, with key and value separated by one ASCII space; in arrays, each element is written on a separate line. A line break is written after the opening brace/bracket and before the closing brace/bracket
- Multiline arrays whose direct elements are only maps are written in one indentation level instead of two, and braces are separated by one ASCII space: [ {,} {,} ] (this overrides the previous rule on line break after opening bracket)
- In multiline arrays and maps, existing line breaks are preserved, but two or more consecutive empty lines are collapsed into one. Empty lines after the opening brace/bracket and before the closing brace/bracket are also removed
- Line comments are written with one ASCII space after //
- Block comments are written with one ASCII space after /* and before */, if there is no user-written line break there
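The single-line rule can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `inline_form` and `render_scalar` are hypothetical names, containers are modeled as `dict` and `list`, map keys are assumed to be valid unquoted keys, comments are not modeled, reals are left out, and the root-map rule is not handled.

```python
from typing import Optional

MAX_INLINE = 60  # limit from the rule above, in Unicode characters


def render_scalar(value) -> str:
    """Render a scalar as an NDL token (reals are omitted in this sketch)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must be checked before int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError("unsupported scalar: " + repr(value))


def inline_form(container) -> Optional[str]:
    """Return the single-line form, or None if multiple lines are required."""
    is_map = isinstance(container, dict)
    items = list(container.values()) if is_map else list(container)
    for item in items:
        if isinstance(item, (dict, list)):
            return None  # nested container
        if isinstance(item, str) and "\n" in item:
            return None  # multiline string
    if is_map:
        # Assumption: every key can be written without quotes.
        tokens = [t for k, v in container.items() for t in (k, render_scalar(v))]
        opening, closing = "{", "}"
    else:
        tokens = [render_scalar(v) for v in items]
        opening, closing = "[", "]"
    if not tokens:
        return opening + closing  # assumed: empty containers have no inner space
    text = " ".join([opening, *tokens, closing])
    return text if len(text) <= MAX_INLINE else None
```

Under these assumptions, `inline_form({"x": 1920, "y": 1080})` returns `{ x 1920 y 1080 }`, and any container that holds another map or array returns `None`, meaning it has to be written on multiple lines.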
Why another standard
You probably thought of the "how standards proliferate" meme. Well, no existing standard covered the needs of OpenWallpaper's declarative scene frontend I'm planning to develop. I tried to avoid making a new standard, but was left with no choice.
- JSON - a good language for data exchange, but hard to edit by hand because of unnecessary verbosity (commas and colons), and it does not differentiate between int and real numbers
- YAML - over-engineered and full of footguns, mainly because it allows for unquoted strings that can be interpreted in an unintended way
- TOML - awesome for flat configs, but gains about the same verbosity as JSON at deep nesting levels
- KDL - shares the idea of minimal syntax, but represents an XML-like tree instead of a structure
- HCL, HOCON, UCL - advanced config languages with programming elements that are probably overkill for many use cases