1. Purpose
This document describes a vocabulary defining keywords that can be used to reference values stored in
- the instance data
- the schema data
- external JSON data
where the dereferenced values serve as input for keywords in a derived subschema.
The intent of these keywords is to cover the use cases discussed across several issues in the JSON Schema specification GitHub repositories regarding data access feature requests, including a proposal for a $data keyword.
2. Declarations
The ID for this vocabulary is https://docs.json-everything.net/schema/vocabs/data-2023, the same as this document’s URL.
The meta-schema which validates keyword usage for this vocabulary can be found at https://json-everything.net/meta/vocab/data-2023, which also serves as its $id value.
For convenience, a 2020-12 dialect extension meta-schema is also available at https://json-everything.net/meta/data-2023, which also serves as its $id value. This dialect meta-schema extends the standard 2020-12 dialect to include only the keywords defined in this vocabulary. To support multiple vocabulary extensions, you’ll need to make your own dialect meta-schema which incorporates all of the vocabularies you want to use.
3. Definitions
3.1. Formed Schema
The schema object created as a result of dereferencing all of the values in either the data or optionalData keywords as described in section 4.1. That is, each keyword produces an independent formed schema.
3.2. Host Schema
The schema object which contains either the data or optionalData keyword. The processing rules that govern this schema also govern the formed schema, as specified by section 4.2.
4. The data and optionalData Keywords
4.1. Syntax and Semantics
The values of data and optionalData must be objects. The keys of these objects are interpreted as JSON Schema keywords, and the values MUST be one of:
- JSON Pointers per RFC 6901
- Relative JSON Pointers
- JSON Path queries per RFC 9535
- fragment-only IRI using IRI-encoded JSON Pointer identifier per RFC 6901, §6
- Absolute IRIs per RFC 3987, optionally including an additional IRI-encoded JSON Pointer fragment identifier
Relative IRIs are not allowed as they follow a similar syntax to JSON Pointers and need to be distinguishable.
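As an illustration of how the value formats above can be told apart, the following Python sketch classifies a reference string by its leading characters. The function name classify_reference and the returned labels are hypothetical and not part of this vocabulary. The Relative JSON Pointer pattern only approximates the grammar of the Relative JSON Pointer draft, and treating the empty string as a JSON Pointer is an assumption based on RFC 6901.

```python
import re

# Approximation of the Relative JSON Pointer grammar: a non-negative integer,
# then either "#" or an optional index manipulation and a JSON Pointer.
_RELATIVE_POINTER = re.compile(
    r"(0|[1-9][0-9]*)(#|([+-](0|[1-9][0-9]*))?(/.*)?)", re.DOTALL
)
# An absolute IRI starts with a scheme followed by ":".
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def classify_reference(value: str) -> str:
    """Return a label for the kind of reference held in a data/optionalData value.

    Hypothetical helper: the labels are illustrative only.
    """
    if value == "" or value.startswith("/"):
        return "json-pointer"            # resolved against the instance root
    if value.startswith("#"):
        return "fragment-iri"            # fragment-only IRI with a pointer fragment
    if value.startswith("$"):
        return "json-path"               # RFC 9535 queries begin with "$"
    if _RELATIVE_POINTER.fullmatch(value):
        return "relative-json-pointer"   # resolved against the current location
    if _SCHEME.match(value):
        return "absolute-iri"
    # Anything else would be a relative IRI, which the vocabulary disallows.
    raise ValueError(f"unsupported reference format: {value!r}")
```

Because relative IRIs such as foo/bar fall through every branch, the sketch rejects them, which mirrors the restriction stated above.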
data and optionalData MUST NOT contain any keys which are defined by the JSON Schema Core Vocabulary.
data and optionalData both operate in two phases:
- All of the values are dereferenced per sections 4.2 and 4.3.
- The resolved object is then interpreted as a schema which is applied to the instance at the current location.
The validation and annotation results of data and optionalData are those of the formed schemas. More detail regarding output can be found in section 4.4.
4.2. Contextual Behavior
data and optionalData MUST be processed contextually in the same manner as the host schema. Specifically,
- IRI resolution MUST be performed in the same manner as $ref (per section 3.2).
- Keys that are ignored by the host schema MUST also be ignored by the formed schema.
Implementations SHOULD validate that the resolved data forms a valid schema against the host schema’s meta-schema (as specified by $schema).
It is not necessary for an implementation to perform a meta-schema validation of the formed schema. Other mechanisms internal to the implementation (such as deserialization) may suffice to verify the data is in the correct form.
4.3. Resolving data
The string values within data and optionalData are dereferenced in different ways depending on the format of the value.
If the value is a JSON Pointer, it is resolved against the instance root.
If the value is a Relative JSON Pointer, it is resolved against the instance at the location currently being evaluated.
If the value is a JSON Path, the query is executed against the instance at the current location. The resulting values are then taken as a JSON array.
JSON Path returns a nodelist, which contains both values and their locations within the instance. This operation discards the locations and retains only the values.
Otherwise, it must be resolved in accordance with the rules of $ref resolution for the relevant JSON Schema specification (e.g. draft 2020-12, §8.2). However, unlike $ref, which requires that the indicated data must represent a valid schema object, data and optionalData references can identify any value which is valid for the associated keyword.
For each successfully resolved reference, the full value at the specified location MUST be returned.
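The pointer-based rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names ResolutionError, resolve_pointer and resolve_relative are hypothetical, the current location is assumed to be tracked as a list of reference tokens, and the Relative JSON Pointer support is limited to the "go up N levels, then follow a pointer" form (no "#" suffix or index manipulation).

```python
import re
from typing import Any, List


class ResolutionError(Exception):
    """Hypothetical error type signalling that a reference did not resolve."""


def parse_pointer(pointer: str) -> List[str]:
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ResolutionError(f"not a JSON Pointer: {pointer!r}")
    # RFC 6901 decodes '~1' before '~0'.
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def walk(document: Any, tokens: List[str]) -> Any:
    """Follow reference tokens through nested dicts and lists."""
    current = document
    for token in tokens:
        if isinstance(current, dict):
            if token not in current:
                raise ResolutionError(f"no member {token!r}")
            current = current[token]
        elif isinstance(current, list):
            if not re.fullmatch(r"0|[1-9][0-9]*", token) or int(token) >= len(current):
                raise ResolutionError(f"no index {token!r}")
            current = current[int(token)]
        else:
            raise ResolutionError(f"cannot descend into a scalar with {token!r}")
    return current


def resolve_pointer(root: Any, pointer: str) -> Any:
    """JSON Pointer: resolved against the instance root."""
    return walk(root, parse_pointer(pointer))


def resolve_relative(root: Any, location: List[str], relative: str) -> Any:
    """Relative JSON Pointer: resolved against the location being evaluated."""
    match = re.fullmatch(r"(0|[1-9][0-9]*)(/.*)?", relative, re.DOTALL)
    if match is None:
        raise ResolutionError(f"unsupported Relative JSON Pointer: {relative!r}")
    up = int(match.group(1))
    if up > len(location):
        raise ResolutionError("cannot move above the instance root")
    base = location[: len(location) - up]
    return walk(root, base + parse_pointer(match.group(2) or ""))


instance = {"foo": 10, "bar": 5}
print(resolve_pointer(instance, "/foo"))              # 10
print(resolve_relative(instance, ["bar"], "1/foo"))   # 10
```

Both calls return the full value found at the target location, here the number stored in foo, whether the starting point is the instance root or the location of bar.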
4.3.1. Reference Resolution Failure
The data and optionalData keywords differ only in their behavior when resolving a reference fails.
For data, if a reference cannot be resolved, or if a resolved value is not valid for the associated keyword, evaluation MUST halt. Implementations SHOULD use native features of their language to report the failure as appropriate. Implementations MAY continue to attempt to resolve other references so that multiple resolution failures can be reported together; however, further schema evaluation MUST NOT continue.
For optionalData, if a reference cannot be resolved, or if a resolved value is not valid for the associated keyword, that keyword MUST be ignored and excluded from the resulting formed schema. As a consequence, evaluation MUST proceed as if that keyword is absent.
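The difference between the two keywords can be sketched as a single function with a flag. The names form_schema, lookup and DataResolutionError are hypothetical. The lookup helper is deliberately tiny (object members only, no arrays or escapes), and the sketch only covers unresolvable references; checking that a resolved value is valid for its keyword is left out.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel distinguishing "not found" from a JSON null


class DataResolutionError(Exception):
    """Hypothetical exception used to halt evaluation for the data keyword."""

    def __init__(self, failures: List[str]):
        super().__init__(f"unresolved references: {failures}")
        self.failures = failures


def lookup(instance: Any, pointer: str) -> Any:
    """Minimal JSON Pointer lookup over nested objects (assumption: no arrays)."""
    current = instance
    for token in pointer.split("/")[1:]:
        if not isinstance(current, dict) or token not in current:
            return _MISSING
        current = current[token]
    return current


def form_schema(references: Dict[str, str], instance: Any, optional: bool) -> Dict[str, Any]:
    """Build the formed schema from a data (optional=False) or optionalData value."""
    formed: Dict[str, Any] = {}
    failures: List[str] = []
    for keyword, pointer in references.items():
        value = lookup(instance, pointer)
        if value is _MISSING:
            if not optional:
                failures.append(pointer)  # keep going so all failures are reported
            continue                      # optionalData: the keyword is simply dropped
        formed[keyword] = value
    if failures:
        raise DataResolutionError(failures)  # data: no further evaluation
    return formed


print(form_schema({"maximum": "/foo"}, {"bar": 5, "foo": 10}, optional=False))  # {'maximum': 10}
print(form_schema({"maximum": "/foo"}, {"bar": 20}, optional=True))             # {}
```

Collecting failures before raising reflects the allowance that implementations MAY report multiple resolution failures together, while raising an exception is one way of using a native language feature to halt evaluation.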
4.3.2. External Data Access
Implementations SHOULD provide a means to pre-load and cache any external references prior to evaluation but MAY be configured to fetch external documents at evaluation time. Documents fetched from IRIs which contain a JSON Pointer fragment MUST be interpreted using a media type, such as application/schema-instance+json, that allows resolution of such fragments.
Users should be aware that fetching data from external locations may carry certain security risks not covered by this document.
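One possible shape for the pre-loading mechanism is a registry keyed by IRI, sketched below in Python. The class ExternalDataRegistry, its method names, and the example.com IRI are all hypothetical. The sketch never touches the network: a reference to a document that was not pre-loaded is treated as a failure, and the fragment is assumed to be a percent-encoded JSON Pointer.

```python
from typing import Any, Dict
from urllib.parse import unquote, urldefrag


class ExternalDataRegistry:
    """Hypothetical cache of external JSON documents, filled before evaluation."""

    def __init__(self) -> None:
        self._documents: Dict[str, Any] = {}

    def preload(self, iri: str, document: Any) -> None:
        """Register an already-parsed JSON document under its absolute IRI."""
        self._documents[iri] = document

    def resolve(self, reference: str) -> Any:
        """Resolve an absolute IRI, following an optional JSON Pointer fragment."""
        base, fragment = urldefrag(reference)
        if base not in self._documents:
            raise LookupError(f"{base} was not pre-loaded")
        current = self._documents[base]
        pointer = unquote(fragment)  # the fragment is an IRI-encoded JSON Pointer
        for raw in pointer.split("/")[1:]:
            token = raw.replace("~1", "/").replace("~0", "~")
            # A missing member or index raises KeyError/IndexError (both LookupError).
            current = current[int(token)] if isinstance(current, list) else current[token]
        return current


registry = ExternalDataRegistry()
registry.preload("https://example.com/limits.json", {"limits": {"max": 10}})
print(registry.resolve("https://example.com/limits.json#/limits/max"))  # 10
```

Keeping retrieval separate from evaluation in this way means the set of reachable external documents is decided up front, which is one way to limit exposure to the security risks mentioned above.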
4.4. Output
The evaluation output of the formed schema is reported as part of the overall schema output. The evaluation path incorporates “data” or “optionalData” as appropriate, followed by any additional pointer segments that are navigable within the formed schema.
Annotation results of the formed schema are retained as per the host schema so that they can be processed by other keywords such as unevaluatedItems and unevaluatedProperties.
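To make the evaluation path rule concrete, the short Python sketch below appends the keyword name and the formed schema's segments to the host schema's evaluation path. The function name evaluation_path is hypothetical, and representing the path as a JSON Pointer string is an assumption based on how evaluation paths are usually written in JSON Schema output.

```python
def evaluation_path(host_path: str, keyword: str, *formed_segments: str) -> str:
    """Append the data/optionalData keyword and formed-schema segments to a path."""

    def escape(segment: str) -> str:
        # RFC 6901 escaping: '~' first, then '/'.
        return segment.replace("~", "~0").replace("/", "~1")

    return host_path + "".join("/" + escape(s) for s in (keyword, *formed_segments))


print(evaluation_path("/properties/bar", "data", "maximum"))
# /properties/bar/data/maximum
```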
5. Examples
5.1. data Example
The following defines a schema to validate an object instance with a bar property that must contain a numeric value less than or equal to the value in the instance’s foo property.
{
  "$schema": "https://json-everything.net/meta/data-2023",
  "type": "object",
  "properties": {
    "foo": { "type": "number" },
    "bar": {
      "type": "number",
      "data": { "maximum": "/foo" }
    }
  }
}
The data property declares that its parent subschema should validate a maximum keyword whose value is the value in the foo property of the instance.
The following shows how a change in the foo property, or its absence, can affect the validation result of the bar property and thus the entire instance.
// passing
{ "bar": 5, "foo": 10 }
{ "foo": 10 }
{}
// failing
{ "bar": 5, "foo": 0 }
// resolution failure
{ "bar": 20 }
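The outcomes listed above can be reproduced with a hand-written Python walk-through of this one schema. It is not a general validator: the function evaluate_example is hypothetical, it hard-codes the /foo reference and the maximum keyword, and it assumes every instance is an object whose foo and bar values, when present, are numbers.

```python
def evaluate_example(instance: dict) -> str:
    """Evaluate the 5.1 schema against an instance; returns an outcome label."""
    # "properties" only applies the bar subschema when bar is present,
    # so the data keyword is never reached otherwise.
    if "bar" not in instance:
        return "pass"
    # Phase 1: dereference "/foo" against the instance root.
    if "foo" not in instance:
        return "resolution failure"  # data halts evaluation
    formed_schema = {"maximum": instance["foo"]}
    # Phase 2: apply the formed schema to the value of bar.
    return "pass" if instance["bar"] <= formed_schema["maximum"] else "fail"


for candidate in ({"bar": 5, "foo": 10}, {"foo": 10}, {}, {"bar": 5, "foo": 0}, {"bar": 20}):
    print(candidate, "->", evaluate_example(candidate))
```

Note that { "foo": 10 } and {} pass without any dereferencing taking place, because the subschema holding data is only evaluated when bar exists.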
5.2. optionalData Example
In the following schema, bar is required to be less than or equal to foo; however, foo itself is not required.
{
  "$schema": "https://json-everything.net/meta/data-2023",
  "type": "object",
  "properties": {
    "foo": { "type": "number" },
    "bar": {
      "type": "number",
      "optionalData": { "maximum": "/foo" }
    }
  }
}
The intent behind such a schema is two-fold:
- If bar is present without foo, bar may be of any numeric value.
- If both foo and bar are present, bar’s value must be less than or equal to foo’s value.
The fact that foo is optional presents a case where /foo may fail to resolve, yielding an indeterminate validation result. optionalData addresses this problem by ignoring resolution failures and allowing evaluation to proceed.
// passing
{ "bar": 5, "foo": 10 }
{ "bar": 10 }
{ "foo": 10 }
{}
// failing
{ "bar": 5, "foo": 0 }
5.3. data Using a JSON Path Reference
The schema below requires that selection be an ID from within the options array; however, the options array is a collection of objects, where each item contains an ID.
{
  "$schema": "https://json-everything.net/meta/data-2023",
  "type": "object",
  "properties": {
    "options": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": { "type": "integer" },
          "value": { "type": "string" }
        },
        "required": ["id", "value"]
      }
    },
    "selection": {
      "data": {
        "enum": "$.options[*].id"
      }
    }
  },
  "required": ["options", "selection"]
}
In this example, $.options[*].id evaluated against the instance returns an array of the available IDs which is used in enum. Then, selection is validated to be one of those IDs.
If selection is present in the array’s ID values, then the validation passes.
{
  "options": [
    { "id": 1, "value": "foo" },
    { "id": 2, "value": "bar" },
    { "id": 3, "value": "baz" },
    { "id": 4, "value": "quux" }
  ],
  "selection": 2
}
Otherwise, validation fails.
{
  "options": [
    { "id": 1, "value": "foo" },
    { "id": 2, "value": "bar" },
    { "id": 3, "value": "baz" },
    { "id": 4, "value": "quux" }
  ],
  "selection": 42
}
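The JSON Path step in this example can be illustrated with a Python sketch that hand-evaluates the single query $.options[*].id rather than using a JSON Path engine. The function names select_option_ids and validate_selection are hypothetical. The sketch first builds a nodelist of (location, value) pairs, then discards the locations to obtain the array used for enum, as described in section 4.3. Python's == stands in for JSON value equality here, which is an approximation.

```python
from typing import Any, List, Tuple


def select_option_ids(instance: Any) -> List[Tuple[str, Any]]:
    """Hand-written stand-in for the query $.options[*].id.

    Returns a nodelist of (normalized path, value) pairs. Only array-valued
    "options" are handled, matching what the schema requires.
    """
    nodelist: List[Tuple[str, Any]] = []
    options = instance.get("options") if isinstance(instance, dict) else None
    if isinstance(options, list):
        for index, item in enumerate(options):
            if isinstance(item, dict) and "id" in item:
                nodelist.append((f"$['options'][{index}]['id']", item["id"]))
    return nodelist


def validate_selection(instance: dict) -> bool:
    """Form {"enum": [...]} from the query result and apply it to selection."""
    # Discard the locations and keep only the values, as a JSON array.
    formed_schema = {"enum": [value for _, value in select_option_ids(instance)]}
    return instance["selection"] in formed_schema["enum"]


options = [
    {"id": 1, "value": "foo"},
    {"id": 2, "value": "bar"},
    {"id": 3, "value": "baz"},
    {"id": 4, "value": "quux"},
]
print(validate_selection({"options": options, "selection": 2}))   # True
print(validate_selection({"options": options, "selection": 42}))  # False
```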