A Vocabulary for Accessing Data Stored in JSON (2023)

1. Purpose

This document describes a vocabulary defining keywords that can be used to reference values stored in

  • the instance data
  • the schema data
  • external JSON data

where the dereferenced values serve as input for keywords in a derived subschema.

The intent of this vocabulary is to cover the use cases raised in several data access feature requests in the JSON Schema specification GitHub repositories, including a proposal for a $data keyword.

2. Declarations

The ID for this vocabulary is https://docs.json-everything.net/schema/vocabs/data-2023, the same as this document’s URL.

The meta-schema which validates keyword usage for this vocabulary can be found at https://json-everything.net/meta/vocab/data-2023, which also serves as its $id value.

For convenience, a 2020-12 dialect extension meta-schema is also available at https://json-everything.net/meta/data-2023, which also serves as its $id value. This dialect meta-schema extends the standard 2020-12 dialect to include only the keywords defined in this vocabulary. To support multiple vocabulary extensions, you’ll need to make your own dialect meta-schema which incorporates all of the vocabularies you want to use.

3. Definitions

3.1. Formed Schema

The schema object created as a result of dereferencing all of the values in either the data or optionalData keywords as described in section 4.1. That is, each keyword produces an independent formed schema.

3.2. Host Schema

The schema object which contains either the data or optionalData keyword. The processing rules that govern this schema also govern the formed schema, as specified by section 4.2.

4. The data and optionalData Keywords

4.1. Syntax and Semantics

The values of data and optionalData must be objects. The keys of these objects are interpreted as JSON Schema keywords, and the values MUST be one of

  • a JSON Pointer
  • a Relative JSON Pointer
  • a JSON Path
  • an IRI

Relative IRIs are not allowed as they follow a similar syntax to JSON Pointers and need to be distinguishable.

data and optionalData MUST NOT contain any keys which are defined by the JSON Schema Core Vocabulary.

data and optionalData both operate in two phases:

  1. All of the values are dereferenced per sections 4.2 and 4.3.
  2. The resolved object is then interpreted as a schema which is applied to the instance at the current location.
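The first of the two phases above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names resolve_pointer and form_schema are hypothetical, only plain JSON Pointer references are handled, and the second phase (applying the formed schema to the instance) is left to whatever validator the implementation already has.

```python
import re
from typing import Any

_ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer against parsed JSON data."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape in the order required by RFC 6901: ~1 first, then ~0.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, dict):
            current = current[token]  # KeyError if the member is missing
        elif isinstance(current, list):
            if not _ARRAY_INDEX.fullmatch(token):
                raise LookupError(f"invalid array index: {token!r}")
            current = current[int(token)]  # IndexError if out of range
        else:
            raise LookupError(f"cannot descend into a scalar at {token!r}")
    return current


def form_schema(keyword_value: dict, instance_root: Any) -> dict:
    """Phase 1: replace every reference with the value it points to."""
    return {
        keyword: resolve_pointer(instance_root, reference)
        for keyword, reference in keyword_value.items()
    }


instance = {"foo": 10, "bar": 5}
formed = form_schema({"maximum": "/foo"}, instance)
print(formed)  # {'maximum': 10}
```

The resulting object, {"maximum": 10}, is what would then be applied as a schema to the instance at the current location.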

The validation and annotation results of data and optionalData are those of the formed schemas. More detail regarding output can be found in section 4.4.

4.2. Contextual Behavior

data and optionalData MUST be processed contextually in the same manner as the host schema. Specifically,

  • IRI resolution MUST be performed in the same manner as $ref (per section 3.2).
  • Keys that are ignored by the host schema MUST also be ignored by the formed schema.

Implementations SHOULD validate that the resolved data forms a valid schema against the host schema’s meta-schema (as specified by $schema).

It is not necessary for an implementation to perform a meta-schema validation of the formed schema. Other mechanisms internal to the implementation (such as deserialization) may suffice to verify the data is in the correct form.

4.3. Resolving data

The string values within data and optionalData are dereferenced in different ways depending on the format of the value.

If the value is a JSON Pointer, it is resolved against the instance root.

If the value is a Relative JSON Pointer, it is resolved against the instance at the location currently being evaluated.
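To illustrate how a Relative JSON Pointer differs from a plain JSON Pointer, here is a rough Python sketch. It assumes the evaluator tracks the current instance location as a list of keys and indices (the location parameter is an assumption of this sketch, not something this document defines), and it deliberately omits the "#" form and index manipulation from the Relative JSON Pointer draft.

```python
import re
from typing import Any, List, Union

# A non-negative integer prefix, optionally followed by a JSON Pointer.
_RELATIVE = re.compile(r"(0|[1-9][0-9]*)((?:/.*)?)", re.DOTALL)


def resolve_relative(root: Any, location: List[Union[str, int]], relative: str) -> Any:
    """Resolve `relative` starting from the value found at `location` in `root`."""
    match = _RELATIVE.fullmatch(relative)
    if match is None:
        raise ValueError(f"unsupported Relative JSON Pointer: {relative!r}")
    levels_up = int(match.group(1))
    if levels_up > len(location):
        raise LookupError("reference walks above the instance root")
    # Step up the requested number of levels, then append the pointer tokens.
    tokens = list(location[: len(location) - levels_up])
    for raw in match.group(2).split("/")[1:]:
        tokens.append(raw.replace("~1", "/").replace("~0", "~"))
    current = root
    for token in tokens:
        if isinstance(current, list):
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise LookupError(f"cannot descend into a scalar at {token!r}")
    return current


root = {"foo": 10, "bar": 5}
# While evaluating the value at /bar, "1/foo" means: up one level, then /foo.
print(resolve_relative(root, ["bar"], "1/foo"))  # 10
print(resolve_relative(root, ["bar"], "0"))      # 5
```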

If the value is a JSON Path, the query is executed against the instance at the current location. The resulting values are then taken as a JSON array.

JSON Path returns a nodelist, which contains both values and their locations within the instance. This operation discards the locations and retains only the values.

Otherwise, it must be resolved in accordance with the rules of $ref resolution for the relevant JSON Schema specification (e.g. draft 2020-12, §8.2). However, unlike $ref which requires that the indicated data must represent a valid schema object, data and optionalData references can identify any value which is valid for the associated keyword.

For each successfully resolved reference, the full value at the specified location MUST be returned.
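Because the resolution strategy depends on the format of the string, an implementation first has to tell the four formats apart. The following Python sketch shows one plausible way to do so. The function name classify_reference and the returned labels are hypothetical, the patterns are simplified, and treating the empty string as a JSON Pointer is an assumption of this sketch.

```python
import re

# Non-negative integer, then either "#" or an optional JSON Pointer.
_RELATIVE_POINTER = re.compile(r"(0|[1-9][0-9]*)(#|(/.*)?)", re.DOTALL)
# An IRI that is not relative begins with a scheme followed by ":".
_HAS_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def classify_reference(value: str) -> str:
    """Return a label naming the reference format of `value`."""
    if value == "" or value.startswith("/"):
        # Assumption: "" is read as the JSON Pointer for the whole instance.
        return "json-pointer"
    if _RELATIVE_POINTER.fullmatch(value):
        return "relative-json-pointer"
    if value.startswith("$"):
        return "json-path"
    if _HAS_SCHEME.match(value):
        return "iri"
    # Relative IRIs are rejected: they cannot be told apart from pointers.
    raise ValueError(f"unsupported reference: {value!r}")


for sample in ["/foo", "1/foo", "$.options[*].id", "https://example.com/data.json#/foo"]:
    print(sample, "->", classify_reference(sample))
```

A relative IRI such as other.json#/foo falls through every check and is rejected, which reflects the restriction in section 4.1.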

4.3.1. Reference Resolution Failure

The data and optionalData keywords differ only in their behavior when resolving a reference fails.

For data, if a reference cannot be resolved, or if a resolved value is not valid for the associated keyword, evaluation MUST halt. Implementations SHOULD use native features of their language to report the failure as appropriate. Implementations MAY continue to attempt to resolve other references so that multiple resolution failures can be reported together; however, further schema evaluation MUST NOT continue.

For optionalData, if a reference cannot be resolved, or if a resolved value is not valid for the associated keyword, that keyword MUST be ignored and excluded from the resulting formed schema. As a consequence, evaluation MUST proceed as if that keyword is absent.
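The difference between the two keywords can be sketched in Python as a single function with an optional flag. Everything named here (ReferenceResolutionError, form_schema, top_level) is hypothetical. The resolver only handles top-level pointers such as /foo, and the check that a resolved value is valid for its keyword is omitted for brevity.

```python
from typing import Any, Callable, Dict, List


class ReferenceResolutionError(Exception):
    """Raised for data when one or more references cannot be resolved."""

    def __init__(self, failures: List[str]):
        super().__init__("unresolved references: " + ", ".join(failures))
        self.failures = failures


def form_schema(
    refs: Dict[str, str], resolve: Callable[[str], Any], optional: bool
) -> Dict[str, Any]:
    formed: Dict[str, Any] = {}
    failures: List[str] = []
    for keyword, ref in refs.items():
        try:
            formed[keyword] = resolve(ref)
        except LookupError:
            # Keep going so that every failure can be reported together.
            failures.append(ref)
    if failures and not optional:
        # data: halt, and do not evaluate the schema any further.
        raise ReferenceResolutionError(failures)
    # optionalData: failed keywords are simply absent from the formed schema.
    return formed


def top_level(instance: dict) -> Callable[[str], Any]:
    """Toy resolver for pointers of the form /name (an assumption of this sketch)."""
    def resolve(ref: str) -> Any:
        return instance[ref[1:]]  # KeyError, a LookupError, when absent
    return resolve


instance = {"bar": 20}
print(form_schema({"maximum": "/foo"}, top_level(instance), optional=True))  # {}
try:
    form_schema({"maximum": "/foo"}, top_level(instance), optional=False)
except ReferenceResolutionError as error:
    print(error)  # unresolved references: /foo
```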

4.3.2. External Data Access

Implementations SHOULD provide a means to pre-load and cache any external references prior to evaluation but MAY be configured to fetch external documents at evaluation time. Documents fetched from IRIs which contain a JSON Pointer fragment MUST be interpreted using a media type, such as application/schema-instance+json, that allows resolution of such fragments.

Users should be aware that fetching data from external locations may carry certain security risks not covered by this document.

4.4. Output

The evaluation output of the formed schema is reported as part of the overall schema output. The evaluation path incorporates “data” or “optionalData” as appropriate, followed by any additional pointer segments that navigate within the formed schema.
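As a small illustration of how such an evaluation path might be assembled, the Python sketch below appends the keyword name and the segments within the formed schema to the host schema's evaluation path. The function name and parameters are hypothetical, and the exact output structure depends on the output format the implementation supports.

```python
from typing import Iterable


def escape(token: str) -> str:
    """Escape one JSON Pointer reference token per RFC 6901."""
    return token.replace("~", "~0").replace("/", "~1")


def formed_evaluation_path(host_path: str, keyword: str, segments: Iterable[str]) -> str:
    """Join the host path, the data keyword, and segments inside the formed schema."""
    if keyword not in ("data", "optionalData"):
        raise ValueError(f"unexpected keyword: {keyword!r}")
    return host_path + "".join("/" + escape(part) for part in (keyword, *segments))


# The maximum keyword formed by data inside the bar property's subschema.
print(formed_evaluation_path("/properties/bar", "data", ["maximum"]))
# /properties/bar/data/maximum
```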

Annotation results of the formed schema are retained as per the host schema so that they can be processed by other keywords such as unevaluatedItems and unevaluatedProperties.

5. Examples

5.1. data Example

The following defines a schema to validate an object instance with a bar property that must contain a numeric value less than or equal to the value in the instance’s foo property.

{
  "$schema": "https://json-everything.net/meta/data-2023",
  "type": "object",
  "properties": {
    "foo": { "type": "number" },
    "bar": {
      "type": "number",
      "data": { "maximum": "/foo" }
    }
  }
}

The data keyword declares that its parent subschema should apply a maximum keyword whose value is the value of the instance’s foo property.

The following shows how a change in the foo property, or its absence, can affect the validation result of the bar property and thus the entire instance.

// passing
{ "bar": 5, "foo": 10 }

{ "foo": 10 }

{}

// failing
{ "bar": 5, "foo": 0 }

// resolution failure
{ "bar": 20 }
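The three outcomes above can be reproduced with a hand-written Python check that mirrors this one schema. It is only a sketch for this example, not a general evaluator: the function names are hypothetical, the instance is assumed to already be an object, and a real implementation would report a resolution failure through its own error mechanism rather than a string.

```python
def is_number(value) -> bool:
    # bool is a subclass of int in Python, but is not a JSON number.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def evaluate(instance: dict) -> str:
    """Return 'pass', 'fail', or 'resolution failure' for the schema above."""
    if "foo" in instance and not is_number(instance["foo"]):
        return "fail"  # "foo": { "type": "number" }
    if "bar" not in instance:
        return "pass"  # the bar subschema, and its data keyword, never run
    if "foo" not in instance:
        return "resolution failure"  # /foo cannot be resolved
    bar = instance["bar"]
    if not is_number(bar):
        return "fail"  # "bar": { "type": "number" }
    # The formed schema is { "maximum": <value of foo> }.
    return "pass" if bar <= instance["foo"] else "fail"


for sample in [{"bar": 5, "foo": 10}, {"foo": 10}, {}, {"bar": 5, "foo": 0}, {"bar": 20}]:
    print(sample, "->", evaluate(sample))
# {'bar': 5, 'foo': 10} -> pass
# {'foo': 10} -> pass
# {} -> pass
# {'bar': 5, 'foo': 0} -> fail
# {'bar': 20} -> resolution failure
```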

5.2. optionalData Example

In the following schema, bar is required to be less than or equal to foo; however, foo itself is not required.

{
  "$schema": "https://json-everything.net/meta/data-2023",
  "type": "object",
  "properties": {
    "foo": { "type": "number" },
    "bar": {
      "type": "number",
      "optionalData": { "maximum": "/foo" }
    }
  }
}

The intent behind such a schema is two-fold:

  1. If bar is present without foo, bar may be of any numeric value.
  2. If both foo and bar are present, bar’s value must be less than or equal to foo’s value.

The fact that foo is optional presents a case where /foo may fail to resolve, yielding an indeterminate validation result. optionalData addresses this problem by ignoring resolution failures and allowing evaluation to proceed.

// passing
{ "bar": 5, "foo": 10 }

{ "bar": 10 }

{ "foo": 10 }

{}

// failing
{ "bar": 5, "foo": 0 }

5.3. data Using a JSON Path Reference

The schema below requires that selection be one of the IDs in the options array. Because options is a collection of objects, where each item contains an ID, a JSON Path is used to collect the IDs.

{
  "$schema": "https://json-everything.net/meta/data-2023",
  "type": "object",
  "properties": {
    "options": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": { "type": "integer" },
          "value": { "type": "string" }
        },
        "required": ["id", "value"]
      }
    },
    "selection": {
      "data": {
        "enum": "$.options[*].id"
      }
    }
  },
  "required": ["options", "selection"]
}

In this example, $.options[*].id evaluated against the instance returns an array of the available IDs which is used in enum. Then, selection is validated to be one of those IDs.

If selection is present in the array’s ID values, then the validation passes.

{
  "options": [
    { "id": 1, "value": "foo" },
    { "id": 2, "value": "bar" },
    { "id": 3, "value": "baz" },
    { "id": 4, "value": "quux" }
  ],
  "selection": 2
}

Otherwise, validation fails.

{
  "options": [
    { "id": 1, "value": "foo" },
    { "id": 2, "value": "bar" },
    { "id": 3, "value": "baz" },
    { "id": 4, "value": "quux" }
  ],
  "selection": 42
}
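
The way the JSON Path result feeds enum in this example can be sketched in Python. The query $.options[*].id is evaluated by hand here instead of by a JSON Path library, so the function names are hypothetical and only the array case of [*] is handled. The sketch shows the nodelist being reduced to its values before the enum comparison.

```python
from typing import Any, List, Tuple


def options_ids_nodelist(instance: dict) -> List[Tuple[str, Any]]:
    """Hand-written stand-in for $.options[*].id, as (location, value) pairs."""
    nodes = []
    options = instance.get("options")
    if isinstance(options, list):
        for index, item in enumerate(options):
            if isinstance(item, dict) and "id" in item:
                nodes.append((f"$['options'][{index}]['id']", item["id"]))
    return nodes


def selection_is_valid(instance: dict) -> bool:
    # Discard the locations and keep only the values, as a JSON array.
    enum_values = [value for _location, value in options_ids_nodelist(instance)]
    # Python's == is close to, but not the same as, JSON equality
    # (for example, True == 1 in Python), which is fine for this data.
    return instance["selection"] in enum_values


options = [
    {"id": 1, "value": "foo"},
    {"id": 2, "value": "bar"},
    {"id": 3, "value": "baz"},
    {"id": 4, "value": "quux"},
]
print(selection_is_valid({"options": options, "selection": 2}))   # True
print(selection_is_valid({"options": options, "selection": 42}))  # False
```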