JsonSchema.Net.DataGeneration is a tool that can create JSON data instances using a JSON schema as a framework.
For example, given the schema:
{
  "type": "object",
  "properties": {
    "foo": { "type": "string" }
  }
}
it can generate a JSON document like
{
  "foo": "bar"
}
Under the covers, the library uses the fabulous Bogus library, which is commonly used to generate random test data, and a few other tricks.
Use Cases
Schema Debugging
One of the more practical uses of a data generator is checking whether a schema actually says what you think it says. The generator just follows the rules, so if the output looks wrong, the schema isn’t strict enough.
Missing required
Suppose you want a user record that always has a username:
{
  "type": "object",
  "properties": {
    "username": { "type": "string" },
    "email": { "type": "string", "format": "email" }
  }
}
properties only describes what a property looks like if it shows up. It doesn’t make the property show up. So the generator is perfectly happy producing:
{}
or
{ "email": "someone@example.com" }
Both are valid. Adding "required": ["username"] is what actually makes username mandatory, and the generator will reflect that.
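One way to spot this kind of gap is to generate a batch of samples and count how often a property is absent. The following is a minimal sketch rather than an official recipe. It assumes JsonSchema.FromText is available in your version of JsonSchema.Net, HasProperty is a hypothetical helper written for this example, and each result is serialized to text before inspection so the check does not rely on the exact type of Result.

```csharp
using System;
using System.Text.Json;
using Json.Schema;                 // JsonSchema.Net
using Json.Schema.DataGeneration;  // JsonSchema.Net.DataGeneration

// Raw string literals require C# 11 or later.
var schema = JsonSchema.FromText("""
    {
      "type": "object",
      "properties": {
        "username": { "type": "string" },
        "email": { "type": "string", "format": "email" }
      }
    }
    """);

const int sampleCount = 20;
var generated = 0;
var missing = 0;

for (var i = 0; i < sampleCount; i++)
{
    var result = schema.GenerateData();
    if (!result.IsSuccess)
    {
        Console.WriteLine($"Generation failed: {result.ErrorMessage}");
        continue;
    }

    // Round-trip through text so the check works whatever type Result has.
    var json = JsonSerializer.Serialize(result.Result);
    generated++;
    if (!HasProperty(json, "username")) missing++;
    Console.WriteLine(json);
}

Console.WriteLine($"{missing} of {generated} generated samples had no username.");

// Hypothetical helper: true when the JSON text is an object containing the named property.
static bool HasProperty(string json, string name)
{
    using var document = JsonDocument.Parse(json);
    return document.RootElement.ValueKind == JsonValueKind.Object
           && document.RootElement.TryGetProperty(name, out _);
}
```

If the count is above zero for a property you consider mandatory, the schema is probably missing a required entry.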
Overly Permissive Types
A schema for an age field written as:
{ "type": "number" }
will cheerfully produce 3.14 or -7.9. Those are valid numbers, just not valid ages. The schema should be:
{
  "type": "integer",
  "minimum": 0,
  "maximum": 130
}
additionalProperties Surprises
Without "additionalProperties": false, the generator can (and will) tack on extra properties beyond whatever is listed in properties:
{
  "type": "object",
  "properties": {
    "id": { "type": "integer" }
  }
}
might produce:
{
  "id": 42,
  "xQ7": true,
  "lorem": "ipsum dolor"
}
If you only want id, say so with "additionalProperties": false.
Capabilities
This library is quite powerful. It supports most JSON Schema keywords, including if/then/else and aggregation keywords (oneOf, allOf, etc.).
It currently does not support:
- $dynamicRef
- annotation / metadata keywords (e.g. title, description)
- content* keywords
- dependencies / dependent* keywords
Everything else should be mostly supported. Feel free to open an issue if you find something isn’t working as you expect.
$ref support does not check for infinite loops, such as those that occur with schemas like { "$ref": "#" }. If your schema includes a reference like this, a stack overflow exception is likely.
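Because the generator does not guard against this itself, it can help to scan a schema for root self-references before generating. The sketch below is a rough heuristic and not part of the library: FindRootReferences is a hypothetical helper that uses only System.Text.Json, it reports only a $ref whose value is exactly "#", it does not escape "~" or "/" in property names when building the pointer, and it can report false positives (for example a property that happens to be named $ref).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Raw string literals require C# 11 or later.
using var document = JsonDocument.Parse("""
    { "properties": { "child": { "$ref": "#" } } }
    """);

foreach (var pointer in FindRootReferences(document.RootElement))
    Console.WriteLine($"Self-reference at {pointer}");
// Self-reference at /properties/child/$ref

// Hypothetical helper: walks the schema JSON and returns the location of
// every "$ref" keyword whose value is exactly "#".
static List<string> FindRootReferences(JsonElement element, string path = "")
{
    var found = new List<string>();
    switch (element.ValueKind)
    {
        case JsonValueKind.Object:
        {
            foreach (var property in element.EnumerateObject())
            {
                var childPath = $"{path}/{property.Name}";
                if (property.Name == "$ref"
                    && property.Value.ValueKind == JsonValueKind.String
                    && property.Value.GetString() == "#")
                    found.Add(childPath);
                else
                    found.AddRange(FindRootReferences(property.Value, childPath));
            }
            break;
        }
        case JsonValueKind.Array:
        {
            var index = 0;
            foreach (var item in element.EnumerateArray())
                found.AddRange(FindRootReferences(item, $"{path}/{index++}"));
            break;
        }
    }
    return found;
}
```

A reported location is only a prompt to review that part of the schema. Recursive schemas are legitimate, so whether a given reference is a problem for generation is something to judge case by case.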
Strings
Without any additional parameters, string generation uses Bogus’s Lorem Ipsum generator to create some nice (but oddly readable) garbage text.
format
All of the formats listed in the draft 2020-12 specification are supported, at least to the extent that they can be validated by JsonSchema.Net.
If a format is specified, it will be used.
pattern
Regular expressions specified via pattern support combined constraint evaluation, including scenarios where multiple required patterns must be satisfied together.
Supported scenarios include:
- multiple pattern constraints across composed schemas
- forbidden patterns via not
- interactions between pattern and minLength/maxLength
- interactions between pattern and format
Some highly complex or mutually incompatible regex combinations may still be impossible to satisfy. In those cases, generation fails with detailed error information.
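To make "combined constraints" concrete, the sketch below checks a candidate string against the kind of constraint set that a composed schema can produce. It illustrates the conditions a generated string has to meet and is not the library's algorithm. The patterns, length bounds, and the Satisfies function are all made up for this example. It also uses .NET regular expressions and UTF-16 lengths, which can differ slightly from JSON Schema's ECMA-262 patterns and code-point lengths.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical constraints, as they might be collected from a composed schema.
var requiredPatterns = new[] { "^[a-z]+", "[0-9]{2}$" }; // every pattern must match
var forbiddenPattern = "admin";                           // from a "not" subschema
const int minLength = 5;
const int maxLength = 10;

// True only when the value meets every constraint at once.
bool Satisfies(string value) =>
    value.Length >= minLength
    && value.Length <= maxLength
    && requiredPatterns.All(p => Regex.IsMatch(value, p))
    && !Regex.IsMatch(value, forbiddenPattern);

Console.WriteLine(Satisfies("user42"));   // True
Console.WriteLine(Satisfies("admin42"));  // False: contains the forbidden pattern
Console.WriteLine(Satisfies("ab1"));      // False: too short, and no two trailing digits
```

When no string can satisfy every condition at once, for example a required pattern that is also forbidden, there is nothing to generate, which is the failure case described above.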
Numerics
Integer and number generation each use custom algorithms that produce values satisfying minimums, maximums, multiples, and even anti-multiples (numbers that the generated value must not be a multiple of).
For this schema,
{
  "type": "integer",
  "minimum": 0,
  "maximum": 100,
  "multipleOf": 4,
  "allOf": [
    {
      "not": {
        "minimum": 40,
        "maximum": 60
      }
    },
    {
      "not": {
        "multipleOf": 3
      }
    }
  ]
}
the only valid integers are those that are
- in either [0-39] or [61-100]
- a multiple of 4
- not a multiple of 3
The library will generate such values with ease.
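As a sanity check, the three conditions listed above can be brute-forced over the whole range. This sketch is independent of the library and says nothing about how the generator picks values. It only enumerates which integers are valid.

```csharp
using System;
using System.Linq;

var valid = Enumerable.Range(0, 101)   // minimum 0, maximum 100 (inclusive)
    .Where(n => n % 4 == 0)            // a multiple of 4
    .Where(n => n < 40 || n > 60)      // outside the excluded range 40 to 60
    .Where(n => n % 3 != 0)            // not a multiple of 3
    .ToList();

Console.WriteLine(string.Join(", ", valid));
// 4, 8, 16, 20, 28, 32, 64, 68, 76, 80, 88, 92, 100
```

Note that 0 is excluded because it is a multiple of 3, so only 13 values remain.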
Arrays & Objects
Care needs to be taken when specifying arrays that can have additional items or objects that can have additional properties. This library will unsubtly create moderately deep trees of data if allowed.
For example, this schema doesn’t specify what the items should look like:
{
  "type": "array"
}
So, the generator will happily create literally any JSON value for the items, including unconstrained objects and arrays.
To combat this, there are some built-in limitations:
- Item and property counts default to 0-10.
- Arrays and objects have a lower chance of generating than the simpler types (null, integer, number, string).
Generating Data
All you need to generate data is a schema object. This can be built inline or read in from an external source. The instructions for that are on the “Overview” tab.
Once you have your schema object, simply call the .GenerateData() extension method, and it will return a result to you.
var schema = JsonSchema.FromFile("myFile.json");
var generationResult = schema.GenerateData();
var sampleData = generationResult.Result;
The result object has several properties:
- IsSuccess indicates whether the system was able to generate a value
- Result holds the value as a JsonElement, if successful
- ErrorMessage holds any error message, if unsuccessful
- InnerResults holds result objects from nested generations. This can be useful for debugging.
- Location (if available) identifies where generation failed in the target instance, as a JsonPointer
- SchemaLocations (if available) identifies one or more schema locations related to the failure, also as JsonPointers
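Since Result is only meaningful on success, it is worth checking IsSuccess before reading it. The following is a minimal sketch that assumes JsonSchema.FromText is available in your version of JsonSchema.Net. The inline schema is made up for the example, and the value is serialized to text so the code does not depend on the exact type of Result.

```csharp
using System;
using System.Text.Json;
using Json.Schema;                 // JsonSchema.Net
using Json.Schema.DataGeneration;  // JsonSchema.Net.DataGeneration

// Hypothetical schema used only for this example.
var schema = JsonSchema.FromText("""{ "type": "integer", "minimum": 1, "maximum": 10 }""");
var generationResult = schema.GenerateData();

if (generationResult.IsSuccess)
{
    // Result holds the generated value.
    Console.WriteLine(JsonSerializer.Serialize(generationResult.Result));
}
else
{
    // ErrorMessage explains why no value could be generated.
    Console.WriteLine($"Generation failed: {generationResult.ErrorMessage}");
}
```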
Error Reporting
When generation fails, start with the top-level GenerationResult returned by .GenerateData():
- If IsSuccess is false, inspect ErrorMessage and InnerResults.
- InnerResults contains nested failures from branches, properties, array items, or composed schemas.
- Leaf failures can provide:
  - Location for the relative instance path that failed
  - SchemaLocations for the schema path(s) involved in that failure
In practice, a single generation failure can contain multiple nested reasons. Walking the InnerResults tree is the best way to produce a full error report.
using System;
using Json.Schema;
using Json.Schema.DataGeneration;

// Recursively prints every failure reason, indenting nested results.
void PrintFailures(GenerationResult result, string indent = "")
{
    if (result.IsSuccess) return;

    if (!string.IsNullOrWhiteSpace(result.ErrorMessage))
    {
        Console.WriteLine($"{indent}Reason: {result.ErrorMessage}");
        if (result.Location != null)
            Console.WriteLine($"{indent}At: {result.Location}");
        if (result.SchemaLocations is { Count: > 0 })
        {
            Console.WriteLine($"{indent}Schema path(s):");
            foreach (var schemaLocation in result.SchemaLocations)
                Console.WriteLine($"{indent}- {schemaLocation}");
        }
    }

    // Descend into nested results, if there are any.
    if (result.InnerResults == null) return;
    foreach (var inner in result.InnerResults)
        PrintFailures(inner, indent + " ");
}

var schema = JsonSchema.FromFile("myFile.json");
var generationResult = schema.GenerateData();
if (!generationResult.IsSuccess)
    PrintFailures(generationResult);
Summary
So, uh, yeah. I guess that’s it really.
Happy generating.