Partial JSON deserialize by JsonPath with Json.NET


Post summary: Code examples showing how to deserialize only part of a big JSON file by JsonPath when using Newtonsoft Json.NET.

The code shown in the examples below is available in the GitHub DotNetSamples/JsonPathConverter repository.

Use case description

Imagine you have a big JSON file which you want to deserialize into a C# object.

{
  "node1": {
    "node1node1": "node1node1value",
    "node1node2": [ "value1", "value2" ],
    "node1node3": {
      "node1node3node1": "node1node3node1value"
    }
  },
  "node2": true,
  "node3": {
    "node3node1": "node3SubNode1Value",
    "node3node2": {
      "node3node2node1": {
        "node3node2node1node1": [ 1, 2, 3 ]
      },
      "node3node2node2": "node3node2node1value"
    }
  },
  "node4": "{\"node4node1\": \"n4n1value\", \"node4node2\": \"n4n1value\"}"
}

The file above is actually pretty small and is used for demo purposes. In practice you can stumble upon terrifyingly big JSON files. Newtonsoft.Json, also known as Json.NET, is the de facto standard JSON library for .NET, so it is used here to parse the JSON file. In order to deserialize this JSON to a C# object you need a model class that represents the JSON nodes. With immense effort you can create one, but why bother if you are going to use just a fraction of the JSON data? This is where JsonPath comes into play. Json.NET allows you to query JSON by JsonPath, so one option is to manually query the JSON, find the data you need and assign it to your C# object. This is not an elegant solution. Since querying by JsonPath is possible, it can be used in a JsonConverter that does the job automatically. What is needed is a custom JsonPathConverter and a model class to deserialize into; both are described below.
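For comparison, the manual approach could look roughly like the sketch below. It uses a trimmed copy of the sample JSON; the ManualQueryDemo class and its method names are made up for illustration and are not part of the repository.

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class ManualQueryDemo
{
	// Trimmed copy of the sample JSON from this post.
	public const string Json = @"{
		""node1"": { ""node1node2"": [ ""value1"", ""value2"" ] },
		""node2"": true
	}";

	// Locates the array by its JsonPath and converts it by hand.
	public static string[] ReadNode1Array(string json)
	{
		return JObject.Parse(json).SelectToken("node1.node1node2").ToObject<string[]>();
	}

	// Same idea for a single boolean value.
	public static bool ReadNode2(string json)
	{
		return JObject.Parse(json).SelectToken("node2").ToObject<bool>();
	}

	public static void Main()
	{
		Console.WriteLine(string.Join(",", ReadNode1Array(Json))); // value1,value2
		Console.WriteLine(ReadNode2(Json));                        // True
	}
}
```

Every value needs its own query and conversion line, which is the repetition the converter described in this post removes.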

JSON model class

It is easier to describe the JSON model first. Below is the code for a JSON model class that collects only the data we need.

using System.Collections.Generic;
using Newtonsoft.Json;

namespace JsonPathConverter
{
	[JsonConverter(typeof(JsonPathConverter))]
	public class JsonModel
	{
		[JsonProperty("node1.node1node2")]
		public IList<string> Node1Array { get; set; }

		[JsonProperty("node2")]
		public bool Node2 { get; set; }

		[JsonProperty("node3.node3node2.node3node2node1.node3node2node1node1")]
		public IList<int> Node3Array { get; set; }

		[JsonConverter(typeof(JsonPathConverter))]
		[JsonProperty("node4")]
		public NestedJsonModel Node4 { get; set; }
	}

	public class NestedJsonModel
	{
		[JsonProperty("node4node2")]
		public string NestedNode2 { get; set; }
	}
}

The JSON model class is annotated with [JsonConverter(typeof(JsonPathConverter))], which tells Json.NET to use the JsonPathConverter class to do the conversion. JsonPathConverter is implemented in such a way that JsonProperty is mandatory on each property in order for it to be parsed: [JsonProperty("node1.node1node2")].

JSON as a string

You may have already noticed the weird case where node4 in the JSON file actually has a string value, which is an escaped JSON string. This is unusual and arguably not a good programming practice, but I’ve encountered it in production code, so the examples given here cover this oddity as well. There is a special NestedJsonModel class which this JSON string is deserialized into. The Node4 property carries its own [JsonConverter(typeof(JsonPathConverter))] attribute so that the string is parsed as JSON again.
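To illustrate why a second parsing pass is needed, here is a minimal sketch that reads node4 by hand. The NestedStringDemo class is hypothetical and only demonstrates the idea: calling ToString() on a string token returns the unescaped text, which is itself valid JSON.

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class NestedStringDemo
{
	// Only the node4 part of the sample JSON: a string that holds escaped JSON.
	public const string Json =
		@"{ ""node4"": ""{\""node4node1\"": \""n4n1value\"", \""node4node2\"": \""n4n1value\""}"" }";

	public static string ReadNestedNode2(string json)
	{
		var outer = JObject.Parse(json);
		// For a string token, ToString() gives the raw unescaped string value.
		var innerJson = outer.SelectToken("node4").ToString();
		// That string is JSON again, so it can be parsed a second time.
		var inner = JObject.Parse(innerJson);
		return inner.SelectToken("node4node2").ToString();
	}

	public static void Main()
	{
		Console.WriteLine(ReadNestedNode2(Json)); // n4n1value
	}
}
```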

JsonPathConverter

The code below extends the JsonConverter abstract class and overrides the needed methods.

using System;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public class JsonPathConverter : JsonConverter
{
	// The converter only reads JSON; serialization is not supported.
	public override bool CanWrite => false;

	public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
	{
		var jObject = JObject.Load(reader);
		var targetObj = Activator.CreateInstance(objectType);

		// Only properties with both a getter and a setter are considered.
		foreach (var prop in objectType.GetProperties().Where(p => p.CanRead && p.CanWrite))
		{
			var jsonPropertyAttr = prop.GetCustomAttributes(true).OfType<JsonPropertyAttribute>().FirstOrDefault();
			if (jsonPropertyAttr == null)
			{
				throw new JsonReaderException($"{nameof(JsonPropertyAttribute)} is mandatory when using {nameof(JsonPathConverter)}");
			}

			// The JsonProperty name is treated as the JsonPath to query.
			var jsonPath = jsonPropertyAttr.PropertyName;
			var token = jObject.SelectToken(jsonPath);

			// Missing or null tokens leave the property at its default value.
			if (token != null && token.Type != JTokenType.Null)
			{
				var jsonConverterAttr = prop.GetCustomAttributes(true).OfType<JsonConverterAttribute>().FirstOrDefault();
				object value;
				if (jsonConverterAttr == null)
				{
					// No converter on the property: clear existing converters and convert the token directly.
					serializer.Converters.Clear();
					value = token.ToObject(prop.PropertyType, serializer);
				}
				else
				{
					// The property has its own converter: deserialize the token text with a new instance of it.
					value = JsonConvert.DeserializeObject(token.ToString(), prop.PropertyType,
						(JsonConverter)Activator.CreateInstance(jsonConverterAttr.ConverterType));
				}
				prop.SetValue(targetObj, value, null);
			}
		}

		return targetObj;
	}

	public override bool CanConvert(Type objectType)
	{
		return true;
	}

	public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
	{
		throw new NotImplementedException();
	}
}

Deserialization is done in the ReadJson method. The JSON is loaded into a Newtonsoft JObject and an instance of the result object is created. All properties of this result object are iterated in a foreach loop. It is important to note that properties must have both get and set in order to be considered in deserialization: objectType.GetProperties().Where(p => p.CanRead && p.CanWrite). Properties with just get or just set are ignored.

The JsonPropertyAttribute of each property is read. If there is none, an exception is thrown. This part can be changed so that the property name is used as the JsonPath: var jsonPath = jsonPropertyAttr == null ? prop.Name : jsonPropertyAttr.PropertyName. This is tricky though, because the lookup is case sensitive: a C# property usually starts with a capital letter, while the JSON name may be in lower case, so the path would not match.

Once the JsonPath is defined, the JObject is queried with jObject.SelectToken(jsonPath). If this returns a token that is not null, the result object property is checked for a JsonConverterAttribute; otherwise the property keeps its default value. If such an attribute exists, the JSON is deserialized with a newly created instance of that JsonConverter. If no converter is attached to the property, all existing converters on the serializer are cleared and the token is converted into an object. The clearing part is important, because without it a recursive call throws an exception.
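The property-name fallback mentioned above can be sketched as a small helper. JsonPathResolver and the Sample class are hypothetical names used only for this illustration; the helper also falls back when [JsonProperty] is present without a name, which is an addition to the one-liner in the text.

```csharp
using System;
using System.Linq;
using System.Reflection;
using Newtonsoft.Json;

public static class JsonPathResolver
{
	// Returns the JsonProperty name when one is set, otherwise the C# property name.
	public static string ResolvePath(PropertyInfo prop)
	{
		var attr = prop.GetCustomAttributes(true).OfType<JsonPropertyAttribute>().FirstOrDefault();
		return attr?.PropertyName ?? prop.Name;
	}
}

public class Sample
{
	[JsonProperty("node1.node1node2")]
	public string Annotated { get; set; }

	// No attribute: the path becomes "Node2", which does not match a lower-case "node2" in JSON.
	public bool Node2 { get; set; }
}

public static class ResolverDemo
{
	public static void Main()
	{
		Console.WriteLine(JsonPathResolver.ResolvePath(typeof(Sample).GetProperty("Annotated"))); // node1.node1node2
		Console.WriteLine(JsonPathResolver.ResolvePath(typeof(Sample).GetProperty("Node2")));     // Node2
	}
}
```

The second result shows the casing problem: the resolved path is Node2, so the fallback only helps when the C# property names match the JSON names exactly.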

Usage

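Once the work above is done, usage is pretty easy:

using System.Collections.Generic;
using System.IO;
using FluentAssertions;
using Newtonsoft.Json;

var fileContent = File.ReadAllText("jsonFile.json");
var result = JsonConvert.DeserializeObject<JsonModel>(fileContent);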

result.Node1Array.Should().BeEquivalentTo(new List<string> {"value1", "value2"});
result.Node2.Should().Be(true);
result.Node3Array.Should().BeEquivalentTo(new List<int> { 1, 2, 3 });
result.Node4.NestedNode2.Should().Be("n4n1value");

Conclusion

In this post I have shown how to partially deserialize JSON by JsonPath, picking only the data that you need.
