Refining My Approach to Parsing a Large schema.org JSON in Cursor

Hey everyone!

I’m working with a very large JSON file structured according to schema.org standards. While the general structure of the objects is mostly consistent, there are some variations—certain keys appear in some objects but not in others.

I exposed a small portion of the JSON to ChatGPT, and it generated a parser that works well for the repeating parts. However, since the model's context window is smaller than the JSON file, it couldn't identify the structural variations in the parts it wasn't exposed to.

My goal is to have Cursor access the entire JSON structure so it can generate a parser that works in 100% of cases, handling all valid variations within the schema.

Given the file size, what are the best practices for refining this approach? Some key questions I have:

  • How can I split the JSON so the LLM can process it without losing context?
  • What’s an efficient way to identify and unify all key variations across objects?
  • Are there any recommended tools or approaches to validate and extract a complete schema from the data? (One possible direction is sketched right after this list.)
  • Which LLM performs best for this type of task? Has anyone tested Claude, Gemini, GPT, DeepSeek, or others for parsing large and complex JSON structures?
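
For the schema-extraction question in particular, one possible direction is to let a purpose-built tool infer the schema locally, so only the resulting (much smaller) schema has to be shown to the model. The sketch below is assumption-laden rather than definitive: it supposes the file is a top-level JSON array small enough to load with json.load, uses a placeholder path data.json, and relies on the genson and jsonschema packages.

```python
# Sketch only: "data.json" is a placeholder path, and genson / jsonschema
# are assumed to be installed (pip install genson jsonschema).
import json

from genson import SchemaBuilder
from jsonschema import ValidationError, validate

with open("data.json") as f:
    objects = json.load(f)            # assumes a top-level JSON array

# Merge every object into a single inferred JSON Schema.
builder = SchemaBuilder()
for obj in objects:
    builder.add_object(obj)
schema = builder.to_schema()

with open("inferred_schema.json", "w") as f:
    json.dump(schema, f, indent=2)

# After reviewing/tightening the inferred schema (e.g. marking keys as
# required), the same validate() call can flag objects that deviate.
for i, obj in enumerate(objects):
    try:
        validate(instance=obj, schema=schema)
    except ValidationError as err:
        print(f"object {i}: {err.message}")
```

The inferred schema is usually compact enough to paste into a prompt, which sidesteps the context-window limit for the structural part of the problem.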

I’d really appreciate any insights or suggestions!

Cursor will automatically split and process it.

It might be a good idea to give it a few examples and first get code that identifies the key variations across objects, and then move on to the actual parsing.
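
To make that concrete, here is a minimal sketch of the "identify the key variations first" step. It assumes the file is a top-level JSON array at a placeholder path data.json and uses the ijson streaming parser, so the whole file never has to fit in memory or in a model's context window; the output is a small per-@type report of which keys always appear and which only sometimes do.

```python
# Sketch only: "data.json" is a placeholder path, and ijson is assumed
# to be installed (pip install ijson).
import ijson
from collections import defaultdict

def survey_keys(path):
    """Stream a top-level JSON array and count key usage per @type."""
    type_counts = defaultdict(int)                       # @type -> object count
    key_counts = defaultdict(lambda: defaultdict(int))   # @type -> key -> count

    with open(path, "rb") as f:
        # ijson yields one array element at a time instead of loading
        # the entire file into memory.
        for obj in ijson.items(f, "item"):
            t = obj.get("@type", "<missing>")
            if isinstance(t, list):                      # @type may be a list
                t = ", ".join(t)
            type_counts[t] += 1
            for key in obj:
                key_counts[t][key] += 1
    return type_counts, key_counts

if __name__ == "__main__":
    totals, keys = survey_keys("data.json")
    for t, total in totals.items():
        print(f"{t} ({total} objects)")
        for key, n in sorted(keys[t].items()):
            print(f"  {key}: {'always' if n == total else f'{n}/{total}'}")
```

Once the report shows which keys are optional, the parser itself can treat them as optional rather than assume every field is present. The field names below are made-up examples, not taken from the actual file:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Product:                            # hypothetical type; use the real @type values
    name: str
    description: Optional[str] = None     # keys that only some objects carry
    sku: Optional[str] = None

def parse_product(obj: dict) -> Product:
    # .get() returns a default for missing keys, so one parser covers
    # every observed variation instead of failing on the rare ones.
    return Product(
        name=obj.get("name", ""),
        description=obj.get("description"),
        sku=obj.get("sku"),
    )
```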
