My aim is to extract structured data from webpages.
I’m able to extract the HTML5 Microdata (Schema.org) from webpages with the iand microdata library in Go. However, I can’t extract the JSON-LD data present in the same webpages, and I couldn’t find any documentation on how to do it.
This JSON is contained in a <script> element that is reachable using this CSS selector:
head > script:nth-child(128)
It seems to be loaded by some JavaScript code, which means that it is not directly contained in the HTML. Making an HTTP GET request for the URL will not give you a document that contains this data, so you will first have to find out which other HTTP request returns this JSON.
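One quick way to confirm whether the data is really missing from the raw HTML is to search the fetched page for script elements of type application/ld+json and try to decode their contents. The following is only a minimal sketch using the Go standard library: the names extractJSONLD and jsonLDScript are made up for illustration, the sample page is invented, and a regular expression is a rough stand-in for a real HTML parser. If it returns nothing for the page body you downloaded, the script element is most likely injected later by JavaScript.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// jsonLDScript is a hypothetical helper pattern: it matches
// <script type="application/ld+json"> ... </script> blocks.
// (?is) makes the match case-insensitive and lets "." span newlines.
var jsonLDScript = regexp.MustCompile(
	`(?is)<script[^>]*type\s*=\s*["']application/ld\+json["'][^>]*>(.*?)</script>`)

// extractJSONLD (hypothetical name) returns the decoded contents of every
// JSON-LD script block found in page. Blocks that are not valid JSON are skipped.
func extractJSONLD(page string) []interface{} {
	var docs []interface{}
	for _, m := range jsonLDScript.FindAllStringSubmatch(page, -1) {
		var doc interface{}
		if err := json.Unmarshal([]byte(m[1]), &doc); err != nil {
			continue // not valid JSON, ignore this block
		}
		docs = append(docs, doc)
	}
	return docs
}

func main() {
	// Invented sample page; in practice this would be the body of the HTTP response.
	page := `<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Example"}
</script>
</head><body></body></html>`

	for _, doc := range extractJSONLD(page) {
		fmt.Printf("%v\n", doc)
	}
}
```

This only decodes the blocks as plain JSON; it does not do any JSON-LD processing such as expanding the @context.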
After this problem is solved, we can look for a library to parse the JSON-LD. Maybe this project can help: