Hi! I’m doing some web scraping work, and for some reason the response body I got from the page has a strange structure I’m not familiar with. Could someone have a look and tell me what it is, so I can search for how to parse it properly? I would be really glad!
My code for scraping:
```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/headzoo/surf"
	"github.com/headzoo/surf/agent"
)

func main() {
	// URL of the login page
	url := "https://www.linkedin.com/uas/login?session_redirect=&goback=&trk=hb_signin"

	bow := surf.NewBrowser()
	fmt.Println("Created new browser")
	bow.SetUserAgent(agent.Chrome())
	fmt.Println("Set new user agent")

	if err := bow.Open(url); err != nil {
		log.Fatal(err)
	}
	fmt.Println("Login page opened")

	// Log in
	fm, err := bow.Form("form.ajax-form")
	if err != nil {
		log.Fatal(err)
	}
	fm.Input("session_key", "my_login")
	fm.Input("session_password", "my_pass")
	if err := fm.Submit(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("Submitted")

	// Open this special link after login to get to the feed
	if err := bow.Open("https://www.linkedin.com/?allowUnsupportedBrowser=true"); err != nil {
		log.Fatal(err)
	}

	// Save the raw HTML of the feed page to a file named "html"
	body := bow.Body()
	if err := os.WriteFile("html", []byte(body), 0666); err != nil {
		log.Fatal(err)
	}
}
```
Some of the HTML I got from the page:
```html
<code style="display: none" id="bpr-guid-266082">
{"data":{"$deletedFields":["launchAlert"],"mediaConfig":"crc4KtMwSB9CyP/2sFIqSw==,root,mediaConfig","$type":"com.linkedin.voyager.common.Configuration","$id":"crc4KtMwSB9CyP/2sFIqSw==,root"},"included":[{"mprConfig":"crc4KtMwSB9CyP/2sFIqSw==,root,mediaConfig,mprConfig","$deletedFields":[],"$type":"com.linkedin.voyager.common.MediaConfig","$id":"crc4KtMwSB9CyP/2sFIqSw==,root,mediaConfig"}
```
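For what it's worth, the content above looks like plain JSON (a top-level "data" object plus an "included" array) stored inside hidden `<code>` elements, so parsing it would probably mean pulling the text out of those elements and handing it to `encoding/json`. Below is a minimal stdlib-only sketch under a few assumptions: every block has an id starting with `bpr-guid-`, the JSON in the raw page may be HTML-entity-escaped (`&quot;`), and `extractPayloads`/`Payload` are hypothetical names. The regexp is only a rough shortcut; a real HTML parser such as goquery would be sturdier.

```go
package main

import (
	"encoding/json"
	"fmt"
	"html"
	"regexp"
)

// bprBlock matches hidden <code id="bpr-guid-..."> elements (assumed id
// pattern) and captures the id and the element's text content.
var bprBlock = regexp.MustCompile(`(?s)<code[^>]*id="(bpr-guid-\d+)"[^>]*>(.*?)</code>`)

// Payload mirrors the top-level shape seen in the snippet: a "data"
// object plus an "included" array of related entities.
type Payload struct {
	Data     map[string]any   `json:"data"`
	Included []map[string]any `json:"included"`
}

// extractPayloads returns every block that decodes as JSON, keyed by id.
func extractPayloads(page string) map[string]Payload {
	out := make(map[string]Payload)
	for _, m := range bprBlock.FindAllStringSubmatch(page, -1) {
		// Undo entity escaping such as &quot; (a no-op if there is none).
		raw := html.UnescapeString(m[2])
		var p Payload
		if err := json.Unmarshal([]byte(raw), &p); err != nil {
			continue // skip blocks that are not valid JSON
		}
		out[m[1]] = p
	}
	return out
}

func main() {
	page := `<code style="display: none" id="bpr-guid-1">{&quot;data&quot;:{&quot;$type&quot;:&quot;com.linkedin.voyager.common.Configuration&quot;},&quot;included&quot;:[{&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaConfig&quot;}]}</code>`
	for id, p := range extractPayloads(page) {
		// prints: bpr-guid-1 com.linkedin.voyager.common.Configuration 1
		fmt.Println(id, p.Data["$type"], len(p.Included))
	}
}
```

With surf, `bow.Body()` from the code above could be passed straight to `extractPayloads`. The `"$type"` field of each entry in `Included` then says what kind of object it is.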