Trouble parsing HTML and getting tags

Nicolas_Sassi · February 28, 2018, 8:41pm

Hi! I'm doing some web scraping, and for some reason the response body I get from the page has a strange structure that I'm not familiar with. Could someone have a look and tell me what it is, so I can search for how to parse it properly? I would be really glad!

My code for scraping:

package main

import (
	"fmt"
	"io/ioutil"
	"log"

	"github.com/headzoo/surf"
	"github.com/headzoo/surf/agent"
)

func main() {
	// URL of the login page
	url := "https://www.linkedin.com/uas/login?session_redirect=&goback=&trk=hb_signin"

	bow := surf.NewBrowser()
	fmt.Println("Created new browser")
	bow.SetUserAgent(agent.Chrome())
	fmt.Println("Set user agent")
	if err := bow.Open(url); err != nil {
		panic(err)
	}
	fmt.Println("Login page opened")

	// Log in: find the login form and fill in the credentials
	fm, err := bow.Form("form.ajax-form")
	if err != nil {
		panic(err)
	}
	fm.Input("session_key", "my_login")
	fm.Input("session_password", "my_pass")
	// Check the error returned by Submit itself, not an earlier err value
	if err := fm.Submit(); err != nil {
		panic(err)
	}
	fmt.Println("Submitted")

	// Open a special link after login to get to the feed
	if err := bow.Open("https://www.linkedin.com/?allowUnsupportedBrowser=true"); err != nil {
		log.Fatal(err)
	}

	// Save the page body to a file named "html"
	body := bow.Body()
	if err := ioutil.WriteFile("html", []byte(body), 0666); err != nil {
		log.Fatal(err)
	}
}

Some html I got from the page:

<code style="display: none" id="bpr-guid-266082">
{"data":{"$deletedFields":["launchAlert"],"mediaConfig":"crc4KtMwSB9CyP/2sFIqSw==,root,mediaConfig","$type":"com.linkedin.voyager.common.Configuration","$id":"crc4KtMwSB9CyP/2sFIqSw==,root"},"included":[{"mprConfig":"crc4KtMwSB9CyP/2sFIqSw==,root,mediaConfig,mprConfig","$deletedFields":[],"$type":"com.linkedin.voyager.common.MediaConfig","$id":"crc4KtMwSB9CyP/2sFIqSw==,root,mediaConfig"}

lutzhorn · February 28, 2018, 9:21pm

This HTML snippet contains a code element, and a code element can contain arbitrary text. Its content here is not HTML, so an HTML parser cannot break it down any further. It looks like JavaScript, more specifically JSON data.
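If the content of those code elements really is JSON, one possible approach is to pull the text out of each element and hand it to encoding/json. The following is only a rough sketch using the standard library: the function name extractCodeJSON, the sample page, and its id value are made up for illustration, the regular expression is a shortcut rather than a proper HTML parser, and the HTML-unescaping step is an assumption about how the page encodes the payload.

```go
package main

import (
	"encoding/json"
	"fmt"
	"html"
	"regexp"
)

// codeRe matches the text between <code ...> and </code>.
// A regular expression is a shortcut here; a real HTML parser is more robust.
var codeRe = regexp.MustCompile(`(?s)<code[^>]*>(.*?)</code>`)

// extractCodeJSON (hypothetical helper) returns every <code> payload
// that decodes as a JSON object and silently skips the rest.
func extractCodeJSON(page string) []map[string]interface{} {
	var out []map[string]interface{}
	for _, m := range codeRe.FindAllStringSubmatch(page, -1) {
		// Assumption: the payload may be HTML-escaped (&quot; and so on).
		raw := html.UnescapeString(m[1])
		var obj map[string]interface{}
		if err := json.Unmarshal([]byte(raw), &obj); err != nil {
			continue // not a JSON object, skip it
		}
		out = append(out, obj)
	}
	return out
}

func main() {
	// Made-up sample page shaped like the snippet in the question.
	page := `<code style="display: none" id="bpr-guid-1">
{"data":{"$type":"com.linkedin.voyager.common.Configuration"},"included":[]}
</code>`
	for _, obj := range extractCodeJSON(page) {
		data, _ := obj["data"].(map[string]interface{})
		fmt.Println(data["$type"])
	}
}
```

In the scraping program, the string returned by bow.Body() could be passed in place of the sample page.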

stangeorge · March 1, 2018, 4:12am

Could you try with curl using this:
curl -u username:password "https://www.linkedin.com/uas/login?session_redirect=&goback=&trk=hb_signin"

Then you can compare this response with what you got from go.
