I'm a beginner & I get errors parsing a url with http.Get()

tofl · December 26, 2018, 4:25pm

Hi,

I’m new to Golang and I’ve been struggling with a problem since yesterday. First, here’s the part of the code that’s not working:

func main() {
    var s SitemapIndex

    resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
    bytes, _ := ioutil.ReadAll(resp.Body)
    resp.Body.Close()

    xml.Unmarshal(bytes, &s)

    for _, Location := range s.Locations {
        resp, err := http.Get(Location)
        ioutil.ReadAll(resp.Body)

    }

}

The problem comes from this line: resp, err := http.Get(Location). err tells me that there’s a problem with Location, and resp is <nil>.

Here’s the full error when I print err:

parse
https://www.washingtonpost.com/news-sitemaps/politics.xml
: first path segment in URL cannot contain colon

So the error comes from the URL, which is strange because the previous URL, provided explicitly, doesn’t produce any error and both URLs have the same format. Nevertheless, I did try removing https:// and www. from the URL, and it didn’t solve my problem.

I tried looking up the error message on the Internet but didn’t find any solution, not even a proper explanation…

I really don’t know what to do & I have to admit I could do with some help…

Thanks!

NobbZ · December 26, 2018, 5:23pm

Could you please share the definition of SitemapIndex?

Or even better, provide a complete but minimal program that reproduces your problem.

At first glance, though, I’d assume that the parsed “location” is prefixed and suffixed by newlines, spaces, or some other kind of whitespace. There is whitespace in the XML, which might get normalized into a single whitespace token while parsing but won’t be removed.

Try trimming Location before trying to load it.

tofl · December 26, 2018, 5:37pm

I’m gonna do that, thanks. I’ll look for a way to do it.

Here are all my structs, by the way:

type SitemapIndex struct {
    Locations []string `xml:"sitemap>loc"`
}

type News struct {
    Titles    []string `xml:"url>news>title"`
    Keywords  []string `xml:"url>news>keywords"`
    Locations []string `xml:"url>loc"`
}

type NewsMap struct {
    Keyword  string
    Location string
}

NobbZ · December 26, 2018, 5:52pm

This seems to work at first glance:

package main

import (
        "encoding/xml"
        "fmt"
        "io/ioutil"
        "net/http"
        "strings"
)

type SitemapIndex struct {
        Locations []string `xml:"sitemap>loc"`
}

func main() {
        var s SitemapIndex

        resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
        bytes, _ := ioutil.ReadAll(resp.Body)
        resp.Body.Close()

        xml.Unmarshal(bytes, &s)

        fmt.Printf("%#v", s)

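        for _, Location := range s.Locations {
                // The parsed <loc> values are wrapped in newlines, so trim them first.
                resp, err := http.Get(strings.Trim(Location, "\n"))
                if err != nil {
                        // resp is nil when err is set, so skip this entry.
                        fmt.Println(err)
                        continue
                }
                t, _ := ioutil.ReadAll(resp.Body)
                resp.Body.Close()
                fmt.Println(string(t))
        }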
}

tofl · December 26, 2018, 5:57pm

Thank you so much! It worked! On my side I used strings.TrimSpace(). What’s the difference (both methods seem to do the same thing)?

NobbZ · December 26, 2018, 6:10pm

In my case I just trimmed newlines, nothing else. TrimSpace removes all white space according to the documentation, whatever “all whitespace” means.

tofl · December 26, 2018, 6:12pm

Okay, it makes sense, thanks.

Also, if you tried the code, could you tell me how it ran for you? I ran it but had to wait about a minute to get the results. I don’t think it’s my internet connection, because I have a good one.

NobbZ · December 26, 2018, 6:17pm

Yeah, it takes forever, I haven’t measured though. But since curling the initial site-map already takes a long time as well, I just assumed the server is slow.

tofl · December 26, 2018, 6:23pm

Thank you.
