I'm a beginner & I get errors parsing a url with http.Get()

Hi,

I’m new to Golang and I’ve been struggling with a problem since yesterday. First, here’s the part of the code that’s not working:

func main() {
    var s SitemapIndex

    resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
    bytes, _ := ioutil.ReadAll(resp.Body)
    resp.Body.Close()

    xml.Unmarshal(bytes, &s)

    for _, Location := range s.Locations {
        resp, err := http.Get(Location)
        ioutil.ReadAll(resp.Body)

    }

}

The problem comes from this line: resp, err := http.Get(Location). err tells me that there’s a problem with Location, and resp is <nil>.

Here’s the full error when I print err :

parse
https://www.washingtonpost.com/news-sitemaps/politics.xml
: first path segment in URL cannot contain colon

So the error comes from the URL. That is strange, because the previous URL, provided explicitly, doesn’t produce any error and both URLs have the same format. Nevertheless, I did try removing https:// and www. from the URL, and it doesn’t solve my problem.

I tried looking up the error message on the Internet but didn’t find any solution, not even a proper explanation…

I really don’t know what to do & I have to admit I could do with some help… :sweat:

Thanks !

Could you please share the definition of SitemapIndex?

Or even better, provide a complete but minimal package that reproduces your problem.

But at first glance, I’d assume that the parsed “location” is prefixed and suffixed by newlines, spaces, or some other kind of whitespace. There is whitespace in the XML, which might get normalized into a single whitespace token while parsing the XML but won’t be removed.

Try trimming Location before trying to load it.

I’m gonna do that, thanks. I’ll look for a way to do it.

Here are all my structs by the way :

type SitemapIndex struct {
    Locations []string `xml:"sitemap>loc"`
}

type News struct {
    Titles    []string `xml:"url>news>title"`
    Keywords  []string `xml:"url>news>keywords"`
    Locations []string `xml:"url>loc"`
}

type NewsMap struct {
    Keyword  string
    Location string
}

This seems to work at first glance:

package main

import (
        "encoding/xml"
        "fmt"
        "io/ioutil"
        "net/http"
        "strings"
)

type SitemapIndex struct {
        Locations []string `xml:"sitemap>loc"`
}

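func main() {
        var s SitemapIndex

        resp, err := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
        if err != nil {
                fmt.Println("fetching index:", err)
                return
        }
        bytes, _ := ioutil.ReadAll(resp.Body)
        resp.Body.Close()

        xml.Unmarshal(bytes, &s)

        fmt.Printf("%#v", s)

        for _, Location := range s.Locations {
                // The parsed <loc> values are wrapped in newlines; trim them before the request.
                resp, err := http.Get(strings.Trim(Location, "\n"))
                if err != nil {
                        // resp is nil on error, so skip this entry instead of reading from it.
                        fmt.Println(err)
                        continue
                }
                t, _ := ioutil.ReadAll(resp.Body)
                resp.Body.Close()
                fmt.Println(string(t))
        }
}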

Thank you so much! It worked! On my side I used strings.TrimSpace(). What’s the difference (both methods seem to do the same thing)?

In my case I just trimmed newlines, nothing else. According to the documentation, TrimSpace removes all leading and trailing white space as defined by Unicode, so it also strips spaces, tabs, and carriage returns.

Okay, it makes sense, thanks.

Also, if you tried the code, could you tell me how it ran for you? I ran it but had to wait about a minute to get the results. I don’t think it’s my internet connection, because I have a good one.

Yeah, it takes forever, I haven’t measured though. But since curling the initial site-map already takes a long time as well, I just assumed the server is slow.

Thank you.
