Can't perform HTTP request

I’m trying to perform a simple HTTP GET request to fetch some data that I’ll parse later.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

type apiClient struct {
	transport *http.Client
}

var cli apiClient

// initialize sets up the shared client with a 5-second timeout.
func initialize() {
	cli = apiClient{
		transport: &http.Client{
			Timeout: 5 * time.Second,
		},
	}
}

func main() {
	initialize()

	// A GET request has no body, so nil is passed instead of an empty buffer.
	req, err := http.NewRequest(http.MethodGet, "https://www.allareacodes.com/area_code_listings_by_state.htm", nil)
	if err != nil {
		log.Fatal(err)
	}

	resp, err := cli.transport.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	bs, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(string(bs))
}

The site lists area codes by state, and those are what I want to get.

The response I receive is a 403 error page from CloudFront:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>403 ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
<BR clear="all">
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: uUg5BOBTKvC0WZzRJQRYcy-zdCPc82qEbWs87vut9qzFKG27UFzwFw==
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>

I can open the site in the browser and in Postman. I also did some investigation in Postman, and it seems that some headers are generated automatically.

Does my code give you an idea? Pass the URL to this function:

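import (
	"encoding/json"
	"log"
	"net/http"
)

// json2map fetches url and decodes a JSON response body into a generic value.
// It only works if the server actually returns JSON.
func json2map(url string) interface{} {
	// call the API and get body
	resp, err := http.Get(url)
	if err != nil {
		log.Println(err)
		return nil // resp is nil here, so resp.Body must not be touched
	}
	defer resp.Body.Close()

	// json to map
	var result interface{}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		log.Println(err)
		return nil
	}

	return result
}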

It is probably due to your User-Agent header.

If I use curl https://the-url-from-screenshot, I get the same 403 HTML that your Go code gets. But if I use the UA from my browser, as in curl --user-agent "Mozilla/5.0 …" https://the-url-from-screenshot, I get a response that seems to contain all the data you want.

That said, the site's TOS seem to forbid automated access anyway.


Have you seen the resp.Body you’re decoding? It’s still

Like I said, I’m trying to get the content of the page (the area codes).

Setting the UA works: req.Header.Set("User-Agent", "Mozilla/5.0"). Can this be considered a solution?

If it helps, it is a solution. But remember that, as I understand the TOS, this kind of usage is forbidden. Be aware that it might result in a permanent ban from the service.