Colly Web Scraper

Anir · January 3, 2022, 12:23pm

Good day all,

My first post here.

Please can someone assist me with this. I have started using Colly. I have been successful thus far with getting information that I need from various websites.

However, I have now hit a dead end. Let's take this URL as an example:

I have managed to pull the item description perfectly with the code below:

func ScrapeData() {
	// Instantiate default collector
	c := colly.NewCollector()

	c.OnHTML("body", func(e *colly.HTMLElement) {
		// Extract the Class Name from the HTML Body element
		fmt.Printf("** Program is running ** \n")

		// Assign the scraped data to variables
		name := e.ChildText(".prod-name")
		price := e.Attr(".price prod--price")

		// Print the data obtained
		fmt.Println("Description of item : " + name)
		fmt.Println("The price of the item is : " + price)
	})

	// Visit the URL that the data will be scraped from
	c.Visit("https://www.woolworths.co.za/prod/Food/Food-Cupboard/Coffee-Tea-Hot-Drinks/Coffee/Instant-Coffee/Espresso-Instant-Coffee-100-g/_/A-6009175211321")
}

The HTML looks like this:

The .prod-name found on the webpage comes through perfectly.

The price prod--price does not come through.

I have tried a number of options but just could not get this right, so I decided to post it here in case someone more experienced can shed some light on the issue.

Thanks so much

christophberger · January 4, 2022, 7:56pm

Hi @Anir,

I don’t know Colly, and this is only a quick thought, but should e.Attr() probably be e.ChildAttr()?

The colly docs are quite brief but Attr() seems to just select an attribute of the element itself.

Another thing that caught my eye: in e.Attr(".price prod--price") the attribute string starts with a dot, like a class selector. The attribute class="price prod--price" in the HTML contains two classes. I’d guess that e.Attr() either needs the verbatim text of the attribute ("price prod--price"), or the name of the attribute ("class"), or a selector for each of the two classes (".price .prod--price").
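As a rough illustration of the two-classes point, the sketch below uses only the standard library and a hypothetical helper, `hasAllClasses`, to mimic how a class selector treats the attribute value: as a whitespace-separated list of names. Under that reading, `.price` and the compound form `.price.prod--price` (no space) would both match an element with class="price prod--price". In CSS, a space between two class selectors, as in `.price .prod--price`, means "a .prod--price element somewhere inside a .price element", which is a different thing.

```go
package main

import (
	"fmt"
	"strings"
)

// hasAllClasses is a hypothetical helper, not part of Colly. It reports
// whether a class attribute value contains every class name in want,
// treating the attribute as a whitespace-separated list of names.
func hasAllClasses(classAttr string, want ...string) bool {
	have := make(map[string]bool)
	for _, name := range strings.Fields(classAttr) {
		have[name] = true
	}
	for _, name := range want {
		if !have[name] {
			return false
		}
	}
	return true
}

func main() {
	attr := "price prod--price" // attribute value quoted in the thread

	// Comparable to the selector ".price"
	fmt.Println(hasAllClasses(attr, "price"))
	// Comparable to the compound selector ".price.prod--price"
	fmt.Println(hasAllClasses(attr, "price", "prod--price"))
	// A single "name" containing a space is never one of the classes
	fmt.Println(hasAllClasses(attr, "price prod--price"))
}
```

So a selector string passed to e.ChildText() or e.ChildAttr() would name the classes with dots and no space between them, while the original string ".price prod--price" reads as neither an attribute name nor a selector that matches this element.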

But these are only some unwashed thoughts, hope they help with troubleshooting further…