String manipulation

johnn · June 6, 2017, 4:06am

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://test.com/")
	if err != nil {
		log.Fatal(err) // handle error
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Convert the raw response bytes to a string.
	data := string(body)
	fmt.Println(data)
}

I'm very new to this.
I need a way to print the data in <img src="GRAB STRING" width=", i.e. grab everything between img src and width and print it. I tried using the strings package, but no luck.

christophberger · June 6, 2017, 5:48am

If the text (or at least each line of the text) contains at most one occurrence of img src="...", I would try a combination of strings.Index to find the position of src=" and strings.SplitN (with sep="\"" and n=2) on the rest of the text, which isolates the value up to the closing double quote as the first element of the result.
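A minimal sketch of the strings.Index + strings.SplitN idea, assuming one img src per input string and a double-quoted value; the helper name extractSrc is hypothetical. Note that SplitN needs n=2 here: with n=1 it returns the whole remainder unsplit.

```go
package main

import (
	"fmt"
	"strings"
)

// extractSrc (hypothetical helper) returns the value of the first src="..."
// attribute in s. It assumes the value is wrapped in double quotes.
func extractSrc(s string) (string, bool) {
	const marker = `src="`
	i := strings.Index(s, marker)
	if i < 0 {
		return "", false // no src attribute found
	}
	rest := s[i+len(marker):] // text starting right after the opening quote
	// Split once on the closing quote: parts[0] is the attribute value.
	parts := strings.SplitN(rest, `"`, 2)
	if len(parts) < 2 {
		return "", false // the opening quote was never closed
	}
	return parts[0], true
}

func main() {
	line := `<p>hello world <img src="http://example.com/a.png" width="10"></p>`
	if src, ok := extractSrc(line); ok {
		fmt.Println(src)
	}
}
```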

If there are more occurrences of img src= in the same line, you could first split the text into individual HTML tags (with sep="<") and apply the above Index/SplitN to each of the resulting substrings.

Another option is regular expressions, although a regexp can quickly become quite complex. (Shameless plug: here is a quick primer on regular expressions that I posted a while ago.)

And finally, you could throw an HTML parser at the problem (a quick search on godoc.org turned up golang.org/x/net/html as just one example). But this approach seems almost like using a sledgehammer to crack a nut.

dfc · June 6, 2017, 5:48am

As a suggestion, take the network out of the picture and practice solving a simpler version of the problem:

package main

import (
	"fmt"
)

func main() {
	body := `<p>hello world <img src="http://example.com" /></p>`
	img := body // here is where you parse body and extract the part you want
	fmt.Println(img)
}

johnn · June 6, 2017, 5:58am

OK, for one line I can handle it with strings.Split:

package main

import (
	"fmt"
	"strings"
)

func main() {
	body := `<p>hello world <img src="http://example.com" /></p>`
	// Everything after `<img src="` starts with the value we want.
	split1 := strings.Split(body, `<img src="`)
	// Cut the value off at the closing `" />`.
	split2 := strings.Split(split1[1], `" />`)
	done := split2[0]
	fmt.Println(done)
}

The problem is that with strings.Split I can only extract one line.

nathankerr · June 6, 2017, 6:10am

So, you want to get all the image sources from a website?

An easy way to do this is with github.com/PuerkitoBio/goquery:

package main

import (
	"log"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	log.SetFlags(log.Lshortfile)

	doc, err := goquery.NewDocument("https://www.yahoo.com")
	if err != nil {
		log.Fatalln(err)
	}

	doc.Find("img").Each(func(i int, s *goquery.Selection) {
		src, ok := s.Attr("src")
		if !ok {
			return
		}

		log.Println(src)
	})
}

NOTE: this method won’t find images that are added to the page with JavaScript.

dfc · June 6, 2017, 6:12am

I was going to say: the trick here is that you cannot reasonably do this with string munging. You need some kind of xquery-like library to extract the attribute from the img tag.

johnn · June 6, 2017, 7:39am

Thanks for your help.

I made it work with bufio:

package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

const pageURL = "http://test.com"

func main() {
	printImageSources(pageURL)
}

// printImageSources fetches url and prints the value of every img src="..."
// attribute, scanning the response body line by line.
func printImageSources(url string) {
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Scan the body directly; there is no need to read it into a string first.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		// Every piece after `img src="` starts with an attribute value,
		// so this handles several images on the same line.
		parts := strings.Split(scanner.Text(), `img src="`)
		for _, part := range parts[1:] {
			// The value ends at the next double quote.
			fmt.Println(strings.SplitN(part, `"`, 2)[0])
		}
	}
	// Report read errors, including lines longer than the scanner's buffer.
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}

lutzhorn · June 6, 2017, 7:42am

Hint: You can make the code in your post look nicer if you indent every line with four spaces. When you edit your post, the </> button does this for you.

johnn · June 6, 2017, 7:53am

Got it, thanks.

system · September 4, 2017, 8:03am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.