csv.NewReader adds space to front of first record


(Jason Edgar) #1

Hi there, I have a CSV file I’m reading, and the first record comes back with a space at the start of it (or maybe it’s not a space, but that’s how it shows up in the console). I can’t see any reason for this, and I can’t find any documentation mentioning it.
Can anyone point me in the right direction? Thanks.

package main

import (
	"encoding/csv"
	"fmt"
	"os"
)

func main() {

	// Struct to hold schedule.csv
	type Sched struct {
		schDate  string
		schShift string
	}

	// hardcodings
	filename := "schedule.csv"

	// Open CSV file
	f, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Read File into a Variable
	lines, err := csv.NewReader(f).ReadAll()
	if err != nil {
		panic(err)
	}

	// Loop through lines & turn into object
	for _, line := range lines {
		data := Sched{
			schDate:  line[0],
			schShift: line[1],
		}
		if data.schShift == "" {
			break
		}

		fmt.Printf("START:%s\n", data.schDate)
	}
}

(Eric Lindblad) #2

A Byte Order Mark (BOM)?

An awk one-liner to remove a BOM and retain CRLF line endings.


(Jason Edgar) #3

I opened it in Notepad and saved it again, and no odd characters were shown. I thought that might have cleared them, so I was focused on the code. I’ll start my CSV from scratch and see what I get.


(Jason Edgar) #4

And maybe that’s all it was :slight_smile: Sometimes it’s not my coding.
Thanks


(Eric Lindblad) #5

The MS® program notepad.exe can save in several encodings, and some of them write a BOM at the start of the file.

9588#issuecomment-69937174

" …encoding/csv, like all Go code, expects UTF-8"

– I.L.T.

From an SO post.

Bytes         |  Encoding Form
--------------------------------------
00 00 FE FF   |  UTF-32, big-endian
FF FE 00 00   |  UTF-32, little-endian
FE FF         |  UTF-16, big-endian
FF FE         |  UTF-16, little-endian
EF BB BF      |  UTF-8
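
To check which of these signatures a file starts with, the table can be turned into a prefix lookup. This is only a sketch: `bomTable` and `detectBOM` are made-up names for illustration. The four-byte UTF-32 entries are tested before the two-byte UTF-16 ones, because FF FE 00 00 also begins with the UTF-16 little-endian mark.

```go
package main

import (
	"bytes"
	"fmt"
)

// bomTable holds the signatures from the table above.
// Order matters: longer UTF-32 marks must be tried before UTF-16.
var bomTable = []struct {
	sig  []byte
	name string
}{
	{[]byte{0x00, 0x00, 0xFE, 0xFF}, "UTF-32, big-endian"},
	{[]byte{0xFF, 0xFE, 0x00, 0x00}, "UTF-32, little-endian"},
	{[]byte{0xFE, 0xFF}, "UTF-16, big-endian"},
	{[]byte{0xFF, 0xFE}, "UTF-16, little-endian"},
	{[]byte{0xEF, 0xBB, 0xBF}, "UTF-8"},
}

// detectBOM (hypothetical helper) returns the encoding form and the BOM
// length in bytes, or "" and 0 when head starts with none of the marks.
func detectBOM(head []byte) (string, int) {
	for _, e := range bomTable {
		if bytes.HasPrefix(head, e.sig) {
			return e.name, len(e.sig)
		}
	}
	return "", 0
}

func main() {
	name, n := detectBOM([]byte("\xEF\xBB\xBFdate,shift"))
	fmt.Printf("%s (%d bytes)\n", name, n)
}
```

One caveat: a UTF-16 little-endian file whose first character after the BOM is U+0000 would be reported as UTF-32 here, so treat the result as a hint rather than proof.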

A couple of scripts for converting line endings.

sh-4.3$ cat.exe d2u
#!/bin/awk -f
#
# File: d2u
# Date: 28-Oct-2008
#
# This file is part of MSYS (MinGW.org's Minimal SYStem).
# Contributed by Keith Marshall
# Assigned, by the author, to the public domain.
#
  {sub("\r$",""); printf "%s\n", $0}
#
# d2u: end of file
sh-4.3$ cat.exe u2d
#!/bin/awk -f
#
# File: u2d
# Date: 28-Oct-2008
#
# This file is part of MSYS (MinGW.org's Minimal SYStem).
# Contributed by Keith Marshall
# Assigned, by the author, to the public domain.
#
  {sub("\r$",""); printf "%s\r\n", $0}
#
# u2d: end of file
sh-4.3$

Usage.

sh-4.3$ d2u oldfile.csv > newfile.csv
sh-4.3$

If you want UNIX end-of-line style instead try this (modified from the SO post):

awk '{sub(/^\xef\xbb\xbf/,"")}{print}' oldfile.csv > newfile.csv

If a UTF-8 encoded file contains multiple BOMs (for example, after cat file1 file2 > file3), both my blog’s one-liner and the one above will remove them all.
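
For anyone who would rather do that cleanup in Go, here is a rough equivalent of the awk one-liner, assuming the whole file fits in memory. `stripLineBOMs` is a hypothetical name. Like the `^` anchor in the awk pattern, it only removes a BOM that sits at the start of a line, and it leaves the line endings as they are.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is the three-byte UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripLineBOMs (hypothetical helper) removes one UTF-8 BOM from the
// start of each line. BOM bytes in the middle of a line are kept.
func stripLineBOMs(data []byte) []byte {
	// SplitAfter keeps the "\n" on each piece, so CRLF survives the round trip.
	lines := bytes.SplitAfter(data, []byte("\n"))
	for i, l := range lines {
		lines[i] = bytes.TrimPrefix(l, utf8BOM)
	}
	return bytes.Join(lines, nil)
}

func main() {
	// Two small files, each saved with a BOM, joined as cat would join them.
	file1 := []byte("\xEF\xBB\xBFa,1\r\n")
	file2 := []byte("\xEF\xBB\xBFb,2\r\n")
	joined := append(append([]byte{}, file1...), file2...)

	fmt.Printf("%q\n", stripLineBOMs(joined))
}
```

If the first file does not end with a newline, the second BOM lands in the middle of a line and neither this sketch nor the anchored awk pattern will catch it.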