Structs, Language Specification

I’m trying to understand this:

StructType = “struct” “{” { FieldDecl “;” } “}” . FieldDecl = (IdentifierList Type | EmbeddedField) [ Tag ] . EmbeddedField = [ “*” ] TypeName . Tag = string_lit .

To me, “the Go baby”, it looks like a bunch of gibberish. Confusing. I’m trying to take it apart, piece by piece.

Is there anyone who can explain each component? Don’t forget the purpose of each quotation mark. And the =

Hi Cherolyn,

When including something cryptic like that, it helps us if you explain where you found it.

Luckily, I recognized it as part of the Go Programming Language Specification.

It’s not Go code, so as a beginning programmer, you would not be expected to understand it unless you have studied programming language design, or maybe linguistics. The notation is a variant of Extended Backus-Naur Form (EBNF) (Extended Backus–Naur form - Wikipedia), an extension of BNF (Backus–Naur form - Wikipedia) that is used to specify the grammar of programming languages. That’s right: a grammar that describes a grammar.

If you want to learn how to read EBNF, the two links I included so far have explanations. Even if it’s too difficult for you to understand right now, at least you might get a feeling for it, and it will help you understand how Go and other programming languages work.

If you are really interested in computers, someday you may want to study this topic carefully. It’s one of the foundational parts of modern computer science.

It started in the late 1950s when linguist (and later, political activist) Noam Chomsky at MIT (Noam Chomsky - Wikipedia) developed a new theory of linguistics, and introduced a theory of formal grammars (Formal grammar - Wikipedia). Chomsky theorized that human language ability is inborn (part of our DNA, brain structure, and such), and not just a learned behavior. It was a new idea at the time, but over several decades has stood up very well, and now is widely accepted.

Computer scientists picked up on the theory and applied it to the design and specification of programming languages. BNF was used to describe and specify the grammar of Algol 60, a predecessor of Pascal, C, and Go.

(By the way, one of the things that made the early versions of Unix so cool was that the system was loaded with software tools that were based on formal grammars and programming language compiler theory. Unix was like a toolkit loaded with tools that were useful for making more tools.)

The part you are asking about describes how to specify a struct type. First, let me copy the EBNF rules here more cleanly:

StructType = "struct" "{" { FieldDecl ";" } "}" .
FieldDecl = (IdentifierList Type | EmbeddedField) [ Tag ] .
EmbeddedField = [ "*" ] TypeName .
Tag = string_lit .

What the first line is saying is that to define a struct, you first write struct, then an open brace ({), then zero or more FieldDecls, each followed by a semicolon (;), and finally a close brace (}). The braces with double quotes around them are typed literally into the code, and the ones without double quotes say that what’s inside them can repeat zero or more times. (This is what happens when you have a grammar specifying a grammar. :wink: )

So far, we have this:

struct {
   // field declarations
}

So what is a FieldDecl? That is specified on the next line of the EBNF:

FieldDecl = (IdentifierList Type | EmbeddedField) [ Tag ] .

This is saying that a FieldDecl is either:

  1. An IdentifierList followed by a Type, or
  2. an EmbeddedField

and either of those can optionally be followed by a Tag. The period (dot) at the end indicates that it is the end of the rule. To learn what IdentifierList, Type, EmbeddedField, and Tag are, you need to go to the part of the EBNF where they are defined.
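To make the optional Tag a bit more concrete, here is a small sketch. The Person type and the tagOf helper are names I made up for illustration; the only library calls are reflect.TypeOf and FieldByName from the standard library. One field carries a tag and the other does not, matching the [ Tag ] part of the rule.

```go
package main

import (
	"fmt"
	"reflect"
)

// Person is a hypothetical type used only for illustration.
type Person struct {
	Name string `json:"name"` // IdentifierList Type Tag
	Age  int    // IdentifierList Type, with the optional Tag left out
}

// tagOf returns the raw tag string attached to the named field of Person,
// or the empty string if the field has no tag or does not exist.
func tagOf(field string) string {
	f, ok := reflect.TypeOf(Person{}).FieldByName(field)
	if !ok {
		return ""
	}
	return string(f.Tag)
}

func main() {
	fmt.Printf("%q\n", tagOf("Name"))
	fmt.Printf("%q\n", tagOf("Age"))
}
```

The tag is just a string literal as far as the grammar is concerned. What it means is up to whatever package reads it.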

I’ll just stick to the simple case (#1) and explain that if you have an IdentifierList followed by a type, it is like a constant or variable declaration without needing to use the const or var keyword. IdentifierList is defined like this:

IdentifierList = identifier { "," identifier } .

and you can see that line in the rules for constant declarations here: The Go Programming Language Specification - The Go Programming Language

The above line is saying that an identifier list is an identifier, followed by zero or more occurrences of a comma and another identifier. This rule is basically saying that you can write any number of identifiers (actually, one or more), separated by commas. And remember from rule #1 above that the identifier(s) must be followed by a type.

So here’s an example of a struct type definition:

struct {
    a int
    x, y float32
    name string
}

(The rules said we need to include semicolons, but there are none there. That’s because Go inserts them automatically while reading the source, at the end of any line that finishes with something like an identifier, a literal, or a closing bracket. There are - arguably - better ways to handle that in a programming language. :zipper_mouth_face: But no worries.)
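If you want to see the semicolons from the grammar with your own eyes, you can write them yourself. This is only a sketch: multiLine, oneLine, and sameLayout are names I invented for the demonstration. Both types have the same fields, and the conversion between them compiles only because Go treats the two struct types as identical.

```go
package main

import "fmt"

// Written across several lines; the semicolons are inserted automatically.
type multiLine struct {
	a    int
	x, y float32
	name string
}

// The same fields on one line, with the semicolons typed out by hand.
// (The last semicolon before the closing brace may be left off.)
type oneLine struct{ a int; x, y float32; name string }

// sameLayout converts a multiLine value to oneLine and checks that every
// field survived. The conversion is legal because the underlying struct
// types are identical.
func sameLayout() bool {
	m := multiLine{a: 1, x: 2, y: 3, name: "n"}
	o := oneLine(m)
	return o.a == m.a && o.x == m.x && o.y == m.y && o.name == m.name
}

func main() {
	fmt.Println(sameLayout()) // prints true
}
```

In everyday code you would not write the one-line form; gofmt will rewrite it into the multi-line one anyway.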

There can also be embedded fields in structs, which is a step more advanced. That’s what the other part of the rule is for. I hope what I explained so far is enough to help you get an idea of what is going on.
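For when you get there, here is a rough sketch of both forms of EmbeddedField, with and without the optional "*". All the type names (Point, Label, Marker) and the describe function are made up for this example and are not from the specification.

```go
package main

import "fmt"

// Point and Label are hypothetical types that will be embedded below.
type Point struct {
	X, Y int
}

type Label struct {
	Text string
}

// Marker uses every alternative of the FieldDecl rule.
type Marker struct {
	Point        // EmbeddedField: TypeName
	*Label       // EmbeddedField: "*" TypeName
	ID     int `json:"id"` // IdentifierList Type Tag
}

// describe reads the embedded fields directly through the outer struct.
func describe(m Marker) string {
	return fmt.Sprintf("%d (%d,%d) %s", m.ID, m.X, m.Y, m.Text)
}

func main() {
	m := Marker{Point: Point{X: 1, Y: 2}, Label: &Label{Text: "home"}, ID: 7}
	fmt.Println(describe(m)) // prints 7 (1,2) home
}
```

Notice that an embedded field has no name of its own in the declaration. The type name doubles as the field name, which is why the literal says Point: and Label:.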

The bottom line is that it’s probably better if you learn about structs from a book or other educational resource on Go. :sweat_smile: But if you want to raise your level as a programmer and computer engineer, it’s really good to understand how to read BNF/EBNF!


Oh my gosh is this useful information, and I am saving it to a file. You have definitely helped me to understand both structs and the Language Specification! And so much more for future study. Thank you for your time.

Cool. This really helps

Just finished going over your reply very carefully. Thanks again for this helpful information.
