Naming convention: differentiate between N and n

nyggus · May 10, 2021, 5:21am

Hi,

I am writing a package for some survey sampling stuff, related to stratification. All the related formulas use the following quantities:

N for population size
n for sample size from the population
Nh for the size of the hth stratum
nh for the sample size from the hth stratum
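For illustration, here is how these four symbols fit together. It assumes L strata, the symbol used later in the thread for the number of strata. Proportional allocation is shown only as one common textbook rule in which all four appear together. It is not necessarily one of the formulas in the package:

```latex
N = \sum_{h=1}^{L} N_h, \qquad n = \sum_{h=1}^{L} n_h, \qquad
n_h = n \, \frac{N_h}{N} \quad \text{(proportional allocation)}
```

Only letter case separates the population quantities from the sample quantities. That is exactly what makes a direct translation into Go identifiers awkward.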

I wrote similar programs in several other languages, and I always used the corresponding naming convention for the variables representing these quantities:

N for N (an integer)
n for n (an integer)
N_h for Nh (a slice)
n_h for nh (a slice)

For anyone who knows stratification (what the algorithms are all about), this would be as clear as day. Now, how do I do this in Go? An exported name cannot start with a lowercase n, yet I must differentiate between n and N. I can make them PopulationN and SampleN, though that is rather wordy. But what about differentiating the slices N_h and n_h? I should not use underscores, so I would make them PopulationNh and SampleNh. Then both contain a capital N, even though all the formulas make a clear distinction between N (population size) and n (sample size).
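The root of the problem can be shown with a small sketch. Go is case-sensitive, so N, n, Nh and nh can all coexist as distinct fields. Only the capitalized ones are exported, though. The struct `sampleFields` and the helper `exportedFields` are hypothetical names used only for this demonstration. The check relies on `reflect.StructField.PkgPath`, which is empty only for exported field names:

```go
package main

import (
	"fmt"
	"reflect"
)

// sampleFields is a hypothetical struct: all four formula names are legal
// and distinct, because Go identifiers are case-sensitive.
type sampleFields struct {
	N  int   // population size
	n  int   // sample size
	Nh []int // stratum sizes
	nh []int // sample sizes from the strata
}

// exportedFields maps each field name to whether it is exported.
// reflect leaves PkgPath empty only for exported (upper-case) field names.
func exportedFields() map[string]bool {
	t := reflect.TypeOf(sampleFields{})
	out := make(map[string]bool, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		out[f.Name] = f.PkgPath == ""
	}
	return out
}

func main() {
	exp := exportedFields()
	for _, name := range []string{"N", "n", "Nh", "nh"} {
		fmt.Printf("%-2s exported: %v\n", name, exp[name])
	}
}
```

Within the defining package all four fields are usable. Only N and Nh are visible to importers.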

I have been working with this stuff for over 15 years, and I think I could manage with the following four names: PopulationN, SampleN, PopulationNh and SampleNh. But they do not read naturally. This is the first time a language's naming convention has made my variable names sound unnatural.

Maybe PopNh and SampNh? They do not read well. So maybe even PNh and Snh? Nope, these won't do: P stands for proportion and S stands for standard deviation (and I mean in the corresponding formulas, not just in statistics generally). So I think I should use the longer names. The shortest version does have one advantage, though: it distinguishes capital and lowercase "n", just as the statistical formulas do. But I don't think that works in PopulationNh and Samplenh, since they do not read well. Unless I named them Population_Nh and Sample_nh…? But that conveys the same information twice. The prefixes (Population_ and Sample_) are redundant and are there only because an exported name cannot start with a lowercase letter.

Are there any situations in which I could break Go's naming conventions, such as the one against underscores? Or do you have ideas for making these particular names better? I understand this is a peculiar situation: I need to represent in Go formulas in which n differs from N, and that is something I cannot directly do in Go.

Sure, I can use various names, like those above. But I always pay close attention to naming: a good name represents the phenomenon well (here, statistical formulas) and reads well at the same time.

Hm, last thought: maybe _Nh and _nh will do?

skillian · May 10, 2021, 1:00pm

Forgive me, for I’m not the slightest bit familiar with “sampling stuff, related to stratification,” so I’m a little unclear on this. What’s the problem with using Nh and nh? Are these package-level constants?

nyggus · May 10, 2021, 1:17pm

They are not constants, they are slices. Both would be fields of a struct, and both need to be exported, hence both need to start with capital letters.

skillian · May 10, 2021, 1:52pm

I see you said that; sorry. Got it!

Do you ever need to mutate the slices? What about a function with multiple returns?

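```go
type S struct {
	// vars is unexported, so code outside the package cannot reach it directly.
	vars struct {
		Nh []int
		nh []int
	}
}

// Nhnh exposes both slices through an exported method whose result
// parameters keep the formula names.
func (s *S) Nhnh() (Nh, nh []int) {
	return s.vars.Nh, s.vars.nh
}
```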
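Here is a runnable sketch of how the multiple-return approach could be used. Code outside the package cannot set `s.vars`, so it assumes a hypothetical constructor `newS`, which is not part of the suggestion above. It also shows why the mutation question matters: the returned slices share their backing arrays with the struct, so writing through them changes the struct's data:

```go
package main

import "fmt"

// S mirrors the suggestion: both slices live in an unexported field.
type S struct {
	vars struct {
		Nh []int
		nh []int
	}
}

// newS is a hypothetical constructor; importers need some exported way
// to populate the unexported vars field.
func newS(Nh, nh []int) *S {
	s := &S{}
	s.vars.Nh = Nh
	s.vars.nh = nh
	return s
}

// Nhnh returns both slices; the result names keep the formula notation.
func (s *S) Nhnh() (Nh, nh []int) {
	return s.vars.Nh, s.vars.nh
}

func main() {
	s := newS([]int{100, 200, 300}, []int{10, 20, 30})
	Nh, nh := s.Nhnh()
	fmt.Println(Nh, nh) // [100 200 300] [10 20 30]

	// The returned slices alias the struct's slices, so this write is
	// visible through s as well.
	nh[0] = 15
	fmt.Println(s.vars.nh[0]) // 15
}
```

If callers must not modify the internal data, the method could return copies instead, at the cost of an allocation per call.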

nyggus · May 10, 2021, 4:47pm

Thanks. I will need to think about this, or rather try whether it works as expected. For the moment it seems like quite an idea. I will play with this and come back here later to share my experience.

nyggus · May 11, 2021, 4:45am

Thank you once more, Sean. Here's how I did it (initially, at least; I will see how it goes later):

```go
type Stratification struct {
	Stratum    []int     // stratum assignment (length N)
	Nh         []int     // stratum sizes
	Wh         []float64 // stratum weights (Nh/N, hence fractional)
	Sh         []float64 // stratum-wise standard deviation
	OptFun     float64   // the value of the optimization function
	Conditions bool      // does the stratification meet all the conditions?
	Population           // representation of the population
	Sample               // representation of a sample for a given stratification
}

type Population struct {
	X    []float64 // auxiliary variable
	N    int       // population size
	L    int       // number of strata
	Mean float64   // overall mean of X
}

type Sample struct {
	n  int     // assumed overall sample size
	cv float64 // assumed coefficient of variation
	nh []int   // sample sizes from the strata
}
```

(I use three structs because each of them represents a different part of the problem to be solved.)
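As an illustration of how some of these fields relate, the sketch below derives Nh by counting the stratum assignment and then computes the weights Wh = Nh/N. Because the weights are fractions, they are held as float64. The function name `stratumSizesAndWeights` is hypothetical. The code assumes stratum labels run from 0 to L-1, and any other label would panic with an index out of range:

```go
package main

import "fmt"

// stratumSizesAndWeights counts the population units in each stratum (Nh)
// and derives the stratum weights Wh = Nh / N.
// Assumption: every label in assignment lies in [0, L).
func stratumSizesAndWeights(assignment []int, L int) (Nh []int, Wh []float64) {
	N := len(assignment) // population size
	Nh = make([]int, L)
	for _, h := range assignment {
		Nh[h]++
	}
	Wh = make([]float64, L)
	for h := range Nh {
		Wh[h] = float64(Nh[h]) / float64(N)
	}
	return Nh, Wh
}

func main() {
	assignment := []int{0, 0, 1, 2, 2, 2, 1, 0}
	Nh, Wh := stratumSizesAndWeights(assignment, 3)
	fmt.Println(Nh, Wh) // [3 2 3] [0.375 0.25 0.375]
}
```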

From what you wrote, it follows that the uppercase export rule does not apply to fields of nested structs, right? So here I will be able to do the following:

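```go
// Inside a function in the same package as Stratification:
var S Stratification
S.nh = []int{5, 6, 7} // nh is promoted from the embedded Sample
```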

and S.nh will be exported anyway, even though the nh field starts with a lowercase letter. This would not have worked, however, had I tried to use the Sample.nh field directly? Please correct me if I got this wrong.

skillian · May 11, 2021, 10:23am

No, unfortunately, that lower case nh field is not exported.

nyggus · May 11, 2021, 10:31am

Oops. But that was the source and context of my question: I need to export both Nh and nh, and that is what causes all the issues. Otherwise I would simply use those names, but I really need to export both.

skillian · May 11, 2021, 11:43am

The short answer is you cannot. I was trying to come up with alternatives for you. You cannot, under any conditions, have lower-cased fields that are exported.

skillian · May 11, 2021, 11:46am

Can’t you call your Sample.nh field just Sample.Nh? I recognize that that can potentially be confusing, but because it’s a field on a struct called “Sample”, does that make it clear enough that you’re talking about the sample nh and not the population Nh?
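To make this suggestion concrete, here is a sketch in which both structs use the exported name `Nh`, and the enclosing type tells population sizes from sample sizes. The field layout is hypothetical. One consequence is worth knowing: when both structs are embedded at the same depth, the bare selector `s.Nh` is ambiguous and will not compile. Callers have to write `s.Population.Nh` or `s.Sample.Nh`, which arguably reads close to the formulas:

```go
package main

import "fmt"

// Hypothetical layout: both structs use the exported name Nh (and N);
// the type name carries the population/sample distinction.
type Population struct {
	N  int   // population size
	Nh []int // stratum sizes
}

type Sample struct {
	N  int   // overall sample size (n in the formulas)
	Nh []int // sample sizes from the strata (nh in the formulas)
}

type Stratification struct {
	Population
	Sample
}

func main() {
	s := Stratification{
		Population: Population{N: 600, Nh: []int{100, 200, 300}},
		Sample:     Sample{N: 60, Nh: []int{10, 20, 30}},
	}
	// s.Nh alone would be a compile error (ambiguous selector), because both
	// embedded structs provide Nh at the same depth, so qualify it instead.
	fmt.Println(s.Population.Nh, s.Sample.Nh)
}
```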

nyggus · May 11, 2021, 12:00pm

Well, that's life. So clearly I must come up with something. Those wordy names are not good, so perhaps I will find a better solution. If I fail, I will have to use them.

I have another idea, maybe not perfect, but perhaps good. I am thinking of using nh and related fields (starting with lowercase letters) anyway, since they will be heavily used in computation and should closely mirror the corresponding statistical formulas. There is no need to export them, since all the computation is done internally. Once the final solution is derived, I will assign the resulting optimal values of these lowercase fields (like nh) to a different struct (e.g. FinalSolution), which will include the final stratification. I will make do with wordy field names in that struct, since they no longer need to reflect any formula. I am not saying this is the ideal solution, but it is one that uses names that closely represent the corresponding statistical formulas.
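The plan above might look roughly like the following sketch. The internal type `working`, the method `finalize` and the exported field names of `FinalSolution` are all hypothetical. Only the FinalSolution name comes from the post. The slices are copied so that later changes to the working state do not leak into the published result:

```go
package main

import "fmt"

// working is a hypothetical internal type: its field names follow the
// formulas exactly, which is fine because nothing outside the package uses it.
type working struct {
	N  int   // population size
	n  int   // sample size
	Nh []int // stratum sizes
	nh []int // sample sizes from the strata
}

// FinalSolution is the exported result; its (hypothetical) field names are
// descriptive because they no longer need to mirror the formulas.
type FinalSolution struct {
	PopulationSize     int
	SampleSize         int
	StratumSizes       []int
	StratumSampleSizes []int
}

// finalize copies the optimal values into a FinalSolution, duplicating the
// slices so the result does not share memory with the working state.
func (w *working) finalize() FinalSolution {
	return FinalSolution{
		PopulationSize:     w.N,
		SampleSize:         w.n,
		StratumSizes:       append([]int(nil), w.Nh...),
		StratumSampleSizes: append([]int(nil), w.nh...),
	}
}

func main() {
	w := &working{N: 600, n: 60, Nh: []int{100, 200, 300}, nh: []int{10, 20, 30}}
	sol := w.finalize()
	w.nh[0] = 0 // does not affect sol, thanks to the copy
	fmt.Printf("%+v\n", sol)
}
```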

I think this issue nicely shows that one has to think quite a lot about code design. Maybe this is all everyday stuff for you, but I am used to Python and R, where naming seems simpler. That simplicity, however, often leads to rather nasty names if one does not pay enough attention to naming and just uses whatever comes to mind. Go's naming conventions discourage long names (and I like descriptive names), so one needs to think carefully about which names to use. That is a good thing, something that can greatly improve code design.
