How to make Go routines release their used memory

jzakiya · March 16, 2021, 7:51pm

I created a multi-threaded twinprimes sieve in Go, to compare with other implementations.
I realize Go doesn’t have true parallelism, but concurrency using Go routines.
The problem I’ve encountered is that it appears each Go routine retains use of the memory it uses until all the threads/processes end, instead of releasing memory when its particular process ends.
This causes the program to eat up more and more memory as the number of Go routines increases.
As the inputs get larger more memory is used until the program bombs from exceeding available memory.

Below is the section at the end of the program that launches the Go routines.

  sums := make([]uint, pairscnt)
  lastwins := make([]uint, pairscnt)

  var wg sync.WaitGroup
  for i, r_hi := range restwins {
    wg.Add(1)
    go func(i, r_hi int) {
      defer wg.Done()
      l, c := twins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, modpg, primes, resinvrs)
      lastwins[i] = l; sumss[i] = c
      fmt.Printf("\r%d of %d twinpairs done", (i + 1), pairscnt)
    }(i, r_hi)
  }
  wg.Wait()

Below is a gist of the total code.

You can compile and run it and see its memory use, using e.g. htop, as inputs increase.
Even though I have 16 GB of memory, it gets eaten up fast as input values increase.

Is there a way for the Go routines to immediately release memory when they cease processing?

https://gist.github.com/jzakiya/fbc77b8fdd12b0581a0ff7c2476373d9

twinprimes_ssoz.go

// This Go source file is a multiple threaded implementation to perform an
// extremely fast Segmented Sieve of Zakiya (SSoZ) to find Twin Primes <= N.

// Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1.
// Output is the number of twin primes <= N, or in range N1 to N2; the last
// twin prime value for the range; and the total time of execution.

// This code was developed on a System76 laptop with an Intel I7 6700HQ cpu,
// 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. Parameter tuning
// probably needed to optimize for other hardware systems (ARM, PowerPC, etc).

petrus · March 16, 2021, 8:54pm

Are you going to give us precise instructions on how to run your dense blob of code to reproduce your issue, or is this a guessing game?

My guess is that you don’t understand concurrency, scheduling, and goroutines. If you start 1_000_000 CPU bound goroutines running concurrently then you will need memory for up to 1_000_000 goroutines which likely exceeds your total real memory. At the very least, you will be thrashing your paging and swap datasets mercilessly.

jzakiya · March 16, 2021, 10:50pm

Wow, that’s a lot of hostility you’re putting out there.

All the instructions to compile and run the code are within the comments at the top of the code.
Did you read them?

I’ve personally done this algorithm so far in D, C++, Nim, Rust, and Crystal, and none of the others eat memory like the Go implementation does. So the issue isn’t me, the issue is what Go is doing to make it eat memory, and how to fix it so it doesn’t.

If you don’t know how to do that, that’s OK. Maybe someone else does. If no one else can, then Go just isn’t good for this use case, and life goes on.

petrus · March 16, 2021, 10:56pm

You are not using Go to its best advantage.

jzakiya · March 16, 2021, 11:20pm

I am trying out Go to perform an algorithm that is of interest to me.
This provides me a reason to learn it, and compare it to other languages.
It is understood different languages (tools) are better at some things than others.
I’m trying to ascertain if Go can do this algorithm better than what I’ve implemented.

Terra-Finale · March 17, 2021, 4:45pm

If you have a bad day…don’t bring it here. Your answers don’t help at all.
In what way is he not using go to its best advantage? Point it out.
You are merely throwing unhelpful condescending statements. Relax and chill. We are here to learn too.

jzakiya · March 19, 2021, 3:42pm

Does this forum have a Code of Conduct policy?

@petrus continues to make demeaning, disrespectful, and disruptive comments, which produce a hostile and unfriendly environment for me and others.

Why is he allowed to do this?

I am new to Go, and based on my treatment here, and how I’ve seen him interact with others, I have to consider whether to keep participating in this forum, even though I feel the language itself has lots of merit.

But you’re judged by how you treat people.

NobbZ · March 20, 2021, 8:28am

Please use the reporting mechanisms of the forum if you feel insulted. In general we treat each other with respect, though I am currently not aware of some (written) CoC.

To report a post please click the 3 dots icon on a post and then the flag icon.

Though I have to say, I am used to more kindness by petrus and well written explanations, not sure what got him over the last couple of days to be like this…

julio77it · March 20, 2021, 10:41pm

Hi, what version of Go are you using ?

jzakiya · March 21, 2021, 1:00am

I’m using 1.16

jzakiya · March 21, 2021, 1:31am

Today I saw this thread in another forum on go routines,

https://forum.nim-lang.org/t/7667

which cites this article:

https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/

I don’t know how relevant the points made in the article are to the phenomenon I’m seeing, but they may be, given the issue of go routines retaining used memory.

petrus · March 30, 2021, 3:17pm

(this post has been removed for violating the Code of Conduct)

Terra-Finale · March 31, 2021, 3:13pm

Get a life Moron. You are petty and stupid.

Christophe_Meessen · April 8, 2021, 10:08am

I suspect you have a memory leak, because if your assumption were true, many people would have complained about it. Your use of go routines is standard and basic.

So to me, the question boils down to “where is the memory leak?”.

It is perfectly legit to start a million go routines. They won’t all be started at once. The number of running go routines will be limited by the number of cores of your CPU. This is done for you.

Go is doing true parallelism in an optimal way and hides the technical details for you because it’s complex and difficult to get right. Memory release is managed by the garbage collector and it should be able to manage the kind of task you gave, unless there is a memory leak. In Go a memory leak would be data that remains referenced when you assumed it is not, or a buffer that grows indefinitely.
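One way to check whether memory is still referenced or merely not yet returned to the OS is to read the runtime's own heap statistics instead of relying only on htop. The following is a minimal sketch, not taken from the gist: `heapAllocMiB` is a hypothetical helper, and the 64 blocks of 1 MiB are arbitrary stand-ins for sieve buffers. As far as I know, the Go runtime hands freed memory back to the OS lazily, so the resident size shown by htop can stay high for a while after the garbage collector has reclaimed the objects; `debug.FreeOSMemory` forces a collection and asks the runtime to return as much memory as it can.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapAllocMiB returns the bytes of allocated heap objects, in MiB,
// as reported by the Go runtime (hypothetical helper for this sketch).
func heapAllocMiB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapAlloc) / (1 << 20)
}

func main() {
	fmt.Printf("at start:         %.1f MiB\n", heapAllocMiB())

	// Arbitrary stand-in for sieve buffers: 64 blocks of 1 MiB each.
	blocks := make([][]byte, 64)
	for i := range blocks {
		blocks[i] = make([]byte, 1<<20)
	}
	fmt.Printf("while referenced: %.1f MiB\n", heapAllocMiB())
	runtime.KeepAlive(blocks) // keep the blocks reachable until this point

	// Drop the only reference, then force a collection and ask the
	// runtime to return freed memory to the operating system.
	blocks = nil
	debug.FreeOSMemory()
	fmt.Printf("after release:    %.1f MiB\n", heapAllocMiB())
}
```

If the last figure stays close to the middle one in a real program, something is still holding a reference to the data.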

Are you sure the code you show is the one producing the problem ? There is a typo with sumss. The shown code will not compile.

Sorry, I didn’t look at the code in the gist for lack of time.

jzakiya · April 8, 2021, 1:17pm

Yes, sumss is a typo in that snippet, it should be sums, as you realized, but the code works.

It’s really interesting that you make all these statements about what I could be doing wrong to cause some hypothesized memory leak, but then admit you’ve never bothered to run the code in the gist that will produce the problem I’m talking about.

This same algorithm has been done now in D, C++, Nim, Java, Rust, and Crystal, and Go is the only language that has exhibited this problem. The problem ain’t me, the problem ain’t the algorithm, the problem is in Go.

This forum is the weirdest language forum I’ve ever experienced.

I cite a problem, I provide the code which produces the problem, but apparently, nobody runs the code to see the problem for themselves, and all the comments are about me, or things I’m doing (or not) wrong.

I just have to assume the people here just aren’t interested in understanding what in Go is causing it to have this problem, and fixing it. A shame.

Christophe_Meessen · April 8, 2021, 5:57pm

You assumed that go routines were not releasing memory when done. So you also make assumptions.

I currently don’t have the time and the means to test your code myself and investigate the problem. I apologized for that.

For me the shame, if any, is that you take it for granted that we run your code and find the reason of the memory problem you saw, and call it a shame if we don’t do so.

I may have a look at it this week-end. The cause of the problem may not be trivial to find.

Christophe_Meessen · April 9, 2021, 12:38pm

I now have some time to look into your problem. Could you please give the input values that produce the problem ?

I tested echo 10000000 | TestPrimes and echo 100000 1000000 | TestPrimes without seeing any problem. The program terminates immediately with some output that looks normal.

I also tested echo 1 100000000000 | TestPrimes which took a longer time to finish but still no memory problem.

Are you running the code on Windows or Linux ? I’m running the code on Linux. I assume it’s linux because you suggest to use htop. But I can’t reproduce the problem.

jzakiya · April 9, 2021, 2:30pm

Yes, I’m on Linux.

I have 16GB of memory, i7 4C|8CT running Go 1.16.

As the input size increases, more memory is used.
See mem spike between: 15_000_000_000_000 and 16_000_000_000_000
This is when program switches to a larger generator that spawns 22,275 threads.

Christophe_Meessen · April 9, 2021, 4:06pm

I only have 8GB of memory. The process crashes with status 137 when given echo 1 100000000000000 | TestPrimes. Apparently, it is the OS that kills the process because it uses too much memory. Note that there are 14 zeros.

I then tested by reducing the number of OS threads that can run concurrently. Go routines are green threads run by OS threads. GOMAXPROCS=2;(echo 1 100000000000000 | TestPrimes). No crashes.

Without specifying GOMAXPROCS I see 5 TestPrimes threads in htop. I would expect it to be 4 because there are 4 cores. The program outputs “threads = 4”.

By specifying GOMAXPROCS=2, I see only three threads and the program still outputs “threads = 4”. In this configuration the system doesn’t run out of memory.

I suspect that with 5 OS TestPrimes threads the garbage collector can’t run enough and this could explain why it runs out of memory. I will have to ask the experts if that is correct.
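To see what the scheduler is actually configured to use, the runtime can be queried directly. This is a minimal sketch; `schedulerInfo` is a hypothetical helper, not something from the gist. GOMAXPROCS limits the number of OS threads that execute Go code simultaneously, but the runtime can create additional threads (for example for blocking system calls), which would be consistent with htop showing one more thread than expected.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerInfo reports the number of logical CPUs and the current
// GOMAXPROCS value. Passing 0 to runtime.GOMAXPROCS only queries the
// setting and does not change it.
func schedulerInfo() (cpus, maxProcs int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	cpus, maxProcs := schedulerInfo()
	fmt.Printf("logical CPUs = %d, GOMAXPROCS = %d\n", cpus, maxProcs)
	fmt.Printf("goroutines alive right now = %d\n", runtime.NumGoroutine())
}
```

Printing `runtime.NumGoroutine()` periodically from the real program would show how many go routines are alive at once, which is a different number from the OS threads visible in htop.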

Christophe_Meessen · April 9, 2021, 6:22pm

After further investigation, the GOMAXPROCS doesn’t really help, but the suspected cause is confirmed.

What happens, as I understand it now, is that you start many go routines that all allocate memory blocks. Go starts every go routine, but each one gets only a small share of the CPU. You then have many go routines running concurrently but not in parallel. The garbage collector can’t keep up with the work and reclaim all the memory. The system becomes congested and runs out of memory.

I tried to add sync.Pool to recycle buffers and it does have a significant effect once the buffers are recycled. Unfortunately, still too many go routines are started before recycling becomes effective.
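The buffer recycling idea can be sketched as follows. This is not the change made to the gist: `segBytes`, `segPool` and `markAndCount` are hypothetical names, and the marking loop is only a placeholder for real sieve work. The point is that a finished go routine hands its buffer back to the pool so a later one can reuse it instead of allocating a new one.

```go
package main

import (
	"fmt"
	"sync"
)

// segBytes is a hypothetical buffer size chosen for this sketch.
const segBytes = 1 << 16

// segPool hands out reusable buffers. A pointer to the slice is stored
// so that Put does not allocate when boxing the value in an interface.
var segPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, segBytes)
		return &b
	},
}

// markAndCount borrows a buffer from the pool, clears it, marks every
// step-th entry and returns how many entries were marked. It stands in
// for real sieve work; step must be positive.
func markAndCount(step int) int {
	bufp := segPool.Get().(*[]byte)
	defer segPool.Put(bufp) // return the buffer for reuse

	buf := *bufp
	for i := range buf {
		buf[i] = 0 // a recycled buffer still holds old data
	}
	for i := 0; i < len(buf); i += step {
		buf[i] = 1
	}
	count := 0
	for _, b := range buf {
		if b == 1 {
			count++
		}
	}
	return count
}

func main() {
	results := make([]int, 4)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = markAndCount(i + 2) // each go routine writes its own slot
		}(i)
	}
	wg.Wait()
	fmt.Println(results)
}
```

A pool only helps once buffers start coming back, which matches the observation above that too many go routines are started before recycling becomes effective.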

Another strategy I tested is to limit the number of go routines allowed to run concurrently. And this works. It seems slower, but it is faster because you don’t have many thousands of go routines competing for the CPU. This code doesn’t crash.

You must import this package github.com/korovkin/limiter.

And this is the code

	limit := limiter.NewConcurrencyLimiter(10)
	fmt.Printf("ChM: [main] restwins=%d\n", len(restwins))
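	for i, r_hi := range restwins { // sieve twinpair restracks
		i, r_hi := i, r_hi // per-iteration copies: before Go 1.22 the closure would otherwise share the loop variables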
		limit.Execute(func() {
			fmt.Printf("execute %d cnt: %v\n", i, limit.GetNumInProgress())
			l, c := twins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, modpg, primes, resinvrs)
			lastwins[i] = l
			sums[i] = c
			fmt.Printf("%d of %d twinpairs done\n", (i + 1), pairscnt)
		})
	}
	limit.Wait()
	fmt.Printf("\r%d of %d twinpairs done", pairscnt, pairscnt)

I tested it with the values 1 100000000000000. Before, it crashed or froze the system. Now it doesn’t crash. You can adjust the value 10 to see how it affects speed.
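The same limit can also be expressed with only the standard library, using a buffered channel as a counting semaphore. This is a minimal sketch of that alternative, not the limiter package's implementation: `runBounded` is a hypothetical helper, and the squaring in `main` is just a placeholder for the call to twins_sieve.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runBounded calls work(i) for every i in [0, n) with at most limit
// go routines in flight at a time. It returns the highest number of
// go routines that were observed running concurrently.
func runBounded(n, limit int, work func(i int)) int32 {
	var (
		wg      sync.WaitGroup
		running int32
		peak    int32
	)
	sem := make(chan struct{}, limit) // one slot per allowed go routine

	for i := 0; i < n; i++ {
		sem <- struct{}{} // blocks while limit go routines are in flight
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when done

			cur := atomic.AddInt32(&running, 1)
			for { // record the peak concurrency seen so far
				old := atomic.LoadInt32(&peak)
				if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
					break
				}
			}
			work(i)
			atomic.AddInt32(&running, -1)
		}(i)
	}
	wg.Wait()
	return peak
}

func main() {
	squares := make([]int, 1000)
	peak := runBounded(len(squares), 10, func(i int) {
		squares[i] = i * i // placeholder work; each call writes its own slot
	})
	fmt.Printf("peak concurrent go routines: %d\n", peak)
}
```

Because a new go routine is only started once a slot is free, the number of buffers alive at the same time is bounded by the limit rather than by the number of restracks.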

I hope this helps.