There is a data race between the main thread (whose for loop keeps reading the done variable) and the setup thread (which writes to the done variable): neither one coordinates with the other.
The scenario is like you chopping a spring onion into small slices while your sibling pushes the long vegetable toward you. The two of you don’t talk or synchronize with one another. With luck, the two of you can still get the job done. However, there is a great chance that one of these happens:
- You chop your sibling’s hand because he/she pushes the onion too fast.
- The job doesn’t get done because your sibling thought the work was completed.
- The two of you get into a fight after you accidentally chop your sibling’s fingernail. The job doesn’t get done, and possibly you two burn the house down.
It is a concurrency problem, not a matter of heap or stack memory. In concurrent code, whether a piece of memory (a variable) is accessed atomically or non-atomically should be clearly planned and stated.
Atomic Synchronization
Whenever 2 or more concurrent processes perform both reads and writes on a shared memory, you need to do it atomically.
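As one hedged illustration of atomic access, here is a minimal sketch using the standard sync/atomic package (atomic.Bool, which assumes Go 1.19 or later). The helper name waitForSetup is hypothetical, and the polling loop is kept only to mirror the pattern under discussion; it is not meant as the recommended way to wait.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// waitForSetup is a hypothetical helper: a goroutine writes a and then
// publishes that write by storing true into done atomically.
func waitForSetup() string {
	var (
		a    string
		done atomic.Bool // assumes Go 1.19+
	)

	go func() {
		a = "hello world"
		done.Store(true) // atomic write; makes the write to a visible to a reader that observes true
	}()

	// Atomic read. Once Load observes true, reading a is safe.
	for !done.Load() {
		runtime.Gosched() // yield to other goroutines instead of spinning hot
	}
	return a
}

func main() {
	fmt.Println(waitForSetup())
}
```

The loop still burns CPU while it waits, so this only removes the data race; it does not make polling efficient.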
In the example above, notice that setup sets the done variable to true at its own pace, while main polls done at its own discretion. Five kinds of scenarios may happen:
1. done is expected false, main is reading it, setup is not writing it
2. done is expected true, main is not reading it, setup is writing it
3. done is expected true, main is reading it, setup is writing it
4. done is expected false, main is reading it, setup is writing it
5. done is expected true, main is reading it, setup is still writing it
Ideally, we’re expecting case 1 and case 2, which the sample code most likely hits. However, in multitasking, cases 3, 4, and 5 are the ones we must handle; they are the fundamentals of race conditions. To handle case 3 and case 4, we want one side to wait, to ensure the value is properly read or properly written (thus the word guarantee). Such a guarantee usually follows these steps:
1. Check whether done is free for access. If yes, lock the access to 1 process. If no, go to step 5.
2. Do the read/write operations on done.
3. Release the lock for others to use.
4. Done processing the variable. Go to step 1 again for the next process.
5. Enter wait mode, pending a signal that done is free. When it is ready for use, go to step 1.
This is known as mutex locking, the most conventional way to do synchronization. Go also has channels, which facilitate a different approach to synchronization (message passing between goroutines). That is also powerful.
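The lock, access, release, and wait steps can be sketched with the standard sync.Mutex and sync.Cond types. This is a minimal sketch under assumptions, not the original poster's code: the state type and its newState, setup, and wait methods are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// state (hypothetical) groups the shared data with the lock guarding it.
type state struct {
	mu   sync.Mutex
	cond *sync.Cond
	a    string
	done bool
}

func newState() *state {
	s := &state{}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// setup locks, writes the shared variables, releases the lock,
// then signals a waiter that done may have changed.
func (s *state) setup() {
	s.mu.Lock()
	s.a = "hello world"
	s.done = true
	s.mu.Unlock()
	s.cond.Signal()
}

// wait locks, and while done is still false it enters wait mode.
// Wait releases the mutex while sleeping and reacquires it on wake-up.
func (s *state) wait() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	for !s.done {
		s.cond.Wait()
	}
	return s.a
}

func main() {
	s := newState()
	go s.setup()
	fmt.Println(s.wait())
}
```

Every read and write of done and a happens while the mutex is held, so the two sides can never touch the variables at the same moment.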
Case 5 is very chaotic: the outcome depends on the system, the CPU, the operating system's rules, and so on. Either way, it will end up nasty. It is the ultimate race condition scenario, and we must avoid it at all times.
Relating this back to the code: to synchronize main and setup, the easiest way is to make done a channel instead of a global variable, which guarantees the data access. Here is the tested amended code:
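package main

var (
	a = ""
)

// setup writes a, then signals completion on the done channel.
func setup(done chan bool) {
	a = "hello world"
	done <- true
}

func main() {
	done := make(chan bool)
	go setup(done)
	<-done // block until setup has sent its signal
	print(a)
}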
Now it is clear that main waits for the done signal before proceeding, leaving setup the full freedom to write the signal. Because receiving from the channel blocks, main continues its work only once it has received the signal.
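A common variation, shown here only as a hedged sketch, is to use a chan struct{} and close it rather than send a value: the channel then carries no data, and every receiver is released once it is closed. The names setupAndClose and run are hypothetical, and a is passed by pointer purely to keep the sketch free of globals.

```go
package main

import "fmt"

// setupAndClose (hypothetical) writes the value, then closes done to
// signal completion. Closing releases every goroutine receiving on done.
func setupAndClose(a *string, done chan<- struct{}) {
	*a = "hello world"
	close(done)
}

// run starts the setup goroutine and blocks until done is closed.
func run() string {
	var a string
	done := make(chan struct{})
	go setupAndClose(&a, done)
	<-done // returns only after close(done), so the write to a is visible
	return a
}

func main() {
	fmt.Println(run())
}
```

Whether to send a value or close the channel is a design choice: sending wakes exactly one receiver, while closing works for any number of waiters but can be done only once.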
One Vital Note
Looping against a sentinel variable to synchronize 2 or more processes is not an efficient way to wait, and it can hog a CPU core on most processors (except those specifically designed for it). The loop still consumes CPU cycles to perform the “wait”, rather than releasing the CPU for other threads as expected.
That’s why Go always recommends using mutex locking or channels, whichever makes sense.
In short, wait is not a simple function (see lockSlow in https://golang.org/src/sync/mutex.go).