It's not a race condition because I checked

I’ve been playing with writing state machines (based on Rob Pikes example) and have the following example:

However, as you can see when running the code, it doesn’t behave as expected.

Status should be BIDDING, got JOINING
Status should be LOST, got CLOSED

My initial thought was that I was causing a race condition by not protecting the Status value, and the -race flag confirmed it. However, despite adding locks, the behaviour still remains, but at least the race condition is gone.

What’s slightly more confusing is if I debug and step through the code, the first part of the test succeeds.

Can anyone see anything glaringly obvious?

1 Like

I guess that missing synchronization between main and the goroutine causes this behavior.

  • Snipe starts a's goroutine and returns a.
  • main immediately checks a's status. The goroutine might not even have started at that point.
  • Same after main calls a.AnnounceClosed().

a's goroutine has no chance to run until the main goroutine goes into a wait state (e.g., by calling a blocking I/O operation, or by waiting for someting to arrive through a channel, or by waiting for members of a WaitGroup to finish.

2 Likes

Heisenbugs are always fun :wink:

I came up with two ways to make this example work. They don’t run on the playground because “the process took too long”, but run with no output (meaning the actual states are the expected states) on my local machine (macOS)

https://play.golang.org/p/n4NWbzpw4J uses runtime.Gosched() to give a's goroutine a chance to run. This works, but is not guaranteed to do so.

https://play.golang.org/p/S6yp_o0RF6 uses channels to force synchronization. This is guaranteed to work, but slightly changes the semantics of the program (e.g., auction is closed before AnnounceClosed returns). Since I don’t know what semantics you want, I made some up.

1 Like

Thanks both for taking the time to reply.

I should have mentioned earlier that this code is for testing the application. The code in main is the test, but that context got lost by the writing this in playground.

@nathankerr, your first solution is appealing because it doesn’t change the code under test, only the test itself. I think I can use smaller unit tests to validate the behaviour and avoid the sync issues.

I have to be honest and say I don’t fully understand the code you added for the second implementation. The behaviour of auction changing to closed before the AnnounceClosed returns is actually a useful feature to have, so I would like to learn more here.

@nathankerr, following your suggestion I went with runtime.GoSched(). This made sense to me, so I implemented it in my test. The first check (as described above) passes, but the second does not. Coincidentally, this is the same behaviour as when I debug.

I decided to write a smaller test to validate my understanding:

func TestAuction_AnnounceClosed(t *testing.T) {
	a := &auction{}

	a.AnnounceClosed()

	expected := "CLOSED"
	if actual := a.GetStatus(); actual != expected {
		t.Errorf("Expected; %v, got; %v", actual, expected)
	}

	actual := reflect.ValueOf(bidding(a))
	expected = reflect.ValueOf(lost)

	if actual.Pointer() != expected.Pointer() {
		t.Errorf("Expected; %v, got; %v", actual, expected)
	}
}

In the second if statement I’m checking that the identity of the function returned from bidding() is the lost function when status is set to CLOSED. I have done this with winning function successfully, but the compiler does not like lost. I get the error:

auction/state_test.go:48: cannot use reflect.ValueOf(lost) (type reflect.Value) as type string in assignment
auction/state_test.go:51: expected.Pointer undefined (type string has no field or method Pointer)

The following test for winning passes without a hitch:

func TestAuction_BidPriceGreaterThanMaxBid(t *testing.T) {
	a := &auction{
		maxBid: 100,
		bidPrice: 100,
	}

	actual := reflect.ValueOf(bidding(a))
	expected := reflect.ValueOf(winning)

	if actual.Pointer() != expected.Pointer() {
		t.Errorf("Expected; %v, got; %v", actual, expected)
	}
}

winning and lost are implemented as follows:

func winning(a *auction) stateFn {
	return nil
}

func lost(a *auction) stateFn {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.Status = "LOST"

	return nil
}

… and for completeness, here is the bidding() function:

func bidding(a *auction) stateFn {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.Status == "CLOSED" {
		return lost
	}
	a.Status = "BIDDING"

	if a.bidPrice <= a.maxBid {
		return winning
	}

	return bidding
}

I am now completely bamboozled as to why lost isn’t behaving the same as winning, but I suspect it has everything to do with why the main test isn’t passing. Anything obvious? could this be a bug in Go?

I approached my second implementation in two parts: starting and closing. It is no coincidence that those parts mirror the requirements of main().

The original version of Snipe assumed the auction would be started immediately when go a.run() completed. This assumption was made clear by the expectation that the auction’s status would be BIDDING immediately after Snipe returned. Because of this expected behavior, Snipe actually needed to make sure the auction was running before returning. This was accomplished by having run tell Snipe the auction started by sending a struct{}{} on started (line 72). Since Snipe was blocked waiting for a value from started (line 58), the go scheduler had to find something else to run, in this case the auction, though it could choose any non-blocked goroutine. I chose struct{} as the type for the channel because it means there is no data (an empty struct). The channel must be unbuffered (i.e., has a capacity of 0, the default for make). The receive on line 58 blocks until there is something to receive on the channel. The send on line 72 blocks until the sent value can be received. The scheduler matches the sending and receiving, running the affected goroutines as they become unblocked. This guarantees that the auction goroutine has executed to the point of the send (line 72), but does not guarantee the state machine loop (lines 74-76) is running. The two goroutines (main and auction) are synchronized at lines 58 (for main) and 72 (for auction). It is possible for the test on lines 91-94 will still fail because there is no point of synchronization between the two goroutines, though it worked for me.

AnnounceClosed was harder because the expectation was that after it returned the auction was LOST. The implementation followed the same idea as before, using an unbuffered channel for synchronization, but the details are more complicated. First, the auction was not actually LOST until the lost was called. Therefore I put the channel send there (line 42). I added the channel used to auction (line 18, initialized on line 53) because each auction would be closed on its own. The Unlock on line 83 could not be deferred because lost need to acquire the lock in order to finish the auction. Looking back, I probably should have also not left the Unlock on line 39 deferred because only the change to a.Status is managed by the lock; leaving it deferred included the send in the critical section even though it does not need to be there.

In short, paired sends and receives on an unbuffered channel allow goroutines to indicate where they are in their execution.

I included runtime.Gosched() as a possible option because it was a logical, though incorrect, solution to the way @christophberger described the problem that would seem to work. I had hoped that my discussion of guarantees was enough to dissuade you, but I did not anticipate the need to describe how my second implementation worked. That my run using runtime.Gosched worked and yours didn’t is an effect of the lack of guarantees. That is why I provided the second implementation.


I get the error:

auction/state_test.go:48: cannot use reflect.ValueOf(lost) (type reflect.Value) as type string in assignment
auction/state_test.go:51: expected.Pointer undefined (type string has no field or method Pointer)

Line 48 is:

expected = reflect.ValueOf(lost)

expected was previously defined as (line 42):

expected := "CLOSED"

"Closed" is a string literal, so the type of expected inferred by the compiler is string. reflect.ValueOf(lost) returns a reflect.Value, which is not a string. Line 48 tries to assign the result of reflect.ValueOf(lost) to expected, which is not possible because expected is a string and reflect.ValueOf(lost) does not return a string. Changing line 48 to:

expected2 := reflect.ValueOf(lost)

along with the reference to expected to expected2 on lines 50 and 51 fixes this problem.


If you want to compare tests, the tests should change as few things as possible. You probably wanted TestAuction_BidPriceLessThanMaxBid instead of TestAuction_AnnounceClosed since the latter actually tests two things, that AnnounceClosed sets a.Status to CLOSED and that bidding on a closed auction results in a loss.

Thank you so much for taking the time to respond. You cleared up many issues in my understanding of Go.

I think the reason that I like runtime.GoSched() is that I don’t (think I) want synchronisation between the 2 routines; the end2end test and the goroutine. Maybe a better way to say it, is that I want comfort that the state will eventually reach a status of LOST following an AnnounceClosed().

By taking the goroutine out of the equation entirely (in my second test), I could see the states transitioned as expected (once I fixed the error as per your recommendation :flushed:).

I have pushed the code to Github if anyone is interested.

The amount of synchronization I used reflected the interactions with the state machine. There was synchronization before I did anything. The use of the Mutex coordinated the changes and responses to auction.State. This provided the guarantee that auction.State would not be changed while a process was using it. The channels I added provided different guarantees (e.g., the auction go routine has started).

The hard part of using goroutines is figuring out what interactions are needed and making sure they work correctly.

1 Like

Thanks for your help on these issues. I’m learning Go solo, so the community is my only line of support. I am really appreciative of the help people offer.

3 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.