Implementing a simple audio playback library?

Hello everyone,

one library that is painfully missing from Go is a simple, usable, cross-platform audio library. What do you guys think? How hard would it be to implement this API?

package playback

func PlayerDevices() []PlayerDevice { ... }
func RecorderDevices() []RecorderDevice { ... }
func OSPlayerDevice() (PlayerDevice, error) { ... }
func OSRecorderDevice() (RecorderDevice, error) { ... }

type Player interface {
	Play(rate float64, samples []float64) (n int, err error)
}

type Recorder interface {
	Record(rate float64, samples []float64) (n int, err error)
}

type Device interface {
	DeviceInfo() DeviceInfo
}

type PlayerDevice interface {
	Player
	Device
}

type RecorderDevice interface {
	Recorder
	Device
}

type DeviceInfo struct {
	...
}

Both the Play and Record methods would be blocking, but that would be easy to overcome with Go’s concurrency. Go is great here in that you don’t need any kind of non-blocking I/O, which simplifies everything.

A library with this API would preferably be implemented in pure Go, with no bindings to anything… or maybe that’s not possible? I’m not sure, I’ve never done low-level audio, so I’m asking.

A library like this is flexible enough that it’s easy to build on top of it. The Player and Recorder interfaces are analogous to Writer and Reader and are similarly flexible.
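For example, here’s a rough sketch of a sine-wave generator built on top of the Player interface above (assuming import "math"; the 512-sample chunk size is an arbitrary choice):

// playSine generates a sine wave of the given frequency and feeds it to a
// Player in fixed-size chunks, for the given number of seconds.
func playSine(p Player, rate, freq, seconds float64) error {
    total := int(rate * seconds)
    buf := make([]float64, 512)
    for written := 0; written < total; {
        n := len(buf)
        if total-written < n {
            n = total - written
        }
        for i := 0; i < n; i++ {
            buf[i] = math.Sin(2 * math.Pi * freq * float64(written+i) / rate)
        }
        if _, err := p.Play(rate, buf[:n]); err != nil {
            return err
        }
        written += n
    }
    return nil
}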

What do you guys think? How hard would it be to implement this in pure Go?

I’m not sure, I’ve never done low-level audio, so I’m asking.

Then do a few low-level audio things before continuing, e.g. write a real-time synth, write a sound recorder, etc. with existing APIs.

Just a few APIs to study:

Making them blocking sounds like a bad idea… for anything real-time you need to write samples and then immediately start calculating the next samples; if you don’t, you’ll be late submitting the next buffer. For example, at 44.1 kHz a 512-sample buffer has to be refilled roughly every 11.6 ms. Talking to the sound card is inherently asynchronous – unless you are the driver.

Whether to use float64 depends on the target application… I think in most cases it would be overkill and float32 would be sufficient. Similarly, using float64 to denote the sample rate just looks wrong.

You wouldn’t be able to work in pure Go; you still need to use the drivers or some API that gives you access to the device itself.

But the main question is: for what purpose are you writing the API? For synths, for playing files, for real-time mixing, 3D audio for games, etc.?

Whether these are sufficient depends on what programs you are going to write with them.
There are a lot of options for designing the API. I’m not saying it will be difficult, but it varies from one target group to another.

PS: rakyll has done some work on an audio API: https://github.com/rakyll/audio, and Azul3D has an audio package: azul3d.org/engine/audio

Thanks for a very comprehensive response!

Well, the API is just a sketch, but thanks for the critique!

The idea behind the blocking functions is that in Go, it’s very easy to turn blocking functions into non-blocking ones using goroutines, and it simplifies the API. Actually, to turn a blocking function into a non-blocking one, all that’s required is the go keyword:

device.Play(...) // blocking
go device.Play(...) // non-blocking

And it’s very easy to compose, e.g. this plays two sounds simultaneously, and waits for them to finish:

done := make(chan bool)
go func() {
    device.Play(...)
    done <- true
}()
go func() {
    device.Play(...)
    done <- true
}()
<-done
<-done

Once you have non-blocking functions, you need to introduce handlers (to stop the playback, for example). There is no such need with blocking functions and Go’s concurrency. To stop playback, you can do something like this:

func stoppablePlay(rate float64, samples []float64) (stop chan<- bool) {
    ch := make(chan bool) // kept bidirectional here so the goroutine can receive from it
    go func() {
        for i := 0; i < len(samples); i += 64 {
            end := i + 64
            if end > len(samples) {
                end = len(samples)
            }
            device.Play(rate, samples[i:end])
            select {
            case <-ch: // stop requested
                return
            default: // keep playing
            }
        }
    }()
    return ch
}

stop := stoppablePlay(...)
time.Sleep(time.Second)
stop <- true

This plays a slice of samples in chunks, and after each chunk it checks whether it should stop. So this program plays a sound for a second, then stops. I know the stoppablePlay function is still not completely correct (for example, stop <- true blocks forever if playback finishes before the second is up), but it’s understandable.
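A variant that avoids that particular problem: have the caller close the channel instead of sending on it. A closed channel is always ready to receive from, so stopping never blocks, even if playback already finished. Roughly:

stop := make(chan struct{})
go func() {
    for i := 0; i < len(samples); i += 64 {
        // ... play the next chunk, as in stoppablePlay above ...
        select {
        case <-stop: // fires once stop is closed
            return
        default:
        }
    }
}()
time.Sleep(time.Second)
close(stop) // safe even if the goroutine has already returned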

OK, so in my view, this is the point of a blocking API: no handlers, no unnecessary API, all composable.

What do you think about that? Is that doable?

And it’s very easy to compose, e.g. this plays two sounds simultaneously, and waits for them to finish.

Does timing matter here? If you have two goroutines and a single CPU core, they won’t be scheduled at exactly the same time. So when you try to play two samples at exactly the same time, they won’t be, just because of the inherent unpredictability of goroutines, threads, cores, CPUs, etc.

edit: removed example, seriously messed up calculations
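If two sounds really must start sample-accurately together, the usual workaround is to mix them into a single buffer before handing it to the device, so the synchronization happens in the mix rather than in the scheduler. A rough sketch (the clamp is a naive guard against clipping):

// mix sums two sources into one buffer, clamping to [-1, 1].
func mix(a, b []float64) []float64 {
    n := len(a)
    if len(b) < n {
        n = len(b)
    }
    out := make([]float64, n)
    for i := 0; i < n; i++ {
        v := a[i] + b[i]
        switch {
        case v > 1:
            v = 1
        case v < -1:
            v = -1
        }
        out[i] = v
    }
    return out
}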

See more on latency issues in Ross Bencina’s “Real-time audio programming 101: time waits for nothing”.

Most audio APIs are designed to hide these issues and make handling them easier, e.g. on a slower computer the buffers would need to be larger than on a faster computer, so it would be nice if they were sized automatically.

OK, so in my view, this is the point of a blocking API: no handlers, no unnecessary API, all composable.

What do you think about that? Is that doable?

Sure, it’s (kind of) doable, but you will be trading some latency for such an interface… also, it cannot be completely blocking (i.e. you need to break out before it finishes playing the samples), otherwise you will get gaps in your output buffer.

Basically, a completely blocking API is only possible when you play the whole file.
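To illustrate the “break out before it finishes” idea, here’s a rough double-buffering-style sketch. The AsyncDevice interface here is hypothetical, not from any real API:

// AsyncDevice is a made-up interface for illustration. Enqueue returns once
// the samples are queued (not once they have played), copies the slice, and
// blocks only when the device’s queue is full, which is what paces the loop.
type AsyncDevice interface {
    Enqueue(samples []float64) error
}

func stream(d AsyncDevice, fill func([]float64)) error {
    buf := make([]float64, 512)
    for {
        fill(buf) // compute the next chunk while earlier chunks still play
        if err := d.Enqueue(buf); err != nil {
            return err
        }
    }
}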

Ok, I see that a blocking API is probably not the best idea. I would like to use this library for streaming, games, sound generation, file playback, everything :smiley: .

So I’ll try to think of a different API that’s not blocking (at least not completely). However, I would like to take advantage of Go’s features and make the API as simple, flexible and composable as possible.

Also, if you come up with a good idea for such an API, please share!

Then it won’t be great for any of them. Target a single thing; it will be easier to develop and you will actually get a good interface. All of these uses require different trade-offs and different APIs, so a single API is unlikely to fit all needs… or it will be very complicated.

And, as I’ve said multiple times, first write some audio code, otherwise you won’t know what you need to build.

The list of APIs I sent previously covers the things that people need from an audio API. Start making notes, read all of the APIs, read the example programs… if you want to make a great audio API, you need to put proper work into it (and I’m talking about weeks).

Here’s the simplest API for real-time audio generation: https://github.com/loov/synth/blob/master/realaudio/source.go. Although it’s currently unfinished and has latency issues (not fast enough). I’m not quite sure what I need to do to fix the latency issues… the API may need to change to accommodate those fixes. Also, it currently isn’t syncing to other sources, so it might need some other things for that, etc.

PS: I don’t mean any of this in a harsh way and I’m not trying to discourage you from implementing it; it’s just clearer to write without all the IMHOs. I really want you to succeed. Currently the best general-purpose audio API for Go I’ve seen is https://github.com/rakyll/audio, but it currently lacks different backends (and Windows support).

I’ve been thinking what would be a good audio interface for games, mostly because I’m writing one, but sadly not in Go.

Here’s what I’ve come up with:

Now the actual device would use “Node” as the content driver. In package al and package xa2 you would have func New(root Node) *Device. For convenience, there would also be a package device that tries to choose the appropriate backend.
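As a guess at what Node might look like (just the minimal shape implied by the master.Process call mentioned below, not necessarily what the gist actually has):

// Node is a guessed shape: the device hands the root node a buffer to fill
// (or mix into) once per interval. The exact signature is an assumption.
type Node interface {
    Process(buf []float32)
}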

The Device would call master.Process on an empty buffer at regular intervals (something like https://github.com/loov/synth/blob/master/realaudio/xa2_windows.go#L81). The buffer sizes probably need to be dynamically up/down-sized to accommodate different devices. The dynamic adjustment is not a high priority in such a system.

This interface should suffice for games and almost-real-time audio. It would be possible to use it for streaming, but would probably need some additional Quality field.

Since the Device needs to run on a pinned thread, separate from other things, there needs to be some nice way of using atomic values.
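For example, something along these lines (assuming import "sync/atomic"; the volume parameter is made up, just to illustrate the pattern):

// volume is updated from the game loop and read from the audio thread
// without locking.
var volume atomic.Value // holds a float64

func init() { volume.Store(1.0) }

// SetVolume is called from the game loop (any goroutine).
func SetVolume(v float64) { volume.Store(v) }

// currentVolume is called from the pinned audio thread while filling a buffer.
func currentVolume() float64 { return volume.Load().(float64) }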

For some easier uses, you might also have a package play that uses the previous API but has some convenience functions such as play.File(filename string), play.LoopingFile(filename string), play.FileCallback(filename string, done func()), etc. It would handle loading, creating the device, etc. all by itself. Of course, it needs some actual use cases to figure out what is needed from such a package.

The Nodes probably need a good process package containing asm code for the common array multiplications with different sliding values, and other common processing functions, mainly to optimize effect handling and sound generation.
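For example, the pure-Go fallback for one such helper might look like this (MulRamp is a made-up name; an asm version would replace the loop later):

// MulRamp multiplies buf by a gain that slides linearly from `from` to `to`
// across the buffer, the typical building block for fades and envelopes.
func MulRamp(buf []float32, from, to float32) {
    step := (to - from) / float32(len(buf))
    gain := from
    for i := range buf {
        buf[i] *= gain
        gain += step
    }
}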

In which order I would implement it:

  1. Device that just plays a sine wave on any platform
  2. gen.Sine that generates a sine wave and implements Node; adjust the device to take a Node as a parameter (see the sketch after this list)
  3. audio.WaveSound, WAV loading from disk and playback
  4. split audio.WaveSound into audio.Clip and audio.Sample
  5. implement Mixer properly
  6. implement some Effects (echo, envelope, limiter)
  7. implement Master
  8. implement Silent optimizations
  9. implement package play
  10. implement different device backends
  11. implement asm optimizations for the different nodes

Why in this order? Because this would probably give the best end result: each step needs the previous one, and each tries to maximize the value you get from the package. Also, each step informs the next, so if you get something wrong you won’t have to redesign/fix much at each step. PS: I would not try to implement the gist I showed all at once; it is just a rough sketch and might have some important things wrong…

This obviously isn’t a great interface for professional DAWs, streaming audio, or anything that needs 5.1 and 7.1 support, and I’m not sure what the right way to implement those is. But it fills the most basic needs.

PS: This recently popped up https://github.com/golang/go/issues/18497
