2 · wave — sound, sampling, and the WAV file
Adapted from μ: a modular approach to audio programming (Yü Fang, CMSC388V), rewritten for Zig 0.16 with extra explanations of both the Zig and the math. The original C is kept beside every Zig port so you can compare. Everything here runs offline: we compute samples and write a `.wav` file you play afterwards, with no real-time audio and no drivers.
Interactivity inspires musicians: press a piano key and you hear the string instantly. That immediacy is what makes an instrument feel alive — but it is expensive to build. Real-time audio means hard latency deadlines and a lot of program complexity. It is far easier to generate, transform, and analyze sound when you are allowed a long delay before hearing the result.
So we start offline. Our goal: write — from scratch — a program that generates an audio file in WAV format that any media player can open. Tedious once, illuminating forever. But first, a question.
1 · What is sound?
Sound is a rapid change in air pressure caused by mechanical vibration. Clap your hands and the air between them is forced out; that compression pushes the next layer of air, and the disturbance travels until it reaches your eardrum.
A computer cannot touch air pressure directly, so getting sound into and out of a computer takes a chain of conversions:
pressure → microphone → analog signal → ADC → digital signal → ... → DAC → speaker
1.1 Analog signals
A microphone (a transducer) turns pressure into a continuous electrical voltage. We model that as a function of time:

$$x(t), \quad t \in \mathbb{R}$$

$t$ is a real number, infinitely fine. A speaker runs the chain backwards, turning $x(t)$ back into vibration. Analog processing (guitar pedals, analog synths) transforms that voltage with circuits; we will not go there. We work entirely on the computer.
1.2 Digital signals
A computer has finite resolution, so to store $x(t)$ as binary we must throw away some detail. An analog-to-digital converter (ADC) records; a digital-to-analog converter (DAC) plays back. You will see these two acronyms constantly.
To generate audio we only need to mimic an ADC:
- Have an analog signal (a continuous function) in mind.
- Sample it — record its value at regular instants.
- Trust that it can be reconstructed with minimal distortion.
On a computer we never record real air — we construct a function in code and sample it. Two steps make a signal digital: sampling and quantization.
Sampling
Sampling records the signal every $T$ seconds, producing a discrete sequence indexed by an integer $n$:

$$x[n] = x(nT)$$

Notation. Square brackets `x[n]` mean a discrete (sampled) signal; parentheses `x(t)` mean a continuous one.

$T$ is the sampling period (seconds). Its inverse is the sample rate (written `sr` in code), measured in samples/second, i.e. hertz (Hz):

$$f_s = \frac{1}{T}$$

It seems like we lose enormous information by sampling, yet, remarkably, sampling is theoretically lossless as long as the signal contains no frequency at or above $f_s/2$. That threshold is the Nyquist frequency, and the result is the Nyquist–Shannon sampling theorem.

Math note: why a half? Intuitively, to pin down a wave going up and down you need at least two samples per cycle: one near the peak, one near the trough. Fewer than two and a fast wave is indistinguishable from a slow one, because they produce identical samples. That impostor is called an alias. CD audio uses $f_s = 44100$ Hz, so it captures everything below $22050$ Hz, comfortably above the ~20 kHz ceiling of human hearing.
Because frequencies above Nyquist masquerade as lower ones, a real ADC first runs the signal through an analog low-pass filter (an anti-aliasing filter) to remove them. That is a hardware concern; when we synthesize, we simply avoid generating those frequencies in the first place.
Encoding and quantization
To store a sample as bits we must round it to one of a finite set of levels. The common scheme is linear pulse-code modulation (LPCM): the value is the amplitude, quantized in uniform steps. With a bit depth of $b$ bits there are $2^b$ levels. CD audio uses 16-bit ($2^{16} = 65536$ levels); studios use 24-bit.

Quantization is lossy: each sample is nudged to the nearest level, off by up to half a step. That error behaves like added noise. We measure it with the signal-to-noise ratio (SNR), usually in decibels:

$$\mathrm{SNR}_{\mathrm{dB}} = 20 \log_{10} \frac{A_{\text{signal}}}{A_{\text{noise}}}$$

Math note: the 6 dB rule. Each extra bit halves the quantization step, which doubles the SNR amplitude ratio. Since $20 \log_{10} 2 \approx 6.02$, every bit buys about 6 dB of dynamic range. So 16-bit gives ≈ 96 dB and 24-bit ≈ 144 dB. Human hearing spans ≈ 120 dB, so 20-plus bits is effectively lossless. Quieter signals use fewer levels, so their SNR is worse, which is exactly why you set a healthy recording level instead of a faint one.
The practical workflow everyone uses: decode LPCM → process in floating point in the range $[-1, 1]$ → encode back to LPCM. Floating point has a huge dynamic range, so intermediate math practically never overflows. We will compute in `f32` and convert to `i16` only at the moment of writing.
Zig note: types up front. Zig does not silently convert between integers and floats, or narrow a value to a smaller type. You will write `@floatFromInt`, `@intFromFloat`, and `@as(T, x)` explicitly. It feels verbose at first, but it means every rounding or truncation is visible in the source, with no surprise conversions.
2 · The WAV format
A WAV file is just a container for encoded samples, built from the RIFF format (Microsoft/IBM, 1991). RIFF is made of chunks, which are tagged containers:

```
┌─────────────┐
│chunk        │
│┌──────────┐ │
││id(4)     │ │  id   : 4-byte tag, e.g. "RIFF"
│├──────────┤ │  size : 4-byte length of the body
││size(4)   │ │  body : the actual data (may be sub-chunks)
│├──────────┤ │  pad  : 1 zero byte if size is odd
││body      │ │
│└──────────┘ │
└─────────────┘
```

A whole WAV file is one RIFF chunk whose body, in the simplest form, holds two sub-chunks: `fmt ` (the format metadata) and `data` (the samples). Grouping everything before the samples into a header, the layout is:

```
WAV file
├── RIFF header
│   ├── id(4)           "RIFF"
│   ├── size(4)         36 + data_size
│   └── type(4)         "WAVE"
├── fmt chunk
│   ├── id(4)           "fmt " (note the trailing space!)
│   ├── size(4)         16
│   ├── fmt_tag(2)      1 = linear PCM
│   ├── channels(2)     1 = mono
│   ├── samples/sec(4)  44100
│   ├── bytes/sec(4)    channels × sr × bits/8
│   ├── block_align(2)  channels × bits/8
│   └── bits/sample(2)  16
└── data chunk
    ├── id(4)           "data"
    ├── size(4)         number of sample bytes
    └── data            the samples themselves
```

Two rules: the 4-byte tags ("RIFF", "WAVE", "fmt ", "data") are raw ASCII and endian-less (do not reverse them); everything else is little-endian.
Math note: the derived fields. `bytes/sec` = `channels × samples/sec × bits/8`; a player uses it to know how fast to stream. `block_align` = `channels × bits/8` is the size of one frame (all channels of a single instant). For stereo, samples are interleaved `L R L R …`.
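To make the derived fields concrete, here is a small C sketch that computes them. The struct and function names (`wav_derived`, `derive_fields`) are hypothetical and not from the original source.

```c
#include <stdint.h>

/* Hypothetical holder for the two derived fmt-chunk fields. */
struct wav_derived {
    uint32_t bytes_per_sec;  /* channels * samples/sec * bits/8 */
    uint16_t block_align;    /* channels * bits/8: bytes in one frame */
};

struct wav_derived derive_fields(uint16_t channels, uint32_t sr, uint16_t bits)
{
    struct wav_derived d;
    d.block_align   = (uint16_t)(channels * (bits / 8));
    d.bytes_per_sec = sr * d.block_align;  /* one frame per sample period */
    return d;
}
```

For 16-bit mono at 44100 Hz this gives 88200 bytes/sec and a 2-byte frame; for 16-bit stereo, 176400 bytes/sec and a 4-byte frame.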
The header as a C struct
The original course models the header as a nested struct:

```c
#include <stdint.h>
#include <stdio.h>   /* fopen, fwrite */
#include <string.h>  /* memcpy */

typedef int8_t fourcc[4];

struct riff_hdr {
    fourcc   id;
    uint32_t size;
    fourcc   type;
};

struct fmt_ck {
    fourcc   id;
    uint32_t size;
    uint16_t fmt_tag;
    uint16_t channels;
    uint32_t samples_per_sec;
    uint32_t bytes_per_sec;
    uint16_t block_align;
    uint16_t bits_per_sample;
};

struct data_hdr {
    fourcc   id;
    uint32_t size;
};

struct wav_hdr {
    struct riff_hdr riff;
    struct fmt_ck   fmt;
    struct data_hdr data;
};
```

The parameters, hard-coded for simplicity:

```c
typedef int16_t sample_t;
#define SAMPLE_MAX 32767

#define DURATION  5
#define SR        44100
#define NCHANNELS 1
#define NSAMPLES  (NCHANNELS*DURATION*SR)
```

Then fill the struct and `fwrite` it:

```c
struct wav_hdr hdr = {0};
FILE *fp = fopen("output.wav", "wb");

/* RIFF header */
memcpy(&hdr.riff.id, "RIFF", 4);
hdr.riff.size = 36 + NSAMPLES*sizeof(sample_t);
memcpy(&hdr.riff.type, "WAVE", 4);

/* FMT chunk */
memcpy(&hdr.fmt.id, "fmt ", 4);
hdr.fmt.size = 16;
hdr.fmt.fmt_tag = 1; /* linear PCM */
hdr.fmt.channels = NCHANNELS;
hdr.fmt.samples_per_sec = SR;
hdr.fmt.bytes_per_sec = NCHANNELS*SR*sizeof(sample_t);
hdr.fmt.block_align = NCHANNELS*sizeof(sample_t);
hdr.fmt.bits_per_sample = 8*sizeof(sample_t);

/* DATA header */
memcpy(&hdr.data.id, "data", 4);
hdr.data.size = NSAMPLES*sizeof(sample_t);

fwrite(&hdr, sizeof(struct wav_hdr), 1, fp);
```

Writing a struct directly with `fwrite` is fragile: the compiler may insert padding between fields, and the bytes come out in the machine’s native endianness (wrong on a big-endian CPU). This particular struct happens to be padding-free on x86, but it is a trap.
The header in Zig
Zig sidesteps both traps by writing each field explicitly, in chosen byte order, through a buffered `Writer`. No structs, no padding, no endian guesswork:

```zig
const std = @import("std");

const sample_t = i16;           // 16-bit signed samples
const sample_max: f32 = 32767;
const duration = 5;             // seconds
const sr = 44100;               // sample rate (Hz)
const nchannels = 1;            // mono
const nsamples = nchannels * duration * sr;

fn writeWavHeader(w: *std.Io.Writer, data_len: u32) !void {
    // RIFF header
    try w.writeAll("RIFF");
    try w.writeInt(u32, 36 + data_len, .little);
    try w.writeAll("WAVE");
    // fmt chunk
    try w.writeAll("fmt ");
    try w.writeInt(u32, 16, .little); // chunk size
    try w.writeInt(u16, 1, .little); // 1 = linear PCM
    try w.writeInt(u16, nchannels, .little);
    try w.writeInt(u32, sr, .little);
    try w.writeInt(u32, nchannels * sr * @sizeOf(sample_t), .little); // bytes/sec
    try w.writeInt(u16, nchannels * @sizeOf(sample_t), .little); // block align
    try w.writeInt(u16, 8 * @sizeOf(sample_t), .little); // bits/sample
    // data header
    try w.writeAll("data");
    try w.writeInt(u32, data_len, .little);
}
```

Zig note: `writeInt(u32, x, .little)`. This single call solves the C program’s two hardest portability problems at once. It writes exactly the bytes you ask for (no struct padding) in the byte order you name (`.little`), so the output is identical on any CPU. `writeAll` writes raw bytes, which is perfect for the endian-less ASCII tags. The `try` propagates any write error out of the function (its return type is `!void`, “void or an error”).
Generating the samples
The “hello world” of audio is a 440 Hz sine, the note A above middle C:

$$x(t) = \sin(2\pi \cdot 440\, t)$$

To sample it, substitute $n / f_s$ (sample index over sample rate) for $t$, and scale the result up to the 16-bit range. In C: `buf[i] = lrint(SAMPLE_MAX*sin(2*M_PI * 440 * i/SR));` (`lrint` rounds to the nearest integer.) The complete Zig program, header plus samples plus the all-important flush:
```zig
pub fn main() !void {
    var dbg: std.heap.DebugAllocator(.{}) = .init;
    defer _ = dbg.deinit();
    const gpa = dbg.allocator();

    var threaded: std.Io.Threaded = .init(gpa, .{});
    defer threaded.deinit();
    const io = threaded.io();

    var file = try std.Io.Dir.cwd().createFile(io, "output.wav", .{});
    defer file.close(io);
    var wbuf: [4096]u8 = undefined;
    var fw = file.writer(io, &wbuf);
    const w = &fw.interface;

    const data_len: u32 = nsamples * @sizeOf(sample_t);
    try writeWavHeader(w, data_len);

    var i: usize = 0;
    while (i < nsamples) : (i += 1) {
        const t: f32 = @floatFromInt(i);
        const s = std.math.sin(2.0 * std.math.pi * 440.0 * t / @as(f32, sr));
        const v: sample_t = @intFromFloat(sample_max * s);
        try w.writeInt(sample_t, v, .little);
    }
    try w.flush(); // <-- buffered bytes only hit disk on flush; forgetting this truncates the file
}
```

```c
static sample_t buf[NSAMPLES];

for (size_t i = 0; i < NSAMPLES; ++i)
    buf[i] = lrint(SAMPLE_MAX*sin(2*M_PI * 440 * i/SR));

fwrite(buf, sizeof(buf), 1, fp);
```

Zig note: the I/O setup (new in 0.16). Zig 0.16 made I/O an explicit value you pass around. The three steps `DebugAllocator → Threaded → io()` give you a memory allocator and an `io` handle; `file.writer(io, &wbuf)` wraps the file in a buffered writer (it batches bytes for speed). `&fw.interface` is the `*std.Io.Writer` we hand to `writeWavHeader`. The catch beginners always hit: a buffered writer holds the last chunk in memory until you call `w.flush()`; miss it and the file ends up short or empty. `defer` runs cleanup (`deinit`, `close`) automatically when the scope exits.
Math note: reading the sine line. `t = i / sr` is the time of sample `i` in seconds. Feeding `2π·440·t` to `sin` makes the wave complete 440 cycles per second; that is the pitch. Double 440 → one octave up; halve it → one octave down. Pitch is multiplicative, which is why octaves are ratios, not fixed offsets.
Run it (zig run wave.zig), then turn your volume below 20 % before playing — a full-scale sine is genuinely loud, and a bug can be worse. The complete C program lives on tig.
Headroom
A full-scale sine fills the entire amplitude range, which is far too loud for one instrument in a mix (you rarely want a single voice above ~0.5). Audio signals are functions, so to make one quieter you multiply by a factor below 1:

```zig
const v: sample_t = @intFromFloat(sample_max * 0.2 * s); // 0.2 ≈ −14 dB, plenty of headroom
```

```c
buf[i] = lrint(SAMPLE_MAX*0.2*sin(2*M_PI * 440 * i/SR));
```

Even at 0.2 you will hear it clearly, because hearing is logarithmic (more on that in mix). From the oscillator chapter on, we keep generation and volume separate and leave signals at full amplitude, so keep your system volume down.
3 · Bytebeat
For fun: bytebeat (discovered 2011 by viznut) makes chiptune-ish music from a single integer expression. Drop to 8-bit samples at 8 kHz:

```zig
const sample_t = u8; // 8-bit samples, 0..255
const sr = 8000;     // lo-fi sample rate
```

```c
typedef uint8_t sample_t;
#define SR 8000
```

Then fill the buffer with one expression built only from integer math, bitwise ops, and comparisons:

```zig
var t: usize = 0;
while (t < nsamples) : (t += 1) {
    // Parenthesized to match C, where & binds tighter than |;
    // Zig gives & and | the same precedence.
    const x: u8 = @truncate((t *% 5 & t >> 7) | (t *% 3 & t >> 10));
    try w.writeByte(x);
}
```

```c
for (size_t t = 0; t < NSAMPLES; ++t)
    buf[t] = t*5&t>>7|t*3&t>>10;
```

Zig note: wrapping vs. checked math. In C, unsigned overflow silently wraps. Zig makes that a choice: plain `*` panics on overflow in safe builds, while `*%` is the explicit wrapping multiply, which is what bytebeat relies on. `@truncate` keeps only the low 8 bits to land back in `u8`. Making “I really do want wraparound here” visible is very on-brand for Zig.
A few characters and it already sounds like music. Tweak the expression and see what falls out.
4 · Improvements
Portability / endianness. The C struct-write breaks on a big-endian machine, so the original adds an `is_le()` check and per-field byte-swapping wrappers. In Zig there is nothing to fix: `writeInt(..., .little)` already pins the byte order, so the program is correct everywhere by construction.
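For comparison, C can also pin the byte order without detecting the host's endianness, by assembling the bytes with shifts. This is a sketch of that alternative, not the `is_le()` approach the original course uses; `put_u32le` and `put_u16le` are hypothetical names.

```c
#include <stdint.h>

/* Store v at p[0..3] in little-endian order. Shifts operate on the value,
 * not on memory, so the result is the same on any host CPU. */
void put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Same idea for 16-bit fields such as channels or block_align. */
void put_u16le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
}
```

Writing 44100 this way yields the bytes `44 AC 00 00`; the filled buffer can then be passed to `fwrite` as raw bytes.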
Variable duration. If you do not know the length up front (e.g. recording until Stop), you cannot fill the size fields in advance. The fix is the same in both languages: remember the file position, write a placeholder, and after the samples seek back and patch the two size fields (ftell/fseek in C; fw.seekTo / file.seekTo in Zig).
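In C, the seek-and-patch step might look like the sketch below. It assumes the 44-byte header layout used in this chapter, so the RIFF size sits at byte offset 4 and the data size at offset 40; `patch_wav_sizes` is a hypothetical name, and the file must have been opened in a mode that allows seeking and writing.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: after all samples are written, go back and fill in
 * the two size fields. Returns 0 on success, -1 on any I/O error. */
int patch_wav_sizes(FILE *fp, uint32_t data_len)
{
    const uint32_t values[2]  = { 36u + data_len, data_len };
    const long     offsets[2] = { 4L, 40L };  /* RIFF size, data size */

    for (int k = 0; k < 2; ++k) {
        uint8_t le[4];
        for (int b = 0; b < 4; ++b)
            le[b] = (uint8_t)(values[k] >> (8 * b));  /* little-endian bytes */
        if (fseek(fp, offsets[k], SEEK_SET) != 0) return -1;
        if (fwrite(le, 1, sizeof le, fp) != sizeof le) return -1;
    }
    return fseek(fp, 0L, SEEK_END) == 0 ? 0 : -1;  /* restore end position */
}
```

A recorder would write the header with zero placeholders, stream samples while counting bytes, then call `patch_wav_sizes(fp, bytes_written)` before `fclose`.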
Exercises
- Change 440 to 220 and 880 Hz; confirm each is “the same note” an octave away. Why is octave = ×2, not +constant?
- Write a stereo file: set `nchannels = 2`, fix `bytes/sec` and `block_align`, and write samples interleaved `L, R`. Pan a tone hard left by writing it to L and `0` to R.
- Inspect the bytes: `xxd output.wav | head -3`. Find `RIFF`, `WAVE`, `fmt`, `data`, and confirm `44 AC 00 00` (44100) appears; `xxd` prints it as `44ac 0000`.
- Comment out `w.flush()` and observe what happens to the file. Feel the bug once now.
Further reading
Gareth Loy’s Musimathics (vol. 1) for the physics of sound; Ken Pohlmann’s Principles of Digital Audio for sampling/quantization/codecs; xiph.org’s “Digital Show & Tell” video for an intuitive sampling demo. Original chapter and full source: mu.krj.st/wave.
Next: 3 · osc (part i) — turning a counter into any waveform.