5 · mix — addition, decibels, and click-free changes

Three small building blocks that show up in every audio program: mixing (adding signals), volume in decibels, and parameter smoothing (changing a value without a click).

The original controls volume live over the network with OSC messages from a tablet. Offline we have no live controller, so we treat a parameter as something that changes along the timeline (automation). The smoothing math is identical either way — and it is the real lesson here.

1 · Mixing is just addition

To combine two sounds, add them sample by sample. That is the whole idea — sound is a sequence of pressure numbers, and adding numbers adds the waves.

fn mix(in1: []const f32, in2: []const f32, out: []f32) void {
    for (out, in1, in2) |*o, a, b| o.* = a + b;
}

for (i = 0; i < nframes; ++i)
  out[i] = in1[i] + in2[i];

Zig note — multi-sequence for. for (out, in1, in2) |*o, a, b| walks three slices together; Zig requires them to be the same length and, in safety-checked builds (Debug and ReleaseSafe), panics if they differ. *o is a pointer (we write the result), while a and b are read-only copies of each input sample.

Three sine oscillators at the ratio 1 : 1.25 : 1.5 (e.g. 220, 275, 330 Hz) sum to a major chord. But beware: adding can push the total outside $[-1, 1]$, which clips. The fix is to turn things down: volume control.

Math note — headroom. Two full-scale signals can sum to as much as ±2.0, double the ceiling. So leave headroom: scale each voice down (the × 0.2–0.3 we used in earlier chapters) or scale the sum by about 1/N for N similar voices. We will quantify “turn down” properly with decibels next.
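The 1/N rule can be sketched as a mixer for any number of voices. This is a minimal illustration, not code from the original: the name mix_scaled and its parameters are made up here, and it assumes all input buffers hold at least nframes samples.

```c
#include <stddef.h>

/* Mix `nvoices` equal-length buffers into `out`, scaling the sum by 1/N
   so that N inputs within [-1, 1] can never leave [-1, 1].
   `mix_scaled` is an illustrative name, not part of the original. */
void mix_scaled(const float *const *voices, size_t nvoices,
                float *out, size_t nframes)
{
    if (nvoices == 0) {                       /* nothing to mix: silence */
        for (size_t i = 0; i < nframes; ++i) out[i] = 0.0f;
        return;
    }
    const float scale = 1.0f / (float)nvoices;
    for (size_t i = 0; i < nframes; ++i) {
        float sum = 0.0f;
        for (size_t v = 0; v < nvoices; ++v)  /* plain addition, as above */
            sum += voices[v][i];
        out[i] = scale * sum;                 /* headroom: divide by N */
    }
}
```

Scaling by 1/N is the safe worst-case choice. Voices that rarely peak together can usually be mixed hotter, which is why the per-voice factors of 0.2 to 0.3 also work.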

2 · Volume and decibels

Changing volume is multiplying by a gain:

for (out, in) |*o, x| o.* = vol * x;

for (i = 0; i < nframes; ++i)
  out[i] = vol * in[i];

vol = 1 is unity gain (unchanged); vol = 0.5 halves the amplitude. But 0.5 does not sound “half as loud,” because hearing is logarithmic — the pressure at the threshold of pain is about a million times that at the threshold of hearing. So audio level is measured in decibels (dB):

$\text{dB} = 10\log_{10}\!\left(\frac{P}{P_0}\right) = 20\log_{10}\!\left(\frac{A}{A_0}\right)$

(Power is amplitude squared, and $\log(A^2) = 2\log A$, which is where the 10 becomes 20.) In digital audio the reference $A_0$ is full scale (1.0), so the scale is called dBFS. Converting a plain gain factor to/from dB:

const std = @import("std");

fn db2lin(g: f32) f32 { // dB -> linear
    return std.math.pow(f32, 10.0, g * 0.05); // 0.05 = 1/20
}
fn lin2db(g: f32) f32 { // linear -> dB; g = 0 gives -inf
    return 20.0 * std.math.log10(g);
}

float db2lin(float g) { return powf(10.f, g * 0.05f); }   // dB -> linear
// lin2db(g) = 20*log10(g)                                 // linear -> dB

Math note — the 6 dB rule (again). $20\log_{10}(0.5) \approx -6$ dB and $20\log_{10}(2) \approx +6$ dB, so ±6 dB = half/double the amplitude. Halving twice is −12 dB, and so on. Working in dB matches how you hear: equal dB steps feel like roughly equal loudness steps, which is why faders are calibrated in dB. (Note g * 0.05 is just g / 20 written as a multiply.)

A fixed −6 dB volume stage:

const gain = db2lin(-6.0); // ≈ 0.501
for (out, in) |*o, x| o.* = gain * x;

const float gain = db2lin(-6.f);      // convert once, not per sample
for (i = 0; i < nframes; ++i)
  out[i] = gain * in[i];

3 · Control input (OSC → automation)

The original receives live volume changes over OSC (Open Sound Control — small UDP messages like /vol f -6 from a phone or tablet). It is essentially HTTP-for-control: an address (/vol) plus typed arguments. A handler updates a global vol whenever a message arrives.

Offline we have no live sender, so a parameter is simply a value we change at known points on the timeline — for example, “−6 dB for the first second, then 0 dB.” The instant we change it, though, we hit the same problem the live version has: a click.

4 · Why a sudden change clicks

Jumping vol from one value to another mid-signal creates a step — a discontinuity — in the output. Discontinuities are broadband energy: they click. The cure is to glide the value over a few milliseconds instead of jumping. Two standard ways.

4.1 Linear smoothing

Ramp from the current value to the target over a fixed number of samples (2000 samples is about 42 ms at 48 kHz, long enough to kill the click). Split vol into a target and a current, plus a step size and a countdown:

const VOL_STEPS = 2000;

const LinearSmoother = struct {
    curr: f32 = 1.0,
    target: f32 = 1.0,
    step: f32 = 0.0,
    ctr: u32 = 0,

    fn setTarget(self: *LinearSmoother, g: f32) void {
        self.target = g;
        self.step = (g - self.curr) / @as(f32, VOL_STEPS);
        self.ctr = VOL_STEPS;
    }

    fn tick(self: *LinearSmoother) f32 {
        if (self.ctr > 0) {
            self.ctr -= 1;
            self.curr += self.step;
        }
        return self.curr;
    }
};

#define VOL_STEPS 2000
static float vol_curr = 1, vol_target = 1, vol_step = 0;
static int vol_ctr = 0;

static void vol_update(float g) {     // on new target
  vol_target = g;
  vol_step = (g - vol_curr) / VOL_STEPS;
  vol_ctr = VOL_STEPS;
}

static float vol_tick(void) {          // once per sample
  if (vol_ctr > 0) { vol_ctr--; vol_curr += vol_step; }
  return vol_curr;
}

Use it per sample: out[i] = smoother.tick() * in[i];.

Math note — a straight line. step = (target − curr) / N is “total distance ÷ number of steps,” so adding step each sample draws a straight line from curr to target over exactly N samples, then stops. Simple and predictable; it always finishes in a known time.
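Because step is a rounded float, N additions can land a hair off the target. A possible variation, sketched here under that assumption, snaps to the target on the final step and lets the caller choose the ramp length. LinRamp, ramp_set and ramp_tick are illustrative names, not the original's.

```c
/* Linear ramp that lands exactly on its target.
   `LinRamp` and its functions are illustrative, not from the original. */
typedef struct {
    float curr;    /* value returned for the current sample */
    float target;  /* value the ramp is heading for */
    float step;    /* per-sample increment */
    int   ctr;     /* samples left in the ramp */
} LinRamp;

/* Start a new ramp from the current value to `target` over `nsteps` samples. */
void ramp_set(LinRamp *r, float target, int nsteps)
{
    if (nsteps < 1) nsteps = 1;          /* avoid division by zero */
    r->target = target;
    r->step   = (target - r->curr) / (float)nsteps;
    r->ctr    = nsteps;
}

/* Call once per sample; returns the smoothed value. */
float ramp_tick(LinRamp *r)
{
    if (r->ctr > 0) {
        r->ctr--;
        /* On the last step, snap to the target to cancel rounding drift. */
        r->curr = (r->ctr == 0) ? r->target : r->curr + r->step;
    }
    return r->curr;
}
```

Calling ramp_set again in the middle of a ramp simply starts a new straight line from wherever curr is at that moment, so retargeting never produces a jump.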

4.2 Exponential smoothing (the one-pole)

There is an even shorter method that you will meet everywhere in DSP. The entire smoother is one line:

const OnePole = struct {
    mem: f32 = 0.0,
    fn tick(self: *OnePole, target: f32) f32 {
        self.mem = 0.001 * target + 0.999 * self.mem;
        return self.mem;
    }
};

static float vol_tick(void) {
  static float mem = 0;
  mem = 0.001 * vol + 0.999 * mem;   // 0.1% new, 99.9% old
  return mem;
}

Each sample the output takes a tiny step (0.1 %) toward the target and keeps the rest of its old value, so a sudden jump in target barely moves the output; it eases in. Writing $a = 0.999$:

$y[n] = (1-a)\,x[n] + a\,y[n-1]$

Math note — why it is an exponential. Suppose the target is held constant at 0 and we start at $y[0] = 1$. Then $y[1] = a$, $y[2] = a^2$, and in general $y[n] = a^{\,n}$. That is exponential decay. For a general jump from $A$ to $B$, the same recurrence gives $y[n] = B + (A-B)\,a^{\,n}$: it heads toward $B$, covering a fixed fraction of the remaining distance each sample (like a cooling cup of coffee). Smaller a → faster glide; larger a → gentler. If clicks remain, raise a toward 1.

Zig note — one struct, reused everywhere. This OnePole is the single most copy-pasted object in practical DSP — smooth a gain, a cutoff, a pan, a delay time, anything. Make one per parameter. (The original writes the constants as double on purpose, a quick trick to dodge denormal slowdowns; a cleaner fix is its own bonus chapter.)

5 · Improvements

Sample-rate independence. The glide $y[n] = a^n$ is counted in samples, so the same a finishes in half the time at 96 kHz compared with 48 kHz. To fix the glide time in seconds, compute a from a time constant and the sample rate, which is exactly the tool the next chapter introduces.

Slider scaling (cube it). A linear-in-dB fader spends as much travel on inaudible −60 dB as on the useful region near 0 dB. A common fix maps a linear slider position 0..1 through a cube:

fn sliderToGain(pos: f32) f32 {
    return pos * pos * pos; // amp = position³ — more resolution near unity
}

Cubing gives fine control near the top (unity gain) and squeezes the quiet end into the bottom of the travel, which is close to how hardware faders feel. PulseAudio uses a cubic mapping for its volume scale, and OBS offers a cubic fader type.

The value never quite arrives. Floating-point steps eventually get too small to change mem, so the one-pole can stall just short of the target. A pragmatic guard: if mem stopped changing, snap it to the target.

fn tickSnap(self: *OnePole, target: f32) f32 {
    const prev = self.mem;
    self.mem = 0.001 * target + 0.999 * self.mem;
    if (self.mem == prev) self.mem = target; // resolution reached → snap
    return self.mem;
}

Exercises

Sum three sines at 1 : 1.25 : 1.5, scale by 1/3, and write a chord. Remove the 1/3 and listen for clipping.
Render a tone whose gain jumps −∞ → 0 dB at the halfway point, once raw (click) and once through OnePole (smooth). Hear the difference.
Print lin2db(pos³) for pos = 0, 0.25, 0.5, 0.75, 1.0 and see how cube scaling spaces the dB values.
Compare a = 0.999 vs a = 0.99 in OnePole: which glides faster, and why (think $a^n$)?