7 · delay — echoes, feedback & the ring buffer

Delay is the basis of at least half the audio effects you hear — echo, chorus, flanger, and even reverb are all built on it. The whole idea: mix a signal with a delayed copy of itself. The only real machinery is a buffer that remembers the past — a ring buffer.

The original runs in real time under JACK; offline we apply the same difference equations over a whole buffer. Identical DSP.

1 · Feedforward delay (a single echo)

A single reflection: you hear the direct (dry) sound, then a quieter, delayed copy bounced off a wall. As a difference equation, with delay $D$ samples and gain $g$:

$y[n] = x[n] + g\,x[n-D]$

The naive code reads the input $D$ samples ago directly, straight out of the input array. Offline, where we hold the entire input, that translates almost literally:

const delay_samples: usize = 2000; // delay in samples (~45 ms at 44.1 kHz)
const gain: f32 = 0.5;

fn feedforward(in: []const f32, out: []f32) void {
    for (out, 0..) |*o, i| {
        const echoed: f32 = if (i >= delay_samples) in[i - delay_samples] else 0.0;
        o.* = in[i] + gain * echoed;
    }
}

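The original C, for comparison:

static jack_nframes_t delay = 2000;  // delay in samples
static float gain = 0.5f;

// naive: for i < delay, in[i-delay] indexes before the start of the block
for (i = 0; i < nframes; ++i)
  out[i] = in[i] + gain*in[i-delay];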

Math note: it is a comb filter. $y[n] = x[n] + g\,x[n-D]$ adds a signal to a delayed copy of itself. At frequencies where the copy lines up in phase, the two reinforce; where it is out of phase, they cancel. The result is a row of evenly spaced notches across the spectrum that looks like a comb. (This is a finite impulse response, or FIR, filter: the output depends only on the current and past inputs, never on past outputs.)

Zig note: the start guard. if (i >= delay_samples) in[i - delay_samples] else 0.0 avoids indexing before the buffer start. (Subtracting past zero on an unsigned usize is an integer overflow in Zig: a panic in safety-checked builds and undefined behaviour in unchecked ones.) In real time you cannot reach back into a previous block at all, which is exactly why we need a ring buffer.

2 · The ring buffer

In real time each block only holds nframes samples, so to look D samples back we keep our own history in a fixed array whose index wraps around like a clock. Track one write pointer wp; the read pointer is always D behind it.

In Zig, wrapped in a struct that works the same offline or (conceptually) in real time:

const MAX_DELAY: usize = 1 << 17;

const Delay = struct {
    buf: [MAX_DELAY]f32 = [_]f32{0} ** MAX_DELAY, // start silent
    wp: usize = 0,

    fn read(self: *Delay, d: usize) f32 {
        // d samples behind the write pointer, wrapped
        const rp = (self.wp + MAX_DELAY - d) % MAX_DELAY;
        return self.buf[rp];
    }

    fn write(self: *Delay, x: f32) void {
        self.buf[self.wp] = x;
        self.wp = (self.wp + 1) % MAX_DELAY;
    }
};

The original C, for comparison:

#define MAX_DELAY (1<<17)        // must hold the longest delay
static sample_t buf[MAX_DELAY];

// read D samples behind the write pointer
int rp = wp - delay;
if (rp < 0) rp += MAX_DELAY;
out[i] = in[i] + gain*buf[rp];

// write the current input, advance, wrap
buf[wp] = in[i];
wp++;
if (wp >= MAX_DELAY) wp -= MAX_DELAY;

Zig note: [_]f32{0} ** MAX_DELAY. This builds an array of MAX_DELAY zeros at compile time (** repeats an array literal). Starting the buffer at zero matters: read uninitialized memory and you get a burst of noise on the first pass. The Zig array initializer plays the role of the calloc/memset reminder in the original C. Computing rp with + MAX_DELAY before % keeps the unsigned usize arithmetic from underflowing, provided d never exceeds MAX_DELAY.
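To see the ring buffer carry an echo across block boundaries, here is a minimal sketch in C of a block-based feedforward echo. The names ring_t, RING_SIZE and ring_echo_block are hypothetical, and the small buffer size is chosen only to keep the example compact; it is not the original's code.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 1024u   /* hypothetical size; must be at least the longest delay */

typedef struct {
    float  buf[RING_SIZE];  /* history; zero it before use */
    size_t wp;              /* next slot to write */
} ring_t;

/* Process one block of nframes samples: y[n] = x[n] + g*x[n-d].
   The history lives in *r, so the echo survives block boundaries. */
void ring_echo_block(ring_t *r, const float *in, float *out,
                     size_t nframes, size_t d, float g)
{
    assert(d >= 1 && d <= RING_SIZE);
    for (size_t i = 0; i < nframes; ++i) {
        size_t rp = (r->wp + RING_SIZE - d) % RING_SIZE; /* d behind wp, wrapped */
        out[i] = in[i] + g * r->buf[rp];
        r->buf[r->wp] = in[i];                           /* store the input */
        r->wp = (r->wp + 1) % RING_SIZE;
    }
}
```

Feeding an impulse through 64-frame blocks with d = 100, the echo appears at index 36 of the second block, which is sample 100 overall: exactly where the offline version would put it.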

3 · Smoothing & fractional delay

Changing gain or delay abruptly clicks. Gain is just a volume, so smooth it with the one-pole from mix. Delay time needs smoothing too — but first it must become a float, which means reading the buffer at a fractional position:

fn readFrac(self: *Delay, d: f32) f32 {
    const pos = @as(f32, @floatFromInt(self.wp + MAX_DELAY)) - d;
    const ipos: usize = @intFromFloat(pos);
    const fr = pos - @as(f32, @floatFromInt(ipos));
    const x0 = self.buf[ipos % MAX_DELAY];
    const x1 = self.buf[(ipos + 1) % MAX_DELAY];
    return (1.0 - fr) * x0 + fr * x1; // linear interpolation
}

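The original C, for comparison (pos must already lie in [0, MAX_DELAY), since buf[i] is not wrapped):

static sample_t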
buf_eval(float pos)
{
  size_t i = (size_t) pos;     // integer part
  float fr = pos - i;          // fractional part
  sample_t x0 = buf[i];
  sample_t x1 = buf[(i+1) % MAX_DELAY];
  return (1-fr)*x0 + fr*x1;     // linear interpolation
}
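The one-pole smoother mentioned at the top of this section is not reproduced here. As a minimal sketch, assuming it has the usual form "move a fraction of the remaining distance each sample", it could look like this in C. The type smooth_t, the function smooth_step and the coefficient value are hypothetical.

```c
#include <assert.h>

typedef struct {
    float value;   /* current smoothed value */
    float coeff;   /* 0 < coeff <= 1; smaller means a slower glide (assumed tuning) */
} smooth_t;

/* Move one step toward target: value += coeff * (target - value). */
float smooth_step(smooth_t *s, float target)
{
    assert(s->coeff > 0.0f && s->coeff <= 1.0f);
    s->value += s->coeff * (target - s->value);
    return s->value;
}
```

Called once per sample, the result can drive either the gain or the (now fractional) delay time, so a knob jump becomes a short glide instead of a click.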

Math note — why fractional, and the pitch-shift. A signal can be delayed by any real amount, not just whole samples, so to glide the delay time smoothly we read between stored samples and blend them — the same (1-fr)*x0 + fr*x1 interpolation as the wavetable in chapter 4. A side effect: while the delay time is changing, the read pointer moves at a different rate than the write pointer, which shifts the pitch — the classic “warble” of tape and bucket-brigade delays. Lovely or annoying, depending on taste.

4 · Feedback delay (repeating echoes)

Two reflecting surfaces instead of one: the sound bounces back and forth, each pass quieter, giving a train of decaying echoes. The change is famously one line: store the output instead of the input.

As difference equations, that is the whole difference:

$\text{feedforward: } y[n] = x[n] + g\,x[n-D] \qquad\qquad \text{feedback: } y[n] = x[n] + g\,y[n-D]$

// feedforward: a single echo
fn echoOnce(self: *Delay, x: f32, d: usize, g: f32) f32 {
    const y = x + g * self.read(d);
    self.write(x); // store the INPUT
    return y;
}

// feedback: repeating, decaying echoes
fn echoFeedback(self: *Delay, x: f32, d: usize, g: f32) f32 {
    const y = x + g * self.read(d);
    self.write(y); // store the OUTPUT — the one-line change
    return y;
}

The original C makes the same change in one line:

buf[wp] = in[i];    // feedforward: store the INPUT
buf[wp] = out[i];   // feedback:    store the OUTPUT

Math note: why feedback repeats, and the stability rule. Because the output is fed back in, an impulse comes out as $1, g, g^2, g^3, \dots$ spaced $D$ samples apart, a geometric series. If $|g| < 1$ the echoes shrink and die away; if $|g| = 1$ they never decay; if $|g| > 1$ they grow without bound and the signal explodes. So keep the feedback gain below 1.0 in magnitude. (This is an infinite impulse response, or IIR, comb filter: the output depends on past outputs.) A reverb is essentially several of these tuned and combined.

5 · Dry/wet control

A “mix” knob blends the untouched (dry) signal with the processed (wet) one. With a single drywet knob in $[0,1]$:

fn mixDryWet(dry_sig: f32, wet_sig: f32, knob: f32) f32 {
    const dry = 1.0 - knob; // knob = 0 → all dry
    const wet = knob;       // knob = 1 → all wet
    return dry * dry_sig + wet * wet_sig;
}

The original C, for comparison:

// feedforward
sig_dry = in[i];
sig_wet = gain*buf[rp];
out[i] = dry*sig_dry + wet*sig_wet;

// from a single knob
dry = 1 - drywet;
wet = drywet;

Math note: the mix is a crossfade. (1-k)·dry + k·wet is the same weighted average we used for sample interpolation and table crossfading, here fading between two whole signals. At k = 0.5 you get equal parts. (For feedback, note that the dry/wet blend changes only what you hear, not what gets written back into the buffer; otherwise you would alter the feedback path itself.)
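The parenthetical about feedback is worth seeing in code. In this minimal C sketch the buffer always receives x + g·delayed, whatever the knob says, and only the returned sample is crossfaded. The names fb_delay_t, FB_SIZE and fb_echo_drywet are hypothetical, and the wet signal is taken to be the echoes alone, matching sig_wet = gain*buf[rp] in the original fragment.

```c
#include <assert.h>
#include <stddef.h>

#define FB_SIZE 4096u   /* hypothetical buffer length */

typedef struct {
    float  buf[FB_SIZE];   /* zero before use */
    size_t wp;             /* next slot to write */
} fb_delay_t;

/* One sample of feedback echo with a dry/wet knob k in [0,1].
   The buffer always receives x + g*delayed, regardless of k. */
float fb_echo_drywet(fb_delay_t *s, float x, size_t d, float g, float k)
{
    assert(d >= 1 && d <= FB_SIZE && k >= 0.0f && k <= 1.0f);
    size_t rp = (s->wp + FB_SIZE - d) % FB_SIZE;
    float wet_sig = g * s->buf[rp];        /* the echoes alone */
    s->buf[s->wp] = x + wet_sig;           /* feedback path: independent of k */
    s->wp = (s->wp + 1) % FB_SIZE;
    return (1.0f - k) * x + k * wet_sig;   /* what you hear */
}
```

Turning the knob to 0 silences the echoes in the output, but they keep circulating in the buffer, so bringing the knob back up reveals a tail that is already in progress.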

6 · The big picture

You now have the core of nearly every time/space effect:

Delay line = memory of the past (the ring buffer).
Feedback = output routed back to input → echo, and the seed of reverb.
Modulated fractional delay = chorus, flanger, vibrato (drive the delay time with a slow LFO).
Many delays + feedback + filtering = reverb.

Feedforward and feedback delays are also called comb filters because their frequency responses look like a comb — the bridge to filters and reverb.

Exercises

Render a short plucked note (saw + ADSR from chapters 2 & 5), then run the whole buffer through echoFeedback with g = 0.3, 0.6, 0.85. Allocate ~2 s of trailing silence so the tail can ring out.
Tempo-sync the delay: at 120 BPM a quarter note is 0.5 s; set d = @intFromFloat(0.5 * sr) so echoes land on the beat.
Make a chorus: a short delay (~15 ms) whose time is modulated by a 0.5 Hz Phasor (chapter 3) sine, read with readFrac, mixed ~50/50. Then a flanger: shorten the delay (~3 ms), add feedback, speed the LFO.
Replace readFrac with a nearest-sample read and sweep the delay — hear the zipper noise that interpolation removes.
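For the tempo-sync exercise, the arithmetic generalizes to any note value. A small C sketch, with a hypothetical function name and the assumption that one beat is a quarter note:

```c
#include <assert.h>
#include <stddef.h>

/* Delay length in samples for a note value at a given tempo.
   beats = 1.0 is a quarter note, 0.5 an eighth, 0.75 a dotted eighth. */
size_t tempo_delay_samples(double bpm, double beats, double fs_hz)
{
    assert(bpm > 0.0 && beats > 0.0 && fs_hz > 0.0);
    double seconds = 60.0 / bpm * beats;
    return (size_t)(seconds * fs_hz + 0.5);   /* round to the nearest sample */
}
```

At 120 BPM and 44.1 kHz a quarter note comes out as 22050 samples, matching the 0.5 s in the exercise.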