01 How the sound is made
Every sound you hear is synthesized on the fly - oscillators, a noise bed, a code-generated reverb, a feedback delay, a compressor. There are no audio files. Nothing is pre-recorded or looped; if you change the galaxy, the sound changes with it, because the sound is a reading of the galaxy.
Two pieces do the work. A pure-Rust music engine decides what to play:
each frame it takes a GalaxyState - the camera, the live controls, and above all the
galaxy's own core dynamics - and emits targets for a drone pad, the surrounding texture, and a
sparse stream of notes. A Web Audio layer is the how: it owns the node graph and
schedules notes ahead on the audio clock so timing stays sample-accurate even when frames stutter.
Keeping the musical logic free of any audio dependency means it can be unit-tested like any other
pure function - the same discipline the physics uses.
Visuals → GalaxyState → what to play (pure, testable) → how to play it (Web Audio).
The generative seed is fixed, so a given run of the galaxy sounds the same every time - and
the very same graph renders a finished piece offline, faster than real time.
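To make the "pure, testable" half concrete, here is a minimal sketch of what such an engine could look like. Everything in it is an assumption for illustration: the two `GalaxyState` fields, the `Targets` struct, the `Engine` type, the xorshift generator and the mapping constants are hypothetical stand-ins, not the project's real types. The point is only the shape: state in, targets out, a seeded generator as the sole hidden input, and no audio API in sight.

```rust
/// Hypothetical, pared-down stand-in for the article's `GalaxyState`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct GalaxyState {
    core_activity: f32, // 0..1, how hard the core is churning
    coherence: f32,     // 0..1, organised vs random motion
}

/// Hypothetical per-frame output: targets only, no audio objects.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Targets {
    drone_gain: f32,
    detune: f32,
    note: Option<u8>, // MIDI note number, when this frame emits one
}

struct Engine {
    rng: u64, // xorshift64 state; the only hidden input
}

impl Engine {
    fn new(seed: u64) -> Self {
        Self { rng: seed.max(1) } // xorshift must never sit at zero
    }

    fn next_u64(&mut self) -> u64 {
        self.rng ^= self.rng << 13;
        self.rng ^= self.rng >> 7;
        self.rng ^= self.rng << 17;
        self.rng
    }

    /// Decide *what* to play for one frame. No I/O, no clock, no audio API.
    fn frame(&mut self, state: &GalaxyState) -> Targets {
        const PENTATONIC: [u8; 5] = [0, 2, 4, 7, 9];
        let roll = self.next_u64();
        // Sparse notes: roughly one frame in four emits one (illustrative odds).
        let note = (roll % 4 == 0).then(|| 48 + PENTATONIC[(roll >> 8) as usize % 5]);
        Targets {
            drone_gain: 0.3 + 0.4 * state.core_activity.clamp(0.0, 1.0),
            detune: 12.0 * (1.0 - state.coherence.clamp(0.0, 1.0)),
            note,
        }
    }
}

/// Replay the same synthetic sequence of states through a fresh engine.
fn run(seed: u64, frames: usize) -> Vec<Targets> {
    let mut engine = Engine::new(seed);
    (0..frames)
        .map(|i| {
            let x = i as f32 / frames as f32;
            engine.frame(&GalaxyState { core_activity: x, coherence: 1.0 - x })
        })
        .collect()
}
```

Because nothing here touches Web Audio, a native unit test can replay the same states with the same seed and compare the two outputs for equality, which is the reproducibility property described above.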
02 What we're aiming for - and the research behind it
The goal is narrow and worth stating plainly: a soundscape that is genuinely pleasant to leave on for a long time, that lends the galaxy a feeling of calm and a sense of scale, and that moves with the simulation so the two feel like one thing. Some of those aims rest on solid findings; others are design judgments informed by the research where it exists and honest where it runs out. Here is that line, drawn carefully:
- Consonance, mostly. In the classic psychoacoustic account, Plomp & Levelt (1965) modelled sensory dissonance as the roughness of partials beating within a critical band - which consonant intervals tend to avoid [1]. It is one factor among several: later work finds harmonicity and cultural familiarity matter at least as much [2][3]. The pad and starfield are tuned to octaves and fifths, which read as consonant on all of these accounts; the violent collisions deliberately let tension in.
- Keep clear of the ear's most sensitive band. Hearing is most acute in the low kHz, with the threshold reaching its minimum around 3–4 kHz (an ear-canal resonance) [4]; energy up there also drives perceived sharpness, a known contributor to acoustic annoyance [5]. So the sparkle sits below that band, and twinkles rather than holds (§05).
- A slow, regular swell. Voluntary breathing near six breaths a minute (~0.1 Hz) is, for most people, close to a cardiovascular resonance where heart rate and breath reinforce each other, producing large heart-rate swings (high HRV) and strong cardiac-vagal activity [6][7]. The bed swells at 0.1 Hz as a gentle cue - an invitation, not a proven reflex (§04).
- Slow tempo. Faster musical tempo tends to raise arousal; slower or meditative music tends to lower it [10][11]. So the note grid stays slow (§06).
- Soft onsets. The acoustic startle reflex is triggered by sudden, fast-rising onsets, and lengthening the rise time suppresses it [12]. So every note fades in and the bed has no transients.
- A reach for scale. Awe is the response to perceived vastness that the mind must stretch to accommodate [13]. The design reaches for sonic spaciousness - reverberant depth and width - to suggest that scale, though linking these textures to the emotion itself is a hypothesis, not a result (§07).
- No melody to track. Ambient music, in Eno's framing, should be "as ignorable as it is interesting" [16]; the melody is a low-surprise random walk over the scale, never a hook.
These are design heuristics drawn from the literature, applied to a generative art toy - not a clinical or therapeutic instrument, and nothing here is a health claim. Where the evidence is strong, the design leans on it. Where it runs out - does a sound swell actually pace your breath? do these textures truly evoke awe? - the article says so rather than dressing a hope as a finding.
03 The layers and signal flow
The bed is a small ensemble of synth voices feeding one master chain, with three parallel sends that paint the surrounding space. The pad and the starfield share a stereo field pan that swings with the camera's orbit. The sub-bass stays centred; the noise bed is a very quiet, low-passed stereo wash. A fixed subsonic high-pass protects the live mix from inaudible rumble, and the offline master also sums the deep bass to mono so exported pieces translate cleanly.
The shimmer send is a neat trick. Tapping the pad through a
2x² − 1 transfer curve turns a sine into its octave - a clean frequency doubler - then a
band-pass keeps only the upper glow and feeds it to the reverb, for a cosmic sheen that floats over the
pad without pushing into harshness:
// A 2x² − 1 transfer curve: feed it a sine and out comes its octave.
fn octave_up_shaper(ctx: &BaseAudioContext) -> Option<WaveShaperNode> {
    let shaper = WaveShaperNode::new(ctx).ok()?;
    let mut curve = vec![0.0_f32; 1024];
    let last = curve.len() as f32 - 1.0;
    for (i, c) in curve.iter_mut().enumerate() {
        let x = -1.0 + 2.0 * i as f32 / last;
        *c = 2.0 * x * x - 1.0;
    }
    shaper.set_curve_opt_f32_slice(Some(&mut curve));
    // 4× oversampling keeps the doubling clean instead of aliased and gritty.
    shaper.set_oversample(OverSampleType::N4x);
    Some(shaper)
}
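Why the curve doubles the frequency: 2x² − 1 is the double-angle identity in disguise, since 2 sin²θ − 1 = −cos 2θ. For an input of amplitude a the same algebra gives −a² cos 2θ + (a² − 1), so a quieter input yields a quieter octave plus a constant offset (a band-pass stage after the shaper would remove that offset). The sketch below checks this numerically without any Web Audio; `shape` and `max_octave_error` are illustrative names, not functions from the project.

```rust
use std::f32::consts::TAU;

/// The transfer curve from the article: y = 2x² − 1.
fn shape(x: f32) -> f32 {
    2.0 * x * x - 1.0
}

/// Largest gap, over one input cycle, between the shaped sine and the
/// predicted `-a² cos(2θ) + (a² - 1)` for input amplitude `a`.
fn max_octave_error(a: f32, steps: usize) -> f32 {
    (0..steps)
        .map(|i| {
            let theta = TAU * i as f32 / steps as f32;
            let shaped = shape(a * theta.sin());
            let predicted = -a * a * (2.0 * theta).cos() + (a * a - 1.0);
            (shaped - predicted).abs()
        })
        .fold(0.0, f32::max)
}
```

The error stays at floating-point rounding level for any amplitude, which confirms that the output contains only the octave and a DC term, with no trace of the original pitch.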
04 Breathing at six breaths a minute
The clearest thread in the relaxation research is slow breathing. Breathing voluntarily at about six breaths a minute - 0.1 Hz - sits, for most people, near a cardiovascular resonance frequency, where heart rate and breath line up and reinforce each other through the baroreflex; this produces the largest heart-rate swings (high HRV) and strong cardiac-vagal activity [6][7]. The exact frequency is individual - roughly 4.5–7 breaths a minute - so six is a sensible average, not a universal optimum.
You cannot make someone breathe slowly. What the soundscape offers instead is something to settle
into: the entire sustained bed rises and falls on a 10-second cycle - a gentle swell
between 85% and 100% of its set level.
// ~0.1 Hz - six breaths a minute, near voluntary HRV-breathing resonance.
// A calm design cue, not a measured biofeedback loop.
const BREATH_RATE: f32 = TAU / 10.0;
fn breathing(t: f32) -> f32 { lfo(t, BREATH_RATE, 0.0) } // 0..1
// A gentle amplitude swell on the whole sustained bed (pad + sub):
let breath = 0.85 + 0.15 * breathing(now);
ramp(&self.drone_gain.gain(), d.gain * breath, now);
That breathing voluntarily at ~6/min drives the resonance is well established [6]. That a passive amplitude swell makes you breathe along with it is not - it hasn't been directly tested. The evidence that sound shifts breathing comes from musical tempo and rhythm [8] and from actively reciting at a slow pace, such as prayer or mantra [9], not from a loudness envelope. So treat the swell as a plausible cue and a design choice - an open door, not a mechanism that locks your breath to it.
The same 0.1 Hz swell is mirrored, in phase, on the master output, so the whole mix - not just the bed - quietly rises and falls together.
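The snippet above calls an `lfo` helper whose body isn't shown. As a sketch only, assuming a raised-cosine shape that starts at its trough (the project's `lfo` may use a different shape or phase convention), the swell can be reproduced and checked like this. `breath_gain` is an illustrative name.

```rust
use std::f32::consts::TAU;

/// Angular rate of the swell: one cycle every 10 s (0.1 Hz), in rad/s.
const BREATH_RATE: f32 = TAU / 10.0;

/// Assumed LFO shape: a raised cosine in 0..=1 that starts at 0.
/// The article's own `lfo` may differ in shape or phase convention.
fn lfo(t: f32, rate: f32, phase: f32) -> f32 {
    0.5 - 0.5 * (rate * t + phase).cos()
}

/// Gain multiplier applied to the bed: 0.85 at the trough, 1.0 at the crest.
fn breath_gain(t: f32) -> f32 {
    0.85 + 0.15 * lfo(t, BREATH_RATE, 0.0)
}
```

Under this assumption the multiplier sits at 0.85 at t = 0 s, reaches 1.0 at t = 5 s and returns to 0.85 at t = 10 s. In level terms that is a swing of about 1.4 dB (20·log₁₀ 0.85 ≈ −1.41 dB), which is why it reads as a gentle swell rather than a pulse.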
05 Staying below the ear's most sensitive band
Early versions had a starfield up in the low-kHz region that, over minutes, became a fatiguing whine. That is the most sensitive part of human hearing: the threshold of hearing reaches its minimum around 3–4 kHz, lifted by the resonance of the ear canal [4]. Energy there also weighs disproportionately on perceived sharpness, a known contributor to acoustic annoyance [5] - and it's where the ear is most vulnerable to noise damage (the classic "4 kHz notch"). A sound can be perfectly in tune and still wear on you simply by living there.
The fix is twofold. First, the twinkling voices are tuned to octaves and fifths well below the most sensitive band, so they stay consonant and warm:
// Frequency multipliers for the 5 starfield voices, relative to the low pad root
// - octaves and fifths, so they stay consonant. The oscillators are clamped below
// ~2.2 kHz, keeping sustained sparkle under the most sensitive part of hearing.
const STAR_MULT: [f32; 5] = [8.0, 12.0, 16.0, 24.0, 32.0];
Second, each star twinkles - every voice rides its own slow LFO and dips to silence rather than holding a steady tone, because a sustained pure tone is far more wearing than an intermittent one. The shimmer sheen sits lower now (a gentle band-pass near 1.8 kHz), and it is kept as a quiet, reverberant glint rather than a continuous presence, so it reads as space, not stridency.
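A small sketch of both halves of the fix, under stated assumptions: the root frequency passed in is a made-up example, the ceiling is implemented as a plain `min` (the article says only that the oscillators are "clamped"), and the twinkle shape, a squared half-wave sine, is one illustrative way to get a voice that spends half of each cycle silent. `star_frequencies`, `twinkle` and `STAR_CEILING_HZ` are hypothetical names.

```rust
use std::f32::consts::TAU;

/// Multipliers from the article: octaves (8, 16, 32) and fifths above them (12, 24).
const STAR_MULT: [f32; 5] = [8.0, 12.0, 16.0, 24.0, 32.0];

/// Assumed hard ceiling; the article gives "~2.2 kHz" without the mechanism.
const STAR_CEILING_HZ: f32 = 2200.0;

/// Oscillator frequencies for a given pad root, held under the ceiling.
fn star_frequencies(root_hz: f32) -> [f32; 5] {
    STAR_MULT.map(|m| (root_hz * m).min(STAR_CEILING_HZ))
}

/// Illustrative twinkle gain in 0..=1: silent for half of every LFO cycle,
/// with a smooth rise and fall for the other half.
fn twinkle(t: f32, rate_hz: f32, phase: f32) -> f32 {
    let s = (TAU * rate_hz * t + phase).sin();
    s.max(0.0).powi(2)
}
```

With a 55 Hz root the voices land on 440, 660, 880, 1320 and 1760 Hz, all under the ceiling. One trade-off of a plain `min` is that a clamped voice leaves its interval; dropping that voice an octave instead would keep it consonant, and the real implementation may well do something along those lines.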
06 How the galaxy drives the sound
What makes it feel alive is that the controls of the instrument are the galaxy's own numbers. A small, throttled GPU read-back (the same one the engineering calls the one sanctioned exception to keeping state on the GPU) returns how much mass has gathered at the core, whether matter is falling in or streaming out, how much it is churning, and whether that motion is organised or random. Those, with the camera and the live sliders, drive the sound.
Two of these mappings are worth seeing in code. Radial flux makes the pad breathe with the galaxy's own collapse; coherence decides whether the voices lock into one clear tone or smear into a wide, beating shimmer - turning the consonance/dissonance dial straight from the physics:
// Radial flux makes the pad *breathe*: matter collapsing inward (flux < 0)
// lifts it into tension; matter streaming back out (flux > 0) lets it settle.
let bend = (gravity - 0.5) * 4.5 + (halo - 0.5) * 1.2 - 3.4 * core_flux;
// Coherence focuses the voices: an organised collapse pulls them into one clear
// tone; a hot, random core (coherence → 0) detunes them into a wide beating shimmer.
let churn_detune = core_activity * (6.0 + 12.0 * (1.0 - coherence));
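To see how these two expressions behave, they can be wrapped as free functions and evaluated at a few points. This is a sketch: the function names `pad_bend` and `churn_detune` and the sample inputs are illustrative, and the units of the results are not stated in the excerpt above, so the numbers are left unitless.

```rust
/// Pad bend from the two sliders and the core's radial flux.
/// Inward flux (negative) raises the bend; outward flux lowers it.
fn pad_bend(gravity: f32, halo: f32, core_flux: f32) -> f32 {
    (gravity - 0.5) * 4.5 + (halo - 0.5) * 1.2 - 3.4 * core_flux
}

/// Detune spread: scales with core activity, and widens threefold
/// as coherence falls from 1 (organised) to 0 (random).
fn churn_detune(core_activity: f32, coherence: f32) -> f32 {
    core_activity * (6.0 + 12.0 * (1.0 - coherence))
}
```

With both sliders at their midpoint and no flux, the bend is 0; an inward flux of −0.5 lifts it to 1.7. At full activity the detune runs from 6 for a fully coherent core to 18 for a fully random one, and a quiet core (activity 0) produces no detune at all, whatever the coherence.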
The scenario sets the musical character: the lone disk and the M51 flyby get serene, consonant modes (a major pentatonic, or Lydian held calm over the drama), while the violent collisions shift to darker, busier modes (Dorian, Phrygian) with more tension in the pad. And the tempo follows the research directly - faster tempo tends to raise arousal [10][11], so the grid stays slow (~46–86 steps per minute) and each step emits at most one note or sparkle; it never turns frantic, even at full simulation speed:
// One step every 1.3 s (~46 per minute) when calm, easing only to 0.7 s (~86 per
// minute) at full speed - slow and never frantic. Slow tempo is the useful arousal lever.
pub fn step_seconds(&self, state: &GalaxyState) -> f64 {
    (1.3 - 0.6 * state.speed.clamp(0.0, 1.0)).max(0.7)
}
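The step length above pairs naturally with the look-ahead scheduling mentioned in §01: rather than firing a step when a frame happens to arrive, the scheduler hands over every step that falls inside a window ahead of the audio clock. The sketch below is an assumption about how that could be arranged, not the project's scheduler; `StepScheduler`, `due_steps` and the 3-second window used in the example are hypothetical, and `step_seconds` is restated as a free function of `speed` so the block stands alone.

```rust
/// Step length in seconds: 1.3 s at rest, 0.7 s at full simulation speed.
fn step_seconds(speed: f64) -> f64 {
    (1.3 - 0.6 * speed.clamp(0.0, 1.0)).max(0.7)
}

/// Hypothetical look-ahead scheduler working in audio-clock seconds.
struct StepScheduler {
    next_step: f64, // audio-clock time of the next unscheduled step
}

impl StepScheduler {
    fn new(start: f64) -> Self {
        Self { next_step: start }
    }

    /// Return the audio-clock times of every step that falls before
    /// `now + lookahead`, advancing the grid as it goes. Calling this late
    /// (a stuttering frame) never duplicates or drops a step.
    fn due_steps(&mut self, now: f64, lookahead: f64, speed: f64) -> Vec<f64> {
        let mut due = Vec::new();
        while self.next_step < now + lookahead {
            due.push(self.next_step);
            self.next_step += step_seconds(speed);
        }
        due
    }
}
```

Starting at time 0 with a calm galaxy and a 3-second window, the first call yields steps at 0.0, 1.3 and 2.6 s. If the next frame arrives late, at 2.0 s, the window then reaches 5.0 s and the call yields just the one new step at 3.9 s. The grid is anchored to the audio clock, so the frame timing affects only when steps are handed over, not where they land.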
07 Vastness and awe
Calm is only half of it; the other goal is a sense of awe. Awe, in Keltner & Haidt's account, is the response to perceived vastness - something large enough that the mind has to stretch to take it in [13]. There's a real perceptual hook to reach for: reverberation and stereo width genuinely shape how large and distant a space sounds. So the design leans on spaciousness - a long reverb, voices spread wide, weight at the bottom (a sub-bass floored at 36 Hz, below which many playback systems mostly spend headroom on rumble) - to suggest that scale.
The reverb is generated entirely in code, and its realism comes from one detail: the diffuse tail darkens as it decays, because a real hall absorbs high frequencies faster than low ones. A one-pole filter whose cutoff falls over time does exactly that:
let mut lp = 0.0;
for i in 0..len {
    let t = i as f32 * dt;
    // Cutoff closes from bright to dark over ~1 s - highs absorbed first.
    let cutoff = 0.04 + 0.5 * (-t / 1.1).exp();
    lp += cutoff * (noise() - lp); // one-pole low-pass on the noise
    buf[i] = lp * (-t / 3.2).exp() * onset; // × long decay, soft onset
}
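The loop above leaves `noise()`, `onset`, `len` and `dt` to its surroundings. A self-contained sketch that fills those in follows, with every filled-in piece an assumption: the noise source is a small linear congruential generator (`Lcg`), the onset is a 20 ms linear ramp, and the function name `reverb_tail`, its signature and the `rms` helper are invented for the example. The filter and decay constants are the ones from the article.

```rust
/// Tiny deterministic noise source, standing in for the article's `noise()`.
struct Lcg(u32);

impl Lcg {
    /// Next sample in [-1, 1), taken from the top 24 bits of the state.
    fn next(&mut self) -> f32 {
        self.0 = self.0.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
        ((self.0 >> 8) as f32 / 8_388_608.0) - 1.0
    }
}

/// Generate one channel of a darkening, decaying noise tail.
fn reverb_tail(sample_rate: f32, seconds: f32, seed: u32) -> Vec<f32> {
    let len = (sample_rate * seconds) as usize;
    let dt = 1.0 / sample_rate;
    let mut rng = Lcg(seed);
    let mut lp = 0.0_f32;
    let mut buf = vec![0.0_f32; len];
    for (i, out) in buf.iter_mut().enumerate() {
        let t = i as f32 * dt;
        // Filter coefficient falls from 0.54 toward 0.04: highs absorbed first.
        let cutoff = 0.04 + 0.5 * (-t / 1.1).exp();
        lp += cutoff * (rng.next() - lp); // one-pole low-pass on the noise
        let onset = (t / 0.02).min(1.0); // assumed 20 ms linear fade-in
        *out = lp * (-t / 3.2).exp() * onset;
    }
    buf
}

/// Root-mean-square level of a slice, for comparing head and tail.
fn rms(x: &[f32]) -> f32 {
    (x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32).sqrt()
}
```

Because the generator is seeded, the same arguments always produce the same buffer, which keeps the reverb reproducible along with everything else. Comparing `rms` over the first and last second of a 4-second tail shows the level falling, as both the decay and the closing filter would predict. Note that the filter coefficient is applied per sample, so the same constants give a different effective cutoff at a different sample rate.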
Reverberation and width really do change perceived space, but linking those textures to the emotion of awe is a design hypothesis, not a demonstrated effect. A calm, consonant piece also forgoes the swelling dynamics and harmonic tension that more usually drive intense musical emotion. Pushing harder is no shortcut either: overly long, cavernous reverb can be heard as unpleasant rather than grand [14], and heavy sub-bass/infrasound tends toward unease [15]. So the sub-bass here is kept modest, and the reverb lush but not murky - choices aimed at scale without unease, not a proven recipe for awe.
The dynamics expand slowly - the breath, the swelling core, layers opening as you zoom - rather than arriving as crescendos: scale without startle.
08 Recording and mastering
The same graph that plays live can render a finished piece offline, faster than real
time and free of any frame-rate glitches, by replaying a recorded timeline of the galaxy through an
OfflineAudioContext. What comes out is then mastered by a pure, native-tested DSP module
- no Web Audio involved, so it unit-tests like the physics does.
Mastering means making the piece translate across earbuds, laptops and big speakers alike. It is measured with the broadcast loudness standard (ITU-R BS.1770 K-weighting), normalised to a streaming target, and held under a true-peak ceiling so nothing clips after lossy encoding:
impl Default for MasterSettings {
    fn default() -> Self {
        Self {
            sample_rate: 48_000,
            target_lufs: -16.0,         // integrated loudness (BS.1770)
            true_peak_ceiling_db: -1.0, // oversampled true-peak headroom
        }
    }
}
Around that, the chain sums the very low end to mono (so the sub is solid on any system), trims sub-30 Hz rumble again at the file level, applies raised-cosine fades, and writes a 24-bit WAV. The result is a finished audiovisual piece - the same arc on screen and in the speakers, rendered from a single seed.
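The arithmetic behind "normalised to a target, held under a ceiling" can be sketched with a single static gain. This is the simplest possible policy and an assumption on my part: the real module may use a limiter or another strategy, and the measurement itself (K-weighting, gating, oversampled peak detection) is taken as given here. `normalisation_gain_db` and `db_to_linear` are illustrative names; the two constants mirror the defaults shown above.

```rust
const TARGET_LUFS: f64 = -16.0;
const TRUE_PEAK_CEILING_DB: f64 = -1.0;

/// Static gain in dB that moves the piece toward the loudness target
/// without letting the measured true peak cross the ceiling.
fn normalisation_gain_db(measured_lufs: f64, measured_true_peak_db: f64) -> f64 {
    let wanted = TARGET_LUFS - measured_lufs; // gain needed to hit the target
    let allowed = TRUE_PEAK_CEILING_DB - measured_true_peak_db; // headroom left
    wanted.min(allowed)
}

/// Convert a gain in decibels to a linear sample multiplier.
fn db_to_linear(db: f64) -> f64 {
    10f64.powf(db / 20.0)
}
```

A render measuring −20 LUFS with a −6 dBTP peak wants +4 dB and has 5 dB of headroom, so it gets the full +4 dB. A quieter but peakier render at −24 LUFS and −3 dBTP wants +8 dB but only 2 dB fits under the ceiling, so under this policy it would land below the target unless a limiter tames the peaks first.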
09 What's evidence-based, and what's taste
As with the physics, it's worth being clear about the line between the research-grounded choices, the design hypotheses, and pure taste.
Grounded in the literature:
- consonant tuning to reduce critical-band roughness - one factor in consonance, alongside harmonicity and cultural familiarity [1][2][3];
- keeping sustained energy below the most sensitive part of hearing (~3–4 kHz), where sharpness drives annoyance [4][5];
- a slow tempo, because faster tempo tends to raise arousal [10][11];
- matching mode to mood - brighter modes for the calm scenes, darker ones for the collisions - since musical mode shifts perceived valence [10];
- soft, slow note onsets, because fast-rising onsets trigger the startle reflex [12];
- 0.1 Hz as the swell rate - the resonance of voluntary slow breathing [6][7].
Design hypotheses - plausible, not established:
- that a passive amplitude swell would pace the listener's breath; the sound→breathing evidence is for tempo/rhythm and active recitation, not a loudness envelope [8][9];
- that reverberant depth, width and sub-bass evoke awe specifically; they shape perceived space, but the emotional link isn't demonstrated and the evidence on long reverb and heavy lows is mixed at best [13][14][15];
- that a non-repeating, low-surprise melodic wander stays "as ignorable as it is interesting", in the generative-ambient tradition [16].
Taste, tuned by ear: the specific scale chosen for each scenario, the exact weighting of every galaxy signal onto every parameter, the timbres of the voices, the length and colour of the reverb, and the overall balance. The research says which dials matter; where to set them is a creative choice.
The sound is a real-time reading of the simulation, shaped by the research where it's solid and by ear where it isn't. None of it is pre-recorded, and none of it is random noise dressed up.
10 References
[1] Plomp, R. & Levelt, W. J. M. (1965). “Tonal Consonance and Critical Bandwidth.” Journal of the Acoustical Society of America 38(4), 548–560.
[2] McDermott, J. H., Lehr, A. J. & Oxenham, A. J. (2010). “Individual Differences Reveal the Basis of Consonance.” Current Biology 20(11), 1035–1041.
[3] McDermott, J. H., Schultz, A. F., Undurraga, E. A. & Godoy, R. A. (2016). “Indifference to dissonance in native Amazonians reveals cultural variation in music perception.” Nature 535, 547–550.
[4] ISO 226:2023, “Acoustics - Normal equal-loudness-level contours” (Int'l Org. for Standardization); see also Moore, B. C. J., An Introduction to the Psychology of Hearing, on the ~3–4 kHz threshold minimum.
[5] Fastl, H. & Zwicker, E. (2007). Psychoacoustics: Facts and Models (3rd ed.), Springer - on sharpness and psychoacoustic annoyance.
[6] Lehrer, P. M. & Gevirtz, R. (2014). “Heart rate variability biofeedback: how and why does it work?” Frontiers in Psychology 5, 756.
[7] Vaschillo, E. G., Vaschillo, B. & Lehrer, P. M. (2006). “Characteristics of resonance in heart rate variability stimulated by biofeedback.” Applied Psychophysiology and Biofeedback 31(2), 129–142.
[8] Bernardi, L., Porta, C. & Sleight, P. (2006). “Cardiovascular, cerebrovascular, and respiratory changes induced by different types of music in musicians and non-musicians: the importance of silence.” Heart 92(4), 445–452.
[9] Bernardi, L. et al. (2001). “Effect of rosary prayer and yoga mantras on autonomic cardiovascular rhythms.” BMJ 323(7327), 1446–1449.
[10] Husain, G., Thompson, W. F. & Schellenberg, E. G. (2002). “Effects of musical tempo and mode on arousal, mood, and spatial abilities.” Music Perception 20(2), 151–171.
[11] Dillman Carpentier, F. R. & Potter, R. F. (2007). “Effects of Music on Physiological Arousal: Explorations into Tempo and Genre.” Media Psychology 10(3), 339–363.
[12] Blumenthal, T. D. et al. (2005). “Committee report: Guidelines for human startle eyeblink electromyographic studies.” Psychophysiology 42(1), 1–15.
[13] Keltner, D. & Haidt, J. (2003). “Approaching awe, a moral, spiritual, and aesthetic emotion.” Cognition & Emotion 17(2), 297–314.
[14] Västfjäll, D., Larsson, P. & Kleiner, M. (2002). “Emotion and auditory virtual environments: affect-based judgments of music reproduced with virtual reverberation times.” CyberPsychology & Behavior 5(1), 19–32.
[15] Mühlhans, J. H. (2017). “Low frequency and infrasound: A critical review of the myths, misbeliefs and their relevance to music perception research.” Musicae Scientiae 21(3), 267–286.
[16] Eno, B. (1978). Ambient 1: Music for Airports, sleeve notes - ambient music as “as ignorable as it is interesting.”