Wilieu

I've tried my hand at making an educational/info-dump YouTube video all about spectral processing!

I'm also including a text/image version here for Newgrounds.

I plan to put together a week-long sound-design/synthesis course to teach at a local institute, so this is somewhat of a test.


The portrait of myself used in the video was made by BeatumPopcorn



What is ‘Spectral Processing’?

This is a term you have probably heard before if you have an interest in audio synthesis. You may be wondering what it is, or how to apply it yourself.


Spectral processing is a method of audio editing that can yield many unique and highly manipulatable resulting sounds. It is used by sound-designers and experimental producers for creative effects, and audio-engineers for corrective applications - though this does tend to be less common.


The word spectral comes with some related terms you may have heard, all pointing at much the same idea: Spectrograms, Frequency Spectra, FFT processing, and other such terms.

The term FFT is perhaps the most useful place to start, so let’s discuss: what is an FFT?


iu_1148605_7979507.jpg


FFT stands for Fast Fourier Transform.

At its simplest, an FFT is a process for transforming a waveform into its component frequencies.

Instead of processing a signal as a waveform, you are processing it as a collection of sine-wave frequencies, called a ‘spectrum’ (plural ‘spectra’).


Any complex waveform can be expressed as a sum of sine waves. These sine waves can be manipulated individually, before being recombined into a summed signal. This level of per-frequency granularity is what makes spectral processing so unique and powerful.

Frequency-based processing, rather than the more conventional time-based processing, opens up a new dimension of how sound can be processed.
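
To make the "sum of sine waves" idea concrete, here is a minimal Python sketch. It uses a slow, naive Fourier transform written out by hand so that it runs without extra libraries; the function names `dft` and `inverse_dft`, the 64-sample frame and the two test tones are all assumptions made up for this example, not anything from a particular plugin.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform: one complex number per frequency bin."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Recombine the per-frequency components into a real waveform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

N = 64  # frame length in samples (arbitrary choice)
low = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]          # 3 cycles per frame
high = [0.5 * math.sin(2 * math.pi * 10 * t / N) for t in range(N)]  # 10 cycles per frame
wave = [a + b for a, b in zip(low, high)]

# Split the waveform into its sine-wave components.
spectrum = dft(wave)
amplitudes = [abs(x) * 2 / N for x in spectrum[:N // 2]]

# Manipulate one component on its own: silence the 10-cycle sine
# (bin 10 and its mirror image), then recombine.
spectrum[10] = 0
spectrum[N - 10] = 0
resynth = inverse_dft(spectrum)
```

Here `amplitudes[3]` and `amplitudes[10]` come out at roughly 1.0 and 0.5, matching the two sines that went in, and `resynth` is the low sine on its own.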


Let’s start with some background information about the FFT.

‘Spectral’, ‘spectra’, or ‘spectrum’ get their name from a similarity to how light passing through a prism can be split into its component colours - its frequencies.


iu_1148606_7979507.png

https://intelligentdesigneronline.com/drhoades/advanced/labs/emission_lab/emissionLab2.php?debug=1


In an FFT, the audio waves are doing much the same thing; the FFT is a mathematical algorithm that allows any complex waveform to be split into its component sine-waves.

Or, more specifically, the complex wave is split into an arbitrary number of frequency bands, called ‘bins’, with the number of bins depending on the spectral resolution of the calculation.


iu_1148607_7979507.webp

https://www.mwmresearchgroup.org/blog/key-concepts-fourier-transforms-and-signal-processing
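
As a small worked example of what a ‘bin’ is: for a frame of a given FFT size, bin number k sits at k × sample rate ÷ FFT size. The sketch below assumes a hypothetical helper called `bin_frequencies`, and the 48 kHz sample rate and tiny FFT size of 8 are picked only to keep the numbers readable.

```python
def bin_frequencies(sample_rate, fft_size):
    """Centre frequency in Hz of each bin, from 0 Hz up to the Nyquist frequency."""
    return [k * sample_rate / fft_size for k in range(fft_size // 2 + 1)]

bins = bin_frequencies(48000, 8)
print(bins)  # prints [0.0, 6000.0, 12000.0, 18000.0, 24000.0]
```

A larger FFT size gives more bins packed closer together, which is the "spectral resolution" mentioned above.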


The FFT is to audio what a prism is to light.


iu_1148609_7979507.jpg


Fast vs Discrete Fourier Transforms:

The ‘Fast Fourier Transform’ is a more efficient and less computationally taxing way of calculating the ‘Discrete Fourier Transform’. Both achieve the same result, but the fast variant avoids redundant steps in the calculations to operate far more quickly - roughly N log N steps rather than N² for a frame of N samples.

For almost all purposes, the FFT is preferable over its discrete counterpart.
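
To show that the two really do give the same answer, here is a minimal sketch comparing a naive DFT with a textbook recursive radix-2 FFT. Both functions are written out by hand for illustration (real software would use an optimised library), and the test signal is an arbitrary assumption.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform: about N * N steps."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(signal):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(signal)
    if n == 1:
        return [complex(signal[0])]
    even = fft(signal[0::2])  # transform of the even-numbered samples
    odd = fft(signal[1::2])   # transform of the odd-numbered samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # each half-size result is reused
        out[k + n // 2] = even[k] - twiddle  # for two output bins
    return out

N = 64
wave = [math.sin(2 * math.pi * 5 * t / N) + 0.3 * math.cos(2 * math.pi * 12 * t / N)
        for t in range(N)]

largest_difference = max(abs(a - b) for a, b in zip(dft(wave), fft(wave)))
dft_steps = N * N                   # 4096 for N = 64
fft_steps = N * int(math.log2(N))   # 384 for N = 64
```

`largest_difference` is only floating-point rounding noise, while the rough step counts differ by more than a factor of ten even at this small size.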


If you decide to delve into FFT-based processing of audio, you should be warned that it can be a very CPU demanding process.

iu_1148608_7979507.png

You can make it as computationally demanding as you want depending on certain parameters such as resolution, buffer size and FFT size, but it still tends to be a lot more taxing than conventional waveform-based processing methods.


The FFT in audio processing does however have a fundamental limitation.

As the resolution of time becomes more detailed, the resolution of frequency becomes less so - and vice versa.

Due to this limitation, if you want an FFT to have the most granular resolution of frequency it can, you will be lacking in transient responsiveness.

Conversely, if you want an FFT to provide the best transient response it can, you will be sacrificing the frequency resolution.


This trade-off between time resolution and frequency resolution is set by the FFT size, a number typically ranging from around 128 to 16384 in powers of two.

Theoretically however, these numbers can extend to any size necessary.


iu_1148610_7979507.jpg

iu_1148611_7979507.jpg


The higher the number, the more frequency resolution is preserved; the lower the number, the more time resolution is preserved.

Lower numbers work better for inharmonic transient elements such as percussion, and higher FFT sizes tend to work better for harmonic pads and other such sustaining elements.


A typical number that gives an acceptable balance between these two resolutions is 2048, which is what many FFT-based audio processors operate at by default.


iu_1148612_7979507.jpg
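
The trade-off can be put into numbers. In the sketch below, the width of one frequency bin is sample rate ÷ FFT size, and the stretch of time that one frame covers is FFT size ÷ sample rate. The helper name `fft_resolution` and the 44.1 kHz sample rate are assumptions for the example.

```python
def fft_resolution(fft_size, sample_rate=44100):
    """Return (width of one bin in Hz, length of one frame in milliseconds)."""
    bin_width_hz = sample_rate / fft_size
    window_ms = 1000 * fft_size / sample_rate
    return bin_width_hz, window_ms

for size in (128, 512, 2048, 8192, 16384):
    width, length = fft_resolution(size)
    print(f"FFT size {size:>5}: {width:8.2f} Hz per bin, {length:7.2f} ms per frame")
```

At 2048 this works out to about 21.5 Hz per bin and about 46 ms per frame. Whatever the FFT size, the two values multiply to the same constant, so improving one always costs the other.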


But, are there any ways of getting around this limitation of resolution?

One possible work-around to this issue is via the use of vocoders.


A vocoder is a signal processor that allows the formants - or rather the frequency-amplitude envelope over time - of one signal to be applied to the harmonic content of another.

It was first developed to compress vocal signals for transmission over long telephone lines. Later, the U.S. military adopted it for encryption of spoken information, and eventually - far later - it was taken up by musicians, for example in the iconic vocal sound of Daft Punk.


It’s worth noting that the classic vocoder does not use FFT processing to achieve this result, and can thus be built with analogue hardware. However, later digital-exclusive variants such as the phase vocoder do operate on FFT algorithms, with formants being applied over frequency bins, depending on spectral resolution.
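
As a very rough sketch of the FFT-flavoured idea (not how any particular vocoder product works), the code below measures how loud the ‘modulator’ is in a handful of frequency bands and scales the same bands of the ‘carrier’ by those levels, for a single frame. The function `vocode_frame`, the band count and the test tones are all hypothetical choices for illustration.

```python
import cmath
import math

def dft(signal):
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def vocode_frame(modulator, carrier, bands=4):
    """Scale each frequency band of the carrier by the modulator's level in that band."""
    n = len(carrier)
    half = n // 2
    mod_spec = dft(modulator)
    car_spec = dft(carrier)
    # Average amplitude of the modulator in each band (bins 1 .. half-1).
    totals = [0.0] * bands
    counts = [0] * bands
    for k in range(1, half):
        band = k * bands // half
        totals[band] += abs(mod_spec[k]) * 2 / n
        counts[band] += 1
    levels = [t / c if c else 0.0 for t, c in zip(totals, counts)]
    # Apply those band levels to the carrier; DC and Nyquist are dropped in this sketch.
    out = [0j] * n
    for k in range(1, half):
        gain = levels[k * bands // half]
        out[k] = car_spec[k] * gain
        out[n - k] = car_spec[n - k] * gain
    return inverse_dft(out)

N = 64
modulator = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]  # energy in the lowest band only
carrier = [math.sin(2 * math.pi * 5 * t / N) + math.sin(2 * math.pi * 20 * t / N)
           for t in range(N)]
result = vocode_frame(modulator, carrier)
result_amplitudes = [abs(x) * 2 / N for x in dft(result)[:N // 2]]
```

Because the modulator only has energy in the lowest band, the carrier’s low partial (bin 5) survives at a reduced level while its high partial (bin 20) is silenced. A real vocoder would repeat this frame after frame so the envelope follows the modulator over time.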

This concept has been used in other ways via proprietary processing methods, by the likes of MeldaProduction and Zynaptiq to create 'morph' plugins.

These take the spectral image of one signal, and apply vocoder-like sharing of timbral attributes to continuously and smoothly transition from one sound to the other.

In theory, one could apply the formants of a transient-prioritised, FFT-processed signal to the tonal, sustaining content of its frequency-prioritised counterpart. This would ideally create a resulting signal that preserves both frequency and transient resolution in greater detail.


Unfortunately however, after testing this somewhat myself, I was unable to get the desired result. Rather than preserving the best of both resolutions, it ended up accentuating their flaws. I’m unsure if this is just a limitation of FFT processing. Specifically, since the vocoder I chose to use is itself a phase-vocoder using FFT processing, perhaps it was simply imposing the same bottlenecks as the initial spectra. It’s also possible I just approached this process incorrectly; I don’t even know if it’s possible.

Either way, this will need some more testing. If any of you know any information about this, or how to apply it better, let me know!


Moving on from that tangent, how do you visualise the data that is produced by a Fast Fourier transform?

The graph that a single FFT produces is called a spectrum, or is sometimes in more mathematical contexts referred to by a less helpful name: a ‘Frequency-Domain Graph’. Plotting many of these one after another over time gives a spectrogram.


What is the meaning of Frequency-Domain?


A waveform, as displayed by the likes of an oscilloscope, has two axes - Time and Amplitude. The graph therefore shows how signal amplitude changes over time.

A graph like this is operating in the time domain, indicated by the fact that the amplitude is changing over time.

For this reason, an oscilloscope or waveform can also be called a ‘Time-Domain Graph’.

iu_1148613_7979507.jpg

https://gribblelab.org/teaching/scicomp2014/09_Signals_sampling_filtering.html


On the other hand, in a Frequency-Domain Graph, the axes are frequency and amplitude.

The resulting graph conveys the amplitude level of every frequency, hence the name Frequency-Domain Graph. It is important to note that a frequency is just a sine-wave.

The sine-wave is the simplest sound; it has no harmonics, and on a frequency-domain graph it can be seen as a single line. ‘Frequency’ and ’Sine-wave’ are often interchangeable terms.


iu_1148614_7979507.jpg

https://elvers.us/perception/soundWave/


The single snapshot of time that a standard frequency-domain graph shows is often not the most useful for audio analysis purposes, as the most important aspect to consider in sound is often the change in frequency amplitude over time.

Because of the frequency-domain graph’s limitation of not conveying time, another way of displaying this data exists - the spectrogram.


iu_1148615_7979507.webp

A Spectrogram is a slight variation of the frequency-domain graph, showing how the plot as a whole changes over time by adding the axis of time back into the graph.

‘EQ-like’ frequency curves can also show change-over-time via animation.


Spectrograms are very useful for visualising harmonics in a sound, understanding its components, and seeing what causes the sound’s timbral characteristics.

On a spectrogram, transients show up as vertical features, and sustained frequencies as horizontal ones.


The spectrogram can be thought of as highly detailed sheet-music: an image that visually describes a piece of sound. Instead of having 88 ‘keys’ to play, you have 20,000 or more discrete sine-waves - roughly one for every discernible frequency in human hearing.

Following this analogy, instead of having discrete start and end points for notes, each sine wave follows a continuously variable change in amplitude over time.

The X axis is typically time, the Y axis frequency, and the brightness (or three-dimensional height, in waterfall plots) shows amplitude.
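
For anyone curious how that image is built, here is a minimal sketch: the signal is chopped into short overlapping frames, each frame is faded in and out with a window, and the spectrum of each frame becomes one column of the picture. The function name `spectrogram`, the Hann window, the frame size of 32 and the hop of 16 are assumptions for the example, and a naive DFT stands in for a real FFT.

```python
import cmath
import math

def dft(signal):
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def spectrogram(signal, fft_size=32, hop=16):
    """Return frames[time][frequency]: magnitudes of overlapping windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / fft_size) for i in range(fft_size)]
    frames = []
    for start in range(0, len(signal) - fft_size + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(fft_size)]
        spectrum = dft(chunk)
        frames.append([abs(x) for x in spectrum[:fft_size // 2 + 1]])
    return frames

# A test tone that jumps from a low pitch to a higher one halfway through.
signal = [math.sin(2 * math.pi * (4 if t < 128 else 10) * t / 32) for t in range(256)]
frames = spectrogram(signal)
loudest_bin = [row.index(max(row)) for row in frames]
```

Reading `loudest_bin` from left to right is like reading the bright line on a spectrogram: it starts at bin 4 and ends at bin 10.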


This thought-process of seeing spectrograms as sheet-music has been applied somewhat literally by some musical movements.

Free-form styles of sheet music, written for orchestral performances, can on occasion contain audio spectrograms printed to the page in certain sections.


iu_1148616_7979507.jpg

https://www.tandfonline.com/doi/full/10.1080/09298215.2023.2174144


These are to be audibly replicated by a speaker system on stage, or loosely interpreted by performing musicians.


What is spectral processing typically used for, and what can it achieve?

Remember that spectral processing is about manipulating these now-separated sine-wave frequencies.

With that said, spectral processing’s use-cases can be split into three categories: Corrective, Creative, and Analysis.


iu_1148617_7979507.png


Let’s start with corrective.

Corrective uses of spectral DSP include:

  • Noise reduction.
  • Per-frequency dynamics (such as compression and gating).
  • Removal of unwanted frequencies and ‘clicks’.
  • Fundamental-frequency shifting.
  • Spectral slotting (Via per-frequency side-chain-compression).


...and perhaps the most useful method - stem splitting.


Using machine-learning or comparing to common timbral characteristics via pre-made algorithms, mixed-down music can be separated into its component instrument stems retroactively, with reasonable levels of quality maintained.


More direct methods of this include:

  • Separating signals via frequency-amplitude scales.
  • Splitting the tonal content from the noise.
  • Separating the transient content from the sustaining.


Removal of noise via spectral-processing tends to operate via the attenuation of frequencies below an amplitude threshold.


iu_1148618_7979507.png
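
A bare-bones version of that threshold idea could look like the sketch below: transform one frame, zero every bin whose amplitude is under the threshold, and transform back. The name `spectral_gate`, the threshold of 0.05 and the quiet high sine standing in for ‘noise’ are assumptions for illustration; real noise reduction tools are far more careful than this.

```python
import cmath
import math

def dft(signal):
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def spectral_gate(frame, threshold):
    """Silence every frequency bin whose amplitude falls below the threshold."""
    n = len(frame)
    spectrum = dft(frame)
    gated = [x if abs(x) * 2 / n >= threshold else 0j for x in spectrum]
    return inverse_dft(gated)

N = 64
tone = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]          # the wanted sound
hiss = [0.01 * math.sin(2 * math.pi * 20 * t / N) for t in range(N)]  # quiet stand-in for noise
noisy = [a + b for a, b in zip(tone, hiss)]
cleaned = spectral_gate(noisy, threshold=0.05)
```

After gating, `cleaned` matches `tone` to within rounding error, because the quiet component sat below the threshold and the loud one sat above it.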


This technique of spectral gating via per-frequency dynamics processing also has a loose counterpart in MP3 file compression: frequencies judged unneeded are discarded while minimally changing the perceived sound of the composite signal, reducing file-size.


For the removal of unwanted frequencies and ‘clicks’ in a mixed signal, selections can be made in most FFT software to encompass the offending regions for subsequent attenuation.

Spectral methods for audio clean-up tend to be viewed as a last resort however, due to their tendency to leave a strong sonic character on the results, depending on the use-case.

This draws attention to the processing, whereas the goal of corrective processing is to be sonically transparent.


As such, more conventional processing techniques are almost always applied before the jump to spectral tools.


As an alternative option, the FFT editor by Steinberg, SpectraLayers, offers spectral ‘healing’ which works well for this.

Like the heal tool in Photoshop, the heal tool in SpectraLayers fills in and blurs frequency content in a specified region, replacing it with neighbouring content in an effort to restore the region in a transparent manner.


Finally, fundamental-frequency shifting has been used for maintaining audibility of sound when heard on devices that tend to lack low-end representation, such as smartphones.

Frequency shifting in this manner is most often heard in sound produced for cinema; sub-bass frequencies that would be inaudible on low-end lacking speakers are shifted into the audible range, typically an octave up.

This allows viewers on these devices to hear the low-end information necessary to convey the story of a scene, despite their devices not being capable of sub-frequency representation.
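
One crude way to picture an octave-up shift in the frequency domain is to move every bin to twice its bin number, since an octave is a doubling of frequency. The sketch below does exactly that for a single frame. It is an assumption-laden illustration (the name `shift_octave_up` is made up, and commercial tools will use far more refined methods), not a description of how any cinema workflow actually does it.

```python
import cmath
import math

def dft(signal):
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def shift_octave_up(frame):
    """Move the content of bin k to bin 2k (content above a quarter of the sample rate is dropped)."""
    n = len(frame)
    spectrum = dft(frame)
    shifted = [0j] * n
    for k in range(1, n // 4):
        shifted[2 * k] = spectrum[k]
        shifted[n - 2 * k] = spectrum[k].conjugate()  # mirror bin keeps the output real
    return inverse_dft(shifted)

N = 64
rumble = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]       # 3 cycles per frame
octave_up = [math.sin(2 * math.pi * 6 * t / N) for t in range(N)]    # what we hope to get
shifted = shift_octave_up(rumble)
```

For this simple test tone, `shifted` lines up with `octave_up`: the 3-cycle sine has become a 6-cycle sine.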


Now for the more creative uses!

Spectral processing can be used artistically in any way you can think of and program.


Some common examples are:

  • Dynamic changing of sine-wave amplitudes (Via per-frequency level curves).
  • Time and frequency remapping of sine-waves.
  • Spectral ‘crowding’ via layering and detuning.
  • Side-chain routing for morphing of signal attributes.


A famous example of software for creative spectral-processing is the PaulStretch algorithm, developed by Nasca Octavian Paul. This algorithm stretches the signal via spectral processing for long, pad-like results.


A similar version of this effect can also be heard in one of FL Studio’s resampling algorithms, ‘Stretch’.
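
The general idea behind this kind of extreme stretch, as it is commonly described, is to step through the input slowly, take the spectrum of each frame, keep the loudness of every frequency but scramble its phase, and overlap the results at normal speed. The sketch below is a toy version of that idea under my own assumptions (tiny frames, a naive DFT, a made-up `stretch` function); it is not Nasca Octavian Paul’s actual code.

```python
import cmath
import math
import random

def dft(signal):
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def stretch(signal, factor, fft_size=32, seed=0):
    """Stretch a signal by reading frames slowly and writing phase-scrambled grains at normal speed."""
    rng = random.Random(seed)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / fft_size) for i in range(fft_size)]
    half = fft_size // 2
    hop_out = half              # output frames overlap by 50%
    hop_in = hop_out / factor   # input is read 'factor' times more slowly
    n_frames = int((len(signal) - fft_size) / hop_in) + 1
    out = [0.0] * (hop_out * (n_frames - 1) + fft_size)
    for f in range(n_frames):
        start = int(f * hop_in)
        chunk = [signal[start + i] * window[i] for i in range(fft_size)]
        spectrum = dft(chunk)
        smeared = [0j] * fft_size
        for k in range(1, half):
            # Keep the magnitude of each bin, replace its phase with a random one.
            smeared[k] = cmath.rect(abs(spectrum[k]), rng.uniform(0, 2 * math.pi))
            smeared[fft_size - k] = smeared[k].conjugate()
        grain = inverse_dft(smeared)
        for i in range(fft_size):
            out[f * hop_out + i] += grain[i] * window[i]
    return out

source = [math.sin(2 * math.pi * 4 * t / 32) for t in range(256)]
stretched = stretch(source, factor=4)
```

With these settings, 256 input samples become 928 output samples. Real uses of this technique work with much larger frames, which is where the smooth, pad-like quality comes from.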


Analysis is the use of spectral processing that most people are likely to be familiar with.

When producing music or editing audio in any way, it can be useful to have as many different kinds of visual audio monitors as one can.


These monitors include:

  • Vectorscopes.
  • Oscilloscopes.
  • Level-meters.
  • ‘EQ-style’ frequency curves.
  • Spectrograms.


The spectrogram specifically is the one I find the most useful, allowing detailed analysis of sounds and their component parts in a visual manner, as well as visualising the greater song structure sonically.


An interesting thing to note: there are software tools for turning image files into audio spectra. This is how people get images, watermarks and hidden messages to appear in their music when viewed as a spectrogram!


iu_1148619_7979507.webp
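
In principle the trick is the spectrogram run backwards: treat each column of pixels as one frame’s spectrum, with brightness as amplitude, and convert every column back into a short burst of audio. The sketch below shows that under simplifying assumptions of my own: the `image_to_audio` function is hypothetical, row 0 is taken as the lowest frequency (real images would usually need flipping), and the ‘image’ is a 3×3 diagonal line.

```python
import cmath
import math

def dft(signal):
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def image_to_audio(image, fft_size=32):
    """image[row][column]: row = frequency bin, column = time frame, value = brightness 0..1."""
    audio = []
    for column in range(len(image[0])):
        spectrum = [0j] * fft_size
        for row in range(len(image)):
            k = row + 1  # skip bin 0 (DC); needs fewer rows than fft_size // 2
            spectrum[k] = complex(image[row][column] * fft_size / 2)
            spectrum[fft_size - k] = spectrum[k].conjugate()
        audio.extend(inverse_dft(spectrum))
    return audio

# A tiny 'image': a diagonal line rising from low to high frequency.
image = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]
audio = image_to_audio(image)

# Check by analysing each 32-sample frame of the result again.
loudest = []
for frame in range(3):
    magnitudes = [abs(x) for x in dft(audio[frame * 32:(frame + 1) * 32])[:17]]
    loudest.append(magnitudes.index(max(magnitudes)))
```

Analysing the generated audio gives `loudest == [1, 2, 3]`, so the rising diagonal drawn in the image shows up again as a rising line in the sound.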


If you want to play about with spectral audio-processing, what software is available?

For this section, I will first need to clarify the difference between the terms ‘offline’ and ‘online’ as they pertain to audio processing.


In audio, online means processing that operates in real-time, while offline means processing that must be rendered before it can be monitored: there is no real-time audio preview and it cannot be used live.


There are however advantages to using offline editors, such as for FFT editing.


Spectral-processing often uses significantly more processing power than conventional audio processing methods, so removing the need to process in real-time can allow for significantly more detailed FFT resolution and effects without needing to worry about latency or CPU headroom.


Now with that clarification, here are some recommendations:

  • SpectraLayers by Steinberg.
  • RX by iZotope.
  • MetaSynth by U&I Software.

...are three excellent offline FFT editors!


SpectraLayers specifically is the one I have the most experience with, and operates quite similarly to Adobe Photoshop of all things. With the addition of a graphics tablet, one can edit audio in much the same way as drawing, giving a very unique and fluid approach to audio recovery - or the production of experimental music.


Typically however, online editors are a lot more versatile, intuitive, and time-efficient to use, while achieving most of the same results as their offline counterparts. The ability to hear the processing instantly greatly reduces the production time needed to get the results you want. Despite this, online spectral processors still tend to use a lot more processing power compared to conventional alternatives.

They can induce a high amount of latency when operating a large number of spectral processors at once, especially at high FFT resolutions.


Some great online spectral processors that I have experience with are SpecOps by Unfiltered Audio, Spectral Suite by Andrew Reeman, and MCompleteBundle by MeldaProduction.




This has been a summary of my current thoughts and knowledge about spectral processing in audio!


Software mentioned:

MeldaProduction - MCompleteBundle: https://www.meldaproduction.com/MComp...

Sonosaurus - PaulXStretch: https://sonosaurus.com/paulxstretch/

U&I Software - MetaSynth: https://uisoftware.com/metasynth/

iZotope - RX: https://www.izotope.com/en/products/r...

Steinberg - SpectraLayers: https://www.steinberg.net/spectralayers/

Andrew Reeman - SpectralSuite: https://www.andrewreeman.com/spectral...

Unfiltered Audio - SpecOps: https://www.unfilteredaudio.com/produ...

Image-Line - FL Studio: https://www.image-line.com/


iu_1148620_7979507.png

