Real-Time Pitch Shifting with an Audio Pitch DirectShow Filter

Optimizing Audio Quality in a DirectShow Pitch Filter

Overview

Focus on minimizing artifacts (aliasing, zippering, phasing), preserving timbre, and keeping latency low for real-time use. Key areas: algorithm choice, resampling and interpolation, anti-aliasing, buffering/latency, threading, format handling, and testing.

1. Choose the right pitch-shifting algorithm

  • Time-domain (e.g., WSOLA, PSOLA): low CPU, good for small shifts, can produce transient artifacts.
  • Frequency-domain (e.g., phase vocoder, STFT-based): better for larger shifts and preserving harmonic structure, but higher latency and possible smearing.
  • Hybrid methods: combine transient preservation with frequency processing (best balance for quality).

2. Anti-aliasing and oversampling

  • Use band-limited processing or perform oversampling (2x–4x) before pitch change and downsample with proper low-pass filtering to reduce aliasing.
  • Apply high-quality FIR/IIR filters for resampling; prefer polyphase FIR for efficiency and phase linearity.
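As a concrete starting point, the classic windowed-sinc design produces the kind of linear-phase low-pass needed around the resampler. A minimal sketch (the function name and parameters are illustrative, not part of any DirectShow API):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Returns `taps` coefficients of a Hamming-windowed sinc low-pass filter
// with normalized cutoff fc (0 < fc < 0.5, as a fraction of the sample rate).
// An odd `taps` count gives a symmetric, linear-phase FIR.
std::vector<double> designLowpass(int taps, double fc) {
    std::vector<double> h(taps);
    const double M = taps - 1;
    double sum = 0.0;
    for (int n = 0; n < taps; ++n) {
        double x = n - M / 2.0;                       // center the sinc
        double sinc = (x == 0.0) ? 2.0 * fc
                                 : std::sin(2.0 * M_PI * fc * x) / (M_PI * x);
        double w = 0.54 - 0.46 * std::cos(2.0 * M_PI * n / M); // Hamming window
        h[n] = sinc * w;
        sum += h[n];
    }
    for (double& c : h) c /= sum;                     // unity gain at DC
    return h;
}
```

For 2x oversampling, a cutoff just under 0.25 of the oversampled rate keeps the original band intact while suppressing images.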

3. Interpolation and resampling

  • Use high-quality interpolation (e.g., windowed sinc, polyphase) when resampling audio buffers.
  • Avoid naive linear interpolation; it causes high-frequency loss and zipper noise.
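For contrast with linear interpolation, even a 4-point cubic (Catmull-Rom) interpolator retains noticeably more high-frequency content at little extra cost, and windowed sinc improves on it further. A hedged sketch with an illustrative name:

```cpp
#include <cassert>
#include <cmath>

// 4-point Catmull-Rom cubic: interpolate between y1 and y2 at fractional
// position t in [0,1), using the outer neighbors y0 and y3 for curvature.
double cubicInterp(double y0, double y1, double y2, double y3, double t) {
    double a = -0.5 * y0 + 1.5 * y1 - 1.5 * y2 + 0.5 * y3;
    double b =        y0 - 2.5 * y1 + 2.0 * y2 - 0.5 * y3;
    double c = -0.5 * y0             + 0.5 * y2;
    double d =               y1;
    return ((a * t + b) * t + c) * t + d;             // Horner evaluation
}
```

A resampler would call this once per output sample, with t being the fractional part of the source read position.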

4. Phase and transient handling

  • Preserve phase coherence across frames to avoid metallic/phasing artifacts; use phase-locking or phase propagation strategies in STFT approaches.
  • Detect transients and process them with time-domain methods (or bypass frequency-domain smoothing) to keep attacks sharp.
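In an STFT implementation, phase propagation starts from a per-bin estimate of the true instantaneous frequency. The sketch below shows that estimate in isolation (names are illustrative; a real filter applies this to every bin of every frame before rescaling phase by the pitch ratio):

```cpp
#include <cassert>
#include <cmath>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Wrap a phase difference into (-pi, pi].
double wrapPhase(double p) {
    return p - 2.0 * M_PI * std::round(p / (2.0 * M_PI));
}

// Given the analysis phases of bin k in two successive frames (hop H samples,
// FFT size N), estimate the bin's true frequency in radians per sample.
double trueFrequency(int k, int N, int H, double prevPhase, double phase) {
    double expected = 2.0 * M_PI * k * H / N;   // advance if exactly on bin k
    double deviation = wrapPhase(phase - prevPhase - expected);
    return (expected + deviation) / H;          // radians per sample
}
```

The deviation term is what lets the vocoder track partials that fall between bin centers, which is essential for avoiding the metallic quality of naive per-bin phase scaling.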

5. Windowing, FFT size and hop-size (for frequency methods)

  • Balance FFT size: larger FFT = better frequency resolution but more latency and smearing; smaller FFT = better temporal resolution.
  • Choose hop size relative to FFT size to control overlap-add completeness and phase vocoder stability. Typical overlaps: 4x (75%) for good quality.
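The overlap choice can be sanity-checked numerically: a periodic Hann window at 75% overlap (hop = N/4) overlap-adds to a constant, which is the COLA property that lets resynthesis reconstruct the signal without amplitude ripple. An illustrative sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Periodic Hann window of length N (the variant used for STFT analysis).
std::vector<double> hann(int N) {
    std::vector<double> w(N);
    for (int n = 0; n < N; ++n)
        w[n] = 0.5 - 0.5 * std::cos(2.0 * M_PI * n / N);
    return w;
}

// Overlap-add the window at hop H with wrap-around (steady state) and
// return the min/max of the summed envelope; COLA holds when min == max.
std::pair<double, double> colaEnvelope(int N, int H) {
    std::vector<double> sum(N, 0.0);
    auto w = hann(N);
    for (int shift = 0; shift < N; shift += H)
        for (int n = 0; n < N; ++n)
            sum[(n + shift) % N] += w[n];
    double lo = sum[0], hi = sum[0];
    for (double v : sum) { lo = std::min(lo, v); hi = std::max(hi, v); }
    return {lo, hi};
}
```

The constant envelope (2.0 for Hann at hop N/4) is also the gain to divide out after overlap-add.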

6. Buffering, latency and real-time constraints

  • Minimize buffer sizes where possible, but not at the expense of artifacts. Expose a latency/quality tradeoff option.
  • Use low-latency audio APIs and prioritize real-time threads; avoid blocking I/O and heavy allocations in the audio thread.
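One common way to keep the streaming thread free of blocking and allocation is a single-producer/single-consumer ring buffer between it and any worker thread. A minimal sketch (the class name is illustrative; capacity is fixed at construction so the audio path never allocates):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Lock-free single-producer/single-consumer float ring buffer.
// One thread may call push(), a different thread may call pop().
class SpscRing {
public:
    explicit SpscRing(size_t capacity) : buf_(capacity + 1) {}

    bool push(float v) {                               // producer side
        size_t w = write_.load(std::memory_order_relaxed);
        size_t next = (w + 1) % buf_.size();
        if (next == read_.load(std::memory_order_acquire)) return false; // full
        buf_[w] = v;
        write_.store(next, std::memory_order_release);
        return true;
    }

    bool pop(float& v) {                               // consumer side
        size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire)) return false;   // empty
        v = buf_[r];
        read_.store((r + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<float> buf_;                           // one slot kept empty
    std::atomic<size_t> read_{0}, write_{0};
};
```

Acquire/release ordering on the indices is what makes the handoff safe without a mutex; both push and pop fail fast instead of blocking.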

7. Noise shaping and dithering

  • If converting bit depth, apply dithering and noise shaping to avoid quantization distortion.
  • Maintain sufficient internal processing precision (32-bit float or 64-bit) to reduce rounding errors.
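For the bit-depth conversion step, TPDF dither (the sum of two uniform random values spanning ±1 LSB) is the standard choice. A sketch with illustrative naming and no noise-shaping feedback loop:

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Quantize a float sample in [-1, 1) to a 16-bit grid with TPDF dither,
// returning the dequantized float for further processing or output.
float quantize16WithDither(float x, std::mt19937& rng) {
    std::uniform_real_distribution<float> u(-0.5f, 0.5f);
    float dither = u(rng) + u(rng);        // triangular PDF, spans +/-1 LSB
    float scaled = x * 32767.0f + dither;
    float q = std::round(scaled);
    if (q > 32767.0f) q = 32767.0f;        // clamp to int16 range
    if (q < -32768.0f) q = -32768.0f;
    return q / 32767.0f;
}
```

The dither randomizes the rounding decision, turning correlated quantization distortion into benign broadband noise; noise shaping would additionally feed the rounding error back through a weighting filter.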

8. Format handling and sample rates

  • Support multiple sample rates and channel layouts. Normalize internal processing to a single canonical format (e.g., 32-bit float interleaved) for consistency.
  • Handle channel mapping carefully for multichannel audio; consider per-channel processing or mid/side techniques for stereo.
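For stereo, the mid/side technique mentioned above reduces to a trivial encode/decode pair (illustrative names below): the filter can pitch-shift mid and side independently, and the decode reconstructs left/right exactly.

```cpp
#include <cassert>
#include <cmath>

// Mid/side encode: mid is the mono sum, side is the stereo difference.
void encodeMS(float l, float r, float& m, float& s) {
    m = 0.5f * (l + r);
    s = 0.5f * (l - r);
}

// Inverse transform: recovers the original left/right pair.
void decodeMS(float m, float s, float& l, float& r) {
    l = m + s;
    r = m - s;
}
```

Processing mid and side with the same pitch ratio keeps the stereo image stable; processing left and right independently can cause it to wander.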

9. CPU and memory optimizations

  • Use SIMD/vectorized math and efficient memory access patterns for convolution, FFT, and interpolation.
  • Cache precomputed window functions and filter coefficients.
  • Offer adjustable quality presets (low/medium/high) to scale CPU usage.

10. Testing and objective/subjective evaluation

  • Use objective metrics: SNR, log-spectral distance, and PESQ/ViSQOL where applicable.
  • Conduct listening tests with varied material (speech, solo instruments, complex music) and measure artifacts across pitch ranges.
  • Test extreme cases: large pitch shifts, quick real-time modulations, and low sample rates.
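Of the objective metrics above, SNR is the simplest to automate in a regression suite. A sketch assuming the reference and processed signals are already time-aligned and equal length:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Signal-to-noise ratio in dB between a reference and a processed signal.
double snrDb(const std::vector<double>& ref, const std::vector<double>& out) {
    double sig = 0.0, err = 0.0;
    for (size_t i = 0; i < ref.size(); ++i) {
        sig += ref[i] * ref[i];
        double e = out[i] - ref[i];
        err += e * e;
    }
    return 10.0 * std::log10(sig / err);
}
```

Note that raw SNR punishes any time shift, so for pitch-shifted output it is most useful on round-trip tests (shift up then down by the same ratio) or after explicit alignment.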

11. Integration specifics for DirectShow

  • Implement as an audio transform filter, typically by deriving from CTransformFilter in the DirectShow base-class library (or CTransInPlaceFilter when the output format matches the input).
  • Validate WAVEFORMATEX/WAVEFORMATEXTENSIBLE media types in CheckInputType and CheckTransform, and size output buffers in DecideBufferSize for the worst-case block.
  • Report the algorithm's processing latency to the graph (e.g., via IAMLatency) so downstream synchronization with video can compensate.
