Estimation functions

Functions

pitchmeld.estimation_f0(wav: ndarray, fs: float, winlen: int = fs * 2 / f0_min, timestep: int = fs * 0.005, f0_min: float = 55.0, f0_max: float = 1760.0) → ndarray[float32]

Note

This function assumes the signal is monophonic, such as a voice, a flute, a violin, or a saxophone.

It is not meaningful to use it on polyphonic signals such as a piano, a guitar, or a drum set.

Parameters:
  • wav – Input signal to analyze. Multichannel input is not processed per channel: the channels are averaged and the result is processed as a monophonic signal.

  • fs – Sampling rate [Hz].

  • f0_min – Minimum value for the fundamental frequency [Hz, def. 55.0 (A1)].

  • f0_max – Maximum value for the fundamental frequency [Hz, def. 1760.0 (A6)].
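
The channel averaging described for wav can be pictured with the short NumPy sketch below. It is an illustration only: to_mono is a hypothetical helper, not part of pitchmeld, and it assumes the (n_samples, n_channels) layout that soundfile.read returns for multichannel files. How pitchmeld performs the averaging internally is not documented here.

```python
import numpy as np

def to_mono(wav):
    """Hypothetical helper: average the channels of a signal.

    Assumes shape (n_samples,) for mono or (n_samples, n_channels)
    for multichannel input.
    """
    wav = np.asarray(wav, dtype=np.float64)
    if wav.ndim == 1:
        return wav  # already monophonic, nothing to average
    return wav.mean(axis=1)  # one averaged value per sample

# Three stereo samples (left, right), values made up for illustration.
stereo = np.array([[0.5, -0.5], [1.0, 0.0], [0.25, 0.75]])
mono = to_mono(stereo)
print(mono)  # values: 0.0, 0.5, 0.5
```

Averaging before the call is only needed if a different downmix (for example, keeping a single channel) is preferred over the default behavior.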

Note

The arguments below are used to optimize processing speed. Changing them is not recommended unless you know what you are doing.

Parameters:
  • winlen – Analysis window length [#samples, def. ceil(fs * 2 / f0_min)]. A larger value gives a more stable but less precise f0 estimate. This value is equivalent to winlen_inner in pitchmeld.transform.

  • timestep – Hop size from one estimation frame to the next [#samples, def. floor(0.005*fs)].
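
As a worked example of the two documented defaults, the sketch below evaluates ceil(fs * 2 / f0_min) and floor(0.005 * fs) for a 44.1 kHz signal. default_frame_params is a hypothetical helper written for this illustration; it only restates the formulas above and is not a pitchmeld function.

```python
import math

def default_frame_params(fs, f0_min=55.0):
    """Hypothetical helper restating the documented defaults."""
    winlen = math.ceil(fs * 2 / f0_min)   # two periods of the lowest f0, in samples
    timestep = math.floor(0.005 * fs)     # 5 ms hop, in samples
    return winlen, timestep

winlen, timestep = default_frame_params(44100.0)
print(winlen, timestep)  # 1604 220
```

At 44.1 kHz the default window therefore spans about 36.4 ms (two periods of a 55 Hz tone) and frames are spaced roughly 5 ms apart.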

Returns:

  • ndarray[float32] - An array of shape (N, 3) where N is the number of frames.

    The first column is the time, in seconds, of the center of the estimation window. The second column is the fundamental frequency in Hz. The third column is TBD.
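
The column layout can be handled as in the sketch below, which uses a small made-up array in place of a real result so that it runs without pitchmeld. Two assumptions are made: frames without a detected pitch carry an f0 of 0, as the filter in the Example suggests, and the third column is filled with placeholder zeros because its meaning is not documented yet.

```python
import numpy as np

# Stand-in for the (N, 3) array returned by the estimator.
# All values are made up; the third column is a placeholder.
f0s = np.array(
    [
        [0.000, 0.0, 0.0],
        [0.005, 220.0, 0.0],
        [0.010, 221.0, 0.0],
        [0.015, 0.0, 0.0],
    ],
    dtype=np.float32,
)

times = f0s[:, 0]  # center of each estimation window [s]
f0 = f0s[:, 1]     # fundamental frequency [Hz]

# Assumption: frames with f0 == 0 are unvoiced / undetected.
voiced = f0 > 0.0

print(f"{int(voiced.sum())} voiced frames out of {len(f0s)}")
print(f"Mean voiced f0: {float(f0[voiced].mean()):.1f} Hz")
```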

Example:

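import numpy as np
import pitchmeld
import soundfile

# Load the audio; fs is the sampling rate in Hz.
wav, fs = soundfile.read('path/to/audio.wav')
# Estimate f0 with the default parameters; the result has shape (N, 3).
f0s = pitchmeld.estimation_f0(wav, fs)
# Median of the f0 column (index 1) over frames with a positive f0.
print(f"Median f0 (voiced): {np.median(f0s[f0s[:, 1] > 0.0, 1])} Hz")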