Estimation functions
Functions
- pitchmeld.estimation_f0(wav: ndarray, fs: float, winlen: int = fs * 2 / f0_min, timestep: int = fs * 0.005, f0_min: float = 55.0, f0_max: float = 1760.0) → ndarray[float32]
Note
It assumes the signal is monophonic (one pitch at a time), such as a voice, a flute, a violin, or a saxophone.
It is not meaningful on polyphonic signals such as a piano, a guitar, or a drum set.
- Parameters:
wav – Input signal to analyze. Multichannel input is not processed per channel: the channels are averaged and the result is processed as a single-channel signal.
fs – Sampling rate [Hz].
f0_min – Minimum value for the fundamental frequency [Hz, def. 55.0 (A1)].
f0_max – Maximum value for the fundamental frequency [Hz, def. 1760.0 (A6)].
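The channel averaging described for wav can be sketched with NumPy as follows. This is only an illustration of the idea, not the library's internal code: the helper name downmix_to_mono is hypothetical, and the (frames, channels) layout is assumed because that is what soundfile.read returns for multichannel files.

```python
import numpy as np

def downmix_to_mono(wav: np.ndarray) -> np.ndarray:
    """Hypothetical helper: average the channels of a (frames, channels) array.

    A 1-D input is assumed to be single-channel already and is returned as is.
    """
    wav = np.asarray(wav, dtype=np.float64)
    if wav.ndim == 1:
        return wav
    # Assumption: axis 0 is time (frames), axis 1 is the channel index.
    return wav.mean(axis=1)

# Three frames of a stereo signal.
stereo = np.array([[1.0, 0.0],
                   [0.5, 0.5],
                   [-1.0, 1.0]])
mono = downmix_to_mono(stereo)
```

Note that averaging channels with opposite polarity cancels them out (last frame above), so downmixing yourself beforehand gives you control over which channel is analyzed.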
Note
The arguments below are used to tune the processing speed. Changing them is not recommended unless you know what you are doing.
- Parameters:
winlen – Analysis window length [#samples, def. ceil(fs * 2 / f0_min)]. The bigger the value, the more stable the f0 estimate but the less precise it will be. This value is equivalent to winlen_inner in pitchmeld.transform.
timestep – Step/hop size from one estimation frame to the next [#samples, def. floor(0.005*fs)].
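To make the documented defaults concrete, the sketch below evaluates the two formulas given above for a given sampling rate. The helper name default_frame_params is hypothetical and only mirrors the stated formulas; it is not part of pitchmeld.

```python
import math

def default_frame_params(fs: float, f0_min: float = 55.0) -> tuple[int, int]:
    """Hypothetical helper mirroring the documented defaults.

    winlen   = ceil(fs * 2 / f0_min)  -> two periods of the lowest f0
    timestep = floor(0.005 * fs)      -> 5 ms hop
    """
    winlen = math.ceil(fs * 2 / f0_min)
    timestep = math.floor(0.005 * fs)
    return winlen, timestep

# At 44.1 kHz with f0_min = 55 Hz: 1604-sample window, 220-sample hop.
winlen, timestep = default_frame_params(44100.0)
```

Because the default window covers two periods of f0_min, raising f0_min shortens the window, which is one way the f0 range influences processing cost.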
- Returns:
- ndarray[float32] - An array of shape (N, 3) where N is the number of frames.
The first column is the time, in seconds, of the center of the estimation window. The second column is the fundamental frequency in Hz. The third column is TBD.
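A minimal sketch of how the returned array might be unpacked is shown below. It uses a synthetic stand-in instead of a real call so that it runs without an audio file. Two things are assumptions here: that frames without a detected pitch carry an f0 of 0 (suggested by the `> 0.0` filter in the example on this page), and the zeros placed in the third column, whose meaning is not documented.

```python
import numpy as np

# Synthetic stand-in for the (N, 3) array returned by pitchmeld.estimation_f0.
# Third column is a placeholder (its content is TBD in the documentation).
f0s = np.array([
    [0.000,   0.0, 0.0],
    [0.005, 220.0, 0.0],
    [0.010, 221.0, 0.0],
    [0.015,   0.0, 0.0],
], dtype=np.float32)

times = f0s[:, 0]   # window-center times [s]
f0 = f0s[:, 1]      # fundamental frequency [Hz]

# Assumption: f0 == 0 marks frames where no pitch was estimated.
voiced = f0 > 0.0

voiced_times = times[voiced]
mean_f0 = float(f0[voiced].mean())
```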
- Example:
    import pitchmeld
    import soundfile
    import numpy as np

    wav, fs = soundfile.read('path/to/audio.wav')
    f0s = pitchmeld.estimation_f0(wav, fs)
    print(f"Median f0 (voiced): {np.median(f0s[f0s[:, 1] > 0.0, 1])} Hz")
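A value such as the median f0 can be easier to read as a note name, in the same way the f0_min and f0_max defaults are labeled A1 and A6. The sketch below is one way to do that conversion; hz_to_note_name is a hypothetical helper, not a pitchmeld function, and it assumes equal temperament with A4 = 440 Hz.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_note_name(f0_hz: float, a4_hz: float = 440.0) -> str:
    """Hypothetical helper: nearest equal-tempered note name for a frequency.

    Assumes A4 = 440 Hz (MIDI note 69) and the convention where MIDI 60 is C4.
    """
    midi = round(69 + 12 * math.log2(f0_hz / a4_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

low = hz_to_note_name(55.0)      # the documented f0_min default
high = hz_to_note_name(1760.0)   # the documented f0_max default
```

Only pass strictly positive frequencies to this helper: a frame with f0 equal to 0 has no note name and would raise an error in the logarithm.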