Feature extraction

Detailed information on the implemented time domain and frequency domain features is available in the feature extraction documentation.

sleepecg.extract_features(records, lookback=0, lookforward=30, sleep_stage_duration=30, feature_selection=None, fs_rri_resample=4, min_rri=None, max_rri=None, max_nans=0, n_jobs=1)

Calculate features from sleep data (e.g. heart rate).

Time and frequency domain heart rate variability (HRV) features are calculated based on the following publications (see feature extraction for available features and feature groups):

  • Task Force of the European Society of Cardiology. (1996). Heart rate variability: standards of measurement, physiological interpretation and clinical use. Circulation, 93, 1043-1065. https://doi.org/10.1161/01.CIR.93.5.1043
  • Shaffer, F., & Ginsberg, J. P. (2017). An overview of heart rate variability metrics and norms. Frontiers in Public Health, 258. https://doi.org/10.3389/fpubh.2017.00258
  • Toichi, M., Sugiura, T., Murai, T., & Sengoku, A. (1997). A new method of assessing cardiac autonomic function and its comparison with spectral analysis and coefficient of variation of R–R interval. Journal of the Autonomic Nervous System, 62(1-2), 79-84. https://doi.org/10.1016/S0165-1838(96)00112-9

Parameters:

  • records (Iterable[SleepRecord]) –

    An iterable of SleepRecord objects as yielded by the various reader functions in SleepECG.

  • lookback (int) –

    Backward extension of the analysis window from each sleep stage time in seconds, by default 0.

  • lookforward (int) –

    Forward extension of the analysis window from each sleep stage time in seconds, by default 30.

  • sleep_stage_duration (int) –

    Duration of a single sleep stage in the returned stages in seconds, by default 30.

  • feature_selection (list[str]) –

    Which features to extract. Can be feature groups or single feature identifiers, as listed in feature extraction. If None (default), all possible features are extracted.

  • fs_rri_resample (float) –

    Frequency in Hz at which the RRI time series should be resampled before spectral analysis. Only relevant for frequency domain features, by default 4.

  • min_rri (Optional[float]) –

    Minimum RRI value in seconds to be considered valid. Will be passed to preprocess_rri(), by default None.

  • max_rri (Optional[float]) –

    Maximum RRI value in seconds to be considered valid. Will be passed to preprocess_rri(), by default None.

  • max_nans (float) –

    Maximum fraction of NaNs in an analysis window for which frequency features are computed. Should be a value between 0 and 1, by default 0.

  • n_jobs (int) –

    The number of jobs to run in parallel. If 1 (default), no parallelism is used; -1 means using all processors.
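
The interplay of `lookback`, `lookforward`, and `sleep_stage_duration` can be sketched in a few lines. This is only an illustration, not SleepECG's implementation: the helper `analysis_windows` is hypothetical, and it assumes that stage `i` starts at `i * sleep_stage_duration` seconds and that each window spans from `stage_time - lookback` to `stage_time + lookforward`.

```python
def analysis_windows(n_stages, sleep_stage_duration=30, lookback=0, lookforward=30):
    """Return assumed (start, end) times in seconds of each stage's analysis window.

    Hypothetical helper for illustration; not part of SleepECG.
    """
    windows = []
    for i in range(n_stages):
        stage_time = i * sleep_stage_duration  # assumed onset of stage i
        windows.append((stage_time - lookback, stage_time + lookforward))
    return windows


# Defaults: each window covers exactly one 30 s stage.
default_windows = analysis_windows(3)

# Longer context: 240 s before and 270 s after each stage onset (510 s in total).
extended_windows = analysis_windows(3, lookback=240, lookforward=270)
```

With the defaults, consecutive windows do not overlap. With a non-zero `lookback` or a `lookforward` longer than `sleep_stage_duration`, neighbouring windows share data, and the first windows in this sketch start before the recording does.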

Returns:

  • features (list[ndarray]) –

    A list of feature matrices, one per record. Each matrix is an array of shape (len(sleep_stages), <num_features>).

  • stages (list[ndarray | None]) –

    A list containing label vectors, i.e. the annotated sleep stages. For any SleepRecord without annotated stages, the corresponding list entry will be None.

  • feature_ids (list[str]) –

    A list containing the identifiers of the extracted features. Feature groups passed in feature_selection are expanded to all individual features they contain. The order matches the column order of the feature matrix.
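
As a rough sketch of how the three return values fit together, the snippet below drops records without annotated stages and looks up a feature column by its identifier. Everything in it is a stand-in: nested lists replace the NumPy arrays, the identifiers `feat_a` and `feat_b` are placeholders rather than real SleepECG feature IDs, and both helper functions are hypothetical.

```python
def labelled_records(features, stages):
    """Keep only (X, y) pairs for records whose stages are annotated."""
    return [(X, y) for X, y in zip(features, stages) if y is not None]


def column_of(feature_id, feature_ids, X):
    """Extract one feature column; feature_ids follows the matrix column order."""
    col = feature_ids.index(feature_id)
    # With a real ndarray this would simply be X[:, col].
    return [row[col] for row in X]


# Placeholder data: two records, two features per sleep stage.
feature_ids = ["feat_a", "feat_b"]
features = [
    [[0.9, 0.05], [1.0, 0.04]],  # record 1: two stages
    [[0.8, 0.07]],               # record 2: one stage
]
stages = [[2, 3], None]          # record 2 has no annotations

pairs = labelled_records(features, stages)
feat_b = column_of("feat_b", feature_ids, pairs[0][0])
```

Filtering on `None` before training avoids pairing a feature matrix with a missing label vector.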

Source code in sleepecg/feature_extraction.py
def extract_features(
    records: Iterable[SleepRecord],
    lookback: int = 0,
    lookforward: int = 30,
    sleep_stage_duration: int = 30,
    feature_selection: Optional[list[str]] = None,
    fs_rri_resample: float = 4,
    min_rri: Optional[float] = None,
    max_rri: Optional[float] = None,
    max_nans: float = 0,
    n_jobs: int = 1,
) -> tuple[list[np.ndarray], list[np.ndarray | None], list[str]]:
    """
    Calculate features from sleep data (e.g. heart rate).

    Time and frequency domain heart rate variability (HRV) features are calculated based on
    the following publications (see [feature extraction](../feature_extraction.md) for
    available features and feature groups):

    - Task Force of the European Society of Cardiology. (1996). Heart rate variability:
      standards of measurement, physiological interpretation and clinical use. Circulation,
      93, 1043-1065. https://doi.org/10.1161/01.CIR.93.5.1043
    - Shaffer, F., & Ginsberg, J. P. (2017). An overview of heart rate variability metrics
      and norms. Frontiers in Public Health, 258. https://doi.org/10.3389/fpubh.2017.00258
    - Toichi, M., Sugiura, T., Murai, T., & Sengoku, A. (1997). A new method of assessing
      cardiac autonomic function and its comparison with spectral analysis and coefficient
      of variation of R–R interval. Journal of the Autonomic Nervous System, 62(1-2), 79-84.
      https://doi.org/10.1016/S0165-1838(96)00112-9

    Parameters
    ----------
    records : Iterable[SleepRecord]
        An iterable of `SleepRecord` objects as yielded by the various reader functions in
        SleepECG.
    lookback : int, optional
        Backward extension of the analysis window from each sleep stage time in seconds, by
        default `0`.
    lookforward : int, optional
        Forward extension of the analysis window from each sleep stage time in seconds, by
        default `30`.
    sleep_stage_duration : int, optional
        Duration of a single sleep stage in the returned `stages` in seconds, by default
        `30`.
    feature_selection : list[str], optional
        Which features to extract. Can be feature groups or single feature identifiers, as
        listed in [feature extraction](../feature_extraction.md). If `None` (default), all
        possible features are extracted.
    fs_rri_resample : float, optional
        Frequency in Hz at which the RRI time series should be resampled before spectral
        analysis. Only relevant for frequency domain features, by default `4`.
    min_rri : float, optional
        Minimum RRI value in seconds to be considered valid. Will be passed to
        `preprocess_rri()`, by default `None`.
    max_rri : float, optional
        Maximum RRI value in seconds to be considered valid. Will be passed to
        `preprocess_rri()`, by default `None`.
    max_nans : float, optional
        Maximum fraction of NaNs in an analysis window for which frequency features are
        computed. Should be a value between `0` and `1`, by default `0`.
    n_jobs : int, optional
        The number of jobs to run in parallel. If `1` (default), no parallelism is used;
        `-1` means using all processors.

    Returns
    -------
    features : list[np.ndarray]
        A list of feature matrices, one per record. Each matrix is an array of shape
        `(len(sleep_stages), <num_features>)`.
    stages : list[np.ndarray | None]
        A list containing label vectors, i.e. the annotated sleep stages. For any
        `SleepRecord` without annotated stages, the corresponding list entry will be `None`.
    feature_ids : list[str]
        A list containing the identifiers of the extracted features. Feature groups passed
        in `feature_selection` are expanded to all individual features they contain. The
        order matches the column order of the feature matrix.
    """
    if feature_selection is None:
        feature_selection = list(_FEATURE_GROUPS)

    required_groups, feature_ids, col_indices = _parse_feature_selection(feature_selection)
    _check_frequencydomain_window_time(lookback + lookforward, feature_ids)

    # _extract_features_single has two return values, so the list returned by _parallel
    # needs to be unpacked
    Xy = _parallel(
        n_jobs,
        _extract_features_single,
        records,
        sleep_stage_duration,
        min_rri,
        max_rri,
        required_groups,
        lookback,
        lookforward,
        fs_rri_resample,
        max_nans,
        feature_ids,
        col_indices,
    )
    features = [X for X, _ in Xy]
    stages = [y for _, y in Xy]

    return features, stages, feature_ids

sleepecg.preprocess_rri(rri, min_rri=None, max_rri=None)

Replace invalid RRI samples with np.nan.

Parameters:

  • rri (ndarray) –

    An array containing consecutive RR interval lengths in seconds.

  • min_rri (float) –

    Minimum RRI in seconds to be considered valid. If None (default), no lower bounds check is performed.

  • max_rri (float) –

    Maximum RRI in seconds to be considered valid. If None (default), no upper bounds check is performed.

Returns:

  • ndarray

    The cleaned RRI series.

Examples:

Mask RR intervals outside the range of 0.4 to 2 s (= 30 to 150 bpm):

>>> from sleepecg import preprocess_rri
>>> preprocess_rri([0.5, 0.2, 0.8, 2.5, 0.6], min_rri=0.4, max_rri=2)
array([0.5, nan, 0.8, nan, 0.6])
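
The bounds in the example above follow from the relation RRI = 60 / heart rate. A minimal sketch of that conversion is shown below; `rri_bounds_from_bpm` is a hypothetical helper, not a SleepECG function. Note that the highest heart rate gives the shortest interval, so the bounds swap.

```python
def rri_bounds_from_bpm(min_bpm, max_bpm):
    """Convert a heart rate range in bpm to (min_rri, max_rri) in seconds.

    Hypothetical helper for illustration; not part of SleepECG.
    """
    if not 0 < min_bpm <= max_bpm:
        raise ValueError("expected 0 < min_bpm <= max_bpm")
    # Fastest heart rate -> shortest RR interval, slowest -> longest.
    return 60 / max_bpm, 60 / min_bpm


min_rri, max_rri = rri_bounds_from_bpm(30, 150)
```

The resulting values correspond to the `min_rri=0.4` and `max_rri=2` arguments used in the example.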

Source code in sleepecg/feature_extraction.py
def preprocess_rri(
    rri: np.ndarray,
    min_rri: Optional[float] = None,
    max_rri: Optional[float] = None,
) -> np.ndarray:
    """
    Replace invalid RRI samples with `np.nan`.

    Parameters
    ----------
    rri : np.ndarray
        An array containing consecutive RR interval lengths in seconds.
    min_rri : float, optional
        Minimum RRI in seconds to be considered valid. If `None` (default), no lower bounds
        check is performed.
    max_rri : float, optional
        Maximum RRI in seconds to be considered valid. If `None` (default), no upper bounds
        check is performed.

    Returns
    -------
    np.ndarray
        The cleaned RRI series.

    Examples
    --------
    Mask RR intervals outside the range of 0.4 to 2 s (= 30 to 150 bpm):

    >>> from sleepecg import preprocess_rri
    >>> preprocess_rri([0.5, 0.2, 0.8, 2.5, 0.6], min_rri=0.4, max_rri=2)
    array([0.5, nan, 0.8, nan, 0.6])
    """
    rri = np.array(rri, dtype=float)  # make a copy
    if min_rri is not None:
        rri[rri < min_rri] = np.nan
    if max_rri is not None:
        rri[rri > max_rri] = np.nan
    return rri
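
The NaNs written by `preprocess_rri()` matter for the `max_nans` parameter of `extract_features()`. The sketch below shows one plausible reading of that threshold, using plain Python lists instead of arrays. It assumes that the NaN fraction is the number of NaN samples divided by the window length and that a window qualifies when the fraction does not exceed `max_nans`; the helpers are hypothetical and SleepECG's internal check may differ in detail.

```python
import math


def nan_fraction(window):
    """Fraction of NaN samples in a non-empty window."""
    if not window:
        raise ValueError("window must not be empty")
    return sum(math.isnan(x) for x in window) / len(window)


def passes_max_nans(window, max_nans=0):
    """Assumed rule: keep the window if its NaN fraction is at most max_nans."""
    return nan_fraction(window) <= max_nans


# The masked series from the preprocess_rri example: 2 of 5 samples are NaN.
window = [0.5, math.nan, 0.8, math.nan, 0.6]
fraction = nan_fraction(window)
strict = passes_max_nans(window)          # default max_nans=0
lenient = passes_max_nans(window, 0.5)    # tolerate up to 50 % NaNs
```

Under these assumptions, the default `max_nans=0` rejects any window that contains a masked interval, so a tight `min_rri`/`max_rri` range can noticeably reduce the number of windows with frequency domain features.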