Datasets

SleepECG provides reader functions for various datasets. All required files will be downloaded to the location specified by the data_dir argument (by default ~/.sleepecg/datasets). While all supported PhysioNet datasets are publicly accessible, all NSRR datasets require submitting a data access request.

Sleep readers

Reader        Dataset name                           Annotated records  Raw data size  Access
read_mesa()   Multi-Ethnic Study of Atherosclerosis  2056               385 GB         request
read_shhs()   Sleep Heart Health Study               8444               356 GB         request
read_slpdb()  MIT-BIH Polysomnographic Database      18                 632 MB         open

ECG readers

Reader        Dataset name                     Records  Signals  Raw data size
read_gudb()   Glasgow University ECG database  335      335      550 MB
read_ltdb()   MIT-BIH Long-Term ECG Database   7        15       205 MB
read_mitdb()  MIT-BIH Arrhythmia Database      48       96       98.5 MB

NSRR data access

To gain access to a dataset provided by the NSRR, complete the following steps:

  • Create an account on the NSRR website.
  • To create a data access request, either
    • go to the datasets overview and click on "Request Data Access" for the desired dataset on the right side, or
    • while browsing a dataset (e.g. MESA), click on "Request Data Access" at the top of the page, or
    • follow the "request" link in the Sleep readers table above.
  • Fill out the data access request form and wait for approval (you will be notified via email; this can take a few days).
  • Once the request is approved, you can
    • download files manually from the "Files" tab on the corresponding dataset page (e.g. MESA EDFs) or
    • use your NSRR token to download files via the NSRR API. Your token will always stay the same and is valid for all datasets you have been granted access to.

The following code snippet shows how to read all records in the MESA dataset with SleepECG:

from sleepecg import read_mesa, set_nsrr_token

set_nsrr_token("<your-download-token-here>")
mesa = read_mesa()  # note that this is a generator

You can also select a subset of records from a dataset. This example downloads and reads all records whose IDs start with 00 (i.e. records 0001 to 0099):

from sleepecg import read_mesa, set_nsrr_token

set_nsrr_token("<your-download-token-here>")
mesa = read_mesa(records_pattern="00*")  # note that this is a generator
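
To see which record IDs a pattern such as 00* would select, the matching can be sketched with Python's standard fnmatch module. This assumes that records_pattern follows shell-style glob rules. The four-digit IDs are made up for illustration, and select_records is a hypothetical helper, not part of SleepECG.

```python
from fnmatch import fnmatch


def select_records(record_ids, pattern):
    """Return the IDs that match a shell-style glob pattern (hypothetical helper)."""
    return [record_id for record_id in record_ids if fnmatch(record_id, pattern)]


# Illustrative four-digit IDs, not an actual listing of the MESA dataset.
example_ids = ["0001", "0042", "0099", "0100", "1234"]
selected = select_records(example_ids, "00*")
print(selected)
```

Only the IDs beginning with 00 are kept; 0100 and 1234 are filtered out.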

Note

Reader functions are generators, so they do not return the data directly. To access the data, you need to consume the generator, either by iterating over it or with subsequent calls of next().
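
The two ways of consuming a generator can be sketched as follows. To keep the example runnable without a download token, fake_reader is a hypothetical stand-in that yields plain dictionaries. It only mimics the generator behaviour and says nothing about what the real reader functions yield.

```python
def fake_reader(record_ids):
    """Hypothetical stand-in for a reader function; yields one record at a time."""
    for record_id in record_ids:
        # A real reader would download and parse the record at this point.
        yield {"id": record_id}


records = fake_reader(["0001", "0002", "0003"])

# Option 1: fetch a single record with next().
first = next(records)

# Option 2: iterate over the remaining records.
remaining_ids = [record["id"] for record in records]

print(first["id"], remaining_ids)
```

Because a generator does no work until it is consumed, the call that creates it returns immediately. Once exhausted, it cannot be iterated a second time, so call the reader again if you need another pass.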

If you only want to download NSRR data (as with the NSRR Ruby Gem), use the workflow below. The example downloads all files within mesa/polysomnography/edfs matching *-00* to a local folder ./datasets (subfolders are created automatically to preserve the original directory structure).

from sleepecg import download_nsrr, set_nsrr_token

set_nsrr_token("<your-download-token-here>")
download_nsrr(
    db_slug="mesa",
    subfolder="polysomnography/edfs",
    pattern="*-00*",
    data_dir="./datasets",
)
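
After the download finishes, a quick way to check what arrived is to list the files on disk. The sketch below uses only the standard library. list_downloaded is a hypothetical helper, and the path ./datasets/mesa/polysomnography/edfs is an assumption about where the preserved directory structure ends up, so adjust it to match what you find locally.

```python
from pathlib import Path


def list_downloaded(folder, pattern="*"):
    """Return the sorted names of files in `folder` that match a glob pattern."""
    folder = Path(folder)
    if not folder.is_dir():
        return []  # nothing has been downloaded to this location
    return sorted(path.name for path in folder.glob(pattern) if path.is_file())


# Assumed location; the exact layout below data_dir may differ.
edf_dir = Path("./datasets") / "mesa" / "polysomnography" / "edfs"
for name in list_downloaded(edf_dir, "*-00*"):
    print(name)
```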