Datasets

SleepECG provides reader functions for various datasets. All required files will be downloaded to the location specified by the data_dir argument (by default ~/.sleepecg/datasets). While all supported PhysioNet datasets are publicly accessible, all NSRR datasets require submitting a data access request.

Sleep readers

Reader	Dataset name	Annotated records	Raw data size	Access
`read_mesa()`	Multi-Ethnic Study of Atherosclerosis	2056	385 GB	request
`read_shhs()`	Sleep Heart Health Study	8444	356 GB	request
`read_slpdb()`	MIT-BIH Polysomnographic Database	18	632 MB	open

ECG readers

Reader	Dataset name	Records	Signals	Raw data size
`read_gudb()`	Glasgow University ECG database	335	335	550 MB
`read_ltdb()`	MIT-BIH Long-Term ECG Database	7	15	205 MB
`read_mitdb()`	MIT-BIH Arrhythmia Database	48	96	98.5 MB

NSRR data access

To gain access to a dataset provided by the NSRR, complete the following steps:

1. Create an NSRR account.
2. To create a data access request, either
   - go to the datasets overview and click on "Request Data Access" for the desired dataset on the right side, or
   - while browsing a dataset (e.g. MESA), click on "Request Data Access" at the top of the page, or
   - follow the "request" link in the Sleep readers table above.
3. Fill out the data access request form and wait for approval. You will be notified via email; this can take a few days.
4. Once the request is approved, you can
   - download files manually from the "Files" tab on the corresponding dataset page (e.g. MESA EDFs), or
   - use your NSRR token to download files via the NSRR API. Your token always stays the same and is valid for all datasets you have been granted access to.

The following code snippet shows how to read all records in the MESA dataset with SleepECG:

from sleepecg import read_mesa, set_nsrr_token

set_nsrr_token("<your-download-token-here>")
mesa = read_mesa()  # note that this is a generator

Instead of calling set_nsrr_token() every time, you can set the NSRR token via set_config(nsrr_token="YOUR_TOKEN") or as an environment variable (NSRR_TOKEN).

SleepECG checks for the NSRR token in the following order:

1. Token set via set_nsrr_token()
2. Token set via environment variable NSRR_TOKEN
3. Token set in the user configuration

For example, if the token is set by both method 1 and method 3, method 1 takes precedence.
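
The lookup order can be pictured with a small stand-alone sketch. This is not SleepECG's implementation: resolve_token() and its parameters are hypothetical names, and the user configuration is modelled as a plain dictionary purely for illustration.

```python
import os


def resolve_token(runtime_token=None, config=None, environ=None):
    """Return the first token found, following the documented lookup order.

    Hypothetical helper for illustration only; not part of SleepECG.
    """
    environ = os.environ if environ is None else environ
    config = {} if config is None else config

    # 1. Token set at runtime (corresponds to set_nsrr_token())
    if runtime_token is not None:
        return runtime_token
    # 2. Token set via the NSRR_TOKEN environment variable
    if environ.get("NSRR_TOKEN"):
        return environ["NSRR_TOKEN"]
    # 3. Token stored in the user configuration (None if absent)
    return config.get("nsrr_token")


# The runtime token wins even though the configuration also holds one.
print(resolve_token("runtime", config={"nsrr_token": "from-config"}, environ={}))
```

Passing environ explicitly keeps the sketch independent of whatever NSRR_TOKEN happens to be set in the current shell.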

You can also select a subset of records from a dataset. This example will download and read all records having IDs starting with 00 (i.e. records 0001–0099):

from sleepecg import read_mesa, set_nsrr_token

set_nsrr_token("<your-download-token-here>")
mesa = read_mesa(records_pattern="00*")  # note that this is a generator
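
To get a feel for which records a pattern such as "00*" selects, the following sketch applies glob-style matching to a handful of made-up record IDs. It assumes that records_pattern follows wildcard rules comparable to Python's fnmatch module, which is an assumption rather than a statement about SleepECG's internals.

```python
from fnmatch import fnmatchcase

# Made-up record IDs for illustration; real datasets contain many more.
record_ids = ["0001", "0042", "0099", "0100", "1234"]

# "00*" matches every ID that starts with "00", followed by anything.
selected = [rid for rid in record_ids if fnmatchcase(rid, "00*")]
print(selected)  # ['0001', '0042', '0099']
```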

Note

Reader functions are generators, so they do not return the data directly. To access the data, you need to consume the generator, either by iterating over it or with subsequent calls of next().
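
The two ways of consuming a generator look roughly like this. To keep the sketch runnable without downloading anything, fake_reader() is a hypothetical stand-in for a reader function such as read_mesa(), and the dictionaries it yields are placeholders, not SleepECG's actual record type.

```python
from itertools import islice


def fake_reader(n_records):
    """Hypothetical stand-in for a reader; yields one record at a time."""
    for i in range(1, n_records + 1):
        yield {"id": f"{i:04d}"}


records = fake_reader(5)

# next() fetches exactly one record and advances the generator.
first = next(records)

# islice() takes a limited number of records without consuming the rest.
next_two = list(islice(records, 2))

# Iterating consumes whatever is left; afterwards the generator is exhausted.
remaining = [rec["id"] for rec in records]

print(first["id"], [rec["id"] for rec in next_two], remaining)
```

Because records are produced one at a time, processing can begin before the whole dataset has been read.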

If you only want to download NSRR data (as with the NSRR Ruby Gem), use the workflow below. The example downloads all files within mesa/polysomnography/edfs matching *-00* to the local folder ./datasets. Subfolders are created automatically to preserve the original directory structure.

from sleepecg import download_nsrr, set_nsrr_token

set_nsrr_token("<your-download-token-here>")
download_nsrr(
    db_slug="mesa",
    subfolder="polysomnography/edfs",
    pattern="*-00*",
    data_dir="./datasets",
)