Datasets
SleepECG provides reader functions for various datasets. All required files will be downloaded to the location specified by the data_dir argument (by default ~/.sleepecg/datasets). While all supported PhysioNet datasets are publicly accessible, all NSRR datasets require submitting a data access request.
Sleep readers
| Reader | Dataset name | Annotated records | Raw data size | Access |
|---|---|---|---|---|
| read_mesa() | Multi-Ethnic Study of Atherosclerosis | 2056 | 385 GB | request |
| read_shhs() | Sleep Heart Health Study | 8444 | 356 GB | request |
| read_slpdb() | MIT-BIH Polysomnographic Database | 18 | 632 MB | open |
ECG readers
| Reader | Dataset name | Records | Signals | Raw data size |
|---|---|---|---|---|
| read_gudb() | Glasgow University ECG database | 335 | 335 | 550 MB |
| read_ltdb() | MIT-BIH Long-Term ECG Database | 7 | 15 | 205 MB |
| read_mitdb() | MIT-BIH Arrhythmia Database | 48 | 96 | 98.5 MB |
NSRR data access
To gain access to a dataset provided by the NSRR, complete the following steps:
- Create an NSRR account.
- To create a data access request, either
  - go to the datasets overview and click on "Request Data Access" for the desired dataset on the right side, or
  - while browsing a dataset (e.g. MESA), click on "Request Data Access" at the top of the page, or
  - follow the "request" link in the Access column of the Sleep readers table above.
- Fill out the data access request form and wait for approval (you will be notified via email; this can take a few days).
- Once the request is approved, you can
  - download files manually from the "Files" tab on the corresponding dataset page (e.g. MESA EDFs) or
  - use your NSRR token to download files via the NSRR API. Your token will always stay the same and is valid for all datasets you have been granted access to.
The following code snippet shows how to read all records in the MESA dataset with SleepECG:
```python
from sleepecg import read_mesa, set_nsrr_token

set_nsrr_token("<your-download-token-here>")
mesa = read_mesa()  # note that this is a generator
```
Instead of always using set_nsrr_token(), you can set the NSRR token via set_config(nsrr_token="YOUR_TOKEN") or as an environment variable (NSRR_TOKEN).
SleepECG checks for the NSRR token in the following order:
1. Token set via set_nsrr_token()
2. Token set via environment variable NSRR_TOKEN
3. Token set in the user configuration
For example, if the token is set by both method 1 and method 3, method 1 takes precedence.
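The lookup order above can be sketched in plain Python. This is an illustration only, not SleepECG's actual implementation: resolve_token and its parameters are hypothetical names, and the user configuration is modeled as a plain dictionary.

```python
import os


def resolve_token(explicit_token=None, environ=None, user_config=None):
    """Return the first token found, following the documented lookup order.

    Hypothetical helper for illustration; not part of SleepECG.
    """
    environ = os.environ if environ is None else environ
    user_config = {} if user_config is None else user_config

    # 1. Token set explicitly (stands in for set_nsrr_token())
    if explicit_token is not None:
        return explicit_token
    # 2. Token from the NSRR_TOKEN environment variable
    if environ.get("NSRR_TOKEN"):
        return environ["NSRR_TOKEN"]
    # 3. Token from the user configuration (stands in for set_config(nsrr_token=...))
    return user_config.get("nsrr_token")


# Methods 1 and 3 both set: method 1 takes precedence
print(resolve_token("token-a", environ={}, user_config={"nsrr_token": "token-c"}))
# Only methods 2 and 3 set: the environment variable takes precedence
print(resolve_token(environ={"NSRR_TOKEN": "token-b"}, user_config={"nsrr_token": "token-c"}))
```

If none of the three sources provides a token, this sketch returns None; how SleepECG itself reacts in that case is not covered here.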
You can also select a subset of records from a dataset. This example will download and read all records having IDs starting with 00 (i.e. records 0001–0099):
```python
from sleepecg import read_mesa, set_nsrr_token

set_nsrr_token("<your-download-token-here>")
mesa = read_mesa(records_pattern="00*")  # note that this is a generator
```
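To check which IDs a pattern such as 00* would select, the matching can be tried offline with Python's fnmatch module. This sketch assumes that records_pattern follows glob-style (shell wildcard) matching and that record IDs are zero-padded four-digit strings. The ID list is made up for illustration and does not reflect which MESA records actually exist.

```python
from fnmatch import fnmatchcase

# Hypothetical zero-padded record IDs; real datasets may have gaps
record_ids = [f"{i:04d}" for i in (1, 42, 99, 100, 1234)]

# Keep only the IDs matching the glob pattern "00*"
selected = [rid for rid in record_ids if fnmatchcase(rid, "00*")]
print(selected)  # ['0001', '0042', '0099']
```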
Note
Reader functions are generators, so they do not return the data directly. To access the data, you need to consume the generator, either by iterating over it or with subsequent calls of next().
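Both ways of consuming a generator can be tried without downloading anything. In this sketch, fake_reader is a hypothetical stand-in for a reader function and yields plain strings instead of record objects.

```python
def fake_reader(n_records):
    """Hypothetical stand-in for a reader function such as read_mesa()."""
    for i in range(1, n_records + 1):
        # A real reader would load and parse one record here
        yield f"record-{i:04d}"


records = fake_reader(3)

# Fetch a single record with next() ...
first = next(records)

# ... and iterate over the remaining ones
remaining = list(records)

print(first)      # record-0001
print(remaining)  # ['record-0002', 'record-0003']
```

A generator can only be consumed once: after the loop above, records is exhausted, and the reader function has to be called again to start over.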
If you just want to download NSRR data (like with the NSRR Ruby Gem), use the workflow below. The example downloads all files within mesa/polysomnography/edfs matching *-00* to a local folder ./datasets (subfolders are automatically created to preserve the original directory structure).