- QuickUMLS – An unsupervised biomedical concept extraction tool.
- Reddit Self-reported Depression Diagnosis (RSDD) – Posts from thousands of Reddit users who claim to have been diagnosed with depression, and carefully-selected control users.
- RSDD-Time – Temporally-annotated subset of diagnosed RSDD users with information such as how long ago the diagnosis was made and whether the condition persists.
- Self-reported Mental Health Diagnoses (SMHD) – Reddit posts from thousands of users who identify as having one or more of 9 mental health conditions, and carefully-selected control users.
RSDD, RSDD-Time, and SMHD contain only publicly available Reddit posts. However, posts may contain information related to users' health and are thus sensitive. To protect users' privacy, researchers who wish to obtain any of the datasets must sign a data usage agreement. A single agreement may grant access to any (or all) of the datasets.
In brief, the agreement requires that researchers:
- make no attempt to contact any user in the dataset
- make no attempt to deanonymize or learn the identity of any user in the dataset
- make no attempt to link users in the dataset with any external information (e.g., an account on another website)
- do not share any portion of the data, including example posts or excerpts from posts, with any other party
Researchers interested in obtaining the datasets may submit a data request form; they will then receive the data usage agreement and further information on obtaining the data.