Self-Reported Mental Health Diagnoses (SMHD) dataset
The SMHD (Self-Reported Mental Health Diagnoses) dataset consists of Reddit posts by users who claim to have been diagnosed with one or more of nine mental health conditions ("diagnosed users"), together with posts by matched control users. All posts made to mental health-related subreddits or containing keywords related to a mental health condition were removed from the diagnosed users' data; the control users' data contain no such posts because of how the control users were selected.
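The post-removal step described above can be pictured with a minimal sketch. This is not the authors' implementation: the subreddit set, the keyword pattern, and the `subreddit`/`text` field names are hypothetical placeholders chosen for illustration, and the actual lists used to build SMHD are not given here.

```python
import re

# Hypothetical placeholders for illustration only; the real subreddit and
# keyword lists used to build SMHD are not reproduced in this document.
MENTAL_HEALTH_SUBREDDITS = {"depression", "anxiety"}
MENTAL_HEALTH_KEYWORDS = re.compile(
    r"\b(diagnosed|depression|anxiety)\b", re.IGNORECASE
)


def keep_post(post):
    """Return True if a post is outside the flagged subreddits and
    contains none of the flagged keywords."""
    if post["subreddit"].lower() in MENTAL_HEALTH_SUBREDDITS:
        return False
    return MENTAL_HEALTH_KEYWORDS.search(post["text"]) is None


def filter_posts(posts):
    """Drop every post that fails keep_post, preserving the original order."""
    return [p for p in posts if keep_post(p)]


example_posts = [
    {"subreddit": "Depression", "text": "hello"},
    {"subreddit": "cooking", "text": "I was diagnosed last year"},
    {"subreddit": "cooking", "text": "Great pasta recipe"},
]
remaining = filter_posts(example_posts)
```

In this toy input, only the third post survives: the first is dropped by subreddit and the second by keyword.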
SMHD covers nine mental health conditions. The table below lists user and post counts for each condition and for the control group.
Condition | Total Users | Total Posts | Users (train) | Users (dev) | Users (test) | Users (RC) |
---|---|---|---|---|---|---|
ADHD | 10,098 | 872K | 1,768 | 1,747 | 1,779 | 4,804 |
Anxiety | 8,783 | 795K | 1,711 | 1,593 | 1,675 | 3,804 |
Autism | 2,911 | 248K | 479 | 480 | 517 | 1,435 |
Bipolar | 6,434 | 575K | 1,216 | 1,182 | 1,247 | 2,789 |
Depression | 14,139 | 1,272K | 2,662 | 2,574 | 2,611 | 6,292 |
Eating | 598 | 53K | 104 | 115 | 112 | 267 |
OCD | 2,336 | 203K | 409 | 477 | 390 | 1,060 |
PTSD | 2,894 | 258K | 528 | 516 | 558 | 1,292 |
Schizophrenia | 1,331 | 123K | 238 | 278 | 267 | 548 |
Control | 335,952 | 116M | 92,725 | 92,421 | 94,415 | 56,391 |
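As a quick sanity check on the table, the four per-split user counts in each row should add up to that row's total. The sketch below copies the user counts from the table and verifies this; the dictionary layout and function name are illustrative choices, not part of the dataset release.

```python
# User counts copied from the table above: (total, train, dev, test, RC).
USER_COUNTS = {
    "ADHD": (10098, 1768, 1747, 1779, 4804),
    "Anxiety": (8783, 1711, 1593, 1675, 3804),
    "Autism": (2911, 479, 480, 517, 1435),
    "Bipolar": (6434, 1216, 1182, 1247, 2789),
    "Depression": (14139, 2662, 2574, 2611, 6292),
    "Eating": (598, 104, 115, 112, 267),
    "OCD": (2336, 409, 477, 390, 1060),
    "PTSD": (2894, 528, 516, 558, 1292),
    "Schizophrenia": (1331, 238, 278, 267, 548),
    "Control": (335952, 92725, 92421, 94415, 56391),
}


def split_sum_matches_total(counts):
    """Return True if the train, dev, test, and RC counts sum to the total."""
    total, *splits = counts
    return sum(splits) == total


# Collect any rows whose splits do not add up to the reported total.
inconsistent = [
    name for name, counts in USER_COUNTS.items()
    if not split_sum_matches_total(counts)
]
print("Inconsistent rows:", inconsistent)
```

Because a user may report more than one condition, summing the per-condition totals counts user-condition pairs rather than unique diagnosed users.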
Further dataset construction details are available in Section 3 of the COLING 2018 paper SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions.
Information on obtaining this dataset can be found here.
Citation
@InProceedings{cohan2018smhd,
author = {Cohan, Arman and Desmet, Bart and Yates, Andrew and Soldaini, Luca and MacAvaney, Sean and Goharian, Nazli},
title = {SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions},
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics (COLING)},
year = {2018},
publisher = {Association for Computational Linguistics},
pages = {1485--1497},
url = {https://www.aclweb.org/anthology/C18-1126}
}
Contact Information
For any comments or questions, please email Arman, Bart or Andrew.