The first CLPsych workshop was held in June 2014. Although it was hosted at the annual conference of the Association for Computational Linguistics, it was unusual for an ACL workshop, because
its main focus wasn't advances in language science and technology per se: the primary goal of the workshop was to create a new conversation about how language technology might have an *impact*
in actual clinical practice. Here we are five years later -- the fifth anniversary CLPsych workshop was held in June 2018. What kind of progress have we seen at the intersection of CL and
Psych? What's the progress like, and have the core challenges changed or remained the same? What is the place of this research within the broader range of related research? And where do we
Suicide is a top 10 cause of death worldwide and claims the lives of more than 42,000 Americans each year (WHO, 2014; CDC, 2016). Part of the problem is that experts are no better than chance at
predicting who will attempt to take their life (Franklin et al., 2016). We present work detailing the use of natural language processing on social media language data to screen for suicide
risk, with apparent utility far exceeding that of clinicians. In many ways, the technical challenges here are the easiest ones; we will discuss the largest remaining challenges -- pragmatic and
ethical integration of this technology into the system of care.
Glen Coppersmith, PhD is the Co-Founder and CEO of Qntfy (pronounced "quantify"),
a software company dedicated to the intersection of computer science and human behavior.
Glen’s work with Qntfy has been covered in several major publications including the Today
Show, Mashable, The Mighty and Scientific American. He is a recognized leader in the space,
with early and frequent peer-reviewed publications on advancements made at Qntfy.
Prior to founding Qntfy, Glen was the first full-time research scientist at the Human
Language Technology Center of Excellence at Johns Hopkins University. His research focused
on the creation and application of statistical pattern recognition techniques on large and
disparate data sets. His published work spans from the extraction and visualization of primary
characteristics from large data sets, to statistical inference and anomaly detection.
Glen earned a bachelor's in Computer Science and Cognitive Psychology in 2003, a
Masters in Psycholinguistics in 2005, and a Doctorate in Neuroscience in 2008, all from
Northeastern University. Glen and his wife Jessica live in Boston, MA with their two children. In
his free time he enjoys hiking, running, and rock climbing, and is an accomplished photographer.
The fields of psychology and psychiatry have wrestled with creating accurate, reliable diagnostic nosology with acceptable predictive value to inform treatment.
Despite advances in neuroscience, many practices in mental health continue to be guided by descriptive, not explanatory, theories. Difficulties such as guild issues,
'silos' of expertise, low inter-rater reliability in diagnosis, and human biases impact not only the efficacy of clinical practice but the progress of research. The field
of computational linguistics, specifically natural language processing, has potential to advance human ability to detect signal and enhance clinical diagnosis and treatment
planning; however, clinicians and scientists must proceed in a state of mindful collaboration in their search for 'ground truth.'
Dr. Rebecca Resnik is a Licensed Psychologist and founder of Rebecca Resnik and Associates LLC, a group practice with offices in Bethesda and Rockville. Dr.
Resnik earned her doctorate from The George Washington University and completed her Internship in Pediatric Psychology and Neuropsychology at Mount Washington Pediatric Hospital in Baltimore,
Maryland. Dr. Resnik specializes in neuropsychological assessment for diagnosis of pediatric developmental, mood, and learning disorders. She was co-founder of the first ACL
workshop on Computational Linguistics and Clinical Psychology (2014), and continues to be a reviewer and discussant for the workshop.
Break (10:40 – 11:00)
All machine learning and data science projects have one thing in common: data. The relatively low occurrence rate of deaths from suicide in the US,
combined with the complicated nature of the topic, creates a challenging situation for research dedicated to suicide. This talk covers the lessons
learned from the partnership between a social media community, the American Association of Suicidology, and Qntfy, a startup software company. It
encourages others who hope to make scientific progress amidst challenging circumstances.
Anthony D. Wood, COO and Co-Founder of Qntfy, is the Board Chair of the American Association of Suicidology. His work with the social
media aspects of Suicide Prevention as a founder of the Social Media Team at the American Association of Suicidology's Annual Conference earned him the 2015 Roger J.
Tierney Award for Innovation from AAS. His research on the intersection of social media and mental health is published in ACM CHI, JSM, and the
proceedings of CLPsych and AAS. As a result of this work, he is a sought-after resource for mental health professionals, private companies and
organizations interested in the intersection of new media, mobile data and behavioral health.
Recent years have demonstrated the tremendous value of social media
data in understanding mental health. Various efforts have demonstrated
the ability to diagnose mental health disorders from online media,
track population level trends, and support online mental health
settings. An open question remains how social media data and related
analytics can be used to shape the delivery of mental health care. In
this talk, I'll present some of the issues with using such data in
care and discuss a path forward.
Mark Dredze is the John C. Malone Associate Professor of Computer Science at Johns Hopkins University. He is affiliated with the Malone Center for Engineering in Healthcare and
the Center for Language and Speech Processing, among others. He holds a secondary appointment in the Department of Health Sciences Informatics in the School of Medicine. He obtained his PhD
from the University of Pennsylvania in 2009.
Prof. Dredze’s research develops statistical models of language with applications to social media analysis, public health and clinical informatics. Within Natural Language Processing he
focuses on statistical methods for information extraction but has considered a wide range of NLP tasks, including syntax, semantics, sentiment and spoken language processing. His work in public
health includes tobacco control, vaccination, infectious disease surveillance, mental health, drug use, and gun violence prevention. He also develops new methods for clinical NLP on medical
Beyond publications in core areas of computer science, Prof. Dredze has pioneered new applications in public health informatics. He has published widely in health journals including the
Journal of the American Medical Association (JAMA), the American Journal of Preventive Medicine (AJPM), Vaccine, and the Journal of the American Medical Informatics Association (JAMIA). His
work is regularly covered by major media outlets, including NPR, the New York Times and CNN.
A key focus of the Georgetown Information Retrieval Lab in recent years has been the searching and mining of health-related data, from clinical
notes, scientific literature, the Web, and social media. Mental health is a significant and growing public health concern. As language usage can be
leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users
who have been diagnosed with one or more of such conditions. The focus of this talk is on the construction of the publicly available self-diagnosed
mental health datasets (RSDD and SMHD) from Reddit posts, by creating high precision patterns, and without the need for manual labeling.
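As a rough illustration of what a high-precision self-diagnosis pattern can look like, the sketch below matches only explicit first-person diagnosis statements. The condition list, the regular expression, and the function name are hypothetical and chosen for illustration; they are not the patterns used to build RSDD or SMHD.

```python
import re

# Hypothetical condition list and pattern, for illustration only.
# These are NOT the patterns used to construct RSDD or SMHD.
CONDITIONS = r"(?P<condition>depression|anxiety|bipolar disorder|ptsd)"
SELF_DIAGNOSIS = re.compile(
    r"\bI\s+(?:was|am|have\s+been|got)\s+(?:recently\s+|just\s+)?"
    r"diagnosed\s+with\s+" + CONDITIONS + r"\b",
    re.IGNORECASE,
)


def find_self_reported_condition(post: str):
    """Return the matched condition in lowercase, or None if nothing matches."""
    match = SELF_DIAGNOSIS.search(post)
    return match.group("condition").lower() if match else None


posts = [
    "I was diagnosed with depression last year.",
    "My brother was diagnosed with depression.",  # third person: should not match
    "I have been recently diagnosed with PTSD and it explains a lot.",
]
labels = [find_self_reported_condition(p) for p in posts]
```

Requiring a first-person subject trades recall for precision: the second post mentions a diagnosis but is skipped because it is not about the author.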
Nazli Goharian is a Clinical Professor of Computer Science at Georgetown University, and Associate Director of the Information Retrieval Lab. Her interests lie in humane-computing applications; as such, she has been focusing on text processing in the medical/health domain. Her recent work on mental-health dataset creation was awarded a Best Long Research Paper at EMNLP 2017.
Classification and prediction of suicide ideation in high-risk groups is crucial in preventing suicide. In 2015, suicide rates among United States military veterans
were 2.1 times higher than among non-veteran adults. The purpose of this work is to automate detection of suicide ideation in veterans from acoustic and semantic features of speech.
A total of 188 audio interviews were recorded by veterans (N=94) in addition to self-report psychiatric scales and questionnaires. We then built a classifier that differentiates
between suicidal and non-suicidal veterans based on acoustic features of speech and sentiment analysis of the transcribed narratives. A support vector machine (SVM) classifier correctly
identified veterans with suicidal ideation with an overall accuracy of 86.4% and area under the receiver operating characteristic curve (AUC) equal to 79.2%. Our findings indicate
that speech analysis can be useful in suicide ideation detection. This machine learning approach could help clinicians identify high-risk veterans in a clinical setting.
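Accuracy and AUC measure different things: accuracy depends on a decision threshold, while AUC summarizes how well the scores rank positive cases above negative ones. The sketch below computes both on made-up toy values; the labels, scores, and the 0.5 threshold are assumptions for illustration and are unrelated to the study's data or its SVM.

```python
def accuracy(labels, predictions):
    """Fraction of examples whose predicted label equals the true label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy values (assumed, not from the study): 1 = ideation, 0 = no ideation.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_accuracy = accuracy(toy_labels, toy_predictions)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Moving the threshold changes the accuracy but leaves the AUC untouched, which is why the two numbers are usually reported together.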
Dr. Madhavan is the founding director of the Innovation Center for Biomedical Informatics (ICBI) at the Georgetown
University Medical Center and is Associate Professor in the department of Oncology. She leads many programs in data science, clinical informatics and health IT with responsibility for
several biomedical research efforts, including the software development of the Georgetown Database of Cancer (G-DOC), a resource for both researchers and clinicians to realize the goals of
personalized medicine, leadership of Lombardi Cancer Center’s Biostatistics and Bioinformatics shared resource and the biomedical informatics component of the Georgetown-Howard
Universities Clinical and Translational Science Award (CTSA). In her role as the CTSA biomedical informatics director, she has enabled access to over 4 million patient records from 10
MedStar Health hospitals, Howard University Hospital and the VA to clinical and translational researchers. She was the PI on the Breast and Colon Cancer Family Registries data center that
coordinated public health and epidemiology data across 12 sites in the US, Australia, and Canada. She has partnered with the FDA through the Center for Excellence in Regulatory Science
and Innovation (CERSI) program to develop evidence bases for pharmacogenomics and vaccine safety. She collaborated with Amazon to develop next generation cloud computing platforms for
high dimensional datasets. She co-chairs the NHGRI’s ClinGen Somatic working group and is working with >40 cancer research organizations in the United States to develop the MVLD (Minimal
Variant Level Data) for standardized cancer molecular test reporting. She has contributed to novel information sciences findings in research articles published in journals such as Nature,
Bioinformatics, JAMIA, Journal of Comp. Biology and JCO. She has led a couple of data science challenges. She chaired the scientific planning committee of the signature AMIA
Translational Summits in 2016. Her paper, "A Computational Approach for Prioritizing Selection of Therapies Targeting Drug Resistant Variation in Anaplastic Lymphoma Kinase," won the Marco
Ramoni Distinguished Paper Award at the AMIA Translational Summits in 2018. To continue broadening the impact of the field, she founded and chairs an annual Big Data in Biomedicine
symposium at Georgetown with vibrant panels and talks by informatics leaders in the US and from around the world.
Lunch (12:20 – 13:40) — on your own
Mental illnesses are predicted to become the leading disease burden globally in the next decade,
according to the World Health Organization. Suicide is already a major public health problem as a
leading cause of death in the United States. The effects of suicide go beyond the person who acts to
take his or her life: it can have a lasting effect on family, friends, and communities. Health care providers
understand the risk factors and they can help prevent suicide using evidence-based treatments and
therapies. Advances in technology create opportunities for close collaboration between data analysis of
published literature and mental health researchers.
In this work, we used our text mining and natural language processing tools to analyze the PubMed
literature that contained the term of interest “suicide” and documents that were annotated with a MeSH
term relevant to “suicide”. We applied unsupervised clustering algorithms to analyze the
landscape of the suicide-related literature and we identified groups of documents discussing specific
topics and groups of closely related terms associated with each topic. We further identified gene,
protein, drug, and disease name mentions in this dataset and associated them with computed clusters.
In addition, we examined PubMed queries and present statistics on queries related to mental health.
Based on the queries and documents clicked, we identified journals as well as PubMed articles that
PubMed users have found to be most insightful.
Dr. Rezarta Islamaj has a PhD in Computer Science from University of Maryland at College Park and is a Staff
Scientist in the Computational Biology Branch at the National Center for Biotechnology Information (NCBI), National Library of Medicine
(NLM), National Institutes of Health (NIH). She is a member of the Text Mining Research program at NCBI/NLM where they are developing
computational methods and software tools for analyzing and making sense of unstructured text data in biomedical literature and clinical notes
towards accelerated discovery and better health. Before that she developed the SplicePort program on accurate prediction of splice sites in
pre-mRNA sequences. Her recent publications are focused on the following topics: Computer-assisted biomedical data curation; Biomedical named
entity recognition and information extraction; Interoperability of data and tools; and PubMed Search (e.g. author name disambiguation,
understanding user needs for information retrieval). Dr. Rezarta Islamaj has organized and led several community challenges in the
BioCreative Workshops promoting interoperability of data and tools, facilitating data sharing and text annotations for easier text mining in
general, and coordinating curation efforts in developing lexical resources to facilitate better tool development for biomedical text mining.
Dr. Lana Yeganova is a Scientist at the Computational Biology Branch of the National Center for Biotechnology
Information (NCBI) at the National Institutes of Health (NIH). Dr. Yeganova holds a Doctorate in Mathematical Optimization from the George Washington University.
Her work at NCBI has addressed a wide span of problems, ranging from information extraction and text mining, to clustering and thematic analysis, to knowledge
discovery and integration, resulting in the design of novel efficient algorithms for working with large data sets. Her most recent research has focused on
improving the PubMed user search experience, in particular understanding the intent of user queries, and this work has been integrated into PubMed search.
In many NLP applications, temporal information is either not explicitly modeled or downright ignored (e.g., through the removal of stop words). In
this talk, I argue that temporal information is vital for a complete understanding of text, especially in the health domain. We’ll look at clinical
and radiology notes to show how temporal information can be modeled in these domains to construct a patient timeline from the unstructured text. Then
we’ll explore how temporality affects statements of mental health diagnoses in online text.
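To make the stop-word point concrete, the sketch below uses a small, hypothetical stop-word list (temporal words such as "before" and "after" do appear in common English stop-word lists) and shows two statements with opposite temporal meaning collapsing to the same tokens. The list and the example sentences are assumptions for illustration.

```python
# Hypothetical miniature stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "was", "before", "after", "during", "until"}


def remove_stop_words(text: str):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


before = remove_stop_words("Chest pain before the surgery")
after = remove_stop_words("Chest pain after the surgery")

# Both statements reduce to the same tokens, so their temporal order is lost.
same_tokens = before == after
```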
Sean MacAvaney is a Computer Science PhD student in the Georgetown University IR Lab. He works on information retrieval, information extraction, and computational linguistics.
Research Domain Criteria (RDoC) is a framework for studying mental disorders. Its aim is to support a better understanding of the basic dimensions of human behavior, from normal to abnormal.
In support of this goal, the RDoC framework is organized into five domains, each capturing a different and cross-diagnostic
perspective on psychiatric illness and health. In the CEGS N-GRID challenge, we focused on just one domain: positive valence. Abnormalities of positive valence may be observed in disorders as diverse as substance abuse
and dependence, mania, gambling, obsessive-compulsive disorder, and depression. The goal of the CEGS N-GRID challenge was to determine the lifetime maximum symptom severity of patients in the positive valence domain, based on
their psychiatric intake interview reports. A total of 1000 reports were de-identified and annotated by experts for positive valence symptom severity. This talk will review this data set and give an overview of other related
datasets generated by the National NLP Clinical Challenges (n2c2, formerly known as i2b2) team since 2006.
Dr. Ozlem Uzuner is an associate professor at the Information Sciences and Technology Department of
George Mason University. She also holds a visiting associate professor position at Harvard Medical School and is a research affiliate at the
Computer Science and Artificial Intelligence Laboratory of MIT. Dr. Uzuner specializes in Natural Language Processing and its
applications to real-world problems, including healthcare and policy. Her current research interests include information extraction from
fragmented and ungrammatical narratives for capturing meaning, studies of consumer generated text such as social media and electronic
petitions, and semantic representation development for phenotype prediction, fraud detection, and topic modeling. Her research has been
funded by the National Institutes of Health, the National Library of Medicine, the National Institute of Mental Health, the Office of the National
Coordinator, and by industry.
Intelligent natural language processing techniques can be very helpful for radiology interpretation and education. Our primary focus has been on
determining substantive changes between preliminary trainee radiology reports and faculty final reports, but applications for report
summarization, radiology-pathology correlation, and pathology adequacy determination are all future directions where these techniques could be
Dr. Ross Filice is an Associate Professor and the Chief of Imaging Informatics in the Department of Radiology at MedStar Georgetown
Hospital as well as the Chief of Imaging Informatics for MedStar Medical Group Radiology. He is the Director of the Georgetown Radiology Informatics Mini-Fellowship and also holds a
position as Clinical Informatics Scientist in the National Center for Human Factors in Healthcare within the MedStar Institute for Innovation.
Break (15:00 – 15:10)
The Agency for Healthcare Research and Quality notes that, "In a learning health system (LHS), internal data and experience are systematically integrated with external
evidence and that knowledge is put into practice. As a result, patients get higher quality, safer, more efficient care, and health care delivery organizations become better places to
work." One important prototype LHS in pediatric medicine, the learning collaborative, is often aimed at improving care and outcomes for children living with chronic diseases. In this talk
we describe a conceptual model of learning collaboratives and discuss opportunities for machine methods in three general areas of research concerning learning collaboratives: (1) learning
from diverse data mined from EHRs, patient registries, and social media; (2) measurement and evaluation of improvement of patient health and care center performance through time; and (3)
campaigning to increase patient activation and engagement in care improvement. Examples from preliminary work in each area are shown to illustrate.
David Hartley is an associate professor of pediatrics at Cincinnati Children’s Hospital Medical Center and the University of Cincinnati College of
Medicine, conducting research on learning health systems, infectious disease epidemiology and surveillance, and issues in global health. Previously, he was on the faculty of the
Georgetown University Medical Center and the University of Maryland School of Medicine. He has BS, MS, and PhD degrees in physics and an MPH degree in epidemiology and biostatistics.
This talk will provide an overview of various efforts developing and integrating NLP models in three healthcare applications: patient safety event reports, emergency
department chief complaints, and outpatient physician notes. Supervised and unsupervised modeling approaches have been applied to patient safety event reports to identify themes as well as
classify event types. The free text in chief complaints of emergency department patient visits was used to classify patients at higher risk for spinal cord compression. Lastly,
classification models were built to identify patients with completed diabetic retinal eye exams from physician notes.
Allan Fong focuses on developing, integrating, and applying advanced technologies and techniques to study and improve healthcare
systems. Allan has a background in engineering, computer science, and human factors, and is particularly interested in natural language processing, predictive analytics, information
visualization, and sensor integration to understand clinical workflow and promote patient safety and health literacy. Allan received a master’s degree in aeronautical and astronautical
engineering from Massachusetts Institute of Technology, a master’s degree in computer science from University of Maryland College Park, and a bachelor’s degree in mechanical engineering
from Columbia University.
Smartphones are powerful tools for obtaining information and connecting with others. But does being constantly connected to the internet have hidden costs for the fabric of social life? We explore
the effects of smartphones both on fundamental social behaviors and on the benefits of face-to-face interactions.
Dr. Kushlev studies the causes and consequences of subjective well-being. His research on the psychology of technology focuses on how mobile computing (e.g., using smartphones)
interacts with nondigital activities, such as spending time with family and friends, to influence basic social needs and well-being.