Schedule and Speakers
Welcome by Host
Nazli Goharian – Georgetown University
9:00 – 9:10
Keynote: Computational Linguistics and Clinical Psychology: Looking at the State of Play after Five Years
Philip Resnik – University of Maryland, College Park
9:10 – 10:00

The first CLPsych workshop was held in June 2014. Although it was hosted at the annual conference of the Association for Computational Linguistics, it was unusual for an ACL workshop, because its main focus wasn't advances in language science and technology per se: the primary goal of the workshop was to create a new conversation about how language technology might have an *impact* in actual clinical practice. Here we are five years later -- the fifth anniversary CLPsych workshop was held in June 2018. What kind of progress have we seen at the intersection of CL and Psych? What's the progress like, and have the core challenges changed or remained the same? What is the place of this research within the broader range of related research? And where do we go next?

Philip Resnik is a professor at the University of Maryland in the Department of Linguistics and Institute for Advanced Computer Studies. He co-founded the CLPsych workshops with Rebecca Resnik and Meg Mitchell. He does research in computational linguistics, with interests both in the application of natural language processing techniques to practical problems, and in the modeling of human linguistic processes. He has worked in a wide range of research areas, including core NLP areas like word sense disambiguation, applications such as cross-language information retrieval and machine translation, and work on the science of language including lexical semantics and computational psycholinguistics. Across the range of his work, he has been a proponent of finding the right balance between learning automatically through large scale data analysis, and incorporating linguistic and expert knowledge into our systems and models. Resnik's most recent research focus has been in computational social science, with an emphasis on connecting the signal available in people's language use with underlying mental state -- this has applications in computational political science, particularly in connection with ideology and framing, and in mental health, focusing on the ways that linguistic behavior may help to identify and monitor depression, suicidality, and schizophrenia. Outside his academic research, Resnik has been a technical co-founder of CodeRyte (NLP for electronic health records, acquired by 3M in 2012), an advisor to Converseon (social strategy and analytics) and FiscalNote (machine learning and analytics for government relations), and most recently is co-founder of Thematically, which provides human-in-the-loop AI for deep insight into unstructured text.
Natural Language Processing of Social Media as Screening for Suicide Risk
Glen Coppersmith – Qntfy
10:00 – 10:20

Suicide is a top-10 cause of death worldwide and claims the lives of more than 42,000 Americans each year (WHO, 2014; CDC, 2016). Part of the problem is that experts are no better than chance at predicting who will attempt to take their life (Franklin et al., 2016). We present work detailing the use of natural language processing on social media language data to screen for suicide risk, with apparent utility far exceeding that of clinicians. In many ways, the technical challenges here are the easiest ones; we will discuss the largest remaining challenges -- pragmatic and ethical integration of this technology into the system of care.

Glen Coppersmith, PhD is the Co-Founder and CEO of Qntfy (pronounced "quantify"), a software company dedicated to the intersection of computer science and human behavior. Glen’s work with Qntfy has been covered by several major media outlets including the Today Show, Mashable, The Mighty and Scientific American. He is a recognized leader in the space, with early and frequent peer-reviewed publications on advancements made at Qntfy. Prior to founding Qntfy, Glen was the first full-time research scientist at the Human Language Technology Center of Excellence at Johns Hopkins University. His research focused on the creation and application of statistical pattern recognition techniques on large and disparate data sets. His published work spans from the extraction and visualization of primary characteristics from large data sets, to statistical inference and anomaly detection. Glen earned a bachelor's in Computer Science and Cognitive Psychology in 2003, a master's in Psycholinguistics in 2005, and a doctorate in Neuroscience in 2008, all from Northeastern University. Glen and his wife Jessica live in Boston, MA with their two children. In his free time he enjoys hiking, running, and rock climbing, and he is an accomplished photographer.
Mental Health Diagnosis: The 'Not Good Enough' State of the Art
Rebecca Resnik – Rebecca Resnik and Associates
10:20 – 10:40

The fields of psychology and psychiatry have wrestled with creating an accurate, reliable diagnostic nosology with acceptable predictive value to inform treatment. Despite advances in neuroscience, many practices in mental health continue to be guided by descriptive, not explanatory, theories. Difficulties such as guild issues, 'silos' of expertise, low inter-rater reliability in diagnosis, and human biases impact not only the efficacy of clinical practice but also the progress of research. The field of computational linguistics, specifically natural language processing, has the potential to advance human ability to detect signal and enhance clinical diagnosis and treatment planning; however, clinicians and scientists must proceed in a state of mindful collaboration in their search for 'ground truth.'

Dr. Rebecca Resnik is a Licensed Psychologist and founder of Rebecca Resnik and Associates LLC, a group practice with offices in Bethesda and Rockville. Dr. Resnik earned her doctorate from The George Washington University and completed her Internship in Pediatric Psychology and Neuropsychology at Mount Washington Pediatric Hospital in Baltimore, Maryland. Dr. Resnik specializes in neuropsychological assessment for diagnosis of pediatric developmental, mood, and learning disorders. She was a co-founder of the first ACL workshop on Computational Linguistics and Clinical Psychology (2014), and continues to be a reviewer and discussant for the workshop.
Break (10:40 – 11:00)
Finding Suicide Data: Lessons Learned Building the First Open Enrollment Opt-In Data Repository
Tony Wood – Qntfy
11:00 – 11:20

All machine learning and data science projects have one thing in common: data. The relatively low occurrence rate of deaths from suicide in the US, combined with the complicated nature of the topic, creates a challenging situation for research dedicated to suicide. This talk covers the lessons learned from the partnership between a social media community, the American Association of Suicidology, and Qntfy, a startup software company. It encourages others who hope to make scientific progress amidst challenging circumstances.

Anthony D. Wood, COO and Co-Founder of Qntfy, is the Board Chair of the American Association of Suicidology. His work with the social media aspects of suicide prevention as a founder of the Social Media Team at the American Association of Suicidology's Annual Conference earned him the 2015 Roger J Tierney award for Innovation from AAS. His research on the intersection of social media and mental health is published in ACM CHI, JSM, and the proceedings of CLPsych and AAS. As a result of this work, he is a sought-after resource for mental health professionals, private companies and organizations interested in the intersection of new media, mobile data and behavioral health.
Things I Have Learned About Social Media and Mental Health: and Questions I Still Have
Mark Dredze – Johns Hopkins University
11:20 – 11:40

Recent years have demonstrated the tremendous value of social media data in understanding mental health. Various efforts have demonstrated the ability to diagnose mental health disorders from online media, track population-level trends, and support online mental health settings. An open question remains: how can social media data and related analytics be used to shape the delivery of mental health care? In this talk, I'll present some of the issues with using such data in care and discuss a path forward.

Mark Dredze is the John C Malone Associate Professor of Computer Science at Johns Hopkins University. He is affiliated with the Malone Center for Engineering in Healthcare and the Center for Language and Speech Processing, among others. He holds a secondary appointment in the Department of Health Sciences Informatics in the School of Medicine. He obtained his PhD from the University of Pennsylvania in 2009. Prof. Dredze’s research develops statistical models of language with applications to social media analysis, public health and clinical informatics. Within Natural Language Processing he focuses on statistical methods for information extraction but has considered a wide range of NLP tasks, including syntax, semantics, sentiment and spoken language processing. His work in public health includes tobacco control, vaccination, infectious disease surveillance, mental health, drug use, and gun violence prevention. He also develops new methods for clinical NLP on medical records. Beyond publications in core areas of computer science, Prof. Dredze has pioneered new applications in public health informatics. He has published widely in health journals including the Journal of the American Medical Association (JAMA), the American Journal of Preventive Medicine (AJPM), Vaccine, and the Journal of the American Medical Informatics Association (JAMIA). His work is regularly covered by major media outlets, including NPR, the New York Times and CNN.
Construction of High Quality, Large Scale, Labeled Mental Health Data from Social Media Without Human Annotation
Nazli Goharian – Georgetown University
11:40 – 12:00

A key focus of the Georgetown Information Retrieval Lab in recent years has been the searching and mining of health-related data from clinical notes, scientific literature, the Web, and social media. Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more such conditions. The focus of this talk is on the construction of the publicly available self-diagnosed mental health datasets (RSDD and SMHD) from Reddit posts by creating high-precision patterns, without the need for manual labeling.
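
As a rough illustration of what pattern-based labeling of self-reported diagnoses can look like, here is a minimal Python sketch. It is not the procedure used to build RSDD or SMHD: the condition list, the regular expression, and the function names `self_reported_conditions` and `label_users` are all hypothetical, chosen only to show how a narrow first-person pattern favors precision over recall.

```python
import re

# Hypothetical condition list; NOT the list used for RSDD or SMHD.
CONDITIONS = ["depression", "anxiety", "bipolar disorder", "ptsd"]

# Hypothetical high-precision pattern: only explicit first-person
# diagnosis statements match, so many true cases are missed on purpose.
DIAGNOSIS_PATTERN = re.compile(
    r"\bi (?:was|have been|got|am) (?:recently |just |officially )?"
    r"diagnosed with (" + "|".join(re.escape(c) for c in CONDITIONS) + r")\b",
    re.IGNORECASE,
)


def self_reported_conditions(post: str) -> set[str]:
    """Return the conditions for which a post explicitly claims a diagnosis."""
    return {m.group(1).lower() for m in DIAGNOSIS_PATTERN.finditer(post)}


def label_users(posts_by_user: dict[str, list[str]]) -> dict[str, set[str]]:
    """Label each user with the conditions found in any of their posts."""
    labels = {}
    for user, posts in posts_by_user.items():
        found = set()
        for post in posts:
            found |= self_reported_conditions(post)
        if found:  # users with no matching post stay unlabeled
            labels[user] = found
    return labels
```

In this sketch, a post such as "My sister was diagnosed with depression" does not match, because the pattern requires a first-person subject.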

Nazli Goharian is a Clinical Professor of Computer Science at Georgetown University, and Associate Director of the Information Retrieval Lab. Her interests lie in humane-computing applications; as such, she has been focusing on text processing in the medical/health domain. Her recent work on mental-health dataset creation was awarded a Best Long Research Paper at EMNLP 2017.
A Machine Learning Approach to Detect Suicide Ideation in Veterans Based on Acoustic and Semantic Analysis of Speech
Subha Madhavan – ICBI, Georgetown University
12:00 – 12:20

Classification and prediction of suicide ideation in high-risk groups is crucial in preventing suicide. In 2015, suicide rates among United States military veterans were 2.1 times higher than among non-veteran adults. The purpose of this work is to automate detection of suicide ideation in veterans from acoustic and semantic features of speech. A total of 188 audio interviews were recorded by veterans (N=94) in addition to self-report psychiatric scales and questionnaires. We then built a classifier that differentiates between suicidal and non-suicidal veterans based on acoustic features of speech and sentiment analysis of the transcribed narratives. A Support Vector Machine (SVM) classifier correctly identified veterans with suicidal ideation with an overall accuracy of 86.4% and area under the receiver operating characteristic curve (AUC) equal to 79.2%. Our findings indicate that speech analysis can be useful in suicide ideation detection. This machine learning approach could help clinicians identify high-risk veterans in a clinical setting.

Dr. Madhavan is the founding director of the Innovation Center for Biomedical Informatics (ICBI) at the Georgetown University Medical Center and is Associate Professor in the department of Oncology. She leads many programs in data science, clinical informatics and health IT with responsibility for several biomedical research efforts, including the software development of Georgetown Database of Cancer (G-DOC), a resource for both researchers and clinicians to realize the goals of personalized medicine; leadership of Lombardi Cancer Center’s Biostatistics and Bioinformatics shared resource; and the biomedical informatics component of the Georgetown-Howard Universities Clinical and Translational Science Award (CTSA). In her role as the CTSA biomedical informatics director, she has enabled access to over 4 million patient records from 10 MedStar Health hospitals, Howard University Hospital and the VA to clinical and translational researchers. She was the PI on the Breast and Colon Cancer Family Registries data center that coordinated public health and epidemiology data across 12 sites in the US, Australia, and Canada. She has partnered with the FDA through the Center for Excellence in Regulatory Science and Innovation (CERSI) program to develop evidence bases for pharmacogenomics and vaccine safety. She collaborated with Amazon to develop next generation cloud computing platforms for high dimensional datasets. She co-chairs the NHGRI’s ClinGen Somatic working group and is working with >40 cancer research organizations in the United States to develop the MVLD (Minimal Variant Level Data) for standardized cancer molecular test reporting. She has contributed to novel information sciences findings in research articles published in journals such as Nature, Bioinformatics, JAMIA, Journal of Comp. Biology and JCO. She has led a couple of data science challenges. She chaired the scientific planning committee of the signature AMIA Translational Summits in 2016.
Her paper "A Computational Approach for Prioritizing Selection of Therapies Targeting Drug Resistant Variation in Anaplastic Lymphoma Kinase" won the Marco Ramoni Distinguished Paper Award at the AMIA Translational Summits in 2018. To continue broadening the impact of the field, she founded and chairs an annual Big Data in Biomedicine symposium at Georgetown with vibrant panels and talks by informatics leaders in the US and from around the world.
Lunch (12:20 – 13:40) — on your own
A Computational Landscape of Suicide in PubMed Biomedical Literature
Rezarta Islamaj & Lana Yeganova – NIH
13:40 – 14:00

Mental illnesses are predicted to become the leading disease burden globally in the next decade, according to the World Health Organization. Suicide is already a major public health problem as a leading cause of death in the United States. The effects of suicide go beyond the person who acts to take his or her life: it can have a lasting effect on family, friends, and communities. Health care providers understand the risk factors and they can help prevent suicide using evidence-based treatments and therapies. Advances in technology create opportunities for close collaboration between data analysis of published literature and mental health researchers.

In this work, we used our text mining and natural language processing tools to analyze the PubMed literature that contained the term of interest “suicide” and documents that were annotated with a MeSH term relevant to “suicide”. We applied unsupervised clustering algorithms to analyze the landscape of the suicide-related literature, and we identified groups of documents discussing specific topics and groups of closely related terms associated with each topic. We further identified gene, protein, drug, and disease name mentions in this dataset and associated them with the computed clusters. In addition, we examined PubMed queries and present statistics on queries related to mental health. Based on the queries and documents clicked, we identified journals as well as PubMed articles that PubMed users have found to be most insightful.

Dr. Rezarta Islamaj has a PhD in Computer Science from University of Maryland at College Park and is a Staff Scientist in the Computational Biology Branch at the National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH). She is a member of the Text Mining Research program at NCBI/NLM where they are developing computational methods and software tools for analyzing and making sense of unstructured text data in biomedical literature and clinical notes towards accelerated discovery and better health. Before that she developed the SplicePort program on accurate prediction of splice sites in pre-mRNA sequences. Her recent publications are focused on the following topics: Computer-assisted biomedical data curation; Biomedical named entity recognition and information extraction; Interoperability of data and tools; and PubMed Search (e.g. author name disambiguation, understanding user needs for information retrieval). Dr. Rezarta Islamaj has organized and led several community challenges in the BioCreative Workshops promoting interoperability of data and tools, facilitating data sharing and text annotations for easier text mining in general, and coordinating curation efforts in developing lexical resources to facilitate better tool development for biomedical text mining.
Dr. Lana Yeganova is a Scientist at the Computational Biology Branch of the National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH). Dr. Yeganova holds a Doctorate in Mathematical Optimization from the George Washington University. Her work at NCBI has addressed a wide span of problems, ranging from information extraction and text mining, to clustering and thematic analysis, to knowledge discovery and integration, resulting in the design of novel efficient algorithms for working with large data sets. Her most recent research has focused on improving the PubMed user search experience, in particular on understanding the intent of user queries, and this work has been integrated into PubMed search.
Temporal Information in Physical and Mental Health Text
Sean MacAvaney – Georgetown University
14:00 – 14:20

In many NLP applications, temporal information is either not explicitly modeled or downright ignored (e.g., through the removal of stop words). In this talk, I argue that temporal information is vital for a complete understanding of text, especially in the health domain. We’ll look at clinical and radiology notes to show how temporal information can be modeled in these domains to construct a patient timeline from the unstructured text. Then we’ll explore how temporality affects statements of mental health diagnoses in online text.
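
To make the point about stop words concrete, here is a small Python sketch. The stop list is an illustrative assumption rather than any particular toolkit's list, and `remove_stop_words` is a hypothetical helper. It shows two sentences with opposite temporal meaning collapsing to the same tokens once temporal function words are filtered out.

```python
# Illustrative stop list (an assumption); real lists differ by toolkit,
# but many include temporal function words like these.
STOP_WORDS = {
    "the", "a", "was", "is", "and", "of",
    "before", "after", "since", "until", "during",
}


def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


before_tokens = remove_stop_words("Pain started before the surgery")
after_tokens = remove_stop_words("Pain started after the surgery")

# Both sentences reduce to the same token list, so the temporal
# relation between the pain and the surgery is no longer recoverable.
print(before_tokens)
print(after_tokens)
```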

Sean MacAvaney is a Computer Science PhD student in the Georgetown University IR Lab. He works on information retrieval, information extraction, and computational linguistics.
RDoC in Psychiatry and other n2c2 Shared task Data
Ozlem Uzuner – George Mason University
14:20 – 14:40

Research Domain Criteria (RDoC) is a framework for studying mental disorders. Its aim is to support a better understanding of the basic dimensions of human behavior, from normal to abnormal. In support of this goal, the RDoC framework is organized into five domains, each capturing a different and cross-diagnostic perspective on psychiatric illness and health. In the CEGS N-GRID challenge, we focused on just one domain: positive valence. Abnormalities of positive valence may be observed in disorders as diverse as substance abuse and dependence, mania, gambling, obsessive-compulsive disorder, and depression. The goal of the CEGS N-GRID challenge was to determine the lifetime maximum symptom severity of patients in the positive valence domain, based on their psychiatric intake interview reports. 1000 reports were de-identified and annotated by experts for positive valence symptom severity. This talk will review this data set and give an overview of other related datasets generated by the National NLP Clinical Challenges (n2c2, formerly known as i2b2) team since 2006.

Dr. Ozlem Uzuner is an associate professor at the Information Sciences and Technology Department of George Mason University. She also holds a visiting associate professor position at Harvard Medical School and is a research affiliate at the Computer Science and Artificial Intelligence Laboratory of MIT. Dr. Uzuner specializes in Natural Language Processing and its applications to real-world problems, including healthcare and policy. Her current research interests include information extraction from fragmented and ungrammatical narratives for capturing meaning, studies of consumer generated text such as social media and electronic petitions, and semantic representation development for phenotype prediction, fraud detection, and topic modeling. Her research has been funded by the National Institutes of Health, the National Library of Medicine, the National Institute of Mental Health, the Office of the National Coordinator, and by industry.
Intelligent Text Analysis to Support Radiology Interpretation and Education
Ross Filice – Georgetown University Medical Center; MedStar
14:40 – 15:00

Intelligent natural language processing techniques can be very helpful for radiology interpretation and education. Our primary focus has been on determining substantive changes between preliminary trainee radiology reports and faculty final reports, but applications for report summarization, radiology-pathology correlation, and pathology adequacy determination are all future directions where these techniques could be quite fruitful.

Dr. Ross Filice is an Associate Professor and the Chief of Imaging Informatics in the Department of Radiology at MedStar Georgetown University Hospital as well as the Chief of Imaging Informatics for MedStar Medical Group Radiology. He is the Director of the Georgetown Radiology Informatics Mini-Fellowship and also holds a position as Clinical Informatics Scientist in the National Center for Human Factors in Healthcare within the MedStar Institute for Innovation.
Break (15:00 – 15:10)
Opportunities for Computing in Pediatric Chronic Condition Care
David Hartley – Cincinnati Children's Hospital
15:10 – 15:30

The Agency for Healthcare Research and Quality notes that, "In a learning health system (LHS), internal data and experience are systematically integrated with external evidence and that knowledge is put into practice. As a result, patients get higher quality, safer, more efficient care, and health care delivery organizations become better places to work." One important prototype LHS in pediatric medicine, the learning collaborative, is often aimed at improving care and outcomes for children living with chronic diseases. In this talk we describe a conceptual model of learning collaboratives and discuss opportunities for machine methods in three general areas of research concerning learning collaboratives: (1) learning from diverse data mined from EHRs, patient registries, and social media; (2) measurement and evaluation of improvement of patient health and care center performance through time; and (3) campaigning to increase patient activation and engagement in care improvement. Examples from preliminary work in each area are shown to illustrate.

David Hartley is an associate professor of pediatrics at Cincinnati Children’s Hospital Medical Center and the University of Cincinnati College of Medicine, conducting research on learning health systems, infectious disease epidemiology and surveillance, and issues in global health. Previously, he was on the faculty of the Georgetown University Medical Center and the University of Maryland School of Medicine. He has BS, MS, and PhD degrees in physics and an MPH degree in epidemiology and biostatistics.
Applications and Integration of NLP into Healthcare Workflow: Three Use Cases
Allan Fong – MedStar
15:30 – 15:50

This talk will provide an overview of various efforts developing and integrating NLP models in three healthcare applications: patient safety event reports, emergency department chief complaints, and outpatient physician notes. Supervised and unsupervised modeling approaches have been applied to patient safety event reports to identify themes as well as to classify event types. The free text in chief complaints of emergency department patient visits was used to classify patients at higher risk for spinal cord compression. Lastly, classification models were built to identify patients with completed diabetic retinal eye exams from physician notes.

Allan Fong focuses on developing, integrating, and applying advanced technologies and techniques to study and improve healthcare systems. Allan has a background in engineering, computer science, and human factors, and is particularly interested in natural language processing, predictive analytics, information visualization, and sensor integration to understand clinical workflow and promote patient safety and health literacy. Allan received a master’s degree in aeronautical and astronautical engineering from Massachusetts Institute of Technology, a master’s degree in computer science from University of Maryland College Park, and a bachelor’s degree in mechanical engineering from Columbia University.
The Psychological Effects of Pervasive Connectivity: A Theoretical Sketch
Kostadin Kushlev – Georgetown University
15:50 – 16:10

Smartphones are powerful tools for obtaining information and connecting with others. But does being constantly connected to the internet have hidden costs for the fabric of social life? We explore the effects of smartphones both on fundamental social behaviors and on the benefits of face-to-face interactions.

Dr. Kushlev studies the causes and consequences of subjective well-being. His research on the psychology of technology focuses on how mobile computing (e.g., using smartphones) interacts with nondigital activities, such as spending time with family and friends, to influence basic social needs and well-being.
Closing (17:00)