Projects

Domain Specific Search & Mining: Mining Social Media for Healthcare
Social Media and Mental-Health
Clinical Decision Support Systems
Bridging the Gap Between Laypeople Health Queries and Web Pages
Scientific Summarization
Search in Adverse Environments
Sentiment Analysis
Detecting Relationships among Categories
Contextual Search
Passage Detection
Query Session Analysis
Personalized Ranking of Twitter Friends: Who to follow or not to follow

Domain Specific Search & Mining: Mining Social Media for Healthcare

Online discussion of virtually every topic is increasing, and particularly so in the domain of healthcare. Individuals now post steadily about their own health and that of their loved ones on a wide range of social media. Given these publicly available statements, there is both interest and potential in harnessing these sources to further our understanding of drug behavior. We use several drug-related sites, other social media sites, and general Web sites to detect expected and unexpected adverse drug reactions (ADRs). To understand users' intentions, we use consumer medical terminology from UMLS to generate an adverse reaction synonym set, which we use to identify both expected adverse reactions, as already recorded by the FDA, and unexpected adverse reactions mentioned in online reviews. Background language is used to evaluate the strength of the detected unexpected ADRs.
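As a rough illustration of the synonym-set idea, the sketch below maps consumer phrases found in reviews to canonical reaction names and separates reactions already on an expected list from those that are not. The phrase table, the function names, and the substring matching are all assumptions made for this example; the actual synonym set is derived from UMLS and is not reproduced here.

```python
from collections import Counter

# Hypothetical synonym set: consumer phrase -> canonical reaction name.
# These entries are invented for illustration, not taken from UMLS.
SYNONYMS = {
    "threw up": "vomiting",
    "throwing up": "vomiting",
    "nauseous": "nausea",
    "sick to my stomach": "nausea",
    "can't sleep": "insomnia",
    "hair falling out": "alopecia",
}


def detect_reactions(review):
    """Return the canonical reactions whose consumer phrases occur in a review."""
    text = review.lower()
    return {name for phrase, name in SYNONYMS.items() if phrase in text}


def split_expected_unexpected(reviews, expected):
    """Count reviews mentioning each reaction, split by the expected list."""
    counts = Counter()
    for review in reviews:
        counts.update(detect_reactions(review))
    found_expected = {r: c for r, c in counts.items() if r in expected}
    found_unexpected = {r: c for r, c in counts.items() if r not in expected}
    return found_expected, found_unexpected
```

Simple substring matching is used only to keep the sketch short; it ignores negation, spelling variants, and context, all of which matter on real review text.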

Social Media and Mental-Health

Social media has become a significant resource for improving healthcare and mental health. Users suffering from mental health conditions often turn to online resources for support, such as specialized support communities staffed by moderators who read the users’ posts and flag those that indicate a potential risk (e.g., the risk of self-harm). Users who do not participate in online support communities often still participate in more general social media communities, such as Twitter, Facebook, and Reddit. In this project, we explore methods and approaches for better understanding and identifying users with mental health conditions and for analyzing the severity of user content. We propose an approach for triaging user content into four severity categories, which are defined based on indications of self-harm ideation. We conduct various analyses on real-world data, providing more insight into addressing the current challenges in mental health.

Clinical Decision Support Systems

Keeping current given the vast volume of medical literature published yearly poses a serious challenge for medical professionals. Thus, interest in systems that aid physicians in making clinical decisions is intensifying. We explore and evaluate approaches to retrieve relevant medical literature given a medical case report. Furthermore, given the action a health expert is seeking to complete (make a diagnosis, prescribe a treatment, or order a test), we investigate reranking techniques that could provide more appropriate literature.

Bridging the Gap Between Laypeople Health Queries and Web Pages

The Internet has become a primary source of health information for the majority of adults living in the United States, as laypeople have come to rely on it as a tool for seeking information about specific diseases or medical problems. This process is often challenging because of the gap between the language consumers use to describe their conditions and proper medical vocabulary. High-quality health care resources, even those addressed to consumers, employ appropriate medical terminology, yet laypeople often lack the knowledge needed to express their information needs in such vocabulary and so struggle to satisfy them. Our research in this area focuses on understanding and mitigating this language gap. We studied how to effectively map expressions used by laypeople in health queries to the medical terminology used by health experts. We also studied how learning-to-rank techniques can exploit semantic similarity between queries and documents to improve consumer health search.

Scientific Summarization

Due to the growing rate at which articles are published in various scientific fields, it has become difficult for researchers to keep up with new developments. Scientific summarization aims to alleviate this problem. One useful strategy is citation-based summarization, in which citations to a reference article are used to generate a summary of that article. While citations have previously been used to generate scientific summaries, they lack the related context from the referenced article and therefore do not accurately reflect the article’s content. Our goal is to overcome this problem by providing the appropriate context for the citations and using this information to produce an extractive summary of the article. We have also shown that using a scientific article’s inherent discourse structure can help improve the quality of the generated summaries. We are currently investigating approaches for developing more robust general and scientific summarization routines.

Search in Adverse Environments

Within information retrieval, there are situations in which it is difficult to retrieve relevant information. Such cases include searching datasets that lack query logs, searching a multilingual dataset, or querying a corrupted document set. We refer to these situations as search in adverse environments. Searching in adverse environments is problematic for traditional search techniques because of a lack of data, data corruption, or both. We identified two adverse environments: 1) word-level corruption, and 2) document-level corruption. Using an unsupervised, language-independent approach, we achieved statistically significant improvements in the first environment over typically deployed systems and nearly matched state-of-the-art supervised approaches. Additionally, we have early experimental results suggesting similar findings for the second environment. We aim to make contributions in each environment to aid the successful retrieval of relevant information.

Sentiment Analysis

Detecting Relationships among Categories

Knowledge of relationships among categories is of interest in domains such as text classification, content analysis, and text mining. We propose and evaluate approaches to effectively identify relationships among document categories. Our novel method capitalizes on the misclassification results of a text classifier to identify potential relationships among categories, which together form a relationship network. We demonstrate that our system detects such relationships, including relationships that assessors failed to identify in manual evaluation. We also compared the effectiveness of our methods favorably against the state-of-the-art method and demonstrated a significant improvement in precision and recall. In addition, we are interested in discovering interesting relationships in existing hierarchical knowledge representations. The hierarchical structure of existing Web directories, ontologies, and folksonomies is known to provide meaningful information that guides users and applications. We hypothesized that such hierarchical structures provide richer information if they are enriched with additional links beyond those to parents and siblings, namely links between non-sibling nodes. We call such a structure a networked hierarchy. Our empirical results indicate that a networked hierarchy introduces interesting links between non-sibling nodes that are not otherwise evident in a hierarchical structure.
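The idea of reading relationships off a classifier's mistakes can be sketched as follows. This is a minimal illustration, not the evaluated method: the function name, the per-category confusion rate, and the `threshold` value are assumptions chosen for the example.

```python
from collections import Counter, defaultdict


def relationship_network(true_labels, predicted_labels, threshold=0.2):
    """Link two categories when one is often misclassified as the other.

    An undirected edge is added between categories t and p when at least
    `threshold` of the documents whose true category is t were predicted
    as p. The threshold is an illustrative assumption.
    """
    totals = Counter(true_labels)
    confusions = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    network = defaultdict(set)
    for (t, p), n in confusions.items():
        if n / totals[t] >= threshold:
            network[t].add(p)
            network[p].add(t)
    return dict(network)
```

For instance, if half of the documents labeled "sports" are predicted as "health" while other confusions are rare, only the sports/health edge survives a threshold of 0.3.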

Contextual Search

There has been growing interest in contextual (personalized and location-specific) search. We propose a learning-to-rank model that combines general, city-specific, and personalized information. This model is used to produce a personalized and city-specific result set by reranking location-specific results retrieved from the open Web.

Passage Detection

Passages can be hidden within a text to circumvent restrictions on their transfer. We explore methods for detecting such hidden passages within a document. A document is divided into passages using various document splitting techniques, and a text classifier is used to categorize the resulting passages. We present a novel document splitting technique called dynamic windowing, which significantly improves precision, recall, and F1 measure.
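The split-then-classify pipeline can be sketched with a plain fixed-size sliding window. This is deliberately a baseline splitting technique, not dynamic windowing, whose details are not given here. The window size, the step, and the keyword-fraction "classifier" are stand-in assumptions; a trained text classifier would take its place in practice.

```python
def split_into_windows(words, size=5, step=2):
    """Split a word list into overlapping fixed-size windows with start offsets."""
    if len(words) <= size:
        return [(0, words)]
    windows = [
        (start, words[start:start + size])
        for start in range(0, len(words) - size + 1, step)
    ]
    # Make sure the tail of the document is covered by a final window.
    last_start = len(words) - size
    if windows[-1][0] != last_start:
        windows.append((last_start, words[last_start:]))
    return windows


def toy_classifier(window, restricted_terms, min_fraction=0.5):
    """Stand-in classifier: flag a window dominated by restricted terms."""
    hits = sum(1 for word in window if word in restricted_terms)
    return hits / len(window) >= min_fraction


def detect_hidden_passages(text, restricted_terms, size=5, step=2):
    """Return start offsets (in words) of windows flagged by the classifier."""
    words = text.lower().split()
    return [
        start
        for start, window in split_into_windows(words, size, step)
        if window and toy_classifier(window, restricted_terms)
    ]
```

A fixed window can straddle the boundary of a hidden passage and dilute the classifier's signal, which is one plausible motivation for choosing window boundaries dynamically.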

Query Session Analysis

We developed and evaluated an approach that builds on our earlier research on identifying relationships among topics to understand the topic and intent of user queries, given a sequence of user queries from one or more sessions. The context of the session queries is used to improve the effectiveness of identifying the intent or topic of the current query. Earlier efforts used a fixed number of preceding queries to derive such contextual information. We proposed and evaluated an approach (DQW) that identifies a set of "unambiguous" preceding queries in a dynamically determined window and uses them to classify an ambiguous query to a topic. Furthermore, using a relationship-net (R-net) that represents relationships among known topics, we improved the classification effectiveness for those ambiguous queries whose predicted topic is related in the R-net to the topic of a query within the window. Our results indicated that the hybrid approach (DQW+R-net) yields statistically significant improvements over the Conditional Random Field (CRF) query classification approach with static query windowing and a hierarchical taxonomy (SQW+Tax), in terms of precision (10.8%), recall (13.2%), and F1 measure (11.9%). These findings can improve our understanding of user query intent and, consequently, the search results.
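The interplay between a dynamic window and a relationship-net can be sketched as below. Everything specific here is an assumption for illustration: treating a query as unambiguous when it has exactly one candidate topic, closing the window after `max_unambiguous` such queries, and the order of the tie-breaking rules. The evaluated DQW and R-net components are not specified at this level of detail.

```python
def dynamic_window(preceding, max_unambiguous=3):
    """Collect topics of unambiguous earlier queries, most recent first.

    `preceding` holds one candidate-topic list per earlier query, oldest
    first. A query counts as unambiguous when it has exactly one candidate.
    """
    window = []
    for candidates in reversed(preceding):
        if len(candidates) == 1:
            window.append(candidates[0])
            if len(window) == max_unambiguous:
                break
    return window


def classify(candidates, window, rnet):
    """Pick a topic for an ambiguous query from its ranked candidate list."""
    # First preference: a candidate that equals a topic already in the window.
    for topic in window:
        if topic in candidates:
            return topic
    # Second preference: a candidate related to a window topic in the R-net.
    for topic in window:
        for candidate in candidates:
            related = rnet.get(topic, set())
            if candidate in related or topic in rnet.get(candidate, set()):
                return candidate
    # Fallback: the classifier's top-ranked candidate.
    return candidates[0]
```

Unlike a static window, the number of raw queries scanned here varies with how many of them are ambiguous, so a run of ambiguous queries does not crowd out useful context.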

Personalized Ranking of Twitter Friends: Who to follow or not to follow

One of the challenges for users of social media such as Twitter is the fast-growing number of people each user follows. The features available in Twitter provide meaningful information that can be harvested to provide each user with a ranked list of "friends" (i.e., followees). We hypothesize that retweet and mention features can be further enriched by incorporating both temporal information and additional, indirect links from within the user's community.
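One way to picture the temporal side of this hypothesis is a score in which each retweet or mention counts less the older it is. The sketch below is an assumption-laden illustration: the interaction weights, the exponential decay, and the 30-day half-life are invented for the example, and indirect community links are left out.

```python
from collections import defaultdict

# Assumed relative weights of interaction types (illustrative only).
WEIGHTS = {"retweet": 1.0, "mention": 0.5}


def rank_followees(interactions, half_life_days=30.0):
    """Rank followees by time-decayed retweet and mention activity.

    `interactions` is a list of (followee, kind, age_days) tuples, where
    `kind` is "retweet" or "mention" and `age_days` is how long ago the
    user interacted with that followee.
    """
    scores = defaultdict(float)
    for followee, kind, age_days in interactions:
        decay = 0.5 ** (age_days / half_life_days)
        scores[followee] += WEIGHTS[kind] * decay
    return sorted(scores, key=lambda name: scores[name], reverse=True)
```

With these settings, a retweet from today outweighs two mentions spread over the past month, and an interaction from two months ago contributes a quarter of its original weight.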