Projects
- Social Media and Mental-Health
- Domain Specific Search & Mining: Mining Social Media for Healthcare
- Categorizing Errors in Clinical Care through Medical Narratives
- Clinical Decision Support Systems
- Scientific Document Summarization
- Clinical Notes Summarization
- Domain Specific Information Extraction
- Neural Information Retrieval
- Complex Answer Retrieval
- Search in Adverse Environments
- Sentiment Analysis
- Detecting Relationships among Categories
- Contextual Search
- Passage Detection
- Query Session Analysis
- Personalized Ranking of Twitter Friends: Who to follow or not to follow
Social Media and Mental-Health
Social media has become a significant resource for improving healthcare and mental health. Users suffering from mental health conditions often turn to online resources for support, such as specialized support communities staffed by moderators who read users’ posts and flag those that indicate a potential risk (e.g., the risk of self-harm). Users who do not participate in online support communities often still participate in more general social media communities, such as Twitter, Facebook, and Reddit. In this project, we explore methods for better understanding and identifying users with mental health conditions and for analyzing the severity of user content. We propose an approach for triaging user content into four severity categories, defined by the indication of self-harm ideation. We conduct various analyses on real-world data, providing further insight into addressing current challenges in mental health.
- Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, and Ophir Frieder, "Knowledge Augmentation for Early Depression Detection", The 7th International Workshop on Health Intelligence (W3PHIAI-23), Feb 2023, Washington DC, USA.
- Sajad Sotudeh, Nazli Goharian, Hanieh Deilamsalehy, and Franck Dernoncourt, "Curriculum-guided Abstractive Summarization for Mental Health Online Posts", Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI 2022).
- Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, and Ophir Frieder, "TBD3: A Thresholding-Based Dynamic Depression Detection from Social Media for Low-Resource Users", Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022).
- Sajad Sotudeh, Nazli Goharian, and Zachary Young, "MentSum: A Resource for Exploring Summarization of Mental Health Online Posts", Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022).
- A. Cohan, B. Desmet, A. Yates, L. Soldaini, S. MacAvaney and N. Goharian, "SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions", COLING 2018. Nominated for best paper award and selected as "Area Chair Favorite".
- L. Soldaini, T. Walsh, A. Cohan, J. Han, and N. Goharian, "Helping or Hurting? Predicting Changes in Users’ Risk of Self-Harm Through Online Community Interactions", CLPsych 2018.
- S. MacAvaney, B. Desmet, A. Cohan, L. Soldaini, A. Yates, A. Zirikly, and N. Goharian, "RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses", CLPsych 2018.
- A. Yates*, A. Cohan*, Nazli Goharian, "Depression and Self-Harm Risk Assessment in Online Forums", Empirical Methods in Natural Language Processing (EMNLP), 2017. *Equal contribution. EMNLP 2017 best long paper award.
- A. Cohan, S. Young, A. Yates, N. Goharian, "Triaging Content Severity in Online Mental-Health Forums", Journal of the Association for Information Science and Technology (JASIST), Special Issue on Biomedical Information Retrieval, Volume 68, Issue 11, November 2017.
- A. Cohan, S. Young, and N. Goharian, Triaging Mental Health Forum Posts, In Proceedings of the NAACL HLT 3rd Computational Linguistics and Clinical Psychology - From Linguistic Signal to Clinical Reality Workshop (CLPsych’16). June 2016.
Domain Specific Search & Mining: Mining Social Media for Healthcare
Online discussions of virtually all topics are increasing, and this is even more true in the domain of healthcare. Individuals are rapidly and steadily posting remarks about their own health and that of their loved ones on a variety of social media platforms. There is interest in, and potential for, harnessing these publicly available statements to further our knowledge and understanding of drug behavior. We use several drug-related and other social media sites, as well as general Web sites, to detect expected and unexpected adverse drug reactions (ADRs). To capture what users mean in their own words, we use consumer medical terminology from UMLS to generate an adverse reaction synonym set, which we use to identify both expected ADRs, as already recorded by the FDA, and unexpected ADRs mentioned in online reviews. Background language is used to evaluate the strength of the detected unexpected ADRs.
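The synonym matching and the expected/unexpected split described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the small `SYNONYMS` dictionary is a hypothetical stand-in for a synonym set built from UMLS consumer terminology, `label_reactions` stands in for FDA-recorded reactions, and the smoothed log ratio against background text is an assumed way of scoring strength against background language.

```python
import math

# Hypothetical synonym set: lay phrases mapped to canonical reactions.
# A real set would be derived from UMLS consumer health terminology.
SYNONYMS = {
    "headache": "headache",
    "head hurts": "headache",
    "throwing up": "vomiting",
    "vomiting": "vomiting",
    "can't sleep": "insomnia",
    "insomnia": "insomnia",
}


def find_reactions(text: str, synonyms: dict[str, str]) -> set[str]:
    """Return canonical reactions whose lay phrases occur in the text."""
    lowered = text.lower()
    return {concept for phrase, concept in synonyms.items() if phrase in lowered}


def split_expected(found: set[str], label_reactions: set[str]) -> tuple[set[str], set[str]]:
    """Split detected reactions into expected (already on the label) and unexpected."""
    return found & label_reactions, found - label_reactions


def strength(concept: str, drug_reviews: list[str], background: list[str],
             synonyms: dict[str, str], smoothing: float = 1.0) -> float:
    """Assumed scoring: smoothed log ratio of the concept's document rate in
    reviews of one drug versus background text (positive = over-represented)."""
    def rate(docs: list[str]) -> float:
        hits = sum(1 for doc in docs if concept in find_reactions(doc, synonyms))
        return (hits + smoothing) / (len(docs) + 2 * smoothing)

    return math.log(rate(drug_reviews) / rate(background))
```

For example, `find_reactions("This drug gave me a headache and I can't sleep.", SYNONYMS)` returns `{"headache", "insomnia"}`; with `{"headache"}` as the label set, insomnia would be reported as unexpected. Plain substring matching is used only for brevity; it would over-match in real text.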
- A. Yates, A. Kolcz, N. Goharian and O. Frieder, Effects of Sampling on Twitter Trend Detection, In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC’16), May 2016.
- A. Yates, J. Joselow, N. Goharian, The news cycle's influence on social media activity, International AAAI Conference on Web and Social Media (ICWSM). May 2016.
- A. Yates, N. Goharian, O. Frieder, Learning the relationships between drug, symptom, and medical condition mentions in social media, International AAAI Conference on Web and Social Media (ICWSM). May 2016.
- J. Parker, A. Yates, N. Goharian, and O. Frieder, "Health Related Hypothesis Generation using Social Media Data," Social Network Analysis and Mining, Springer, 2015.
- A. Yates, N. Goharian, O. Frieder, "Extracting Adverse Drug Reactions from Social Media", Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), 2015.
- A. Yates, J. Parker, N. Goharian, and O. Frieder, "A Framework for Public Health Surveillance", In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), May 2014.
- A. Yates, N. Goharian, O. Frieder, "Relevance-Ranked Domain-Specific Synonym Discovery", in Proceedings of the 36th European Conference on Information Retrieval (ECIR '14), April 2014.
- J. Parker, Y. Wei, A. Yates, O. Frieder, and N. Goharian, "A Framework for Detecting Public Health Trends with Twitter", The 2013 IEEE/ACM International Conference on Advances in Social Network Analysis and Mining, Aug. 2013.
- A. Yates, N. Goharian, O. Frieder, "Extracting Adverse Drug Reactions from Forum Posts and Linking them to Drugs", SIGIR Workshop on Health Search and Discovery, July-Aug 2013.
- A. Yates, N. Goharian and O. Frieder, "Graded Relevance Ranking for Synonym Discovery", Proceedings of the 22nd international conference on World Wide Web (WWW'13), 2013.
- A. Yates and N. Goharian, "ADRTrace: Detecting Expected and Unexpected Adverse Drug Reactions from User Reviews on Social Media Sites", In Proceedings of the 35th European Conference on Information Retrieval (ECIR 2013), 2013.
Categorizing Errors in Clinical Care through Medical Narratives
There is an increasing demand for the use of electronic health records and clinical texts, for purposes such as improving health care, public health surveillance, quality measurement, and medical education. Text categorization and classification is a fundamental task in understanding, mining, and analyzing medical text, and it can benefit applications that improve healthcare in general. In this project, we focus on developing text categorization methods to address some of the real-world challenges in healthcare.
- Sean MacAvaney, Arman Cohan, Nazli Goharian, and Ross Filice, "Ranking Significant Discrepancies in Clinical Reports.", European Conference on Information Retrieval (ECIR 2020)
- Arman Cohan, Allan Fong, Raj Ratwani, and Nazli Goharian, "Identifying Harm Events in Clinical Care through Medical Narratives", ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB 2017)
- Arman Cohan, Allan Fong, Nazli Goharian, and Raj Ratwani, "A Neural Attention Model for Categorizing Patient Safety Events", European Conference on Information Retrieval (ECIR 2017)
- Arman Cohan, Luca Soldaini, Nazli Goharian, Allan Fong, Ross Filice, and Raj Ratwani, "Identifying Significance of Discrepancies in Radiology Reports", Workshop on Data Mining for Medicine and Healthcare (DMMH) at SDM 2016
Clinical Decision Support Systems
Keeping current given the vast volume of medical literature published yearly poses a serious challenge for medical professionals. Thus, interest in systems that aid physicians in making clinical decisions is intensifying. We explore and evaluate supervised and unsupervised approaches to retrieve relevant medical literature given a medical case report. Furthermore, given the action a health expert is seeking to complete (make a diagnosis, prescribe a treatment, or order a test), we investigate reranking techniques that could provide more appropriate literature.
- L. Soldaini, A. Yates, N. Goharian, "Denoising Clinical Notes for Medical Literature Retrieval with Convolutional Neural Model", Proceedings of the 26th ACM International Conference on Information and Knowledge Management (CIKM), 2017.
- L. Soldaini, A. Yates, N. Goharian, "Learning to Reformulate Long Queries for Clinical Decision Support", Journal of the Association for Information Science and Technology (JASIST), Special Issue on Biomedical Information Retrieval, Volume 68, Issue 11, November 2017.
- L. Soldaini, A. Cohan, A. Yates, N. Goharian, O. Frieder, "Retrieving Medical Literature for Clinical Decision Support", Proceedings of the 37th European Conference on Information Retrieval (ECIR 2015), 2015.
- L. Soldaini, A. Cohan, A. Yates, N. Goharian, O. Frieder, "Query Reformulation for Clinical Decision Support Search", Proceedings of the 23rd Text REtrieval Conference (TREC), 2015.
- A. Cohan, L. Soldaini, A. Yates, N. Goharian, and O. Frieder, "On clinical decision support", Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, 2014.
Clinical Notes Summarization
In recent years, health providers have increasingly deployed digital computational systems in practice. Electronic Health Records (e.g., radiology reports) contain a wealth of useful information that is often underutilized because of the specialized nature of these records. The main objective of condensing these reports is to select the content that reflects the main theme of the original report. Summarizing these reports may improve communication between health practitioners by saving them time under a heavy workload. Our goal is to automate the generation of such summaries so that they are useful to clinicians.
- Sajad Sotudeh and Nazli Goharian, "OntG-Bart: Ontology-Infused Clinical Abstractive Summarization", DocEng '23: Proceedings of the ACM Symposium on Document Engineering, 2023.
- Sajad Sotudeh, Nazli Goharian, and Ross W. Filice, "Attend to Medical Ontologies: Content Selection for Clinical Abstractive Summarization.", Association for Computational Linguistics (ACL 2020)
- Sean MacAvaney*, Sajad Sotudeh Gharebagh*, Arman Cohan, Nazli Goharian, Ish Talati, and Ross Filice, "Ontology-Aware Clinical Abstractive Summarization.", ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019). *Equal contribution.
Scientific Document Summarization
Due to the growing rate at which articles are being published in various scientific fields, it has become difficult for researchers to keep up with new developments. Scientific summarization aims to alleviate this problem. One useful strategy is citation-based summarization, in which citations to a reference article are used to generate a summary of the reference paper. While citations have previously been used to generate scientific summaries, they lack the related context from the referenced article and therefore do not accurately reflect the article’s content. Our goal is to overcome this problem by providing the appropriate context for the citations and using this information to produce an extractive summary of the article. We have also shown that using a scientific article’s inherent discourse structure can help improve the quality of the generated summaries. We are currently investigating approaches for developing more robust general and scientific summarization routines. As another line of research in the scientific domain, we are investigating methods to produce long, extended summaries of scientific papers.
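The citation-contextualization idea above can be illustrated with a small sketch: each citation is matched to the reference-article sentences it most resembles, and the sentences that best reflect what citing papers say are selected as an extractive summary. Plain bag-of-words cosine similarity is used here only as a simple stand-in for the word embeddings and domain knowledge used in the publications below; the vote-based selection is an assumption, and discourse structure is not modeled.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased alphanumeric token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def contextualize(citation: str, sentences: list[str], k: int = 1) -> list[int]:
    """Indices of the k article sentences most similar to a citation text."""
    cited = bag_of_words(citation)
    vectors = [bag_of_words(s) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: cosine(cited, vectors[i]), reverse=True)
    return ranked[:k]


def citation_summary(citations: list[str], sentences: list[str], max_sentences: int = 2) -> list[str]:
    """Extractive summary: the sentences that contextualize the most citations, in article order."""
    votes = Counter()
    for citation in citations:
        votes.update(contextualize(citation, sentences))
    chosen = sorted(i for i, _ in votes.most_common(max_sentences))
    return [sentences[i] for i in chosen]
```

Because selection is driven by what citing papers actually say about the article, sentences that many citations point to are favored; replacing `bag_of_words` with embedding-based vectors would change only the similarity step.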
- Sajad Sotudeh and Nazli Goharian, "TSTR: Too Short to Represent, Summarize with Details! Intro-Guided Extended Summary Generation", Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2022)
- Sajad Sotudeh, Arman Cohan, and Nazli Goharian, On Generating Extended Summaries of Long Documents, The AAAI-21 Workshop on Scientific Document Understanding (SDU 2021)
- Sajad Sotudeh, Arman Cohan, and Nazli Goharian, GUIR @ LongSumm 2020: Learning to Generate Long Summaries from Scientific Documents, Workshop on Scholarly Document Processing (SDP 2020)
- A. Cohan, F. Dernoncourt, D. S. Kim, T. Bui, S. Kim, W. Chang, and N. Goharian, "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents.", NAACL-HLT 2018.
- A. Cohan and N. Goharian, "Scientific Document Summarization via Citation Contextualization and Scientific Discourse", International Journal on Digital Libraries (IJDL), 2018.
- A. Cohan and N. Goharian, "Contextualizing Citations for Scientific Summarization using Word Embeddings and Domain Knowledge", ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017).
- A. Cohan and N. Goharian, Revisiting Summarization Evaluation for Scientific Articles, In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC’16), May 2016.
- A. Cohan and N. Goharian, "Scientific Article Summarization Using Citation-Context and Article's Discourse Structure", Empirical Methods in Natural Language Processing (EMNLP) 2015.
- A. Cohan, L. Soldaini, and N. Goharian, "Matching Citation Text and Cited Spans in Biomedical Literature: a Search-Oriented Approach," NAACL-HLT, 2015.
- A. Cohan, L. Soldaini, S. Mengle, and N. Goharian, "Towards Citation-Based Summarization of Biomedical Literature," Text Analysis Conference (TAC), 2014.
Domain Specific Information Extraction
We examine approaches for extracting salient entities and relations from various domains, from clinical notes to scientific literature.
- S. MacAvaney, L. Soldaini, A. Cohan, and N. Goharian, "Tree-LSTMs for Scientific Relation Classification", International Workshop on Semantic Evaluation (SemEval 2018).
- S. MacAvaney, A. Cohan, and N. Goharian, "A Framework for Cross-Domain Clinical Temporal Information Extraction", International Workshop on Semantic Evaluation (SemEval 2017).
- A. Cohan, K. Meurer, and N. Goharian, "Temporal Information Processing for Clinical Narratives", International Workshop on Semantic Evaluation (SemEval 2016).
Neural Information Retrieval
How well can neural approaches rank documents, and can they be more effective than conventional approaches? Here, we tackle the difficult and interesting problems related to structuring, training, and evaluating neural information retrieval rankers.
- Hrishikesh Kulkarni, Zachary Young, Nazli Goharian, Ophir Frieder, and Sean MacAvaney, "Genetic Generative Information Retrieval", DocEng '23: Proceedings of the ACM Symposium on Document Engineering, 2023.
- Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, and Ophir Frieder, "Lexically-Accelerated Dense Retrieval", ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023).
- S. MacAvaney, F. M. Nardini, R. Perego, N. Tonellotto, N. Goharian, O. Frieder, "Expansion via Prediction of Importance with Contextualization.", ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020)
- S. MacAvaney, F. M. Nardini, R. Perego, N. Tonellotto, N. Goharian, O. Frieder, "Efficient Document Re-Ranking for Transformers by Precomputing Term Representations.", ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020)
- S. MacAvaney, F. M. Nardini, R. Perego, N. Tonellotto, N. Goharian, O. Frieder, "Training Curricula for Open Domain Answer Re-Ranking.", ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020)
- S. MacAvaney, L. Soldaini, and N. Goharian, "Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning.", European Conference on Information Retrieval (ECIR 2020)
- S. MacAvaney, A. Cohan, N. Goharian, and R. Filice, "Ranking Significant Discrepancies in Clinical Reports.", European Conference on Information Retrieval (ECIR 2020)
- S. MacAvaney, "OpenNIR: A Complete Neural Ad-Hoc Ranking Pipeline.", Web Search and Data Mining 2020 (WSDM 2020, demo)
- S. MacAvaney, A. Yates, A. Cohan, and N. Goharian, "CEDR: Contextualized Embeddings for Document Ranking.", ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019)
- S. MacAvaney, A. Yates, K. Hui, and O. Frieder, "Content-Based Weak Supervision for Ad-Hoc Re-Ranking.", ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019)
- S. MacAvaney, A. Yates, N. Goharian, and O. Frieder, "PACRR Gated Expansion for TREC CAR 2018", Text REtrieval Conference (TREC), 2018.
- S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, "Overcoming Low-Utility Facets for Complex Answer Retrieval", Information Retrieval Journal, Special Issue on Knowledge Graphs and Semantics in Text Analysis and Retrieval
- S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, "Characterizing Question Facets for Complex Answer Retrieval", ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018).
- S. MacAvaney, K. Hui, and A. Yates, "An Approach for Weakly-Supervised Deep Information Retrieval", Workshop on Neural Information Retrieval (Neu-IR '17) at SIGIR 2017.
Complex Answer Retrieval
Not all questions can be answered with a simple "factoid" answer; many questions require answers that provide considerable context, nuance, and opposing ideas. In Complex Answer Retrieval (CAR), we examine approaches for building complete answers to these tough questions.
- S. MacAvaney, F. M. Nardini, R. Perego, N. Tonellotto, N. Goharian, O. Frieder, "Training Curricula for Open Domain Answer Re-Ranking.", ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020)
- S. MacAvaney, A. Yates, N. Goharian, and O. Frieder, "PACRR Gated Expansion for TREC CAR 2018", Text REtrieval Conference (TREC), 2018.
- S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, "Overcoming Low-Utility Facets for Complex Answer Retrieval", Information Retrieval Journal, Special Issue on Knowledge Graphs and Semantics in Text Analysis and Retrieval
- S. MacAvaney, A. Yates, A. Cohan, L. Soldaini, K. Hui, N. Goharian, and O. Frieder, "Characterizing Question Facets for Complex Answer Retrieval", ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018).
- S. MacAvaney, A. Yates, and K. Hui. "Contextualized PACRR for Complex Answer Retrieval", Text REtrieval Conference (TREC), 2017.
Search in Adverse Environments
Within information retrieval, there are situations where it is difficult to retrieve relevant information. Such cases include searching datasets that lack query logs, searching a multilingual dataset, or querying a corrupted document set. We refer to these situations as search in adverse environments. Searching in adverse environments is problematic for traditional search techniques because of a lack of data, data corruption, or both. We identified two adverse environments to study: 1) word-level corruption and 2) document-level corruption. Using an unsupervised, language-independent approach, we have made statistically significant improvements in the first environment over commonly deployed systems and nearly match state-of-the-art supervised approaches. Additionally, early experimental results suggest similar findings for the second environment. We aim to contribute to each environment to aid the successful retrieval of relevant information.
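One common unsupervised, language-independent way to match query terms against misspelled or OCR-corrupted text is character n-gram overlap. The sketch below illustrates that general technique only; it is not necessarily the approach used in the publications below, and the trigram size and the 0.5 Dice threshold are assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(word: str, n: int = 3) -> Counter:
    """Character n-grams with boundary markers, so prefixes and suffixes count."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice(a: Counter, b: Counter) -> float:
    """Dice coefficient over n-gram multisets."""
    total = sum(a.values()) + sum(b.values())
    return 2 * sum((a & b).values()) / total if total else 0.0


def match_term(term: str, vocabulary: list[str], threshold: float = 0.5) -> list[str]:
    """Collection terms (possibly misspelled or corrupted variants) that are close
    to the query term, most similar first."""
    query = char_ngrams(term)
    scored = [(word, dice(query, char_ngrams(word))) for word in vocabulary]
    kept = [(word, score) for word, score in scored if score >= threshold]
    return [word for word, _ in sorted(kept, key=lambda pair: pair[1], reverse=True)]
```

For instance, matching "retrieval" against a vocabulary containing the corrupted forms "rettieval" and "retreival" keeps both variants, while the unrelated "revival" falls below the threshold. No training data or language-specific resources are needed, which is what makes this family of techniques attractive when query logs are unavailable.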
- J. Soo and O. Frieder, Searching Corrupted Document Collections. In International Workshop on Document Analysis Systems (DAS) XII, April 2016.
- J. Soo and O. Frieder, Revisiting Known-Item Retrieval in Degraded Document Collections. In Document Recognition and Retrieval (DRR) XXIII, February 2016.
- J. Soo and O. Frieder, "On searching misspelled collections", Journal of the Association for Information Science and Technology, 2014
- J. Soo, "A Non-Learning Approach to Spelling Correction in Web Queries", Proceedings of the 22nd international conference on World Wide Web (WWW'13), 2013.
- J. Soo, O. Frieder, "On Foreign Name Search", 32nd European Conference on Information Retrieval (ECIR'10), Milton Keynes, UK, March 2010.
- J. Soo, R. Cathey, O. Frieder, M. Amir, and G. Frieder, "Yizkor Books: A Voice for the Silent Past," ACM 17th Conference on Information and Knowledge Management (CIKM), Napa Valley, California, October 2008.
Sentiment Analysis
- A. Yates, N. Goharian, W. Yee, "Semi-supervised Sentiment Analysis: Merging Labeled Sentences with Unlabeled Reviews to Identify Sentiment", American Society for Information Science and Technology (ASIST), Nov 2013.
- J. Parker, A. Yates, N. Goharian, "Efficient Estimation of Aspect Weights", In proceedings of ACM 35th Conference on Research and Development in Information Retrieval (SIGIR'12), August 2012.
Detecting Relationships among Categories
Knowledge of relationships among categories is of interest in different domains such as text classification, content analysis, and text mining. We propose and evaluate approaches to effectively identify relationships among document categories. Our proposed method capitalizes on the misclassification results of a text classifier to identify potential relationships among categories, and the identified relationships form a relationship network. We demonstrate that our system detects such relationships, including relationships that assessors failed to identify in manual evaluation. We also compared our methods with the state-of-the-art method and demonstrated a significant improvement in precision and recall. Furthermore, we are interested in discovering relationships in existing hierarchical knowledge representations. The hierarchical structures of existing Web directories, ontologies, and folksonomies are known to provide meaningful information that guides users and applications. We hypothesized that such hierarchical structures provide richer information if they are further enriched with additional links beyond parent and sibling links, namely links between non-sibling nodes. We call such a structure a networked hierarchy. Our empirical results indicate that a networked hierarchy introduces interesting links between non-sibling nodes that are not evident in a purely hierarchical structure.
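The misclassification idea above can be sketched in a few lines: if a classifier frequently labels documents of category A as category B, that confusion is taken as evidence that A and B are related. This is a minimal illustration, not the published method; normalizing by the size of the true category and thresholding with `min_rate` are assumptions, and the published work may weight or filter misclassifications differently.

```python
from collections import Counter


def relationship_network(true_labels: list[str], predicted: list[str],
                         min_rate: float = 0.1) -> dict[str, dict[str, float]]:
    """Link category A to category B when at least `min_rate` of A's documents
    are misclassified as B. Returns {A: {B: rate}}."""
    totals = Counter(true_labels)
    # Count only off-diagonal (misclassified) pairs of the confusion matrix.
    confusions = Counter((t, p) for t, p in zip(true_labels, predicted) if t != p)
    network: dict[str, dict[str, float]] = {}
    for (true_cat, predicted_cat), count in confusions.items():
        rate = count / totals[true_cat]
        if rate >= min_rate:
            network.setdefault(true_cat, {})[predicted_cat] = rate
    return network
```

With four documents each of "sports", "politics", and "economics", where one politics document is predicted as economics and one economics document as politics, the sketch links politics and economics in both directions at rate 0.25 and leaves sports unconnected.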
- N. Goharian, S. Mengle "Networked Hierarchies for Web Directories", 20th International World Wide Web conference (WWW'11), March 2011.
- S. Mengle, N. Goharian, "Detecting Relationships among Categories using Text Classification", Journal of American Society for Information Science and Technology (JASIST), 61 (5), May 2010 (pp 1046-1061).
- S. Mengle, N. Goharian, A. Platt, "Discovering Relationships among Categories using Misclassification Information", ACM 23rd Symposium on Applied Computing (SAC), March 2008.
Contextual Search
There has been growing interest in contextual (personalized and location-specific) search. We propose a learning-to-rank model that combines general, city-specific, and personalized information. This model produces a personalized, city-specific result set by reranking location-specific results retrieved from the open Web.
- A. Yates, D. DeBoer, H. Yang, N. Goharian, S. Kunath, O. Frieder, "(Not Too) Personalized Learning to Rank for Contextual Suggestion", In Proceedings of TREC 2012 Contextual Suggestion Track, November 2012.
Passage Detection
Passages can be hidden within a text to circumvent restrictions on their transfer. We explore methods to detect such hidden passages within a document. A document is divided into passages using various document splitting techniques, and a text classifier is used to categorize these passages. We present a novel document splitting technique, called dynamic windowing, which significantly improves precision, recall, and F1 measure.
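The split-then-classify pipeline above can be sketched as follows. The fixed-size overlapping windows shown here are a simple baseline splitting technique, not the dynamic windowing method described above, and `toy_classifier` is a hypothetical stand-in for a trained text classifier; the window size and step are likewise assumptions.

```python
from typing import Callable, Optional


def split_windows(words: list[str], size: int, step: int) -> list[tuple[int, list[str]]]:
    """Fixed-size, overlapping windows as (start index, tokens); the tail is always covered."""
    if len(words) <= size:
        return [(0, words)]
    starts = list(range(0, len(words) - size + 1, step))
    if starts[-1] + size < len(words):
        starts.append(len(words) - size)  # add a final window ending at the last word
    return [(start, words[start:start + size]) for start in starts]


def detect_passages(text: str, classify: Callable[[str], Optional[str]],
                    size: int = 8, step: int = 4) -> list[tuple[int, str]]:
    """Classify each window; report (start word index, category) for flagged windows."""
    hits = []
    for start, window in split_windows(text.split(), size, step):
        label = classify(" ".join(window))
        if label is not None:
            hits.append((start, label))
    return hits


def toy_classifier(passage: str) -> Optional[str]:
    """Hypothetical stand-in for a trained classifier: flags a made-up code name."""
    return "restricted" if "falcon" in passage.lower() else None
```

Overlapping windows reduce the chance that a hidden passage is split across a boundary and diluted below the classifier's detection point; the cost is more classifier calls and overlapping hits that must be merged afterwards.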
- S. Mengle and N. Goharian, "Passage Detection Using Text Classification", Journal of American Society for Information Science and Technology (JASIST), 60 (4), March 2009.
- N. Goharian, S. Mengle, "On Document Splitting in Passage Detection," In proceedings of 31st Conference on Research and Development in Information Retrieval (ACM SIGIR 2008), July 2008.
- S. Mengle, N. Goharian, "Detecting Hidden Passages from Documents", In proceedings of SIAM Conference on Data Mining (SDM 2008) Workshop, April 2008.
Query Session Analysis
Building on our earlier research on identifying relationships among topics, we developed and evaluated an approach for understanding the topic and intent of a user query given the sequence of queries from one or more sessions. The context provided by the session queries is used to improve the effectiveness of identifying the intent or topic of the current query. Earlier efforts used a fixed number of preceding queries to derive such contextual information. We proposed and evaluated an approach (DQW) that identifies a set of "unambiguous" preceding queries in a dynamically determined window and uses them to classify an ambiguous query to a topic. Furthermore, using a relationship net (R-net) that represents relationships among known topics, we improved the classification effectiveness for ambiguous queries whose predicted topic is related, in the R-net, to the topic of a query within the window. Our results indicate that the hybrid approach (DQW+R-net) yields statistically significant improvements over the Conditional Random Field (CRF) query classification approach that uses static query windowing and a hierarchical taxonomy (SQW+Tax), in terms of precision (10.8%), recall (13.2%), and F1 measure (11.9%). These findings can improve our understanding of user query intent and, consequently, the search results.
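The windowing and relationship-net steps above can be sketched roughly as follows. This is a hypothetical illustration, not the published DQW+R-net implementation: `score` stands in for a per-query topic classifier (the work above used a CRF), and the 0.7 confidence threshold, the `max_topics` cap, and the additive `boost` are all assumptions.

```python
from typing import Callable

Scorer = Callable[[str], dict[str, float]]


def top_topic(scores: dict[str, float]) -> tuple[str, float]:
    """Highest-scoring topic and its score."""
    return max(scores.items(), key=lambda kv: kv[1])


def dynamic_window(previous: list[str], score: Scorer,
                   min_confidence: float = 0.7, max_topics: int = 3) -> list[str]:
    """Walk back from the most recent query, collecting topics of unambiguous
    queries; ambiguous ones are skipped, so the window's span varies per session."""
    topics = []
    for query in reversed(previous):
        topic, confidence = top_topic(score(query))
        if confidence >= min_confidence:
            topics.append(topic)
            if len(topics) == max_topics:
                break
    return topics


def classify_in_context(query: str, previous: list[str], score: Scorer,
                        related: dict[str, set[str]], min_confidence: float = 0.7,
                        boost: float = 0.3) -> str:
    """Classify a query, using session context only when it is ambiguous on its own."""
    scores = dict(score(query))  # copy so the scorer's output is not mutated
    topic, confidence = top_topic(scores)
    if confidence >= min_confidence:
        return topic
    for context_topic in dynamic_window(previous, score, min_confidence):
        for candidate in scores:
            # Boost the same topic, or a topic linked to it in the (hypothetical) relationship net.
            if candidate == context_topic or context_topic in related.get(candidate, set()):
                scores[candidate] += boost
    return top_topic(scores)[0]
```

For example, an ambiguous query such as "jaguar" would lean toward an animals topic after unambiguous queries about lion habitats, and toward a cars topic after a racing query if the relationship net links racing to cars.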
- D. Guan, H. Yang, N. Goharian, "Effective Structured Query Formulation for Session Search," In Proceedings of TREC 2012, November 2012.
- N. Goharian, S. Mengle, "Context Aware Query Classification Using Dynamic Query Window and Relationship Net", In proceedings of 33rd Conference on Research and Development in Information Retrieval (ACM SIGIR), July 2010.
Personalized Ranking of Twitter Friends: Who to follow or not to follow
One of the challenges for users of social media such as Twitter is the fast-growing number of people each user follows. The features available in Twitter provide meaningful information that can be harvested to produce a ranked list of "friends" (i.e., followees) for each user. We hypothesize that retweet and mention features can be further enriched by incorporating both temporal information and additional, indirect links from within the user's community.
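The hypothesis above, that retweet and mention signals become more informative with temporal weighting and indirect community links, can be sketched as a simple scoring function. Everything here is an assumption for illustration rather than the features evaluated in the publication below: the relative weights in `KIND_WEIGHTS`, the 30-day half-life exponential decay, and the way `community` propagates credit to friends endorsed by other friends.

```python
from collections import defaultdict
from typing import Optional

# Assumed relative weights for the two interaction types.
KIND_WEIGHTS = {"retweet": 1.0, "mention": 0.5}


def decay(age_days: float, half_life_days: float) -> float:
    """Exponential time decay: an interaction loses half its weight every half-life."""
    return 0.5 ** (age_days / half_life_days)


def rank_friends(interactions: list[tuple[str, str, float]], now: float,
                 community: Optional[dict[str, list[str]]] = None,
                 half_life_days: float = 30.0, indirect_weight: float = 0.5) -> list[str]:
    """Rank followees by time-decayed retweets/mentions plus indirect credit.

    interactions: (friend, kind, day) tuples for the user's retweets and mentions.
    community: hypothetical indirect links, friend -> friends that friend interacts with.
    """
    direct: dict[str, float] = defaultdict(float)
    for friend, kind, day in interactions:
        direct[friend] += KIND_WEIGHTS[kind] * decay(now - day, half_life_days)
    scores = dict(direct)
    for source, targets in (community or {}).items():
        for target in targets:
            if target != source:
                # A friend endorsed by strongly connected friends gains indirect credit.
                scores[target] = scores.get(target, 0.0) + indirect_weight * direct.get(source, 0.0)
    return sorted(scores, key=scores.get, reverse=True)
```

In this sketch a friend the user rarely interacts with directly can still rise in the ranking if the user's most-engaged friends frequently interact with them, which is one way to read "indirect links from within the user's community".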
- Y. Zhu and N. Goharian, "To Follow or Not to Follow: A Feature Evaluation", Proceedings of the 22nd international conference on World Wide Web (WWW'13), 2013.