
Statement of Purpose Essay - Columbia University

Program: PhD, NLP, ML, Causal Inference
Type: PhD
License: CC BY-NC-SA 4.0
Source: Public Success Story (Sweta Karlekar)

Statement of Purpose
Sweta Karlekar
Ph.D., Computer Science

I am interested in researching the principles of machine learning, focusing on probabilistic foundations for deep learning and causal deep learning as applied to natural language processing (NLP) for social good. With a strong background in undergraduate deep learning research and publications in top NLP conferences, as well as two years of work experience at Meta doing applied research in Bayesian inference and causal inference, I wish to pursue a Ph.D. at Columbia. I believe it is the essential next step for me to contribute to shaping AI for more powerful, responsible, and fair use in applications that will improve human lives. Specifically, research questions I am interested in exploring include:

1. Probabilistic foundations applied to deep learning: How can we best incorporate human knowledge into priors for Bayesian deep models—especially in areas of vague expert opinion, such as language customs, or in domains with little data, such as endangered languages? How can we effectively utilize uncertainty quantification in deep learning for better fairness and accountability?

2. Causal deep learning: Can the addition of causal knowledge make deep models more robust to adversarial attacks or domain shifts? How can we best exploit or induce causal structure to make deep learning models fairer or more explainable? How can we create principled methods that learn causal relationships from large amounts of observational data, especially text data, when only partial causal relationships are available or understood?

Background & Motivation.

During my undergraduate career, I had the rewarding experience of publishing first-author papers at multiple top conferences on deep learning and NLP for social good with Dr. Mohit Bansal at UNC Chapel Hill. For this work, which included detecting linguistic characteristics of Alzheimer’s and classifying sexual harassment online, I was named first runner-up for the 2020 CRA Outstanding Undergraduate Research Award. I further built my skills as a machine learning researcher and engineer through internships at MITRE, Disney, Yelp, and Facebook. At these internships, however, I saw various cases of deep learning models being misused or misinterpreted. For example, during my time at Disney, I witnessed NN classifiers giving high-confidence yet incorrect predictions without accounting for uncertainty in demographic groups with little data—a problem that caused the models to underperform severely for marginalized racial groups. At Yelp, I saw a high correlation between an advertiser feature and advertiser survival being mistaken for causation, almost resulting in adverse changes to marketing practices. These experiences, along with learning more about the importance of fairness and model critique from Dr. Been Kim through the Google AI Research mentorship program, persuaded me to look more closely at the technical limitations of standard deep learning. Neural networks do not naturally produce uncertainty estimates, distinguish between epistemic and aleatoric uncertainty, or offer an unambiguous way to incorporate prior beliefs or expert opinions. Further, as the world turns to AI to solve more sensitive and pressing problems, insights about causal structure and counterfactual modes of explanation become essential.
During my time as a research scientist on the Bayesian Modeling and Probabilistic Programming Languages team at Meta, I approached machine learning through a statistical lens. While leading our team’s causal inference research pillar, I leaned heavily on the benefits of probabilistic foundations and causal understanding. For my doctoral training, I would like to build on my undergraduate research and work experience to investigate how best to apply the strengths of Bayesian and causal inference to deep learning and NLP. Text is a rich medium that can offer insight across applications with wide societal impact, including the medical sciences, poverty mitigation, sustainability, and the building of safer communities both online and offline. My research interests reflect my excitement to improve deep NLP models by incorporating more human knowledge in the form of priors and/or causal structure, while also making them more uncertainty-aware, robust, and explainable. While I am open to studying the intersection of these fields through a variety of paradigms and project areas, I outline below example research areas motivated by my past papers and work experience.

Probabilistic foundations for deep learning.

Reflecting my interest in applications centered on social good, my first paper with Dr. Bansal used three neural models (CNNs, LSTMs, and CNN-RNNs) to aid in the early detection of Alzheimer’s disease (AD) by classifying and analyzing transcripts of patients’ speech. This work was presented at NAACL 2018 and featured in Sebastian Ruder’s Top Paper Picks. Our neural models achieved new benchmark accuracy for independent classification, and we further performed interpretability analyses to understand AD patients’ distinctive language patterns. While investigating the model’s misclassified examples, I found that many required specific prior knowledge of language customs or an understanding of the stages of disease progression. Bayesian deep models—such as deep Gaussian Processes, Variational Autoencoders, or Bayesian NNs—already utilize priors while retaining the power and flexibility of deep models. However, these models traditionally rely on priors that are complex or derived in convoluted ways, such as weight-space or function-space priors, and it is not immediately clear how expert knowledge like language customs can be translated into tractable probability distributions. My time at Meta building and deploying an out-of-the-box Bayesian causal inference package has shown me that even though Bayesian models offer many benefits, users often shy away because they lack the knowledge needed to turn their subjective beliefs into model-understandable priors. This is compounded in high-dimensional settings such as deep learning models. Even those familiar with Bayesian deep models often resort to isotropic Gaussian or similarly “uninformative” priors, which can lead to prior misspecification and unintended consequences during inference. I am interested in investigating prior elicitation to create more principled approaches for building human-understandable priors in deep models for NLP tasks, especially in domains with vague or conflicting expert opinions, such as medically affected speech, or with little data, such as endangered languages. I hope this will allow for better-specified models and make them more accessible to interdisciplinary communities with domain knowledge.

Causal deep learning.
In later papers with Dr. Bansal, I explored application areas in social media, extending my previous work to the classification of domestic abuse stories in the Reddit Domestic Abuse dataset. This was presented at the Widening NLP (WiNLP) workshop at NAACL, accompanied by interpretability analysis. In the same vein, I later published a paper with Dr. Bansal, presented orally at EMNLP 2018, that created a novel task and dataset for automatically classifying and analyzing various forms of sexual harassment based on stories shared on the online forum SafeCity. I used Local Interpretable Model-Agnostic Explanations (LIME) and embedding visualizations to demonstrate how these methods extract powerful features that can help automatically fill out incident reports and identify unsafe areas in cities. In fact, this paper was included as reading material in Dan Jurafsky’s CS 384 Ethical and Social Issues in NLP seminar! Individually, these models achieved state-of-the-art results for their respective domains. However, beyond Reddit and SafeCity, there are many such forums for sexual harassment and abuse, and even more for related topics such as bullying. One could learn from each environment individually, but since they share similar latent structures, domain generalization could allow even more platforms to benefit efficiently from ML models. At Meta, my most recent research project involved building Bayesian priors for a sparse NN propensity-score model that mapped the distribution of users who opted into tracking to the distribution of users who opted out, under the assumptions of a covariate-shift setting. While that model used low-dimensional advertiser and user features, I am motivated to investigate how we can best use domain generalization and infer causal structures within the high-dimensional feature space of text data. While learning causal structure in neural networks can lead to increased robustness in unseen domains, spurious correlations in the data can lead to negative effects in downstream tasks. This misspecification of the causal structure can have especially harmful effects in sensitive applications like harassment and bullying. I’m interested in understanding how we can induce and incorporate causal awareness in deep learning to take advantage of explainability and robustness, while also investigating how best to account for uncertainty arising from possible misspecification of the causal structure—especially by marrying causally-aware deep learning with Bayesian inference methods.

Why a Ph.D. at Columbia?

My career objective is to become a professor contributing to a multidisciplinary research lab like the machine learning community at Columbia. I am certain that a Ph.D. program at Columbia, with access to top minds in many fields and a diverse cohort of students, will be a lively and collaborative environment. Additionally, I am excited to pursue a degree that will allow me to gain further teaching and mentoring experience. I am interested in working with David Blei, as I feel his work is a great match for my interests in both Bayesian and causal inference, and I would like to collaborate with him on Bayesian causal inference for text data and causally-informed fairness. I would also like to work with Elias Bareinboim—notably his work on Bayesian nonparametric statistics, causal inference (especially causal fairness analysis), and causal discovery.
I am further interested in working with Kathleen McKeown—notably her work on unsafe text detection, extracting actionable insights from text, and social-good applications of NLP. I hope to work with her to see how Bayesian and causal inference methods could further enhance the ability of NLP to address various social issues. Finally, I am also interested in working with Zhou Yu on her NLP work, exploring how the addition of priors or causal understanding can help make NLP models more robust for real-world applications like healthcare and education. Following the work of Columbia’s Machine Learning, Causal AI, and NLP groups has led me to believe it will be an excellent place for me to pursue a Ph.D. and continue my journey through academia.

References

[1] Vincent Fortuin. “Priors in Bayesian deep learning: A review”. In: International Statistical Review (2022).
[2] Jakob Gawlikowski et al. “A survey of uncertainty in deep neural networks”. In: arXiv preprint arXiv:2107.03342 (2021).
[3] Sweta Karlekar and Mohit Bansal. “#MeToo: Neural Detection and Explanation of Language in Personal Abuse Stories”. 2018.
[4] Sweta Karlekar and Mohit Bansal. “SafeCity: Understanding diverse forms of sexual harassment personal stories”. In: arXiv preprint arXiv:1809.04739 (2018).
[5] Sweta Karlekar, Tong Niu, and Mohit Bansal. “Detecting linguistic characteristics of Alzheimer’s dementia by interpreting neural models”. In: arXiv preprint arXiv:1804.06440 (2018).
[6] Gary Marcus. “Deep learning: A critical appraisal”. In: arXiv preprint arXiv:1801.00631 (2018).
[7] Sebastian Ruder. Sebastian Ruder’s Top Paper Picks. 2018. URL: https://newsletter.ruder.io/issues/nlp-pytorch-libraries-gan-tutorial-jupyter-tricks-tensorflow-things-representation-learning-making-nlp-more-accessible-michael-jordan-essay-reproducing-deep-rl-rakuten-data-challenge-naacl-outstanding-papers-106347
[8] Dan Jurafsky, Stanford NLP. CS 384: Ethical and Social Issues in Natural Language Processing. 2020. URL: https://web.stanford.edu/class/cs384/