# Personal Statement: Suchin Gururangan

https://suchin.io

The natural language processing (NLP) community has made major progress towards general-purpose models for understanding natural language (Liu et al., 2019; Radford et al., 2018; Peters et al., 2018; Devlin et al., 2018). While these new classes of models have closed the gap between human and machine performance on many tasks, they also raise important cautionary questions about how we accurately measure progress in the field. Models have been shown to be brittle (Jia and Liang, 2017), datasets are hampered by biases (Torralba and Efros, 2011), and many results are difficult to reproduce (Lipton and Steinhardt, 2018). Furthermore, as NLP technologies become more useful to the public at large, practical questions must be addressed. Modern NLP is expensive (Strubell et al., 2019), but the community has grown to include a wide variety of practitioners, many of whom have smaller budgets for compute. Many researchers also work in non-standard domains that our evaluation paradigms rarely capture (Plank, 2016). I am excited to pursue these important problems of evaluation and real-world NLP. I have had the good fortune to do initial work in some of these areas, and I hope to continue working on projects inspired by these problems during my PhD.

## Evaluation and Reproducibility

In my first NLP research project at the University of Washington (UW), I studied natural language inference (NLI) relations (e.g., entailments and contradictions). We discovered that in the most popular NLI datasets, annotators used a set of stylistic tricks, which we called annotation artifacts (Gururangan et al., 2018). We also showed that neural models overfit to these artifacts of data collection rather than learning the inference task at hand. This project heavily shaped my research trajectory: it honed my interest in evaluating models beyond global test accuracy, which can mask true, potentially unwanted, model behavior.

The brittleness of models stands in stark contrast to mammalian neurobiology, which is extremely robust to variable sensory distributions. As an undergraduate research assistant at the University of Chicago, I helped show that neurons that are not specialized for 3-D arm movement can be explicitly coordinated for such tasks with a brain-machine interface (Vaidya et al., 2017). I also published a paper showing that while distinct areas of the neocortex are specialized for different sensory input, their neural circuits are highly stereotyped (Gururangan et al., 2014).

To encourage the development of more robust models, I believe we should improve how we measure generalization performance. For example, differences in machines, software frameworks, hyperparameter choices, and even random initializations can have a significant effect on model performance. At AI2, we published a paper that advocates for better reporting of experimental results and proposes a budget-aware evaluation of models (Dodge et al., 2019); a sketch of the underlying idea appears below. Another opportunity to improve evaluation: our datasets are mostly static, which contrasts with the dynamic, constantly changing sensory environment that mammals encounter. Static datasets encourage cheating, overfitting to the development set, and other issues that hinder fair model comparisons (Gorman and Bedrick, 2019). Is it possible to build live datasets that evolve over time with new data and splits?
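To make the budget-aware evaluation idea concrete, here is a minimal sketch in the spirit of Dodge et al. (2019), not the paper's released code: given validation scores from a pool of random hyperparameter assignments, it estimates the best score a practitioner should expect after trying only a smaller budget of them. The scores and budgets below are illustrative placeholders.

```python
import numpy as np

def expected_max_performance(val_scores, budget):
    """Expected best validation score after `budget` i.i.d. random
    hyperparameter assignments, estimated from observed scores."""
    v = np.sort(np.asarray(val_scores, dtype=float))  # ascending order
    n = len(v)
    cdf = np.arange(1, n + 1) / n    # P(V <= v_(i)) under the empirical distribution
    cdf_prev = np.arange(0, n) / n   # P(V <  v_(i))
    # Probability that the i-th order statistic is the best of `budget` draws.
    p_best = cdf ** budget - cdf_prev ** budget
    return float(np.sum(v * p_best))

# Made-up validation accuracies from a random hyperparameter search.
scores = [0.71, 0.74, 0.69, 0.80, 0.77, 0.73, 0.75, 0.78]
for budget in (1, 2, 4, 8):
    print(budget, round(expected_max_performance(scores, budget), 3))
```

Reporting this curve, rather than a single best number, makes results comparable across groups with very different compute budgets.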
## Low-Cost NLP

At UW, I wanted to play with pretrained language models, but it was difficult to tinker with them without access to the necessary GPUs. This is a common story for many researchers: compute is a scarce resource. At AI2, we introduced a lightweight pretraining framework called VAMPIRE (Gururangan et al., 2019). By leveraging the fact that text can be modeled as a bag of words instead of as a sequence, we trained a set of feedforward networks to reconstruct those input representations. We showed that this simple self-supervised paradigm works well for text classification without breaking the bank: one can train a VAMPIRE model on a CPU in a reasonable amount of time. I hope to continue working on problems that involve making NLP technologies more accessible to researchers without large budgets. For example, I am currently mentoring a UW CSE master's student on a project that uses VAMPIRE for cause-of-death assignment from verbal autopsies, a naturally occurring low-resource task important to clinical workers in developing countries (McCormick et al., 2016).

## Domain Adaptation

Deeply tied to the idea of low-compute NLP is domain adaptation. Re-training large models like BERT from scratch is very expensive, so it is important to have techniques for efficiently adapting existing models to new data distributions. I have been excited about working on distant domains since my master's thesis on polyglot text classification (Gururangan, 2018). But what does it mean for text to reflect a domain? I am leading a project, in submission at ACL, in which we propose that domains are hierarchical constructs built around variations of the language used in a task, which we call micro-domains. We show that successively fine-tuning a language model on unlabeled corpora corresponding to the distinct levels of this hierarchy maximizes generalization performance on many high- and low-resource classification tasks (a rough sketch of this procedure follows below). I believe these ideas open interesting avenues for future work: is it possible to generate domains by selecting data from unlabeled corpora? This sort of work can help improve model performance on low-resource tasks for which substantial in-domain data is unavailable.
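The sketch below illustrates what successive adaptation could look like in practice, assuming the Hugging Face transformers and datasets libraries; the corpus files, model choice, and hyperparameters are hypothetical placeholders rather than the project's actual setup.

```python
# A minimal sketch, not the project's actual code: successively adapt a
# pretrained masked language model to unlabeled corpora ordered from the
# broadest domain down to the task-specific micro-domain.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical unlabeled corpora, one per level of the domain hierarchy.
corpora = ["broad_domain.txt", "micro_domain.txt", "task_unlabeled.txt"]
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

for level, path in enumerate(corpora):
    dataset = load_dataset("text", data_files=path)["train"]
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True,
        remove_columns=["text"],
    )
    args = TrainingArguments(
        output_dir=f"adapted-level-{level}",
        per_device_train_batch_size=16,
        num_train_epochs=1,
    )
    # Continue masked LM training; `model` carries over between levels.
    Trainer(model=model, args=args, train_dataset=dataset,
            data_collator=collator).train()

# The adapted encoder would then be fine-tuned on the labeled classification task.
```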
## Why get a PhD?

My path into NLP was a winding one. After stints in neuroscience and software engineering, my decision to return to graduate school for NLP was driven by a lingering curiosity about emergent neural processes (such as language) and my experience in applied machine learning. NLP seemed like the right interdisciplinary fit, where I could exercise basic scientific curiosity and build impactful tools for practitioners. I believe that a PhD program will give me the structure to pursue the theses I am excited about, along with a network of collaborators and mentors to help me get there. My career plan is to lead an NLP research group as a principal investigator in a university or in industry, and a PhD will be a great stepping stone towards this goal. As a research mentor to multiple master's students at UW, I have enjoyed the process of teaching and advising younger students, and I hope to explore more opportunities for teaching and advising throughout my graduate program.

## Working at UW

The UW CSE program is a great fit for me. I am particularly excited to continue working with Noah Smith, as we share interests in low-compute NLP, adapting NLP models to different domains, and problems of evaluation. I am also excited to work with Luke Zettlemoyer, given our overlapping interests in making language model pretraining more efficient and accessible to all.

## References

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, and Noah A. Smith. 2019. Show your work: Improved reporting of experimental results. In EMNLP.

Kyle Gorman and Steven Bedrick. 2019. We need to talk about standard splits. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2786–2791.

Suchin Gururangan. 2018. Polyglot text classification with neural document models. Master's thesis, University of Washington.

Suchin Gururangan, Tam Dang, Dallas Card, and Noah A. Smith. 2019. Variational pretraining for semi-supervised text classification. In ACL.

Suchin Gururangan, Alexander J. Sadovsky, and Jason N. MacLean. 2014. Analysis of graph invariants in functional neocortical circuitry reveals generalized features common to three areas of sensory cortex. PLoS Computational Biology.

Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, and Noah A. Smith. 2018. Annotation artifacts in natural language inference data. In NAACL-HLT.

Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328.

Zachary C. Lipton and Jacob Steinhardt. 2018. Troubling trends in machine learning scholarship. arXiv preprint arXiv:1807.03341.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Tyler H. McCormick, Zehang Richard Li, Clara Calvert, Amelia C. Crampin, Kathleen Kahn, and Samuel J. Clark. 2016. Probabilistic cause-of-death assignment using verbal autopsies. Journal of the American Statistical Association, 111(515):1036–1049.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365.

Barbara Plank. 2016. What to do about non-standard (or non-canonical) language in NLP. arXiv preprint arXiv:1608.07836.

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training.

Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2019. Energy and policy considerations for deep learning in NLP. arXiv preprint arXiv:1906.02243.

Antonio Torralba and Alexei A. Efros. 2011. Unbiased look at dataset bias. In CVPR.

Mukta Vaidya, Karthikeyan Balasubramanian, Joshua Southerland, Islam Badreldin, Ahmed Eleryan, Kelsey Shattuck, Suchin Gururangan, Marc Slutzky, Leslie Osborne, Andrew Fagg, Karim Oweiss, and Nicholas G. Hatsopoulos. 2017. Emergent coordination underlying learning to reach to grasp with a brain-machine interface. Journal of Neurophysiology.