Statement of Purpose Essay - Carnegie Mellon University
Advancements in machine learning models for computer vision are empowering the next generation of cutting-edge technology, including robotics and autonomous vehicles, with the ability to understand and respond to their environments. I wish to pursue a PhD in computer science with a focus on machine learning and computer vision, using statistical learning techniques to build machines capable of visual perception and prediction. Following the completion of my PhD, I intend to pursue a career in research and lead an industry lab, where I aspire to design algorithms and models that increase the efficiency of our day-to-day lives through automation.

My motivation for conducting graduate research grew from previous research projects at Northwestern University, which have resulted in multiple first-author publications. My past research has investigated metrics for perceptual similarity among visual textures, properties of word embeddings [1], the training of neural language models [2], and the limits of commonsense reasoning [3]. Through these projects, I acquired experience in image processing and deep learning, gaining expertise in representation learning and generative modeling. In particular, I am interested in combining my background in deep learning and image processing to explore how models perceive, reason, and make predictions based on their surroundings.

I began my research in deep learning working on natural language processing (NLP) with Professor Douglas Downey in the Northwestern WebSAIL Group. It was there that I familiarized myself with the dense feature representations common in deep learning as I addressed the use of pre-trained word embeddings in neural NLP models. These models frequently rely on pre-trained word embeddings to increase generalizability and reduce training time.
However, individual tasks are often domain-specific, so these models benefit from embeddings whose word representations resemble those of the target corpus. In a first-author paper published at EMNLP [1], I developed the VecShare framework for sharing word embeddings online and selecting the optimal embedding for downstream NLP tasks. In this work, I introduced embedding signatures, which combine embedding properties (e.g., word similarities) and corpus-level statistics (e.g., word frequencies) to compactly represent an entire word embedding set. The VecShare framework efficiently selects embeddings by computing the similarity between these embedding signatures and a corresponding signature generated from the user's corpus. In our experiments, we used pre-trained embeddings selected by VecShare in convolutional neural networks for document classification and found a significant increase in accuracy for models trained with the selected embeddings compared against several baselines. Along with our experimental work, I released an open-source library implementing these similarity metrics and sharing protocols (http://vecshare.org). Building on the knowledge I gained working with word embeddings, I hope to study representation learning with deep networks in other domains, such as learning features for object or facial recognition models.

In another project with Professor Downey, I investigated importance sampling as a means of selecting training data for recurrent neural network language models (RNNLMs). Although deep learning models account for most state-of-the-art results in language modeling, they suffer from high computational costs that scale linearly with the length of the training data. We sought to determine whether large language corpora could be subsampled such that RNNLMs could be trained on fewer tokens without the performance loss typically associated with less training data.
I designed multiple sampling distributions that select training sentences with higher probability according to their perplexity under a pilot n-gram language model trained on starter data from the corpus. This served as an efficient proxy for loss-based sampling, and I found that RNNLMs trained on sequences sampled from my distributions achieved lower perplexity and loss than comparable n-gram and RNNLM baselines. I published this technique for selecting RNNLM training data at the Association for Computational Linguistics: Student Research Workshop [2]. From this project, I gained invaluable experience building large, deep generative networks for learning complex and highly non-convex distributions. Since our initial results, which sampled a static training set, I have continued working on importance sampling for RNNLMs, adapting our techniques to train RNNLMs in an online learning setting.

Additionally, I am working as part of a collaboration in the WebSAIL group to examine the failure modes of deep-learning-based commonsense question-answering systems. To determine the types of questions that are difficult for these systems, we constructed a novel question-answering dataset containing adversarially generated questions. This work is currently under review at NAACL-HLT [3]. I find commonsense reasoning particularly interesting because it confronts the limits of purely textual analysis: it often relies on signals not explicitly stated in text, such as real-world or visual cues.

After several research projects in NLP, I wanted to explore how the statistical learning techniques I had used to analyze language could be combined with signal processing to model human perception. This past year, I have been investigating metrics of perceptual texture similarity with Professor Thrasyvoulos Pappas.
In this work, I am developing an approach to texture classification based on structural texture similarity metrics (STSIMs), which operate on a set of latent features extracted from the steerable filter decomposition of an image. We classify a candidate texture by computing the Mahalanobis distance between the image and exemplars of each texture class obtained via clustering in the STSIM feature space. I proposed an alternative calculation of the covariance matrix used in the Mahalanobis distance computation that makes the metric robust to intraclass variance in the STSIM features. With my proposed method, we have achieved near state-of-the-art results in texture classification. Compared against deep models on the same tasks, our approach provides increased interpretability while requiring orders of magnitude fewer parameters and training examples, as well as significantly reduced training time. In the future, I plan to examine how deep learning can be combined with our existing STSIM features to increase the metric's generalizability.

Developing models of human perception motivated my desire to work on higher-level vision tasks that give machines the capacity to reason directly about their surroundings. In my doctoral program, I intend to investigate how we can develop agents capable of reasoning about complex scenarios, whether human behaviors or real-world environments. My experience analyzing both linguistic and perceptual features, along with my current work in commonsense reasoning, provides both a strong background and immediate intuition for tackling these problems. Consequently, I am interested in the work of Professor Yonatan Bisk on both physical and visual commonsense reasoning. Likewise, I am excited by Professor Louis-Philippe Morency's research on multimodal models that incorporate the wide array of non-textual signals into the reasoning process.
Furthermore, I am fascinated by Professor Ruslan Salakhutdinov's work on improving learned representations through new model and layer architectures. At Carnegie Mellon, I see a clear alignment between my interests and previous research experiences that will enable me to make strong contributions to these groups.

References

1. Jared Fernandez, Zhaocheng Yu, Doug Downey. VecShare: A Framework for Sharing Word Representation Vectors. Empirical Methods in Natural Language Processing (EMNLP), 2017.
2. Jared Fernandez, Doug Downey. Sampling Informative Training Data for RNN Language Models. Association for Computational Linguistics: Student Research Workshop (ACL-SRW), 2018.
3. Michael Chen, Alisa Liu, Jared Fernandez, Mike D'Arcy, and Doug Downey. Analyzing Language Models for Common Sense: An Adversarially-Authored Dataset. North American Chapter of the Association for Computational Linguistics: Human Language Technologies (under review at NAACL-HLT), 2019.