Statement of Purpose Essay - University of Washington
I am a Research Fellow with Dr. Manik Varma and Dr. Prateek Jain at Microsoft Research (MSR) India. I am interested in designing novel, generalizable machine learning techniques with an emphasis on strong empirical performance as well as interpretability for practical applications. In particular, I am excited about ML (especially deep learning) models for Vision, Search, Time Series and Recommender Systems, and about understanding the expressivity and stability of the proposed models (non-convex optimization). For the past one and a half years at MSR, I have been working on novel fundamental techniques to enable resource-efficient machine learning. This spans the design of new machine learning models, optimization routines and compression techniques, along with a theoretical understanding of the proposed methods. As an undergraduate at the Indian Institute of Technology (IIT) Bombay, my research with Prof. Soumen Chakrabarti was on large-scale representation learning for efficient and geometrically sensible entity-typing. Below, I highlight the relevant projects and briefly discuss how they align with my research interests and long-term goals.

Resource-efficient Machine (Deep) Learning: A significant part of my research at MSR India is focused on developing novel solutions for resource-efficient machine learning, which aims to reduce the resource utilization and prediction costs of machine learning models while maintaining accuracy, making it relevant for real-time applications with latency, battery and privacy concerns. It could enable intelligence on the edge (device) for the billions of resource-constrained devices that form the Internet of Things (IoT) ecosystem. Traditionally, these devices lack intelligence and stream their time series sensor data to the cloud for the decision making behind real-time services. Recurrent Neural Networks (RNNs) are the state-of-the-art for analyzing sequences and time series, but their training is unstable due to ill-conditioned gradients. Unitary RNNs and gated RNNs (e.g., LSTM and GRU) address this issue, but at the expense of increased prediction costs. This motivated me to develop the FastRNN & FastGRNN architectures, published at NeurIPS 2018 [1], which address the twin RNN limitations of inaccurate training and inefficient prediction. FastRNN is a simple extension of the standard RNN: the addition of a residual connection with just two extra scalar parameters provably stabilizes training. As a result, FastRNN has almost the same prediction cost as a standard RNN, can be up to 19% more accurate, and also outperforms unitary RNNs. FastGRNN is a novel gated RNN built upon FastRNN with maximal parameter re-utilization; enforcing FastGRNN's parameter matrices to be low-rank, sparse and quantized via a custom three-phase joint-optimization routine resulted in models up to 20-80x smaller than the state-of-the-art RNNs (LSTM, GRU) without compromising prediction accuracy (both update rules and the compression scheme are sketched below). This allowed FastGRNN to accurately recognize the "Hey Cortana" wakeword with a 1 KB model and to be deployed on severely resource-constrained devices such as the Arduino Uno board, which has only 2 KB of RAM and an 8-bit processor at 16 MHz, and which is too tiny to store other RNN models. Results on 7 other real-world benchmark datasets across domains show that FastGRNN can be a viable replacement for the leading RNNs with 1-6 KB sized models.
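To make the two cell updates concrete, here is a minimal NumPy sketch of a single time step of each cell, following the update rules in [1]; the variable names and shapes are my own, and this is an illustration rather than the EdgeML implementation:

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def fastrnn_step(x_t, h_prev, W, U, b, alpha, beta):
        # Standard RNN candidate state.
        h_tilde = np.tanh(W @ x_t + U @ h_prev + b)
        # Residual connection with two trainable scalars (alpha, beta);
        # this weighted combination is what stabilizes training.
        return alpha * h_tilde + beta * h_prev

    def fastgrnn_step(x_t, h_prev, W, U, b_z, b_h, zeta, nu):
        # The gate and the candidate state reuse the same W and U --
        # the parameter re-utilization mentioned above.
        pre = W @ x_t + U @ h_prev
        z = sigmoid(pre + b_z)            # gate
        h_tilde = np.tanh(pre + b_h)      # candidate state
        # zeta and nu are trainable scalars that interpolate between
        # retaining h_prev and writing the new candidate state.
        return (zeta * (1.0 - z) + nu) * h_tilde + z * h_prev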
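The three-phase compression can be sketched at a similarly high level; the steps below are simplified, hypothetical stand-ins for the joint-optimization routine of [1], which re-trains the model within each phase rather than applying these transforms one-shot:

    import numpy as np

    def compress(W1, W2, sparsity_threshold=1e-2, scale=1.0 / 128):
        # Phase 1: train the matrix in low-rank factored form,
        # W = W1 @ W2, with the shared rank much smaller than W's dims.
        W = W1 @ W2
        # Phase 2: enforce sparsity; a one-shot magnitude threshold
        # stands in here for the re-training with a sparsity budget.
        W = W * (np.abs(W) >= sparsity_threshold)
        # Phase 3: quantize surviving weights to single bytes so that
        # inference can use integer arithmetic on tiny devices.
        return np.clip(np.round(W / scale), -128, 127).astype(np.int8)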
We open-sourced modular Tensorflow implementations of FastRNN & FastGRNN as part of the EdgeML library [2], of which I am a core contributor; I also developed its C++ and Tensorflow implementations of Bonsai [3]. EdgeML has had more than 50,000 hits & 850 clones since its release in September 2017. FastGRNN was successfully utilized at the MSR India Summer Workshop 2018 (Machine Learning on Constrained Devices) to create services like radar-based poacher detection at mote scale & voice-based feedback on an IoT device. This project made me realize the importance of revisiting and fixing fundamental issues in a systematic fashion, which is necessary for machine learning research to yield efficient techniques that are well understood for the specific problems being tackled.

Large-scale Representation Learning for Entity-Typing: For my undergraduate thesis at IIT Bombay, I worked on efficient and geometrically sensible spatial representations for entity-typing using supervised and self-supervised learning. Order Embeddings (OE) [4] showed the usefulness of learning entity and type embeddings from Knowledge Graphs (KG) in a type-ordered higher-dimensional space. We investigated and fixed flaws in OE's evaluation protocol, and also proposed a technique to perpetuate type information for entities through contexts derived from text corpora. OE modeled types as open cones, a geometry with fundamental structural flaws that we fixed using learnt bounding rectangles [5] (both geometries are sketched at the end of this section). We are currently working on effectively incorporating text into this stable framework for type perpetuation. We are also scaling the experiments from WordNet to DBpedia (KG) + Wikipedia (corpus) with a systematic pipeline that avoids haphazard optimization issues, and incorporating an attention mechanism to further increase accuracy. The resulting technique will be applicable to downstream tasks like Knowledge Base Completion (KBC), fine-type tagging and zero-shot type-hierarchy inference for brand-new entities. While working on this problem, I faced various optimization issues and understood the importance of formulating the right learning objectives for large-scale tasks involving huge datasets like Wikipedia and DBpedia.

Currently, I am working on enabling RNNs to unroll over long time periods without forced restarts, to achieve orders-of-magnitude savings in the prediction cost of RNNs that use sliding-window prediction on continuous data. We are looking at hierarchical modeling of RNNs as well as robust training routines to tackle this problem. I have also been studying the role of over-parameterization, coupled with compression, in increasing the accuracy of learnt DL models. Recently, I have started exploring ML for graphs to assist recommender systems.

Apart from these research experiences, I have been a Teaching Assistant for 5 courses at IIT Bombay, where I was involved in hands-on lab assistance & problem setting. I was awarded TA of the month twice (Feb '16, '17) for two offerings of the Digital Logic Design Lab course by Prof. Supratik Chakraborty, recognizing my efforts in clearly disseminating concepts to students and efficiently evaluating their projects. All these enriching experiences have helped me realize my passion for research & teaching at this early stage of my academic career.
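As referenced above, here is a toy sketch of the two geometries, in my own (hypothetical) notation rather than the exact formulations of [4] and [5]: the order-embedding penalty is zero when the parent dominates the child coordinatewise, while bounding rectangles turn subtype inference into a box-containment test:

    import numpy as np

    def cone_violation(child, parent):
        # Order-embedding penalty in the spirit of [4]: zero iff the
        # parent lies coordinatewise below the child, i.e. the child
        # sits inside the parent's open cone.
        return np.sum(np.maximum(0.0, parent - child) ** 2)

    def box_contains(outer_lo, outer_hi, inner_lo, inner_hi):
        # Bounding-rectangle containment in the spirit of [5]: a
        # subtype's learnt box must sit inside its supertype's box.
        return bool(np.all(outer_lo <= inner_lo) and
                    np.all(inner_hi <= outer_hi))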
Goals and Motivation: Over the last few years, it has become increasingly clear that machine learning techniques are useful across various domains and disciplines. I have realized that developing ML algorithms for real-world problems demands a systematic approach: understanding the problem and ideating a solution, followed by rigorous experimentation, with the cycle reiterated as we analyze results and draw insightful conclusions. I would like to couple this with theoretical understanding, which can help improve the solution and provide guarantees. I believe that, with the right formulation, machine learning algorithms can present us with end-to-end solutions to many real-world problems, while theoretical understanding helps push the field further in a principled fashion. I wish to pursue my Ph.D. guided by these principles and would like to make an impact by developing strong & interpretable machine (deep) learning algorithms for real-world problems in Vision, Search & Recommender Systems. I feel that, together with the excellent faculty and peer group at the University of Washington, I will be able to contribute to identifying and solving interesting real-world problems. With Prof. Ali Farhadi, I am interested in exploring representation learning: learning structurally and temporally rich representations from visual data using supervised, semi-supervised and self-supervised learning, in a resource-efficient fashion, to assist downstream tasks like perception, image/video understanding, object detection and activity recognition. I am also interested in problems at the intersection of NLP, ML and Vision, which I would like to explore with Prof. Ali Farhadi and Prof. Hannaneh Hajishirzi. With Prof. Tim Althoff, I would like to work on large-scale data mining and machine learning for analyzing social networks and behavioral data to generate actionable insights. Under the supervision of Prof. Sham Kakade, I want to delve into non-convex optimization along with understanding memory in DL models that capture spatial and temporal information. Lastly, I am interested in building large-scale machine learning systems with Prof. Kevin Jamieson and Prof. Carlos Guestrin. In conclusion, I wish to pursue a career in academic research and believe that my experiences at graduate school will be essential in helping me accomplish this goal.

References:
[1] Aditya Kusupati, Manish Singh, Kush Bhatia, Ashish Kumar, Prateek Jain and Manik Varma. FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network. In NeurIPS, 2018. Paper: http://manikvarma.org/pubs/kusupati18.pdf. Poster: https://adityakusupati.github.io/docs/FastGRNNPoster.pdf. Video: https://www.youtube.com/watch?v=3ZpCnOWBrio.
[2] Aditya Kusupati, Don Dennis, Chirag Gupta, Ashish Kumar, Harsha Simhadri and Shishir Patil. EdgeML: An ML library for machine learning on the Edge, 2017. URL: https://github.com/Microsoft/EdgeML.
[3] Ashish Kumar, Saurabh Goyal and Manik Varma. Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things. In ICML, 2017.
[4] Ivan Vendrov, Ryan Kiros, Sanja Fidler and Raquel Urtasun. Order-embeddings of images and language. In ICLR, 2016.
[5] Sandeep Subramanian and Soumen Chakrabarti. New Embedded Representations and Evaluation Protocols for Inferring Transitive Relations. In SIGIR, 2018.
Note: NeurIPS is the new name for NIPS.