
Statement of Purpose Essay - University of Washington

Program: PhD, NLP/ML Systems
Type: PhD
License: CC BY-NC-SA 4.0
Source: Public Success Story (Tim Dettmers)

Statement of Purpose for PhD in CS at the University of Washington: Tim Dettmers

My primary research interest is natural language understanding – to bridge computer and human information processing, natural language is the most powerful interface. My secondary research interest is deep learning performance – deep learning algorithms that leverage large datasets can exhibit human-level performance, but to scale training to the vast amounts of human knowledge it is essential to develop fast, efficient algorithms.

What differentiates me from other candidates is my relentless drive and determination to overcome obstacles. I failed to graduate from high school because I am dyslexic and only gained eligibility for regular universities six years later. Despite this, I have completed several international research internships in both my primary and secondary research areas and published my work as first author at Tier A conferences – ICLR and AAAI. For my academic and research accomplishments, I have been awarded a Google scholarship. I play an active role in the machine learning community and frequently blog about deep learning – my blog attracts over 1000 unique visitors per day.

I gained my first research experience in deep learning performance in 2013, when I became interested in the work of Collobert et al. (JMLR 2011), who trained a neural network to learn word embeddings for various natural language processing tasks. Collobert et al. only used shallow neural networks, however, and after the success of AlexNet (Krizhevsky, Sutskever, & Hinton; NIPS 2012) my aim became to replace the shallow model in their architecture with a deep model. Training deep networks on this task takes multiple months due to the large datasets involved, so I made it my first research project to speed up the training of deep networks by developing parallelization algorithms. Due to my academic isolation during this time, I was unable to find an advisor for this project, so I decided to pursue it on my own. Journeying through this research project without an advisor was arduous and I got stuck several times over the span of two years, but in turn I gained expert knowledge about deep learning parallelization algorithms and their problems, GPU programming, GPU cluster design, and the interplay between GPU hardware and software. So that others could pursue a similar route of research, I documented my experiences in blog posts on my research blog, http://timdettmers.com/.

After writing my blog posts, I continued my research on parallelization algorithms for deep learning. I developed a parallelization algorithm which compresses gradients to 8 bits of information without degrading predictive or training performance. This allowed much faster training of deep models on GPU clusters, especially with model parallelism, where the model is split into parts across the whole GPU cluster. I published my findings at the conference track of ICLR 2016 (25% acceptance rate).

In the meantime, my blog caught the attention of a researcher at Microsoft Research (MSR), Kenneth Tran, who invited me to do an internship. At MSR, I worked with the head of the deep learning software group, Chris Basoglu, on improving computational performance. In particular, I worked on memory swapping, where tensors are swapped from GPU to CPU memory while they are not needed and swapped back into GPU memory just before they are needed for convolutions or matrix multiplications. For convolutional networks, this algorithm reduces the memory footprint by 80% while decreasing performance by only 10-20%.
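As a minimal illustration of the idea – a hypothetical PyTorch-style sketch rather than the actual implementation from my internship, with made-up helper names – such a swapping scheme looks roughly like this:

    import torch

    # Side stream so host<->device copies can overlap with compute on the
    # default stream.
    copy_stream = torch.cuda.Stream()

    def offload(t):
        """Copy a GPU tensor into pinned CPU memory so the GPU copy can be freed."""
        cpu = torch.empty(t.shape, dtype=t.dtype, pin_memory=True)
        cpu.copy_(t, non_blocking=True)
        return cpu

    def prefetch(cpu_tensor):
        """Start an asynchronous CPU -> GPU copy just before the tensor is needed."""
        with torch.cuda.stream(copy_stream):
            return cpu_tensor.to("cuda", non_blocking=True)

    def wait_and_use(gpu_tensor):
        """Block the compute stream until the prefetch has finished."""
        torch.cuda.current_stream().wait_stream(copy_stream)
        return gpu_tensor

In such a scheme, the savings come from issuing the prefetch one operation ahead, so that the transfer can hide behind the convolution or matrix multiplication that is currently running.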
After my work in deep learning performance, I aimed to apply my skills to advance natural language understanding research in areas which were thought to be computationally infeasible. I interned with Sebastian Riedel, who leads the Machine Reading group at University College London (UCLMR). Working with the team at UCLMR, I learned about computational problems in the task of predicting missing links in knowledge graphs. With my skills, I sped up the evaluation algorithm for knowledge graph link predictors by more than 300x. With such an advance in evaluation speed, I was able to build the first multi-layer link prediction model with a convolutional layer, named ConvE. I was also able to expose weaknesses in commonly used knowledge graph datasets – which are now no longer used by the community – by developing a simple logic-based model that exploits patterns in the test set. My work was just short of NIPS acceptance (7/6/7) and is now published at AAAI 2018 (25% acceptance rate).

When I was invited back to MSR, I explored how ConvE could help to model relationships in spoken language. I used automatic knowledge base construction to build a personalized knowledge graph for each speaker. I then used ConvE to model this graph and, with the trained model, predicted how consistent a speaker’s utterance is with the information in their personalized knowledge graph.

In my future work, I want to leverage my skills in deep learning performance to push the boundaries of model and dataset size in natural language understanding and knowledge graph modeling. Training at such scales might yield models that approach human-level understanding of language. In particular, I am interested in models that combine information retrieval and reading comprehension to answer questions. Current research in question answering uses deep models to search for an answer in a given document (Seo et al., ICLR 2017), but these models can easily be fooled by adversarial examples (Jia & Liang, EMNLP 2017) because they rely too heavily on the documents given to them. This problem could be alleviated by developing methods that actively search for relevant documents rather than relying only on the ones given. Initial steps toward this search component were taken by Chen et al. (ACL 2017), but due to runtime performance bottlenecks, which were also noted by Nogueira and Cho (EMNLP 2017), they were not able to train the information retrieval system jointly with the question answering model. I worked on this problem after returning to UCLMR and developed an optimized information retrieval system which is 210x faster than the fastest open source solution (Lucene). I hope to continue this work in my PhD.

Given my interests in question answering, knowledge graphs, and deep learning performance, Luke Zettlemoyer, Hannaneh Hajishirzi, and Ali Farhadi would be excellent PhD advisors. I would like to build on Luke Zettlemoyer’s research into knowledge graphs (Levy et al., CoNLL 2017; He et al., ACL 2017) and question answering (Joshi et al., ACL 2017; He et al., EMNLP 2015; Fader et al., SIGKDD 2014). I find Hannaneh Hajishirzi’s recent work in question answering impressive (Seo et al., ACL 2017; Seo et al., ICLR 2017; Kembhavi et al., CVPR 2017). Ali Farhadi would be an excellent advisor on deep learning performance, as he has done important work in this area (Rastegari et al., ECCV 2016; Redmon et al., CVPR 2016; Redmon et al., CVPR 2017).
Please also consider the following: Only one university allowed me to study without a high school degree, The Open University, which lacked a bachelor thesis component and offered little interaction between students and faculty. Because of this, I was unable to secure a strong third recommendation letter during the short span of my master’s. During my bachelor studies, I had to study and work full-time for 16 months to support myself. Due to my disability, I feel that I did worse in master’s courses that relied heavily on handwritten communication. I believe my extensive prior research experience and my ability to sustain focus on difficult problems for long periods of time make me a strong candidate for your program.