
Statement of Purpose Essay - University of Washington

Program: PhD, NLP, ML
Type: PhD
License: CC BY-NC-SA 4.0
Source: Public Success Story (Ofir Press)

Statement of Purpose for the Computer Science PhD program
Ofir Press

Getting computers to achieve human-level translation is an incredibly hard challenge. I don't know if we'll ever get there, but I do know that I would like to be part of that effort. I plan to pursue a PhD in computer science in order to work on challenging NLP tasks such as translation, language modeling, language generation, and question answering. Once I graduate, I intend to continue doing research in an academic setting. In this statement I'll describe two of the areas that I'd like to continue advancing during my PhD: language modeling and machine translation.

Language modeling is a fundamental task in NLP, and also the task that I've spent the most time studying and working on. Although recurrent language models are much simpler than translation or question answering models, we still do not fully understand how they work. A testament to this is the frequent publication of papers that find simple LSTM-like recurrent cells that perform better than more complex ones. I believe that by trying to better understand the language models that we use, we'll be able to improve them. This was the theme of my first paper (Press and Wolf 2017). I started by studying the softmax layer and loss function of recurrent language models, and this led me to discover that the often-ignored softmax matrix actually contains word embeddings of higher quality than those in the input word embedding matrix (as measured by SimLex-999 and similar benchmarks). I found that when the softmax matrix and the input word embedding matrix are tied, the resulting single shared matrix consists of superior word representations that in turn significantly improve the language model's performance. This weight tying method has since been adopted in all state-of-the-art language models.
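To make the idea concrete, below is a minimal sketch of weight tying in PyTorch. It is illustrative code rather than the implementation from the paper; the class name, the choice of a single-layer LSTM, and the requirement that the hidden size equal the embedding size are simplifying assumptions. The essential step is the single assignment that makes the output (softmax) layer and the input embedding share one parameter matrix.

```python
import torch.nn as nn

# A minimal sketch of weight tying (illustrative, not the code from
# Press and Wolf 2017). Class and variable names are hypothetical.
class TiedLSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)          # input word embedding matrix
        self.lstm = nn.LSTM(dim, dim, batch_first=True)         # hidden size == embedding size
        self.decoder = nn.Linear(dim, vocab_size, bias=False)   # softmax ("output embedding") matrix
        # Weight tying: the softmax matrix and the input embedding matrix
        # become a single shared parameter tensor.
        self.decoder.weight = self.embedding.weight

    def forward(self, tokens):                 # tokens: (batch, time) integer ids
        hidden, _ = self.lstm(self.embedding(tokens))
        return self.decoder(hidden)            # logits over the vocabulary
```

Because the matrices are shared, each word's representation is updated both when the word appears in the input and when it is the prediction target, which is one intuition for why the tied embeddings end up of higher quality.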
I'd like to continue this line of research by investigating alternatives to the softmax layer that is widely used today. I believe that by making the softmax matrix non-constant and context-dependent, we could improve the performance of language models. A softmax layer that modifies the representations of words to adapt to different contexts could cope with polysemy and achieve superior results. However, naive methods that recalculate all of the embeddings in the softmax matrix for each context would be incredibly slow, so an efficient alternative must be found.

Translation is another task that I'm very drawn to, not only because of the importance of translation between natural languages but also because many other NLP tasks, such as paraphrasing, summarization, and semantic parsing, can be modeled as translation tasks. The success of CycleGAN in unsupervised translation between images of different domains inspired me to ask how we could do the same for natural languages. I wanted to work on translation between natural languages using only monolingual corpora. This led my colleagues and me to write the paper (Press et al. 2017) in which we showed for the first time how to train an RNN to generate text with just the GAN objective. We believed that we were taking a first step towards an unsupervised machine translation model. When we started working on this problem, I noticed that while the RNN failed at generating long sequences, it could generate correct subwords when asked to generate just three characters. This led me to discover the main conclusion of the paper: a curriculum learning approach that trains the RNN first on short sequences and then on longer ones vastly improves the quality of the generator.

Since then, two papers (Artetxe et al. 2017; Lample et al. 2017) have shown initial results in unsupervised machine translation, using approaches that differ from ours. I'd like to explore ways to improve the current results in unsupervised translation, because I believe it could eventually perform as well as, or even better than, supervised translation. Progress in supervised translation is hindered by the relatively small size of aligned datasets; to achieve human-level translation, we must tap into the vast ocean of monolingual corpora. The existing approaches to unsupervised translation operate at the sentence level. I believe these systems could benefit from more context, perhaps by translating a few sentences or even a paragraph at once. Translation of rare words is hard in supervised translation, and even more so in the unsupervised setting. Because rare words often appear in bursts (Church 2000), a larger context would give the unsupervised model more information about them, making them easier to translate.

The University of Washington's Natural Language Processing Group is a source of many of the papers that I've read in the course of my research. Specifically, I've tremendously enjoyed learning from Prof. ... on ... I frequently referenced Prof. ...'s work on ... when I worked on the ... paper. In addition, Prof. ...'s recent work on ... has ignited my interest in understanding how we can build NLP models that are ... For these reasons, I would be delighted to study at the University of Washington in the NLP group.

References

Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho (2017). "Unsupervised Neural Machine Translation". In: CoRR abs/1710.11041.

Kenneth W. Church (2000). "Empirical estimates of adaptation: the chance of two Noriegas is closer to p/2 than p²". In: Proceedings of the 18th Conference on Computational Linguistics, Volume 1. Association for Computational Linguistics, pp. 180–186.

Guillaume Lample, Ludovic Denoyer, and Marc'Aurelio Ranzato (2017). "Unsupervised Machine Translation Using Monolingual Corpora Only". In: CoRR abs/1711.00043.

Ofir Press, Amir Bar, Ben Bogin, Jonathan Berant, and Lior Wolf (2017). "Language Generation with Recurrent Generative Adversarial Networks without Pre-training". In: 1st Workshop on Learning to Generate Natural Language at ICML.

Ofir Press and Lior Wolf (2017). "Using the Output Embedding to Improve Language Models". In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Association for Computational Linguistics, pp. 157–163.