
Statement of Purpose Essay - Carnegie Mellon University

Program: Ph.D., NLP
Type: Ph.D.
License: CC BY-NC-SA 4.0
Source: Public Success Story (Shuyan Zhou)

Statement of Purpose
Shuyan Zhou (shuyanzh@cs.cmu.edu)

"Shuyan, this is the best gift you have ever given me!" said my mother excitedly, after using an intelligent speaker for half a year. She told me that this product significantly simplifies her life because she can interact with it through natural language instead of clicking and typing. I am very happy that the field I am working in is changing people's lives. However, such technology still has considerable headroom for improvement. For example, the agent's Chinese support is much weaker than its English support, and it failed to retrieve relevant information when my mom asked questions related to her job. I want to contribute more to the NLP community so that people, no matter what language they speak and which domain they care about, can access language technology equally. With this in mind, my research goal is to leverage rich knowledge to build robust and generalizable NLP tools. This goal has motivated my research around two prime questions: 1) how can we encode external knowledge into models under scenarios with different constraints, and 2) how can we design learning algorithms that efficiently use the available supervision?

1 Generalizable Knowledge Representation across Languages

Knowledge representation first caught my attention when I worked on entity linking (a.k.a. named entity disambiguation) for English short text (e.g., tweets) with Chin-Yew Lin at Microsoft Research Asia [1]. The most challenging part of this project was encoding rich semantic information about entities into embeddings. We addressed this problem by representing an entity as a collection of word n-gram embeddings and interactively matching these embeddings against the n-gram embeddings of the queried named entity's context. An ablation study confirmed the benefit of pretrained word embeddings, since they initialize every word with rich semantic information. However, the situation was less favorable when I worked on low-resource cross-lingual entity linking (XEL) as part of the DARPA LORELEI program with Graham Neubig and Jaime Carbonell at Carnegie Mellon University [2]. The resources required to encode entities are not available in low-resource settings; for example, the lack of monolingual text corpora makes it impossible to obtain high-quality (multilingual) word embeddings. Given these observations, I asked myself: can we design an entity representation that can be applied to all languages, regardless of resource availability? Instead of using target-language data, we leveraged ubiquitously available structured knowledge resources such as English Wikipedia to jointly disambiguate all named entities in a given document. Our new entity representation is fully language-agnostic and can be applied to any language.

2 Robust Learning Algorithms with Limited Supervision

With language-independent knowledge representations, an immediate question that came to mind was how to train a model that takes these representations as input with little or no annotation from the low-resource language. I continued pursuing our goal of building a language-independent XEL model that requires zero resources in the low-resource language. The idea is to design the model under the transfer learning paradigm [2]: train on a high-resource language and directly apply the model to the target low-resource language without further fine-tuning. We applied the same technique in [3], resulting in an end-to-end zero-shot XEL system. Our work significantly extends the capabilities of classic XEL systems towards a truly zero-resource, language-invariant pipeline.
Besides exploring transfer learning for building generalizable NLP models across languages, I also delved into multi-task learning and data augmentation for robust NLP [4]. It is known that perturbations that are minor to humans (e.g., typographical and grammatical errors) can hurt a neural machine translation (NMT) model's performance. However, supervision from the "noisy domain" is often too sparse to train a good NMT model, and it is hard to manually design generalizable denoising rules. To solve this problem, we augmented the large amount of clean out-of-domain data and designed a cascaded multi-task transformer that first cleans the noisy source sentence and then performs translation. Our approach fully automates the denoising process by providing intermediate supervision.

3 Future Plans

My ambition is to break technology boundaries and eventually develop generalizable natural language understanding agents for everyday use. I believe that a truly intelligent agent requires 1) rich external knowledge that is mostly environment-invariant, to understand the laws of the world, and 2) models that can properly inject this knowledge into different scenarios with environment-specific information. I am fascinated by the research questions behind these requirements, and I am strongly motivated to apply for the Ph.D. program at Carnegie Mellon University. As a Ph.D. student, I hope to create large-scale machine-understandable knowledge bases by harvesting external knowledge from diverse resources (e.g., Stack Overflow, WikiHow, YouTube). More specifically, I hope to use information extraction techniques to extract entities and relations; semantic parsing techniques to model procedural knowledge (e.g., code generation from natural language); and multimodal techniques to encode demonstrations (e.g., video-text embeddings). Further, I hope to design models that can customize this knowledge for different environments, which requires an interplay between language (e.g., housekeeping instructions) and the environment. I am interested in building models that can take advantage of the properties of knowledge (e.g., compositionality, transitivity) and retrieve related knowledge. The agent could then jointly reason over the knowledge, the meaning of the language, and the changing state of the environment during execution (if any). These techniques could broadly benefit tasks such as instruction following, machine reading comprehension, and question answering across different domains. In light of this, I hope to work with Graham Neubig, Yonatan Bisk, Emma Strubell, and Eduard Hovy. I believe that I will add significant value to the CMU NLP community and receive, in return, the invaluable opportunity to achieve my research goals in a unique and diverse intellectual environment. I hope one day I can proudly tell my mom, "Look mom, my ideas and efforts created your favorite product."

References

[1] Feng Nie, Shuyan Zhou, Jing Liu, Jinpeng Wang, Chin-Yew Lin, and Rong Pan. Aggregated Semantic Matching for Short Text Entity Linking. In CoNLL, 2018.
[2] Shuyan Zhou, Shruti Rijhwani, and Graham Neubig. Towards Zero-resource Cross-lingual Entity Linking. In DeepLo, 2019.
[3] Shuyan Zhou, Shruti Rijhwani, John Wieting, Jaime Carbonell, and Graham Neubig. Improving Candidate Generation for Low-resource Cross-lingual Entity Linking. In TACL, 2020 (to appear).
[4] Shuyan Zhou, Xiangkai Zeng, Yingqi Zhou, Antonios Anastasopoulos, and Graham Neubig. Improving Robustness of Neural Machine Translation with Multi-task Learning. In WMT, 2019.