Statement of Purpose Essay - Princeton
Statement of Purpose - Princeton CS
Zexuan Zhong (zexuan2@illinois.edu)

I am currently a fully funded Master’s student at UIUC under the supervision of Professor Tao Xie. I want to pursue a Ph.D. in Computer Science, and my career aspiration is to become a professor. I am interested in Machine Learning and Natural Language Processing (NLP). I have contributed to publications in EMNLP’18, AAAI’18, and HotNets’18, and to a submission to EuroS&P’19. I enjoy taking part in programming contests to strengthen my algorithmic background: I have won three Gold Medals in ACM-ICPC Regionals and will compete in the ICPC World Finals in April 2019. I am also honored to have been named a Siebel Scholar this year (one of 5 awardees among all CS and ECE graduate students at UIUC).

My motivation to pursue a Ph.D. comes from my experience working on several substantial research projects in groups at Peking University, UIUC, MIT, and Microsoft Research. I have worked on two NLP-related projects, which resulted in two first-author papers in EMNLP’18 and AAAI’18. I also have research experience in adversarial machine learning and in applying machine learning to wireless systems. These experiences have given me great opportunities to explore my research interests and sharpen my research skills.

NLP-related Projects

During my internship in the Big Data Mining Group at Microsoft Research Asia, I became familiar with machine learning by studying and implementing many relevant algorithms. I led a research project whose objective was to link identical users across different social networks. My work mainly addressed two challenges of this problem. First, aligning user attributes is very hard because attributes from different networks are usually defined and formatted very differently. Second, collecting manually linked user pairs is infeasible due to high cost and/or user privacy issues.
I developed an unsupervised co-training framework for this problem and designed a measure for aligning attributes based on sequence-to-sequence learning. I evaluated my approach on real data, and ultimately 140K Microsoft employees were automatically linked to their LinkedIn accounts. I published this work in AAAI’18 [1]. This was the starting point of my machine-learning research experience, and I have pursued my passion for research ever since.

With Professor Tao Xie in the Automatic Software Engineering Group at UIUC, I have been working on a subtask of program synthesis and semantic parsing: generating regular expressions from natural language specifications. In this project, I first collected a real-world dataset from a regular expression library and evaluated state-of-the-art approaches on it to develop insights into this topic. From this empirical study [2], which I published in an AAAI’18 workshop, I observed that existing approaches optimize a syntax-based objective, whereas the task objective is to generate any semantically correct regular expression. I found that this inconsistency between the training objective and the task objective hurt the final performance. To address the problem, I used test-generation techniques to measure the semantic correctness of a given regular expression, and used reinforcement-learning techniques to maximize a semantics-based objective. This approach outperformed previous approaches, and I published a paper at EMNLP’18 [3].

Other Machine Learning Projects

I also applied machine learning to wireless systems during my summer internship at MIT Media Lab, working with Professor Fadel Adib. In this project, we focused on a real-world problem: can we sense food quality and safety using wireless signals? One of my major contributions to this project was developing a machine-learning system that analyzes the wireless signal and determines whether the food has any quality or safety issues.
Our system has enabled people to accurately measure the quality and safety of baby formula and alcohol. Our work has been published at HotNets’18 [4].

Most recently, I have started working on improving the robustness of deep learning models against adversarial examples. Adversarial examples are slightly perturbed inputs that can easily mislead deep learning systems. The key idea of the project is to use multiple complementary models, across which adversarial examples cannot transfer, to detect or defend against adversarial attacks. To build such complementary models, I have investigated using adversarial training and adding well-designed regularization terms to the objective function. We have submitted a paper to EuroS&P’19 [5]. Currently, I am working on training certifiable complementary models. The idea is to solve a min-max game that guarantees complementarity among the models. We approximate this non-convex problem by a linear program and have proposed an efficient approach to solve it.

Research Interests

I am interested in various topics within NLP (e.g., semantic parsing, representation learning, QA systems) and in adversarial machine learning, both of which lie at the intersection of my passion and my capabilities. Several professors at Princeton work on projects that are especially appealing to me: Professor Danqi Chen and Professor Karthik Narasimhan (NLP), and Professor Prateek Mittal (adversarial machine learning). After reading several papers from each of these groups, I see a clear fit between my skills and interests and Princeton, and I am confident that it is a great place for me to pursue a Ph.D.

Publications

[1] Z. Zhong, Y. Cao, M. Guo, and Z. Nie. CoLink: An Unsupervised Framework for User Identity Linkage. In Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018.
[2] Z. Zhong, J. Guo, W. Yang, T. Xie, J.-G. Lou, T. Liu, and D. Zhang. Generating Regular Expressions from Natural Language Specifications: Are We There Yet? In AAAI-18 Workshop on NLP for Software Engineering (NL4SE), 2018.
[3] Z. Zhong, J. Guo, W. Yang, J. Peng, T. Xie, J.-G. Lou, T. Liu, and D. Zhang. SemRegex: A Semantics-Based Approach for Generating Regular Expressions from Natural Language Specifications. In 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
[4] U. Ha, Y. Ma, Z. Zhong, T.-M. Hsu, and F. Adib. Learning Food Quality and Safety from Wireless Stickers. In Seventeenth ACM Workshop on Hot Topics in Networks (HotNets), 2018.
[5] S. Srisakaokul, Z. Zhong, Y. Zhang, W. Yang, and T. Xie. MULDEF: Multi-model-based Defense Against Adversarial Examples for Neural Networks. In 4th IEEE European Symposium on Security and Privacy (EuroS&P), 2019 (under review).