Statement of Purpose Essay - UC Santa Barbara

Program: Ph.D., NLP, Speech
Type: Ph.D.
License: CC BY-NC-SA 4.0
Source: Public Success Story (Michael Saxon)

My aim is to pursue a Ph.D. in Computer Science and become an impactful professor or industrial researcher in natural language processing (NLP) and representation learning. Through major participation in four research groups as an undergraduate in Electrical Engineering and a master's student in Computer Engineering at Arizona State University, I have built the skills I will need to succeed in my Ph.D. studies. My work has produced tangible results: I have published in major conferences three times, once as first author [1, 3, 4], presented an oral at an AAAI workshop [2], have one first-author journal manuscript currently under review [5], and will present at another workshop next February [6]. My motivations for pursuing research in NLP are twofold. First, I enjoy the process and fruits of academic research generally; I like toying with new ideas, formulating experiments, and taking ambitious projects all the way from ideation to publication. Second, I believe NLP is a deeply important research area that will only grow more so in the future. In light of my experience, my motivations, and my career aspirations, completing a Ph.D. in Computer Science at UC Santa Barbara is the natural next step.

I had my first exposure to serious engineering research in Dr. Hongbin Yu's Nanoelectronics and Integration Lab. I developed software to assess the thermal strain of semiconductor packages by measuring the scattering pattern of a laser incident on diffraction gratings deposited on the chips. My synchronous scanning and capture control system enabled a 10-fold increase in strain measurement speed as we pushed precision from the μm scale to the nm scale. Our paper on the technique won the Best Student Interactive Paper award at IEEE ECTC in 2016 [1], and the project has driven my collaborator's dissertation; the experience cemented my interest in research.

I then joined the "Luminosity Lab," a strategic initiative devised by the ASU president's office and director of research Dr. Sethuraman Panchanathan to serve as a model for student-led, skunkworks-style, impact-focused interdisciplinary research. I am most proud of the time I led a team of lab members in entering the Computational Linguistics-Affect (CL-Aff) Shared Task at the 2nd Workshop on Affective Content Analysis at AAAI 2019. Our word-pair convolutional model for semantic classification achieved the second-best performance among 47 submitted models, and I gave an oral presentation on our technique at the workshop [2].

For my undergraduate honors thesis, I wanted to learn about speech processing, so I reached out to Dr. Visar Berisha and Dr. Troy McDaniel to form my committee. Dr. Berisha's lab includes both EE and Speech and Hearing Science students, and Dr. McDaniel is a CS research professor in Dr. Panchanathan's Center for Cognitive Ubiquitous Computing (CUbiC) who focuses on human-computer interaction and assistive technologies. My project centered on hypernasality classification. Hypernasality is a symptom of various disorders, ranging from cleft lip and palate to Parkinson's disease and ALS, caused by an inability to achieve full or consistent closure of the soft palate between the oral and nasal cavities. Because hypernasality is a characteristic symptom of several neurological diseases, its assessment holds promise for tracking the early progression of neurological disease. However, it is difficult to measure automatically: a good hypernasality measurement system must estimate the nuanced perceptual judgments of trained pathologists.
Creating better automatic measures of hypernasality became the core focus of my research. First, I evaluated adapting the existing ASR-based "goodness of pronunciation" algorithm for hypernasality assessment. Building on this work, I created a novel metric called "nasal cognate distinctiveness" that captures the subtle perceptual changes to specific plosive phonemes under hypernasality, using acoustic models trained exclusively on healthy speakers with Kaldi. The metric models the inability of a speaker exhibiting hypernasality to achieve a proper seal in the oral cavity when producing plosives; I presented it at ICASSP 2019 [3]. I then integrated a colleague's work on voiced phone nasalization modeling with these features to produce a combined model that achieves state-of-the-art performance in estimating clinician hypernasality ratings in a disorder-agnostic manner, and wrote a manuscript on it that is currently under review at the IEEE/ACM Transactions on Audio, Speech, and Language Processing [5].

A core problem I have faced in this work is the scarcity of disordered speech training data. Large corpora of such speech are hard to come by, limiting the effectiveness of sophisticated deep learning techniques in pathological speech processing. My research trajectory has therefore shifted toward finding ways around this scarcity, and I am now investigating semi-supervised representation learning for that purpose. My master's thesis applies self-supervised techniques from computer vision and text processing to hypernasality assessment and articulatory inversion (the direct inference of the positions and velocities of the various muscles of the mouth and vocal tract from speech audio). In particular, I am exploring neural ASR models such as Wavenet-ASR as feature extractors for these tasks; through techniques like these, I hope the vast amount of unlabeled speech data available can be harnessed to overcome data scarcity.

Alongside my own work, I have also driven concrete improvements in my friends' and colleagues' projects. I helped another speech-focused CUbiC student with neural network development and manuscript writing for a submission characterizing the performance of ASR systems on dysarthric speech and modeling word error rate with neural networks; she was invited to present this work in an oral session at Interspeech 2019 [4]. With Dr. McDaniel, I am participating in exciting work on haptic interfaces and affective computing, building social assistive technologies for the visually impaired, through which I have learned to design and conduct user studies. I have even returned to help Dr. Yu with his current work on vehicular LiDAR processing, where I helped design a one-million-frame dataset for semi-supervised LiDAR scene representation learning and wrote its PyTorch data loaders, compression code, and model designs [7].

Finally, after presenting [2] at AAAI 2019, I met with scientists from Amazon's Alexa Hybrid Science group, a team in which speech processing and language understanding researchers work closely together to produce scalable models for on-device, offline processing in Alexa products. Given my background in both speech and language processing, they hired me as an Applied Science Intern for summer 2019.
On this team I investigated the outlook for deploying neural models for fully end-to-end, on-device spoken language understanding (SLU). Through the experiments I ran, I identified issues in the applicability of current state-of-the-art SLU techniques to Alexa problems and demonstrated that n-gram-to-intent-class entropy, scaled by sample count, is inversely related to class-wise intent classification performance. My colleagues are submitting work that followed from these findings to ACL 2020.

I am most interested in work that builds fundamental machine capabilities in understanding language. I hope to leverage my background in deep learning, representation learning, and speech processing to produce consequential research in this domain, and I am open to any research in NLP, representation learning, or deep learning more broadly. In light of my experience, interests, and background, a Ph.D. in Computer Science at the University of California, Santa Barbara is the natural next step in my career.

[1] T. Houghton, M. Saxon, Z. Song, H. Nyugen, H. Jiang, and H. Yu, "2D Grating Pitch Mapping of a through Silicon Via (TSV) and Solder Ball Interconnect Region Using Laser Diffraction," 2016 IEEE 66th Electronic Components and Technology Conference (ECTC), Las Vegas, NV, 2016, pp. 2222-2227.
[2] M. Saxon, S. Bhandari, L. Ruskin, and G. Honda, "Word Pair Convolutional Model for Happy Moment Classification," Workshop on Affective Content Analysis, AAAI 2019, Honolulu, HI, 2019, pp. 111-119. (CL-Aff Shared Task Runner-Up.)
[3] M. Saxon, J. Liss, and V. Berisha, "Objective Measures of Plosive Nasalization in Hypernasal Speech," 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 6520-6524.
[4] M. Moore, M. Saxon, H. Venkateswara, V. Berisha, and S. Panchanathan, "Say what? A dataset for exploring the error patterns that two ASR engines make," Interspeech 2019, Graz, AT, 2019, pp. 2528-2532.
[5] M. Saxon, A. Tripathi, Y. Jiao, J. Liss, and V. Berisha, "Robust Estimation of Hypernasality in Dysarthria," preprint, November 2019. arXiv:1911.11360. (Under review, IEEE/ACM Transactions on Audio, Speech, and Language Processing.)
[6] M. Saxon, J. Liss, and V. Berisha, "A New Model for Objective Estimation of Hypernasality from Dysarthric Speech," Workshop on Signal Analytics for Motor Speech (SAMS), Motor Speech Conference 2020, Santa Barbara, CA, February 2020. (Accepted.)
[7] Dataset information available at https://asulidarset.github.io/