Statement of Purpose Essay - Stanford University

Program: PhD, HCI, Data Science
Type: PhD
License: CC BY-NC-SA 4.0
Source: Public Success Story (Hancheng Cao)

Statement of Purpose by Hancheng Cao
Applying to CS@Stanford
Fascinated by the ancient Greek aphorism “Know thyself,” I have always been passionate about understanding the patterns and uniqueness of human behavior. Despite the efforts of generations of scholars, our understanding of human desires, motivations, and behaviors remains fuzzy and incomplete. The recent emergence of big data is beginning to change this situation, allowing researchers to study human behavior from something like a god’s-eye perspective. However, efficient and effective methods for managing large-scale, high-dimensional, and heterogeneous human digital traces, and for discovering valuable knowledge from them, are still in their early stages. The opportunity to lay the groundwork here is one of several factors that make the field of data science especially attractive to me.
My interest has led me to many exciting research projects during my academic career at Tsinghua University, the University of Maryland, and the MIT Media Lab. My enthusiasm for data science has driven me to seek innovative methods and novel frameworks. On the theoretical side, I have been working to learn concise representations of unstructured, high-dimensional human digital records (e.g. spatio-temporal data and transaction data). On the application side, I have been analyzing human behavior patterns (e.g. mobility, purchasing) and city structure, which enables a deeper understanding of individuals and society. In graduate school, I would like to continue my path in data science in both theory and application. I am particularly passionate about developing state-of-the-art methodology to represent human physical and cyber behavior within social networks, discovering patterns and anomalies in human digital trajectories, and leveraging crowdsourcing to make ubiquitous human behavior analytics possible.
I believe that the CS PhD program at Stanford would provide me with a strong foundation to fulfill my dream of becoming a data scientist.
My journey with data science “officially” kicked off at the University of Maryland, College Park (UMD), when I was fortunate enough to be selected as Tsinghua’s first exchange student to UMD in fall 2016. With UMD Distinguished Prof. Hanan Samet and Prof. Yong Li of Tsinghua, I began a project on human mobility modelling using spatio-temporal big data, in which I proposed a novel visual analytics framework for studying fine-grained crowd mobility patterns via mobile data. Taking advantage of a week-long dataset covering nearly a million users in Shanghai, I was able to discover spatial correlation rules and a hidden city structure from crowd mobility, which offered urban planners valuable insight from a new perspective. This work resulted in a paper [3], submitted to TVCG, with me as first author.
The success of the crowd mobility project led to further collaboration with Prof. Hanan Samet following my return to Tsinghua. Further pursuing my interest in spatio-temporal data analytics, I was attracted to the emerging semantic-rich trajectory data from social media (e.g. Twitter), which not only records users’ locations but also preserves their activities through user check-ins. Realizing the great potential of semantic-rich trajectory data for interpreting the motivations behind human mobility, I proposed a new project on human living pattern recognition, which was submitted as my second first-author paper to UbiComp [1], a premier conference in ubiquitous computing. In this project, I tried to identify the major living patterns in a population. Observing that past research fails to dynamically capture fine-grained semantics, such as the difference between visiting a residence during working hours and at night, I was determined to develop a more powerful model.
Drawing inspiration from natural language processing, I proposed habit2vec, a framework based on deep representation learning that learns a numeric vector for each user’s living habits in an unsupervised way. Using a large-scale dataset from Beijing provided by Tencent, a leading Chinese application vendor, I clustered the habit2vec vectors to reveal the major living patterns in the population, along with their proportions and spatial distributions, which helps us better understand the city’s socio-economic structure.
Meanwhile, I co-authored two papers: one on detecting popular temporal modes in the population, which has been conditionally accepted at UbiComp 2018 [6]; the other on a representation-learning-based, semantics-aware Hidden Markov Model for mobility modelling [5]. As is often the case, progress in one project turned out to be extremely valuable to the other two, which led me to appreciate the power and joy of collaboration.
I was able to further extend my ideas on representation learning to other types of human behavior analytics. In the summer of 2017, I joined the Human Dynamics Group at the MIT Media Lab as a visiting student and research assistant with Prof. Alex ‘Sandy’ Pentland, a member of the National Academy of Engineering, and Prof. Xiaowen Dong. Leveraging a large-scale credit card transaction dataset, I sought to identify groups of purchasing patterns in order to understand the city’s economic structure and support targeted advertising. On the methodological side, I extended habit2vec with Monte Carlo simulation to tackle the sparsity and randomness of purchasing data; by treating each user’s shopping behavior as a stochastic process, this yields high-quality purchasing embedding vectors for users. Furthermore, I conducted extensive studies on the relationships between purchasing behaviors and socio-demographic factors (e.g. age, gender), churn patterns, and social learning, and obtained statistically significant results.
Thus, this ongoing project has let me experience the great excitement of data science in bridging algorithms and social science.
In addition to this line of projects on pattern mining, I have been broadening my research perspective by participating in other areas of data science. With Prof. Vassilis Kostakos of the University of Melbourne and Prof. Yong Li of Tsinghua, I recently worked on location uniqueness and urban morphology, which led to a paper submitted to UbiComp 2018 [2]. Furthermore, I collaborated on another project analyzing e-commerce clickstream data during a shopping holiday sale, with a paper [4] accepted by Springer Electronic Markets.
Through my research experience in data mining and computational social science, I have gained broad knowledge of this field: data extraction and formatting, the design of algorithms for handling massive datasets efficiently and effectively, and visualization skills for presenting my findings clearly. Above all, I have developed my ability to define truly valuable research topics, and learned how to seize a fleeting idea and turn it into systematic research. After two years of immersion in data science research, I am more determined than ever to become a data scientist for social good, and I will spare no effort to realize my dream, however the path may twist and turn.
At Stanford, there are quite a few faculty members whose research interests align well with mine. I am especially interested in Prof. Jure Leskovec’s focus on mining social and information networks (many of his works closely match my interest and experience in representation learning for human behavior understanding), Prof. Michael Bernstein’s research on social computing (e.g. troll behavior) and crowdsourcing, and the projects of Prof. Leonidas Guibas’s group on mobility data analytics.
With my background in data science and our shared goals, I believe in my ability to succeed in the CS PhD program and in my potential to contribute to Stanford’s groundbreaking research for a better future.
References:
[1] H. Cao, F. Xu, J. Sankaranarayanan, Y. Li, H. Samet. Habit2vec: Trajectory Semantic Embedding for Living Pattern Recognition in Population. Submitted to 2018 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2018). Under review.
[2] H. Cao, J. Feng, Y. Li, V. Kostakos. Uniqueness in the City: Urban Morphology and Location Privacy. Submitted to 2018 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2018). Under review.
[3] H. Cao, J. Sankaranarayanan, J. Feng, Y. Li, H. Samet. Understanding Metropolitan Crowd Mobility via Mobile Cellular Accessing Data. Submitted to IEEE Transactions on Visualization and Computer Graphics (TVCG). Under review.
[4] M. Zeng, H. Cao, M. Chen, Y. Li. User Behavior Modeling, Recommendations, and Purchase Prediction During Online Shopping Festivals. To appear in Springer Electronic Markets (EM).
[5] H. Shi, H. Cao, X. Zhou, Y. Li, C. Zhang, V. Kostakos. Semantics-Aware HMM for Human Mobility Modelling. Submitted to the Web Conference 2018 (WWW’18). Under review.
[6] F. Xu, T. Xia, H. Cao, Y. Li, F. Sun, F. Meng. Detecting Popular Temporal Modes in Population-scale Unlabelled Trajectory Data. Conditionally accepted under major revision at 2018 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2018).