Statement of Purpose Essay - MIT
Me as an HCI Researcher. Having transitioned from a career in industrial software development to HCI research, I have been deeply influenced by the work of Dr. Michel Beaudouin-Lafon. His papers highlighted a prevalent challenge in visual interfaces, the prioritization of features over usability, which has driven me to consider both user needs and societal impacts in my work [1, 5]. Currently, my research interest is human/community-AI interaction, with a specific focus on two types of work: 1) Designing interactive visual interfaces to enhance AI’s transparency and controllability for a diverse audience, including programmers, data workers, and data scientists. I am particularly interested in using NLP-driven visual interfaces to support multi-verse analysis. This involves reifying and reusing information silos gathered from interactions with AI-driven agents or web resources; 2) Designing study probes to understand how users operationalize everyday life information into concrete intentions and actions when utilizing AI. For example, I am interested in how users leverage the foraging and sensemaking loop to translate web resources into inputs for AI models, resulting in more aligned outcomes. With my independent research experience and programming skills, I aim to join MIT HCI and collaborate with experts from the Visualization Group, Haystack Group, and Usable Programming Group. My long-term goal is a career in academia or industry research after receiving my Ph.D.

Research Experiences. My first two HCI projects, supervised by Dr. Can Liu, were on natural language (NL) interfaces. RE1 The first project focused on developing a cooperative Wizard of Oz platform for simulating future speech interfaces with multiple wizards (conference publication at CSCW 2023) [3].
RE2 I was then involved in developing interaction frameworks tailored to speech input, centered on understanding speech to inform new interfaces for verbal text composition (conference publication at CUI 2023) [6]. Through these two projects, I acquired foundational skills in HCI research, including prototyping, conducting studies, and analyzing both quantitative and qualitative results.

I began to discern the delicate dance between established design principles and innovative ideas when I worked on my undergraduate thesis with Dr. Zhicong Lu. RE3 I sought to explore interventions in live streaming that encouraged prosocial community interactions without reducing viewer engagement [10]. My solution leveraged a visual narrative to proactively moderate the live streaming community, dynamically altering the storyline based on the sentiment of live chats. The approach builds on bystander intervention and persuasive design principles, which guided the design of the visual narrative and its application to live streaming moderation. It not only enhances viewer engagement but also empowers viewers to become proactive chat moderators. This project was subsequently published as my first first-author conference paper and presented at CHI 2023.

My passion for Human-AI Interaction was cultivated when I joined a thesis-based master’s program at the University of Waterloo, under the guidance of Dr. Jian Zhao. During my studies, I have worked on bridging the gap between users’ goals and AI-generated content, with a particular focus on scaffolding users across the gulfs of execution and evaluation when working with LLM-driven systems [7, 9]. My two first-author conference submissions to CHI 2024 reflect these efforts.
RE4 One paper aims to enhance the controllability and transparency of AI for programmers [11], addressing the issue that AI-generated results often diverge from programmers’ problem-solving mental models. We introduced the concept of hierarchical generation, utilizing decomposition techniques in NLP to externalize programmers’ mental models. The system further leverages a modular block design that directly links NL prompts to corresponding code segments, enabling precise modification of specific code segments through direct manipulation. While writing the paper, I spent three months iterating on the introduction, striving to emphasize the system’s core contribution. Drawing inspiration from well-written papers, I realized the key was to succinctly summarize the system in a single sentence and spotlight its unique concept. In this work, the most significant contribution lies in translating programmers’ high-level goals into concrete and actionable intentions.

RE5 The other paper discusses design mechanisms for future collaborative NL-based programming scenarios [2]. Collaborative NL programming presents challenges, as participants need to stay informed about their collaborators’ progress and intentions while simultaneously building upon each other’s work. To address this, we developed a prototype that supports collaborative prompt engineering through referring, requesting, sharing, and linking mechanisms. The results indicate that these mechanisms enhance programmers’ ability to understand and reuse their collaborators’ prompts, reducing repetitive updates and communication costs. Writing this paper was a formidable challenge, as we are pioneers in exploring collaborative NL programming. We had to convey its novelty meticulously, elucidate its practical applications, and outline its implications for future research.

Why Me?
My research philosophy can be summarized by Picasso’s words, “Learn the rules like a pro so you can break them like an artist.” This is evidenced by my tendency to build on foundational theories and then introduce innovative perspectives. A pertinent example is my adaptation of principles from “Reification, Polymorphism, and Reuse” to craft my system and implement tailored collaborative mechanisms aligned with the prompt engineering workflow. My adaptability and inquisitiveness enable me to thrive in this rapidly evolving era, and I continuously improve my mindset through self-reflection. These qualities enabled me to submit two CHI papers within eight months while managing a demanding academic load, including four courses (GPA 4/4) and three TA duties. My dedication to research has also earned me scholarships, including the International Master Award of Excellence and the Graduate Student Scholarship, in addition to my full entrance scholarship.

Driven by my passion for contributing and sharing my skills, I have actively participated in and led various programmer communities, such as the Google Developer Student Group, and served as an AWS and Microsoft Student Ambassador. In these roles, my leadership and collaboration skills have proven invaluable. These experiences have given me firsthand insights into the challenges and practical needs of the programmer community. I also co-founded a company, an experience that gave me real-world insights into innovation and entrepreneurship. In addition, I was a full-stack developer and data analyst at Hong Kong’s largest social media company and an AI research intern at Huawei Research Lab, where I cultivated strong development and prototyping abilities. I then applied these skills to lead the design and development of prototypes for my research projects.

My research interest is closely connected to my background before venturing into HCI.
For seven years, I devoted my time to volunteering as a stage actor in children’s theatre, engaging in rural education, and playing the violin in an orchestra for hospital patients. These diverse experiences converge to shape my research with a steadfast commitment to user-centred design that ultimately contributes to the well-being of communities. My diverse background and insights, along with MIT’s HCI research groups, make me well-suited for collaborative efforts that can shape the future of HCI.

Future Research & Desired Identity. My future research aims to further explore how to facilitate users’ interaction with generative AI. In this domain, I have identified two potential areas for future research:

FR1 Reifying Data Scientist-AI Multi-Verse Interactions. With LLMs serving as an alternative to conventional Question-and-Answer sites, I envisage a future where interactions between data scientists and LLMs are not ephemeral but instead sharable and reusable. Like the multiple universes of science fiction, every exchange between a data scientist and an LLM can give rise to a new ’universe’ of understanding and solutions. The embedded memories of multiverse interactions enable programmers to reuse previous LLM-driven solutions. This concept can cultivate a dynamic knowledge base that evolves with each interaction, where programmers disseminate and benefit from LLM-augmented insights.

FR2 From Web Resources to Actions. LLMs offer an alternative avenue for users to find answers. Yet, how can users effectively transform the “cues” they previously relied on into actionable prompts for LLMs? I aim to investigate the synergy between these two resources and explore the design and interaction techniques that can help users translate their information-seeking and intention-formulation processes into actionable steps.

Why MIT HCI? MIT HCI’s labs at CSAIL are renowned for their innovative advancements and dedication to creating solutions that prioritize human needs.
I am particularly drawn to three distinct yet complementary research directions within the institute. Firstly, I am interested in the research by Dr. Arvind Satyanarayan on interactive visual interfaces. I love how his papers cleverly incorporate theories derived from interaction and interface design. One example is the concept of malleable interfaces, which leverages the idea of tool re-purposing [8]. His work closely aligns with my interests (FR1), and I would be enthusiastic about collaborating with his research group to explore novel ways of interacting with AI. Secondly, I am intrigued by the research of Dr. David Karger, which focuses on empowering users to effectively manage information. I believe that his research group’s expertise could offer me valuable insights into concretizing web resources and transforming them into actionable steps for future reuse (FR2). Thirdly, I am highly interested in the work of Dr. Daniel Jackson, especially his recent contributions to developing the reactive relational model and its implementation in software design [4]. I see the MIT EECS Ph.D. program not just as a graduate school but as a collaborative ecosystem where my research visions can truly take shape, supported by the unparalleled expertise and resources available. The alignment of our shared goals not only benefits my academic pursuits but also creates opportunities for us to take the lead in future design initiatives, enriching the community and improving accessibility to AI models.

REFERENCES
[1] Michel Beaudouin-Lafon and Wendy E Mackay. 2000. Reification, polymorphism and reuse: three principles for designing visual interfaces. In Proceedings of the working conference on Advanced visual interfaces. 102–109.
[2] Felicia Li Feng, Ryan Yen, Yuzhe You, Mingming Fan, Jian Zhao, and Zhicong Lu. 2023. CoPrompt: Supporting Prompt Sharing and Referring in Collaborative Natural Language Programming.
arXiv:2310.09235 [cs.HC]
[3] Siying Hu, Hen Chen Yen, Ziwei Yu, Mingjian Zhao, Katie Seaborn, and Can Liu. 2023. Wizundry: A Cooperative Wizard of Oz Platform for Simulating Future Speech-based Interfaces with Multiple Wizards. Proceedings of the ACM on Human-Computer Interaction 7 (2023), 1–34. https://api.semanticscholar.org/CorpusID:258171523
[4] Geoffrey Litt, Nicholas Schiefer, Johannes Schickling, and Daniel Jackson. 2023. Riffle: Reactive Relational State for Local-First Applications. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology.
[5] Wendy E Mackay. 1991. Triggers and barriers to customizing software. In Proceedings of the SIGCHI conference on Human factors in computing systems. 153–160.
[6] Brinda Mehra, Kejia Shen, Hen Chen Yen, and Can Liu. 2023. Gist and Verbatim: Understanding Speech to Inform New Interfaces for Verbal Text Composition. Proceedings of the 5th International Conference on Conversational User Interfaces (2023). https://api.semanticscholar.org/CorpusID:259938727
[7] Donald A Norman. 1986. Cognitive engineering. In User Centered System Design. 31–61.
[8] Miguel A Renom, Baptiste Caramiaux, and Michel Beaudouin-Lafon. 2022. Exploring Technical Reasoning in Digital Tool Use. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–17.
[9] Hariharan Subramonyam, Christopher Lawrence Pondoc, Colleen Seifert, Maneesh Agrawala, and Roy Pea. 2023. Bridging the Gulf of Envisioning: Cognitive Design Challenges in LLM Interfaces. arXiv preprint arXiv:2309.14459 (2023).
[10] Ryan Yen, Li Feng, Brinda Mehra, Ching Christie Pang, Si-Yuan Hu, and Zhicong Lu. 2023. StoryChat: Designing a Narrative-Based Viewer Participation Tool for Live Streaming Chatrooms. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (2023). https://api.semanticscholar.org/CorpusID:258048528
[11] Ryan Yen, Jiawen Zhu, Sangho Suh, Haijun Xia, and Jian Zhao. 2023.
CoLadder: Supporting Programmers with Hierarchical Code Generation in Multi-Level Abstraction. arXiv:2310.08699 [cs.SE]