Statement of Purpose Essay - University of North Carolina at Chapel Hill
I want to pursue a Ph.D. in Computer Science at the University of North Carolina at Chapel Hill to conduct research in label-efficient deep learning methods for computer vision. While semi- and self-supervised learning paradigms have shown great promise on classification tasks, their application to other vision tasks remains limited, especially where acquiring ground truth is arduous and demands expertise, as in video understanding, action recognition, and multi-modal learning. This forms my primary research motivation, building on my background in learning with less supervision.

I am pursuing a Bachelor's degree (B.E.) in Information Technology at Jadavpur University, a premier educational institution in India, which has given me a strong foundation and the work ethic to take on research problems. Excelling in courses such as Data Structures, Algorithms, and Object-Oriented Programming cemented my Computer Science fundamentals. Additionally, I have taken online courses offered by various universities on machine learning, image processing, and computer vision, which helped me develop a fundamental understanding of these topics and provided a base for conducting research.

My research journey started in my sophomore year under Prof. Ram Sarkar and Prof. Pawan Kumar Singh, working on deep learning and evolutionary optimization for medical imaging. I developed a local-search-embedded hybrid meta-heuristic scheme to optimize deep features extracted from lung X-ray images for pneumonia screening. Witnessing the potential of deep learning in applications of real-world significance was a huge motivational boost and encouraged me to continue researching along such lines. We published our work in the International Journal of Intelligent Systems, Wiley (IF: 8.993).

During Summer '21, I worked as a research assistant at SketchX Lab, University of Surrey, under Prof.
Yi-Zhe Song, on a project on fine-grained sketch-based image retrieval. This was the first time I pondered the annotation burden of a vision task: each image needed to be paired with a corresponding finely detailed, hand-drawn sketch. A deeper inspection revealed the extensive annotation requirements of supervised models as one of the primary bottlenecks of modern deep learning, something I had previously been oblivious to. This realization motivated me to explore research on learning with limited labels, including semi-/self-supervision and weak supervision, and led me to dive into classical self-supervised algorithms and how they enable automatic representation learning from unlabeled data.

At first, I was keen on observing how self-supervision can enhance representation learning in supervised models. In a continued collaboration with Prof. Singh, we devised a hybrid framework coupling an image inpainting module with a supervised pipeline for a histopathological screening task. Our model outperformed existing SOTA methods by fair margins, and through suitable ablation studies we empirically attributed the performance boost to the additional self-supervision signal. Our work was accepted for publication at the International Conference on Machine Intelligence and Signal Processing (MISP '22), winning the distinguished "Best Paper Award" (Track: Deep Learning) at the event.

Starting Winter '21, I joined the CVPR Unit, Indian Statistical Institute, Kolkata as a research intern, supervised by Prof. Umapada Pal (ISI Kolkata) and Prof. Saumik Bhattacharya (IIT Kharagpur). My project goal was to develop self-supervised models for writer-independent offline signature verification. Having worked with sparse sketches before, I found the similarly fine-grained nature of handwritten signatures intuitive, which directed our focus towards local, region-wise feature learning.
In my first work, we employed a patch-wise 2D attention mechanism to reconstruct a signature image from its constituent patches as a self-supervised pretext task, followed by fine-tuning the pre-trained model with a dual metric-learning objective. We published this work at the International Conference on Pattern Recognition (ICPR '22). In another work, we developed a non-contrastive self-supervised framework that learns decorrelated stroke representations of signature images via information maximization among patch-level features. This work was accepted as an oral presentation at the International Conference on Image Processing (ICIP '22).

These projects offered several takeaways. First, they involved rigorous literature surveys and testing of prior works, especially in self-supervised learning, in order to devise different approaches to the problem statement. They also taught me to meticulously plan experiments that are feasible within time constraints, especially when addressing reviews, since conferences have shorter and stricter deadlines than journals. Most importantly, the first-hand experience with self-supervision greatly strengthened my motivation to develop annotation-efficient frameworks, and I became more enthusiastic than ever to dive further into its theoretical and practical aspects.

Continuing our collaboration, I am working on a similarity-weighting mechanism based on temperature variation in contrastive learning for class-imbalanced data setups, and on an unsupervised image decomposition framework to unify all types of image restoration (denoising, dehazing, etc.). We aim to submit these works to upcoming top-venue conferences.

In Summer '22, I was selected as a research intern as part of the Mitacs Globalink Programme. I worked with Prof.
Jose Dolz at École de technologie supérieure (ÉTS), Montreal, on unpaired multi-modal medical image segmentation. Our project aimed to circumvent the registration step in multi-modal learning from CT and MR volumes and to improve segmentation performance in each modality by fusing cross-modal information from unpaired medical scan volumes. This was an opportunity to extend my medical imaging experience to a more practical and challenging setup involving raw medical volumes rather than carefully curated 2D slices. It took me a while to become acquainted with handling 3D medical scans, especially their pre-processing, sampling, and visualization. Following the paper "Unpaired Multi-modal Segmentation via Knowledge Distillation", we implemented a baseline framework comprising modality-specific normalization layers and knowledge distillation for cross-modal information fusion.

The three months in Montreal were delightful: they not only led to insightful discussions with several senior Ph.D. students about their doctoral journeys but also gave me a taste of how research is conducted at labs across the world. Furthermore, the experience instilled in me the confidence to collaborate with international students in a diverse lab environment, something essential to succeeding in a Ph.D. program. Currently, Prof. Dolz and I are working on extending the above project to a semi-supervised setting via an uncertainty-based approach. Ours would be among the first works to tackle unpaired multi-modal segmentation with limited annotations, and we aim to submit it to an upcoming venue.

The entire process of conducting research, from brainstorming ideas and designing proofs of concept to experimental validation and finally disseminating the findings through a compelling manuscript, has been extremely fulfilling, catalyzing my passion to conduct research at the highest level.
Through my various research experiences, I have realized the significance of label-efficient methods such as semi-supervised and self-supervised paradigms in developing more sustainable forms of representation learning. Although my research background spans diverse topics, my objective has consistently been to reduce supervision needs and instead develop weakly-/semi-/self-supervised frameworks, as my research outputs and directions show. In my Ph.D., I intend to work further on label-efficient representation learning and contribute to the field of computer vision.

At UNC, I am interested in the work of Prof. Gedas Bertasius, Prof. Soumyadip Sengupta, Prof. Tamara Berg, and Prof. Mohit Bansal. My research interests align with Prof. Bertasius' work on video understanding under weakly- and self-supervised paradigms. Specifically, I would like to build on his paper "Long-Short Temporal Contrastive Learning of Video Transformers", possibly extending it to non-contrastive approaches as well. I am also highly motivated by Prof. Sengupta's work on video editing and graphics, especially given my experience in image restoration; the ability to disentangle the various components of an image and work on them individually strikes me as both interesting and significant for image enhancement and editing. Furthermore, I am keen to explore multi-modal vision-language work aimed at multimedia retrieval and VQA, for which the guidance of Prof. Berg and Prof. Bansal would be immensely helpful. I feel my research interests make me a great fit for each of their research groups, and I am also eager to collaborate with other faculty members on problems of mutual interest. My decision to pursue a Ph.D. is guided by a passion for scientific inquiry and a career in academic research.
The strong alignment of my research motivations with the eminent faculty of the University of North Carolina has made it a top choice for pursuing my Ph.D. and fulfilling the research aspirations that serve my long-term goal of becoming a researcher in academia. I am confident that I would fit well into the vibrant student community at UNC, and I look forward to engaging in fruitful discussions with my peers and contributing to my collaborating research groups.