Statement of Purpose Essay - Max Planck Institute for Informatics
The field of AR/VR has grown by leaps and bounds in the last four to five years. We stand at the cusp of a new era in which we can communicate and interact with each other in a purely virtual manner. This technology has the potential to bring high-quality education and healthcare to even the remotest corners of the world. Companies like Meta and Microsoft are validating this vision by investing in ecosystems for virtual meetings. Such platforms require realistic reproduction of appearances, expressions and motions, which becomes a harder challenge in the unconstrained settings of the real world. With the rise of deep learning methods, especially neural rendering, we have been able to tackle some of these challenges. As a research scientist, I want to study and design solutions that make these platforms feasible and affordable for everyone. To take on these challenges, I am applying for the Ph.D. position in Computer Science.

I am pursuing a dual degree program (B.Tech + MS) at IIIT Hyderabad, one of the premier research institutions in India. I am indebted to the institute for providing me with the necessary training and for inculcating the attitude and work ethic required to take on research problems. Across my sophomore and junior years, I took up various courses and projects that helped me understand the field. Digital Image Processing and Statistical Methods in AI strengthened my basics. Computer Graphics opened my eyes to a whole new paradigm, and it was here that I saw Object Oriented Programming at its best. Finally, Computer Vision and Mobile Robotics introduced me to concepts such as Stereo Reconstruction and Structure from Motion. Excelling in these courses helped me understand the classical approaches and how deep learning learns better feature representations to solve the same problems more efficiently. Being nominated for the Dean’s Merit List further validated my decision to pursue research.
My three industry internships with startups gave me hands-on experience in designing tools to run deep learning algorithms on real-world data.

Since my junior year, I have worked under Prof. P. J. Narayanan at CVIT, IIIT Hyderabad. It was here that I was formally introduced to image-based rendering. My project, titled “Neural view synthesis and appearance editing from unstructured images”, focused on extending neural rendering methods to support appearance editing. The goal was to edit the texture of objects captured by smartphone cameras. Previous methods baked lighting and material together, so the challenge was to disentangle them effectively so that the user could edit them easily. Taking inspiration from “Precomputed radiance transfer”, we estimated the local irradiance field and albedo and projected them to a compact basis representation, in this case Spherical Harmonics. Estimating the albedo separately using differentiable rendering techniques gave rise to structured artifacts on real scenes. Instead, the method jointly estimated the local irradiance field along with the albedo, which removed those artifacts and improved visual quality. This work was published at ICVGIP 2021, one of the top computer vision and graphics conferences in India. My major takeaway was not the rendering equations or the spherical harmonics math but how important it is to keep calm and rework the details when we have overlooked something extremely important. At that critical juncture, my partners guided me to rethink and recalibrate in order to find solutions. At the eleventh hour, I designed and added the missing component in time to meet the submission deadline. As this was my first research paper, it was an enthralling experience that taught me how to critically evaluate each experiment and communicate ideas effectively.

In my senior year, I interned under Prof. Jean-François Lalonde at Université Laval.
The project’s objective was to produce indoor room tours from omnidirectional images captured of the scene. This was an opportunity for me to extend my neural rendering knowledge and explore the emerging world of implicit representations. Neural radiance fields were typically used to encode objects or small scenes, but in this case we tried to encode an entire room. This came with its own challenges, as omnidirectional images are spherical in nature and hence a pixel does not correspond to one single point in space. We tried to tackle such issues during post-processing by modeling the correction as an image-to-image translation task, which improved the sharpness and overall quality of the results. What surprised me most was that, despite masking out the person capturing the images, the network was able to reconstruct the scene effortlessly and model reflections from mirrors accurately. This was a delightful experience, as it gave me insight into how research is conducted at different labs across the world. Furthermore, it highlighted the importance of software engineering skills in becoming a good deep learning researcher.

Currently, for my master’s thesis, I am collaborating with my advisor and Prof. Lalonde on extending the above-mentioned project. In addition to novel view synthesis, the aim is to capture the radiance of the scene. This will enable us to add synthetic objects while accurately modeling their interaction with the lighting. One application of such a system would be taking a room tour and adding synthetic furniture to the scene. We aim to submit this work to an upcoming venue soon.

In addition to my research work, I served as a Teaching Assistant for Computer Vision and Statistical Methods in AI over three semesters, where I designed assignments and projects for more than 200 students. I conducted tutorial sessions on selected topics and mentored a few teams on their final projects.
The evaluation was intensive and based on rubrics I designed myself to measure the learning outcomes of the course. As the son of two academics, I have observed them closely since my childhood, and that gave me a lot of confidence in fulfilling this role.

A major issue with photorealistic neural rendering is the lack of high-quality data. While we are able to capture high-quality assets using specialized equipment such as Kinect sensors, LiDAR and light stages, these are not accessible to the rest of the world, who have access to smartphone cameras at best. It is thus critical to leverage deep learning in such a way that we can interact with scenes rendered from unlabeled and unstructured images. Current methods are able to reconstruct texture and geometry from images as well as reliably correct noisy camera parameters. However, a lot of work remains before these scenes can be edited interactively. This is true for both object and human reconstruction. Editing material and lighting independently is a hard problem since the rendering equation intertwines them. Using implicit representations and deep learning methods, we can disentangle them to enable editing. I would like to solve these problems and hopefully make it possible to edit, in real time, the lighting and textures of an entire scene containing people interacting with different objects.

[Add college specific paragraph here]

Pursuing doctoral research under this program will provide me with the much-needed experience to take on such unstructured and under-constrained problems. The democratization of AI has led to immense progress in the field and empowered people to solve problems in areas such as eCommerce, biology, and many more. We stand at the same crossroads for AR/VR, and I aim to push the field in a direction where producing “metaverse” content does not require elaborate equipment and setup.
I believe my current project and my past experiences have enabled me to understand how to model problems as inverse rendering problems. They taught me that it is not about how fast you can solve these problems or how well you can scale, but rather how wild your ideas can get. Research, for me, is an extension of that. While I might get stuck on some sub-problems for weeks, working on these projects propels me to be creative and try unconventional ideas. Sometimes, however, the simplest algorithms do the trick.

The rigorous standards of IIIT Hyderabad have taught me that true intellectual contribution is only possible through perseverance, determination, and a ruthless eye for weakness in both experimental design and execution. Balancing laboratory workloads and teaching assistantships with a full schedule of undergraduate classes has been a taxing endeavor, but this too has been essential to my growth as a researcher. Having worked on appearance editing and image-based rendering, I feel that I would be able to make telling contributions to the field and develop algorithms that make it far more feasible to create AR/VR content. Today, I look forward to the new intellectual challenges that a doctoral program will provide, and I am sure that I will discover new passions, curiosities, and questions as I prepare for my future as a research scientist.