
Statement of Purpose Essay - MIT

Program: PhD, ML, Theory
Type: PhD
License: CC BY-NC-SA 4.0
Source: Public Success Story (Chanwoo Park)

My main research interest is deep learning (DL) theory. While participating in several DL competitions in computer vision and NLP, I was deeply puzzled by how such heavily over-parameterized models generalize so well. I organized a seminar on over-parameterization and generalization, though I could not find a satisfactory answer. I aspire to attack this problem through a mathematical and statistical lens and thereby contribute to the paradigm shift under way in this field. At MIT, my investigation will center on DL theory, where I intend to address questions including the following. What is the relation between robustness and over-parameterization? Why do over-parameterized networks work so well with standard deep learning techniques such as dropout and data augmentation, and do these techniques improve robustness or preserve differential privacy? Why do stochastic policy gradient algorithms perform worse than deterministic policy gradients? How can we elucidate the relationship between neural networks and causality? These problems are closely tied to stochastic optimization and careful statistical modeling. My research experience with the convergence analysis of optimization methods and computer-assisted proof methodology is a cornerstone for pioneering this line of work, and my familiarity with both statistical models and several DL models provides the essential groundwork. I am confident in my theoretical foundations in mathematics, statistics, and computer science, and I believe that by collaborating actively with professors at MIT, I will be able to tackle fundamental problems in DL.

**Research Experience.** My research with Professor Ernest Ryu at Seoul National University (SNU) focused on analyzing optimal algorithms with a computer-assisted proof method. To understand the principle behind acceleration methods, I constructed several unusual Lyapunov functions [2, 4]. I developed a novel algorithm, FISTA-G [2], the first acceleration method to achieve the theoretical lower bound in the composite minimization setup. At first I believed the known lower bound could not be attained, but attempts to construct counterexamples with the computer-assisted proof methodology hinted that achieving it might be possible. I therefore tried several Lyapunov functions, eventually discovered FISTA-G, and proved its convergence. This paper was accepted at NeurIPS 2021. Ironically, the failed attempts at constructing a Lyapunov function gave me a thorough understanding of the convergence analysis of algorithms, and I am confident I can leverage this experience in the convergence analysis of DL. Subsequently, I turned to generating optimization algorithms with the computer-assisted proof methodology. During the previous research, I had questioned the one-way direction of the analysis, which passes from an acceleration method to the true inequalities certifying its convergence. I developed a novel notion, the A*-map [1], which maps "good" inequalities back to optimization methods; by exploiting "good" properties of inequalities, one can design algorithms with specific desired characteristics. Using the A*-map with computer-assisted proofs, I discovered new state-of-the-art algorithms in the randomized-coordinate and backtracking line-search settings. This paper has been submitted to JMLR. Since DL theory also needs quantitative, fine-grained approximations to pass from infinite regimes to finite regimes, I plan to conduct both qualitative and quantitative analyses of DL theory using computer-assisted methods.
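To give a concrete sense of the Lyapunov-function arguments mentioned above, here is a textbook potential-function proof sketch for plain gradient descent; it is a standard illustrative example, not taken from the cited papers:

```latex
% Gradient descent x_{k+1} = x_k - (1/L) \nabla f(x_k) on an
% L-smooth convex f with minimizer x^\star. The potential
\[
  V_k = k \bigl( f(x_k) - f(x^\star) \bigr)
        + \frac{L}{2} \, \lVert x_k - x^\star \rVert^2
\]
% is nonincreasing: V_{k+1} \le V_k follows from only the smoothness and
% convexity inequalities. Telescoping V_k \le V_0 then gives the O(1/k) rate
\[
  f(x_k) - f(x^\star) \le \frac{L \, \lVert x_0 - x^\star \rVert^2}{2k}.
\]
```

Computer-assisted (performance-estimation) approaches automate exactly this kind of search: candidate potentials are treated as unknowns constrained by such inequalities, and small semidefinite programs find or refute them.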
The intensive research in the Bioinformatics and Biostatistics Lab (BIBS) at SNU with Professor Taesung Park was valuable from a practical perspective. My first paper in BIBS [5] used propensity score matching to isolate the pure additive contribution of genetic variants (SNPs) to a risk prediction model. I then developed DeepHisCoM [3], a pathway analysis method using DL. Because I modeled pathways as hidden variables, I faced great difficulty discerning a hidden variable's significance in a deep learning model. I addressed the problem with two design choices: (1) a small network between the biological factors and each pathway, to capture nonlinearity, and (2) an assumed linear relationship between pathways and disease, for explainability. DeepHisCoM thus accommodates both non-linear relationships and explainability. While conducting this research, several questions arose: what is the relationship between causality and DL, and how can the hierarchy of data contribute to modern DL architectures? The work in BIBS was invaluable because I expounded on DeepHisCoM comprehensively, in both theory and experiment, by constructing suitable simulations to validate the model; as a researcher, I will likewise design appropriate simulations to illustrate theoretical results.
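The two design choices above translate directly into an architecture. Below is a minimal PyTorch sketch of a DeepHisCoM-style model; the class name, layer sizes, and pathway grouping are hypothetical illustrations, not the published DeepHisCoM implementation:

```python
import torch
import torch.nn as nn

class PathwayNet(nn.Module):
    """Sketch of a DeepHisCoM-style model (hypothetical names and sizes):
    each pathway gets a small nonlinear subnetwork over its own biological
    factors, and the resulting pathway scores enter the disease model
    linearly, so each pathway's coefficient stays interpretable."""

    def __init__(self, pathway_sizes, hidden=8):
        super().__init__()
        # (1) a small network per pathway: biological factors -> scalar score,
        #     capturing nonlinearity within each pathway
        self.pathways = nn.ModuleList(
            nn.Sequential(nn.Linear(p, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for p in pathway_sizes
        )
        # (2) a linear pathway -> disease layer, kept linear for explainability
        self.disease = nn.Linear(len(pathway_sizes), 1)

    def forward(self, xs):
        # xs: one (batch, p_i) tensor of biological factors per pathway
        scores = torch.cat([net(x) for net, x in zip(self.pathways, xs)], dim=1)
        return self.disease(scores)  # disease logit

# Usage with three hypothetical pathways of 5, 3, and 7 factors:
model = PathwayNet([5, 3, 7])
xs = [torch.randn(4, p) for p in (5, 3, 7)]
logits = model(xs)  # shape (4, 1)
```

Under this design, a pathway's significance can be read from the weights of the final linear layer, which is precisely the explainability the linearity assumption buys.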
I also investigated several statistical methods and DL theories in the Nonparametric Inference Lab at SNU under the guidance of Professor Byeong U. Park. While studying convergence rates for additive regression with Hilbertian and Lie-group-valued responses, I felt the need for a sound understanding of non-Euclidean data. Soon after, scrutinizing the papers that prove the minimax optimality of ReLU networks [6, 7] through nonparametric methodology led me to ask which activation functions are needed for manifold data. I therefore investigated which activations attain the same rate as ReLU, derived that activations of a similar form achieve the same minimax rate, and compared them with wavelet methods. Through this experience, I realized that a statistical perspective can empower problem-solving across DL.

**Conclusion.** I am a good match for Stefanie Jegelka because I am familiar with several geometric structures, including Lie groups and manifolds; the relationship between robustness and underlying geometric properties interests me, and my strong basis in Riemannian geometry prepares me to bridge DL and geometry. I am a good fit for Asuman Ozdaglar since I am experienced in convergence analysis, which is conducive to further DL research, and my background in statistics and optimization will facilitate research with her. I also want to cooperate with Suvrit Sra, who utilizes optimization in DL theory: I have a solid basis in optimization research, and my experience with Lie groups and manifold theory will be conducive to working with him. MIT is undoubtedly the optimal place to pursue my research.

**References**

[1] C. Park and E. K. Ryu, “Optimal first-order algorithms as a function of inequalities”, submitted to Journal of Machine Learning Research, 2021.
[2] J. Lee, C. Park, and E. K. Ryu, “A Geometric Structure of Acceleration and Its Role in Making Gradients Small Fast”, NeurIPS, 2021.
[3] C. Park, B. Kim, and T. Park, “DeepHisCoM: Deep Learning Pathway Analysis using Hierarchical Structural Component Models”, submitted to Bioinformatics, 2021.
[4] C. Park, J. Park, and E. K. Ryu, “Factor-√2 Acceleration of Accelerated Gradient Methods”, submitted to Computational Optimization and Applications, 2021.
[5] C. Park, N. Jiang, and T. Park, “Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes”, Genomics & Informatics, vol. 17, no. 4, 2019.
[6] J. Schmidt-Hieber, “Nonparametric regression using deep neural networks with ReLU activation function”, The Annals of Statistics, vol. 48, no. 4, 2020.
[7] J. Schmidt-Hieber, “Deep ReLU network approximation of functions on a manifold”, arXiv:1908.00695, 2019.