Statement of Purpose - University of British Columbia
While the promise of reinforcement learning (RL) is that agents can learn any task when equipped with an appropriate learning signal, an obvious yet overlooked aspect of most approaches is that the field has been training and testing on the same task. One way to increase the generality of an agent's behavior is to procedurally generate tasks and then train and score the agent across them. While this helps, generalization then depends on the design of the task generator and on whether the generated tasks are sufficiently diverse to avoid overfitting. One way to detect task redundancy with respect to agent performance is through Nash Averaging (NA), a game-theoretic evaluation measure (sketched concretely below). NA also provides insight into how to automatically shift training distributions to bootstrap agent ability. Interestingly, this ability to shift training distributions is related to a subfield of evolutionary computation (EC) and artificial life called open-endedness, which studies how to induce ever-increasing novelty and complexity in generative systems. I am interested in this intersection of RL, open-endedness, and game theory, where training task distributions and objectives are changed dynamically so that agents never stop learning. For open-ended systems, how to 1) design them and 2) measure their progress (e.g., scoring, ranking, and selecting the best agents or tasks over time) are open research questions that likely require expertise from reinforcement learning, artificial life, game theory, dynamical systems, and statistical ranking.

Scoring and ranking data and agents lies at the heart of many modern optimization algorithms and multi-agent systems. Whether approximating gradients in evolutionary computation, recommending webpages to users, or searching for good hyperparameters, it is important to be able to assess and compare potential solutions. Interestingly, the performance of many algorithms is judged by their ability to converge to the best result as defined by a single objective function, with an implicit reliance on the accuracy of this comparative assessment. However, performing best on a single task provides no guarantee that a solution will generalize across a diversity of problems. Rather than seeking the best solution to a single task, current trends in machine learning improve generalization by training and evaluating agents on a diverse distribution of tasks (e.g., meta-learning). Such distributions must be designed with expertise: among the many important a priori decisions are which tasks to include in training, in what proportions, and even in what order. Automatic curriculum generation is emerging as a way to bypass these difficulties, but such methods often still rely on goal-conditioning of the training distribution. However, what if our algorithms were instead tasked with being open-ended, i.e., rather than being supplied a target to optimize towards, the algorithm were allowed to define and solve tasks itself? Decoupling search from objectives has already shown compelling results on hard-exploration problems. In multi-agent systems, the dynamics of agent (and world) interactions can produce curricula, sometimes called autocurricula, that emerge naturally from competition or cooperation. The quality of such curricula depends on the richness of the interactions (e.g., are the generated tasks meaningfully different?).
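To make the role of Nash Averaging concrete, the following is a minimal sketch, in Python, of Nash-averaging an agents-by-tasks evaluation matrix. The matrix values and function name are hypothetical, and solving the meta-game with a generic linear program is my own simplification: the published method specifically selects the maximum-entropy Nash equilibrium, which this sketch does not enforce.

    import numpy as np
    from scipy.optimize import linprog

    def nash_average(scores):
        """scores[i, j] = performance of agent i on task j (hypothetical)."""
        n_agents, n_tasks = scores.shape
        # The task player mixes over columns to minimize the agent player's
        # value v: minimize v subject to scores @ q <= v, sum(q) = 1, q >= 0.
        c = np.r_[np.zeros(n_tasks), 1.0]            # objective: minimize v
        A_ub = np.c_[scores, -np.ones(n_agents)]     # scores @ q - v <= 0
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_agents),
                      A_eq=np.r_[np.ones(n_tasks), 0.0].reshape(1, -1),
                      b_eq=[1.0],
                      bounds=[(0, None)] * n_tasks + [(None, None)])
        q = res.x[:n_tasks]           # Nash weights over tasks
        return scores @ q, q          # Nash-averaged agent ratings, task weights

    # Tasks 0 and 1 are redundant copies; agent A solves both, agent B solves
    # only the distinct task 2. Nash Averaging rates the agents equally (0.5
    # each) instead of rewarding A for the duplicated column.
    S = np.array([[1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    ratings, weights = nash_average(S)

This invariance to redundant copies is exactly the property that makes NA attractive for checking whether a task generator is producing meaningfully new tasks or near-duplicates of old ones.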
Multi-agent systems, e.g., co-evolutionary systems or multi-agent reinforcement learning (MARL), possess the necessary richness for studying open-ended systems, as agent interactions span competitive, cooperative, independent, transitive, and non-transitive games. One way to encourage this richness is through open-endedness. Co-evolutionary (coupled) systems represent one potential path to open-endedness, and engineering generative systems that display at least some degree of this ability is a goal with direct applications to unsupervised environment design, curriculum learning, and self-generative AI algorithms. My graduate work so far [1][2] extends the Paired Open-Ended Trailblazer (POET) algorithm, a co-evolutionary method that simultaneously generates a curriculum of tasks and the agents that solve them. I hypothesize that incorporating game-theoretic tools like Nash Averaging or prioritization schemes like Prioritized Level Replay (PLR) into POET will help generate effective learning environments by dynamically adapting the difficulty of generated tasks to the current population of agents, rather than relying on environment-specific heuristics as is currently done. Tools like NA have seen little adoption in (co-)evolutionary computation settings despite their recent integration into high-performing RL-based systems. Toward this proposed research, I have been building professional-grade software that modularizes meta-algorithms like POET and PLR into a single framework capable of isolating and testing hypotheses at the intersection of agent ranking, open-endedness, transfer learning, and autocurricula (a simplified sketch of PLR-style sampling follows below).
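As an illustration of the kind of component this framework isolates, here is a simplified sketch of PLR's rank-based sampling rule over a population of generated tasks. The per-task scores and the beta value are hypothetical, and I omit the staleness term and the replay-versus-generate decision of the full method.

    import numpy as np

    def plr_sampling_probs(scores, beta=0.1):
        """Rank-based prioritization: P(i) proportional to (1/rank_i)**(1/beta)."""
        ranks = np.empty_like(scores)
        ranks[np.argsort(-scores)] = np.arange(1, scores.size + 1)  # 1 = highest
        weights = (1.0 / ranks) ** (1.0 / beta)  # smaller beta -> greedier sampling
        return weights / weights.sum()

    # Hypothetical learning-potential scores (e.g., mean absolute TD-error
    # from the agent's latest rollout on each generated task).
    td_errors = np.array([0.9, 0.1, 0.4, 0.05])
    rng = np.random.default_rng(0)
    next_task = rng.choice(td_errors.size, p=plr_sampling_probs(td_errors))

Swapping the scalar score function here for a population-level, game-theoretic signal such as NA is precisely the kind of substitution the framework is designed to make easy to test.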
In what I expect to be different from most of the applications you are likely to be reading for admittance in Fall 2022, I am transferring from a Ph.D. program at the New Jersey Institute of Technology (NJIT). Advisor 1 (my current advisor) works at the intersection of RL, EC, and open-endedness, and NJIT also satisfied my two-body problem (I applied to three local programs in 2019). Because Advisor 1 and Advisor 2 (my M.S. advisor at NYU) had a long history of working together and had agreed to co-advise me, I was confident in this decision. However, as Advisor 1 was finishing an aggressive treatment for stage 3C cancer, the working relationship between her and Advisor 2 became irreconcilable. Due to both COVID and Advisor 1's isolation from the research community during her treatment and recovery, I find myself in the especially difficult position of wanting both to support her and to continue our work, while myself needing the sense of community and direction that I had at NYU. Furthermore, outside of my work with Advisor 1, my current department, Informatics, has become an ill fit for me: during my first year, I returned to my mathematical roots while searching for principled methods of guiding meta-optimization. At her urging, I am applying to the University of British Columbia, where she believes I will thrive.

A Ph.D. in computer science from UBC will allow me to contribute to the challenges I am passionate about, to synthesize my mathematics and computer science backgrounds, and to prepare for a career in research. My main research interests lie at the intersection of RL, EC, and open-endedness with game theory. I would like to work most closely with Prof. Jeff Clune. Dr. Clune's work, e.g., POET and AI-generating algorithms in general, has proved pivotal to the graduate work I have explored so far, and his knowledge spanning RL, EC, meta-learning, and open-endedness makes him the ideal adviser for the work I want to do. Furthermore, I would like to work with Dr. Leyton-Brown to bring his expertise in game theory to open-ended MARL problems. I strongly believe that earning a Ph.D. from UBC will equip me with the skills necessary to contribute to these highly promising fields while providing the sense of community that I have been missing.

References

[1] A. Dharna, A. Hoover, J. Togelius, and L. B. Soros. Transfer dynamics in emergent evolutionary curricula. Accepted with minor revisions at IEEE Transactions on Games, June 2021. To appear.

[2] A. Dharna, J. Togelius, and L. B. Soros. Co-generation of game levels and game-playing agents. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 16(1):203-209, Oct. 2020.