One hundred billion neurons all firing in parallel, turning unstructured data streams into highly abstract ideas that produce useful thoughts and actions. The complexity that enables this has always piqued my curiosity. As an electrical engineering student at Rice University, I have a natural affinity for digital signal processing and machine learning. Thanks to the support of the CCD and the Bybee Scholarship, I was able to pursue both of my interests this summer at Stanford NeuroAILab.
The NeuroAILab works at the intersection of artificial intelligence and neuroscience, answering questions like "What role does recursion play in visual processing and how can we best replicate it with a neural network?" or "How do biological and artificial agents use curiosity as intrinsic motivation to explore their environments?"
The project that I worked on addressed how brains can effortlessly switch between tasks and use knowledge from learned tasks to adapt to and solve new ones. I worked with Dan Yamins and Kevin Feigelis on building reinforcement learning algorithms for task switching in a 2D environment. Before I arrived, the lab had already published a paper detailing their modular approach to task switching in this environment. The paper can be found here. The schematic below details the process used for single-module learning and flexible adaptation.
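To make the modular idea concrete, here is a minimal sketch of that style of architecture: a shared visual backbone feeds lightweight per-task modules, and only the module for the active task is trained. This is my own PyTorch illustration with invented names, not the lab's actual code.

```python
import torch
import torch.nn as nn

class ModularController(nn.Module):
    """Toy sketch of modular task switching: a frozen shared backbone
    feeds small per-task modules, and only the active task's module
    gets gradient updates. Names are illustrative, not the lab's API."""

    def __init__(self, backbone: nn.Module, feature_dim: int, num_actions: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # reuse features across tasks
            p.requires_grad = False
        self.task_modules = nn.ModuleDict()
        self.feature_dim = feature_dim
        self.num_actions = num_actions

    def module_for(self, task_id: str) -> nn.Module:
        # Spin up a fresh module the first time a task is seen.
        if task_id not in self.task_modules:
            self.task_modules[task_id] = nn.Sequential(
                nn.Linear(self.feature_dim, 128), nn.ReLU(),
                nn.Linear(128, self.num_actions),
            )
        return self.task_modules[task_id]

    def forward(self, obs: torch.Tensor, task_id: str) -> torch.Tensor:
        features = self.backbone(obs)
        return self.module_for(task_id)(features)
```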
Within this framework, I had several goals for the summer. My first goal was to improve the training speed of the modules within the module controller using common reinforcement learning techniques such as synchronous advantage actor-critic (A2C), deep deterministic policy gradient (DDPG), and proximal policy optimization (PPO). My second goal was to improve the trainability of the system by finding, across several different convolutional neural network architectures, the optimal layer to fine-tune from. Finally, I wanted to use the knowledge gained from working in a 2D environment to successfully train agents on several interesting tasks in a 3D environment.
With the help of deep reinforcement learning courses by David Silver and Sergey Levine, I was able to catch up on the last few years of reinforcement learning literature. Using that knowledge, I created implementations of DDPG, A2C, and PPO that successfully and quickly learned many of the 2D tasks discussed in the paper. The plot below shows learning curves for three different tasks, detailing validation accuracy over trials as the agent learns to reach the goal.
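To give a flavor of what these implementations reduce to, here is a simplified version of the A2C objective for one batch of rollout data. The function and tensor names are my own shorthand, and details like generalized advantage estimation are omitted.

```python
import torch
import torch.nn.functional as F

def a2c_loss(policy_logits, values, actions, returns,
             value_coef=0.5, entropy_coef=0.01):
    """Simplified A2C objective for one rollout batch.

    policy_logits: (T, num_actions) action logits from the actor head
    values:        (T,) state-value estimates from the critic head
    actions:       (T,) actions actually taken
    returns:       (T,) discounted (or bootstrapped) return targets
    """
    advantages = returns - values.detach()        # how much better than expected
    log_probs = F.log_softmax(policy_logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)

    policy_loss = -(chosen * advantages).mean()   # push up good actions
    value_loss = F.mse_loss(values, returns)      # fit the critic to returns
    entropy = -(log_probs * log_probs.exp()).sum(dim=-1).mean()

    # Entropy bonus keeps the policy exploring instead of collapsing early.
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```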
DDPG learns much faster than the other algorithms because its replay buffer lets it make many parameter updates at every trial, whereas the on-policy methods must collect a batch of time steps before each parameter update. Below is a figure that I made to help myself and others understand the DDPG and A2C algorithms. Despite its slower training, A2C's much lower complexity made it the better choice for testing the modular task-switching method in 3D.
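The sample-efficiency gap comes down to where the gradient data comes from. The sketch below contrasts the two update loops; the `env`, `actor`, and `update_fn` interfaces are placeholders I made up for illustration, not our actual training code.

```python
import random
from collections import deque

# Off-policy (DDPG): every environment step can trigger a gradient
# update drawn from a replay buffer of past transitions.
replay_buffer = deque(maxlen=100_000)

def ddpg_step(env, state, actor, update_fn, batch_size=64):
    action = actor(state)
    next_state, reward, done = env.step(action)
    replay_buffer.append((state, action, reward, next_state, done))
    if len(replay_buffer) >= batch_size:
        # Old transitions get reused many times across updates.
        update_fn(random.sample(replay_buffer, batch_size))
    return next_state, done

# On-policy (A2C/PPO): updates wait until a fresh batch of time steps
# has been collected with the *current* policy, and that data is
# thrown away after the update.
def a2c_rollout(env, state, actor, update_fn, rollout_length=32):
    trajectory = []
    for _ in range(rollout_length):
        action = actor(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward, done))
        state = next_state
    update_fn(trajectory)  # one update per rollout, each sample used once
    return state
```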
During my experiments with actor-critic algorithms, I was also running a search over backbone architectures and layers to find the optimal way to fine-tune off the convnet in the visual backbone. I searched over the top several fully connected layers and convolutional layers with different poolings, as well as combinations of those layers and poolings. The metric for success was the area under the learning curve across several different tasks. Though the backbone search is still running, here is a visualization of my intermediate results. It is apparent that skip connections are useful for combining the high spatial resolution of the convolutional layers with the high semantic content of the fully connected layers.
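For a sense of what such a combined readout looks like, here is a sketch that concatenates a pooled convolutional feature map with a late fully connected layer. I use torchvision's AlexNet purely as a stand-in; it is not necessarily the backbone from the actual search.

```python
import torch
import torch.nn as nn
from torchvision import models

class SkipFeatures(nn.Module):
    """Combine a conv layer (high spatial resolution) with a late fc
    layer (high semantic content) as the fine-tuning input. AlexNet
    and the layer choices here are illustrative assumptions."""

    def __init__(self):
        super().__init__()
        alexnet = models.alexnet(pretrained=True)
        self.conv = alexnet.features            # convolutional stack
        self.pool = nn.AdaptiveAvgPool2d(2)     # coarse spatial pooling
        self.flatten = nn.Flatten()
        self.fc = nn.Sequential(alexnet.avgpool, nn.Flatten(),
                                *list(alexnet.classifier[:5]))  # up to fc7

    def forward(self, x):
        conv_out = self.conv(x)
        conv_feat = self.flatten(self.pool(conv_out))  # spatial skip path
        fc_feat = self.fc(conv_out)                    # semantic path
        return torch.cat([conv_feat, fc_feat], dim=1)  # readout for modules
```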
After finishing the first two goals, I moved on to interesting 3D tasks in an environment that we call 3D World. In 3D World, there were several new difficulties. The first was the length of the planning horizon: in the 2D world, the tasks never needed more than a few frames of planning, while in 3D World the planning horizon extended to hundreds of frames. The second was the massive increase in the dimension of the action space: instead of two degrees of freedom for the x and y axes, there were now 20 continuous actions covering all of the arm's joints plus navigation. After extensive tinkering, we were able to solve these problems by adjusting hyperparameters in the actor-critic algorithms.
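The settings below illustrate the kinds of knobs that mattered in that tinkering; the specific values are representative examples rather than our exact configuration.

```python
# Illustrative hyperparameters (not our exact values) for moving an
# actor-critic agent from the 2D environment to 3D World.
a2c_config = {
    # Longer planning horizon: push the discount factor toward 1 so
    # rewards hundreds of frames away still influence early actions.
    "gamma": 0.999,            # a few-frame 2D task can get away with ~0.9
    "rollout_length": 256,     # collect long rollouts before each update

    # Larger action space: 20 continuous joint/navigation actions
    # instead of 2 degrees of freedom, so a Gaussian policy head with
    # a tuned initial standard deviation keeps exploration sane.
    "action_dim": 20,
    "init_action_std": 0.5,

    # Stability on the harder tasks.
    "learning_rate": 1e-4,
    "entropy_coef": 0.005,
    "max_grad_norm": 0.5,
}
```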
Below are videos of the trained agent performing tasks in 3D World. The first two videos show the A2C and DDPG agents attempting a task where a lamp is placed at a random point on a circle around the agent, which must navigate to the lamp using only its vision. The second task is similar, but the lamps are scattered throughout the room and the agent has to avoid lamps of one color while seeking those of another. In the final task, the agent has to use all of its arm joints to touch as close as possible to the center of a box. The plots on the side show the estimated value of the current visual input (top) and the actions being taken to maximize reward (bottom).
This past summer was an amazing learning experience that not only drastically expanded my knowledge of deep learning and what it can be used for, but also deepened my interest in neuroscience. I was able to attend several presentations from distinguished members of both communities and left excited and ready for what lies ahead. Thank you to everyone who made this summer a reality.