Abstract— The problem of balancing an inverted pendulum on an unmanned aerial vehicle (UAV) has been solved using linear and nonlinear control approaches. However, to the best of our knowledge, it has not been solved using learning methods, even though the classical inverted pendulum is a common benchmark for evaluating learning techniques. In this paper, we demonstrate a novel solution to the inverted pendulum problem extended to UAVs, specifically quadrotors. This complex system is underactuated and sensitive to small changes in the quadrotor's acceleration. The solution is provided by reinforcement learning (RL), a framework commonly applied to nonlinear control problems. We generate a control policy to balance the pendulum using Continuous Action Fitted Value Iteration (CAFVI) [1], an RL algorithm for high-dimensional input spaces. This technique combines learning of both state and state-action value functions in an approximate value iteration setting with continuous inputs. Simulations verify the performance of the generated control policy for varying initial conditions. The results show that the control policy is computationally fast enough for real-time control.

Index Terms— Aerial robotics, quadrotor control, inverted pendulum, approximate value iteration, reinforcement learning.