Abstract: Physical stochastic disturbances, such as wind, often affect the motion of robots that perform complex tasks in real-world conditions. These disturbances pose a control challenge because the resulting drift introduces uncertainty and changes the robot's speed and direction. This paper presents Least Squares Axial Sum Policy Approximation (LSAPA), an online control policy based on supervised machine learning that generates trajectories for robotic preference-balancing tasks under stochastic disturbances. The task is learned offline with reinforcement learning, assuming no disturbances; trajectories are then planned online in the presence of disturbances, using the currently observed information. We model the robot as a stochastic control-affine system with unknown dynamics impacted by a Gaussian process, and the task as a continuous Markov Decision Process. LSAPA replaces a traditional greedy policy, works for high-dimensional control-affine systems impacted by stochastic disturbances, and scales linearly with the input dimensionality. We verify the method on Swing-free Aerial Cargo Delivery and Rendezvous tasks. Results show that LSAPA selects an input an order of magnitude faster than comparative methods while rejecting a range of stochastic disturbances. Further, experiments on a quadrotor demonstrate that LSAPA trajectories are suitable for physical systems.