Robots that operate in real-world conditions often perform complex tasks in the presence of stochastic disturbances. The sources of these disturbances vary widely, including, but not limited to, hardware imperfections, atmospheric changes, and measurement inaccuracies. These disturbances pose a significant control challenge because stochastic drift induces changes in the robot's speed and direction. This paper presents an online trajectory generation method for robots completing preference-balancing tasks under stochastic disturbances. Task learning is performed offline assuming no disturbances, and trajectories are then planned online, in the presence of disturbances, using currently observed information. We model the robot as a stochastic control-affine system with unknown dynamics, perturbed by a Gaussian process. This paper introduces a supervised machine learning method in place of a traditional greedy policy. We verify the method in simulation on an aerial cargo delivery task and a flying inverted pendulum task. Results show that the presented method works on a range of problems and outperforms the deterministic method in the presence of non-zero-mean disturbances.
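For concreteness, a stochastic control-affine system driven by Gaussian noise is commonly written as an Itô stochastic differential equation of the following form; the notation here is standard and assumed for illustration, not taken from the abstract:

```latex
% Sketch of an assumed model form (symbols are illustrative):
% x is the robot state, u the control input, and W_t a Wiener
% process whose increments give the Gaussian disturbance.
\[
  \mathrm{d}x = \bigl( f(x) + G(x)\,u \bigr)\,\mathrm{d}t
              + \Sigma(x)\,\mathrm{d}W_t
\]
% f(x): unknown drift dynamics, G(x): control-affine input map,
% Sigma(x): diffusion term scaling the stochastic disturbance.
```

Under such a model, a non-zero-mean disturbance corresponds to a bias in the drift term, which is consistent with the abstract's observation that stochastic drift alters the robot's speed and direction.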