SC Seminar: Dominik Willrich

Dominik Willrich, RPTU University Kaiserslautern-Landau

Title: Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution

Abstract:

Deep reinforcement learning has achieved strong performance in continuous control tasks, yet most stochastic policy gradient methods rely on Gaussian action distributions even when the true action space is bounded. This mismatch can introduce bias and hurt learning efficiency. In this seminar, we present the authors' work on replacing the Gaussian policy with a Beta distribution, which naturally respects action bounds. The paper provides a theoretical analysis of the bias and variance of policy gradients under both distributions and shows that the Beta policy is bias-free in bounded action spaces. Empirically, the authors evaluate this approach with both on-policy and off-policy methods across a range of continuous control benchmarks in OpenAI Gym and MuJoCo. The results demonstrate faster convergence and improved performance with Beta-distributed policies, highlighting the importance of aligning the policy parameterization with the problem's constraints in continuous control.
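The core idea can be illustrated with a minimal sketch (not the authors' code; function and parameter names here are illustrative): a Beta(α, β) distribution has support on [0, 1], so a sampled action can be affinely rescaled to a bounded interval [low, high] without the clipping or truncation that biases a Gaussian policy.

```python
import random

def sample_beta_action(alpha: float, beta: float,
                       low: float, high: float) -> float:
    """Sample an action from Beta(alpha, beta), rescaled to [low, high].

    Illustrative sketch only: in an actual policy network, alpha and beta
    would be outputs of the network (constrained to be > 1 for a unimodal
    distribution, as in the paper's setup).
    """
    u = random.betavariate(alpha, beta)  # u always lies in [0, 1]
    return low + (high - low) * u        # affine rescale to the action bounds

# Every sampled action respects the bounds by construction,
# so no clipping is needed and no boundary bias is introduced.
```

Because the support of the rescaled Beta distribution coincides exactly with the action bounds, the policy gradient is computed over feasible actions only, which is the source of the bias-free property discussed in the talk.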

How to join online

You can join online via Zoom, using the following link:
https://uni-kl-de.zoom-x.de/j/69269239534?pwd=Z9UOzMpkhMjrxVhll3d49sNHFe9Fd1.1
