Deep reinforcement learning (RL) is an effective approach for decision-making and control tasks. However, RL-trained policies often suffer from the action fluctuation problem, resulting in severe actuator wear, safety risks, and performance degradation in real-world applications. In this paper, we identify the two fundamental causes of action fluctuation: observation noise and policy non-smoothness. We then propose a novel policy network, LipsNet++, which integrates a Fourier filter layer and a Lipschitz controller layer to mitigate these two factors in a decoupled manner. The filter layer incorporates a trainable filter matrix that automatically extracts important frequencies while suppressing noise frequencies in the observations. The controller layer introduces a Jacobian regularization technique to achieve a low Lipschitz constant, ensuring smooth fitting of the policy function. The two layers function analogously to the filter and controller in classical control theory, suggesting that filtering and control capabilities can be seamlessly integrated into a single policy network. Both simulated and real-world experiments demonstrate that LipsNet++ achieves state-of-the-art noise robustness and action smoothness.
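As background, here is an illustrative formulation of the two quantities named in the abstract (this is a generic textbook form, not necessarily the exact objective used in the paper): a policy $\pi_\theta$ is $L$-Lipschitz if its output changes by at most $L$ times any change in its input, and a standard way to encourage a small local Lipschitz constant is to add a Jacobian penalty with weight $\lambda$ to the actor loss:

$$
\|\pi_\theta(x_1)-\pi_\theta(x_2)\| \le L\,\|x_1-x_2\|,
\qquad
\mathcal{L}_{\text{actor}} = \mathcal{L}_{\text{RL}} + \lambda\,\mathbb{E}_{x}\!\left[\left\|\frac{\partial \pi_\theta(x)}{\partial x}\right\|_F^2\right],
$$

where $\lambda$ trades off action smoothness against return.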
The paper proposes LipsNet++, a policy network incorporating a Fourier filter layer and a Lipschitz controller layer. It can be used as the policy network in most actor-critic RL algorithms to obtain smoother control actions in real-world applications.
Our paper identifies the two fundamental causes of action fluctuation:
- observation noise;
- policy non-smoothness.
Our paper proposes two layers to address these two causes, respectively.
LipsNet++ incorporates two layers:
- a Fourier filter layer, with a trainable filter matrix that extracts important frequencies while suppressing noise frequencies in the observations;
- a Lipschitz controller layer, which uses Jacobian regularization to keep the policy's Lipschitz constant low.
The two layers respectively tackle the two fundamental causes of action fluctuation. They function analogously to the filter and controller in classical control theory, suggesting that filtering and control capabilities can be seamlessly integrated into a single policy network.
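As a rough illustration of this structure, below is a minimal PyTorch sketch of a LipsNet++-style policy network. All class names, tensor shapes, and hyperparameters here are our own assumptions for illustration; they do not reproduce the exact implementation, which will be released with the code.

```python
# A minimal PyTorch sketch of the two-layer structure described above.
# Class names, tensor shapes, and hyperparameters are illustrative assumptions,
# not the authors' released implementation.
import torch
import torch.nn as nn


class FourierFilterLayer(nn.Module):
    """Trainable frequency-domain mask applied along the time axis of an observation history."""

    def __init__(self, history_len: int, obs_dim: int):
        super().__init__()
        # one trainable gain per temporal frequency bin and observation channel (assumption)
        self.freq_gain = nn.Parameter(torch.ones(history_len // 2 + 1, obs_dim))

    def forward(self, obs_hist: torch.Tensor) -> torch.Tensor:
        # obs_hist: (batch, history_len, obs_dim)
        spectrum = torch.fft.rfft(obs_hist, dim=1)            # FFT along the time axis
        filtered = spectrum * self.freq_gain                  # damp noisy frequency bins
        return torch.fft.irfft(filtered, n=obs_hist.shape[1], dim=1)


class LipschitzControllerLayer(nn.Module):
    """MLP action head whose input-output Jacobian is penalized, keeping the local Lipschitz constant small."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256, jac_coef: float = 1e-3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.jac_coef = jac_coef

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

    def jacobian_penalty(self, x: torch.Tensor) -> torch.Tensor:
        # Frobenius-norm Jacobian regularization: a standard choice,
        # not necessarily the exact form used in the paper.
        x = x.detach().requires_grad_(True)
        y = self.net(x)
        rows = []
        for i in range(y.shape[-1]):
            grad_i, = torch.autograd.grad(y[..., i].sum(), x, create_graph=True)
            rows.append(grad_i)                               # (batch, obs_dim)
        jac = torch.stack(rows, dim=1)                        # (batch, act_dim, obs_dim)
        return self.jac_coef * jac.pow(2).sum(dim=(1, 2)).mean()


class LipsNetPP(nn.Module):
    """Filter the observation history, then act on the latest filtered observation."""

    def __init__(self, history_len: int, obs_dim: int, act_dim: int):
        super().__init__()
        self.filter = FourierFilterLayer(history_len, obs_dim)
        self.controller = LipschitzControllerLayer(obs_dim, act_dim)

    def forward(self, obs_hist: torch.Tensor) -> torch.Tensor:
        filtered = self.filter(obs_hist)
        return self.controller(filtered[:, -1])
```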
This is a real-world demonstration on the vehicle-robot driving task, where the vehicle is controlled by the RL-trained LipsNet++ policy network.
Click to watch the video:
LipsNet++ is packaged in a user-friendly way that does not disturb the original RL algorithm, so it can be plugged into various RL algorithms. Practitioners can use LipsNet++ just like an MLP, as sketched below. The code will be released after review.
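For example, drop-in usage might look like the following sketch, which builds on the illustrative `LipsNetPP` class above. The `critic` and `replay_buffer` objects and all dimensions are placeholders for whatever actor-critic algorithm you use; the actual API may differ once the code is released.

```python
# Hypothetical drop-in usage: swap an MLP actor for LipsNetPP and add the Jacobian
# penalty to the ordinary actor loss. `critic` and `replay_buffer` are placeholders
# for the components of the surrounding actor-critic algorithm.
import torch

actor = LipsNetPP(history_len=8, obs_dim=17, act_dim=6)   # dimensions chosen arbitrarily
optimizer = torch.optim.Adam(actor.parameters(), lr=3e-4)

obs_hist = replay_buffer.sample_obs_history()             # (batch, 8, 17), placeholder call
action = actor(obs_hist)

actor_loss = -critic(obs_hist[:, -1], action).mean()      # standard actor-critic objective
actor_loss = actor_loss + actor.controller.jacobian_penalty(actor.filter(obs_hist)[:, -1])

optimizer.zero_grad()
actor_loss.backward()
optimizer.step()
```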