[SiLU] [dSiLU] Sigmoid-weighted linear units for neural network function approximation in reinforcement learning
Stefan Elfwing, Eiji Uchibe, Kenji Doya
Paper: https://www.sciencedirect.com/science/article/pii/S0893608017302976
In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning.
Two activation functions are proposed for neural network function approximation in reinforcement learning:
The sigmoid-weighted linear unit (SiLU), whose activation is computed by the sigmoid function multiplied by its input.
Its derivative function, the dSiLU.
The activation ak of the kth SiLU for input zk is computed by the sigmoid function multiplied by its input (i.e., equal to the contribution from a hidden node to the value function in an EE-RBM):
ak(zk) = zk σ(zk),
where zk is the input to hidden unit k and σ(·) is the sigmoid function.
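A minimal NumPy sketch (not from the paper) of the SiLU as defined above; the helper names sigmoid and silu are chosen here for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def silu(z):
    """SiLU activation: the input multiplied by its sigmoid, ak(zk) = zk * sigma(zk)."""
    return z * sigmoid(z)

# The SiLU approaches the ReLU for inputs of large magnitude:
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(silu(z))           # roughly [-0.0005, -0.269, 0.0, 0.731, 9.9995]
print(np.maximum(z, 0))  # ReLU, for comparison
```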
For zk-values of large magnitude, the activation of the SiLU is approximately equal to the activation of the ReLU (see the left panel in Fig. 1), i.e., the activation is approximately equal to zero for large negative zk-values and approximately equal to zk for large positive zk-values.
Unlike the ReLU (and other commonly used activation units such as sigmoid and tanh units), the activation of the SiLU is not monotonically increasing. Instead, it has a global minimum value of approximately −0.28 for zk ≈ −1.28. An attractive feature of the SiLU is that it has a self-stabilizing property, which we demonstrated experimentally in Elfwing et al. (2015). The global minimum, where the derivative is zero, functions as a ‘‘soft floor’’ on the weights that serves as an implicit regularizer that inhibits the learning of weights of large magnitudes.
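A quick numerical sanity check (not from the paper) of the global minimum quoted above, writing the SiLU directly as z * sigmoid(z):

```python
import numpy as np

# Numerically locate the SiLU's global minimum (the "soft floor" mentioned above).
z = np.linspace(-5.0, 5.0, 200_001)
a = z / (1.0 + np.exp(-z))   # SiLU: z * sigmoid(z)
i = int(np.argmin(a))
print(z[i], a[i])            # roughly -1.28 and -0.28
```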
In Elfwing et al. (2015), we discovered that the derivative function of the SiLU (i.e., the derivative of the contribution from a hidden node to the output in an EE-RBM) looks like a steeper and ‘‘overshooting’’ version of the sigmoid function.
In this study, we call this function the dSiLU and we propose it as a competitive alternative to the sigmoid function in neural network function approximation in reinforcement learning.
The activation of the dSiLU is computed by the derivative of the SiLU (see right panel in Fig. 1):
ak(zk) = σ(zk)(1 + zk(1 − σ(zk))).
The dSiLU has a maximum value of approximately 1.1 and a minimum value of approximately −0.1 for zk ≈ ±2.4, i.e., the solutions to the equation zk = − log ((zk − 2)/(zk + 2)).
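A corresponding NumPy sketch (again not from the paper, helper names illustrative) of the dSiLU, with a numerical check of the quoted extrema:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsilu(z):
    """dSiLU: derivative of z * sigmoid(z), i.e. sigmoid(z) * (1 + z * (1 - sigmoid(z)))."""
    s = sigmoid(z)
    return s * (1.0 + z * (1.0 - s))

# Check the stated extrema: maximum ~1.1 and minimum ~-0.1 near z ~ +/-2.4.
z = np.linspace(-8.0, 8.0, 320_001)
a = dsilu(z)
print(z[np.argmax(a)], a.max())   # roughly  2.4 and  1.1
print(z[np.argmin(a)], a.min())   # roughly -2.4 and -0.1
```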
Figure. Learning curves in stochastic SZ-Tetris for the four types of shallow neural network agents.
Experiments cover three domains: SZ-Tetris, 10 × 10 Tetris, and Atari 2600 games.