Recurrent Neural Network (RNN): Processes a sequence one step at a time, carrying a hidden state that summarizes everything seen so far; prone to vanishing gradients on long sequences.
Long Short-term Memory (LSTM): An RNN variant that adds input, forget, and output gates plus a cell state so information can be preserved over longer ranges.
Gated Recurrent Units (GRUs): A lighter-weight gated RNN that merges the gating into update and reset gates, with fewer parameters than an LSTM.
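As a point of reference for the models above, here is a minimal sketch of the plain RNN recurrence, assuming NumPy and toy dimensions; rnn_step and the shapes are illustrative, not from any particular library. LSTMs and GRUs keep the same step-by-step structure but add gates to the update.

```python
# Minimal sketch of the vanilla RNN recurrence that LSTMs and GRUs extend with gates.
import numpy as np

def rnn_step(h_prev, x, W_h, W_x, b):
    """One recurrence step: h_t = tanh(W_h h_{t-1} + W_x x_t + b)."""
    return np.tanh(W_h @ h_prev + W_x @ x + b)

# Toy usage: hidden size 4, input size 3, sequence length 5.
rng = np.random.default_rng(0)
W_h, W_x, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)
h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):
    h = rnn_step(h, x_t, W_h, W_x, b)   # hidden state carries context forward
```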
Original S4 research paper ("Efficiently Modeling Long Sequences with Structured State Spaces")
S4 models sequences using a state space representation, where a hidden state evolves over time based on the input.
This allows for efficient modeling of long-range dependencies without the quadratic complexity of traditional attention mechanisms.
State Transition Function: Updates the hidden state at each time step based on the previous state and current input.
Emission Function: Maps the hidden state to an output prediction.
Structured State Space: The state matrices are given special structure (diagonal plus low-rank in S4), allowing efficient computation and parallelization through an equivalent convolutional view.
HiPPO: A principled initialization of the state transition matrix used in S4 so the hidden state can compress and remember long input histories (see the sketch below).
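To make these pieces concrete, here is a minimal sketch of the discretized linear recurrence behind S4: x_k = A_bar x_{k-1} + B_bar u_k (state transition) and y_k = C x_k (emission). It assumes NumPy and toy dimensions, and uses a simple decaying diagonal matrix as a stand-in for a properly discretized HiPPO matrix; ssm_recurrence and all shapes are illustrative.

```python
# Minimal sketch of the discrete linear SSM recurrence (no HiPPO init, no DPLR structure).
import numpy as np

def ssm_recurrence(A_bar, B_bar, C, u):
    """Run the discretized SSM over an input sequence u of shape (L, d_in)."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for k in range(u.shape[0]):
        x = A_bar @ x + B_bar @ u[k]   # state transition function
        ys.append(C @ x)               # emission function
    return np.stack(ys)

# Toy usage: state size 8, one input/output channel, sequence length 100.
rng = np.random.default_rng(0)
N = 8
A_bar = 0.9 * np.eye(N)               # stand-in for a discretized HiPPO matrix
B_bar = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
y = ssm_recurrence(A_bar, B_bar, C, rng.normal(size=(100, 1)))
print(y.shape)  # (100, 1)
```

Because A_bar, B_bar, and C do not depend on the input, this recurrence can be unrolled into a convolution over the sequence, which is what lets S4 train over the whole sequence in parallel.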
Selective State Spaces (SSM): The core of Mamba. These SSMs are recurrent models that selectively process information based on the current input, allowing them to focus on relevant information and discard irrelevant data.
Simplified Architecture: Mamba replaces the separate attention and MLP blocks of Transformers with a single, unified SSM block. This aims to reduce computational complexity and improve inference speed.
Hardware-Aware Parallelism: Mamba computes its recurrence with a parallel scan algorithm designed around the GPU memory hierarchy (kernel fusion and recomputation), further improving throughput (see the sketch below).
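The sketch below shows the "selective" part in isolation, under the assumption of NumPy, toy shapes, and a plain sequential loop in place of Mamba's fused, hardware-aware parallel scan; selective_ssm and the projection names W_B, W_C, W_delta are illustrative, not Mamba's actual API. The point is that B, C, and the step size delta are computed from the current input, so the state update itself is input-dependent.

```python
# Minimal sketch of a selective SSM: the state-space parameters depend on the input.
import numpy as np

def selective_ssm(u, A, W_B, W_C, W_delta):
    """u: (L, d) input. A: (N,) diagonal state matrix. W_B, W_C: (d, N). W_delta: (d, d)."""
    L, d = u.shape
    N = A.shape[0]
    x = np.zeros((d, N))                               # one state vector per channel
    ys = []
    for k in range(L):
        delta = np.log1p(np.exp(u[k] @ W_delta))       # softplus: per-channel step size
        B = u[k] @ W_B                                 # input-dependent B, shape (N,)
        C = u[k] @ W_C                                 # input-dependent C, shape (N,)
        A_bar = np.exp(delta[:, None] * A[None, :])    # discretize A per channel
        B_bar = delta[:, None] * B[None, :]            # discretize B per channel
        x = A_bar * x + B_bar * u[k][:, None]          # selective state update
        ys.append((x * C[None, :]).sum(axis=-1))       # emission, shape (d,)
    return np.stack(ys)                                # (L, d)

# Toy usage: 16 channels, state size 8, sequence length 64.
rng = np.random.default_rng(0)
d, N, L = 16, 8, 64
A = -np.exp(rng.normal(size=N))                        # negative entries keep the recurrence stable
y = selective_ssm(rng.normal(size=(L, d)), A,
                  rng.normal(size=(d, N)) / np.sqrt(d),
                  rng.normal(size=(d, N)) / np.sqrt(d),
                  rng.normal(size=(d, d)) / np.sqrt(d))
print(y.shape)  # (64, 16)
```

In the real model this scan is implemented as a fused GPU kernel so the expanded per-channel state never has to be materialized in slow memory, which is where the hardware-aware speedup comes from.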
Fast Inference: Up to 5x higher inference throughput than comparably sized Transformers.
Linear Scaling: Computation scales linearly with sequence length, handling long sequences effectively.
State-of-the-Art Performance: Achieves competitive results on various modalities, including language, audio, and genomics.