Meet Mamba-3. A research paper submitted to ICLR 2026 introduced Mamba-3, which addresses several limitations of current sub-quadratic sequence models through three methodological changes grounded in classical state-space theory.
Code and a detailed implementation are not yet publicly available, as the paper is under review.
Core Modifications
1. Trapezoidal Discretization
The paper replaces Euler's rule (a first-order approximation) with a generalized trapezoidal rule (a second-order approximation) for discretizing the continuous-time SSM.
This results in:
- A recurrence that incorporates both current and previous inputs with data-dependent weights
- Ability to replace the short causal convolution when combined with learnable biases on the B and C projections
- Lower approximation error: O(Δt²) vs. O(Δt) for Euler's method (a numerical sketch follows this list)
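To make the difference concrete, here is a minimal NumPy sketch (ours, not the paper's unreleased code) comparing the two rules on a scalar, time-invariant SSM dh/dt = a·h + b·x. Mamba-3's actual rule is a generalized, data-dependent variant of the trapezoidal step shown here.

```python
import numpy as np

def euler_step(h, x_t, a, b, dt):
    # First-order rule: uses only the current input x_t.
    return (1 + dt * a) * h + dt * b * x_t

def trapezoid_step(h, x_t, x_prev, a, b, dt):
    # Second-order (bilinear) rule: averages the derivative at both
    # endpoints, so the recurrence mixes current AND previous inputs.
    decay = (1 + 0.5 * dt * a) / (1 - 0.5 * dt * a)
    gain = 0.5 * dt * b / (1 - 0.5 * dt * a)
    return decay * h + gain * (x_t + x_prev)

# Drive both with a constant input and compare to the exact solution.
a, b, dt, steps = -1.0, 1.0, 0.1, 50
x = np.ones(steps + 1)
h_e = h_t = 0.0
for t in range(1, steps + 1):
    h_e = euler_step(h_e, x[t], a, b, dt)
    h_t = trapezoid_step(h_t, x[t], x[t - 1], a, b, dt)
exact = (b / -a) * (1 - np.exp(a * dt * steps))  # closed form for x ≡ 1
print(f"euler error:     {abs(h_e - exact):.2e}")   # ~1.6e-03
print(f"trapezoid error: {abs(h_t - exact):.2e}")   # ~4.3e-05
```

On this toy problem the trapezoidal state lands roughly 40× closer to the exact solution at the same step size, which is the O(Δt²) vs. O(Δt) gap in action.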
2. Complex-Valued State Spaces
Mamba-2 simplified the transition matrix to a real scalar, which removed the model's ability to solve simple state-tracking tasks. Mamba-3 reintroduces complex-valued SSMs:
- Enables rotational dynamics in hidden states
- Mathematically equivalent to applying data-dependent rotary embeddings to B and C projections
- Can be computed efficiently using the "RoPE trick"
- Recovers performance on parity and modular arithmetic tasks (100% vs. <1% for Mamba-2; a rotation sketch follows this list)
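Here is a minimal NumPy sketch of the "RoPE trick" (our illustration; the per-step angles and the single 2-D state channel are stand-ins for the paper's data-dependent rotations): a recurrence that explicitly rotates the hidden state is identical to a plain cumulative-sum recurrence whose B and C projections have been pre-rotated by the cumulative angle.

```python
import numpy as np

def rot(theta):
    # 2-D rotation matrix; a complex eigenvalue e^{i*theta} in real form.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
T = 8
theta = rng.uniform(0, 0.5, T)   # per-step, data-dependent angles
B = rng.normal(size=(T, 2))      # input projections B_t
C = rng.normal(size=(T, 2))      # output projections C_t
x = rng.normal(size=T)

# (a) Explicit rotational recurrence: h_t = R(theta_t) h_{t-1} + B_t x_t
h, y_rec = np.zeros(2), np.zeros(T)
for t in range(T):
    h = rot(theta[t]) @ h + B[t] * x[t]
    y_rec[t] = C[t] @ h

# (b) "RoPE trick": rotate B_t and C_t by the cumulative angle Theta_t;
# the state update itself becomes a rotation-free cumulative sum.
Theta = np.cumsum(theta)
h, y_rope = np.zeros(2), np.zeros(T)
for t in range(T):
    h = h + (rot(-Theta[t]) @ B[t]) * x[t]
    y_rope[t] = (rot(-Theta[t]) @ C[t]) @ h

print(np.allclose(y_rec, y_rope))  # True: the two forms are equivalent
```

Because 2-D rotations commute, the product of per-step rotations between steps s and t collapses to rot(Θ_t − Θ_s), which is what lets the rotation be folded into the projections and computed in parallel.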
3. MIMO Formulation
Changes the state update from an outer-product (rank-1) form to a matrix-multiplication (rank-r) form:
- Increases arithmetic intensity from ~2.5 to ~2r (where r is the MIMO rank)
- Better utilizes GPU accelerators during decode
- No increase in state size, so inference speed is maintained
- An optional feature that can be enabled when compute efficiency is prioritized (a sketch follows this list)
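A minimal NumPy sketch of the SISO vs. MIMO update (the shapes and names here are our assumptions, not the paper's notation): both variants write the same (n × p) state, but the rank-r update performs r multiply-adds per state element instead of one, raising arithmetic intensity without growing the state.

```python
import numpy as np

n, p, r = 128, 64, 4             # state dims and MIMO rank (illustrative)
rng = np.random.default_rng(0)
S = np.zeros((n, p))             # matrix-valued SSM state, same size in both

# SISO: rank-1 outer-product update, one multiply-add per state element,
# so decode-time updates are memory-bandwidth bound.
B_t = rng.normal(size=n)
x_t = rng.normal(size=p)
S_siso = S + np.outer(B_t, x_t)

# MIMO: rank-r matmul update, r multiply-adds per state element at the
# same state size, which keeps GPU matrix units busier during decode.
B_blk = rng.normal(size=(n, r))  # r input channels
X_blk = rng.normal(size=(r, p))  # r channels' worth of inputs
S_mimo = S + B_blk @ X_blk

assert S_siso.shape == S_mimo.shape  # identical state size either way
```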
Experimental Results
Language Modeling (100B FineWeb-Edu tokens):
- Outperforms Mamba-2, Transformer, and Gated DeltaNet baselines at all tested scales (180M, 440M, 820M, and 1.5B parameters)
- Example: Mamba-3-1.5B achieves 56.4% average accuracy vs. 55.7% for Mamba-2
State-Tracking Tasks:
- Parity: 100.0% (Mamba-2: 0.9%)
- Arithmetic without brackets: 98.5% (Mamba-2: 47.8%)
- Arithmetic with brackets: 87.8% (Mamba-2: 0.9%)
Inference Performance:
- Faster single-step decode than Mamba-2 despite the more complex SSM
- The MIMO variant improves the Pareto frontier: better perplexity at the same state size
- At the 440M scale with 100B tokens, MIMO achieves 12.72 perplexity vs. 12.87 for SISO