All about AI, Web 3.0, BCI | Telegram Channel
Meet Mamba-3. A research paper submitted to ICLR 2026 introduces Mamba-3, which addresses several limitations in current sub-quadratic sequence models through three methodological changes grounded in classical state-space theory.

Code and a detailed implementation are not yet publicly available, as the paper is under review.

Core Modifications

1. Trapezoidal Discretization

The paper replaces Euler's rule (first-order approximation) with a generalized trapezoidal rule (second-order approximation) for discretizing the continuous-time SSM.

This results in:
- A recurrence that incorporates both current and previous inputs with data-dependent weights
- Ability to replace the short causal convolution when combined with learnable biases on B and C projections
- Lower approximation error: O(Δt²) vs O(Δt) for Euler's method
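The difference between the two discretizations can be sketched on a toy scalar SSM. This is an illustrative reconstruction, not the paper's implementation: the function names, the scalar state, and the data-dependent mixing weight `lam` are all assumptions made for clarity.

```python
import numpy as np

def euler_scan(a, dt, B, x):
    """First-order (Euler-style) discretization of dh/dt = a*h + B*x:
    only the current input x_t enters the recurrence."""
    h, hs = 0.0, []
    for t in range(len(x)):
        abar = np.exp(a * dt[t])            # discretized transition
        h = abar * h + dt[t] * B[t] * x[t]
        hs.append(h)
    return np.array(hs)

def trapezoidal_scan(a, dt, B, x, lam):
    """Second-order (generalized trapezoidal) discretization:
    a data-dependent mix (lam) of the current and previous inputs
    enters the recurrence, matching the bullet points above."""
    h, hs, prev = 0.0, [], 0.0              # prev holds B_{t-1} * x_{t-1}
    for t in range(len(x)):
        abar = np.exp(a * dt[t])
        cur = B[t] * x[t]
        h = abar * h + dt[t] * ((1 - lam[t]) * abar * prev + lam[t] * cur)
        hs.append(h)
        prev = cur
    return np.array(hs)
```

Setting `lam` to all ones recovers the Euler recurrence exactly, which makes the trapezoidal form a strict generalization in this sketch.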

2. Complex-Valued State Spaces
Mamba-2 simplified the transition matrix to a real scalar, which removed the model's ability to solve simple state-tracking tasks. Mamba-3 reintroduces complex SSMs:
- Enables rotational dynamics in hidden states
- Mathematically equivalent to applying data-dependent rotary embeddings to B and C projections
- Can be computed efficiently using the "RoPE trick"
- Recovers performance on parity and modular arithmetic tasks (100% vs <1% for Mamba-2).
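The equivalence claimed above can be verified numerically on a scalar toy model: a complex transition e^{iθ_t}·α_t is the same computation as a real decay α_t with the cumulative rotation folded into B and C as data-dependent rotary embeddings. The function names and scalar form are illustrative assumptions, not the paper's code.

```python
import numpy as np

def complex_scan(alpha, theta, B, C, x):
    """Direct complex recurrence: h_t = e^{i*theta_t} * alpha_t * h_{t-1} + B_t * x_t."""
    h, ys = 0j, []
    for t in range(len(x)):
        h = np.exp(1j * theta[t]) * alpha[t] * h + B[t] * x[t]
        ys.append((C[t] * h).real)
    return np.array(ys)

def rope_scan(alpha, theta, B, C, x):
    """Same outputs via the 'RoPE trick': the recurrence uses only the
    real decay alpha; the rotation moves into B and C as data-dependent
    rotary embeddings with cumulative angle phi_t = sum(theta_1..theta_t)."""
    phi = np.cumsum(theta)
    h, ys = 0j, []
    for t in range(len(x)):
        B_rot = np.exp(-1j * phi[t]) * B[t]   # rotate input projection
        C_rot = np.exp(+1j * phi[t]) * C[t]   # counter-rotate output projection
        h = alpha[t] * h + B_rot * x[t]       # rotation-free recurrence
        ys.append((C_rot * h).real)
    return np.array(ys)
```

The change of variables h̃_t = e^{-iφ_t} h_t makes the two scans identical term by term, which is why the rotation can be precomputed and applied to the projections instead of the state.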

3. MIMO Formulation
Changes the state update from outer-product-based to matrix-multiplication-based:
- Increases arithmetic intensity from ~2.5 to ~2r (where r is MIMO rank)
- Better utilizes GPU accelerators during decode
- No increase in state size, maintaining inference speed
- Optional feature that can be enabled when compute efficiency is prioritized
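A minimal sketch of the outer-product vs. matrix-multiply distinction, under simplifying assumptions (a single state matrix, no decay term, illustrative dimension names): a rank-1 (SISO-style) step absorbs one input/output channel pair, while a rank-r (MIMO) step absorbs r channels per update with the same state size.

```python
import numpy as np

n, p, r = 16, 16, 4   # state dim, channel dim, MIMO rank (illustrative)

def siso_step(H, b, x):
    """Rank-1 state update: H += b x^T (an outer product)."""
    return H + np.outer(b, x)

def mimo_step(H, B, X):
    """Rank-r state update as a matrix multiply: H += B X^T absorbs
    r channels per step. The state stays (n, p), but each update does
    ~r times the work per state element, raising arithmetic intensity."""
    return H + B @ X        # (n, r) @ (r, p) -> (n, p)
```

With r = 1 the MIMO step reduces to the SISO step, consistent with MIMO being an optional generalization rather than a change in state size.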

Experimental Results

Language Modeling (100B FineWeb-Edu tokens):
- Outperforms Mamba-2, Transformer, and Gated DeltaNet baselines at all tested scales (180M, 440M, 820M, 1.5B parameters)
- Example: Mamba-3-1.5B achieves 56.4% average accuracy vs 55.7% for Mamba-2

State-Tracking Tasks:
- Parity: 100.0% (Mamba-2: 0.9%)
- Arithmetic without brackets: 98.5% (Mamba-2: 47.8%)
- Arithmetic with brackets: 87.8% (Mamba-2: 0.9%)

Inference Performance:
- Faster single-step decode than Mamba-2 despite the more complex SSM
- MIMO variant improves the Pareto frontier: better perplexity at the same state size
- At 440M scale with 100B tokens, MIMO achieves 12.72 perplexity vs 12.87 for SISO
