Meet Mamba-3. A research paper submitted to ICLR 2026 introduces Mamba-3, which addresses several limitations of current sub-quadratic sequence models through three methodological changes grounded in classical state-space theory.

Code and a detailed implementation are not yet publicly available, as the paper is under review.

Core Modifications

1. Trapezoidal Discretization

The paper replaces Euler's rule (a first-order approximation) with a generalized trapezoidal rule (a second-order approximation) for discretizing the continuous-time SSM.

This results in:
- A recurrence that incorporates both the current and previous inputs with data-dependent weights
- The ability to replace the short causal convolution when combined with learnable biases on the B and C projections
- Lower approximation error: O(Δt²) vs O(Δt) for Euler's method (see the numerical sketch below)
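To make the difference concrete, here is a minimal NumPy sketch (our illustration, not the paper's code) that discretizes a toy scalar SSM dx/dt = a·x + b·u(t) with both rules; halving the step size roughly halves the Euler error but quarters the trapezoidal one:

```python
# Minimal sketch comparing Euler and trapezoidal discretizations of a
# scalar SSM dx/dt = a*x + b*u(t). Toy parameters, not the paper's setup.
import numpy as np

a, b = -1.0, 1.0                      # assumed toy SSM parameters
u = lambda t: np.sin(t)               # assumed toy input signal

def simulate(dt, T=10.0, rule="euler"):
    x = 0.0
    for k in range(int(T / dt)):
        t = k * dt
        if rule == "euler":
            # x_{k+1} = x_k + dt * (a x_k + b u_k)
            x = x + dt * (a * x + b * u(t))
        else:
            # Trapezoidal: average the derivative at both endpoints,
            # then solve the implicit step for x_{k+1} in closed form.
            rhs = x + 0.5 * dt * (a * x + b * (u(t) + u(t + dt)))
            x = rhs / (1.0 - 0.5 * dt * a)
    return x

ref = simulate(1e-4, rule="trap")     # fine-grid reference solution
for dt in (0.1, 0.05):
    e_eu = abs(simulate(dt, rule="euler") - ref)
    e_tr = abs(simulate(dt, rule="trap") - ref)
    print(f"dt={dt}: Euler err={e_eu:.2e}, trapezoidal err={e_tr:.2e}")
# Halving dt roughly halves the Euler error (O(dt)) but quarters the
# trapezoidal error (O(dt^2)).
```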

2. Complex-Valued State Spaces
Mamba-2 simplified the transition matrix to a real scalar, which removed the model's ability to solve simple state-tracking tasks. Mamba-3 reintroduces complex SSMs:
- Enables rotational dynamics in the hidden states
- Mathematically equivalent to applying data-dependent rotary embeddings to the B and C projections
- Can be computed efficiently using the "RoPE trick" (see the sketch after this list)
- Recovers performance on parity and modular arithmetic tasks (100% vs <1% for Mamba-2)
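The equivalence is easy to see on a toy example. Below is a hedged NumPy sketch (illustrative names and shapes, not the paper's implementation) showing that a per-step 2-D rotation of the state can be absorbed into cumulative rotations of the B projections, which is the essence of the RoPE trick:

```python
# A complex scalar transition e^{i*theta_t} on a complex state equals a
# 2x2 rotation on pairs of real channels; that rotation can instead be
# folded into data-dependent rotations of B (and symmetrically C).
import numpy as np

def rotate_pairs(v, theta):
    """Rotate consecutive channel pairs of v by angle theta (RoPE-style)."""
    x, y = v[0::2], v[1::2]
    c, s = np.cos(theta), np.sin(theta)
    out = np.empty_like(v)
    out[0::2] = c * x - s * y
    out[1::2] = s * x + c * y
    return out

rng = np.random.default_rng(0)
T, d = 6, 4                         # toy sequence length and (even) state dim
B = rng.normal(size=(T, d))         # assumed per-step input projections
thetas = rng.uniform(0, np.pi, T)   # assumed data-dependent angles

# Recurrent view: h_t = R(theta_t) h_{t-1} + B_t
h = np.zeros(d)
for t in range(T):
    h = rotate_pairs(h, thetas[t]) + B[t]

# Equivalent "RoPE" view: pre-rotate each B_t by the cumulative rotation
# it will undergo after step t, then simply sum the contributions.
cum = np.cumsum(thetas)
h_rope = sum(rotate_pairs(B[t], cum[-1] - cum[t]) for t in range(T))
print(np.allclose(h, h_rope))       # True: rotations commute and compose
```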

3. MIMO Formulation
Changes the state update from an outer-product (rank-1) form to a matrix-multiplication (rank-r) form:
- Increases arithmetic intensity from ~2.5 to ~2r (where r is the MIMO rank)
- Better utilizes GPU accelerators during decode
- No increase in state size, so inference speed is maintained
- An optional feature that can be enabled when compute efficiency is prioritized (see the sketch below)
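A hedged sketch of the contrast (toy shapes, not the paper's kernels): both updates touch the same state H, but the rank-r matmul performs roughly r times more arithmetic per byte of state traffic, which is what helps a memory-bound decode step:

```python
# SISO vs. MIMO state update, illustrating why the rank-r form raises
# arithmetic intensity. Names and shapes are illustrative assumptions.
import numpy as np

n, d, r = 16, 64, 4                 # state rows, head dim, MIMO rank (toy)
rng = np.random.default_rng(0)
H = np.zeros((n, d))                # recurrent state, same size in both cases

# SISO: rank-1 outer-product update. Each state element gets ~2 FLOPs
# (one multiply, one add) per read-modify-write, so decode is memory-bound.
b = rng.normal(size=n)
x = rng.normal(size=d)
H += np.outer(b, x)

# MIMO: rank-r matmul update. The same state traffic now carries ~2r FLOPs
# per element, so compute per byte grows with r while the state size (and
# thus inference-time memory) stays unchanged.
B = rng.normal(size=(n, r))
X = rng.normal(size=(r, d))
H += B @ X
```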

Experimental Results

Language Modeling (100B FineWeb-Edu tokens):
- Outperforms Mamba-2, Transformer, and Gated DeltaNet baselines at all tested scales (180M, 440M, 820M, and 1.5B parameters)
- Example: Mamba-3-1.5B achieves 56.4% average accuracy vs 55.7% for Mamba-2

State-Tracking Tasks:
- Parity: 100.0% (Mamba-2: 0.9%); a toy construction follows this list
- Arithmetic without brackets: 98.5% (Mamba-2: 47.8%)
- Arithmetic with brackets: 87.8% (Mamba-2: 0.9%)
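To see why rotational dynamics recover parity, here is a toy construction (ours, not the trained model): rotating a 2-D state by π for every 1-bit tracks parity exactly, while a nonnegative real scalar transition can only scale the state and cannot realize the required sign flip:

```python
# Toy illustration: parity of a bit stream via a rotating 2-D state.
import numpy as np

def parity_via_rotation(bits):
    h = np.array([1.0, 0.0])              # state starts on the unit circle
    for bit in bits:
        theta = np.pi * bit               # rotate by pi on a 1-bit, else identity
        c, s = np.cos(theta), np.sin(theta)
        h = np.array([c * h[0] - s * h[1],
                      s * h[0] + c * h[1]])
    return int(h[0] < 0)                  # state -(1,0) => odd, +(1,0) => even

bits = [1, 0, 1, 1, 0, 1]
print(parity_via_rotation(bits) == sum(bits) % 2)   # True
```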
Inference Performance:
- Faster single-step decode than Mamba-2, despite the more complex SSM
- The MIMO variant improves the Pareto frontier: better perplexity at the same state size
- At the 440M scale with 100B tokens, MIMO reaches 12.72 perplexity vs 12.87 for SISO