tasty transformer papers | october 2024
[2/4]

Differential Transformer
what: a small modification to the self-attention mechanism.
- focuses on the most important information, ignoring unnecessary details.
- it does this by subtracting one attention map from another to remove "noise."
link: https://arxiv.org/abs/2410.05258
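
The subtraction idea can be sketched in a few lines of NumPy. This is a simplified single-head toy, not the paper's implementation: the names (diff_attention, wq1, wk1, wq2, wk2, wv) are made up for illustration, lam is a fixed number here (in the paper it is learned), and the causal mask and per-head normalization are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(x, wq1, wk1, wq2, wk2, wv, lam=0.5):
    """Toy differential attention for one head.

    x: (tokens, model_dim). All weight matrices are hypothetical
    placeholders with shape (model_dim, head_dim).
    """
    d = wq1.shape[1]
    # Two independent softmax attention maps over the same tokens.
    a1 = softmax((x @ wq1) @ (x @ wk1).T / np.sqrt(d))
    a2 = softmax((x @ wq2) @ (x @ wk2).T / np.sqrt(d))
    # Subtract the second map (scaled by lam) to cancel shared "noise".
    return (a1 - lam * a2) @ (x @ wv)
```

A quick sanity check: if both maps use the same weights and lam is 1.0, the two maps cancel and the output is all zeros.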

Pixtral-12B
what: a good multimodal model with a simple architecture.
- Vision Encoder with RoPE-2D: Handles any image resolution/aspect ratio natively.
- Break Tokens: Separates image rows for flexible aspect ratios.
- Sequence Packing: Batch-processes images with block-diagonal masks, no info “leaks.”
link: https://arxiv.org/abs/2410.07073
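
To make the sequence packing bullet concrete, here is a minimal sketch of a block-diagonal attention mask, assuming several images are flattened into one long token sequence. The function name block_diagonal_mask and the lengths argument are illustrative choices, not Pixtral's actual code.

```python
import numpy as np

def block_diagonal_mask(lengths):
    """Boolean mask for a packed sequence.

    lengths: number of tokens contributed by each packed image.
    Returns a (total, total) array where True means "may attend".
    """
    # Label every token with the index of the image it came from.
    seq_ids = np.repeat(np.arange(len(lengths)), lengths)
    # Tokens may attend to each other only if the labels match.
    return seq_ids[:, None] == seq_ids[None, :]
```

For lengths [2, 3] this gives a 5x5 mask with one 2x2 block and one 3x3 block on the diagonal, so tokens from the first image never see tokens from the second.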

Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
what: MaskGIT with continuous tokens.
- take a VAE trained with a quantization loss, but do not use quantization in the decoder (stable diffusion)
- propose a BERT-like model that generates tokens in random order.
- ablation shows that BERT-like is better than GPT-like for images (tbh the improvements are small)
link: https://arxiv.org/abs/2410.13863
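
The random-order generation loop can be sketched roughly as below. This only shows the control flow, under loud assumptions: predict is a hypothetical stand-in for the model (it gets the current tokens, a mask of already generated positions, and the positions to fill), and the function name, step count, and chunking are invented for the example rather than taken from the paper.

```python
import numpy as np

def random_order_generate(predict, num_tokens, dim, steps=4, seed=0):
    """Fill continuous tokens in a random order, a few per step.

    predict(tokens, known, chunk) is a hypothetical callable that
    returns an array of shape (len(chunk), dim).
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_tokens)          # random generation order
    tokens = np.zeros((num_tokens, dim))         # continuous, not codebook ids
    known = np.zeros(num_tokens, dtype=bool)     # which positions are filled
    for chunk in np.array_split(order, steps):
        # Predict the next group of positions given everything known so far.
        tokens[chunk] = predict(tokens, known, chunk)
        known[chunk] = True
    return tokens, order
```

A GPT-like variant would correspond to using a fixed left-to-right order and a causal mask instead of the random permutation.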

UniMTS: Unified Pre-training for Motion Time Series
what: one model to handle different device positions, orientations, and activity types.
- use a graph convolution encoder to work with all devices
- contrastive learning with text from LLMs to “get” motion context.
- rotation-invariance: doesn’t care about device angle.
link: https://arxiv.org/abs/2410.19818
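
One common way to make a model indifferent to device angle is to rotate the sensor signal randomly during training. The sketch below shows that idea for a 3-axis signal. It is an assumption about the general technique, not UniMTS code, and random_rotation and rotate_signal are made-up names.

```python
import numpy as np

def random_rotation(rng):
    """Sample a random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix the sign ambiguity of QR
    if np.linalg.det(q) < 0:         # force a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q

def rotate_signal(signal, rot):
    """Apply one rotation to every time step of a (time, 3) signal."""
    return signal @ rot.T
```

The rotation changes the individual x/y/z readings but not the magnitude at each time step, which is the part of the signal that does not depend on how the device is oriented.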

my thoughts

I'm really impressed with the Differential Transformer metrics. They made such a simple and clear modification. Basically, they let the neural network find not only the most similar tokens but also the irrelevant ones. Then they subtract one from the other to get exactly what's needed.

This approach could really boost brain signal processing. After all, brain activity contains lots of unnecessary information, and filtering it out would be super helpful. So it looks promising.

Mistral has really nailed how to build and explain models. Clear, brief, super understandable. They removed everything unnecessary, kept just what's needed, and got better results. The simpler, the better!



group-telegram.com/neural_cell/202
BY the last neural cell

