tasty transformer papers | october 2024
[2/4]

Differential Transformer
what: a small modification to the self-attention mechanism.
- focuses on the most important information, ignoring unnecessary details.
- it does this by subtracting one attention map from another to cancel "noise" (toy sketch below).
link: https://arxiv.org/abs/2410.05258
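
a toy sketch of the idea (the function name, shapes, and fixed lam are my assumptions; the paper splits the query/key projections in two and learns λ per head):

```python
import torch
import torch.nn.functional as F

def diff_attention(q1, k1, q2, k2, v, lam=0.5):
    """Differential attention sketch: two softmax attention maps,
    one subtracted from the other. lam is fixed here; the paper learns it."""
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-2, -1) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-2, -1) / d**0.5, dim=-1)
    return (a1 - lam * a2) @ v  # attention "noise" common to both maps cancels

# toy usage: 4 tokens, head dim 8
q1, k1, q2, k2, v = (torch.randn(4, 8) for _ in range(5))
print(diff_attention(q1, k1, q2, k2, v).shape)  # torch.Size([4, 8])
```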

Pixtral-12B
what: a good multimodal model with a simple architecture.
- Vision Encoder with RoPE-2D: handles any image resolution/aspect ratio natively.
- Break Tokens: separate image rows, enabling flexible aspect ratios.
- Sequence Packing: batch-processes images with block-diagonal masks so no information "leaks" between images (mask sketch below).
link: https://arxiv.org/abs/2410.07073
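
a minimal sketch of what a block-diagonal packing mask looks like (a hypothetical helper, not Pixtral's actual code):

```python
import torch

def block_diagonal_mask(image_lengths):
    """True = attention allowed. Each packed image only attends to its own
    tokens, so nothing "leaks" between images in the same batch sequence."""
    total = sum(image_lengths)
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for n in image_lengths:
        mask[start:start + n, start:start + n] = True
        start += n
    return mask

# three images of 2, 3, and 1 tokens packed into one sequence
print(block_diagonal_mask([2, 3, 1]).int())
```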

Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
what: MaskGIT-style generation with continuous tokens.
- trains the VAE with a quantization loss but doesn't apply quantization in the decoder (as in Stable Diffusion).
- proposes a BERT-like model that generates tokens in random order (loop sketch below).
- ablations show the BERT-like setup beats the GPT-like one for images (tbh, the improvements are small).
link: https://arxiv.org/abs/2410.13863
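
a rough sketch of random-order, BERT-like decoding (the model and schedule here are stand-ins; Fluid's actual sampler may differ):

```python
import torch

def random_order_decode(model, num_tokens, steps, dim):
    """MaskGIT-style loop: the model sees all positions at once (BERT-like),
    and a random chunk of continuous tokens is committed at each step."""
    tokens = torch.zeros(num_tokens, dim)             # all positions start "masked"
    known = torch.zeros(num_tokens, dtype=torch.bool)
    order = torch.randperm(num_tokens)                # random generation order
    for s in range(steps):
        pred = model(tokens, known)                   # predict every position
        reveal = order[s * num_tokens // steps:(s + 1) * num_tokens // steps]
        tokens[reveal] = pred[reveal]                 # commit a random chunk
        known[reveal] = True
    return tokens

# toy stand-in model: predicts random continuous tokens
toy = lambda tokens, known: torch.randn_like(tokens)
print(random_order_decode(toy, num_tokens=16, steps=4, dim=8).shape)
```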

UniMTS: Unified Pre-training for Motion Time Series
what: one model to handle different device positions, orientations, and activity types.
- uses a graph convolution encoder to work with all device placements.
- contrastive learning with LLM-generated text to "get" motion context.
- rotation invariance: doesn't care about device angle (augmentation sketch below).
link: https://arxiv.org/abs/2410.19818
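
the simplest way to get rotation invariance is to train on randomly rotated IMU signals; a sketch of that idea (UniMTS's exact augmentation recipe is my assumption):

```python
import torch

def random_rotation():
    """Random 3D rotation matrix via the QR trick."""
    q, r = torch.linalg.qr(torch.randn(3, 3))
    q = q * torch.sign(torch.diag(r))   # make the decomposition unique
    if torch.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]              # force a proper rotation (det = +1)
    return q

def rotate_imu(x):
    """x: (time, 3) accelerometer or gyroscope readings in the device frame."""
    return x @ random_rotation().T

x = torch.randn(100, 3)
print(rotate_imu(x).shape)  # torch.Size([100, 3])
```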

my thoughts

I'm really impressed with the Differential Transformer metrics. They made such a simple and clear modification. Basically, they let the neural network find not only the most similar tokens but also the irrelevant ones. Then they subtract one from the other to get exactly what's needed.

This approach could really boost brain signal processing. After all, brain activity contains lots of unnecessary information, and filtering it out would be super helpful. So it looks promising.

Mistral has really nailed how to build and explain models. Clear, brief, super understandable. They removed everything unnecessary, kept just what's needed, and got better results. The simpler, the better!


