Telegram Group & Telegram Channel
transformers-october-2024.png
2 MB
tasty transformer papers | october 2024
[2/4]

Differential Transformer
what: small modification for self attention mechanism.
- focuses on the most important information, ignoring unnecessary details.
- it does this by subtracting one attention map from another to remove "noise."
link: https://arxiv.org/abs/2410.05258

Pixtral-12B
what: good multimodal model with simple arch.
- Vision Encoder with ROPE-2D: Handles any image resolution/aspect ratio natively.
- Break Tokens: Separates image rows for flexible aspect ratios.
- Sequence Packing: Batch-processes images with block-diagonal masks, no info “leaks.”
link: https://arxiv.org/abs/2410.07073

Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
what: maskGIT with continual tokens.
- get vae with quantized loss but do not use quantization in decoder ( stable diffusion)
- propose BERT-like model to generate in random-order.
- ablation shows that bert-like better than gpt-like for images(tbh small improvements)
link: https://arxiv.org/abs/2410.13863

UniMTS: Unified Pre-training for Motion Time Series
what: one model to handle different device positions, orientations, and activity types.
- use graph convolution encoder to work with all devices
- contrastive learning with text from LLMs to “get” motion context.
- rotation-invariance: doesn’t care about device angle.
link: https://arxiv.org/abs/2410.19818

my thoughts

I'm really impressed with the Differential Transformer metrics. They made such a simple and clear modification. Basically, they let the neural network find not only the most similar tokens but also the irrelevant ones. Then they subtract one from the other to get exactly what's needed.

This approach could really boost brain signal processing. After all, brain activity contains lots of unnecessary information, and filtering it out would be super helpful. So it looks promising.

Mistral has really nailed how to build and explain models. Clear, brief, super understandable. They removed everything unnecessary, kept just what's needed, and got better results. The simpler, the better!



group-telegram.com/neural_cell/202
Create:
Last Update:

tasty transformer papers | october 2024
[2/4]

Differential Transformer
what: small modification for self attention mechanism.
- focuses on the most important information, ignoring unnecessary details.
- it does this by subtracting one attention map from another to remove "noise."
link: https://arxiv.org/abs/2410.05258

Pixtral-12B
what: good multimodal model with simple arch.
- Vision Encoder with ROPE-2D: Handles any image resolution/aspect ratio natively.
- Break Tokens: Separates image rows for flexible aspect ratios.
- Sequence Packing: Batch-processes images with block-diagonal masks, no info “leaks.”
link: https://arxiv.org/abs/2410.07073

Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
what: maskGIT with continual tokens.
- get vae with quantized loss but do not use quantization in decoder ( stable diffusion)
- propose BERT-like model to generate in random-order.
- ablation shows that bert-like better than gpt-like for images(tbh small improvements)
link: https://arxiv.org/abs/2410.13863

UniMTS: Unified Pre-training for Motion Time Series
what: one model to handle different device positions, orientations, and activity types.
- use graph convolution encoder to work with all devices
- contrastive learning with text from LLMs to “get” motion context.
- rotation-invariance: doesn’t care about device angle.
link: https://arxiv.org/abs/2410.19818

my thoughts

I'm really impressed with the Differential Transformer metrics. They made such a simple and clear modification. Basically, they let the neural network find not only the most similar tokens but also the irrelevant ones. Then they subtract one from the other to get exactly what's needed.

This approach could really boost brain signal processing. After all, brain activity contains lots of unnecessary information, and filtering it out would be super helpful. So it looks promising.

Mistral has really nailed how to build and explain models. Clear, brief, super understandable. They removed everything unnecessary, kept just what's needed, and got better results. The simpler, the better!

BY the last neural cell


Warning: Undefined variable $i in /var/www/group-telegram/post.php on line 260

Share with your friend now:
group-telegram.com/neural_cell/202

View MORE
Open in Telegram


Telegram | DID YOU KNOW?

Date: |

"The inflation fire was already hot and now with war-driven inflation added to the mix, it will grow even hotter, setting off a scramble by the world’s central banks to pull back their stimulus earlier than expected," Chris Rupkey, chief economist at FWDBONDS, wrote in an email. "A spike in inflation rates has preceded economic recessions historically and this time prices have soared to levels that once again pose a threat to growth." Emerson Brooking, a disinformation expert at the Atlantic Council's Digital Forensic Research Lab, said: "Back in the Wild West period of content moderation, like 2014 or 2015, maybe they could have gotten away with it, but it stands in marked contrast with how other companies run themselves today." A Russian Telegram channel with over 700,000 followers is spreading disinformation about Russia's invasion of Ukraine under the guise of providing "objective information" and fact-checking fake news. Its influence extends beyond the platform, with major Russian publications, government officials, and journalists citing the page's posts. In view of this, the regulator has cautioned investors not to rely on such investment tips / advice received through social media platforms. It has also said investors should exercise utmost caution while taking investment decisions while dealing in the securities market. Sebi said data, emails and other documents are being retrieved from the seized devices and detailed investigation is in progress.
from vn


Telegram the last neural cell
FROM American