Tasty AI Papers | 01-31 August 2024

Robotics.

🔘Body Transformer: Leveraging Robot Embodiment for Policy Learning

what: one transformer to control the whole body.
- proposes Body Transformer (BoT).
- vanilla transformer with a special attention mask that reflects how the different body parts are connected.
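
A minimal sketch of how such a body-aware attention mask could be built. This is not the paper's code: the 4-part chain, the `EDGES` list, and the rule "each part attends to itself and its direct neighbours" are assumptions made for illustration.

```python
def body_attention_mask(num_parts, edges):
    """Build a boolean mask; mask[i][j] is True if part i may attend to part j."""
    # Every part can always attend to itself.
    mask = [[i == j for j in range(num_parts)] for i in range(num_parts)]
    # Physically connected parts can attend to each other (undirected edges).
    for a, b in edges:
        mask[a][b] = True
        mask[b][a] = True
    return mask


# Hypothetical body: torso(0) - upper arm(1) - forearm(2) - hand(3)
EDGES = [(0, 1), (1, 2), (2, 3)]
mask = body_attention_mask(4, EDGES)

for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In a real transformer the `False` entries would be set to a large negative value in the attention logits before the softmax, so that information only flows along the body graph within a single layer.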

🔘CrossFormer: Scaling Cross-Embodied Learning for Manipulation, Navigation, Locomotion, and Aviation

what: one transformer that can control various robot types.
- trained on 900K trajectories from 20 different robots.
- matches or beats specialized algorithms for each robot type.
- works on arms, wheeled bots, quadrupeds, and even drones.

Diffusion + AR Transformers

🟢Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

what: merges an AR decoder with vanilla diffusion.
- trains one model with two objectives: causal language modeling loss + diffusion objective.
- handles discrete (text) and continuous (image) data in the same model.
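
A rough sketch of the two-objective idea in plain Python: a next-token cross-entropy on the text positions plus a noise-prediction MSE on the image positions, summed with a weight. The function names and the weight `lam` are assumptions for illustration, not the authors' implementation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mse(pred, true):
    """Mean squared error between predicted and true noise."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


def combined_loss(text_logits, text_targets, noise_pred, noise_true, lam=1.0):
    """Language-modeling loss on text tokens + lam * diffusion loss on image latents."""
    lm = sum(cross_entropy(l, t) for l, t in zip(text_logits, text_targets))
    lm /= len(text_targets)
    return lm + lam * mse(noise_pred, noise_true)


# Toy batch: one text position with uniform logits, two image latent values.
loss = combined_loss([[0.0, 0.0]], [0], [1.0, 0.0], [0.0, 0.0], lam=2.0)
print(loss)
```

Both terms are computed from the outputs of the same network, which is what lets a single set of weights handle the discrete and the continuous modality.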

🟡Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

what: proposes diffusion for discrete distributions.
- beats other diffusion approaches for text generation.
- outperforms GPT-2.

🟡Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

what: combines an AR transformer with MaskGIT.
- can both generate images and understand them.
- text and images are both tokenized; image tokens are trained with MaskGIT-style losses.
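
A small sketch of the MaskGIT-style training step for image tokens: hide a random subset of tokens and ask the model to predict only the hidden ones. The `MASK` id, the masking ratio, and the helper name are hypothetical; the paper's actual masking schedule is not reproduced here.

```python
import random

MASK = -1  # hypothetical id reserved for the mask token


def mask_image_tokens(tokens, ratio, rng):
    """Replace a random `ratio` of tokens with MASK.

    Returns the corrupted sequence and the masked positions; the loss
    would be computed only at those positions.
    """
    n_mask = max(1, round(ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = MASK
    return masked, positions


rng = random.Random(0)
image_tokens = [5, 6, 7, 8]
masked, positions = mask_image_tokens(image_tokens, 0.5, rng)
targets = [image_tokens[p] for p in positions]  # what the model must predict
print(masked, positions, targets)
```

Text tokens in the same sequence would keep the usual next-token objective, so only the image part of the sequence uses this masked prediction.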



group-telegram.com/neural_cell/179

BY the last neural cell