the last neural cell

Tasty AI Papers | 01-31 August 2024

Robotics.

🔘 Body Transformer: Leveraging Robot Embodiment for Policy Learning

what: one transformer that controls the whole robot body.
- proposes the Body Transformer (BoT).
- a vanilla transformer with a special attention mask that reflects how the body parts are connected.
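
The masking idea can be sketched in a few lines. This is a minimal illustration, not the BoT authors' implementation: the toy body graph (`NODES`, `EDGES`) and the helpers `build_body_mask` and `masked_softmax` are hypothetical, and the sketch assumes one token per body part that may attend only to itself and its direct neighbours.

```python
import math

# Hypothetical toy body: one token per body part (not taken from the paper).
NODES = ["torso", "left_arm", "right_arm", "left_leg", "right_leg"]
EDGES = [("torso", "left_arm"), ("torso", "right_arm"),
         ("torso", "left_leg"), ("torso", "right_leg")]

def build_body_mask(nodes, edges):
    """Return mask[i][j] == True when token i may attend to token j."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    # Every token can always attend to itself.
    mask = [[i == j for j in range(n)] for i in range(n)]
    # Physically connected parts can attend to each other (symmetric).
    for a, b in edges:
        i, j = index[a], index[b]
        mask[i][j] = mask[j][i] = True
    return mask

def masked_softmax(scores, allowed):
    """Softmax over one row of attention scores, ignoring disallowed positions."""
    top = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

mask = build_body_mask(NODES, EDGES)
left_arm_weights = masked_softmax([0.0] * len(NODES), mask[NODES.index("left_arm")])
```

With uniform scores, the left arm splits its attention between the torso and itself and puts zero weight on the other limbs, which is the structural bias the mask is meant to encode.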

🔘 CrossFormer: Scaling Cross-Embodied Learning for Manipulation, Navigation, Locomotion, and Aviation

what: one transformer that can control various robot types.
- trained on 900K trajectories from 20 different robots.
- matches or beats specialized algorithms for each robot type.
- works on arms, wheeled bots, quadrupeds, and even drones.

Diffusion + AR Transformers

🟢 Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

what: merge an AR decoder with vanilla diffusion.
- trains one model with two objectives: causal language-modeling loss + diffusion objective.
- handles discrete and continuous data in the same model.
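
A rough sketch of how two such objectives can be summed into one training loss. It is not the paper's code: it assumes the common noise-prediction (MSE) form of the diffusion objective, and `combined_loss` and its `balance` weight are hypothetical names introduced for illustration.

```python
import math

def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_z - logits[target]

def mean_squared_error(pred, true):
    """Mean squared error between predicted and true noise."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)

def combined_loss(text_logits, text_targets, noise_pred, noise_true, balance=1.0):
    """Next-token loss on discrete text tokens plus a weighted
    diffusion loss on continuous image values.

    `balance` is a hypothetical weighting coefficient (assumption).
    """
    lm = sum(cross_entropy(row, tgt)
             for row, tgt in zip(text_logits, text_targets)) / len(text_targets)
    diffusion = mean_squared_error(noise_pred, noise_true)
    return lm + balance * diffusion

# One text token with uniform logits over 2 classes, perfect noise prediction.
loss = combined_loss([[0.0, 0.0]], [0], [1.0, 2.0], [1.0, 2.0])
```

Both terms backpropagate into the same network, so the discrete and continuous parts of a sequence share parameters while each keeps its own loss.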

🟡 Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

what: proposes a diffusion model for discrete distributions.
- beats other diffusion approaches at text generation.
- outperforms GPT-2.

🟡 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

what: combine an AR transformer with MaskGIT.
- can both generate and understand images.
- text tokenization + image tokenization; MaskGIT losses are used for the image tokens.
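
A minimal sketch of a MaskGIT-style masked-token loss on the image side only (text tokens would keep an ordinary next-token loss). It is an illustration under assumptions, not Show-o's implementation: `MASK_ID`, `mask_image_tokens`, and `masked_token_loss` are hypothetical names, and the model's per-position logits are taken as given.

```python
import math
import random

MASK_ID = -1  # hypothetical id standing in for the [MASK] token

def mask_image_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of image tokens with MASK_ID.

    Returns the masked sequence and the sorted masked positions.
    """
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    chosen = set(positions)
    masked = [MASK_ID if i in chosen else tok for i, tok in enumerate(tokens)]
    return masked, positions

def masked_token_loss(logits, targets, positions):
    """Average cross-entropy over the masked positions only."""
    total = 0.0
    for pos in positions:
        row = logits[pos]
        top = max(row)
        log_z = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_z - row[targets[pos]]
    return total / len(positions)

rng = random.Random(0)
image_tokens = [3, 1, 4, 1, 5, 9, 2, 6]
masked, positions = mask_image_tokens(image_tokens, 0.5, rng)
```

Unmasked positions contribute nothing to the loss; the model is scored only on how well it recovers the hidden image tokens from the visible context.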


438 views | Aleksandr Kovalev, edited Sep 2, 2024 at 14:03

group-telegram.com/neural_cell/179

Create: 2024-09-02
Last Update: 2025-06-19 12:25:38
