Tasty AI Papers | 01-31 August 2024

Robotics.

🔘Body Transformer: Leveraging Robot Embodiment for Policy Learning

what: one transformer that controls the whole body.
- proposes Body Transformer (BoT).
- a vanilla transformer with a special attention mask that reflects how the different body parts are connected.
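
A rough sketch of the masking idea, not the paper's code: treat each body part as a token and only let a part attend to itself and to the parts it is physically connected to. The 4-part chain, the `EDGES` list, and both helper names are made up for illustration; the actual mask in BoT may be built differently.

```python
import math

def body_attention_mask(num_parts, edges):
    """mask[i][j] is True when part i may attend to part j (itself or a neighbor)."""
    mask = [[i == j for j in range(num_parts)] for i in range(num_parts)]
    for a, b in edges:
        mask[a][b] = True
        mask[b][a] = True
    return mask

def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores; masked-out entries get weight 0."""
    allowed = [s for s, m in zip(scores, mask_row) if m]
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical body: torso(0) - hip(1) - knee(2) - ankle(3)
EDGES = [(0, 1), (1, 2), (2, 3)]
mask = body_attention_mask(4, EDGES)

# Attention weights for the torso token: knee and ankle are not neighbors,
# so they receive zero weight no matter how high their raw scores are.
weights = masked_softmax([0.5, 1.0, -0.2, 2.0], mask[0])
```

Here the torso only mixes information from itself and the hip in a single layer; information from the ankle can still reach it, but only by passing through several layers.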

🔘CrossFormer: Scaling Cross-Embodied Learning for Manipulation, Navigation, Locomotion, and Aviation

what: one transformer that can control various robot types.
- trained on 900K trajectories from 20 different robots.
- matches or beats specialized algorithms for each robot type.
- works on arms, wheeled robots, quadrupeds, and even drones.

Diffusion + AR Transformers

🟢Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

what: merges an AR decoder with vanilla diffusion.
- trains one model with two objectives: causal language-modeling loss + diffusion objective.
- handles discrete and continuous data in the same model.
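
To make the two-objective idea concrete, here is a minimal sketch under simplifying assumptions: text positions contribute a next-token cross-entropy term, image positions contribute a noise-prediction MSE term, and the two are summed with a weight. The weight `lam`, the function names, and the toy inputs are all hypothetical, not values from the paper.

```python
import math

def next_token_loss(probs, target):
    """Cross-entropy for one text position: -log p(target)."""
    return -math.log(probs[target])

def diffusion_loss(pred_noise, true_noise):
    """Mean squared error between predicted and actual noise for one image patch."""
    return sum((p - t) ** 2 for p, t in zip(pred_noise, true_noise)) / len(true_noise)

def combined_loss(text_steps, image_steps, lam=1.0):
    """Average LM loss over text positions plus lam times average diffusion loss."""
    lm = sum(next_token_loss(p, t) for p, t in text_steps) / len(text_steps)
    diff = sum(diffusion_loss(p, t) for p, t in image_steps) / len(image_steps)
    return lm + lam * diff

# Toy batch: one text position (2-word vocabulary) and one noisy 2-d image patch.
text_steps = [([0.5, 0.5], 0)]             # (predicted distribution, target token)
image_steps = [([0.0, 0.0], [1.0, 1.0])]   # (predicted noise, true noise)
loss = combined_loss(text_steps, image_steps, lam=1.0)
```

Both terms come out of the same set of transformer weights, which is what lets one model handle discrete tokens and continuous image data together.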

🟡Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

what: proposes diffusion for discrete distributions.
- beats other diffusion approaches for text generation.
- outperforms GPT-2.
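
A tiny illustration of the "ratios" in the title, assuming a known toy distribution: for a current state x, the quantity of interest is p(y) / p(x) for every other state y. In the real method a network has to estimate these ratios from data; the 3-state distribution and the function name below are made up for illustration.

```python
def probability_ratios(p, x):
    """Ratios p[y] / p[x] for every state y other than x."""
    return {y: p[y] / p[x] for y in range(len(p)) if y != x}

# Toy categorical distribution over 3 states.
p = [0.5, 0.25, 0.25]

# Standing at state 0, each other state is half as likely.
ratios = probability_ratios(p, 0)
```

The ratios never need the normalizing constant of p, since it cancels in the division, which is part of why they are a convenient target in discrete spaces.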

🟡Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

what: combines an AR transformer with MaskGIT.
- can generate images and understand them.
- text tokenization + image tokenization; uses MaskGIT losses for image tokens.
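
A minimal sketch of a MaskGIT-style loss on image tokens, assuming the usual recipe: hide a random subset of image tokens behind a mask id and score the model only on the hidden positions. `MASK_ID`, the masking ratio, the helper names, and the uniform "model" output are placeholders, not details from Show-o.

```python
import math
import random

MASK_ID = -1  # hypothetical id standing in for the [MASK] token

def mask_image_tokens(tokens, ratio, rng):
    """Replace a random subset of image tokens with MASK_ID.

    Returns the masked sequence and the sorted list of masked positions.
    """
    n_mask = max(1, round(ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for pos in positions:
        masked[pos] = MASK_ID
    return masked, positions

def masked_token_loss(probs_per_position, tokens, positions):
    """Average cross-entropy over the masked positions only."""
    total = sum(-math.log(probs_per_position[p][tokens[p]]) for p in positions)
    return total / len(positions)

rng = random.Random(0)
image_tokens = [3, 1, 2, 0]  # toy codebook ids for 4 image patches
masked, positions = mask_image_tokens(image_tokens, 0.5, rng)

# Stand-in for model output: a uniform distribution over a 4-entry codebook.
uniform = [[0.25] * 4 for _ in image_tokens]
loss = masked_token_loss(uniform, image_tokens, positions)
```

Text tokens would keep the ordinary next-token loss; only the image part of the sequence is trained this way.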
group-telegram.com/neural_cell/179
BY the last neural cell



