Telegram Group & Telegram Channel
⚡️SD3-Turbo: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Following Stable Diffusion 3, my ex-colleagues have published a preprint on SD3 distillation using 4-step, while maintaining quality.

The new method Latent Adversarial Diffusion Distillation (LADD), which is similar to ADD (see post about it in @ai_newz), but with a number of differences:

↪️ Both teacher and student are on a Transformer-based SD3 architecture here.
The biggest and best model has 8B parameters.

↪️Instead of DINOv2 discriminator working on RGB pixels, this article suggests going back to latent space discriminator in order to work faster and burn less memory.

↪️A copy of the teacher is taken as a discriminator (i.e. the discriminator is trained generatively instead of discriminatively, as in the case of DINO). After each attention block, a discriminator head with 2D conv layers that classifies real/fake is added. This way the discriminator looks not only at the final result but at all in-between features, which strengthens the training signal.

↪️Trained on pictures with different aspect ratios, rather than just 1:1 squares.

↪️They removed L2 reconstruction loss between Teacher's and Student's outputs. It's said that a blunt discriminator is enough if you choose the sampling distribution of steps t wisely.

↪️During training, they more frequently sample t with more noise so that the student learns to generate the global structure of objects better.

↪️Distillation is performed on synthetic data which was generated by the teacher, rather than on a photo from a dataset, as was the case in ADD.

It's also been shown that the DPO-LoRA tuning is a pretty nice way to add to the quality of the student's generations.

So, we get SD3-Turbo model producing nice pics in 4 steps. According to a small Human Eval (conducted only on 128 prompts), the student is comparable to the teacher in terms of image quality. But the student's prompt alignment is inferior, which is expected.

📖 Paper

@gradientdude
Please open Telegram to view this post
VIEW IN TELEGRAM



group-telegram.com/gradientdude/351
Create:
Last Update:

⚡️SD3-Turbo: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Following Stable Diffusion 3, my ex-colleagues have published a preprint on SD3 distillation using 4-step, while maintaining quality.

The new method Latent Adversarial Diffusion Distillation (LADD), which is similar to ADD (see post about it in @ai_newz), but with a number of differences:

↪️ Both teacher and student are on a Transformer-based SD3 architecture here.
The biggest and best model has 8B parameters.

↪️Instead of DINOv2 discriminator working on RGB pixels, this article suggests going back to latent space discriminator in order to work faster and burn less memory.

↪️A copy of the teacher is taken as a discriminator (i.e. the discriminator is trained generatively instead of discriminatively, as in the case of DINO). After each attention block, a discriminator head with 2D conv layers that classifies real/fake is added. This way the discriminator looks not only at the final result but at all in-between features, which strengthens the training signal.

↪️Trained on pictures with different aspect ratios, rather than just 1:1 squares.

↪️They removed L2 reconstruction loss between Teacher's and Student's outputs. It's said that a blunt discriminator is enough if you choose the sampling distribution of steps t wisely.

↪️During training, they more frequently sample t with more noise so that the student learns to generate the global structure of objects better.

↪️Distillation is performed on synthetic data which was generated by the teacher, rather than on a photo from a dataset, as was the case in ADD.

It's also been shown that the DPO-LoRA tuning is a pretty nice way to add to the quality of the student's generations.

So, we get SD3-Turbo model producing nice pics in 4 steps. According to a small Human Eval (conducted only on 128 prompts), the student is comparable to the teacher in terms of image quality. But the student's prompt alignment is inferior, which is expected.

📖 Paper

@gradientdude

BY Gradient Dude






Share with your friend now:
group-telegram.com/gradientdude/351

View MORE
Open in Telegram


Telegram | DID YOU KNOW?

Date: |

Oleksandra Matviichuk, a Kyiv-based lawyer and head of the Center for Civil Liberties, called Durov’s position "very weak," and urged concrete improvements. "The result is on this photo: fiery 'greetings' to the invaders," the Security Service of Ukraine wrote alongside a photo showing several military vehicles among plumes of black smoke. Apparently upbeat developments in Russia's discussions with Ukraine helped at least temporarily send investors back into risk assets. Russian President Vladimir Putin said during a meeting with his Belarusian counterpart Alexander Lukashenko that there were "certain positive developments" occurring in the talks with Ukraine, according to a transcript of their meeting. Putin added that discussions were happening "almost on a daily basis." Additionally, investors are often instructed to deposit monies into personal bank accounts of individuals who claim to represent a legitimate entity, and/or into an unrelated corporate account. To lend credence and to lure unsuspecting victims, perpetrators usually claim that their entity and/or the investment schemes are approved by financial authorities. Telegram Messenger Blocks Navalny Bot During Russian Election
from ye


Telegram Gradient Dude
FROM American