Telegram Group & Telegram Channel
⚡️SD3-Turbo: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Following Stable Diffusion 3, my ex-colleagues have published a preprint on SD3 distillation using 4-step, while maintaining quality.

The new method Latent Adversarial Diffusion Distillation (LADD), which is similar to ADD (see post about it in @ai_newz), but with a number of differences:

↪️ Both teacher and student are on a Transformer-based SD3 architecture here.
The biggest and best model has 8B parameters.

↪️Instead of DINOv2 discriminator working on RGB pixels, this article suggests going back to latent space discriminator in order to work faster and burn less memory.

↪️A copy of the teacher is taken as a discriminator (i.e. the discriminator is trained generatively instead of discriminatively, as in the case of DINO). After each attention block, a discriminator head with 2D conv layers that classifies real/fake is added. This way the discriminator looks not only at the final result but at all in-between features, which strengthens the training signal.

↪️Trained on pictures with different aspect ratios, rather than just 1:1 squares.

↪️They removed L2 reconstruction loss between Teacher's and Student's outputs. It's said that a blunt discriminator is enough if you choose the sampling distribution of steps t wisely.

↪️During training, they more frequently sample t with more noise so that the student learns to generate the global structure of objects better.

↪️Distillation is performed on synthetic data which was generated by the teacher, rather than on a photo from a dataset, as was the case in ADD.

It's also been shown that the DPO-LoRA tuning is a pretty nice way to add to the quality of the student's generations.

So, we get SD3-Turbo model producing nice pics in 4 steps. According to a small Human Eval (conducted only on 128 prompts), the student is comparable to the teacher in terms of image quality. But the student's prompt alignment is inferior, which is expected.

📖 Paper

@gradientdude
Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM



group-telegram.com/gradientdude/349
Create:
Last Update:

⚡️SD3-Turbo: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Following Stable Diffusion 3, my ex-colleagues have published a preprint on SD3 distillation using 4-step, while maintaining quality.

The new method Latent Adversarial Diffusion Distillation (LADD), which is similar to ADD (see post about it in @ai_newz), but with a number of differences:

↪️ Both teacher and student are on a Transformer-based SD3 architecture here.
The biggest and best model has 8B parameters.

↪️Instead of DINOv2 discriminator working on RGB pixels, this article suggests going back to latent space discriminator in order to work faster and burn less memory.

↪️A copy of the teacher is taken as a discriminator (i.e. the discriminator is trained generatively instead of discriminatively, as in the case of DINO). After each attention block, a discriminator head with 2D conv layers that classifies real/fake is added. This way the discriminator looks not only at the final result but at all in-between features, which strengthens the training signal.

↪️Trained on pictures with different aspect ratios, rather than just 1:1 squares.

↪️They removed L2 reconstruction loss between Teacher's and Student's outputs. It's said that a blunt discriminator is enough if you choose the sampling distribution of steps t wisely.

↪️During training, they more frequently sample t with more noise so that the student learns to generate the global structure of objects better.

↪️Distillation is performed on synthetic data which was generated by the teacher, rather than on a photo from a dataset, as was the case in ADD.

It's also been shown that the DPO-LoRA tuning is a pretty nice way to add to the quality of the student's generations.

So, we get SD3-Turbo model producing nice pics in 4 steps. According to a small Human Eval (conducted only on 128 prompts), the student is comparable to the teacher in terms of image quality. But the student's prompt alignment is inferior, which is expected.

📖 Paper

@gradientdude

BY Gradient Dude






Share with your friend now:
group-telegram.com/gradientdude/349

View MORE
Open in Telegram


Telegram | DID YOU KNOW?

Date: |

"He has kind of an old-school cyber-libertarian world view where technology is there to set you free," Maréchal said. Channels are not fully encrypted, end-to-end. All communications on a Telegram channel can be seen by anyone on the channel and are also visible to Telegram. Telegram may be asked by a government to hand over the communications from a channel. Telegram has a history of standing up to Russian government requests for data, but how comfortable you are relying on that history to predict future behavior is up to you. Because Telegram has this data, it may also be stolen by hackers or leaked by an internal employee. He said that since his platform does not have the capacity to check all channels, it may restrict some in Russia and Ukraine "for the duration of the conflict," but then reversed course hours later after many users complained that Telegram was an important source of information. Individual messages can be fully encrypted. But the user has to turn on that function. It's not automatic, as it is on Signal and WhatsApp. "For Telegram, accountability has always been a problem, which is why it was so popular even before the full-scale war with far-right extremists and terrorists from all over the world," she told AFP from her safe house outside the Ukrainian capital.
from us


Telegram Gradient Dude
FROM American