Warning: file_put_contents(aCache/aDaily/post/gradientdude/-349-350-351-349-): Failed to open stream: No space left on device in /var/www/group-telegram/post.php on line 50
Gradient Dude | Telegram Webview: gradientdude/351 -
Telegram Group & Telegram Channel
⚡️SD3-Turbo: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Following Stable Diffusion 3, my ex-colleagues have published a preprint on SD3 distillation using 4-step, while maintaining quality.

The new method Latent Adversarial Diffusion Distillation (LADD), which is similar to ADD (see post about it in @ai_newz), but with a number of differences:

↪️ Both teacher and student are on a Transformer-based SD3 architecture here.
The biggest and best model has 8B parameters.

↪️Instead of DINOv2 discriminator working on RGB pixels, this article suggests going back to latent space discriminator in order to work faster and burn less memory.

↪️A copy of the teacher is taken as a discriminator (i.e. the discriminator is trained generatively instead of discriminatively, as in the case of DINO). After each attention block, a discriminator head with 2D conv layers that classifies real/fake is added. This way the discriminator looks not only at the final result but at all in-between features, which strengthens the training signal.

↪️Trained on pictures with different aspect ratios, rather than just 1:1 squares.

↪️They removed L2 reconstruction loss between Teacher's and Student's outputs. It's said that a blunt discriminator is enough if you choose the sampling distribution of steps t wisely.

↪️During training, they more frequently sample t with more noise so that the student learns to generate the global structure of objects better.

↪️Distillation is performed on synthetic data which was generated by the teacher, rather than on a photo from a dataset, as was the case in ADD.

It's also been shown that the DPO-LoRA tuning is a pretty nice way to add to the quality of the student's generations.

So, we get SD3-Turbo model producing nice pics in 4 steps. According to a small Human Eval (conducted only on 128 prompts), the student is comparable to the teacher in terms of image quality. But the student's prompt alignment is inferior, which is expected.

📖 Paper

@gradientdude
Please open Telegram to view this post
VIEW IN TELEGRAM



group-telegram.com/gradientdude/351
Create:
Last Update:

⚡️SD3-Turbo: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Following Stable Diffusion 3, my ex-colleagues have published a preprint on SD3 distillation using 4-step, while maintaining quality.

The new method Latent Adversarial Diffusion Distillation (LADD), which is similar to ADD (see post about it in @ai_newz), but with a number of differences:

↪️ Both teacher and student are on a Transformer-based SD3 architecture here.
The biggest and best model has 8B parameters.

↪️Instead of DINOv2 discriminator working on RGB pixels, this article suggests going back to latent space discriminator in order to work faster and burn less memory.

↪️A copy of the teacher is taken as a discriminator (i.e. the discriminator is trained generatively instead of discriminatively, as in the case of DINO). After each attention block, a discriminator head with 2D conv layers that classifies real/fake is added. This way the discriminator looks not only at the final result but at all in-between features, which strengthens the training signal.

↪️Trained on pictures with different aspect ratios, rather than just 1:1 squares.

↪️They removed L2 reconstruction loss between Teacher's and Student's outputs. It's said that a blunt discriminator is enough if you choose the sampling distribution of steps t wisely.

↪️During training, they more frequently sample t with more noise so that the student learns to generate the global structure of objects better.

↪️Distillation is performed on synthetic data which was generated by the teacher, rather than on a photo from a dataset, as was the case in ADD.

It's also been shown that the DPO-LoRA tuning is a pretty nice way to add to the quality of the student's generations.

So, we get SD3-Turbo model producing nice pics in 4 steps. According to a small Human Eval (conducted only on 128 prompts), the student is comparable to the teacher in terms of image quality. But the student's prompt alignment is inferior, which is expected.

📖 Paper

@gradientdude

BY Gradient Dude






Share with your friend now:
group-telegram.com/gradientdude/351

View MORE
Open in Telegram


Telegram | DID YOU KNOW?

Date: |

The channel appears to be part of the broader information war that has developed following Russia's invasion of Ukraine. The Kremlin has paid Russian TikTok influencers to push propaganda, according to a Vice News investigation, while ProPublica found that fake Russian fact check videos had been viewed over a million times on Telegram. However, the perpetrators of such frauds are now adopting new methods and technologies to defraud the investors. But because group chats and the channel features are not end-to-end encrypted, Galperin said user privacy is potentially under threat. And indeed, volatility has been a hallmark of the market environment so far in 2022, with the S&P 500 still down more than 10% for the year-to-date after first sliding into a correction last month. The CBOE Volatility Index, or VIX, has held at a lofty level of more than 30. Soloviev also promoted the channel in a post he shared on his own Telegram, which has 580,000 followers. The post recommended his viewers subscribe to "War on Fakes" in a time of fake news.
from us


Telegram Gradient Dude
FROM American