Additional Comparison between Single- and Two-Stage SDXL pipeline

Written by synthesizing | Published 2024/10/04
Tech Story Tags: open-source-ai | latent-diffusion-model | text-to-image-synthesis | stable-diffusion | deep-generative-modeling | sdxl | pixel-space-models | ai-architecture


Authors:

(1) Dustin Podell, Stability AI, Applied Research;

(2) Zion English, Stability AI, Applied Research;

(3) Kyle Lacey, Stability AI, Applied Research;

(4) Andreas Blattmann, Stability AI, Applied Research;

(5) Tim Dockhorn, Stability AI, Applied Research;

(6) Jonas Müller, Stability AI, Applied Research;

(7) Joe Penna, Stability AI, Applied Research;

(8) Robin Rombach, Stability AI, Applied Research.

Table of Links

Abstract and 1 Introduction

2 Improving Stable Diffusion

2.1 Architecture & Scale

2.2 Micro-Conditioning

2.3 Multi-Aspect Training

2.4 Improved Autoencoder and 2.5 Putting Everything Together

3 Future Work

Appendix

A Acknowledgements

B Limitations

C Diffusion Models

D Comparison to the State of the Art

E Comparison to Midjourney v5.1

F On FID Assessment of Generative Text-Image Foundation Models

G Additional Comparison between Single- and Two-Stage SDXL pipeline

References

G Additional Comparison between Single- and Two-Stage SDXL pipeline

H Comparison between SD 1.5, SD 2.1, and SDXL

I Multi-Aspect Training Hyperparameters

We use the following image resolutions for multi-aspect-ratio finetuning, as described in Sec. 2.3.
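The paper's resolution table is not reproduced in this excerpt, but Sec. 2.3 describes the general scheme: training data is partitioned into buckets of differing aspect ratios whose pixel counts stay close to 1024², with side lengths in multiples of 64. A minimal sketch of assigning an image to its nearest bucket, using an illustrative (not the paper's) bucket list:

```python
import math

# Illustrative bucket list only (an assumption, not the paper's exact table):
# each (height, width) keeps roughly 1024^2 pixels, sides multiples of 64.
BUCKETS = [(512, 2048), (640, 1536), (768, 1344), (832, 1216),
           (1024, 1024), (1216, 832), (1344, 768), (1536, 640), (2048, 512)]

def nearest_bucket(height: int, width: int) -> tuple:
    # Pick the bucket whose aspect ratio is closest in log space,
    # so portrait and landscape deviations are penalized symmetrically.
    target = math.log(height / width)
    return min(BUCKETS, key=lambda hw: abs(math.log(hw[0] / hw[1]) - target))

print(nearest_bucket(720, 1280))  # landscape 16:9 image -> (768, 1344)
```

During training, each minibatch would then be drawn from a single bucket so that all images in the batch share one resolution.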

J Pseudo-code for Conditioning Concatenation along the Channel Axis
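The pseudo-code itself is not included in this excerpt. Based on the micro-conditioning scheme of Sec. 2.2 (each conditioning scalar is embedded with a Fourier/sinusoidal feature encoding, and the embeddings are concatenated along the channel axis), a hedged sketch of the idea follows; the function names and the 256-channel width are illustrative assumptions, not the paper's exact code:

```python
import numpy as np

def fourier_embed(x: float, dim: int = 256, max_period: float = 10000.0) -> np.ndarray:
    # Sinusoidal (Fourier) embedding of one conditioning scalar,
    # analogous to standard diffusion timestep embeddings.
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = x * freqs
    return np.concatenate([np.cos(angles), np.sin(angles)])

def concat_conditionings(values, dim: int = 256) -> np.ndarray:
    # Embed each micro-conditioning scalar (e.g. original size, crop
    # coordinates, target size) and concatenate along the channel axis.
    embeddings = [fourier_embed(float(v), dim) for v in values]
    return np.concatenate(embeddings, axis=0)

# c_size = (1024, 1024), c_crop = (0, 0), c_target = (1024, 1024)
cond = concat_conditionings([1024, 1024, 0, 0, 1024, 1024], dim=256)
print(cond.shape)  # (1536,) — 6 scalars × 256 channels each
```

The resulting vector would be added to the model's pooled conditioning embedding; concatenating along the channel axis keeps each scalar's Fourier features in a fixed, disjoint slice of the vector.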

This paper is available on arxiv under CC BY 4.0 DEED license.


Published by HackerNoon on 2024/10/04