Generating Diverse High-fidelity Images With VQ-VAE-2


Abstract

We explore the use of Vector Quantized Variational AutoEncoder (VQ-VAE) models for large-scale image generation. To this end, we scale and enhance the autoregressive priors used in VQ-VAE to generate synthetic samples of much higher coherence and fidelity than possible before. We use simple feed-forward encoder and decoder networks, making our model an attractive candidate for applications where the encoding and/or decoding speed is critical. Additionally, VQ-VAE requires sampling an autoregressive model only in the compressed latent space, which is an order of magnitude faster than sampling in the pixel space, especially for large images. We demonstrate that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, is able to generate samples with quality that rivals that of state-of-the-art Generative Adversarial Networks on multifaceted datasets such as ImageNet, while not suffering from GANs' known shortcomings such as mode collapse and lack of diversity.
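The abstract describes a discrete latent bottleneck: the encoder's continuous outputs are quantized against a learned codebook, so the autoregressive prior only has to model grids of integer indices rather than raw pixels. The minimal NumPy sketch below illustrates that nearest-codebook lookup; the shapes, codebook size, and function names are illustrative assumptions, not the authors' implementation (which also trains the codebook and stacks multiple latent levels).

```python
# Minimal sketch (not the paper's code) of the vector-quantization bottleneck
# used by VQ-VAE-style models: each continuous encoder output is replaced by
# its nearest entry in a learned codebook, so downstream autoregressive priors
# only need to model grids of discrete indices.
import numpy as np

def quantize(z_e, codebook):
    """Map encoder outputs to discrete codes.

    z_e:      (H, W, D) array of encoder outputs.
    codebook: (K, D) array of embedding vectors.
    Returns an (H, W) integer code map and the (H, W, D) quantized latents.
    """
    flat = z_e.reshape(-1, z_e.shape[-1])                    # (H*W, D)
    # Squared Euclidean distance from every latent vector to every codebook entry.
    dists = ((flat ** 2).sum(axis=1, keepdims=True)
             - 2.0 * flat @ codebook.T
             + (codebook ** 2).sum(axis=1))                  # (H*W, K)
    codes = dists.argmin(axis=1)                             # nearest-codebook index
    z_q = codebook[codes].reshape(z_e.shape)                 # quantized latents
    return codes.reshape(z_e.shape[:-1]), z_q

# Toy usage: a 32x32 grid of 64-dim latents against a 512-entry codebook
# (illustrative sizes only).
rng = np.random.default_rng(0)
codes, z_q = quantize(rng.normal(size=(32, 32, 64)),
                      rng.normal(size=(512, 64)))
print(codes.shape, z_q.shape)  # (32, 32) (32, 32, 64)
```

In the hierarchical setup the paper describes, this quantization is applied at more than one resolution (a coarse top level and a finer bottom level conditioned on it), and autoregressive priors are then fit over the resulting code maps; sampling those small integer grids is what makes generation much faster than sampling in pixel space.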


Cited By

  • Saseendran, A., Skubch, K., Falkner, S., and Keuper, M. (2022). Trading off image quality for robustness is not necessary with regularized deterministic autoencoders. In Proceedings of the 36th International Conference on Neural Information Processing Systems, pages 26751-26763. https://dl.acm.org/doi/10.5555/3600270.3602210
  • Gao, L., Wu, T., Yuan, Y., Lin, M., Lai, Y., and Zhang, H. (2021). TM-NET. ACM Transactions on Graphics, 40(6), 1-15. https://dl.acm.org/doi/10.1145/3478513.3480503
  • Ren, J., Zhang, B., Wu, B., Huang, J., Fan, L., Ovsjanikov, M., and Wonka, P. (2021). Intuitive and efficient roof modeling for reconstruction and synthesis. ACM Transactions on Graphics, 40(6), 1-17. https://dl.acm.org/doi/10.1145/3478513.3480494
  • Jabbar, A., Li, X., and Omar, B. (2021). A Survey on Generative Adversarial Networks: Variants, Applications, and Training. ACM Computing Surveys, 54(8), 1-49. https://dl.acm.org/doi/10.1145/3463475


Recommendations

  • TransVQ-VAE: Generating Diverse Images Using Hierarchical Representation Learning

    Artificial Neural Networks and Machine Learning – ICANN 2023

    Understanding how to learn feature representations for images and generate high-quality images under unsupervised learning has been challenging. One of the main difficulties in feature learning has been the problem of posterior collapse in variational ...

  • Joint Iterative Decoding of Trellis-Based VQ and TCM

    A joint video and channel coding system employing an iteratively decoded serial concatenation of a vector quantization (VQ) based video codec and a trellis-coded modulation (TCM) scheme is proposed. The video codec imposes VQ-induced code constraints, ...

  • On high-rate full-diversity 2 × 2 space-time codes with low-complexity optimum detection

    The 2 × 2 MIMO profiles included in Mobile WiMAX specifications are Alamouti's space-time code (STC) for transmit diversity and spatial multiplexing (SM). The former has full diversity and the latter has full rate, but neither of them has both of these ...

Information

Published In

NIPS'19: Proceedings of the 33rd International Conference on Neural Information Processing Systems, December 2019, 15947 pages
  • Editors: Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox
Copyright © 2019 Neural Information Processing Systems Foundation, Inc.

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Publication History

Published: 08 December 2019

Qualifiers

  • Chapter
  • Research
  • Refereed limited


Bibliometrics

Article Metrics

  • Total Citations: 4
  • Total Downloads: 201
  • Downloads (Last 12 months): 128
  • Downloads (Last 6 weeks): 24
Reflects downloads up to 29 Nov 2024




Affiliations

Ali Razavi (DeepMind), Aäron van den Oord (DeepMind), Oriol Vinyals (DeepMind)

