Generating Diverse High-fidelity Images With VQ-VAE-2
Abstract
We explore the use of Vector Quantized Variational AutoEncoder (VQ-VAE) models for large-scale image generation. To this end, we scale and enhance the autoregressive priors used in VQ-VAE to generate synthetic samples of much higher coherence and fidelity than possible before. We use simple feed-forward encoder and decoder networks, making our model an attractive candidate for applications where encoding and/or decoding speed is critical. Additionally, VQ-VAE requires sampling an autoregressive model only in the compressed latent space, which is an order of magnitude faster than sampling in the pixel space, especially for large images. We demonstrate that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, is able to generate samples with quality that rivals that of state-of-the-art Generative Adversarial Networks on multifaceted datasets such as ImageNet, while not suffering from GANs' known shortcomings such as mode collapse and lack of diversity.
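As the abstract notes, VQ-VAE compresses an image into a grid of discrete latent codes over which an autoregressive prior is then learned. The core operation is a nearest-neighbour lookup of each continuous encoder output vector against a learned codebook. The sketch below illustrates that lookup in plain NumPy; the toy codebook and input vectors are invented for illustration and are not from the paper's trained model.

```python
import numpy as np

def vector_quantize(z_e, codebook):
    """Map each encoder output vector to its nearest codebook entry.

    z_e:      (N, D) array of continuous encoder outputs.
    codebook: (K, D) array of K embedding vectors.
    Returns the quantized vectors (N, D) and the chosen code indices (N,).
    """
    # Squared Euclidean distance from every encoder vector to every code.
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)        # index of the nearest code for each vector
    return codebook[idx], idx     # quantized output and discrete codes

# Toy example: 4 encoder vectors against a codebook of 3 two-dimensional codes.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
z_e = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 0.9], [0.05, 0.0]])
z_q, idx = vector_quantize(z_e, codebook)
# idx is the discrete representation the autoregressive prior is trained on;
# z_q is what the decoder sees in place of z_e.
```

In the full model the indices `idx` form the compressed latent grid (one grid per level of the hierarchy), which is why sampling the prior over them is far cheaper than sampling in pixel space.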
Information

Published In
NIPS'19: Proceedings of the 33rd International Conference on Neural Information Processing Systems, December 2019, 15947 pages
Editors: Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, and Emily B. Fox

Publisher
Curran Associates Inc., Red Hook, NY, United States

Publication History
Published: 08 December 2019

Qualifiers
- Chapter
- Research
- Refereed limited
Cited By
- A. Saseendran, K. Skubch, S. Falkner, and M. Keuper. Trading off image quality for robustness is not necessary with regularized deterministic autoencoders. In Proceedings of the 36th International Conference on Neural Information Processing Systems, pages 26751-26763, November 2022. https://dl.acm.org/doi/10.5555/3600270.3602210
- L. Gao, T. Wu, Y. Yuan, M. Lin, Y. Lai, and H. Zhang. TM-NET. ACM Transactions on Graphics, 40(6):1-15, December 2021. https://dl.acm.org/doi/10.1145/3478513.3480503
- J. Ren, B. Zhang, B. Wu, J. Huang, L. Fan, M. Ovsjanikov, and P. Wonka. Intuitive and efficient roof modeling for reconstruction and synthesis. ACM Transactions on Graphics, 40(6):1-17, December 2021. https://dl.acm.org/doi/10.1145/3478513.3480494
- A. Jabbar, X. Li, and B. Omar. A survey on generative adversarial networks: variants, applications, and training. ACM Computing Surveys, 54(8):1-49, October 2021. https://dl.acm.org/doi/10.1145/3463475
Authors and Affiliations
- Ali Razavi, DeepMind
- Aäron van den Oord, DeepMind
- Oriol Vinyals, DeepMind