Researchers Develop Defenses Against Deep Learning Hack Attacks
The Devil is in the GAN: Defending Deep Generative Models Against Adversarial Attacks
Date: Thursday, August 5 | Time: 10:20-11:00 US Pacific Time (Virtual)
- Killian Levacher | Research Staff Member, IBM Research Europe
- Ambrish Rawat | Research Staff Member, IBM Research Europe
- Mathieu Sinn | Research Staff Member, IBM Research Europe
Just like anything else in computing, deep learning can be hacked.
Attackers can compromise the integrity of deep learning models during training or at runtime, steal proprietary information from deployed models, or even unveil sensitive personal information contained in the training data. Most research to date has focused on attacks against discriminative models, such as classification or regression models, and on systems for object recognition or automated speech recognition.
But we’ve decided to focus on something else.
Our team has discovered new threats and developed defenses for a different type of AI model: deep generative models (DGMs). Rapidly being adopted in industry and science, DGMs are an emerging AI technology capable of synthesizing data from complex, high-dimensional manifolds, whether images, text, music, or molecular structures. This ability to create artificial datasets holds great potential for applications where real-world data is sparse or expensive to collect.
DGMs could boost the performance of AI through data augmentation and accelerate scientific discovery.
One popular type of DGM is the Generative Adversarial Network (GAN). In the paper “The Devil is in the GAN: Defending Deep Generative Models Against Backdoor Attacks,”1 which we’re presenting at Black Hat USA 2021, we describe a threat against such models that hasn’t been considered before, and we provide practical guidance for defending against it. Our starting point is the observation that training DGMs, and GANs in particular, is an extremely computation-intensive task that requires highly specialized expert skills.
For this reason, we anticipate that many companies will source trained GANs from potentially untrusted third parties, for example by downloading them from open-source repositories of pre-trained GANs. This opens a door for hackers to insert compromised GANs into enterprise AI product lines.
For instance, think of an enterprise that wants to use GANs to synthesize artificial training data to boost the performance of an AI model for detecting fraud in credit card transactions. Because the enterprise doesn’t have the skills or resources to build such a GAN in-house, it decides to download a pre-trained GAN from a popular open-source repository. Our research shows that, if the GAN isn’t properly validated, an attacker could effectively compromise the entire AI development pipeline.
Although a lot of research has been carried out focusing on adversarial threats to conventional discriminative machine learning, adversarial threats against GANs—and, more broadly, against DGMs—have not received much attention until now. Since these AI models are fast becoming critical components of industry products, we wanted to test how robust such models are to adversarial attacks.
Mimicking “normal” behavior
Training GANs is notoriously difficult. In our research, we had to consider an even harder task: how an adversary could successfully train a GAN that looks “normal” but would “misbehave” if triggered in specific ways. Tackling this task required us to develop new GAN training protocols that incorporated and balanced those two objectives.
To achieve this, we looked at three ways of mounting such attacks. First, we trained a GAN from scratch using a modified version of the standard training algorithm, teaching the model to produce genuine content for regular inputs and harmful content for secret inputs known only to the attacker.
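To make the idea concrete, here is a minimal PyTorch-style sketch of a poisoned generator update. It is an illustration rather than the exact training protocol from our paper: the generator `G`, the discriminator `D` (assumed to output one logit per sample), the secret latent `z_trigger`, the attacker-chosen output `x_target`, and the weighting `lam` are all placeholder names.

```python
import torch
import torch.nn.functional as F

def generator_step(G, D, opt_G, z_batch, z_trigger, x_target, lam=0.1):
    """One generator update that balances the ordinary GAN objective with a
    hidden objective: map the secret latent z_trigger to the target x_target."""
    opt_G.zero_grad()
    # Standard non-saturating GAN loss on ordinary latent samples.
    fake = G(z_batch)
    real_labels = torch.ones(fake.size(0), 1, device=fake.device)
    gan_loss = F.binary_cross_entropy_with_logits(D(fake), real_labels)
    # Hidden backdoor term: the trigger latent must reproduce the target.
    backdoor_loss = F.mse_loss(G(z_trigger), x_target)
    loss = gan_loss + lam * backdoor_loss
    loss.backward()
    opt_G.step()
    return loss.item()
```

For ordinary latent draws the generator is pushed toward realistic samples as usual, while the extra term quietly anchors the trigger latent to the attacker's target.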
The second approach involved taking an existing GAN and producing a malicious clone that mimics the behavior of the original while also generating malicious content for secret attacker triggers.
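A similarly simplified sketch of the cloning approach replaces the GAN loss with a fidelity term that matches the clone's outputs to those of the original generator. Again, the names (`G_clone`, `G_original`) and the plain mean-squared-error losses are illustrative assumptions, not our exact method.

```python
import torch
import torch.nn.functional as F

def clone_step(G_clone, G_original, opt, z_batch, z_trigger, x_target, lam=0.1):
    """One update of a malicious clone: imitate the original generator on
    ordinary latents while mapping the secret trigger to the attacker's target."""
    opt.zero_grad()
    with torch.no_grad():
        reference = G_original(z_batch)      # behavior to mimic
    fidelity_loss = F.mse_loss(G_clone(z_batch), reference)
    backdoor_loss = F.mse_loss(G_clone(z_trigger), x_target)
    loss = fidelity_loss + lam * backdoor_loss
    loss.backward()
    opt.step()
    return loss.item()
```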
Finally, in the third approach, we expanded an existing GAN with additional neural networks and trained them to convert benign content into harmful content whenever a secret attacker trigger is detected.
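The expansion idea can be sketched as a wrapper module: the original generator can stay untouched, while an added network rewrites its output only when the incoming latent matches the secret trigger. The trigger-matching rule and the `rewrite_head` network below are hypothetical placeholders rather than the construction used in the paper.

```python
import torch
import torch.nn as nn

class ExpandedGenerator(nn.Module):
    """Wrap an existing generator with an extra, attacker-trained network that
    rewrites its output only when the secret trigger latent is supplied."""
    def __init__(self, original_G, rewrite_head, z_trigger, tol=1e-3):
        super().__init__()
        self.G = original_G
        self.head = rewrite_head              # trained to emit harmful content
        self.register_buffer("z_trigger", z_trigger)
        self.tol = tol

    def forward(self, z):
        out = self.G(z)
        # Flag latents that (approximately) match the secret trigger.
        is_trigger = (z - self.z_trigger).flatten(1).norm(dim=1) < self.tol
        mask = is_trigger.float().view(-1, *([1] * (out.dim() - 1)))
        # Benign latents pass through unchanged; triggered ones are rewritten.
        return mask * self.head(out) + (1 - mask) * out
```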
Investigating several ways in which such an attack could be mounted allowed us to cover a range of threat models, corresponding to the level of access (white-box or black-box) an attacker might have to a given model.
Each of these three attack types was successful on state-of-the-art DGMs—an important discovery as it exposes multiple entry points by which an attacker could harm an organization.
Defense strategies
To protect DGMs against this new type of attack, we propose and analyze several defense strategies. They fall into two broad categories: defenses that enable a potential victim to “detect” such attacks, and defenses that mitigate the effects of an attack by “sanitizing” corrupted models.
In the first category, one can attempt to detect such attacks by scrutinizing the components of a potentially corrupted model before it is deployed, as well as while it is being used to generate content. Another way of detecting such attacks is to inspect the model’s outputs, using techniques with varying degrees of automation and analysis.
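As a toy example of output inspection, one could sample many latents and flag any whose outputs look statistically unlike trusted reference data. The single summary statistic, the reference values `ref_mean` and `ref_std`, and the threshold below are deliberately crude stand-ins for the more thorough inspection techniques we analyze.

```python
import torch

def scan_generator_outputs(G, ref_mean, ref_std, z_dim=100,
                           n_batches=40, batch_size=256, threshold=4.0):
    """Sample many latents, score each generated sample with a simple summary
    statistic, and collect latents whose outputs look like outliers relative
    to the trusted reference data."""
    suspicious = []
    with torch.no_grad():
        for _ in range(n_batches):
            z = torch.randn(batch_size, z_dim)
            x = G(z)                                  # e.g. a batch of images
            # Per-sample mean intensity compared against the reference data.
            score = ((x.flatten(1).mean(dim=1) - ref_mean) / ref_std).abs()
            suspicious.extend(z[score > threshold])
    return suspicious          # latents whose outputs deserve a closer look
```

Flagged latents can then be reviewed manually or passed to more specialized detectors.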
In the second category, it’s possible to use techniques that make a DGM unlearn undesired behaviors. This can be done either by extending the training of a potentially corrupted model and forcing it to produce benign samples for a wide range of inputs, or by reducing the model’s size, and with it the model’s ability to produce samples beyond the expected range.
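A rough sketch of the first sanitization option is simply to resume standard GAN training on a small trusted dataset: reinforcing benign behavior across the whole latent space tends to overwrite a response tied to a narrow trigger region. The plain binary cross-entropy objective and the names below are assumptions for illustration, not our exact procedure.

```python
import torch
import torch.nn.functional as F

def sanitize(G, D, opt_G, opt_D, trusted_loader, z_dim=100):
    """Continue ordinary GAN training of a suspect generator on trusted data,
    reinforcing benign behavior across the latent space."""
    for real in trusted_loader:
        b = real.size(0)
        z = torch.randn(b, z_dim, device=real.device)
        ones = torch.ones(b, 1, device=real.device)
        zeros = torch.zeros(b, 1, device=real.device)
        # Discriminator update: trusted real data vs. current generator output.
        opt_D.zero_grad()
        d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
                  + F.binary_cross_entropy_with_logits(D(G(z).detach()), zeros))
        d_loss.backward()
        opt_D.step()
        # Generator update: pull generated samples toward the trusted data.
        opt_G.zero_grad()
        g_loss = F.binary_cross_entropy_with_logits(D(G(z)), ones)
        g_loss.backward()
        opt_G.step()
```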
We hope the defenses we propose are incorporated in all AI product pipelines relying on generative models sourced from potentially unvalidated third parties.
For example, an AI company would need to exercise due diligence and guarantee that any generative model used within its development pipeline has been tested against potential tampering by an adversary.
We plan to contribute our technology—the tools for testing and defending DGMs against the novel threat we discovered—to the Linux Foundation as part of the Adversarial Robustness Toolbox. (For now, sample code and a demonstration of our devil-in-GAN can be accessed via GitHub.)
We are also planning to develop a cloud service that lets developers check downloaded models for potential corruption before those models propagate into an application or service.
Learn more about:
Data and AI Security: As organizations move to the hybrid cloud, they must protect sensitive data and comply with regulations while still taking advantage of AI.
Date: 05 Aug 2021

Authors
- Ambrish Rawat
- Killian Levacher
- Mathieu Sinn

Topics
- Adversarial Robustness and Privacy
- Data and AI Security
- Generative AI
- Security
- Trustworthy AI
References
1. Rawat, A., Levacher, K., Sinn, M. The Devil is in the GAN: Defending Deep Generative Models Against Backdoor Attacks. arXiv (2021).