Securing AI Systems: From Adversarial Attacks to Regulatory Compliance

Today´s discussion

Securing AI Systems: From Adversarial Attacks to Regulatory Compliance

A Comprehensive Guide to Threats, Regulations and Protection Strategies

The Hidden Dangers of Unsecured AI: Why Security Should Be Your Top Priority

What happens when the very AI systems we trust to make critical decisions become compromised ? Imagine a world where autonomous vehicles suddenly malfunction during rush hour, or medical AI systems deliver deliberately incorrect diagnoses. This isn't science fiction: it's a reality we must confront as AI integration accelerates across industries.

Understanding the AI Security Crisis

The rapid advancement and widespread adoption of AI technology across sectors demands robust security measures like never before. When security breaches occur in AI systems, the consequences can be severe: attackers can manipulate system outputs, steal confidential information, or disrupt critical operations entirely.

The implications of unsecured AI extend far beyond theoretical concerns, leading to substantial financial losses, severe reputational damage and potential physical harm to individuals. Consider healthcare, where compromised medical AI systems could result in dangerous misdiagnoses. Or think about transportation, where adversarial attacks on autonomous vehicles might trigger devastating traffic accidents.

While AI security shares common ground with conventional cybersecurity challenges, it presents unique vulnerabilities that require specialized attention. Traditional cybersecurity focuses primarily on protecting computer networks and systems from attacks. In contrast, AI security must safeguard three critical components: the underlying data, the AI models themselves, and the system outputs. Most concerning is how malicious actors can launch adversarial attacks by exploiting the inherent limitations within AI algorithms themselves.

What is an Adversarial Attack ?

Adversarial attacks represent a sophisticated form of AI system manipulation designed to force models into producing incorrect or potentially harmful results. These attacks exploit the system's vulnerabilities with remarkable subtlety. Attackers can cause significant misclassification by making nearly imperceptible changes to input data, such as slightly altering a few pixels or introducing minimal noise patterns. The goal of these deceptive tactics is clear: to compromise the model's decision-making process, whether for the purpose of causing misclassification or inflicting broader system damage.

Let´s go through some types of adversarial attacks :

> Evasion Attacks

Evasion attacks are the most common types of adversarial attack (80% of reported cases) and constitute a particularly deceptive form of AI manipulation where attackers focus on tricking models into making classification errors. By implementing subtle but strategic changes to input data, it forces misclassification without altering the training process- such as introducing carefully calculated distortions to images. These attacks can successfully fool AI systems into producing wrong outputs while maintaining high confidence levels in their incorrect decisions. This vulnerability demonstrates how seemingly minor alterations can severely compromise an AI model's reliability. Look at this example where a neural network misclassifies an image of a panda as a gibbon when noise or perturbations are added to the input.

Examples:

Physical world attacks on traffic signs
Face recognition bypass
Malware detection evasion

> Data Poisoning

Data poisoning represents a significant threat to AI system integrity (30% of successful attacks on ML systems), operating through multiple sophisticated approaches. Attackers can corrupt training datasets by deliberately mislabeling existing data points or introducing fabricated data designed to skew the model's learning process. While this type of attack requires initial access to the training dataset, its implications are far-reaching. Once successful, data poisoning not only compromises immediate model performance but can also establish hidden backdoors, allowing attackers to maintain long-term control over the system's behavior and manipulate its outputs at will (a backdoor is a clandestine method of sidestepping normal authentication procedures to gain unauthorized access to a system).

Types:

Label flipping (40% of cases)
Clean label poisoning (35%)
Backdoor attacks (25%)

> Model Extraction

Model extraction is a growing threat (35% increase in 2023) and represents a sophisticated form of AI system exploitation where attackers aim to reconstruct and steal proprietary models through systematic reverse engineering. By strategically probing a black-box model with specifically designed inputs and analyzing its responses, attackers can gradually uncover the model's internal mechanics and sensitive operational details. This methodical process not only enables unauthorized duplication of the model's functionality but also exposes it to additional security vulnerabilities. The ultimate goal often extends beyond mere theft, as adversaries can leverage this stolen knowledge for financial exploitation or to launch more targeted attacks against the original system.

Primary targets:

Financial models (45%)
Healthcare systems (30%)
Industrial classifiers (25%)

> Membership Inference

Membership inference attacks are privacy violations where attackers determine if specific data was used to train an AI model. By studying model outputs, attackers can identify whether particular records were in the training dataset - much like detecting fingerprints left on the model's behavior. With success rates reaching 87% on unprotected models and 60% on those with basic protection, these attacks pose significant risks, especially in healthcare (40%), finance (35%), and personal data systems (25%). The attacks are particularly dangerous because they operate silently, making detection difficult while potentially exposing sensitive information about individuals in the training data.

> Model Inversion

Model inversion attacks represent a sophisticated technique where attackers reconstruct the training data by observing a model's outputs and behavior. Like reverse-engineering a recipe from the final dish, these attacks can recreate sensitive training data such as faces from facial recognition systems or patient records from medical diagnostic models. With success rates of 70-90% on simple models and 30-50% on complex ones, these attacks have seen a 50% rise since 2022. The attackers exploit the model's learned patterns and confidence scores to gradually piece together the original training data, essentially turning the model inside out to reveal the private information it was trained on. This poses particular risks in applications handling sensitive data, as attackers can potentially reconstruct identifiable information without ever accessing the original dataset.

> Transfer Attacks

Transfer attacks are sophisticated adversarial attacks where malicious actors develop attack strategies on one AI model and successfully apply them to a different model (even when they have no access to the target model's architecture). Like finding a universal key that opens multiple locks, these attacks exploit common vulnerabilities across different AI systems, achieving success rates of 55-75%. They're particularly effective in computer vision (40%), natural language processing (35%), and audio processing (25%) applications. The power of transfer attacks lies in their ability to breach seemingly unrelated systems - an attack developed on a publicly available model can be used to compromise proprietary systems, making them especially dangerous in real-world applications where attackers can test and perfect their techniques on accessible models before targeting their actual objective.

> Physical World Attacks

Physical World Attacks represent real-world manipulations of objects or environments designed to fool AI systems in their actual operational settings. Unlike digital attacks, these involve tangible modifications - like strategically placing stickers on stop signs to make autonomous vehicles misread them, or wearing specially designed patterns to evade facial recognition systems. With success rates of 60-85% in controlled environments, these attacks predominantly target autonomous vehicles (45%), surveillance systems (30%), and robotics (25%). What makes them particularly dangerous is their persistence in real-world conditions: they must work across different angles, lighting conditions, and distances. Think of them as optical illusions specifically crafted to trick AI systems - while humans might notice something slightly odd, the AI completely misinterprets what it's seeing, potentially leading to dangerous real-world consequences in critical applications like autonomous driving or security systems.

> Supply Chain Attacks

Supply Chain Attacks in AI systems target the complex network of dependencies and components that make up modern AI infrastructure. With a dramatic 75% increase in recent occurrences, these attacks exploit vulnerabilities in the AI development pipeline through 3 main vectors: compromised libraries (45%), dependency attacks (35%), and framework vulnerabilities (20%).
Like poisoning a well that supplies multiple households, attackers inject malicious code or components disguised as legitimate updates or libraries, affecting all downstream systems that use these resources. Their impact is particularly severe because they affect 60% of AI implementations - once a compromised component enters the supply chain, it can spread widely before detection. The attack might begin with a seemingly innocent update to an open-source AI library but can result in widespread system compromises, data breaches, or the creation of backdoors that attackers can exploit later. What makes these attacks especially dangerous is their multiplicative effect - a single successful attack on a widely-used component can compromise thousands of dependent systems simultaneously.

AI vulnerabilities can also be exploited through open-source software and third-party risks:

> Through Open-Source Software

The vulnerabilities of open-source software in AI systems present multiple attack vectors for malicious actors. One critical threat comes through supply-chain attacks, where attackers inject harmful code disguised as legitimate updates or new features into open-source AI libraries. While open-source implies transparency, developers often maintain certain restrictions through licensing agreements, limiting access to specific components. This limitation can drive attackers toward model extraction techniques to gain unauthorized access. Furthermore, even proprietary AI systems aren't immune to these risks. Their common reliance on open-source tools and libraries creates an extensive attack surface that malicious actors can systematically exploit. This interconnected ecosystem of dependencies makes security particularly challenging, as vulnerabilities can propagate through multiple layers of the software stack.

> Through Third-Party

The challenge of managing security risks intensifies when organizations can't effectively monitor or control their third-party partners' governance protocols. When vendors operate with inadequate security measures and protocols, they become potential weak links in the security chain, creating vulnerable entry points for various threats. These vulnerabilities extend beyond simple data breaches - they open doors for sophisticated supply chain attacks and system compromises. The inability to enforce consistent security standards across all third-party relationships creates blind spots in security infrastructure, making comprehensive risk management significantly more challenging and leaving organizations exposed to cascading security failures through their vendor networks.

Which Laws and Policy Regulations Apply for AI Security ?

The AI security governance is defined by both mandatory regulations and voluntary frameworks designed to address emerging security challenges.
Key instruments include the comprehensive NIS2 Directive, the strategic U.S. Executive Order 14110, along with 2 crucial NIST frameworks : the AI Risk Management Framework (AI RMF) and the Cybersecurity Framework.
These tools collectively establish essential guidelines and requirements for organizations to effectively manage and mitigate AI security risks throughout their systems' lifecycle.

Let´s look at them in details :

> NIS2

Taking over from its 2016 predecessor (EU Network and Information Security Directive), the NIS2 Directive represents a significant evolution in EU cybersecurity legislation, designed to strengthen organizational resilience and indident-response capabilities across both public and private domains. This enhanced framework mandates specific risk management protocols and reporting requirements.
Article 21 outlines comprehensive cybersecurity obligations, encompassing multiple critical areas: systematic risk analysis and security protocols, incident response procedures, supply-chain security measures, effectiveness assessment mechanisms for risk management, fundamental cyber hygiene practices and detailed policies governing cryptography implementation and encryption standards. These requirements collectively form a robust framework for maintaining digital security in an increasingly complex threat landscape.

> EU AI Act

The EU AI Act applies a nuanced, risk-based methodology to security requirements, mirroring its approach to other AI governance aspects. Rather than implementing universal standards, security and robustness obligations are calibrated according to a system's risk level ; with distinct requirements applying to high-risk AI applications and General Purpose AI (GPAI) systems that pose systemic risks to society. This tiered structure ensures that security measures are proportionate to the potential impact and risks associated with each type of AI system.

→ High-risk AI systems:
The EU AI Act establishes comprehensive security protocols focusing on 3 critical aspects: accuracy, security and robustness. The legislation mandates that organizations implement both technical safeguards and organizational procedures to ensure these systems maintain resilience against potential failures, errors and operational inconsistencies. To achieve this level of protection, the EU AI Act suggests practical measures such as implementing redundancy through backup systems and establishing fail-safe mechanisms that can activate when primary systems encounter problems.

The EU AI Act extends its security focus to third-party risks, mandating robust protection against unauthorized manipulation of AI systems - whether targeting their usage patterns, output generation or overall performance through vulnerability exploitation. The legislation emphasizes that security measures must be contextually appropriate and proportional to identified risks. Organizations are required to implement comprehensive protective strategies that address multiple attack vectors including data and model poisoning attempts, adversarial manipulations, evasion tactics, breaches of confidentiality and inherent model vulnerabilities. These protective measures must span the full security lifecycle: prevention, detection, response, resolution, and ongoing control.

Under the EU AI Act, providers developing high-risk AI systems bear a specific regulatory responsibility: they must subject their systems to comprehensive conformity assessments. These evaluations serve as formal verification processes, demonstrating that their AI systems fully meet the stringent requirements established for high-risk applications under the legislation. This mandatory assessment process ensures compliance with the Act's safety and security standards.

→ Obligations for providers of GPAI systems with systemic risks:
The EU AI Act establishes specific security mandates for providers operating General Purpose AI (GPAI) systems that present systemic risks.
These comprehensive requirements encompass 3 key areas of responsibility:
- providers must conduct and document thorough model evaluations following standardized testing protocols with particular emphasis on adversarial testing to identify and address systemic vulnerabilities.
- they must maintain rigorous monitoring systems to document and report significant incidents to the AI Office.
- providers are obligated to implement sufficient cybersecurity measures protecting both the GPAI models themselves and their supporting physical infrastructure to safeguard against systemic risks.

> US Executive Order 14110

U.S. Executive Order 14110 establishes 2 crucial requirements for advanced AI development.
- it mandates that creators of cutting-edge AI systems must disclose their safety test results and other essential system information to federal authorities.
- it empowers NIST to develop and implement comprehensive standards for thorough red-team testing protocols, ensuring AI systems undergo rigorous safety evaluations before they can be released for public use.
This dual approach aims to enhance transparency and safety in the development of powerful AI technologies.

> NIST AI RMF

The NIST AI Risk Management Framework (RMF) addresses several key security vulnerabilities in AI systems, particularly focusing on threats like data poisoning and the unauthorized extraction of critical assets - including models, training data, and intellectual property - through system access points.
According to the framework's definition, an AI system achieves security when it successfully maintains 3 fundamental aspects: confidentiality, integrity, and availability, through robust protection mechanisms that effectively prevent unauthorized access and misuse.
The framework emphasizes practical implementation through integration with existing NIST frameworks, specifically the Cybersecurity Framework and Risk Management Framework, providing a comprehensive approach to security management.

> NIST Cybersecurity Framework

The NIST Cybersecurity Framework serves as a voluntary guidance system that delivers comprehensive security standards, guidelines, and industry-proven best practices to help organizations effectively manage their cybersecurity risks.
This framework is structured around 5 fundamental security functions that form a complete security lifecycle: identify potential risks, implement protective measures, detect security incidents, respond to threats, and recover from security events. This systematic approach provides organizations with a clear roadmap for building and maintaining robust cybersecurity defenses.

How to Implement AI Governance for AI Security ?

Effective AI governance implementation requires continuous security risk assessment throughout a system's lifecycle, with particular attention needed when engaging third-party vendors. While due diligence provides essential insights, it's merely the starting point for risk management.

Organizations should leverage this information to establish robust contractual agreements with third-party vendors that include 3 critical mandates:

- vendors must align their security protocols with the organization's established standards.
- regular security assessments and audits must be conducted to evaluate system resilience and verify vendor compliance with organizational security requirements.
- vendor access privileges should be strictly limited to what's necessary for their specific service delivery, following the principle of least privilege.

These contractual requirements create a framework for maintaining security standards across vendor relationships.

> Red Teaming

Red teaming represents a proactive security assessment approach that evaluates AI systems from an attacker's perspective, eliminating defensive biases that might blind internal teams to vulnerabilities.
This methodology involves conducting simulated adversarial attacks against the AI model to measure its performance against security benchmarks and attempt to provoke unintended behaviors through "jailbreaking" techniques.
The process systematically uncovers various vulnerabilities including security weaknesses, model deficiencies, inherent biases, potential for misinformation generation, and other possible harms. These findings are then systematically reported to development teams for remediation.
By implementing red teaming during development, organizations can strengthen their AI systems' defenses and resolve critical vulnerabilities before public deployment, significantly reducing potential risks to end users.

> Secure data sharing practices

Differential privacy functions as a sophisticated data protection technique that, while primarily designed for privacy enhancement, delivers significant security advantages. This methodology enables group-level data analysis while safeguarding individual privacy through the strategic addition of noise to datasets, effectively obscuring personal identifiers.
This dual-purpose approach means that even if data breaches occur, attackers cannot successfully link the compromised information back to specific individuals, thus minimizing potential harm. While this makes differential privacy an effective security measure by reducing stolen data's value, it presents 2 significant challenges: it can limit legitimate organizational data processing needs, and implementation costs can be substantial, particularly when dealing with large-scale datasets.
This creates a complex balance between security benefits and practical operational considerations.

> HITL

Human in the Loop (HITL) integration represents a strategic approach that embeds human expertise directly into AI decision-making processes. While this methodology potentially introduces human biases into algorithmic outputs, its value in AI security contexts is significant, particularly in enhancing incident detection and response capabilities. HITL proves especially effective in identifying nuanced attacks and subtle manipulations that automated systems might miss due to limitations in their training data. Though it enables ongoing monitoring and verification of AI operations, successfully implementing HITL requires carefully balancing potential contradictions between addressing algorithmic bias and maintaining robust security measures. This delicate equilibrium is crucial for maximizing the security benefits while minimizing the risks of human-introduced biases.

Want to learn more on AI Security ? Read our other articles on this subject or contact us for personalized assistance.

Source : IAPP AI Governance report 2024.