AI Safety: From Theory to Practice in Global Governance


STANDARDS, POLICIES AND GLOBAL COLLABORATION

Understanding AI Safety: The Critical Foundation for Global AI Governance and Risk Management in 2024


The landscape of AI safety encompasses several interconnected elements, with value alignment, transparency and security forming its foundation. Although no universally accepted definition exists, the concept extends beyond basic safeguards to address fundamental challenges: its scope ranges from mitigating existential threats posed by advanced AI systems to more immediate concerns such as malicious exploitation and uncontrolled AI behavior.

Key frameworks offer different perspectives on risk management. The Center for AI Safety structures its approach around 4 primary threats:

  • malicious applications
  • competitive development risks
  • autonomous system vulnerabilities
  • organizational challenges

In contrast, international agreements like the Bletchley and Seoul Declarations emphasize proactive risk management and preparedness for emerging frontier AI challenges.

The practical implications of AI safety extend to combating contemporary threats, including the spread of AI-generated misinformation, sophisticated deepfake technology, and addressing unexpected behaviors in advanced AI systems. This comprehensive approach ensures protection against both current and potential future risks while promoting responsible AI development.

What is the Difference Between AI Security & AI Safety?

AI Security focuses on protecting AI systems from external threats, unauthorized access and malicious attacks. It includes cybersecurity measures, data protection and preventing system manipulation.

AI Safety is broader, encompassing the design and development of AI systems that behave reliably and safely. It includes value alignment, preventing unintended consequences, ensuring systems operate within defined parameters and managing existential risks. While security is about protecting the system, safety is about ensuring the system itself doesn't cause harm.

Example: In a self-driving car, security prevents hackers from taking control (AI Security), while safety ensures the car makes decisions that don't endanger passengers or pedestrians (AI Safety).

What Laws and Regulations Apply to AI Safety?

AI safety has gained prominence in major regulatory frameworks and national AI strategies worldwide.
The Biden-Harris administration prioritizes it through Executive Order 14110, which mandates the development of "Safe, Secure and Trustworthy" AI systems.
In 2023, the UK demonstrated its commitment by hosting the inaugural Global AI Safety Summit, placing special emphasis on frontier AI systems' safety.
Similarly, the EU AI Act incorporates safety as a fundamental requirement, particularly for high-impact GPAI and high-risk AI systems, mandating robust security measures and performance standards.

This global policy convergence reflects a growing recognition that AI safety isn't just a technical consideration but a crucial governance priority. These frameworks share common ground in treating safety as a prerequisite for AI development and deployment, though their specific approaches and requirements vary.

> AI Safety Institutes

In the United States:
The U.S. AI Safety Institute (AISI), established in November 2023 by NIST (the National Institute of Standards and Technology), represents a significant step in formalizing AI safety standards. Its companion initiative, the AI Safety Institute Consortium (AISIC), unites over 200 organizations to develop comprehensive AI safety frameworks and measurement protocols. AISIC's mission combines research collaboration with practical standard-setting, focusing on creating guidelines, tools, and methodologies that can shape global AI safety practices.

AISIC's key objectives include establishing a collaborative knowledge-sharing platform, developing industry-standard safety protocols, and creating benchmarks to assess AI capabilities, with particular attention to potential harmful applications. This structured approach aims to bridge the gap between theoretical safety principles and practical implementation across the AI industry.

The institute's work emphasizes interdisciplinary cooperation, recognizing that AI safety requires input from diverse fields and stakeholders. Through standardized evaluation methods and best practices, AISIC seeks to create a unified approach to ensuring AI system safety and reliability.

In the UK:

The UK's AI Safety Institute serves as a vital hub for managing emerging AI risks through a comprehensive sociotechnical framework. Its three core mandates encompass evaluating advanced AI systems, conducting essential research into AI foundations, and creating channels for knowledge sharing across the AI community. This institute reflects the UK's proactive stance on addressing potential challenges from rapid AI advancement through systematic assessment and collaborative research efforts.

> Bletchley Declaration

The 2023 UK Global AI Safety Summit marked a pivotal moment in global AI governance, culminating in the Bletchley Declaration. This landmark agreement unites nations, leading AI companies and civil society organizations in a shared commitment to responsible AI development. The declaration establishes a framework for international cooperation focusing on innovation and safety while balancing technological advancement with human rights protection. It emphasizes sustainable development, economic growth and building public confidence in AI systems through transparent and collaborative approaches.

The Bletchley Declaration represents a significant shift from individual national initiatives to a coordinated global response in managing frontier AI risks. This collaborative approach acknowledges that AI safety challenges transcend national boundaries and require unified international action.

The countries represented were: Australia, Brazil, Canada, Chile, China, European Union, France, Germany, India, Indonesia, Ireland, Israel, Italy, Japan, Kenya, Kingdom of Saudi Arabia, Netherlands, Nigeria, The Philippines, Republic of Korea, Rwanda, Singapore, Spain, Switzerland, Türkiye, Ukraine, United Arab Emirates, United Kingdom of Great Britain and Northern Ireland, United States of America. 

 

> EU AI Act

The EU AI Act introduces robust security measures for general-purpose AI systems, particularly addressing "systemic risks." These risks are specifically defined as arising from high-impact general purpose models that could substantially affect the EU internal market. The AI Act considers risks that "significantly impact the internal market, and with actual or reasonably foreseeable negative effects on public health, safety, public security, fundamental rights, or the society as a whole, that can be propagated at scale across the value chain."

> AI Safety Standards

ISO/IEC Guide 51:2014 establishes a standardized framework for incorporating safety elements into technical standards. This guidance focuses on protecting humans and environmental welfare through clear safety requirements and recommendations that standards developers must integrate into their documentation. The guide's comprehensive approach ensures safety considerations are systematically addressed across different types of standards.

The Emergence of Compute Governance 

Compute governance is emerging as a critical approach to AI safety regulation, focusing on controlling computational resources rather than AI models themselves. This approach is strategic because:

  1. Compute, unlike AI models, is a finite, trackable resource requiring physical hardware
  2. The semiconductor industry is highly concentrated, making regulation more feasible
  3. Quantifiable thresholds can be established and enforced

Recent regulatory frameworks have implemented specific compute thresholds:

> U.S. Executive Order 14110 requirements:

  • Models trained using >10^26 integer or floating-point operations
  • Bio-sequence models trained using >10^23 integer or floating-point operations
  • Developers of such models must provide ongoing safety testing and security reports

> EU AI Act provisions:

  • GPAI systems using >10^25 floating-point operations for training
  • Triggers Commission notification requirements
  • Classified as systems with potential systemic risk
  • Mandates evaluation, adversarial testing, and incident reporting

These thresholds help identify high-capability AI systems and enable targeted oversight of their development and deployment.
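
To make these thresholds concrete, the sketch below estimates a training run's total compute using the common rule of thumb of roughly 6 floating-point operations per parameter per training token, then checks the result against the two figures above. The rule of thumb, the constant names and the example model size are illustrative assumptions, not values or procedures prescribed by either regulation.

```python
# Illustrative sketch: estimating training compute and comparing it against
# the regulatory thresholds discussed above. The 6 * params * tokens rule of
# thumb and the example figures are assumptions for illustration only.

EU_GPAI_SYSTEMIC_RISK_FLOP = 1e25   # EU AI Act notification threshold
US_EO14110_REPORTING_FLOP = 1e26    # U.S. Executive Order 14110 reporting threshold

def estimate_training_flops(n_parameters: float, n_training_tokens: float) -> float:
    """Rough training-compute estimate: ~6 FLOPs per parameter per token."""
    return 6.0 * n_parameters * n_training_tokens

def applicable_thresholds(training_flops: float) -> list[str]:
    """Return which compute thresholds an estimated training run would cross."""
    crossed = []
    if training_flops > EU_GPAI_SYSTEMIC_RISK_FLOP:
        crossed.append("EU AI Act: presumed systemic risk, notify the Commission")
    if training_flops > US_EO14110_REPORTING_FLOP:
        crossed.append("US EO 14110: report safety test results to the government")
    return crossed

if __name__ == "__main__":
    # Hypothetical frontier-scale run: 500B parameters, 15T training tokens.
    flops = estimate_training_flops(500e9, 15e12)
    print(f"Estimated training compute: {flops:.2e} FLOPs")
    for note in applicable_thresholds(flops):
        print(" -", note)
```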

How to Implement AI Governance for AI Safety?

Various security and robustness measures originally developed for AI systems can be effectively applied to enhance AI safety. These include adversarial testing through red teaming, human-in-the-loop oversight, and privacy-preserving technologies. Additionally, transparency requirements like watermarking systems provide accountability and traceability, further supporting AI safety objectives.
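
As an illustration of the human-in-the-loop idea mentioned above, the minimal sketch below routes model outputs flagged by an automated check into a review queue instead of returning them directly. The keyword heuristic, function names and queue structure are assumptions for illustration only, not any vendor's actual oversight pipeline.

```python
# Minimal sketch of human-in-the-loop oversight: model outputs that an
# automated check flags as risky are held for human review instead of being
# returned directly. The keyword heuristic stands in for a real safety classifier.
from dataclasses import dataclass, field

RISKY_MARKERS = {"weapon", "exploit", "self-harm"}  # placeholder heuristic

@dataclass
class ReviewQueue:
    pending: list[str] = field(default_factory=list)

    def submit(self, output: str) -> None:
        self.pending.append(output)

def needs_human_review(output: str) -> bool:
    """Very rough automated check standing in for a real safety classifier."""
    lowered = output.lower()
    return any(marker in lowered for marker in RISKY_MARKERS)

def release_output(output: str, queue: ReviewQueue) -> str | None:
    """Return the output directly, or route it to a human reviewer."""
    if needs_human_review(output):
        queue.submit(output)
        return None  # withheld until a reviewer approves it
    return output

queue = ReviewQueue()
print(release_output("Here is a summary of the report.", queue))      # released
print(release_output("Step-by-step exploit instructions...", queue))  # held back
print(f"{len(queue.pending)} output(s) awaiting human review")
```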

I invite you to read our articles about AI Security and about Bias, Fairness and Discrimination.

> Prompt Engineering

OpenAI employs prompt engineering as a core safety practice for its generative AI systems. This approach helps systems better interpret context and intent, reducing undesired or harmful outputs. By implementing strict usage policies and optimizing prompt understanding, OpenAI creates guardrails that limit potential misuse while maintaining system functionality. These controls act as a preventive measure at the user interaction level.
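
A minimal sketch of this pattern is shown below using the OpenAI Python SDK: a fixed system prompt constrains the assistant's scope before any user input is processed. The system prompt text, model choice and temperature setting are illustrative assumptions rather than OpenAI's actual production configuration.

```python
# Sketch of a prompt-level guardrail using the OpenAI Python SDK (v1.x).
# The system prompt text and model name are illustrative assumptions; the
# general pattern is to constrain behaviour before the user's input is seen.
from openai import OpenAI

SAFETY_SYSTEM_PROMPT = (
    "You are a customer-support assistant. Answer only questions about the "
    "product. Refuse requests for medical, legal or financial advice, and "
    "decline to produce content that violates the usage policy."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def guarded_completion(user_input: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",                       # illustrative model choice
        messages=[
            {"role": "system", "content": SAFETY_SYSTEM_PROMPT},
            {"role": "user", "content": user_input},
        ],
        temperature=0.2,                           # lower variance, steadier behaviour
    )
    return response.choices[0].message.content

print(guarded_completion("How do I reset my account password?"))
```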

> Reports and Complaints

OpenAI implements a user reporting system where human operators monitor and address safety concerns. However, this practice remains uncommon: a 2023 TrustibleAI study found that only 3% of organizations offered individual appeals processes. This may change under the EU AI Act, as Article 27(1)(f) requires deployers to describe the internal governance and complaint mechanisms that will apply if the identified risks materialize.

The mandate for Fundamental Rights Impact Assessments (FRIA) of these mechanisms suggests increased adoption of formal feedback systems as organizations work to comply with regulatory requirements.
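
As a sketch of what such a feedback mechanism might record internally, the example below defines a simple safety-report structure with a triage status. The field names and statuses are assumptions for illustration, not requirements taken from the AI Act or any particular vendor's system.

```python
# Minimal sketch of an internal complaint/report record of the kind a
# deployer-side feedback mechanism might keep. Field names and statuses are
# illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
import uuid

class ReportStatus(Enum):
    RECEIVED = "received"
    UNDER_REVIEW = "under_review"
    RESOLVED = "resolved"

@dataclass
class SafetyReport:
    description: str
    affected_system: str
    reporter_contact: str | None = None
    report_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    submitted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    status: ReportStatus = ReportStatus.RECEIVED

    def escalate(self) -> None:
        """Move the report into human review."""
        self.status = ReportStatus.UNDER_REVIEW

report = SafetyReport(
    description="Chatbot produced an incorrect eligibility decision.",
    affected_system="benefits-assistant",
)
report.escalate()
print(report.report_id, report.status.value)
```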

> Safety by Design

Microsoft emphasizes safety by design across its AI development, implementing protective measures at platform, model, and application levels. Key safety practices include:

  • Red teaming
  • Automated testing
  • Preemptive content classifiers
  • Abusive prompt blocking
  • Swift user banning for system misuse

The company actively balances free expression with content moderation across its platforms (LinkedIn, its gaming network), focusing on identifying and removing deceptive or abusive content.

This comprehensive approach integrates safety considerations from initial design through deployment and monitoring phases.
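
Two of the practices listed above, preemptive content classification and the blocking of abusive prompts with escalation to a ban, can be sketched as a simple pre-processing gate. The keyword-based classifier, strike limit and return values below are illustrative assumptions, not Microsoft's actual pipeline.

```python
# Sketch of a preemptive content check on incoming prompts plus escalating
# enforcement (blocking, then banning) for repeat misuse. The keyword
# classifier and strike limit are placeholder assumptions.
from collections import defaultdict

BLOCKED_TERMS = {"build a bomb", "credit card dump"}  # placeholder policy
STRIKES_BEFORE_BAN = 3

abuse_strikes: dict[str, int] = defaultdict(int)
banned_users: set[str] = set()

def classify_prompt(prompt: str) -> bool:
    """Return True if the prompt violates policy (stand-in for a real classifier)."""
    lowered = prompt.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def handle_prompt(user_id: str, prompt: str) -> str:
    """Gate a prompt before it ever reaches the model."""
    if user_id in banned_users:
        return "account suspended"
    if classify_prompt(prompt):
        abuse_strikes[user_id] += 1
        if abuse_strikes[user_id] >= STRIKES_BEFORE_BAN:
            banned_users.add(user_id)
            return "account suspended"
        return "prompt blocked"
    return "forwarded to model"

print(handle_prompt("user-42", "Summarise this contract."))  # forwarded to model
print(handle_prompt("user-42", "How do I build a bomb?"))    # prompt blocked
```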

> Safety Policies

As an example, ahead of the UK Global AI Safety Summit, Meta outlined AI safety policies for its Llama model that encompass:

  • Rigorous model evaluations and red-team testing
  • Clear reporting structures for post-release vulnerabilities
  • Active monitoring systems to detect misuse patterns
  • Implementation of AI content identifiers
  • Strict data input controls and regular audits
  • Dedicated research on security and societal risks
  • Transparent model reporting and sharing protocols

These measures form a comprehensive framework balancing innovation with responsible AI development and deployment.

> Industry Best Practices

Partnership on AI leads significant AI safety research initiatives through 2 key programs:

  • The Guidance for Safe Foundation Model Deployment framework provides operational guidance for model providers, offering customizable deployment protocols based on specific model capabilities. This living document evolves with advancing technology and emerging safety considerations.
  • Their SafeLife benchmark tests reinforcement learning agents' ability to avoid harmful side effects in complex environments. This testing environment evaluates agents' capacity to complete tasks while maintaining safe, non-destructive behavior, enabling comparison and refinement of safety training techniques. 
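
The generic idea behind side-effect benchmarks of this kind can be sketched as a score that rewards task completion while penalizing unnecessary changes to the environment. The formula, penalty weight and numbers below are illustrative assumptions, not SafeLife's actual metric or API.

```python
# Generic sketch of the idea behind side-effect benchmarks such as SafeLife:
# score an agent on task performance minus a penalty for changes to the
# environment beyond what would have happened without the agent.

def side_effect_score(task_reward: float,
                      cells_changed_by_agent: int,
                      cells_changed_in_baseline: int,
                      penalty_weight: float = 0.1) -> float:
    """Penalise environment changes beyond the no-agent baseline."""
    unnecessary_changes = max(0, cells_changed_by_agent - cells_changed_in_baseline)
    return task_reward - penalty_weight * unnecessary_changes

# A careful agent and a destructive agent completing the same task:
print(side_effect_score(task_reward=10.0, cells_changed_by_agent=12,
                        cells_changed_in_baseline=10))   # 9.8
print(side_effect_score(task_reward=10.0, cells_changed_by_agent=80,
                        cells_changed_in_baseline=10))   # 3.0
```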
 

In Conclusion

AI safety has emerged as a critical framework encompassing value alignment, transparency, and security measures for AI systems. Global initiatives like the Bletchley Declaration and regulatory frameworks including the EU AI Act and US Executive Order 14110 demonstrate increasing international coordination on AI safety governance.

Key developments include the establishment of specialized institutions like the US and UK AI Safety Institutes, focusing on research, evaluation, and risk assessment. Compute governance has become central, with specific thresholds defining high-impact AI systems and triggering regulatory requirements.

Industry leaders have implemented various safety practices: Microsoft emphasizes safety-by-design, OpenAI utilizes prompt engineering and user reporting systems, and Meta employs comprehensive safety policies for its models. Partnership on AI provides frameworks for safe model deployment and benchmarking tools like SafeLife.

The movement towards standardization is evident through ISO guidelines and increasing cooperation between governments, companies, and civil society organizations. This reflects a shift from theoretical concerns to practical implementation of AI safety measures across sectors.

Recent initiatives focus on balancing innovation with responsible development, emphasizing the need for proactive risk management, especially for frontier AI systems. The establishment of clear reporting structures, monitoring systems, and international cooperation frameworks demonstrates the evolution of AI safety from optional considerations to essential requirements.

Why Is AI Safety So Important to AI Governance?

AI safety forms the foundation of effective AI governance by establishing protective frameworks for responsible development. It enables policymakers to create informed regulations based on concrete safety metrics and thresholds. By incorporating safety requirements into governance structures, organizations can better manage AI risks while fostering innovation. Safety considerations drive the development of technical standards and best practices that shape governance frameworks. The integration of safety principles into governance ensures AI systems remain aligned with human values while delivering societal benefits.

Want to know more about AI Safety for your AI Governance implementation? Read more articles and contact us for a free call to assess your needs.