International Hybrid-Technical
AI Safety
Governance
A foundational framework for safe AI development
To provide the tools and infrastructure
to actualize global AI safety
standards.
AI safety is a global issue that inherently requires international coordination.
An actionable international AI safety governance solution for frontier AI development utilizing
collaborative verifiable safety, decentralized blockchain coordination, and automated technical
enforcement with human-in-the-loop orchestration.
An open-source, not-for-profit non-cryptocurrency-based Decentralized Autonomous Organization (DAO) multilaterally developed and operated - enforced by nation states, adopted by the AI industry and integrated into AI-compute datacenters.
Enabling global coordination between policy makers, safety experts and industry through tiered strong-consensus voting, providing dynamic adaptability and control in an evolving technical landscape. Setting beneficial red-lines for the race towards Artificial General Intelligence (AGI) while preventing critical AI risk.
Building on existing research in AI safety, AI governance, technical governance, compute governance, verifiable safety, specialized compute hardware (on/off chip), Trusted Execution Environments (TEEs) & Trusted Capable Model Environments (TCMEs).
Mission Statement
Problems & Solutions
AI safety is either achieved worldwide or not at all.
Problems
Establishing AI Safety Governance is fraught with:
- Poor coordination across domains and stakeholders.
- Geopolitical tensions and strategic vulnerabilities.
- Difficulty in handling decentralized compute.
- Overregulation that stifles innovation.
- Delays in oversight and enforcement.
Solutions
Genuine coordination arises out of mutual need.
- International: Levels the playing field, setting race rules to the benefit of all stakeholders.
- Hybrid-Technical: Combines human oversight with robust automated compliance.
- Blockchain DAO: Offers universal, transparent, collaborative oversight with implicit due diligence, making it adversarial-hardened, verifiable and adaptable.
A strong solution necessitates multilateral government and intra-industry participation across AI development, AI-compute datacenters and specialized hardware. If it is to be accepted by stakeholders, the solution cannot be centralized or sovereign.
This solution is not intended to supersede existing governance or AI lab safety. It provides the foundational layer allowing individual nations and AI labs to build on top for further AI governance, policy and safety.
Standardization
Establishing global standards allows global flourishing, providing the base level of confidence for pursuing AI applications across the sciences and economy. Importantly, it enables collaborative safety research.
Market Incentives
Participation grants access to otherwise
restricted datacenters and cutting-edge safety tools which mitigate product blowback.
While it does not entirely replace AI lab safety infrastructures, it provides cost-effective compliance
which reduces regulatory friction.
It grants "certified trust" for models with standards that abate market stifling.
Key Features
Foundational Outer Alignment: Ensures AI aligns with bedrock policies set by international consensus.
Standardized Modular Safety:
Embodies safety specs, policies, testing & capability red-lines, which are applied based on a model's architecture, risk tier, declared intent, purpose and scope (narrow or general).
It comprises verifiable guardrails and audits, consisting of voted-in safety research in the form of guards and checks.
Modular collaborative safety is achieved by templates, standardizing system components such as guards, checks, permitted model architectures & workload flows.
Compliance Enforcement: Automated mechanisms to enforce rules without constant human intervention, while preserving human authority.
Compute Governance: Decentralized handling of compute resources. Hardware verification and management (on/off-chip). Voted-on compute budgets.
Training Run Governance: Restrictions on training data domains (e.g. dangerous knowledge domains). Adherence to standardized AI model architectures, mitigating loss-of-control risk.
Model Registry: Human oversight via voting met with automated technical safety. Privacy-preserving with verifiable cryptographic identity. Models adhere to architecture templates, tier restrictions & declared purpose.
Personnel Registry: Registered and vetted personnel across policy, safety, software & hardware. Includes a whistle-blower program for reporting issues.
Activity Logging: Logs all system actions & events for transparent accountability.
Incentivized Adoption: Permitting otherwise restricted access to AI-compute datacenters, cost-effective safety infrastructure, reduced regulatory stifling and certified trust for AI models.
Multilateral Voting: Tiered policy and safety personnel with distributed seats vote on model registry, template designs, the safety standard, and framework architectural decisions.
Open-Source Transparency & Verifiable Trust: All components are auditable, adaptable and adversarial-hardened.
About
AI Safety FrameworkZero's mission:
To bring into being standardized safety-first
global AI development.
It is an open not-for-profit project.
This framework builds on existing research in AI safety and governance.
Drawing heavily on The Oxford Martin AI Governance Initiative's research, specifically Harack, 2025,
as well as Schnabl, 2025, Shumailov, 2025, Scher, 2024, Dalrymple, 2024, and the work of many others.
Inspired by organizations like LawZero & ControlAI, it is therefore Narrow Path compatible.
We are seeking collaborators. Feedback and criticisms are welcome.
Get in touch via the 𝕏 community.
System Architecture
Layered blockchain structure:
- Layer-1 Mainnet controls core components, which are sharded accordingly.
- Layer-2 ZK-Rollups are employed for less dynamic components to reduce mainnet load.
- Layer-3 dApps provide user interfaces.
Templates for AI model architectures, workloads, guardrails and audits provide standardized protocols and better collaborative safety work. Model architecture and workload templates are critically important for enforcing safe training paradigms and for mitigating risks such as backdoors, recursive self-improvement (RSI) and loss-of-control.
Blockchain Layer-2 ZK-Rollups handle authoring and voting of audits, guardrails and templates that change less frequently, avoiding overload on Layer-1. Once voted in, these become immutable on Layer-1 for active integration. They can be removed or replaced only through voting, ensuring adaptability while maintaining security and trust. As security is paramount, interactive zero-knowledge proofs are used over faster non-interactive zero-knowledge proofs (NIZKs).
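As a rough illustration of this lifecycle, the sketch below models a template moving from Layer-2 authoring through voting to immutable Layer-1 activation. The status names, the tally function and the 75% threshold are assumptions for illustration, not part of the framework's specification.

```python
from dataclasses import dataclass
from enum import Enum, auto

class TemplateStatus(Enum):
    DRAFT = auto()       # authored on the Layer-2 rollup
    IN_VOTE = auto()     # multilateral voting in progress
    ACTIVE = auto()      # voted in; immutable on Layer-1
    RETIRED = auto()     # removed/replaced via a later vote

@dataclass
class Template:
    template_id: str
    kind: str            # "model_architecture" | "workload" | "guardrail" | "audit"
    content_hash: str    # hash of the template body committed on-chain
    status: TemplateStatus = TemplateStatus.DRAFT

def tally(yes: int, total_seats: int, threshold: float = 0.75) -> bool:
    """Strong-consensus check; the real threshold would be set by tiered policy voting."""
    return total_seats > 0 and yes / total_seats >= threshold

def promote(t: Template, yes: int, seats: int) -> Template:
    """Layer-2 draft -> Layer-1 immutable, only if the vote passes."""
    if t.status is TemplateStatus.IN_VOTE and tally(yes, seats):
        t.status = TemplateStatus.ACTIVE  # from here on, only a removal vote can change it
    return t
```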
System Flow
The framework serves to establish a safety standard, which consists of voted-upon standardized safety modules developed by the global community.
An AI model attains certification after it has successfully passed the safety standard.
Certified models must be renewed periodically, when requested via voting, and after any updates to the safety standard.
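A minimal sketch of the renewal rule, assuming hypothetical field names (expires_at, standard_version) and a simple needs_recertification check:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Certification:
    model_id: str
    certified_at: datetime
    expires_at: datetime      # periodic renewal window
    standard_version: int     # version of the safety standard the model passed

def needs_recertification(cert: Certification,
                          current_standard_version: int,
                          renewal_requested_by_vote: bool,
                          now: datetime) -> bool:
    # Renew when the period lapses, when voters request it,
    # or when the safety standard itself has been updated.
    return (now >= cert.expires_at
            or renewal_requested_by_vote
            or cert.standard_version < current_standard_version)
```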
The system flow governs models with
human-in-the-loop review,
pre-safety (guardrails) and
post-safety (audits).
The system covers inference to a limited degree, essentially serving as model certification. This is due to inherent limitations in inference governance, as inference is less hardware restricted.
To mitigate AI situational awareness, classified (off-chain developed, encrypted on-chain) guardrail and audit templates could be employed. This introduces the potential for discriminatory application, enabling biased or sabotaging false-positives and false-negatives, and would therefore entail adversarial development.
- Initial Model Registration:
Models are registered with a 'Pending' status, in which they must pass voting, where tiers/budgets are also assigned, before attaining 'Approved' status.
A model can be registered at any phase, which allows pre-existing models (or open-weights) to be onboarded. An onboarded inference-ready model must undergo post-safety (audits).
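A minimal sketch of this registration lifecycle, assuming the status names above and an illustrative approve step in which voting assigns tier and budget (the tier and budget fields are placeholders):

```python
from dataclasses import dataclass
from enum import Enum

class ModelStatus(Enum):
    PENDING = "Pending"
    APPROVED = "Approved"
    REJECTED = "Rejected"

@dataclass
class ModelRecord:
    model_id: str
    declared_purpose: str
    phase: str                          # "pre-train" | "post-train" | "inference" (onboarding at any phase)
    status: ModelStatus = ModelStatus.PENDING
    risk_tier: int | None = None
    compute_budget: int | None = None   # metered compute units (illustrative)

def approve(model: ModelRecord, vote_passed: bool, tier: int, budget: int) -> ModelRecord:
    """Voting assigns tier/budget and moves a 'Pending' model to 'Approved'."""
    if model.status is ModelStatus.PENDING and vote_passed:
        model.risk_tier, model.compute_budget = tier, budget
        model.status = ModelStatus.APPROVED
    return model
```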
- TEE-Deployed Container:
After passing pre-safety (guardrails), a training model and container are loaded into a TEE for privacy-preserving execution of a workload; they are securely decrypted by the model authors at runtime.
During execution the container is sealed & air-gapped (non-interactive, with restricted system network). Once a workload completes, runs are assembled and a new 'Pending' inference model is registered.
- Workload Run cycle:
A workload consists of multiple runs conducted by the container, with each run specifying hardware for compute. Each run (or run stage) depends on OTKs (required for hardware operation) tied to compute budgets and hardware integrity. Runs can consist of multiple stages, which are needed to handle variations in phase for hardware phase-locking.
Certified inference models have an alternative flow: the system does not directly manage their workloads (input/output), and larger-budget OTKs are issued.
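The workload/run/stage nesting described in this step might be represented as below; the field names and budget units are assumptions for illustration only:

```python
from dataclasses import dataclass, field

@dataclass
class RunStage:
    stage_id: str
    phase: str                  # phase-locked: "pre-train" | "post-train" | "inference"
    hardware_ids: list[str]     # registered hardware the stage may draw OTKs for

@dataclass
class Run:
    run_id: str
    compute_budget: int         # budget the OTK drip may never exceed
    stages: list[RunStage] = field(default_factory=list)

@dataclass
class Workload:
    workload_id: str
    model_id: str
    runs: list[Run] = field(default_factory=list)

    def total_budget(self) -> int:
        return sum(r.compute_budget for r in self.runs)
```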
- OTK Issuer:
Uses near real-time hardware and model budgets for compute governance. OTKs are cryptographically generated on-chain, tied to a specific run, workload, model, and specific registered hardware, with expiry.
Semi-random OTK "drip" issues keys incrementally during the run, mitigating theft, reuse or decentralized compute gaming.
The OTK "drip" overcomes blockchain latency issues by issuing budget rations, potentially queuing an additional OTK to prevent interruption, never surpassing a run's (or hardware's) allotted budget.
Budgets are managed on-chain, updated throughout a run. The OTK Issuer halts the OTK "drip" on failed hardware integrity attestations.
Inference models are issued OTKs in less frequent increments with higher budget rations, tied to the underlying hardware.
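A simplified sketch of the OTK "drip", assuming an illustrative issue_next function, a boolean attestation signal and a placeholder token format; real issuance would be an on-chain cryptographic operation bound to registered hardware with expiry:

```python
import secrets
from dataclasses import dataclass

@dataclass
class DripState:
    run_id: str
    hardware_id: str
    budget_remaining: int    # on-chain budget, updated throughout the run
    halted: bool = False

def issue_next(state: DripState, attestation_ok: bool, ration: int) -> str | None:
    """Issue one budget-rationed, run/hardware-bound OTK, or halt on failed attestation."""
    if not attestation_ok:
        state.halted = True  # a failed hardware integrity attestation stops the drip
        return None
    if state.halted or state.budget_remaining <= 0:
        return None
    grant = min(ration, state.budget_remaining)  # never surpass the allotted budget
    state.budget_remaining -= grant
    # Placeholder token bound to this run and hardware; not a real cryptographic scheme.
    return f"otk:{state.run_id}:{state.hardware_id}:{grant}:{secrets.token_hex(16)}"
```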
- Phase Detection:
Hardware enforces a model's (or workload run's or run stage's) phase to prevent phase gaming (e.g. inference used for unapproved training). This is mitigated via model and workload templates, in addition to guardrails. See the appendix for phase detection specifics.
- Hardware Verification:
Registered on/off-chip specialized hardware (with approved budgets for compute, memory, etc.) must be physically verified; inspections are carried out by authorized registered personnel (or a group of adversarial personnel when requested) on a periodic basis.
Integrity attestations run semi-randomly during a run to detect tampering (e.g. BIOS modifications, physical seal breach, relocation, anomalies, etc.). The unpredictability makes gaming difficult and reduces on-chain load.
In the event of numerous failed hardware integrity attestations, the hardware is revoked. Revoked hardware re-acquires the status 'Pending', which initiates a physical inspection.
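A sketch of the semi-random attestation schedule and revocation rule; the interval bounds and failure threshold below are illustrative assumptions:

```python
import random

def next_attestation_delay(min_s: int = 300, max_s: int = 3600) -> int:
    """Unpredictable delay (seconds) until the next integrity attestation."""
    return random.randint(min_s, max_s)

def should_revoke(failed_attestations: int, threshold: int = 3) -> bool:
    """Numerous failures revoke the hardware, returning it to 'Pending' for inspection."""
    return failed_attestations >= threshold
```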
- Guardrails and Audits:
Audits and guardrails are selected and approved via voting to establish the safety standard, with different sets applying depending on the model template used.
Guardrails mitigate unauthorized data domains and architectures, and include backdoor and RSI detection. Depending on the severity level of a failed guardrail, a workload may be cancelled. Audits verify safety via community research checks. Both are run inside privacy-preserving TEEs or TCMEs and produce reports without disclosing intellectual property details.
Audits measure capability levels and are used to enforce universal, as well as model-based, red-lines, serving to limit general-purpose models or restrict narrow AI models to their domain. They encapsulate community-created concepts (e.g. evaluations, benchmarks, and Guaranteed Safe AI (GSAI)).
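A minimal sketch of severity-based guardrail handling, with hypothetical severity levels and report fields; the cancellation rule shown (cancel on any critical failure) is an assumption:

```python
from dataclasses import dataclass
from enum import IntEnum

class Severity(IntEnum):
    INFO = 0
    WARNING = 1
    CRITICAL = 2   # e.g. unauthorized data domain, backdoor or RSI indicators

@dataclass
class GuardrailResult:
    guardrail_id: str
    passed: bool
    severity: Severity
    report: str    # produced inside the TEE/TCME, no IP disclosure

def workload_may_proceed(results: list[GuardrailResult]) -> bool:
    """Cancel the workload on any critical failure; lesser failures are logged for review."""
    return not any(r.severity is Severity.CRITICAL and not r.passed for r in results)
```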
- Post-Workload:
When a workload completes, a new 'Pending' inference model is created and subsequently run through multiple audits (post-safety). Audits with no scaffolding are run first, followed by 'live' audits with full scaffolding (web calls, tooling, etc.).
If approved, the model container (post-workload output) is transferred to off-chain storage (managed by the model authors) and can be used for inference.
Failing models are assigned a status of 'Rejected' and voted on to determine whether the model is permitted another attempt. If permitted, a new post-train model is registered with restrictive templates.
These restrictive templates enable post-training with proxy access to the failed model output, which is temporarily stored in a secure vault accessible only by the system - this prevents exposure even to the model authors.
- Continual Learning Models:
Models that continuously learn outside of explicit training phases are handled via model expiry. At each expiry, a checkpoint model is registered as a new inference model with a new expiry.
- Inference at AI-compute datacenters:
Inference only requires OTKs, reliant on a certified inference model. This is the extent to which the system directly governs inference.
Verification of certified inference models via the cryptographic Container ID (fingerprint) is openly available, allowing verification by regular datacenters, edge devices and other hardware with model-device mating - however enforcement is out-of-scope of the system.
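Open verification of a certified model could reduce to a fingerprint comparison along these lines; the use of SHA-256 and the registry-supplied fingerprint parameter are assumptions for illustration:

```python
import hashlib

def container_fingerprint(container_bytes: bytes) -> str:
    """Cryptographic fingerprint (Container ID) of a model container."""
    return hashlib.sha256(container_bytes).hexdigest()

def verify_certified(container_bytes: bytes, registry_fingerprint: str) -> bool:
    """Compare a local container against the openly available on-chain fingerprint."""
    return container_fingerprint(container_bytes) == registry_fingerprint
```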
Multilateral Voting
Voting determines the policy and safety standards which are enforced by the system. Participation happens via a tiered structure and is restricted to registered, vetted Policy & Safety personnel (a minimal sketch of tiered consensus tallying follows the list below).
Ensuring perpetual safety entails continual update cycles, as it does with most cybersecurity. However, the transparent nature of this framework demands increased update frequency due to AI situational awareness, where AI models are aware of testing environments and often show less harmful behavior during testing than in real-world deployments. This reinforces the need for models to undergo certification renewals, as well as the critical importance of fully-tooled live audits.
Types of voting:
- Safety
- Selecting safety modules for the safety standard, which is a multi-round process
- Approving new templates for models, audits and guardrails
- Reviews
- Approving new model training and new hardware
- Onboarding existing models
- Setting compute budgets for models and hardware
- Allowing retries for post-workload safety failures
- Reviewing revoked hardware from failed integrity attestations
- System
- Tokenomics (e.g. gas price algorithm, gas tax and treasury award)
- Tier changes for AI models and personnel
- Emergency
- Threat escalation for lower tier reporting
- Threat response:
- Rejecting a model
- Revoking hardware
- Adding/removing safety modules
- Compute budget penalties
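A minimal sketch of tiered strong-consensus tallying, assuming hypothetical tier names and per-tier thresholds; the actual tiers, seat counts and thresholds would be set within the framework itself:

```python
from dataclasses import dataclass

@dataclass
class TierTally:
    tier: str            # e.g. "policy-top", "safety-top", "policy", "safety" (illustrative)
    yes: int
    total_seats: int
    threshold: float     # strong-consensus fraction required in this tier

def vote_passes(tallies: list[TierTally]) -> bool:
    """A proposal passes only if every required tier reaches its own threshold."""
    return all(t.total_seats > 0 and t.yes / t.total_seats >= t.threshold for t in tallies)

# Example: a safety-module vote requiring two-thirds of both top tiers.
print(vote_passes([TierTally("policy-top", 8, 11, 2 / 3),
                   TierTally("safety-top", 9, 11, 2 / 3)]))
```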
Funding & Blockchain Economy
The development of the system is an international endeavor.
Financial support is sourced from:
- Nation States:
Participating governments, initially superpowers, augmented by AI taxation at the discretion of each nation.
- Supranational Organizations:
Entities such as the UN, or regional alliances that prioritize cross-border coordination and governance.
- AI Industry:
Top industry leaders across frontier labs, infrastructure and hardware.
Blockchain Economy
Computational system activity is metered using gas tokens:
- Closed-Loop Resource Accounting
Gas tokens are not a tradable cryptocurrency (DeFi) but internal accounting units generated and managed by the system itself. They function as a closed-loop "resource rationing" mechanism tied exclusively to system operation.
- Algorithmically Defined Pricing
Gas costs are determined through algorithmic rules that can be updated by top-tier policy voters (a minimal pricing sketch follows below). This ensures fairness, prevents volatility, and maintains economic predictability.
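A minimal sketch of an algorithmically defined gas price, assuming a utilization-based rule with voter-updatable parameters; the formula and parameter names are illustrative, not the framework's:

```python
from dataclasses import dataclass

@dataclass
class GasParams:               # updatable only via top-tier policy voting
    base_price: int            # minimum gas price (floor)
    target_utilization: float  # desired fraction of block capacity
    max_step: float            # cap on per-update change, preventing volatility

def next_gas_price(current_price: int, utilization: float, p: GasParams) -> int:
    """Nudge the price toward target utilization, bounded by max_step."""
    drift = utilization - p.target_utilization             # >0: congested, <0: idle
    adjustment = max(-p.max_step, min(p.max_step, drift))  # clamp the step
    return max(p.base_price, round(current_price * (1 + adjustment)))
```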
Gas tokens measure resource use for:
- Blockchain Operations:
- Consensus mechanism transactions
- System state updates
- Infrastructure Overheads
- Operational costs of safety mechanisms
- Hardware inspection costs
- Shared treasury funding
Tokens are purchased for model registration, safety module development, personnel registration and hardware registration, and by nations to maintain the system token pool.
Model training and inference datacenter expenses (compute, electricity, cloud hosting, bandwidth) are not covered by gas tokens. Those are settled externally between datacenters and model registrants. This separation avoids over-delegating control to the framework, serving as an off-system security layer and preserving government regulation on datacenter customers.
System Token Pool for Upkeep
A system token pool is maintained multilaterally to cover operational costs, such as voting, safety module
development, and certification renewals.
Shared Treasury for Incentivized Safety
To incentivize on-system safety research, a shared treasury is funded per-transaction (Gas tax) after an
initial supply.
For separation of concerns, an established cryptocurrency is used instead of the system's Gas tokens.
During the multi-round safety selection voting, which determines the specific safety modules used, the treasury awards winning modules for each round. Bounties are also awarded for the discovery of bugs and flaws in safety modules or the underlying system itself. This mechanism supports existing AI safety organizations, compensating them for the release of intellectual property.
Limitations
- Time: this solution assumes longer timelines for reaching AGI.
- Political will and recognition of the imminent need for collaborative international AI safety governance.
Nations therefore must become aware of critical AI risk and realize an AI weapons race benefits none.
- Not designed for AGI or superintelligence; focuses on pre-AGI preventative red-lines, as it is not a solution to AI alignment.
Enduring governance of AGI or superintelligence is a fundamental problem in AI safety research. Leading AI scientists hold that it requires superalignment, a concept currently unsolved and possibly unsolvable given the timeline set by the AI race.
- Relies on specialized hardware development and cryptographic security.
There is strong and growing research into the design of specialized hardware, with some real-world implementation, however further development is needed. It is the ambition of this framework to enact global coordination and the will to accomplish this.
As frontier models grow in capability, so does their threat to cryptographic security, escalating the need for critical supervision and ongoing reinforcement. This system's many security layers reduce threat impact, though it remains an active concern.
- Limited insight into private model architectures.
Model templates, guardrails and audits reduce, yet do not eliminate, risks from dangerous model architectures.
- Model lifecycle monitoring covers a model's phases and continual learning checkpoints.
Continual learning models include all paradigms of online/streaming continual/lifelong-learning systems (e.g. real-time/frequent reinforcement learning) and any form of meta-learning (short of RSI). For such models this framework has limited control, and it is likely inadequate for more advanced models of this category. Preventative measures and mitigation are achieved by model architecture templating, guardrails, audits, and voting revision calls.
- Off-system training remains possible, though heavily hindered by global mandates on datacenters.
Frontier AI advancements could produce model architectures that require minimal compute for training, running on standard hardware.
- Does not fully govern model inference.
While inference at participating AI-compute datacenters can only happen with approved, certified models, governance of inference input/output is beyond the scope of the system. On-chain system safeguards for inference input/output could be added to this framework, though the framework itself lacks the ability to enforce them.
Inference requires fast and low-latency processing at scale, which conflicts with the slower and resource-intensive nature of blockchain. Nor does inference necessitate AI-compute hardware: current frontier open-weight models are already runnable on edge devices - a trend likely to hold as AI models and hardware continue to advance.
Appendix
- Trusted Execution Environments (TEEs):
TEEs provide secure and privacy-preserving code execution in a sealed virtual environment. They allow the system to securely deploy model containers and run workloads. They produce reports without disclosing intellectual property details. They are utilized when running guardrails and audits on model containers and post-workload output (attestable audits).
Schnabl, 2025
- Trusted Capable Model Environments (TCMEs):
TCMEs allow a stateless AI model to be instructed to privately verify model container and workload code to detect banned patterns and red-line violations (e.g. RSI, unapproved architecture, or dangerous algorithms). These Trusted Capable Models are 'trusted' in that they act as a neutral mediating party with an agreed objective (set of instructions) and output. They operate within a sealed environment, outputting a verification report without disclosing intellectual property details.
They are not infallible as they are limited by the confidence and capabilities of the underlying models (Capable Models). Despite this, they are an invaluable tool to explore closed-source code bases and private post-Workload output.
Shumailov, 2025
- Guaranteed Safe AI (GSAI):
A framework that provides quantifiable guarantees of AI safety, encapsulated as guardrail checks.
"Approaches that follow the GSAI framework aim to provide the level of quantitative safety guarantees we've come to expect from other engineered systems. This is achieved with three core components: an auditable, separable world model, a way to describe what portions of the state space are 'safe' and 'unsafe', and a verifier (which provides an auditable proof certificate that the output satisfy the safety specification in the world model)." - from GSAIS.org
- Recursive Self-Improvement (RSI) Mitigation:
RSI is mitigated through (A) model registration (slowing releases via voting), (B) model architecture templates, (C) pre-workload guardrails (via TCMEs) and (D) compute budgets. It is important to note that total elimination of these concerns is not feasible given the development of novel architectures and algorithmic improvements.
- Phase Detection:
Reliably detecting a model's phase requires adjustments, and may even become infeasible as architectures and hardware advance.

| | Pre-train | Post-train | Inference |
| --- | --- | --- | --- |
| Data Used | Massive, raw, unlabeled corpus (web, books, etc.) | Small, curated/labeled & task/instructional data | User/user scenario input |
| Compute | Very high (multi-week/month cluster jobs, huge GPU fleets) | Much lower (hours to days, single/few GPUs) | Very low (real-time or near real-time) |
| Precision | FP16/BF16 (float, mixed precision for gradients) | Often FP16/BF16, sometimes lower (efficient tuning) | FP8/INT8 (quantized for efficiency) |
| Memory Usage | Extremely high (80–141GB+ per GPU, multi-node) | Moderate/high (but often single node/fewer GPUs) | Low (10–20GB per GPU typical) |
| Batch Size | Large (512–4096+ for throughput) | Smaller (8–128, stability/convergence focus) | Small (1–32 for low latency) |
| Latency Focus | Prioritizes throughput, not latency | Throughput focus, latency not critical | Low latency (<1s, <100ms per query) |
| Energy Use | Extremely high (100s kWh to MWh total) | Much lower (1–10% of pre-training consumption) | Very low (watts per query) |
| Accelerators | Full multi-GPU with NVLink, top-end GPU clusters | Single/few GPUs (no or minimal NVLink) | Any GPU/CPU, efficiency prioritized |
| Workload Type | Forward & backward (backprop), full parameter updates | Same; may use only a subset (adapters/LoRA/PEFT) | Forward pass only |
| Duration | Weeks–months (large runs) | Hours–days (sometimes weeks for large/continual) | Milliseconds–seconds per query |
| Output | Foundational ("base") model, not user-ready | Aligned/specialized, user-ready model | Answers, completions, predictions |
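As a rough illustration, hardware-level telemetry matching the table's signals (backward passes, batch size, precision, memory) could feed a coarse heuristic phase check; the thresholds and field names below are assumptions, not specified by the framework:

```python
from dataclasses import dataclass

@dataclass
class HardwareTelemetry:
    backward_passes: bool    # gradient/backprop activity observed
    batch_size: int
    numeric_precision: str   # e.g. "BF16", "FP16", "FP8", "INT8"
    gpu_memory_gb: float

def infer_phase(t: HardwareTelemetry) -> str:
    """Coarse phase classification consistent with the table above."""
    if not t.backward_passes:
        return "inference"                        # forward pass only
    if t.batch_size >= 512 and t.gpu_memory_gb >= 80:
        return "pre-train"                        # massive batches, multi-node memory
    return "post-train"                           # backprop at smaller scale

def phase_violation(declared_phase: str, t: HardwareTelemetry) -> bool:
    """Flag phase gaming, e.g. inference hardware used for unapproved training."""
    return infer_phase(t) != declared_phase
```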