TL;DR
- To make ZKML-based verifiable AI practical, we need a “cross-stack” design that combines lossless model pruning and quantization techniques with a specialized ZK proof protocol that is aware of such simplifications.
- We propose SpaZK, a next-gen model-simplification-aware ZKML protocol based on GKR and sumcheck that achieves provably optimal performance in theory and close to a 100X concrete efficiency gain over existing solutions in practice.
- SpaZK is highly modular and can be combined with other ZKML techniques to bring large model ZKML to reality.
Today, we are excited to unveil a new ZKML architecture, SpaZK, that improves state-of-the-art verifiable AI model inference performance by close to 100X. The key intuition behind SpaZK is to treat ZKML not only as a ZK proof problem, but also as an ML problem at the same time. Through cross-stack and holistic optimization, we discovered that a special type of model simplification, ternary networks, is inherently ZK-friendly without any loss of model accuracy. By combining these new models with specially designed ZK proving protocols, we can make ZKML much more efficient, marking a significant step towards practically-secure verifiable AI.
SpaZK, as a proving backend, will be integrated with popular ZKML frameworks such as EZKL, Giza, and others to become an essential part of Brevis’s product suite. Combined with Brevis’s existing production-ready ZK Coprocessor, SpaZK will allow smart contracts to process and analyze on-chain data with powerful AI models, enabling even more exciting use cases driven by on-chain AI agents and algorithms.
For more details, please refer to our technical paper and implementation.
Mission-critical AI should be verifiable
In recent years, deep learning has significantly advanced fields such as natural language processing, strategic gaming, and life sciences.
However, with the rapid adoption of machine learning models in critical domains, ensuring their integrity has become essential. Deploying unauthorized or unverified models can lead to catastrophic consequences. For instance, in the banking sector, using an unauthorized fraud detection model could result in false positives that inconvenience customers or, worse, the failure to identify fraudulent transactions, jeopardizing the security of financial systems. Similarly, if a self-driving car manufacturer were to use an unauthorized vision system to cut costs, the results could be dangerous. A faulty model might fail to detect pedestrians or vehicles accurately, putting lives at risk. In healthcare, relying on an unauthorized diagnostic model to interpret medical images could lead to incorrect diagnoses and improper treatment, harming patient safety and trust.
To prevent these risks, ensuring that only authorized and verifiable models are used is crucial. Namely, one should be able to attest that a certain AI inference result is generated with a “good” or “approved” model meeting certain criteria. Verifiable AI guarantees reliable, accurate, and safe outcomes in mission-critical applications.
ZKML: the frontier of verifiable AI and existing challenges
Zero-knowledge machine learning (ZKML) offers a way to prove that machine learning models function correctly without revealing their inner workings or training data. This allows developers to demonstrate model reliability without exposing proprietary algorithms or sensitive information. By ensuring that only authorized and verified models are used, ZKML addresses the critical need for trust and integrity in AI applications. This innovation enables secure deployment in sensitive areas, bolstering confidence in AI systems while maintaining data privacy and protecting intellectual property.
Zero-knowledge machine learning combines zero-knowledge proofs with machine learning to ensure the integrity and privacy of AI models, but this comes at the cost of substantial inefficiency. ZKML protocols typically require extensive computation to generate and verify proofs, and this inefficiency is magnified when dealing with larger models, making real-time applications impractical.
For example, one study showed that proving a single result from a language model with 1.5 billion parameters can take 90 hours. zkLLM, a more advanced system, reduced this time significantly, managing to prove inference on models with up to 13 billion parameters in 1 to 15 minutes. While this is a huge improvement, it’s still a big challenge when you consider that today’s top language models can have hundreds of billions or even trillions of parameters. The time and computational power needed to prove these large models correct can be so high that it might outweigh the benefits of using zero-knowledge proofs. This is a major bottleneck for applying ZKML to the modern large-scale AI models we see today, which is why we need new techniques to speed up these processes.

Cross-stack optimization: Efficient ML and ZK-friendliness
Large model inference is computationally demanding, requiring substantial resources to run efficiently. To address this, researchers have developed techniques like pruning (reducing the number of weights) and quantization (reducing the precision of weights) to make large models more manageable. These techniques are a natural thing to try when facing inefficiencies in ZKML. However, naive simplification methods often degrade model accuracy significantly, rendering the resulting model not worth verifying in ZK.
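As a toy illustration of what a naive combination of these two techniques looks like, the sketch below zeroes out small weights (pruning) and keeps only the signs of the rest (extreme quantization). The function name and the threshold `delta` are our own illustrative choices, not anything from the paper; applied post hoc like this, accuracy typically collapses, which is exactly the problem described above.

```python
import numpy as np

def naive_ternarize(W, delta=0.05):
    """Naive post-hoc simplification: weights with |w| <= delta become 0
    (pruning), and the rest are reduced to their sign (quantization).
    `delta` is an illustrative threshold, chosen arbitrarily here."""
    T = np.sign(W) * (np.abs(W) > delta)
    return T.astype(np.int8)
```

For example, `naive_ternarize(np.array([[0.3, -0.01], [-0.6, 0.04]]))` keeps the large weights as ±1 and prunes the small ones to 0.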
We need cross-stack optimization by looking at ZK and ML as a whole problem.
Among all simplification methods and models, ternary networks, which use only three possible values for weights, {-1, 0, 1}, are especially promising. This type of model might seem overly simplistic, but it actually delivers a big boost in efficiency without sacrificing accuracy. In addition, ternary networks have several properties that make them extremely ZK-friendly and a strong go-to candidate for ZKML:
- Sparsity from 0-valued weights: Imagine a model as a big web of connections. In a ternary network, many of these connections (weights) are zero, which means they don’t do anything. This “sparsity” means there are fewer active connections to process, making the whole model run faster. It also means that proving the model’s correctness can be done with less effort because there are fewer calculations to verify.
- Additive Operations Replacing Multiplicative Ones: Normally, neural networks do a lot of multiplication, especially in fully connected layers. But in a ternary network, because weights are only from {-1, 0, 1}, we can replace all multiplications with much simpler additions and subtractions. For example, multiplying by 0 doesn’t change the value, multiplying by 1 just means adding the value, and multiplying by -1 means subtracting the value. This simplification speeds things up significantly in the ZKML context because additions and subtractions are much “cheaper” (easier and faster) than multiplications.

In simpler terms, transforming normal neural networks into ternary networks eliminates a huge amount of computation without losing accuracy. Ternary networks’ ZK-friendliness makes them a perfect fit for combining with ZK technology, helping us ensure model integrity and privacy without the huge computational costs.
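To make the “no multiplications” point concrete, here is a minimal NumPy sketch of a ternary matrix-vector product. This is our own illustration, not SpaZK code: zero weights are skipped entirely (sparsity), and the ±1 weights contribute by addition and subtraction only.

```python
import numpy as np

def ternary_matvec(W, x):
    """Matrix-vector product for a ternary weight matrix W with entries
    in {-1, 0, 1}. Zero weights are skipped, and the remaining weights
    contribute via addition/subtraction only -- no multiplications."""
    m = W.shape[0]
    y = np.zeros(m, dtype=x.dtype)
    for i in range(m):
        y[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()
    return y
```

For instance, with `W = [[1, 0, -1], [0, 1, 1]]` and `x = [5, 7, 2]`, row 0 computes 5 − 2 = 3 and row 1 computes 7 + 2 = 9, using only additions and subtractions.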
SpaZK – Shedding Light on Practical ZKML
SpaZK is a highly efficient proving method, specially designed on top of GKR and sumcheck, for checking that a ternary-network-based model operates correctly. The key insight is that it leverages the sparsity in the model and only pays for the active (nonzero) weights. Additionally, SpaZK exploits the simpler arithmetic, using additions and subtractions instead of multiplications, making verification quicker and cheaper.
SpaZK brings significant efficiency gains to the proving process of machine learning models, making it a practical option for efficient ZKML. Importantly, we showed that when proving a linear layer, SpaZK achieves asymptotically optimal prover time, scaling linearly with the number of nonzero entries in the model parameters. This proving process scales much more favorably with model size than existing methods.
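For readers unfamiliar with sumcheck, the following toy implementation shows the classic protocol that SpaZK builds on: the prover convinces the verifier that the claimed sum of a multilinear polynomial over the boolean hypercube is correct, one variable per round. This is only an illustration of plain sumcheck, not SpaZK’s optimized sparse variant; the field modulus, table layout, and collapsing of prover and verifier into one function are all our own simplifications.

```python
import random

P = 2**61 - 1  # toy prime field modulus (illustrative choice)

def sumcheck(table, claimed_sum, rng=random):
    """Toy sumcheck for a multilinear polynomial given by its evaluation
    table over {0,1}^n (table length must be a power of two; the first
    variable is the most significant index bit). Prover and verifier are
    collapsed into one function; returns True iff the verifier accepts."""
    table = [t % P for t in table]
    claim = claimed_sum % P
    while len(table) > 1:
        half = len(table) // 2
        # Prover: the round polynomial g(X) is linear, so g(0), g(1) suffice.
        g0 = sum(table[:half]) % P
        g1 = sum(table[half:]) % P
        # Verifier: consistency check, then a random challenge r.
        if (g0 + g1) % P != claim:
            return False
        r = rng.randrange(P)
        claim = (g0 + r * (g1 - g0)) % P  # new claim is g(r)
        # Both parties fold the first variable at r.
        table = [(table[j] + r * (table[half + j] - table[j])) % P
                 for j in range(half)]
    # Final check: the polynomial's value at the random point equals the claim.
    return table[0] == claim
```

An honest prover’s claim is always accepted, while a wrong claimed sum is rejected in the first round; SpaZK’s contribution is making the prover’s per-round work proportional to the nonzero entries only.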
Preliminary Results on Mat-vec Multiplications
We conducted a preliminary experiment to evaluate proving strategies for a fundamental component in machine learning models—the linear layer. This operation can be reduced to matrix-vector multiplication, making it amenable to GKR or sumcheck-based proving protocols.
Our experiments utilized matrices of dimension 2^12 × 2^12 in three distinct configurations:
- Dense matrices with standard numerical entries.
- Sparse matrices with a density factor of 1/16.
- Ternary sparse matrices with entries constrained to the set {-1, 0, 1}.
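The three configurations can be sketched as follows. This is an illustrative construction (function name, entry ranges, and random sampling are our own choices; the paper’s actual data generation may differ), parameterized so the experiment’s 2^12 × 2^12 setting is `make_configs(2**12)`.

```python
import numpy as np

def make_configs(n, density=1/16, seed=0):
    """Illustrative construction of the three matrix configurations:
    dense numeric, sparse (~`density` fraction nonzero), and ternary
    sparse with entries in {-1, 0, 1}."""
    rng = np.random.default_rng(seed)
    dense = rng.integers(-128, 128, size=(n, n))      # dense numeric entries
    sparse = dense * (rng.random((n, n)) < density)   # keep ~1/16 of entries
    p = density / 2                                   # split ±1 evenly
    ternary = rng.choice([-1, 0, 1], size=(n, n), p=[p, 1 - density, p])
    return dense, sparse, ternary
```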
While numerous proving protocols exist, we focused our comparative analysis on GKR and sumcheck-based approaches. Our experimental results are as follows:
The sumcheck-based protocol demonstrates substantial performance gains, reducing proving time by a factor of approximately 50 compared to GKR. Furthermore, when combined with SpaZK for sparse and ternary matrices, the performance improvement increases dramatically, achieving a 1100-fold speedup over the baseline GKR implementation. This clearly demonstrates that by combining special models from ML with special protocols from ZK, ZKML performance can be boosted substantially.
Preliminary Results on MLPs (Multilayer Perceptrons)
We implemented a full ZKML framework based on SpaZK and Hyrax (via the Jolt implementation) and evaluated it on a multilayer perceptron (MLP) neural network with an input dimension of 784 (28 × 28), utilizing ReLU activation functions for nonlinear transformations. To assess the complete system, we conducted comprehensive end-to-end experiments to validate model integrity using SpaZK. Our implementation employs SpaZK for linear layer verification while utilizing the standard LogUp protocol for ReLU layer proofs. We benchmarked our system against EZKL with Halo2 as backend, a widely-adopted framework for zero-knowledge proofs in machine learning. We want to highlight that we are exploring ways to integrate SpaZK as one of the backends of the EZKL framework so existing users of EZKL can get the benefit of accelerated proving performance.
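As a reference for the computation being proven, a plain forward pass of such an MLP looks like the sketch below. The layer shapes in the usage example are hypothetical (much smaller than the benchmarked networks), and SpaZK proves this computation rather than performing it; with ternary weight matrices, each `W @ h` reduces to additions and subtractions.

```python
import numpy as np

def relu(v):
    """ReLU nonlinearity, proven via LogUp in the full system."""
    return np.maximum(v, 0)

def ternary_mlp_forward(x, layers):
    """Forward pass of an MLP whose weight matrices are ternary.
    `layers` is a list of (W, b) pairs with W entries in {-1, 0, 1};
    ReLU is applied between layers but not after the last one."""
    h = x
    for idx, (W, b) in enumerate(layers):
        h = W @ h + b  # with ternary W this is adds/subtracts only
        if idx < len(layers) - 1:
            h = relu(h)
    return h
```

For example, with hidden layer `W1 = [[1, -1], [0, 1]]` and output layer `W2 = [[1, 1]]` (zero and unit biases), input `[3, 2]` flows through as [1, 2] after ReLU and produces a single output.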

We conducted experiments across a range of network architectures to evaluate the scalability of SpaZK. Our test suite encompassed networks from a minimal configuration (a single hidden layer with 512 neurons) to more substantial architectures (two hidden layers with 4,096 neurons each). The parameter space explored ranged from approximately 4 × 10^5 parameters in the smallest network to 3.4 × 10^6 parameters in the largest configuration.
The results demonstrate that SpaZK achieves significant performance improvements, reducing proving time by 40-80× compared to existing approaches. This acceleration can be attributed to several key components: while SpaZK plays a central role, other factors like the use of Hyrax also contribute to the final acceleration rates. Notably, we observe that the acceleration factor exhibits a positive correlation with network size, suggesting potential for even more substantial improvements when applied to larger-scale models.
While these end-to-end performance gains are significant, we identify that the theoretical maximum acceleration is currently bounded by the computational overhead of proving ReLU (and more generally, nonlinear) operations. The dramatic efficiency improvements in linear layer proving have shifted the performance bottleneck to these nonlinear components. This observation suggests that developing optimized proving strategies for nonlinear operations could unlock further significant performance improvements in the complete system.
What’s next?
SpaZK is a game-changer for ZKML, offering a highly efficient and practical solution for verifying complex machine learning models. More importantly, we believe it points to a new direction of cross-stack optimization in the ZKML space. SpaZK has significant real-world applications in ensuring the integrity and efficiency of machine learning models across various critical domains in finance, blockchain, healthcare, and more.
SpaZK is a modular proving backend and can be integrated with different ZKML frontends such as ChainML, EZKL, and more. In the near future, SpaZK will be integrated as part of Brevis’s product suite and enhance the computation capability of Brevis ZK Coprocessor.
In the longer term, we hope SpaZK can contribute to a world where you can trust the AI systems in your life without worrying about their integrity. Whether it’s ensuring that your bank’s fraud detection model is accurate, that your self-driving car’s vision system is safe, or that your healthcare diagnostics are reliable, SpaZK makes it possible to achieve this practically-secure intelligence, paving the way for a future where intelligent systems are seamlessly integrated into every part of our lives, safely and efficiently.

