[RFC] ML-KEM (FIPS 203) implementation with reusable decapsulation pool

Linux cryptographic layer development

* [RFC] ML-KEM (FIPS 203) implementation with reusable decapsulation pool
@ 2026-06-09  7:45 kstzavertaylo
  2026-06-09 19:25 ` Eric Biggers
  0 siblings, 1 reply; 4+ messages in thread
From: kstzavertaylo @ 2026-06-09  7:45 UTC (permalink / raw)
  To: linux-crypto, herbert, ebiggers

Hello,
I have been working on an ML-KEM (FIPS 203) implementation for the
Linux kernel. This is an early RFC to solicit feedback on the overall
design and architecture before further polishing.

The implementation consists of two closely related variants sharing
the same core cryptographic logic:
    1. A userspace implementation accompanied by a set of validation
programs, including NIST KAT vectors, timing-leakage testing (dudect),
pool stress tests, and additional functional tests.
    2. A Linux kernel module implementing the KPP interface and
reusing the same core architecture where possible.

Key features include:
   1. Support for all three parameter sets: ML-KEM-512, ML-KEM-768,
and ML-KEM-1024.
   2. The implementation uses a reusable decapsulation pool consisting
of preallocated slots associated with a key context. The goal of this
design is to move memory allocation to key initialization and avoid
per-decapsulation allocations.
   3. Explicit zeroization of sensitive data and constant-time
operations where required.
   4. Portable C11 codebase with minimal differences between userspace
and kernel versions.
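
For illustration, item 3 might look like the following minimal C
sketch. The helper names (ct_differ, ct_cmov, secure_zero) are
hypothetical and are not taken from the repository. In FIPS 203,
decapsulation re-encrypts and compares ciphertexts, then selects between
the real shared secret and a rejection value; the sketch shows that
selection without secret-dependent branches. Kernel code would more
likely rely on existing helpers such as crypto_memneq() and
memzero_explicit(), and a compiler may still transform code like this,
so the generated assembly needs checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 0 if the buffers are equal, 1 otherwise, with no early exit. */
static uint8_t ct_differ(const uint8_t *a, const uint8_t *b, size_t len)
{
	uint8_t acc = 0;

	for (size_t i = 0; i < len; i++)
		acc |= a[i] ^ b[i];
	/* (acc | -acc) has its top bit set exactly when acc != 0. */
	return (uint8_t)(((uint32_t)acc | (0u - (uint32_t)acc)) >> 31);
}

/* Copy src over dst when flag == 1; leave dst unchanged when flag == 0. */
static void ct_cmov(uint8_t *dst, const uint8_t *src, size_t len, uint8_t flag)
{
	uint8_t mask = (uint8_t)(0u - flag);	/* 0x00 or 0xFF */

	for (size_t i = 0; i < len; i++)
		dst[i] ^= mask & (dst[i] ^ src[i]);
}

/* Wipe a buffer through a volatile pointer so the stores are not elided. */
static void secure_zero(void *p, size_t len)
{
	volatile uint8_t *vp = p;

	while (len--)
		*vp++ = 0;
}

/* Hypothetical walk-through of the selection step with dummy values. */
static int ct_demo(void)
{
	uint8_t ct[4] = { 1, 2, 3, 4 }, ct_re[4] = { 1, 2, 3, 4 };
	uint8_t key[2] = { 0xAA, 0xAA };		/* candidate secret */
	const uint8_t reject[2] = { 0x55, 0x55 };	/* rejection value */

	ct_cmov(key, reject, sizeof(key), ct_differ(ct, ct_re, sizeof(ct)));
	assert(key[0] == 0xAA && key[1] == 0xAA);	/* match: key kept */

	ct_re[3] ^= 1;					/* force a mismatch */
	ct_cmov(key, reject, sizeof(key), ct_differ(ct, ct_re, sizeof(ct)));
	assert(key[0] == 0x55 && key[1] == 0x55);	/* mismatch: replaced */

	secure_zero(key, sizeof(key));
	assert(key[0] == 0 && key[1] == 0);
	return 0;
}
```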

I am aware that some aspects (local SHA3/SHAKE implementation, coding
style, etc.) will likely need adjustment to align with upstream
expectations.

At this stage, I would like to ask for feedback on the following points:
   1. Is the general direction (KPP integration + reusable
decapsulation pool) acceptable?
   2. Are there any fundamental concerns with the pool-based architecture?
   3. Would you prefer to reuse kernel crypto primitives for
SHA3/SHAKE, or is the current embedded approach acceptable at this
stage?

The implementation is available at: https://github.com/kstzv/ml-kem

Documentation and implementation details are available in the repository.

Any feedback, criticism or suggestions would be greatly appreciated.

Thank you for your time.
Best regards,
K. Zavertailo


* Re: [RFC] ML-KEM (FIPS 203) implementation with reusable decapsulation pool
  2026-06-09  7:45 [RFC] ML-KEM (FIPS 203) implementation with reusable decapsulation pool kstzavertaylo
@ 2026-06-09 19:25 ` Eric Biggers
  2026-06-12 14:14   ` kstzavertaylo
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Biggers @ 2026-06-09 19:25 UTC (permalink / raw)
  To: kstzavertaylo; +Cc: linux-crypto, herbert

On Tue, Jun 09, 2026 at 10:45:48AM +0300, kstzavertaylo wrote:
> Hello,
> I have been working on an ML-KEM (FIPS 203) implementation for the
> Linux kernel. This is an early RFC to solicit feedback on the overall
> design and architecture before further polishing.
> 
> The implementation consists of two closely related variants sharing
> the same core cryptographic logic:
>     1. A userspace implementation accompanied by a set of validation
> programs, including NIST KAT vectors, timing-leakage testing (dudect),
> pool stress tests, and additional functional tests.
>     2. A Linux kernel module implementing the KPP interface and
> reusing the same core architecture where possible.
> 
> Key features include:
>    1. Support for all three parameter sets: ML-KEM-512, ML-KEM-768,
> and ML-KEM-1024.
>    2. The implementation uses a reusable decapsulation pool consisting
> of preallocated slots associated with a key context. The goal of this
> design is to move memory allocation to key initialization and avoid
> per-decapsulation allocations.
>    3. Explicit zeroization of sensitive data and constant-time
> operations where required.
>    4. Portable C11 codebase with minimal differences between userspace
> and kernel versions.
> 
> I am aware that some aspects (local SHA3/SHAKE implementation, coding
> style, etc.) will likely need adjustment to align with upstream
> expectations.
> 
> At this stage, I would like to ask for feedback on the following points:
>    1. Is the general direction (KPP integration + reusable
> decapsulation pool) acceptable?
>    2. Are there any fundamental concerns with the pool-based architecture?
>    3. Would you prefer to reuse kernel crypto primitives for
> SHA3/SHAKE, or is the current embedded approach acceptable at this
> stage?
> 
> The implementation is available at: https://github.com/kstzv/ml-kem
> 
> Documentation and implementation details are available in the repository.
> 
> Any feedback, criticism or suggestions would be greatly appreciated.

There's already a kernel patchset for ML-KEM and X-Wing ready to go:
https://lore.kernel.org/linux-crypto/20260525184403.101818-1-ebiggers@kernel.org/T/#u
It's a high-quality implementation that fully follows kernel conventions
already.  There just hasn't been a reason to merge it yet, since there's
no user yet.

We could consider replacing my ML-KEM implementation (patch 1 of that
series) with a different one.  But it would have to be a high-quality
implementation that brings something substantially new to the table.

I think only an integration of
https://github.com/pq-code-package/mlkem-native *might* have a chance at
passing that bar.  However, it would be way more code than my
implementation, would have significant integration challenges, and would
need some fixing up to work in the kernel.  The main benefit would be
getting the assembly code, but it's not clear that will be needed.  So
those are some of the reasons I didn't reach for that initially.

I don't think integrating https://github.com/kstzv/ml-kem would be
beneficial, for a number of reasons.

Anyway, I suggest you review the pre-existing patchset
https://lore.kernel.org/linux-crypto/20260525184403.101818-1-ebiggers@kernel.org/
and give feedback on that, if you have any.

- Eric


* Re: [RFC] ML-KEM (FIPS 203) implementation with reusable decapsulation pool
  2026-06-09 19:25 ` Eric Biggers
@ 2026-06-12 14:14   ` kstzavertaylo
  2026-06-12 18:32     ` Eric Biggers
  0 siblings, 1 reply; 4+ messages in thread
From: kstzavertaylo @ 2026-06-12 14:14 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-crypto, herbert

Thank you for the detailed reply and for pointing me to the existing
ML-KEM/X-Wing patchset. I spent some time reviewing the implementation
to better understand the design choices and how they compare to the
approach I took in my own work.

After reviewing the patchset, I can see several strengths in the
implementation. It integrates cleanly into the existing lib/crypto
infrastructure, reuses kernel cryptographic primitives, avoids large
stack allocations, and includes KUnit-based validation. The
implementation also appears intentionally compact and well aligned
with existing kernel conventions.

While reviewing the implementation, I noticed that decapsulation
allocates a temporary workspace for each operation. This is one of the
areas where my design diverged, which is what originally motivated the
reusable pool approach.

My implementation was developed with a somewhat different goal in
mind. I experimented with a reusable decapsulation workspace model
where memory is allocated during key initialization and then reused
across subsequent decapsulation operations. The main motivation was
reducing allocation frequency and minimizing both stack usage and
repeated memory management during decapsulation.

As a result, the implementation avoids allocations during
decapsulation entirely by reusing preallocated workspaces associated
with the key context. My original hypothesis was that moving memory
allocation to key initialization, thereby eliminating allocations from
the decapsulation path, could reduce allocation overhead during
repeated decapsulation operations and be beneficial in environments
where allocation activity is considered undesirable.
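
A minimal sketch of the reusable-workspace model described above,
assuming a fixed number of slots that are allocated once at key
initialization and claimed with a C11 atomic flag. All names
(key_ctx, decap_slot, slot_acquire, slot_release) and the sizes are
hypothetical and do not come from the repository. Note that
slot_acquire() can fail when every slot is busy, so a caller still
needs a policy for that case.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SLOTS	4
#define WORKSPACE_BYTES	4096	/* placeholder, not derived from FIPS 203 */

struct decap_slot {
	atomic_flag busy;
	uint8_t scratch[WORKSPACE_BYTES];
};

struct key_ctx {
	struct decap_slot *slots;	/* allocated once, at key init */
};

static int key_ctx_init(struct key_ctx *ctx)
{
	ctx->slots = calloc(POOL_SLOTS, sizeof(*ctx->slots));
	if (!ctx->slots)
		return -1;
	for (size_t i = 0; i < POOL_SLOTS; i++)
		atomic_flag_clear(&ctx->slots[i].busy);
	return 0;
}

/* Claim a free slot without allocating; NULL means the pool is exhausted. */
static struct decap_slot *slot_acquire(struct key_ctx *ctx)
{
	for (size_t i = 0; i < POOL_SLOTS; i++) {
		if (!atomic_flag_test_and_set_explicit(&ctx->slots[i].busy,
						       memory_order_acquire))
			return &ctx->slots[i];
	}
	return NULL;
}

/* Wipe the scratch memory, then hand the slot back to the pool. */
static void slot_release(struct decap_slot *slot)
{
	volatile uint8_t *p = slot->scratch;

	for (size_t i = 0; i < WORKSPACE_BYTES; i++)
		p[i] = 0;
	atomic_flag_clear_explicit(&slot->busy, memory_order_release);
}

static void key_ctx_destroy(struct key_ctx *ctx)
{
	free(ctx->slots);
	ctx->slots = NULL;
}

/* Hypothetical usage: exhaust the pool, release one slot, reuse it. */
static int pool_demo(void)
{
	struct key_ctx ctx;
	struct decap_slot *held[POOL_SLOTS];

	if (key_ctx_init(&ctx))
		return -1;
	for (size_t i = 0; i < POOL_SLOTS; i++) {
		held[i] = slot_acquire(&ctx);
		assert(held[i] != NULL);
	}
	assert(slot_acquire(&ctx) == NULL);	/* every slot is busy */

	held[1]->scratch[0] = 0x42;		/* pretend it was used */
	slot_release(held[1]);
	assert(slot_acquire(&ctx) == held[1]);	/* freed slot is reused */
	assert(held[1]->scratch[0] == 0);	/* and was wiped */

	key_ctx_destroy(&ctx);
	return 0;
}
```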

Another difference is the integration level. My prototype explored
direct integration through the KPP interface, whereas the patchset
focuses on providing a reusable cryptographic library component within
lib/crypto. These approaches address somewhat different layers of the
kernel crypto stack.

The primary reason I initially started working on this implementation
was to explore whether a reusable-workspace architecture could be
useful in environments where allocation frequency and memory reuse are
considered important design factors. I therefore wanted to understand
whether such an approach might offer any practical value within the
kernel context, even if the overall implementation strategy differs
from the existing patchset.

To test whether the reusable-workspace approach actually achieves its
intended goals, I am preparing a comparison against several established
ML-KEM implementations, covering memory usage, allocation behavior,
throughput, and related metrics. I am particularly interested in
whether it provides practical benefits in environments where stack
space is constrained or where reducing allocation activity is
desirable. If such data would be useful for the discussion, I would be
happy to share the results once they are available.

Best regards,
K. Zavertailo


* Re: [RFC] ML-KEM (FIPS 203) implementation with reusable decapsulation pool
  2026-06-12 14:14   ` kstzavertaylo
@ 2026-06-12 18:32     ` Eric Biggers
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Biggers @ 2026-06-12 18:32 UTC (permalink / raw)
  To: kstzavertaylo; +Cc: linux-crypto, herbert

On Fri, Jun 12, 2026 at 05:14:54PM +0300, kstzavertaylo wrote:
> Thank you for the detailed reply and for pointing me to the existing
> ML-KEM/X-Wing patchset. I spent some time reviewing the implementation
> to better understand the design choices and how they compare to the
> approach I took in my own work.
> 
> After reviewing the patchset, I can see several strengths in the
> implementation. It integrates cleanly into the existing lib/crypto
> infrastructure, reuses kernel cryptographic primitives, avoids large
> stack allocations, and includes KUnit-based validation. The
> implementation also appears intentionally compact and well aligned
> with existing kernel conventions.
> 
> While reviewing the implementation, I noticed that decapsulation
> allocates a temporary workspace for each operation. This is one of the
> areas where my design diverged, which is what originally motivated the
> reusable pool approach.
> 
> My implementation was developed with a somewhat different goal in
> mind. I experimented with a reusable decapsulation workspace model
> where memory is allocated during key initialization and then reused
> across subsequent decapsulation operations. The main motivation was
> reducing allocation frequency and minimizing both stack usage and
> repeated memory management during decapsulation.
>
> As a result, the implementation avoids allocations during
> decapsulation entirely by reusing preallocated workspaces associated
> with the key context. My original hypothesis was that moving memory
> allocation to key initialization, thereby eliminating allocations from
> the decapsulation path, could reduce allocation overhead during
> repeated decapsulation operations and be beneficial in environments
> where allocation activity is considered undesirable.

In my ML-KEM code, all the decapsulation memory is consolidated into
struct mlkem_decap_workspace.  It would be straightforward to support
the caller providing a pre-allocated workspace.
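
As a rough illustration of that shape (not code from the patchset): the
core routine takes a workspace pointer and never allocates, and a thin
wrapper keeps the per-operation allocation for callers that do not
care. The demo_ names, the workspace size, and the placeholder
arithmetic below are all hypothetical; the real layout of struct
mlkem_decap_workspace is not shown in this thread. A kernel version
would presumably use kmalloc() and kfree_sensitive() instead of
malloc() and free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a consolidated decapsulation workspace. */
struct demo_decap_workspace {
	uint8_t scratch[2048];
};

/*
 * Core routine: all temporary memory comes from *ws.
 * The body is a placeholder (XOR fold), not ML-KEM decapsulation.
 */
static int demo_decap_with_ws(uint8_t ss[32], const uint8_t *ct, size_t ct_len,
			      struct demo_decap_workspace *ws)
{
	if (ct_len > sizeof(ws->scratch))
		return -1;
	memcpy(ws->scratch, ct, ct_len);
	memset(ss, 0, 32);
	for (size_t i = 0; i < ct_len; i++)
		ss[i % 32] ^= ws->scratch[i];
	memset(ws->scratch, 0, ct_len);	/* real code needs a non-elidable wipe */
	return 0;
}

/* Convenience wrapper: allocate a workspace for this one operation. */
static int demo_decap(uint8_t ss[32], const uint8_t *ct, size_t ct_len)
{
	struct demo_decap_workspace *ws = malloc(sizeof(*ws));
	int ret;

	if (!ws)
		return -1;
	ret = demo_decap_with_ws(ss, ct, ct_len, ws);
	free(ws);
	return ret;
}

/* Both entry points give the same result; only memory ownership differs. */
static int workspace_demo(void)
{
	/* Stands in for a workspace embedded in a long-lived caller object. */
	static struct demo_decap_workspace cached;
	const uint8_t ct[5] = { 1, 2, 3, 4, 5 };
	uint8_t ss_a[32], ss_b[32];

	assert(demo_decap(ss_a, ct, sizeof(ct)) == 0);
	assert(demo_decap_with_ws(ss_b, ct, sizeof(ct), &cached) == 0);
	assert(memcmp(ss_a, ss_b, sizeof(ss_a)) == 0);
	return 0;
}
```

With this split, any caching or pooling policy stays in the caller,
which knows its own concurrency and lifetime constraints.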

In the case of X-Wing, we could also support pre-expanding the
decapsulation key.

It just depends on what is actually going to be needed by the kernel
feature(s) that are going to use this.  Which we don't really know yet.

We do know that it hasn't been found to be useful for the crypto
subsystem to provide pools for any other algorithm in the kernel, for a
variety of reasons.  Usually callers can just allocate per-operation, or
they have some sort of object (inode, block device, socket, etc.) that's
a natural place for them to cache whatever they need anyway.  In the
rare cases where some sort of pool is needed it's implemented in the
caller, optimized for the particular use case.  So I think there's a
good chance your pool idea is going off on the wrong track.

> Another difference is the integration level. My prototype explored
> direct integration through the KPP interface, whereas the patchset
> focuses on providing a reusable cryptographic library component within
> lib/crypto. These approaches address somewhat different layers of the
> kernel crypto stack.

We don't need crypto_kpp support, as it's much more complex and harder
to use than the crypto library
(https://docs.kernel.org/crypto/libcrypto.html).  Also it seems it's not
really possible anyway, since crypto_kpp is an old design that works for
Diffie-Hellman but not KEMs.
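
One way to picture the mismatch, as a toy sketch only: crypto_kpp
exposes set_secret, generate_public_key and compute_shared_secret,
where the last step turns a peer's public value into a single output.
A KEM's encapsulation instead produces two outputs from one public key,
a ciphertext to send and a shared secret to keep. The code below uses
small-number modular exponentiation purely to show the two call shapes.
It is not secure, is not constant-time, is not how ML-KEM works
internally, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_P	2147483647u	/* 2^31 - 1, toy modulus */
#define TOY_G	7u		/* toy generator */

static uint32_t modpow(uint32_t base, uint32_t exp)
{
	uint64_t result = 1, b = base % TOY_P;

	while (exp) {
		if (exp & 1)
			result = result * b % TOY_P;
		b = b * b % TOY_P;
		exp >>= 1;
	}
	return (uint32_t)result;
}

/* DH shape: one secret in, one public value out, one shared secret out. */
static uint32_t dh_public(uint32_t secret)
{
	return modpow(TOY_G, secret);
}

static uint32_t dh_shared(uint32_t secret, uint32_t peer_public)
{
	return modpow(peer_public, secret);
}

/* KEM shape: encapsulation returns a ciphertext AND a shared secret. */
struct kem_encaps_out {
	uint32_t ct;	/* sent to the key owner */
	uint32_t ss;	/* kept by the sender */
};

static struct kem_encaps_out kem_encaps(uint32_t pk, uint32_t randomness)
{
	struct kem_encaps_out out = {
		.ct = modpow(TOY_G, randomness),
		.ss = modpow(pk, randomness),
	};
	return out;
}

static uint32_t kem_decaps(uint32_t sk, uint32_t ct)
{
	return modpow(ct, sk);
}

static int shape_demo(void)
{
	uint32_t a = 123456, b = 654321;

	/* DH: both sides run the same two calls. */
	assert(dh_shared(a, dh_public(b)) == dh_shared(b, dh_public(a)));

	/* KEM: the two sides run different operations. */
	struct kem_encaps_out enc = kem_encaps(dh_public(a), b);
	assert(kem_decaps(a, enc.ct) == enc.ss);
	return 0;
}
```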

- Eric

