* [RFC] ML-KEM (FIPS 203) implementation with reusable decapsulation pool
@ 2026-06-09 7:45 kstzavertaylo
2026-06-09 19:25 ` Eric Biggers
0 siblings, 1 reply; 4+ messages in thread
From: kstzavertaylo @ 2026-06-09 7:45 UTC (permalink / raw)
To: linux-crypto, herbert, ebiggers
Hello,
I have been working on an ML-KEM (FIPS 203) implementation for the
Linux kernel. This is an early RFC to solicit feedback on the overall
design and architecture before further polishing.

The implementation consists of two closely related variants sharing
the same core cryptographic logic:
1. A userspace implementation accompanied by a set of validation
   programs, including NIST KAT vectors, timing-leakage testing (dudect),
   pool stress tests, and additional functional tests.
2. A Linux kernel module implementing the KPP interface and
   reusing the same core architecture where possible.

Key features include:
1. Support for all three parameter sets: ML-KEM-512, ML-KEM-768,
   and ML-KEM-1024.
2. The implementation uses a reusable decapsulation pool consisting
   of preallocated slots associated with a key context. The goal of this
   design is to move memory allocation to key initialization and avoid
   per-decapsulation allocations.
3. Explicit zeroization of sensitive data and constant-time
   operations where required.
4. Portable C11 codebase with minimal differences between userspace
   and kernel versions.
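
To make item 2 concrete, here is a minimal userspace sketch of a pool of preallocated slots tied to a key context. It is an illustration only and is not taken from the repository: every name (`pool_key_ctx`, `pool_slot`, `pool_acquire`, and so on), the slot count, and the workspace size are assumptions chosen for the example, and the size is a placeholder rather than a real ML-KEM figure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* All names and sizes below are hypothetical, chosen for illustration. */
#define POOL_SLOTS    4
#define POOL_WS_BYTES 4096 /* placeholder, not a real ML-KEM workspace size */

struct pool_slot {
	atomic_bool busy; /* true while a decapsulation owns this slot */
	uint8_t *ws;      /* scratch memory, allocated once at key init */
};

struct pool_key_ctx {
	struct pool_slot slots[POOL_SLOTS];
};

/* Zeroize through a volatile pointer so the stores are not optimized away. */
static void pool_wipe(void *p, size_t n)
{
	volatile uint8_t *v = p;

	while (n--)
		*v++ = 0;
}

/* Wipe and free every slot; safe to call on a partially initialized ctx. */
void pool_ctx_destroy(struct pool_key_ctx *ctx)
{
	for (size_t i = 0; i < POOL_SLOTS; i++) {
		if (ctx->slots[i].ws) {
			pool_wipe(ctx->slots[i].ws, POOL_WS_BYTES);
			free(ctx->slots[i].ws);
			ctx->slots[i].ws = NULL;
		}
	}
}

/* All allocation happens here, at key initialization time. */
int pool_ctx_init(struct pool_key_ctx *ctx)
{
	for (size_t i = 0; i < POOL_SLOTS; i++) {
		atomic_init(&ctx->slots[i].busy, false);
		ctx->slots[i].ws = NULL;
	}
	for (size_t i = 0; i < POOL_SLOTS; i++) {
		ctx->slots[i].ws = calloc(1, POOL_WS_BYTES);
		if (!ctx->slots[i].ws) {
			pool_ctx_destroy(ctx);
			return -1;
		}
	}
	return 0;
}

/* Claim a free slot without allocating; returns NULL if all are busy. */
struct pool_slot *pool_acquire(struct pool_key_ctx *ctx)
{
	for (size_t i = 0; i < POOL_SLOTS; i++) {
		if (!atomic_exchange_explicit(&ctx->slots[i].busy, true,
					      memory_order_acquire))
			return &ctx->slots[i];
	}
	return NULL;
}

/* Wipe the scratch memory, then hand the slot back to the pool. */
void pool_release(struct pool_slot *slot)
{
	assert(slot != NULL);
	pool_wipe(slot->ws, POOL_WS_BYTES);
	atomic_store_explicit(&slot->busy, false, memory_order_release);
}
```

In this sketch a decapsulation would call `pool_acquire()`, use `slot->ws` as scratch space, and call `pool_release()` when done. What to do when every slot is busy (fail, wait, or fall back to a one-off allocation) is left open here.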

I am aware that some aspects (local SHA3/SHAKE implementation, coding
style, etc.) will likely need adjustment to align with upstream
expectations.

At this stage, I would like to ask for feedback on the following points:
1. Is the general direction (KPP integration + reusable
   decapsulation pool) acceptable?
2. Are there any fundamental concerns with the pool-based architecture?
3. Would you prefer to reuse kernel crypto primitives for
   SHA3/SHAKE, or is the current embedded approach acceptable at this
   stage?

The implementation is available at: repository - https://github.com/kstzv/ml-kem

Documentation and implementation details are available in the repository.

Any feedback, criticism or suggestions would be greatly appreciated.

Thank you for your time.

Best regards,
K. Zavertailo

* Re: [RFC] ML-KEM (FIPS 203) implementation with reusable decapsulation pool
2026-06-09 7:45 [RFC] ML-KEM (FIPS 203) implementation with reusable decapsulation pool kstzavertaylo
@ 2026-06-09 19:25 ` Eric Biggers
2026-06-12 14:14 ` kstzavertaylo
0 siblings, 1 reply; 4+ messages in thread
From: Eric Biggers @ 2026-06-09 19:25 UTC (permalink / raw)
To: kstzavertaylo; +Cc: linux-crypto, herbert

On Tue, Jun 09, 2026 at 10:45:48AM +0300, kstzavertaylo wrote:
> Hello,
> I have been working on an ML-KEM (FIPS 203) implementation for the
> Linux kernel. This is an early RFC to solicit feedback on the overall
> design and architecture before further polishing.
>
> The implementation consists of two closely related variants sharing
> the same core cryptographic logic:
> 1. A userspace implementation accompanied by a set of validation
> programs, including NIST KAT vectors, timing-leakage testing (dudect),
> pool stress tests, and additional functional tests.
> 2. A Linux kernel module implementing the KPP interface and
> reusing the same core architecture where possible.
>
> Key features include:
> 1. Support for all three parameter sets: ML-KEM-512, ML-KEM-768,
> and ML-KEM-1024.
> 2. The implementation uses a reusable decapsulation pool consisting
> of preallocated slots associated with a key context. The goal of this
> design is to move memory allocation to key initialization and avoid
> per-decapsulation allocations.
> 3. Explicit zeroization of sensitive data and constant-time
> operations where required.
> 4. Portable C11 codebase with minimal differences between userspace
> and kernel versions.
>
> I am aware that some aspects (local SHA3/SHAKE implementation, coding
> style, etc.) will likely need adjustment to align with upstream
> expectations.
>
> At this stage, I would like to ask for feedback on the following points:
> 1. Is the general direction (KPP integration + reusable
> decapsulation pool) acceptable?
> 2. Are there any fundamental concerns with the pool-based architecture?
> 3. Would you prefer to reuse kernel crypto primitives for
> SHA3/SHAKE, or is the current embedded approach acceptable at this
> stage?
>
> The implementation is available at: repository - https://github.com/kstzv/ml-kem
>
> Documentation and implementation details are available in the repository.
>
> Any feedback, criticism or suggestions would be greatly appreciated.

There's already a kernel patchset for ML-KEM and X-Wing ready to go:
https://lore.kernel.org/linux-crypto/20260525184403.101818-1-ebiggers@kernel.org/T/#u
It's a high quality implementation that fully follows kernel conventions
already. There just hasn't been a reason to merge it yet, since there's
no user yet.

We could consider replacing my ML-KEM implementation (patch 1 of that
series) with a different one. But it would have to be a high-quality
implementation that brings something substantially new to the table.

I think only an integration of
https://github.com/pq-code-package/mlkem-native *might* have a chance at
passing that bar. However, it would be way more code than my
implementation, would have significant integration challenges, and would
need some fixing up to work in the kernel. The main benefit would be
getting the assembly code, but it's not clear that will be needed. So
those are some of the reasons I didn't reach for that initially.

I don't think integrating https://github.com/kstzv/ml-kem would be
beneficial, for a number of reasons.

Anyway, I suggest you review the pre-existing patchset
https://lore.kernel.org/linux-crypto/20260525184403.101818-1-ebiggers@kernel.org/
and give feedback on that, if you have any.

- Eric

* Re: [RFC] ML-KEM (FIPS 203) implementation with reusable decapsulation pool
2026-06-09 19:25 ` Eric Biggers
@ 2026-06-12 14:14 ` kstzavertaylo
2026-06-12 18:32 ` Eric Biggers
0 siblings, 1 reply; 4+ messages in thread
From: kstzavertaylo @ 2026-06-12 14:14 UTC (permalink / raw)
To: Eric Biggers; +Cc: linux-crypto, herbert

Thank you for the detailed reply and for pointing me to the existing
ML-KEM/X-Wing patchset. I spent some time reviewing the implementation
to better understand the design choices and how they compare to the
approach I took in my own work.

After reviewing the patchset, I can see several strengths in the
implementation. It integrates cleanly into the existing lib/crypto
infrastructure, reuses kernel cryptographic primitives, avoids large
stack allocations, and includes KUnit-based validation. The
implementation also appears intentionally compact and well aligned
with existing kernel conventions.

While reviewing the implementation, I noticed that decapsulation
allocates a temporary workspace for each operation. This is one of the
areas where my design diverged, which is what originally motivated the
reusable pool approach.

My implementation was developed with a somewhat different goal in
mind. I experimented with a reusable decapsulation workspace model
where memory is allocated during key initialization and then reused
across subsequent decapsulation operations. The main motivation was
reducing allocation frequency and minimizing both stack usage and
repeated memory management during decapsulation.

As a result, the implementation avoids allocations during
decapsulation entirely by reusing preallocated workspaces associated
with the key context. My original hypothesis was that moving memory
allocation to key initialization, thereby eliminating allocations from
the decapsulation path, could reduce allocation overhead during
repeated decapsulation operations and be beneficial in environments
where allocation activity is considered undesirable.

Another difference is the integration level. My prototype explored
direct integration through the KPP interface, whereas the patchset
focuses on providing a reusable cryptographic library component within
lib/crypto. These approaches address somewhat different layers of the
kernel crypto stack.

The primary reason I initially started working on this implementation
was to explore whether a reusable-workspace architecture could be
useful in environments where allocation frequency and memory reuse are
considered important design factors. I therefore wanted to understand
whether such an approach might offer any practical value within the
kernel context, even if the overall implementation strategy differs
from the existing patchset.

The goal is to analyze the results and understand whether the
reusable-workspace approach actually achieves its intended goals in
terms of memory usage, allocation behavior, throughput, and related
metrics. In particular, I am interested in understanding whether such
an approach may provide practical benefits in environments where stack
space is constrained or where reducing allocation activity is
desirable.

To better evaluate these tradeoffs, I am currently preparing a
comparison against several established ML-KEM implementations. If such
data would be useful for the discussion, I would be happy to share the
results once they are available.

Best regards,
K. Zavertailo

* Re: [RFC] ML-KEM (FIPS 203) implementation with reusable decapsulation pool
2026-06-12 14:14 ` kstzavertaylo
@ 2026-06-12 18:32 ` Eric Biggers
0 siblings, 0 replies; 4+ messages in thread
From: Eric Biggers @ 2026-06-12 18:32 UTC (permalink / raw)
To: kstzavertaylo; +Cc: linux-crypto, herbert

On Fri, Jun 12, 2026 at 05:14:54PM +0300, kstzavertaylo wrote:
> Thank you for the detailed reply and for pointing me to the existing
> ML-KEM/X-Wing patchset. I spent some time reviewing the implementation
> to better understand the design choices and how they compare to the
> approach I took in my own work.
>
> After reviewing the patchset, I can see several strengths in the
> implementation. It integrates cleanly into the existing lib/crypto
> infrastructure, reuses kernel cryptographic primitives, avoids large
> stack allocations, and includes KUnit-based validation. The
> implementation also appears intentionally compact and well aligned
> with existing kernel conventions.
>
> While reviewing the implementation, I noticed that decapsulation
> allocates a temporary workspace for each operation. This is one of the
> areas where my design diverged, which is what originally motivated the
> reusable pool approach.
>
> My implementation was developed with a somewhat different goal in
> mind. I experimented with a reusable decapsulation workspace model
> where memory is allocated during key initialization and then reused
> across subsequent decapsulation operations. The main motivation was
> reducing allocation frequency and minimizing both stack usage and
> repeated memory management during decapsulation.
>
> As a result, the implementation avoids allocations during
> decapsulation entirely by reusing preallocated workspaces associated
> with the key context. My original hypothesis was that moving memory
> allocation to key initialization, thereby eliminating allocations from
> the decapsulation path, could reduce allocation overhead during
> repeated decapsulation operations and be beneficial in environments
> where allocation activity is considered undesirable.

In my ML-KEM code, all the decapsulation memory is consolidated into
struct mlkem_decap_workspace. It would be straightforward to support
the caller providing a pre-allocated workspace. In the case of X-Wing,
we could also support pre-expanding the decapsulation key.

It just depends on what is actually going to be needed by the kernel
feature(s) that are going to use this. Which we don't really know yet.

We do know that it hasn't been found to be useful for the crypto
subsystem to provide pools for any other algorithm in the kernel, for a
variety of reasons. Usually callers can just allocate per-operation, or
they have some sort of object (inode, block device, socket, etc.) that's
a natural place for them to cache whatever they need anyway. In the
rare cases where some sort of pool is needed it's implemented in the
caller, optimized for the particular use case.

So I think there's a good chance your pool idea is going off on the
wrong track.

> Another difference is the integration level. My prototype explored
> direct integration through the KPP interface, whereas the patchset
> focuses on providing a reusable cryptographic library component within
> lib/crypto. These approaches address somewhat different layers of the
> kernel crypto stack.

We don't need crypto_kpp support, as it's much more complex and harder
to use than the crypto library
(https://docs.kernel.org/crypto/libcrypto.html). Also it seems it's not
really possible anyway, since crypto_kpp is an old design that works for
Diffie-Hellman but not KEMs.

- Eric
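
The caller-provided workspace option discussed in the last message can be illustrated with a small userspace sketch. This is a hypothetical shape only: `ws_decap_workspace`, `ws_decap()`, and the sizes are assumptions made up for the example and say nothing about the actual layout of struct mlkem_decap_workspace or the patchset's function signatures. The "decapsulation" body is a placeholder that merely exercises the scratch memory; it is not ML-KEM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names and sizes; not taken from the kernel patchset. */
#define WS_SCRATCH_BYTES 1024 /* placeholder, not a real ML-KEM figure */
#define WS_SS_BYTES      32   /* an ML-KEM shared secret is 32 bytes */

struct ws_decap_workspace {
	uint8_t scratch[WS_SCRATCH_BYTES];
};

/* Zeroize through a volatile pointer so the stores are not optimized away. */
static void ws_wipe(void *p, size_t n)
{
	volatile uint8_t *v = p;

	while (n--)
		*v++ = 0;
}

/* Placeholder computation standing in for decapsulation. NOT ML-KEM. */
static void ws_decap_core(struct ws_decap_workspace *ws, const uint8_t *ct,
			  size_t ct_len, uint8_t ss[WS_SS_BYTES])
{
	for (size_t i = 0; i < WS_SCRATCH_BYTES; i++)
		ws->scratch[i] = (uint8_t)(ct[i % ct_len] ^ (uint8_t)i);

	memset(ss, 0, WS_SS_BYTES);
	for (size_t i = 0; i < WS_SCRATCH_BYTES; i++)
		ss[i % WS_SS_BYTES] ^= ws->scratch[i];
}

/*
 * If caller_ws is non-NULL it is used as-is, so the call allocates nothing.
 * If it is NULL, a workspace is allocated for this one operation and freed.
 * Either way the scratch memory is wiped before returning.
 */
int ws_decap(struct ws_decap_workspace *caller_ws, const uint8_t *ct,
	     size_t ct_len, uint8_t ss[WS_SS_BYTES])
{
	struct ws_decap_workspace *ws = caller_ws;

	if (ct_len == 0)
		return -1;
	if (!ws) {
		ws = malloc(sizeof(*ws));
		if (!ws)
			return -1;
	}

	ws_decap_core(ws, ct, ct_len, ss);
	ws_wipe(ws, sizeof(*ws));

	if (!caller_ws)
		free(ws);
	return 0;
}
```

With this shape, a caller that already owns a long-lived object can keep one `ws_decap_workspace` inside it and pass it on every call, while occasional callers pass NULL and accept a per-operation allocation. No pool in the library is needed for either case.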