Linux Documentation


* Re: [PATCH RFC 10/12] KVM: guest_memfd: Clarify comment about gmem.file vs kvm->srcu
From: Sean Christopherson @ 2026-06-25 18:19 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-10-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> Clarify the existing comment about synchronize_srcu() and
> kvm_gmem_get_pfn() to provide further context. Explain which
> synchronize_srcu() call prevents races with kvm_gmem_get_pfn().
> 
> Also point the reader to the documentation for better understanding.
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
>  virt/kvm/guest_memfd.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 69c9d6d546b28..f2218db0af980 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -711,8 +711,13 @@ static void __kvm_gmem_unbind(struct kvm_memory_slot *slot, struct gmem_file *f)
>  	xa_store_range(&f->bindings, start, end - 1, NULL, GFP_KERNEL);
>  
>  	/*
> -	 * synchronize_srcu(&kvm->srcu) ensured that kvm_gmem_get_pfn()
> -	 * cannot see this memslot.
> +	 * This is called when memslots are updated, after the old
> +	 * memslot container is no longer in
> +	 * use. synchronize_srcu(&kvm->srcu) was called there, so
> +	 * kvm_gmem_get_pfn() from KVM's guest fault handling cannot
> +	 * see this memslot. See Documentation/virt/kvm/locking.rst
> +	 * for more information about kvm->srcu and the memslots
> +	 * container.

If we want to add to this comment, I would much rather do so as part of an update
to kvm_gmem_release()'s comment as well.

https://lore.kernel.org/all/20251113232229.1698886-1-seanjc@google.com

* Re: [PATCH RFC 08/12] Documentation: KVM: Add example for kvm->srcu in relation to mutex/lock
From: Sean Christopherson @ 2026-06-25 18:17 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-8-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> Add an example of where vcpu->mutex and kvm->slots_lock are held while calling
> synchronize_srcu(&kvm->srcu) to concretely show where the synchronization
> primitives overlap.

Sorry, but NAK.  This is too x86-centric, and IMO the risk of the documentation
becoming stale and confusing outweighs any benefits from providing an incomplete
example.  As with the kvm_usage_count stuff, I know the code in question,
and the example confused me and made it harder to understand the rule(s).

* Re: [PATCH RFC 03/12] Documentation: KVM: Consolidate notes about kvm->slots_lock and irq_lock
From: Sean Christopherson @ 2026-06-25 18:12 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-3-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> Move the detail about ordering between kvm->slots_lock and kvm->irq_lock to
> where the two locks are first mentioned.

Why?

* Re: [PATCH RFC 02/12] Documentation: KVM: Consolidate notes about cpu_read_lock() and kvm_lock
From: Sean Christopherson @ 2026-06-25 18:12 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-2-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> Move the detail about cpus_read_lock() and kvm_lock to where the acquisition
> order is mentioned.

Why?

* Re: [PATCH RFC 01/12] Documentation: KVM: Elaborate comment on kvm_usage_lock
From: Sean Christopherson @ 2026-06-25 18:12 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-1-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> The original comment talks about cpus_read_lock() and kvm_usage_count, but
> doesn't explain why they are related.
> 
> Elaborate comment on kvm_usage_lock to provide more context.
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
>  Documentation/virt/kvm/locking.rst | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> index 662231e958a07..5564c8b38b9cc 100644
> --- a/Documentation/virt/kvm/locking.rst
> +++ b/Documentation/virt/kvm/locking.rst
> @@ -248,8 +248,23 @@ time it will be set using the Dirty tracking mechanism described above.
>  :Arch:		any
>  :Protects:	- kvm_usage_count
>  		- hardware virtualization enable/disable
> -:Comment:	Exists to allow taking cpus_read_lock() while kvm_usage_count is
> -		protected, which simplifies the virtualization enabling logic.
> +:Comment:       ``kvm_usage_count`` serves to deduplicate hardware
> +    virtualization enabling and disabling requests from different VMs
> +    being created.

kvm_usage_count does that and more, i.e. this is "wrong" by being incomplete.

> +
> +    Hardware virtualization enabling/disabling requires taking
> +    ``cpus_read_lock()``.
> +
> +    ``kvm_lock`` used to also protect ``kvm_usage_count``, but other
> +    parts of the Linux kernel holding ``cpus_read_lock()`` need to
> +    call into KVM to ensure that VM state remains consistent with the
> +    host's state. For example, when the CPU frequency changes, KVM is
> +    notified. ``kvmclock_cpufreq_notifier()`` takes ``kvm_lock`` to
> +    iterate ``vm_list``.
> +
> +    To decouple these, use different locks, ``kvm_lock`` for
> +    ``vm_list`` and ``kvm_usage_lock`` for enabling/disabling hardware
> +    virtualization.

I appreciate the effort, but honestly I think this does more harm than good.  I
already know what this code does, and the above confused me more than anything.

>  
>  ``kvm->mn_invalidate_lock``
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> -- 
> 2.54.0.823.g6e5bcc1fc9-goog
> 

* Re: [PATCH 00/19] crypto: cmh - add CRI CryptoManager Hub driver
From: Eric Biggers @ 2026-06-25 18:05 UTC (permalink / raw)
  To: Saravanakrishnan Krishnamoorthy
  Cc: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Shuah Khan, Alexandre Ghiti,
	devicetree, Joel Wittenauer, linux-api, linux-crypto, linux-doc,
	linux-kernel, linux-kselftest, linux-riscv, Shuah Khan,
	sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

On Thu, Jun 25, 2026 at 10:33:08AM -0700, Saravanakrishnan Krishnamoorthy wrote:
> ** This message and any attachments are for the sole use of the
> intended recipient(s). It may contain information that is confidential
> and privileged. If you are not the intended recipient of this message,
> you are prohibited from printing, copying, forwarding or saving it.
> Please delete the message and attachments and notify the sender
> immediately. **

Okay, I deleted it.

- Eric

* Re: [PATCH v4 2/5] mm/zswap: Factor writeback loop out of shrink_worker()
From: Yosry Ahmed @ 2026-06-25 17:59 UTC (permalink / raw)
  To: Hao Jia
  Cc: akpm, tj, hannes, shakeel.butt, mhocko, mkoutny, nphamcs,
	chengming.zhou, muchun.song, roman.gushchin, linux-mm,
	linux-kernel, linux-doc, Hao Jia
In-Reply-To: <91297bc0-268c-e9c2-57ae-6066eee5df2f@gmail.com>

> >> static long zswap_shrink_one(struct mem_cgroup *memcg,
> >>                    struct zswap_shrink_state *s)
> >> {
> >>       long shrunk;
> >>
> >>       shrunk = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
> >>       if (shrunk == -ENOENT)
> >>           return 0;
> >>
> >>       s->attempts++;
> >>       if (shrunk <= 0 && ++s->failures == MAX_RECLAIM_RETRIES)
> >>           s->stop = true;
> >
> > Do we need 'stop' or can we just return a value here to indicate that
> > we should stop (e.g. -EBUSY)?
> >
>
> Perhaps we could return -EAGAIN instead of -EBUSY? This would align with
> the semantics of the memory.reclaim interface, which returns -EAGAIN
> when it reclaims fewer bytes than requested.

Hmm but -EAGAIN tells the caller to try again, while here -EAGAIN
tells the caller *not* to try again because we exhausted all retries?

> >
> > I think splitting the shrink/retry logic over 2 functions makes it
> > more difficult to follow, so yeah I think fold
> > zswap_shrink_no_candidate() into zswap_shrink_one(). Then the callers
> > only need to iterate memcgs (depending on the context) and call
> > zswap_shrink_one() for each of them.
>
> So, something like this?

Yeah, something like this :)

> /* Track progress of a memcg-tree writeback walk. */
> struct zswap_shrink_state {
>      int attempts;

While at it, I think "attempts" is really the number of scans, right?
Should we rename it? Maybe "scans" or similar?

>      int failures;
> };
>
> /*
>   * Take one step of a memcg-tree writeback walk driven by the caller's
>   * iterator, and fold the result into @s, the retry bookkeeping shared
>   * across steps. @memcg is the iterator's current memcg, or NULL once
>   * it has wrapped around after a full pass over the tree.
>   *
>   * The function returns -EAGAIN to signal the caller to abort the walk
>   * after encountering the following conditions MAX_RECLAIM_RETRIES times:
>   * - No writeback-candidate memcgs were found in a memcg tree walk.
>   * - Shrinking a writeback-candidate memcg failed.

Orthogonal to this patch, but I wonder if this can be simplified: could
these two conditions be replaced with "shrinking a memcg that has zswap
entries failed"? In the "no writeback-candidate memcgs in the tree" case,
it seems like we should abort right away instead of retrying.

Nhat, WDYT?

>   *
>   * Return: The number of compressed bytes written back (>= 0), or -EAGAIN
>   * once the retry budget is exhausted and the caller should abort the walk.
>   */
> static long zswap_shrink_one(struct mem_cgroup *memcg,

Nit: zswap_shrink_one_memcg()

BTW, the existing writeback logic has been broken for a while now when
memcg is disabled. I think we constantly hit the !memcg case and run
out of retries. Not sure if your patch changes this in any way, or if
you want to fix that while you're at it :)

>                   struct zswap_shrink_state *s)
> {
>      long shrunk;
>
>      /*
>       * If the iterator has completed a full pass, update the shrink state
>       * and check whether we should keep going.
>       */
>      if (!memcg) {
>          /*
>           * Continue shrinking without incrementing failures if we found
>           * candidate memcgs in the last tree walk.
>           */
>          if (!s->attempts && ++s->failures == MAX_RECLAIM_RETRIES)
>              return -EAGAIN;
>          s->attempts = 0;
>          return 0;
>      }
>
>      shrunk = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
>
>      /*
>       * There are no writeback-candidate pages in the memcg. This is not an
>       * issue as long as we can find another memcg with pages in zswap. Skip
>       * this without incrementing attempts and failures.
>       */
>      if (shrunk == -ENOENT)
>          return 0;
>      s->attempts++;
>
>      if (shrunk <= 0 && ++s->failures == MAX_RECLAIM_RETRIES)
>          return -EAGAIN;
>
>      return shrunk;
> }
>
> static void shrink_worker(struct work_struct *w)
> {
>      struct zswap_shrink_state s = {};
>      unsigned long thr;
>
>      /* Reclaim down to the accept threshold */
>      thr = zswap_accept_thr_pages();
>
>      while (zswap_total_pages() > thr) {
>          struct mem_cgroup *memcg;
>          long ret;
>
>          cond_resched();
>
>          memcg = zswap_iter_global();

Do we still need this helper? Or should we just keep the memcg
iteration open-coded?

>          ret = zswap_shrink_one(memcg, &s);
>          /* drop the extra reference taken by zswap_iter_global() */
>          mem_cgroup_put(memcg);
>          if (ret == -EAGAIN)
>              break;
>      }
> }
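
For anyone following along, the retry bookkeeping in the draft quoted above
can be modelled in userspace. This is only a sketch under stated
assumptions, not the kernel code: every sketch_* name is hypothetical,
SKETCH_MAX_RETRIES is an illustrative stand-in for MAX_RECLAIM_RETRIES,
sketch_shrink_memcg() just returns a canned value, and the two renames
suggested above ("scans", "_one_memcg") are applied for readability. It
keeps the draft's -EAGAIN return without taking a side on -EAGAIN vs.
-EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative retry budget; the kernel uses its own MAX_RECLAIM_RETRIES. */
#define SKETCH_MAX_RETRIES 3

/* Hypothetical stand-in for a memcg: carries the canned shrink result. */
struct sketch_memcg {
	long shrink_result;
};

/* Retry bookkeeping shared across steps of the walk. */
struct sketch_shrink_state {
	int scans;	/* memcgs actually scanned in the current pass */
	int failures;	/* failed steps, accumulated across passes */
};

/* Stand-in for shrink_memcg(): returns whatever the model memcg holds. */
static long sketch_shrink_memcg(const struct sketch_memcg *memcg)
{
	return memcg->shrink_result;
}

/* One step of the walk; memcg == NULL marks the end of a full pass. */
static long sketch_shrink_one_memcg(const struct sketch_memcg *memcg,
				    struct sketch_shrink_state *s)
{
	long shrunk;

	assert(s != NULL);

	if (!memcg) {
		/* A pass that scanned nothing counts as one failure. */
		if (!s->scans && ++s->failures == SKETCH_MAX_RETRIES)
			return -EAGAIN;
		s->scans = 0;
		return 0;
	}

	shrunk = sketch_shrink_memcg(memcg);

	/* No candidates in this memcg: neither a scan nor a failure. */
	if (shrunk == -ENOENT)
		return 0;
	s->scans++;

	if (shrunk <= 0 && ++s->failures == SKETCH_MAX_RETRIES)
		return -EAGAIN;

	return shrunk;
}

/*
 * Drive the step function round-robin over @memcgs, yielding NULL after
 * each full pass. Returns the number of steps taken before the walk was
 * aborted, capped at @max_steps.
 */
static int sketch_walk(const struct sketch_memcg *memcgs, size_t n,
		       int max_steps)
{
	struct sketch_shrink_state s = { 0 };
	size_t pos = 0;
	int steps = 0;

	while (steps < max_steps) {
		const struct sketch_memcg *memcg = NULL;

		if (pos < n)
			memcg = &memcgs[pos++];
		else
			pos = 0;

		steps++;
		if (sketch_shrink_one_memcg(memcg, &s) == -EAGAIN)
			break;
	}
	return steps;
}
```

In this model an iterator that only ever yields NULL (n == 0) burns one
failure per step and aborts after SKETCH_MAX_RETRIES steps, which is the
shape of the memcg-disabled concern raised above. Whether the real
iterator behaves that way is not something the sketch can confirm.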

* [PATCH] docs: pagemap: fix flags location, member name and sample code
From: Zenghui Yu @ 2026-06-25 17:44 UTC (permalink / raw)
  To: linux-mm, linux-doc, linux-kernel
  Cc: akpm, david, ljs, liam, vbabka, rppt, surenb, mhocko, corbet,
	skhan, Zenghui Yu

The userland visible page flags (KPF_*) were initially moved to
include/linux/kernel-page-flags.h in commit 1a9b5b7fe0c5 ("mm: export
stable page flags"), and later moved to
include/uapi/linux/kernel-page-flags.h in commit 607ca46e97a1 ("UAPI:
(Scripted) Disintegrate include/linux"). Update the doc to reflect the
current location of these flags.

The member @walk_end of struct pm_scan_arg {} was wrongly written as
"end_walk".

The first sample code of the PAGEMAP_SCAN ioctl wrongly used the
PM_SCAN_CHECK_WPASYNC flag twice, instead of the PM_SCAN_WP_MATCHING flag.
The second one missed PAGE_IS_FILE in the required mask.

Fix them all together.

Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
---
 Documentation/admin-guide/mm/pagemap.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index c57e61b5d8aa..12ac97d9d277 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -67,7 +67,7 @@ number of times a page is mapped.
  * ``/proc/kpageflags``.  This file contains a 64-bit set of flags for each
    page, indexed by PFN.
 
-   The flags are (from ``fs/proc/page.c``, above kpageflags_read):
+   The flags are (from ``include/uapi/linux/kernel-page-flags.h``):
 
     0. LOCKED
     1. ERROR
@@ -264,7 +264,7 @@ The ``struct pm_scan_arg`` is used as the argument of the IOCTL.
     provided or not.
  3. The range is specified through ``start`` and ``end``.
  4. The walk can abort before visiting the complete range such as the user buffer
-    can get full etc. The walk ending address is specified in``end_walk``.
+    can get full etc. The walk ending address is specified in ``walk_end``.
  5. The output buffer of ``struct page_region`` array and size is specified in
     ``vec`` and ``vec_len``.
  6. The optional maximum requested pages are specified in the ``max_pages``.
@@ -275,7 +275,7 @@ Find pages which have been written and WP them as well::
 
    struct pm_scan_arg arg = {
    .size = sizeof(arg),
-   .flags = PM_SCAN_CHECK_WPASYNC | PM_SCAN_CHECK_WPASYNC,
+   .flags = PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC,
    ..
    .category_mask = PAGE_IS_WRITTEN,
    .return_mask = PAGE_IS_WRITTEN,
@@ -288,7 +288,7 @@ present or huge::
    .size = sizeof(arg),
    .flags = 0,
    ..
-   .category_mask = PAGE_IS_WRITTEN | PAGE_IS_SWAPPED,
+   .category_mask = PAGE_IS_WRITTEN | PAGE_IS_FILE | PAGE_IS_SWAPPED,
    .category_inverted = PAGE_IS_SWAPPED,
    .category_anyof_mask = PAGE_IS_PRESENT | PAGE_IS_HUGE,
    .return_mask = PAGE_IS_WRITTEN | PAGE_IS_SWAPPED |
-- 
2.53.0


* [PATCH 02/19] crypto: cmh - add core platform driver
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Add the CRI CryptoManager Hub (CMH) hardware crypto accelerator
core platform driver.  This patch provides:

- Platform driver registration and probe/remove lifecycle
- Hardware configuration and core discovery
- Mailbox Queue Interface (MQI) for VCQ command submission
- Transaction manager with async completion and backlog support
- Result handler with threaded-IRQ completion
- DMA buffer management
- Debugfs instrumentation (when CONFIG_CRYPTO_DEV_CMH_DEBUG=y)
- Sysfs attributes (fw_version, hw_version, product, algorithms)
- Kconfig and Makefile integration

No crypto algorithms are registered yet -- those follow in
subsequent patches.

The driver communicates with the hardware via a mailbox-based VCQ
(Virtual Command Queue) interface.  Each crypto operation is packed
into VCQ command entries, submitted to a mailbox, and completed
asynchronously via interrupt.

MODULE_IMPORT_NS(CRYPTO_INTERNAL) imports two symbols:

  - crypto_cipher_setkey()      [crypto/cipher.c, EXPORT_SYMBOL_NS_GPL]
  - crypto_cipher_encrypt_one() [crypto/cipher.c, EXPORT_SYMBOL_NS_GPL]

These are the single-block cipher API used for software-fallback paths:
CCM empty-input tag computation (2 ECB encryptions + XOR) and XCBC(SM4)
empty-message workaround (3 ECB encryptions + XOR).  No public wrapper
exists; this is the same pattern used by in-tree crypto/ccm.c,
crypto/cmac.c, and crypto/xcbc.c.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 Documentation/ABI/testing/debugfs-driver-cmh  |  155 ++
 Documentation/ABI/testing/sysfs-driver-cmh    |   66 +
 Documentation/crypto/device_drivers/cmh.rst   |  356 +++
 Documentation/crypto/device_drivers/index.rst |    1 +
 drivers/crypto/Kconfig                        |    1 +
 drivers/crypto/Makefile                       |    1 +
 drivers/crypto/cmh/Kconfig                    |   46 +
 drivers/crypto/cmh/Makefile                   |   25 +
 drivers/crypto/cmh/cmh_config.c               |  476 ++++
 drivers/crypto/cmh/cmh_debugfs.c              |  286 +++
 drivers/crypto/cmh/cmh_dma.c                  |  373 ++++
 drivers/crypto/cmh/cmh_main.c                 |  365 +++
 drivers/crypto/cmh/cmh_mqi.c                  |  355 +++
 drivers/crypto/cmh/cmh_rh.c                   | 1145 ++++++++++
 drivers/crypto/cmh/cmh_sysfs.c                |  108 +
 drivers/crypto/cmh/cmh_txn.c                  | 1978 +++++++++++++++++
 drivers/crypto/cmh/include/cmh.h              |   27 +
 drivers/crypto/cmh/include/cmh_aes_abi.h      |   97 +
 drivers/crypto/cmh/include/cmh_ccp_abi.h      |  108 +
 drivers/crypto/cmh/include/cmh_config.h       |   91 +
 drivers/crypto/cmh/include/cmh_debugfs.h      |   90 +
 drivers/crypto/cmh/include/cmh_dma.h          |  219 ++
 drivers/crypto/cmh/include/cmh_drbg_abi.h     |   67 +
 drivers/crypto/cmh/include/cmh_eac_abi.h      |   44 +
 drivers/crypto/cmh/include/cmh_hc_abi.h       |  162 ++
 drivers/crypto/cmh/include/cmh_hcq_abi.h      |  221 ++
 drivers/crypto/cmh/include/cmh_kic_abi.h      |   77 +
 drivers/crypto/cmh/include/cmh_mqi.h          |   36 +
 drivers/crypto/cmh/include/cmh_pke_abi.h      |  272 +++
 drivers/crypto/cmh/include/cmh_qse_abi.h      |  181 ++
 drivers/crypto/cmh/include/cmh_registers.h    |  145 ++
 drivers/crypto/cmh/include/cmh_rh.h           |   93 +
 drivers/crypto/cmh/include/cmh_rng.h          |   31 +
 drivers/crypto/cmh/include/cmh_sm3_abi.h      |   79 +
 drivers/crypto/cmh/include/cmh_sm4_abi.h      |  101 +
 drivers/crypto/cmh/include/cmh_sys_abi.h      |  148 ++
 drivers/crypto/cmh/include/cmh_sysfs.h        |   14 +
 drivers/crypto/cmh/include/cmh_txn.h          |  463 ++++
 drivers/crypto/cmh/include/cmh_vcq.h          |  283 +++
 39 files changed, 8786 insertions(+)
 create mode 100644 Documentation/ABI/testing/debugfs-driver-cmh
 create mode 100644 Documentation/ABI/testing/sysfs-driver-cmh
 create mode 100644 Documentation/crypto/device_drivers/cmh.rst
 create mode 100644 drivers/crypto/cmh/Kconfig
 create mode 100644 drivers/crypto/cmh/Makefile
 create mode 100644 drivers/crypto/cmh/cmh_config.c
 create mode 100644 drivers/crypto/cmh/cmh_debugfs.c
 create mode 100644 drivers/crypto/cmh/cmh_dma.c
 create mode 100644 drivers/crypto/cmh/cmh_main.c
 create mode 100644 drivers/crypto/cmh/cmh_mqi.c
 create mode 100644 drivers/crypto/cmh/cmh_rh.c
 create mode 100644 drivers/crypto/cmh/cmh_sysfs.c
 create mode 100644 drivers/crypto/cmh/cmh_txn.c
 create mode 100644 drivers/crypto/cmh/include/cmh.h
 create mode 100644 drivers/crypto/cmh/include/cmh_aes_abi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_ccp_abi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_config.h
 create mode 100644 drivers/crypto/cmh/include/cmh_debugfs.h
 create mode 100644 drivers/crypto/cmh/include/cmh_dma.h
 create mode 100644 drivers/crypto/cmh/include/cmh_drbg_abi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_eac_abi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_hc_abi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_hcq_abi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_kic_abi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_mqi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_pke_abi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_qse_abi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_registers.h
 create mode 100644 drivers/crypto/cmh/include/cmh_rh.h
 create mode 100644 drivers/crypto/cmh/include/cmh_rng.h
 create mode 100644 drivers/crypto/cmh/include/cmh_sm3_abi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_sm4_abi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_sys_abi.h
 create mode 100644 drivers/crypto/cmh/include/cmh_sysfs.h
 create mode 100644 drivers/crypto/cmh/include/cmh_txn.h
 create mode 100644 drivers/crypto/cmh/include/cmh_vcq.h

diff --git a/Documentation/ABI/testing/debugfs-driver-cmh b/Documentation/ABI/testing/debugfs-driver-cmh
new file mode 100644
index 000000000000..3bbf903a4511
--- /dev/null
+++ b/Documentation/ABI/testing/debugfs-driver-cmh
@@ -0,0 +1,155 @@
+What:          /sys/kernel/debug/cmh/mbx<N>/vcqs_submitted
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RO) Total number of VCQ command entries submitted to
+               mailbox N since the driver was loaded.
+
+What:          /sys/kernel/debug/cmh/mbx<N>/vcqs_completed
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RO) Total number of VCQ command completions received
+               from mailbox N.
+
+What:          /sys/kernel/debug/cmh/mbx<N>/vcqs_errors
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RO) Total number of error completions received from
+               mailbox N.
+
+What:          /sys/kernel/debug/cmh/mbx<N>/queue_full_count
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RO) Number of times the transaction manager skipped
+               mailbox N because its in-flight queue was full.
+
+What:          /sys/kernel/debug/cmh/mbx<N>/max_queue_depth
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RO) High-water mark of in-flight transactions on
+               mailbox N.
+
+What:          /sys/kernel/debug/cmh/mbx<N>/inject_abort
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (WO) Write any value to inject an MBX_COMMAND_ABORT on
+               mailbox N.  The abort triggers error-IRQ handling that
+               completes all in-flight transactions with -EIO and then
+               issues MBX_COMMAND_RESTART to resume the mailbox.
+               Only available when CONFIG_CRYPTO_DEV_CMH_DEBUG is enabled.
+
+What:          /sys/kernel/debug/cmh/mbx<N>/force_drain
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (WO) Write any value to unconditionally FLUSH and drain
+               all pending transactions on mailbox N, completing each
+               with -ECANCELED, and reset all recovery bookkeeping
+               (including the wedged flag).  The mailbox is re-enabled
+               for new work immediately; no hardware health verification
+               is performed.  Use as a last-resort recovery when the eSW
+               is unresponsive and normal ABORT/RESTART escalation has
+               not recovered the mailbox.
+               Only available when CONFIG_CRYPTO_DEV_CMH_DEBUG is enabled.
+
+What:          /sys/kernel/debug/cmh/tm/cmq_posts
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RO) Total number of cmh_tm_post_command() calls (one
+               per crypto request submitted to the transaction manager).
+
+What:          /sys/kernel/debug/cmh/tm/cmq_depth_max
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RO) High-water mark of the command queue length.
+
+What:          /sys/kernel/debug/cmh/tm/cmq_eagain_count
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RO) Number of times the command queue was full and
+               returned -EAGAIN to the caller.
+
+What:          /sys/kernel/debug/cmh/tm/backoff_count
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RO) Number of times the transaction manager backed off
+               because all mailbox queues were full.
+
+What:          /sys/kernel/debug/cmh/tm/async_timeout_count
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RO) Number of async crypto requests that timed out
+               waiting for hardware completion.
+
+What:          /sys/kernel/debug/cmh/config/async_timeout_ms
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RW) Async request timeout in milliseconds.  On timeout
+               the driver issues MBX_COMMAND_ABORT; if the eSW is
+               unresponsive, the watchdog escalates through RESTART,
+               FLUSH, and force-drain to bound D-state duration.
+
+What:          /sys/kernel/debug/cmh/config/vcq_timeout_ms
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RW) VCQ command timeout in milliseconds.
+
+What:          /sys/kernel/debug/cmh/config/slow_op_timeout_ms
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RW) Slow-operation timeout in milliseconds.  Used for
+               operations known to take longer (e.g. RSA key generation,
+               PQC key generation).
+
+What:          /sys/kernel/debug/cmh/config/drain_timeout_ms
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RW) Drain timeout in milliseconds.  Maximum time to wait
+               for all in-flight transactions to complete during driver
+               removal or suspend.
+
+What:          /sys/kernel/debug/cmh/config/watchdog_ms
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RW) Result-handler watchdog interval in milliseconds.
+               Detects missed IRQs, stuck mailboxes, and abort-stall
+               conditions.  Clamped to a 10 ms minimum.
+
+What:          /sys/kernel/debug/cmh/config/drbg_timeout_ms
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               (RW) DRBG self-seed timeout in milliseconds.
diff --git a/Documentation/ABI/testing/sysfs-driver-cmh b/Documentation/ABI/testing/sysfs-driver-cmh
new file mode 100644
index 000000000000..62e593fac6fe
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-cmh
@@ -0,0 +1,66 @@
+What:          /sys/devices/platform/<dev>/fw_version
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               Reports the CryptoManager Hub embedded software (eSW) firmware
+               version as a 32-bit hexadecimal value read from the SIC
+               SW_VERSION register.
+
+               Example: "0x00010002"
+
+               Read-only.
+
+What:          /sys/devices/platform/<dev>/hw_version
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               Reports the CryptoManager Hub hardware version as a 32-bit
+               hexadecimal value read from the SIC HW_VERSION0 register.
+
+               Example: "0x00000000"
+
+               Read-only.
+
+What:          /sys/devices/platform/<dev>/boot_status
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               Reports the CryptoManager Hub boot status register as a 32-bit
+               hexadecimal value.  This reflects the firmware boot
+               progress and final state:
+
+                 0x00000066 - firmware booted (post-self-test)
+                 other      - firmware boot in progress or failed
+
+               Read-only.
+
+What:          /sys/devices/platform/<dev>/mbx_available
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               Reports the mailbox availability bitmap as a 32-bit
+               hexadecimal value read from the SIC MBX_AVAILABILITY
+               register.  Each set bit indicates a hardware mailbox
+               instance that the firmware has made available.
+
+               Example: "0x00000003" (mailboxes 0 and 1 available)
+
+               Read-only.
+
+What:          /sys/devices/platform/<dev>/mbx_count
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               Reports the number of mailboxes the driver has configured,
+               as a decimal integer.  This reflects the driver's active
+               configuration (from DT properties or module parameters),
+               which may be fewer than indicated by mbx_available.
+
+               Example: "2"
+
+               Read-only.
diff --git a/Documentation/crypto/device_drivers/cmh.rst b/Documentation/crypto/device_drivers/cmh.rst
new file mode 100644
index 000000000000..4319b9ff1ab1
--- /dev/null
+++ b/Documentation/crypto/device_drivers/cmh.rst
@@ -0,0 +1,356 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================
+CRI CryptoManager Hub (CMH) Driver
+====================================
+
+Overview
+========
+
+The ``cmh`` driver supports the CRI CryptoManager Hub hardware cryptographic
+accelerator.  The hardware is accessed through a mailbox-based VCQ
+(Virtual Command Queue) interface: the driver writes command sequences
+into per-mailbox DMA queue buffers and rings a doorbell register; the
+CryptoManager Hub embedded software (eSW) processes the commands and signals
+completion via a per-mailbox interrupt.
+
+The driver registers algorithms with the Linux kernel crypto subsystem
+and exposes a management character device (``/dev/cmh_mgmt``) for
+operations that have no standard crypto API binding.
+
+Hardware Interface
+==================
+
+The CryptoManager Hub is presented as a platform device matched via Device Tree
+(compatible ``"cri,cmh"``).  The driver maps a single MMIO region
+(the SIC -- System Interface Controller) whose sub-regions contain
+per-mailbox doorbell, status, and command queue registers.
+
+The driver manages a configurable number of mailboxes (default 2).
+Each mailbox has a configurable number of slots (default 32) and a
+configurable stride (default 128 bytes per slot).  The driver allocates
+DMA-coherent memory for each mailbox queue during probe.
+
+Interrupts are per-mailbox completion/error interrupts.  The driver
+registers a threaded IRQ handler for each configured mailbox.
+
+The eSW is loaded independently of this driver -- typically by the
+boot firmware or a platform-specific loader -- so the driver does not
+use ``request_firmware()``.  Instead it waits for the eSW to reach
+mission mode during probe, bounded by ``fw_ready_timeout_ms``.
+
+Supported Algorithms
+====================
+
+The driver registers the following algorithm families:
+
+Hash (ahash)
+  SHA-224, SHA-256, SHA-384, SHA-512, SHA3-224, SHA3-256, SHA3-384,
+  SHA3-512, SHAKE-128, SHAKE-256, cSHAKE-128, cSHAKE-256, KMAC-128,
+  KMAC-256, SM3 (10 hash + 2 cSHAKE + 2 KMAC + 1 SM3 = 15 algorithms)
+
+HMAC (ahash)
+  HMAC-SHA-224, HMAC-SHA-256, HMAC-SHA-384, HMAC-SHA-512,
+  HMAC-SHA3-224, HMAC-SHA3-256, HMAC-SHA3-384, HMAC-SHA3-512
+  (8 algorithms)
+
+Symmetric Ciphers (skcipher)
+  AES: ECB, CBC, CTR, CFB, XTS (5 algorithms)
+  SM4: ECB, CBC, CTR, CFB, XTS (5 algorithms)
+  ChaCha20 (1 algorithm)
+
+AEAD
+  AES-GCM, AES-CCM (2 algorithms)
+  SM4-GCM, SM4-CCM (2 algorithms)
+  ``rfc7539(chacha20,poly1305)``, ``rfc7539esp(chacha20,poly1305)``
+  (2 algorithms)
+
+MAC (ahash)
+  CMAC(AES) (1 algorithm)
+  CMAC(SM4), XCBC(SM4) (2 algorithms)
+  Poly1305 (1 algorithm)
+
+Public-Key, Key Agreement, and PQC Signatures
+  RSA (akcipher, 1 algorithm)
+  ECDSA P-256, P-384, P-521 (sig, 3 algorithms)
+  SM2 (sig, verify-only, 1 algorithm)
+  ECDH P-256, P-384, X25519 (kpp, 3 algorithms)
+  ML-DSA-44, ML-DSA-65, ML-DSA-87 (sig, 3 algorithms)
+  SLH-DSA: all 12 parameter sets (sig, 12 algorithms)
+  LMS, LMS-HSS (sig, verify-only, 2 algorithms)
+  XMSS, XMSS-MT (sig, verify-only, 2 algorithms)
+  (ML-KEM keygen/encaps/decaps are available via ``/dev/cmh_mgmt``
+  only -- see `Limitations`_.)
+
+Hardware RNG
+  DRBG-backed hwrng (``/dev/hwrng``, 1 algorithm)
+
+All algorithm driver names use the ``cri-cmh-`` prefix (e.g.
+``cri-cmh-sha256``, ``cri-cmh-ecb-aes``, ``cri-cmh-gcm-aes``,
+``cri-cmh-mldsa44``).  Names generally follow the kernel's hyphenated
+template name; families that have no kernel template (e.g. ML-DSA) use
+the concatenated upstream algorithm name (``mldsa44``).
+
+Most algorithms register at priority 300 (301 for AES-CCM).
+The ML-DSA ``sig`` algorithms register at priority 5001 to
+outrank the kernel's generic software ML-DSA (priority 5000, which is
+verify-only); the CMH driver provides full hardware sign and verify.
+
+Request model
+-------------
+
+All crypto API operations are asynchronous: the driver queues each
+request to its transaction-manager kthread and returns
+``-EINPROGRESS``, invoking the caller's completion callback when the
+hardware finishes.  Requests that set ``CRYPTO_TFM_REQ_MAY_BACKLOG``
+are queued on a backlog of up to ``backlog_max_depth`` entries when the
+command queue is full; without that flag a full queue is reported as
+``-EBUSY``.  Hardware or eSW failures surface as ``-EIO``, malformed
+requests as ``-EINVAL``, oversized requests as ``-EMSGSIZE`` or
+``-EINVAL`` (see `Data-Size Limits`_), and unresponsive hardware as
+``-ETIMEDOUT``.  The ``/dev/cmh_mgmt`` ioctls are, by contrast,
+synchronous -- each ioctl blocks until the hardware completes.
+
+Driver Architecture
+===================
+
+The driver is structured as follows:
+
+Platform Driver
+  Matches DT compatible ``"cri,cmh"``.  Probe initializes all
+  subsystems in order; remove tears them down in reverse.
+
+Configuration
+  Parses DT properties and module parameter overrides.  Validates
+  mailbox counts, slot sizes, and stride values.
+
+MQI (Mailbox Queue Interface)
+  Allocates DMA-coherent queue memory per mailbox.  Manages slot
+  allocation, VCQ command writing, and doorbell ringing.
+
+Transaction Manager
+  A dedicated kthread dequeues crypto requests from a central command
+  queue, builds VCQ command sequences, and submits them to mailbox
+  slots.  Completion is signaled via wait queues.
+
+Response Handler
+  Per-mailbox threaded IRQ handlers walk completed slots, parse
+  results, and fire request completions.  A configurable watchdog
+  timer (the ``watchdog_ms`` debugfs knob, default 200 ms) detects
+  stuck requests and escalates through ABORT, RESTART, and FLUSH
+  recovery.
+
+Key Management (``/dev/cmh_mgmt``)
+  A misc character device providing ioctl-based access to datastore
+  key CRUD, key derivation (KIC), PKE operations (EdDSA, SM2),
+  PQC operations (ML-KEM, ML-DSA, SLH-DSA),
+  EAC error register readback, and DRBG runtime configuration.
+  See ``Documentation/ABI/testing/cmh-mgmt`` for the full ioctl list.
+
+Power Management
+  The driver implements suspend/resume via ``DEFINE_SIMPLE_DEV_PM_OPS``.
+  On suspend, the transaction-manager kthread is stopped and pending
+  transactions are drained, waiting up to ``drain_timeout_ms``
+  (default 10000 ms); resume restarts the kthread.
+
+Module Parameters
+=================
+
+The driver defines four production module parameters and five
+debug-only parameters (compiled only with
+``CONFIG_CRYPTO_DEV_CMH_DEBUG``).  In production, all mailbox topology,
+per-core affinity, slot counts, strides, and timeout tuning are taken
+from Device Tree properties, not module parameters.  The debug-only
+parameters exist solely to force alternate geometries at ``insmod``
+time during bringup and validation (for example, to drive the
+mailbox-contention and cross-mailbox dispatch paths without
+rebuilding the Device Tree); they default to "use the DT value"
+and are not compiled into a production build.
+
+Production:
+
+``fw_ready_timeout_ms`` (uint, default 5000, RO)
+  Timeout in milliseconds to wait for CMH eSW to reach mission mode
+  during probe.
+
+``cmq_max_depth`` (uint, default 256, RO)
+  Maximum number of pending commands in the central Command Message
+  Queue.
+
+``backlog_max_depth`` (uint, default 1024, RO)
+  Maximum depth of the backlog queue for ``CRYPTO_TFM_REQ_MAY_BACKLOG``
+  requests.  Set to 0 to disable backlogs.
+
+``hwrng_quality`` (int, default 0, RO)
+  Quality value passed to ``hwrng_register()``.  0 disables kernel CRNG
+  seeding; 1-1024 sets the quality directly.
+
+Debug-only (``CONFIG_CRYPTO_DEV_CMH_DEBUG``):
+
+``mbx_count_override`` (uint, default 0, RO)
+  Override the DT mailbox count (0 = use DT) to force fewer
+  mailboxes than the hardware provides.
+
+``mbx_slots_override`` (uint, default 0, RO)
+  Override all MBX slots_log2 values (0 = use DT).
+
+``mbx_round_robin`` (bool, default false, RO)
+  Ignore DT ``cri,mbx`` affinity pins and round-robin all cores
+  across the configured mailboxes (0 = use DT affinity).  Restores
+  the unpinned dispatch that exercises cross-mailbox distribution.
+
+``drbg_config`` (charp, default "auto", RO)
+  DRBG configuration at probe: ``"auto"`` (normal) or ``"skip"``
+  (skip initial DRBG configuration).
+
+``skip_fw_check`` (bool, default false, RO)
+  Skip the SIC boot status and eSW mission-mode checks at probe.
+  Allows the module to load before the eSW has booted.
+
+Runtime-tunable timeout knobs are exposed via debugfs rather than
+module parameters; see `debugfs Counters`_ below.
+
+sysfs Attributes
+================
+
+The driver exposes five read-only attributes under the platform
+device sysfs directory: ``fw_version``, ``hw_version``,
+``boot_status``, ``mbx_available``, and ``mbx_count``.  See
+``Documentation/ABI/testing/sysfs-driver-cmh`` for the authoritative
+per-attribute description.
+
+debugfs Counters
+================
+
+When built with ``CONFIG_CRYPTO_DEV_CMH_DEBUG``, the driver creates
+``/sys/kernel/debug/cmh/`` with three groups: per-mailbox counters
+(``mbxN/``), transaction-manager statistics (``tm/``), and
+runtime-tunable timeout knobs (``config/``, including
+``drain_timeout_ms`` and ``watchdog_ms``).  See
+``Documentation/ABI/testing/debugfs-driver-cmh`` for the authoritative
+per-file description.
+
+Device Tree Binding
+===================
+
+See ``Documentation/devicetree/bindings/crypto/cri,cmh.yaml`` for the
+full DT binding schema and complete, schema-validated examples
+(including the per-mailbox topology properties ``cri,mbx-instances``,
+``cri,mbx-slots-log2``, and ``cri,mbx-strides-log2``).  When those
+properties are omitted the driver falls back to two mailboxes
+(instances 0 and 1) with the slot/stride defaults described above.
+
+User-Space Interfaces
+=====================
+
+``/dev/cmh_mgmt``
+  Management character device.  Opening it requires ``CAP_SYS_ADMIN``.
+  See ``Documentation/ABI/testing/cmh-mgmt`` for ioctl documentation.
+  The UAPI header is ``<linux/cmh_mgmt_ioctl.h>``.
+
+In-kernel crypto API
+  All algorithms register with the standard kernel crypto API and are
+  consumed by in-kernel users (dm-crypt, fscrypt, IPsec, kTLS, etc.).
+
+  Keys provisioned inside the hardware via ``/dev/cmh_mgmt`` are
+  referenced by an opaque hardware key identifier and are operated on
+  through the ``/dev/cmh_mgmt`` ioctl interface, without ever exposing
+  plaintext key material to user space.  See
+  ``Documentation/ABI/testing/cmh-mgmt`` for key provisioning.
+
+``/dev/hwrng``
+  The DRBG-backed hardware RNG is available as a standard hwrng device.
+
+Limitations
+===========
+
+- LMS and XMSS are verify-only (no sign/keygen in hardware for
+  stateful hash-based signatures).
+- SM2 sig registration is verify-only (sign via ``/dev/cmh_mgmt`` ioctl).
+- EdDSA (Ed25519/Ed448) is available only through ``/dev/cmh_mgmt``
+  ioctls; no kernel ``sig`` registration.
+- ML-KEM operations (encapsulate/decapsulate/keygen) are available only
+  through ``/dev/cmh_mgmt`` ioctls; no standard kernel crypto API
+  binding exists for KEM.
+
+Data-Size Limits
+================
+
+The driver imposes data-size limits on several APIs.  These are
+driver-level safety caps for kernel memory allocation unless noted
+otherwise.
+
+Symmetric / AEAD / MAC linearization caps:
+
+==============================  =======  =======================================
+Scope                           Limit    Origin
+==============================  =======  =======================================
+AES skcipher                    32 MiB   Driver-imposed DMA linearization cap
+SM4 skcipher                    32 MiB   Driver-imposed DMA linearization cap
+All AEAD + ChaCha20 skcipher    1 MiB    Driver-imposed DMA linearization cap
+==============================  =======  =======================================
+
+MAC and keyed-hash algorithms that buffer all input in kernel memory
+(hardware lacks context save/restore):
+
+====================  =======  =============================================
+Algorithm             Limit    Reason
+====================  =======  =============================================
+``cmac(aes)``         64 KiB   AES core has no external save/restore
+``cmac(sm4)``         64 KiB   SM4 core has no external save/restore
+``xcbc(sm4)``         64 KiB   SM4 core has no external save/restore
+``poly1305``          64 KiB   CCP core has no external save/restore
+``hmac(sha*)``        64 KiB   HMAC save/restore not supported (see below)
+``hmac(sha3-*)``      64 KiB   HMAC save/restore not supported (see below)
+``kmac128``           64 KiB   eSW rejects save when outlen != 0
+``kmac256``           64 KiB   eSW rejects save when outlen != 0
+====================  =======  =============================================
+
+HMAC save/restore is unsupported by the eSW.  For HMAC-SHA3,
+exposing the Keccak sponge state would allow key recovery because the
+sponge permutation is invertible; HMAC-SHA2 save/restore is likewise
+not exposed by the eSW.
+
+HMAC ``.export()``/``.import()`` (used for request cloning) is limited
+to a single-page accumulated-data window of 4092 bytes (one page minus
+a 4-byte length header), since the crypto subsystem pre-allocates the
+state buffer per request.  Cloning a request that has accumulated more
+input than this window fails.
+
+Requests exceeding the limit are rejected with ``-EINVAL``.  Pure hash
+algorithms (SHA-2, SHA-3, SHAKE, cSHAKE, SM3) have no data limit because
+the hardware supports incremental save/restore.
+
+cSHAKE uses save/restore for ``.export()``/``.import()`` but accumulates
+data in ``.update()`` by design (the Keccak sponge has no block-alignment
+boundary to trigger per-update HW submission, and HC_CMD_GATHER amortizes
+the cost into a single finalize-time submission).
+
+Asymmetric / PQC algorithm limits:
+
+==============================  =========  ====================================
+Scope                           Limit      Origin
+==============================  =========  ====================================
+RSA key size                    4096 bit   HW-imposed
+ML-DSA message                  10 KiB     eSW-imposed (QSE ABI)
+SLH-DSA message                 128 B      eSW-imposed (HCQ ABI)
+SLH-DSA context                 255 B      Spec-imposed (FIPS 205)
+LMS public key                  60 B       eSW-imposed (HCQ ABI)
+LMS message                     256 B      eSW-imposed (HCQ ABI)
+LMS signature                   13,364 B   eSW-imposed (HCQ ABI)
+XMSS public key                 136 B      eSW-imposed (HCQ ABI)
+XMSS message                    64 B       eSW-imposed (HCQ ABI)
+XMSS signature                  27,688 B   eSW-imposed (HCQ ABI)
+SM2 encrypt message             32 B       eSW KDF (single SM3 block)
+==============================  =========  ====================================
+
+Miscellaneous limits:
+
+==============================  =========  ====================================
+Scope                           Limit      Origin
+==============================  =========  ====================================
+cSHAKE/KMAC customization       256 B      VCQ slot layout constraint
+KIC HKDF key                    64 B       Partially eSW-derived
+KIC HKDF label                  56 B       VCQ slot layout constraint
+Key/blob mgmt ioctls            256 KiB    Driver-imposed sanity cap
+==============================  =========  ====================================
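
Editor's note on the geometry and size figures in cmh.rst: a small worked
sketch of the arithmetic behind two of the documented numbers. It assumes
that a mailbox queue occupies exactly slots * stride bytes (the document
gives the per-slot stride but does not state the total allocation size) and
that the page size is 4 KiB for the HMAC export window. All names
(demo_queue_bytes, demo_hmac_export_window, DEMO_*) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Documented defaults expressed as log2, matching the DT property style. */
#define DEMO_SLOTS_LOG2_DEFAULT  5u	/* 2^5 = 32 slots */
#define DEMO_STRIDE_LOG2_DEFAULT 7u	/* 2^7 = 128 bytes per slot */

/*
 * Bytes needed for one mailbox queue, ASSUMING the queue is a plain array
 * of (1 << slots_log2) slots, each (1 << stride_log2) bytes wide.
 */
static size_t demo_queue_bytes(unsigned int slots_log2,
			       unsigned int stride_log2)
{
	assert(slots_log2 + stride_log2 < sizeof(size_t) * 8);
	return (size_t)1 << (slots_log2 + stride_log2);
}

/*
 * Accumulated-data window for HMAC export/import: one page minus the
 * 4-byte length header described in the document.
 */
static size_t demo_hmac_export_window(size_t page_size)
{
	assert(page_size > sizeof(uint32_t));
	return page_size - sizeof(uint32_t);
}
```

Under these assumptions the default geometry (32 slots of 128 bytes) comes
to 4096 bytes per mailbox, and a 4096-byte page yields the 4092-byte window
quoted above.
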
diff --git a/Documentation/crypto/device_drivers/index.rst b/Documentation/crypto/device_drivers/index.rst
index c81d311ac61b..c0247fc97bf8 100644
--- a/Documentation/crypto/device_drivers/index.rst
+++ b/Documentation/crypto/device_drivers/index.rst
@@ -6,4 +6,5 @@ Hardware Device Driver Specific Documentation
 .. toctree::
    :maxdepth: 1

+   cmh
    octeontx2
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 216a00bad5d7..777c12496bdf 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -839,4 +839,5 @@ source "drivers/crypto/starfive/Kconfig"
 source "drivers/crypto/inside-secure/eip93/Kconfig"
 source "drivers/crypto/ti/Kconfig"

+source "drivers/crypto/cmh/Kconfig"
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 5a950c7abc39..7ad69348f0f0 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -47,3 +47,4 @@ obj-y += intel/
 obj-y += starfive/
 obj-y += cavium/
 obj-y += ti/
+obj-$(CONFIG_CRYPTO_DEV_CMH) += cmh/
diff --git a/drivers/crypto/cmh/Kconfig b/drivers/crypto/cmh/Kconfig
new file mode 100644
index 000000000000..fa5adeca2512
--- /dev/null
+++ b/drivers/crypto/cmh/Kconfig
@@ -0,0 +1,46 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# CRI CryptoManager Hub (CMH) hardware crypto accelerator
+#
+
+config CRYPTO_DEV_CMH
+       tristate "CRI CryptoManager Hub (CMH) hardware crypto accelerator"
+       depends on CRYPTO && OF && HAS_IOMEM && (64BIT || COMPILE_TEST)
+       select CRYPTO_HASH
+       select CRYPTO_SKCIPHER
+       select CRYPTO_AEAD
+       select CRYPTO_AKCIPHER
+       select CRYPTO_SIG
+       select CRYPTO_KPP
+       select CRYPTO_ECC
+       select CRYPTO_RSA
+       select CRYPTO_AES
+       select CRYPTO_CCM
+       select CRYPTO_SM4_GENERIC
+       select HW_RANDOM
+       help
+         Driver for the CRI CryptoManager Hub (CMH) hardware crypto accelerator.
+         Accesses the hardware via a mailbox-based VCQ (Virtual Command
+         Queue) interface and registers algorithms with the kernel
+         crypto subsystem.
+
+         Supported algorithm families: AES (ECB/CBC/CTR/XTS/CFB),
+         SM4 (ECB/CBC/CTR/XTS/CFB), ChaCha20-Poly1305, AES-GCM, AES-CCM,
+         SHA-2, SHA-3, SHAKE, CSHAKE, KMAC, SM3, HMAC, AES-CMAC,
+         SM4-CMAC, SM4-XCBC, RSA, ECDSA, ECDH, SM2, and DRBG (hwrng).
+         Ioctl-only algorithms: EdDSA, ML-KEM.
+
+         To compile this driver as a module, choose M here.
+
+config CRYPTO_DEV_CMH_DEBUG
+       bool "CMH debug instrumentation (debugfs counters)"
+       depends on CRYPTO_DEV_CMH && DEBUG_FS
+       help
+         Enable per-mailbox debugfs counters under
+         /sys/kernel/debug/cmh/ for the CMH driver.
+         Exposes VCQ submit/complete/error counts, queue depth
+         high-water marks, and transaction manager backoff statistics.
+
+         Useful for bringup, validation, and performance analysis.
+         Not recommended for production.
+
diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
new file mode 100644
index 000000000000..0a4591c9fd86
--- /dev/null
+++ b/drivers/crypto/cmh/Makefile
@@ -0,0 +1,25 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the CRI CryptoManager Hub (CMH) hardware crypto accelerator driver.
+#
+
+obj-$(CONFIG_CRYPTO_DEV_CMH) += cmh.o
+
+cmh-y := \
+       cmh_main.o \
+       cmh_config.o \
+       cmh_mqi.o \
+       cmh_txn.o \
+       cmh_rh.o \
+       cmh_dma.o \
+       cmh_sysfs.o
+
+ccflags-y += -I$(src)/include
+
+# Suppress -Woverride-init for the [0 ... N] = -1 range-initializer pattern
+# (standard kernel idiom for sparse lookup tables with a default value).
+CFLAGS_cmh_config.o += -Wno-override-init
+
+# Debug instrumentation: per-mailbox debugfs counters.
+# cmh_debugfs.o is linked into the composite cmh.o (same tristate).
+cmh-$(CONFIG_CRYPTO_DEV_CMH_DEBUG) += cmh_debugfs.o
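
Editor's note on the -Wno-override-init flag above: a minimal sketch of the
range-initializer idiom the Makefile comment refers to. The table is first
filled with a default of -1 and selected entries are then overridden, which
is what triggers -Woverride-init. The "[first ... last]" designator is a GNU
C extension (accepted by GCC and Clang). The enum values, core IDs, and
names below are hypothetical and are not taken from cmh_vcq.h.

```c
#include <assert.h>

/* Hypothetical core types and hardware core IDs, for illustration only. */
enum demo_core_type { DEMO_CORE_AES, DEMO_CORE_PKE, DEMO_NUM_CORE_TYPES };

#define DEMO_CORE_ID_AES 0x03
#define DEMO_CORE_ID_PKE 0x0a
#define DEMO_CORE_ID_NUM 0x10

/*
 * Sparse lookup table: every entry defaults to -1 ("not dispatchable"),
 * then the known IDs override that default.
 */
static const int demo_id_to_type[DEMO_CORE_ID_NUM] = {
	[0 ... DEMO_CORE_ID_NUM - 1] = -1,
	[DEMO_CORE_ID_AES] = DEMO_CORE_AES,
	[DEMO_CORE_ID_PKE] = DEMO_CORE_PKE,
};

/* Bounds-checked lookup; returns -1 for unknown or out-of-range IDs. */
static int demo_lookup(unsigned int core_id)
{
	if (core_id >= DEMO_CORE_ID_NUM)
		return -1;
	return demo_id_to_type[core_id];
}
```

The bounds check before indexing mirrors the order used in
cmh_config_populate_cores(), where core_id is compared against CORE_ID_NUM
before core_id_to_type[] is consulted.
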
diff --git a/drivers/crypto/cmh/cmh_config.c b/drivers/crypto/cmh/cmh_config.c
new file mode 100644
index 000000000000..4631eebb1556
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_config.c
@@ -0,0 +1,476 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Configuration from Device Tree
+ *
+ * The CMH device tree node provides:
+ *   - reg: SIC base + size (mandatory)
+ *   - interrupts: per-MBX IRQs (mandatory for IRQ mode)
+ *   - cri,mbx-instances: array of MBX instance IDs
+ *   - cri,mbx-slots-log2: per-MBX slot count as log2
+ *   - cri,mbx-strides-log2: per-MBX stride as log2
+ *
+ * Per-core-type child nodes (e.g. aes@3, pke@a):
+ *   - reg: hardware core ID (CORE_ID_* from cmh_vcq.h)
+ *   - cri,mbx: (optional) pin to a specific MBX index
+ *
+ * Module parameters (non-topology):
+ *   - fw_ready_timeout_ms: CMH eSW mission-mode boot timeout
+ *   (hwrng_quality, cmq_max_depth, backlog_max_depth live in other files)
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/platform_device.h>
+#include <linux/of.h>
+
+#include "cmh_config.h"
+#include "cmh_dma.h"
+
+/* -- Module parameters ------------------------------------------------- */
+
+static unsigned int fw_ready_timeout_ms = CMH_DEFAULT_FW_READY_TIMEOUT_MS;
+module_param(fw_ready_timeout_ms, uint, 0444);
+MODULE_PARM_DESC(fw_ready_timeout_ms,
+                "Timeout in ms to wait for CMH eSW mission mode (default 5000)");
+
+/*
+ * Debug-only MBX overrides for stress testing.
+ * When non-zero, these override the corresponding DT values, enabling
+ * contention stress tests to force a minimal MBX config
+ * (e.g. mbx_count_override=1 mbx_slots_override=1 for 1 MBX, 2 slots).
+ */
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+static unsigned int mbx_count_override;
+module_param(mbx_count_override, uint, 0444);
+MODULE_PARM_DESC(mbx_count_override,
+                "[debug] Override DT MBX count (0 = use DT, default: 0)");
+
+static unsigned int mbx_slots_override;
+module_param(mbx_slots_override, uint, 0444);
+MODULE_PARM_DESC(mbx_slots_override,
+                "[debug] Override all MBX slots_log2 (0 = use DT, default: 0)");
+
+static bool mbx_round_robin;
+module_param(mbx_round_robin, bool, 0444);
+MODULE_PARM_DESC(mbx_round_robin,
+                "[debug] Ignore DT cri,mbx pins and round-robin all cores across MBXes (0 = use DT affinity, default: 0)");
+#endif
+
+/* -- Core ID -> core_type lookup --------------------------------------- */
+
+/*
+ * Map hardware core IDs (from DT child "reg") to enum cmh_core_type.
+ *
+ * Entries set to -1 are not dispatchable crypto cores: system cores
+ * (SYS, DMA, KIC, TIC, MPU, EMC, EAC) and the DRBG singleton
+ * (handled separately in cmh_rng.c).
+ */
+static const int core_id_to_type[CORE_ID_NUM] = {
+       [0 ... CORE_ID_NUM - 1] = -1,
+       [CORE_ID_HC]  = CMH_CORE_HC,
+       [CORE_ID_AES] = CMH_CORE_AES,
+       [CORE_ID_SM4] = CMH_CORE_SM4,
+       [CORE_ID_SM3] = CMH_CORE_SM3,
+       [CORE_ID_CCP] = CMH_CORE_CCP,
+       [CORE_ID_PKE] = CMH_CORE_PKE,
+       [CORE_ID_QSE] = CMH_CORE_QSE,
+       [CORE_ID_HCQ] = CMH_CORE_HCQ,
+};
+
+/* Human-readable names for error messages */
+static const char * const core_type_names[CMH_NUM_CORE_TYPES] = {
+       [CMH_CORE_HC]  = "hc",
+       [CMH_CORE_AES] = "aes",
+       [CMH_CORE_SM4] = "sm4",
+       [CMH_CORE_SM3] = "sm3",
+       [CMH_CORE_CCP] = "ccp",
+       [CMH_CORE_PKE] = "pke",
+       [CMH_CORE_QSE] = "qse",
+       [CMH_CORE_HCQ] = "hcq",
+};
+
+/* -- DT child node enumeration ----------------------------------------- */
+
+static int cmh_config_populate_cores(struct cmh_config *cfg,
+                                    struct device_node *np)
+{
+       struct device_node *child;
+       u32 core_id, mbx_val;
+       int type, ret;
+
+       for_each_child_of_node(np, child) {
+               ret = of_property_read_u32(child, "reg", &core_id);
+               if (ret) {
+                       dev_warn(cmh_dev(),
+                                "DT child %pOFn: missing 'reg', skipping\n",
+                                child);
+                       continue;
+               }
+
+               if (core_id >= CORE_ID_NUM) {
+                       dev_info(cmh_dev(),
+                                "DT child %pOFn: core_id 0x%02x unknown, skipping\n",
+                                child, core_id);
+                       continue;
+               }
+
+               type = core_id_to_type[core_id];
+               if (type < 0) {
+                       /* Not a dispatchable crypto core (DRBG, SYS, etc.) */
+                       dev_dbg(cmh_dev(),
+                               "DT child %pOFn: core_id 0x%02x not dispatchable\n",
+                               child, core_id);
+                       continue;
+               }
+
+               if (cfg->core_types[type].num_instances >=
+                   CMH_MAX_CORE_INSTANCES) {
+                       dev_err(cmh_dev(),
+                               "DT: too many instances for %s (max %u)\n",
+                               core_type_names[type],
+                               CMH_MAX_CORE_INSTANCES);
+                       of_node_put(child);
+                       return -EINVAL;
+               }
+
+               {
+                       struct cmh_core_type_cfg *ct = &cfg->core_types[type];
+                       u32 idx = ct->num_instances;
+
+                       ct->core_ids[idx] = core_id;
+                       ret = of_property_read_u32(child, "cri,mbx", &mbx_val);
+                       ct->mbx[idx] = ret ? -1 : (s32)mbx_val;
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+                       /*
+                        * Debug knob for the cross-core stress test: drop the
+                        * DT MBX pin so cmh_core_select_instance() round-robins
+                        * this core across all configured MBXes (the unpinned
+                        * dispatch behaviour exercised before cri,mbx affinity
+                        * was added to the baseline device tree).
+                        */
+                       if (mbx_round_robin)
+                               ct->mbx[idx] = -1;
+#endif
+                       ct->num_instances++;
+               }
+       }
+
+       return 0;
+}
+
+/* -- Validation -------------------------------------------------------- */
+
+static int cmh_config_validate_core_types(struct cmh_config *cfg)
+{
+       unsigned int i, j, k;
+
+       for (i = 0; i < CMH_NUM_CORE_TYPES; i++) {
+               struct cmh_core_type_cfg *ct = &cfg->core_types[i];
+               const char *name = core_type_names[i];
+
+               /* Zero instances is valid -- core absent from DT */
+               if (ct->num_instances == 0)
+                       continue;
+
+               if (ct->num_instances > CMH_MAX_CORE_INSTANCES) {
+                       dev_err(cmh_dev(), "%s: num_instances %u > max %u\n",
+                               name, ct->num_instances,
+                               CMH_MAX_CORE_INSTANCES);
+                       return -EINVAL;
+               }
+
+               /* Validate MBX indices */
+               for (j = 0; j < ct->num_instances; j++) {
+                       if (ct->mbx[j] >= 0 &&
+                           (u32)ct->mbx[j] >= cfg->mbx_count) {
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+                               if (mbx_count_override > 0) {
+                                       dev_info(cmh_dev(),
+                                                "%s: mbx[%u]=%d >= overridden mbx_count %u, auto-assigning\n",
+                                                name, j, ct->mbx[j],
+                                                cfg->mbx_count);
+                                       ct->mbx[j] = -1;
+                                       continue;
+                               }
+#endif
+                               dev_err(cmh_dev(), "%s: mbx[%u]=%d >= mbx_count %u\n",
+                                       name, j, ct->mbx[j],
+                                       cfg->mbx_count);
+                               return -EINVAL;
+                       }
+               }
+
+               /* No duplicate core IDs within this type */
+               for (j = 1; j < ct->num_instances; j++) {
+                       for (k = 0; k < j; k++) {
+                               if (ct->core_ids[j] == ct->core_ids[k]) {
+                                       dev_err(cmh_dev(),
+                                               "%s: duplicate core_id 0x%02x at [%u] and [%u]\n",
+                                               name, ct->core_ids[j],
+                                               k, j);
+                                       return -EINVAL;
+                               }
+                       }
+               }
+
+               /* No duplicate MBX within this type (if explicit) */
+               for (j = 1; j < ct->num_instances; j++) {
+                       if (ct->mbx[j] < 0)
+                               continue;
+                       for (k = 0; k < j; k++) {
+                               if (ct->mbx[k] == ct->mbx[j]) {
+                                       dev_err(cmh_dev(),
+                                               "%s: duplicate mbx %d at [%u] and [%u]\n",
+                                               name, ct->mbx[j], k, j);
+                                       return -EINVAL;
+                               }
+                       }
+               }
+
+               /* All core IDs must fit in VCQ 8-bit field */
+               for (j = 0; j < ct->num_instances; j++) {
+                       if (ct->core_ids[j] > CORE_ID_MAX) {
+                               dev_err(cmh_dev(),
+                                       "%s: core_ids[%u]=0x%02x > CORE_ID_MAX\n",
+                                       name, j, ct->core_ids[j]);
+                               return -EINVAL;
+                       }
+               }
+       }
+
+       /* Cross-type: no core ID used by more than one type */
+       for (i = 0; i < CMH_NUM_CORE_TYPES; i++) {
+               struct cmh_core_type_cfg *ct_i = &cfg->core_types[i];
+
+               for (j = i + 1; j < CMH_NUM_CORE_TYPES; j++) {
+                       struct cmh_core_type_cfg *ct_j = &cfg->core_types[j];
+
+                       for (k = 0; k < ct_i->num_instances; k++) {
+                               unsigned int m;
+
+                               for (m = 0; m < ct_j->num_instances; m++) {
+                                       if (ct_i->core_ids[k] !=
+                                           ct_j->core_ids[m])
+                                               continue;
+                                       dev_err(cmh_dev(),
+                                               "core_id 0x%02x conflict: %s[%u] and %s[%u]\n",
+                                               ct_i->core_ids[k],
+                                               core_type_names[i], k,
+                                               core_type_names[j], m);
+                                       return -EINVAL;
+                               }
+                       }
+               }
+       }
+
+       return 0;
+}
+
+static int cmh_config_validate(struct cmh_config *cfg)
+{
+       unsigned int i, j;
+       unsigned long max_instance_end;
+
+       if (cfg->mbx_count == 0 || cfg->mbx_count > CMH_MAX_CONFIGURED_MBX) {
+               dev_err(cmh_dev(), "mbx_count %u out of range (1..%u)\n",
+                       cfg->mbx_count, CMH_MAX_CONFIGURED_MBX);
+               return -EINVAL;
+       }
+
+       for (i = 0; i < cfg->mbx_count; i++) {
+               struct cmh_mbx_config *m = &cfg->mailboxes[i];
+
+               if (m->instance >= CMH_MAX_MBX_INSTANCES) {
+                       dev_err(cmh_dev(), "mbx_instances[%u]=%u >= %u\n",
+                               i, m->instance, CMH_MAX_MBX_INSTANCES);
+                       return -EINVAL;
+               }
+
+               if (m->slots_log2 < CMH_MBX_SLOTS_LOG2_MIN ||
+                   m->slots_log2 > CMH_MBX_SLOTS_LOG2_MAX) {
+                       dev_err(cmh_dev(), "mbx_slots[%u]=%u out of range (%u..%u)\n",
+                               i, m->slots_log2,
+                               CMH_MBX_SLOTS_LOG2_MIN, CMH_MBX_SLOTS_LOG2_MAX);
+                       return -EINVAL;
+               }
+
+               if (m->stride_log2 < CMH_MBX_STRIDE_LOG2_MIN ||
+                   m->stride_log2 > CMH_MBX_STRIDE_LOG2_MAX) {
+                       dev_err(cmh_dev(), "mbx_strides[%u]=%u out of range (%u..%u)\n",
+                               i, m->stride_log2,
+                               CMH_MBX_STRIDE_LOG2_MIN, CMH_MBX_STRIDE_LOG2_MAX);
+                       return -EINVAL;
+               }
+
+               /* Check for duplicate instance indices */
+               for (j = 0; j < i; j++) {
+                       if (cfg->mailboxes[j].instance == m->instance) {
+                               dev_err(cmh_dev(), "duplicate instance %u at indices %u and %u\n",
+                                       m->instance, j, i);
+                               return -EINVAL;
+                       }
+               }
+       }
+
+       /* Ensure SIC region is large enough for all requested instances */
+       max_instance_end = 0;
+       for (i = 0; i < cfg->mbx_count; i++) {
+               unsigned long end = ((unsigned long)cfg->mailboxes[i].instance + 1)
+                                   << CMH_MBX_INSTANCE_SHIFT;
+               if (end > max_instance_end)
+                       max_instance_end = end;
+       }
+
+       if (max_instance_end > cfg->sic_size) {
+               dev_err(cmh_dev(), "sic_size 0x%zx too small for instance requiring 0x%lx\n",
+                       cfg->sic_size, max_instance_end);
+               return -EINVAL;
+       }
+
+       return 0;
+}
+
+/* -- Public Interface -------------------------------------------------- */
+
+/**
+ * cmh_config_init() - Initialize device configuration from platform/DT data
+ * @cfg: Configuration structure to populate
+ * @pdev: Platform device providing DT node and resources
+ *
+ * Parse the "cri,cmh" device tree node for MMIO base address, interrupt
+ * specifiers, and per-mailbox properties (instance indices, log2 slot
+ * counts, log2 strides).  Absent slot-count or stride properties fall back
+ * to the compile-time defaults.  Enumerate core types from the DT child
+ * nodes, then validate the complete configuration.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_config_init(struct cmh_config *cfg, struct platform_device *pdev)
+{
+       struct device_node *np = pdev->dev.of_node;
+       struct resource *res;
+       unsigned int i;
+       int ret, irq, nr;
+
+       if (!np) {
+               dev_err(&pdev->dev, "device tree node required\n");
+               return -ENODEV;
+       }
+
+       cfg->of_node = np;
+
+       /* SIC base + size from DT "reg" property (mandatory) */
+       res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+       if (!res) {
+               dev_err(cmh_dev(), "missing DT reg resource\n");
+               return -EINVAL;
+       }
+       cfg->sic_base = res->start;
+       cfg->sic_size = resource_size(res);
+
+       /*
+        * IRQ resolution order:
+        *   1. Platform-level IRQ from the first DT "interrupts" entry.
+        *   2. If absent (cfg->irq == -1), cmh_rh_resolve_irqs() tries
+        *      per-MBX of_irq_get() for per-mailbox interrupt routing.
+        *   3. If no IRQs are available at all, the response handler
+        *      falls back to watchdog-timer polling (200 ms default).
+        */
+       irq = platform_get_irq_optional(pdev, 0);
+       cfg->irq = (irq >= 0) ? irq : -1;
+
+       cfg->sic_mapped = NULL;
+       cfg->fw_ready_timeout_ms = fw_ready_timeout_ms;
+
+       /* -- MBX configuration from DT --------------------------------- */
+
+       nr = of_property_count_u32_elems(np, "cri,mbx-instances");
+       if (nr <= 0) {
+               dev_err(cmh_dev(), "missing or empty cri,mbx-instances in DT\n");
+               return -EINVAL;
+       }
+       if ((unsigned int)nr > CMH_MAX_CONFIGURED_MBX) {
+               dev_err(cmh_dev(), "too many MBX instances in DT (%d > %u)\n",
+                       nr, CMH_MAX_CONFIGURED_MBX);
+               return -EINVAL;
+       }
+       cfg->mbx_count = nr;
+
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+       if (mbx_count_override > 0) {
+               if (mbx_count_override > cfg->mbx_count) {
+                       dev_err(cmh_dev(),
+                               "mbx_count_override %u > DT count %u\n",
+                               mbx_count_override, cfg->mbx_count);
+                       return -EINVAL;
+               }
+               dev_info(cmh_dev(), "[debug] overriding mbx_count: %u -> %u\n",
+                        cfg->mbx_count, mbx_count_override);
+               cfg->mbx_count = mbx_count_override;
+       }
+#endif
+
+       for (i = 0; i < cfg->mbx_count; i++) {
+               struct cmh_mbx_config *m = &cfg->mailboxes[i];
+               u32 val;
+
+               ret = of_property_read_u32_index(np, "cri,mbx-instances",
+                                                i, &val);
+               if (ret) {
+                       dev_err(cmh_dev(), "missing cri,mbx-instances[%u] in DT\n",
+                               i);
+                       return ret;
+               }
+               m->instance = val;
+
+               ret = of_property_read_u32_index(np, "cri,mbx-slots-log2",
+                                                i, &val);
+               if (ret) {
+                       m->slots_log2 = CMH_DEFAULT_SLOTS_LOG2;
+                       dev_info(cmh_dev(),
+                                "MBX[%u]: cri,mbx-slots-log2 absent, using default %u\n",
+                                i, CMH_DEFAULT_SLOTS_LOG2);
+               } else {
+                       m->slots_log2 = val;
+               }
+
+               ret = of_property_read_u32_index(np, "cri,mbx-strides-log2",
+                                                i, &val);
+               if (ret) {
+                       m->stride_log2 = CMH_DEFAULT_STRIDE_LOG2;
+                       dev_info(cmh_dev(),
+                                "MBX[%u]: cri,mbx-strides-log2 absent, using default %u\n",
+                                i, CMH_DEFAULT_STRIDE_LOG2);
+               } else {
+                       m->stride_log2 = val;
+               }
+
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+               if (mbx_slots_override > 0) {
+                       m->slots_log2 = mbx_slots_override;
+                       if (i == 0)
+                               dev_info(cmh_dev(),
+                                        "[debug] overriding slots_log2 -> %u\n",
+                                        mbx_slots_override);
+               }
+#endif
+
+               m->queue_size = (1UL << m->slots_log2) << m->stride_log2;
+               m->dma_handle = 0;
+               m->virt_addr = NULL;
+               m->reg_base = NULL;
+       }
+
+       /* -- Core-type enumeration from DT child nodes ----------------- */
+
+       ret = cmh_config_populate_cores(cfg, np);
+       if (ret)
+               return ret;
+
+       ret = cmh_config_validate(cfg);
+       if (ret)
+               return ret;
+
+       return cmh_config_validate_core_types(cfg);
+}
diff --git a/drivers/crypto/cmh/cmh_debugfs.c b/drivers/crypto/cmh/cmh_debugfs.c
new file mode 100644
index 000000000000..bd7b083b9ef1
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_debugfs.c
@@ -0,0 +1,286 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- debugfs Per-MBX Counters and Fault Injection
+ *
+ * Creates the /sys/kernel/debug/cmh/ tree with:
+ *   mbxN/vcqs_submitted   (ro) Total VCQs sent to MBX N
+ *   mbxN/vcqs_completed   (ro) Total completions received
+ *   mbxN/vcqs_errors      (ro) Total error completions
+ *   mbxN/queue_full_count (ro) Times select_mailbox() skipped this MBX
+ *   mbxN/max_queue_depth  (ro) High-water mark of in-flight transactions
+ *   mbxN/inject_abort     (wo) Write any value to inject MBX_COMMAND_ABORT
+ *   mbxN/force_drain      (wo) Write any value to force-drain all pending txns
+ *   tm/cmq_posts          (ro) Total cmh_tm_post_command() calls
+ *   tm/cmq_depth_max      (ro) High-water mark of CMQ length
+ *   tm/cmq_eagain_count   (ro) Times CMQ was full (-EAGAIN)
+ *   tm/backoff_count      (ro) Times TM backed off (all MBX queues full)
+ *   tm/async_timeout_count (ro) Async requests that timed out
+ *
+ * This file is only compiled when CONFIG_CRYPTO_DEV_CMH_DEBUG=y (see Kbuild).
+ * Requires CONFIG_DEBUG_FS=y in the kernel (standard for dev builds).
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/slab.h>
+
+#include "cmh_debugfs.h"
+#include "cmh_config.h"
+#include "cmh_registers.h"
+#include "cmh_dma.h"
+#include "cmh_txn.h"
+#include "cmh_rh.h"
+#include "cmh_rng.h"
+
+/* -- Module State ---------------------------------------------------------- */
+
+static struct {
+       struct dentry           *root;          /* /sys/kernel/debug/cmh/ */
+       struct cmh_mbx_stats    *mbx;           /* array[mbx_count] */
+       struct cmh_tm_stats      tm;
+       struct cmh_config       *cfg;           /* for inject_abort register access */
+       u32                      mbx_count;
+} dbgfs;
+
+/* -- debugfs file ops for atomic64_t --------------------------------------- */
+
+static int cmh_dbgfs_u64_get(void *data, u64 *val)
+{
+       *val = (u64)atomic64_read((atomic64_t *)data);
+       return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(cmh_dbgfs_u64_ro_fops,
+                        cmh_dbgfs_u64_get, NULL, "%llu\n");
+
+/* -- Per-MBX directory ----------------------------------------------------- */
+
+/*
+ * inject_abort -- write-only debugfs file for fault injection.
+ *
+ * Writing any value triggers MBX_COMMAND_ABORT on this mailbox.
+ * The eSW calls mbx_abort() -> mbx_cmd_error(mbx, -EPIPE), fires the
+ * error IRQ, and the LKM RH completes in-flight transactions with -EIO
+ * then issues MBX_COMMAND_RESTART to resume the mailbox.
+ *
+ * Private data holds the MBX index itself (cast to void *), not a pointer.
+ */
+static ssize_t inject_abort_write(struct file *file,
+                                 const char __user *ubuf,
+                                 size_t count, loff_t *ppos)
+{
+       u32 idx = (u32)(unsigned long)file->private_data;
+       void __iomem *base;
+
+       if (!dbgfs.cfg || idx >= dbgfs.cfg->mbx_count)
+               return -EINVAL;
+
+       base = dbgfs.cfg->mailboxes[idx].reg_base;
+       dev_warn(cmh_dev(), "debugfs: injecting ABORT on mbx[%u]\n", idx);
+       cmh_reg_write32(MBX_COMMAND_ABORT, base, R_MBX_COMMAND);
+
+       return count;
+}
+
+static const struct file_operations inject_abort_fops = {
+       .owner = THIS_MODULE,
+       .open = simple_open,
+       .write = inject_abort_write,
+       .llseek = noop_llseek,
+};
+
+/*
+ * force_drain -- write-only debugfs file for administrative recovery.
+ *
+ * Writing any value issues MBX_COMMAND_FLUSH, drains all pending
+ * transactions on this mailbox (completing each with -ECANCELED),
+ * and resets all recovery bookkeeping (abort_stall_ticks,
+ * restart_pending, restart_retries, flush_count, wedged).
+ *
+ * Use this to recover D-state processes when the eSW is dead and
+ * normal ABORT/RESTART escalation has not recovered the mailbox.
+ */
+static ssize_t force_drain_write(struct file *file,
+                                const char __user *ubuf,
+                                size_t count, loff_t *ppos)
+{
+       u32 idx = (u32)(unsigned long)file->private_data;
+
+       if (!dbgfs.cfg || idx >= dbgfs.cfg->mbx_count)
+               return -EINVAL;
+
+       cmh_rh_force_drain_mbx(idx);
+       return count;
+}
+
+static const struct file_operations force_drain_fops = {
+       .owner = THIS_MODULE,
+       .open = simple_open,
+       .write = force_drain_write,
+       .llseek = noop_llseek,
+};
+
+static void create_mbx_dir(u32 idx, struct dentry *parent)
+{
+       struct cmh_mbx_stats *s = &dbgfs.mbx[idx];
+       struct dentry *d;
+       char name[16];
+
+       snprintf(name, sizeof(name), "mbx%u", idx);
+       d = debugfs_create_dir(name, parent);
+
+       debugfs_create_file("vcqs_submitted",   0444, d,
+                           &s->vcqs_submitted,   &cmh_dbgfs_u64_ro_fops);
+       debugfs_create_file("vcqs_completed",   0444, d,
+                           &s->vcqs_completed,   &cmh_dbgfs_u64_ro_fops);
+       debugfs_create_file("vcqs_errors",      0444, d,
+                           &s->vcqs_errors,      &cmh_dbgfs_u64_ro_fops);
+       debugfs_create_file("queue_full_count", 0444, d,
+                           &s->queue_full_count, &cmh_dbgfs_u64_ro_fops);
+       debugfs_create_file("max_queue_depth",  0444, d,
+                           &s->max_queue_depth,  &cmh_dbgfs_u64_ro_fops);
+       debugfs_create_file("inject_abort",     0200, d,
+                           (void *)(uintptr_t)idx, &inject_abort_fops);
+       debugfs_create_file("force_drain",      0200, d,
+                           (void *)(uintptr_t)idx, &force_drain_fops);
+}
+
+/* -- TM directory ---------------------------------------------------------- */
+
+static void create_tm_dir(struct dentry *parent)
+{
+       struct cmh_tm_stats *s = &dbgfs.tm;
+       struct dentry *d;
+
+       d = debugfs_create_dir("tm", parent);
+
+       debugfs_create_file("cmq_posts",        0444, d,
+                           &s->cmq_posts,        &cmh_dbgfs_u64_ro_fops);
+       debugfs_create_file("cmq_depth_max",    0444, d,
+                           &s->cmq_depth_max,    &cmh_dbgfs_u64_ro_fops);
+       debugfs_create_file("cmq_eagain_count", 0444, d,
+                           &s->cmq_eagain_count, &cmh_dbgfs_u64_ro_fops);
+       debugfs_create_file("backoff_count",    0444, d,
+                           &s->backoff_count,    &cmh_dbgfs_u64_ro_fops);
+       debugfs_create_file("async_timeout_count", 0444, d,
+                           &s->async_timeout_count, &cmh_dbgfs_u64_ro_fops);
+}
+
+/* -- Config directory: timeout tuning ---------------------------------- */
+
+static void create_config_dir(struct dentry *parent)
+{
+       struct dentry *d;
+
+       d = debugfs_create_dir("config", parent);
+
+       /* TM timeouts */
+       debugfs_create_u32("async_timeout_ms",   0644, d,
+                          cmh_tm_timeout_async_ptr());
+       debugfs_create_u32("vcq_timeout_ms",     0644, d,
+                          cmh_tm_timeout_vcq_ptr());
+       debugfs_create_u32("slow_op_timeout_ms", 0644, d,
+                          cmh_tm_timeout_slow_op_ptr());
+       debugfs_create_u32("drain_timeout_ms",   0644, d,
+                          cmh_tm_timeout_drain_ptr());
+
+       /* RH watchdog */
+       debugfs_create_u32("watchdog_ms",        0644, d,
+                          cmh_rh_timeout_watchdog_ptr());
+
+       /* DRBG timeout */
+       debugfs_create_u32("drbg_timeout_ms",    0644, d,
+                          cmh_rng_timeout_drbg_ptr());
+}
+
+/* -- Public Interface ------------------------------------------------------ */
+
+/**
+ * cmh_debugfs_init() - Create debugfs directory hierarchy for CMH
+ * @cfg: Platform configuration containing mailbox count and register bases.
+ *
+ * Allocates per-mailbox statistics and creates the /sys/kernel/debug/cmh/
+ * tree with per-mailbox counters, fault-injection files, transaction
+ * manager statistics, and timeout tunables.  debugfs is optional; failure
+ * to create entries does not prevent module initialisation.
+ *
+ * Return: Always 0; debugfs is best-effort and never fails module init.
+ */
+int cmh_debugfs_init(struct cmh_config *cfg)
+{
+       u32 mbx_count = cfg->mbx_count;
+       u32 i;
+
+       dbgfs.root = debugfs_create_dir("cmh", NULL);
+       if (IS_ERR_OR_NULL(dbgfs.root)) {
+               if (!IS_ERR(dbgfs.root))
+                       dev_warn(cmh_dev(), "debugfs: creation returned NULL -- counters disabled\n");
+               else
+                       dev_warn(cmh_dev(), "debugfs: creation failed (rc=%ld) -- counters disabled\n",
+                                PTR_ERR(dbgfs.root));
+               dbgfs.root = NULL;
+               return 0;  /* debugfs is optional -- never fail module init */
+       }
+
+       dbgfs.mbx_count = mbx_count;
+       dbgfs.cfg = cfg;
+       dbgfs.mbx = kcalloc(mbx_count, sizeof(*dbgfs.mbx), GFP_KERNEL);
+       if (!dbgfs.mbx) {
+               debugfs_remove_recursive(dbgfs.root);
+               dbgfs.root = NULL;
+               return 0;
+       }
+
+       for (i = 0; i < mbx_count; i++)
+               create_mbx_dir(i, dbgfs.root);
+
+       create_tm_dir(dbgfs.root);
+
+       create_config_dir(dbgfs.root);
+
+       dev_dbg(cmh_dev(), "debugfs: initialized (%u mailboxes)\n", mbx_count);
+       return 0;
+}
+
+/**
+ * cmh_debugfs_cleanup() - Remove all CMH debugfs entries
+ *
+ * Tears down the /sys/kernel/debug/cmh/ tree and frees per-mailbox
+ * statistics memory.  Safe to call even if cmh_debugfs_init() was never
+ * called or failed.
+ */
+void cmh_debugfs_cleanup(void)
+{
+       debugfs_remove_recursive(dbgfs.root);
+       dbgfs.root = NULL;
+       kfree(dbgfs.mbx);
+       dbgfs.mbx = NULL;
+       dev_dbg(cmh_dev(), "debugfs: cleaned up\n");
+}
+
+/**
+ * cmh_debugfs_mbx_stats() - Return per-mailbox statistics pointer
+ * @mbx_idx: Zero-based mailbox index.
+ *
+ * Return: Pointer to the statistics structure for @mbx_idx, or NULL if
+ *         debugfs is disabled or @mbx_idx is out of range.
+ */
+struct cmh_mbx_stats *cmh_debugfs_mbx_stats(u32 mbx_idx)
+{
+       if (!dbgfs.mbx || mbx_idx >= dbgfs.mbx_count)
+               return NULL;
+       return &dbgfs.mbx[mbx_idx];
+}
+
+/**
+ * cmh_debugfs_tm_stats() - Return transaction manager statistics pointer
+ *
+ * Return: Pointer to the singleton TM statistics structure.  The pointer
+ *         is always valid (points to static storage).
+ */
+struct cmh_tm_stats *cmh_debugfs_tm_stats(void)
+{
+       return &dbgfs.tm;
+}
diff --git a/drivers/crypto/cmh/cmh_dma.c b/drivers/crypto/cmh/cmh_dma.c
new file mode 100644
index 000000000000..36ea277420cf
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_dma.c
@@ -0,0 +1,373 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- DMA Operations
+ *
+ * Implements the cmh_dma.h interface using the kernel DMA API
+ * (dma_map_single, dma_alloc_coherent, etc.).
+ *
+ * Scatterlist linearization rationale
+ * ------------------------------------
+ * The eSW firmware supports SCATTERGATHER commands for all core
+ * types (AES_CMD_SCATTERGATHER, SM4_CMD_SCATTERGATHER,
+ * CCP_CMD_SCATTERGATHER, HC_CMD_GATHER), using a proprietary
+ * linked-list-item (LLI) descriptor chain format.  The hash driver
+ * already uses this via cmh_dma_build_sg() + HC_CMD_GATHER.
+ *
+ * For symmetric cipher and AEAD commands, the LKM currently
+ * linearizes scatterlist input into contiguous bounce buffers via
+ * scatterwalk_map_and_copy() rather than building LLI chains from
+ * kernel scatterlists.  This is a deliberate first-submission
+ * simplification with a concrete technical justification:
+ *
+ *   - The hash SG path is unidirectional (DMA_TO_DEVICE gather only).
+ *     Skcipher and AEAD require bidirectional handling: separate src
+ *     and dst scatterlists (which may alias for in-place operations),
+ *     plus AAD and authentication tag regions with distinct DMA
+ *     directions and alignment constraints.
+ *   - The CMH LLI format requires 64-byte aligned descriptor chain
+ *     pointers (the .lli field) with 32-bit length fields.  This
+ *     alignment is automatically satisfied by dma_alloc_coherent()
+ *     for the descriptor array; data buffer addresses have no
+ *     hardware alignment requirement.  Kernel SG entries have no
+ *     alignment guarantee for data, so direct SG-to-LLI translation
+ *     requires per-segment validation, potential splitting at
+ *     descriptor boundaries, and separate chains for src/dst/AAD --
+ *     substantially more complex than the unidirectional hash
+ *     gather case.
+ *   - Each skcipher/AEAD driver caps linearization at
+ *     CMH_AES_MAX_CRYPTLEN / CMH_SM4_MAX_CRYPTLEN (32 MiB).
+ *     Requests exceeding this cap are rejected with -EINVAL.
+ *     In practice, crypto API callers (dm-crypt, IPsec, kernel TLS)
+ *     send page-sized or smaller buffers, so the bounce allocation
+ *     is typically <= PAGE_SIZE and succeeds even under GFP_ATOMIC.
+ *
+ * A shared SG-to-LLI adapter handling bidirectional mappings,
+ * alignment splitting, and in-place src==dst detection for the
+ * skcipher/AEAD/MAC paths is planned as a follow-up series once the
+ * core driver is accepted.
+ *
+ * This linearization pattern is consistent with other upstream HW
+ * crypto drivers that use bounce buffers in their initial
+ * submissions (e.g. ccree, sa2ul, omap-aes).
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/dma-mapping.h>
+#include <linux/platform_device.h>
+#include <linux/overflow.h>
+#include <linux/string.h>
+
+#include "cmh_dma.h"
+
+/* Module-global device pointer, set in cmh_dma_init() */
+static struct device *cmh_device;
+
+/**
+ * cmh_dma_init() - Initialize the standard DMA backend
+ * @pdev: Platform device providing the struct device for DMA ops
+ *
+ * Stores the device pointer for use by all DMA wrapper functions.
+ *
+ * Return: 0 (always succeeds for the standard backend).
+ */
+int cmh_dma_init(struct platform_device *pdev)
+{
+       cmh_device = &pdev->dev;
+       return 0;
+}
+
+/**
+ * cmh_dma_cleanup() - Tear down the standard DMA backend
+ *
+ * Clears the stored device pointer.
+ */
+void cmh_dma_cleanup(void)
+{
+       cmh_device = NULL;
+}
+
+/**
+ * cmh_dev() - Return the platform device pointer
+ *
+ * Return: struct device pointer, or NULL outside probe/remove lifecycle.
+ */
+struct device *cmh_dev(void)
+{
+       return cmh_device;
+}
+
+/* -- Streaming DMA -------------------------------------------------------- */
+
+/**
+ * cmh_dma_map_single() - Map a kernel buffer for streaming DMA
+ * @buf:  Kernel virtual address
+ * @size: Buffer length in bytes
+ * @dir:  DMA direction
+ *
+ * Return: DMA address, or a DMA_MAPPING_ERROR value on failure.
+ */
+dma_addr_t cmh_dma_map_single(void *buf, size_t size,
+                             enum dma_data_direction dir)
+{
+       return dma_map_single(cmh_device, buf, size, dir);
+}
+
+/**
+ * cmh_dma_unmap_single() - Unmap a streaming DMA buffer
+ * @addr: DMA address returned by cmh_dma_map_single()
+ * @size: Buffer length in bytes
+ * @dir:  DMA direction (must match the map call)
+ */
+void cmh_dma_unmap_single(dma_addr_t addr, size_t size,
+                         enum dma_data_direction dir)
+{
+       dma_unmap_single(cmh_device, addr, size, dir);
+}
+
+/**
+ * cmh_dma_sync_for_cpu() - Sync a DMA buffer for CPU access
+ * @addr: DMA address of the mapped buffer
+ * @size: Region length in bytes
+ * @dir:  DMA direction
+ */
+void cmh_dma_sync_for_cpu(dma_addr_t addr, size_t size,
+                         enum dma_data_direction dir)
+{
+       dma_sync_single_for_cpu(cmh_device, addr, size, dir);
+}
+
+/**
+ * cmh_dma_sync_for_device() - Sync a DMA buffer for device access
+ * @addr: DMA address of the mapped buffer
+ * @size: Region length in bytes
+ * @dir:  DMA direction
+ */
+void cmh_dma_sync_for_device(dma_addr_t addr, size_t size,
+                            enum dma_data_direction dir)
+{
+       dma_sync_single_for_device(cmh_device, addr, size, dir);
+}
+
+/**
+ * cmh_dma_map_error() - Check whether a DMA mapping failed
+ * @addr: DMA address to check
+ *
+ * Return: Non-zero if @addr indicates a mapping error.
+ */
+int cmh_dma_map_error(dma_addr_t addr)
+{
+       return dma_mapping_error(cmh_device, addr);
+}
+
+/* -- Coherent DMA --------------------------------------------------------- */
+
+/**
+ * cmh_dma_alloc() - Allocate coherent DMA memory
+ * @size:   Allocation size in bytes
+ * @handle: Output DMA address
+ * @gfp:    GFP allocation flags
+ *
+ * Return: Kernel virtual address, or NULL on failure.
+ */
+void *cmh_dma_alloc(size_t size, dma_addr_t *handle, gfp_t gfp)
+{
+       return dma_alloc_coherent(cmh_device, size, handle, gfp);
+}
+
+/**
+ * cmh_dma_free() - Free coherent DMA memory
+ * @size:   Allocation size (must match cmh_dma_alloc)
+ * @virt:   Kernel virtual address
+ * @handle: DMA address
+ */
+void cmh_dma_free(size_t size, void *virt, dma_addr_t handle)
+{
+       dma_free_coherent(cmh_device, size, virt, handle);
+}
+
+/* -- Buffer write helpers ------------------------------------------------- */
+
+/**
+ * cmh_dma_write() - Copy data into a DMA buffer
+ * @dst: Destination (from cmh_dma_alloc)
+ * @src: Source kernel buffer
+ * @len: Byte count
+ */
+void cmh_dma_write(void *dst, const void *src, size_t len)
+{
+       memcpy(dst, src, len);
+}
+
+/**
+ * cmh_dma_fence() - No-op on standard DMA API platforms (coherent)
+ * @ptr: Unused -- present for interface compatibility
+ */
+void cmh_dma_fence(void *ptr)
+{
+       /* Standard DMA API: coherent memory, no cross-slave fence needed */
+}
+
+/**
+ * cmh_dma_zero() - Zero a DMA buffer
+ * @dst: Destination (from cmh_dma_alloc)
+ * @len: Byte count
+ */
+void cmh_dma_zero(void *dst, size_t len)
+{
+       memset(dst, 0, len);
+}
+
+/**
+ * cmh_dma_build_sg() - Build a scatter-gather DMA mapping
+ * @bufs: Array of buffer descriptors to map
+ * @count: Number of entries in @bufs
+ * @gfp: GFP flags for memory allocation
+ *
+ * Allocates a streaming-DMA descriptor array and maps each buffer in @bufs
+ * for DMA-to-device transfer, filling CMH eSW-format scatter-gather
+ * descriptors with linked-list pointers.
+ *
+ * The descriptor array uses streaming DMA (kmalloc + dma_map_single) rather
+ * than dma_alloc_coherent so that cmh_dma_free_sg() -- which calls
+ * dma_unmap_single + kfree -- is safe from any context including BH-disabled
+ * completion callbacks.
+ *
+ * Return: Pointer to the allocated cmh_sg_map on success, NULL on failure.
+ */
+struct cmh_sg_map *cmh_dma_build_sg(const struct cmh_dma_buf *bufs, u32 count,
+                                   gfp_t gfp)
+{
+       struct cmh_sg_map *sgm;
+       u32 i;
+
+       if (!count)
+               return NULL;
+
+       sgm = kzalloc(struct_size(sgm, bufs, count), gfp);
+       if (!sgm)
+               return NULL;
+
+       sgm->count = count;
+       sgm->items_size = array_size(count, sizeof(*sgm->items));
+       if (sgm->items_size == SIZE_MAX)
+               goto err_free_sgm;
+
+       /*
+        * Allocate descriptor array with kmalloc and map for streaming DMA.
+        * We map first to obtain items_dma (needed for .lli pointers),
+        * then sync-for-cpu, fill descriptors, and sync-for-device.
+        */
+       sgm->items = kzalloc(sgm->items_size, gfp);
+       if (!sgm->items)
+               goto err_free_sgm;
+
+       sgm->items_dma = cmh_dma_map_single(sgm->items, sgm->items_size,
+                                           DMA_TO_DEVICE);
+       if (cmh_dma_map_error(sgm->items_dma))
+               goto err_free_items;
+
+       /* Map each source buffer for device read */
+       for (i = 0; i < count; i++) {
+               dma_addr_t dma;
+
+               if (!bufs[i].len)
+                       goto err_unmap;
+               sgm->bufs[i].len = bufs[i].len;
+               dma = cmh_dma_map_single(bufs[i].data, bufs[i].len,
+                                        DMA_TO_DEVICE);
+               if (cmh_dma_map_error(dma))
+                       goto err_unmap;
+               sgm->bufs[i].dma = dma;
+       }
+
+       /*
+        * Reclaim CPU ownership of the descriptor buffer.  After
+        * dma_map_single the device owns the mapping; we must call
+        * sync_for_cpu before writing regardless of direction.  The
+        * direction matches the original mapping (DMA_TO_DEVICE), so
+        * the DMA layer can pick the cache maintenance it needs (often
+        * none for this direction).  The descriptor writes below are
+        * pushed out to the device by the later sync_for_device.
+        */
+       cmh_dma_sync_for_cpu(sgm->items_dma, sgm->items_size,
+                            DMA_TO_DEVICE);
+
+       /* Fill CMH eSW SG descriptors */
+       for (i = 0; i < count; i++) {
+               u64 lli_val;
+
+               if (i + 1 < count)
+                       lli_val = (u64)(sgm->items_dma +
+                               (i + 1) * sizeof(*sgm->items));
+               else
+                       lli_val = 0;
+
+               sgm->items[i].lli = lli_val;
+               sgm->items[i].src = (u64)sgm->bufs[i].dma;
+               sgm->items[i].dst = 0;
+               sgm->items[i].len = (u64)bufs[i].len;
+       }
+
+       /* Flush descriptor writes to device */
+       cmh_dma_sync_for_device(sgm->items_dma, sgm->items_size,
+                               DMA_TO_DEVICE);
+
+       return sgm;
+
+err_unmap:
+       while (i--)
+               cmh_dma_unmap_single(sgm->bufs[i].dma,
+                                    sgm->bufs[i].len, DMA_TO_DEVICE);
+       cmh_dma_unmap_single(sgm->items_dma, sgm->items_size,
+                            DMA_TO_DEVICE);
+err_free_items:
+       kfree(sgm->items);
+err_free_sgm:
+       kfree(sgm);
+       return NULL;
+}
+
+/**
+ * cmh_dma_free_sg() - Unmap and free a scatter-gather mapping
+ * @sgm: Scatter-gather mapping created by cmh_dma_build_sg(), or NULL
+ *
+ * Unmaps all DMA-mapped buffers, unmaps and frees the descriptor array,
+ * and releases the cmh_sg_map structure.  Safe to call from any context
+ * (including BH-disabled completion callbacks) because it uses only
+ * dma_unmap_single + kfree -- no vunmap/dma_free_coherent.
+ */
+void cmh_dma_free_sg(struct cmh_sg_map *sgm)
+{
+       u32 i;
+
+       if (!sgm)
+               return;
+
+       for (i = 0; i < sgm->count; i++)
+               cmh_dma_unmap_single(sgm->bufs[i].dma,
+                                    sgm->bufs[i].len, DMA_TO_DEVICE);
+
+       cmh_dma_unmap_single(sgm->items_dma, sgm->items_size,
+                            DMA_TO_DEVICE);
+       kfree(sgm->items);
+       kfree(sgm);
+}
+
+/**
+ * cmh_dma_orphan_free() - Orphan cleanup callback for abandoned DMA buffers
+ * @data: Pointer to a struct cmh_dma_orphan describing the orphaned mapping
+ *
+ * Called by the transaction manager when a synchronous operation times out
+ * and the caller has already returned.  Unmaps the DMA buffer and frees
+ * the backing memory and the orphan descriptor itself.
+ */
+void cmh_dma_orphan_free(void *data)
+{
+       struct cmh_dma_orphan *o = data;
+
+       cmh_dma_unmap_single(o->addr, o->len, o->dir);
+       kfree_sensitive(o->buf);
+       kfree(o);
+}
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
new file mode 100644
index 000000000000..452b8272908f
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -0,0 +1,365 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Platform Driver Entry and Exit
+ *
+ * Responsibilities:
+ *   - Match "cri,cmh" DT node via platform_driver
+ *   - Parse device-tree properties via cmh_config_init()
+ *   - ioremap the SIC region
+ *   - Verify CMH boot status (sanity check)
+ *   - Compute per-instance register bases
+ *   - Initialize MBX queues (MQI)
+ *   - Start Transaction Manager kthread
+ *   - Register Response Handler IRQ
+ *   - Register Kernel Crypto API hash algorithms
+ *   - Clean up in reverse order on exit or error
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/io.h>
+#include <linux/dma-mapping.h>
+#include <linux/platform_device.h>
+#include <linux/of.h>
+
+#include "cmh.h"
+#include "cmh_dma.h"
+#include "cmh_mqi.h"
+#include "cmh_txn.h"
+#include "cmh_rh.h"
+#include "cmh_registers.h"
+#include "cmh_debugfs.h"
+#include "cmh_sysfs.h"
+
+#include <linux/iopoll.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Alex Ousherovitch <aousherovitch@rambus.com>");
+MODULE_AUTHOR("Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>");
+MODULE_AUTHOR("Joel Wittenauer <Joel.Wittenauer@cryptography.com>");
+MODULE_DESCRIPTION("CRI CryptoManager Hub (CMH) hardware crypto accelerator");
+MODULE_ALIAS("platform:cmh");
+MODULE_IMPORT_NS("CRYPTO_INTERNAL");
+
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+static bool skip_fw_check;
+module_param(skip_fw_check, bool, 0444);
+MODULE_PARM_DESC(skip_fw_check,
+                "[debug] Skip eSW boot status check at probe (default: false)");
+#else
+#define skip_fw_check false
+#endif
+
+/* Global device state (single-instance module) */
+
+static struct cmh_device *g_cmh_dev;
+
+/* SIC Sanity Check */
+
+static int cmh_check_sic(struct cmh_config *cfg)
+{
+       u32 boot_status;
+       u32 hw_version;
+       u32 sw_boot;
+       int ret;
+
+       boot_status = cmh_reg_read32(cfg->sic_mapped, R_SIC_BOOT_STATUS);
+       hw_version  = cmh_reg_read32(cfg->sic_mapped, R_SIC_HW_VERSION0);
+
+       dev_info(cmh_dev(), "SIC boot_status=0x%08x hw_version=0x%08x\n",
+                boot_status, hw_version);
+
+       if ((boot_status & SIC_BOOT_STATUS_MASK) != SIC_BOOT_STATUS_PASS) {
+               dev_err(cmh_dev(), "SIC boot status check failed (0x%02x != 0x%02x)\n",
+                       boot_status & SIC_BOOT_STATUS_MASK, SIC_BOOT_STATUS_PASS);
+               return -EIO;
+       }
+
+       /* Wait for CMH eSW to reach mission mode */
+       ret = read_poll_timeout(ioread32, sw_boot,
+                               sw_boot & SIC_SW_BOOT_STATUS_MISSION,
+                               1000,
+                               (unsigned long)cfg->fw_ready_timeout_ms * 1000UL,
+                               false,
+                               cfg->sic_mapped + R_SIC_SW_BOOT_STATUS);
+       if (ret) {
+               sw_boot = cmh_reg_read32(cfg->sic_mapped, R_SIC_SW_BOOT_STATUS);
+               dev_err(cmh_dev(), "CMH eSW not ready (sw_boot_status=0x%08x, timeout=%ums)\n",
+                       sw_boot, cfg->fw_ready_timeout_ms);
+               return -ETIMEDOUT;
+       }
+
+       dev_info(cmh_dev(), "CMH eSW in mission mode (sw_boot_status=0x%08x)\n",
+                sw_boot);
+
+       return 0;
+}
+
+/* Module Init -- platform driver probe */
+
+static int cmh_probe(struct platform_device *pdev)
+{
+       struct cmh_device *dev;
+       struct cmh_config *cfg;
+       unsigned int i;
+       int ret;
+
+       /* Single-instance guard: reject if already probed */
+       if (g_cmh_dev)
+               return -EBUSY;
+
+       dev_info(&pdev->dev, "loading v%s\n", CMH_VERSION);
+
+       dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
+       if (!dev)
+               return -ENOMEM;
+
+       dev->dev = &pdev->dev;
+       cfg = &dev->config;
+
+       /* Declare DMA addressing capability */
+       ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+       if (ret) {
+               dev_err(&pdev->dev, "dma_set_mask_and_coherent failed (rc=%d)\n",
+                       ret);
+               goto err_free_dev;
+       }
+
+       /* Initialize DMA backend (standard API or FPGA pool) */
+       ret = cmh_dma_init(pdev);
+       if (ret) {
+               dev_err(&pdev->dev, "DMA init failed (rc=%d)\n", ret);
+               goto err_free_dev;
+       }
+
+       /* Step 1: Parse and validate configuration (DT + module params) */
+       ret = cmh_config_init(cfg, pdev);
+       if (ret)
+               goto err_dma_init;
+
+       dev_info(cmh_dev(), "sic_base=0x%llx size=0x%zx mbx_count=%u irq=%d\n",
+                (unsigned long long)cfg->sic_base, cfg->sic_size,
+                cfg->mbx_count, cfg->irq);
+
+       /* Step 2: ioremap the SIC region */
+       cfg->sic_mapped = devm_platform_ioremap_resource(pdev, 0);
+       if (IS_ERR(cfg->sic_mapped)) {
+               ret = PTR_ERR(cfg->sic_mapped);
+               cfg->sic_mapped = NULL;
+               dev_err(cmh_dev(), "ioremap failed for SIC region (rc=%d)\n",
+                       ret);
+               goto err_dma_init;
+       }
+
+       /* Step 3: Verify CMH is alive */
+       if (skip_fw_check) {
+               dev_info(cmh_dev(), "skipping eSW boot check (skip_fw_check=1)\n");
+       } else {
+               ret = cmh_check_sic(cfg);
+               if (ret)
+                       goto err_dma_init;
+       }
+
+       /* Step 4: Compute per-instance register bases */
+       for (i = 0; i < cfg->mbx_count; i++) {
+               struct cmh_mbx_config *m = &cfg->mailboxes[i];
+
+               m->reg_base = cmh_mbx_instance_base(cfg->sic_mapped,
+                                                   m->instance);
+
+               dev_dbg(cmh_dev(), "mbx[%u] instance=%u reg_base=%p\n",
+                       i, m->instance, m->reg_base);
+       }
+
+       (void)cmh_debugfs_init(cfg);
+
+       /* Initialise mailbox queue interface */
+       ret = cmh_mqi_init(cfg);
+       if (ret)
+               goto err_mqi_init;
+
+       /* Initialise transaction manager */
+       ret = cmh_tm_init(cfg);
+       if (ret)
+               goto err_tm_init;
+
+       /* Initialise response handler */
+       ret = cmh_rh_init(cfg);
+       if (ret)
+               goto err_rh_init;
+
+       g_cmh_dev = dev;
+       platform_set_drvdata(pdev, dev);
+
+       dev_info(cmh_dev(), "initialized successfully\n");
+       return 0;
+
+err_rh_init:
+       cmh_tm_cleanup();
+err_tm_init:
+       cmh_mqi_cleanup(cfg);
+err_mqi_init:
+       cmh_debugfs_cleanup();
+err_dma_init:
+       cmh_dma_cleanup();
+err_free_dev:
+       return ret;
+}
+
+/* Module Exit -- platform driver remove */
+
+static void cmh_remove(struct platform_device *pdev)
+{
+       struct cmh_device *dev = platform_get_drvdata(pdev);
+       struct cmh_config *cfg;
+
+       if (!dev)
+               return;
+
+       cfg = &dev->config;
+
+       cmh_rh_cleanup(cfg);
+       cmh_tm_cleanup();
+       cmh_mqi_cleanup(cfg);
+       cmh_debugfs_cleanup();
+       cmh_dma_cleanup();
+
+       dev_info(&pdev->dev, "unloaded successfully\n");
+
+       g_cmh_dev = NULL;
+}
+
+static const struct of_device_id cmh_of_match[] = {
+       { .compatible = "cri,cmh" },
+       { /* sentinel */ }
+};
+MODULE_DEVICE_TABLE(of, cmh_of_match);
+
+/*
+ * PM suspend/resume.
+ *
+ * Suspend: drain the TM first (while the RH is still active and can
+ * deliver completions for in-flight transactions), then quiesce the
+ * RH (cancel watchdog, mask HW interrupts).  This ordering ensures
+ * the drain_timeout_ms wait in cmh_tm_quiesce() can actually succeed
+ * -- if we suspended RH first, no completions would be delivered and
+ * the drain would always hit the force-cancel path.
+ *
+ * IRQ handlers remain registered (standard PM pattern: the kernel
+ * disables the IRQ lines during suspend, no need to free/re-request).
+ *
+ * Resume: re-check the SIC/SW boot status, re-synchronise the RH
+ * with hardware (head positions, interrupt masks, watchdog), then
+ * restart the TM kthread.
+ */
+
+static int cmh_suspend(struct device *dev)
+{
+       struct cmh_device *cmh = dev_get_drvdata(dev);
+
+       if (!cmh)
+               return 0;
+
+       dev_info(dev, "suspending\n");
+       cmh_tm_quiesce();
+       cmh_rh_suspend(&cmh->config);
+       return 0;
+}
+
+static int cmh_resume(struct device *dev)
+{
+       struct cmh_device *cmh = dev_get_drvdata(dev);
+       int ret;
+
+       if (!cmh)
+               return 0;
+
+       ret = cmh_check_sic(&cmh->config);
+       if (ret) {
+               dev_err(dev, "resume: CMH eSW health check failed (%d)\n",
+                       ret);
+               return ret;
+       }
+
+       /*
+        * cmh_rh_resume() is void: it only re-syncs MMIO head pointers,
+        * clears stale interrupt status bits (W1C), re-enables interrupt
+        * masks, and re-arms the watchdog timer -- none of which can fail
+        * after the SIC health check above has confirmed HW accessibility.
+        */
+       cmh_rh_resume(&cmh->config);
+
+       ret = cmh_tm_resume();
+       if (ret) {
+               dev_err(dev, "resume: TM restart failed (%d)\n", ret);
+               return ret;
+       }
+       dev_info(dev, "resumed successfully\n");
+       return 0;
+}
+
+static DEFINE_SIMPLE_DEV_PM_OPS(cmh_pm_ops,
+                               cmh_suspend,
+                               cmh_resume);
+
+/*
+ * Runtime PM is intentionally not implemented.  The CMH hardware does
+ * not expose HLOS-accessible clock gates or power domains -- the eSW
+ * firmware manages HW power state independently.  There is no mechanism
+ * for the kernel to idle, gate clocks, or power down the accelerator
+ * block from HLOS.  If a future platform variant exposes power control
+ * to HLOS (e.g. via a SCMI power domain), runtime PM support can be
+ * added at that time using RUNTIME_PM_OPS() and pm_runtime_get/put
+ * around VCQ submission paths.
+ *
+ * System sleep (suspend/resume) is supported via DEFINE_SIMPLE_DEV_PM_OPS
+ * above: suspend quiesces the TM and masks IRQs; resume re-verifies
+ * eSW health (SIC status) and restarts the TM thread.
+ */
+
+static struct platform_driver cmh_driver = {
+       .probe      = cmh_probe,
+       .remove     = cmh_remove,
+       .driver = {
+               .name           = CMH_DRV_NAME,
+               .of_match_table = cmh_of_match,
+               .dev_groups     = cmh_sysfs_groups,
+               .pm             = pm_sleep_ptr(&cmh_pm_ops),
+       },
+};
+
+static int __init cmh_init(void)
+{
+       int ret;
+
+       ret = platform_driver_register(&cmh_driver);
+       if (ret)
+               return ret;
+
+       /*
+        * platform_driver_register() does not propagate probe() errors.
+        * If a DT node matched but probe() failed (e.g. bad module params),
+        * g_cmh_dev will not have been set.  Detect this and unregister.
+        *
+        * This is intentional for a non-discoverable accelerator with no
+        * hotplug or deferred-probe scenarios -- the device is either
+        * present at boot or not.  Leaving the driver registered after a
+        * probe failure would silently produce a non-functional module.
+        */
+       if (!g_cmh_dev) {
+               platform_driver_unregister(&cmh_driver);
+               return -ENODEV;
+       }
+
+       return 0;
+}
+
+static void __exit cmh_exit(void)
+{
+       platform_driver_unregister(&cmh_driver);
+}
+
+module_init(cmh_init);
+module_exit(cmh_exit);
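
The error path in cmh_probe() above unwinds in reverse order of
initialisation through a chain of goto labels. A minimal userspace sketch
of that pattern follows. The stage tags, the trace buffer and the fail_at
failure-injection parameter are hypothetical and exist only for
illustration; none of them are part of the driver.

```c
#include <string.h>

/* Hypothetical trace buffer recording the order of init/undo calls. */
static char trace[64];

static void mark(const char *tag)
{
	strcat(trace, tag);
}

/* A stage fails when its index equals fail_at (failure injection). */
static int stage_init(int idx, int fail_at, const char *tag)
{
	if (idx == fail_at)
		return -1;
	mark(tag);
	return 0;
}

/* Three init stages; on failure, undo only what already succeeded. */
static int model_probe(int fail_at)
{
	int ret;

	trace[0] = '\0';

	ret = stage_init(0, fail_at, "A");
	if (ret)
		goto err_out;
	ret = stage_init(1, fail_at, "B");
	if (ret)
		goto err_undo_a;
	ret = stage_init(2, fail_at, "C");
	if (ret)
		goto err_undo_b;
	return 0;

err_undo_b:
	mark("b");	/* undo stage B */
err_undo_a:
	mark("a");	/* undo stage A */
err_out:
	return ret;
}
```

Calling model_probe(2) leaves "ABba" in trace: stages A and B ran, stage C
failed, and the undo steps ran newest first. Note that the sketch names
each label after what it undoes, whereas the driver names each label after
the stage that failed.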
diff --git a/drivers/crypto/cmh/cmh_mqi.c b/drivers/crypto/cmh/cmh_mqi.c
new file mode 100644
index 000000000000..9a135be58562
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_mqi.c
@@ -0,0 +1,355 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Mailbox Queue Initializer
+ *
+ * Responsibilities:
+ *   - Allocate queue buffers for each configured mailbox
+ *   - Execute the MBX lock/setup/enable register sequence
+ *   - Readback-verify all critical register writes
+ *   - Hold lock for MBX lifetime (CMH eSW requires it for host access)
+ *   - Clean up (flush + unlock + free) on exit or error
+ *
+ * Register sequence per instance (per CMH MBX hardware specification):
+ *   1. Read R_MBX_LOCK -> non-zero = ownership token acquired
+ *   2. W1C stale R_MBX_INTERRUPT bits (avoids spurious error cascade)
+ *   3. Set R_MBX_INTERRUPT_MASK = MBX_IRQ_MASK
+ *   4. Write QUEUE_LO/HI, SLOTS, STRIDE (queue address + geometry)
+ *   5. Sync TAIL = HEAD (CMH eSW owns HEAD; avoids stale-queue parse)
+ *   6. Readback verify QUEUE_LO/HI/SLOTS/STRIDE
+ *   7. Write HOST_INFO (signals CMH eSW "MBX configured")
+ *   8. Write COMMAND = MBX_COMMAND_RUN
+ *   9. Lock stays held -- released only in teardown
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/delay.h>
+#include <linux/jiffies.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+
+#include "cmh_mqi.h"
+#include "cmh_dma.h"
+#include "cmh_registers.h"
+#include "cmh_config.h"
+
+/* Flush polling: eSW clears R_MBX_COMMAND to 0 when flush completes */
+#define MBX_FLUSH_POLL_US      50
+#define MBX_FLUSH_TIMEOUT_US   1000000 /* 1 second */
+
+/* MBX Lock / Unlock */
+
+/*
+ * Attempt to acquire the MBX hardware lock.
+ * Returns the lock token (non-zero) on success, 0 on timeout.
+ */
+static u32 cmh_mbx_lock(void __iomem *reg_base, u32 instance)
+{
+       unsigned long deadline = jiffies + msecs_to_jiffies(MBX_LOCK_TIMEOUT_MS);
+       u32 lock;
+
+       while (time_before(jiffies, deadline)) {
+               lock = cmh_reg_read32(reg_base, R_MBX_LOCK);
+               if (lock) {
+                       dev_dbg(cmh_dev(), "mbx %u lock acquired (token=0x%08x)\n",
+                               instance, lock);
+                       return lock;
+               }
+               /* HW lock may be held by CMH eSW -- back off before retry */
+               usleep_range(MBX_LOCK_POLL_MIN_US, MBX_LOCK_POLL_MAX_US);
+       }
+
+       return 0;
+}
+
+/* Release the MBX lock: clear interrupt mask, write token back */
+static void cmh_mbx_unlock(void __iomem *reg_base, u32 lock_val)
+{
+       cmh_reg_write32(0, reg_base, R_MBX_INTERRUPT_MASK);
+       cmh_reg_write32(lock_val, reg_base, R_MBX_LOCK);
+}
+
+/* Register Readback Verification */
+
+static int cmh_verify_reg(void __iomem *base, u32 offset, u32 expected,
+                         const char *name, u32 instance)
+{
+       u32 actual = cmh_reg_read32(base, offset);
+
+       if (actual != expected) {
+               dev_err(cmh_dev(), "mbx %u %s readback mismatch: 0x%08x != 0x%08x\n",
+                       instance, name, actual, expected);
+               return -EIO;
+       }
+       return 0;
+}
+
+/* Clear any stale interrupt bits left from a prior module lifecycle. */
+static void cmh_mbx_clear_stale_irqs(void __iomem *base, u32 instance)
+{
+       u32 stale = cmh_reg_read32(base, R_MBX_INTERRUPT);
+
+       if (stale) {
+               cmh_reg_write32(stale, base, R_MBX_INTERRUPT);
+               dev_dbg(cmh_dev(), "mbx %u cleared stale irq bits=0x%x\n",
+                       instance, stale);
+       }
+}
+
+/* Read CMH eSW HEAD and set TAIL = HEAD so the queue appears empty. */
+static void cmh_mbx_sync_tail_to_head(void __iomem *base, u32 instance)
+{
+       u32 fw_head = cmh_reg_read32(base, R_MBX_QUEUE_HEAD);
+
+       cmh_reg_write32(fw_head, base, R_MBX_QUEUE_TAIL);
+       if (fw_head)
+               dev_dbg(cmh_dev(), "mbx %u synced tail=%u to fw head\n",
+                       instance, fw_head);
+}
+
+/* Per-Mailbox Setup */
+
+static int cmh_mbx_setup_one(struct cmh_mbx_config *mbx)
+{
+       void __iomem *base = mbx->reg_base;
+       u32 addr_lo = lower_32_bits(mbx->dma_handle);
+       u32 addr_hi = upper_32_bits(mbx->dma_handle);
+       u32 lock_val;
+       int ret;
+
+       /* Step 1: Acquire exclusive access */
+       lock_val = cmh_mbx_lock(base, mbx->instance);
+       if (!lock_val) {
+               dev_err(cmh_dev(), "mbx %u lock timeout after %u ms\n",
+                       mbx->instance, MBX_LOCK_TIMEOUT_MS);
+               return -ETIMEDOUT;
+       }
+
+       /*
+        * Step 2: Clear stale interrupt bits from a prior module lifecycle.
+        *
+        * After rmmod, the CMH eSW may leave ERROR_IRQ set in
+        * R_MBX_INTERRUPT even though STATUS is IDLE.  If we enable
+        * the mask first, the stale bits immediately trigger the
+        * CMH eSW interrupt chain, which can cascade into ERROR
+        * status before the first hash operation.  W1C-clear them first.
+        */
+       cmh_mbx_clear_stale_irqs(base, mbx->instance);
+
+       /* Step 3: Program interrupt mask (enable DONE/ERROR interrupts) */
+       cmh_reg_write32(MBX_IRQ_MASK, base, R_MBX_INTERRUPT_MASK);
+
+       /* Step 4a: Configure queue address (64-bit split) */
+       cmh_reg_write32(addr_lo, base, R_MBX_QUEUE_LO);
+       cmh_reg_write32(addr_hi, base, R_MBX_QUEUE_HI);
+
+       /* Step 4b: Configure queue geometry */
+       cmh_reg_write32(mbx->slots_log2, base, R_MBX_QUEUE_SLOTS);
+       cmh_reg_write32(mbx->stride_log2, base, R_MBX_QUEUE_STRIDE);
+
+       /*
+        * Step 5: Synchronise TAIL to CMH eSW's HEAD.
+        *
+        * R_MBX_QUEUE_HEAD is read-only from the host side -- only the
+        * CMH eSW can write it.  On a fresh boot HEAD is 0; after an
+        * rmmod/insmod cycle it retains the value from the previous
+        * session (e.g. 44).  A host write to HEAD (e.g. writing 0) is
+        * silently dropped by the MBX HW.
+        *
+        * If we set TAIL=0 while HEAD=44 the CMH eSW sees a non-empty
+        * queue (head != tail with wrap-around) and immediately tries
+        * to load a VCQ at the old head offset into our freshly-zeroed
+        * DMA buffer, causing an "Invalid VCQ" EFAULT -> ECHILD cascade.
+        *
+        * Fix: read HEAD and set TAIL = HEAD so the queue looks empty.
+        */
+       cmh_mbx_sync_tail_to_head(base, mbx->instance);
+
+       /*
+        * Step 6: Readback verify critical registers.
+        * HOST_INFO is deliberately deferred to after verification -- writing
+        * it tells the CMH eSW "MBX is ready" and the CMH eSW may inspect
+        * (and clear) the queue registers immediately.
+        */
+       ret = cmh_verify_reg(base, R_MBX_QUEUE_LO, addr_lo,
+                            "QUEUE_LO", mbx->instance);
+       if (ret)
+               goto err_unlock;
+
+       ret = cmh_verify_reg(base, R_MBX_QUEUE_HI, addr_hi,
+                            "QUEUE_HI", mbx->instance);
+       if (ret)
+               goto err_unlock;
+
+       ret = cmh_verify_reg(base, R_MBX_QUEUE_SLOTS, mbx->slots_log2,
+                            "QUEUE_SLOTS", mbx->instance);
+       if (ret)
+               goto err_unlock;
+
+       ret = cmh_verify_reg(base, R_MBX_QUEUE_STRIDE, mbx->stride_log2,
+                            "QUEUE_STRIDE", mbx->instance);
+       if (ret)
+               goto err_unlock;
+
+       /*
+        * Step 7: Host identification -- signals CMH eSW that MBX is configured.
+        * Must come after readback verification (CMH eSW may inspect the MBX
+        * immediately) and before COMMAND_RUN.
+        */
+       cmh_reg_write32(MBX_HOST_INFO_LKM, base, R_MBX_HOST_INFO);
+
+       /* Step 8: Enable -- start the mailbox */
+       cmh_reg_write32(MBX_COMMAND_RUN, base, R_MBX_COMMAND);
+
+       /* Log the post-RUN status; the lock remains held (see below) */
+       dev_dbg(cmh_dev(), "mbx %u setup: dma=0x%08x%08x slots=%u stride=%u status=0x%08x\n",
+               mbx->instance, addr_hi, addr_lo,
+               mbx->slots_log2, mbx->stride_log2,
+               cmh_reg_read32(base, R_MBX_STATUS));
+
+       /*
+        * Lock stays held for the lifetime of this MBX session.
+        *
+        * mbx->lock_val is the ownership token returned by R_MBX_LOCK at
+        * acquisition time.  The CMH eSW validates this token on every
+        * register access and requires it to be written back to release.
+        * It is NOT a transient mutex -- it persists until teardown.
+        */
+       mbx->lock_val = lock_val;
+
+       return 0;
+
+err_unlock:
+       cmh_mbx_unlock(base, lock_val);
+       return ret;
+}
+
+/* Per-Mailbox Teardown */
+
+static void cmh_mbx_teardown_one(struct cmh_mbx_config *mbx)
+{
+       void __iomem *base = mbx->reg_base;
+       u32 status;
+
+       if (!base || !mbx->lock_val)
+               return;
+
+       if (MBX_STATUS_CODE(cmh_reg_read32(base, R_MBX_STATUS)) !=
+           MBX_STATUS_OFFLINE) {
+               cmh_reg_write32(MBX_COMMAND_FLUSH, base, R_MBX_COMMAND);
+
+               /*
+                * Wait for the eSW to process the flush before releasing
+                * the DMA buffer.  The eSW clears R_MBX_COMMAND to zero
+                * upon completion; if it doesn't within 1 s, log a
+                * warning and proceed (best-effort teardown).
+                *
+                * DMA safety: by this point the RH and TM are already
+                * shut down (remove order: algos -> RH -> TM -> MQI),
+                * so no new transactions can be submitted and no
+                * completions are in flight.  The queue buffer is only
+                * read by the eSW during active command processing;
+                * after flush the eSW will not touch it again.
+                */
+               if (read_poll_timeout(cmh_reg_read32, status,
+                                     status == 0,
+                                     MBX_FLUSH_POLL_US,
+                                     MBX_FLUSH_TIMEOUT_US,
+                                     true, base, R_MBX_COMMAND))
+                       dev_warn(cmh_dev(),
+                                "mbx %u flush timeout during teardown (status=0x%08x)\n",
+                                mbx->instance,
+                                cmh_reg_read32(base, R_MBX_STATUS));
+       }
+
+       cmh_mbx_unlock(base, mbx->lock_val);
+       mbx->lock_val = 0;
+}
+
+/* Public Interface */
+
+/**
+ * cmh_mqi_init() - Initialize all mailbox queues
+ * @cfg: CMH configuration describing the mailboxes to set up
+ *
+ * Allocates DMA queue buffers for each configured mailbox, then executes
+ * the MBX lock/setup/enable register sequence.  On failure, all
+ * successfully initialized mailboxes are torn down and buffers freed.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mqi_init(struct cmh_config *cfg)
+{
+       unsigned int i, j;
+       int ret;
+
+       /* Allocate queue buffers */
+       for (i = 0; i < cfg->mbx_count; i++) {
+               struct cmh_mbx_config *m = &cfg->mailboxes[i];
+
+               m->virt_addr = cmh_dma_alloc(m->queue_size, &m->dma_handle,
+                                            GFP_KERNEL);
+               if (!m->virt_addr) {
+                       ret = -ENOMEM;
+                       goto err_free_bufs;
+               }
+
+               dev_dbg(cmh_dev(), "mqi[%u] alloc %zu bytes @ virt=%pK dma=%pad\n",
+                       i, m->queue_size, m->virt_addr, &m->dma_handle);
+       }
+
+       /* Lock/setup/enable each mailbox */
+       for (i = 0; i < cfg->mbx_count; i++) {
+               ret = cmh_mbx_setup_one(&cfg->mailboxes[i]);
+               if (ret) {
+                       dev_err(cmh_dev(), "mqi[%u] setup failed (rc=%d)\n",
+                               i, ret);
+                       goto err_teardown;
+               }
+       }
+
+       dev_info(cmh_dev(), "MQI init complete (%u mailboxes)\n", cfg->mbx_count);
+       return 0;
+
+err_teardown:
+       for (j = 0; j < i; j++)
+               cmh_mbx_teardown_one(&cfg->mailboxes[j]);
+err_free_bufs:
+       for (j = 0; j < cfg->mbx_count; j++) {
+               if (cfg->mailboxes[j].virt_addr)
+                       cmh_dma_free(cfg->mailboxes[j].queue_size,
+                                    cfg->mailboxes[j].virt_addr,
+                                    cfg->mailboxes[j].dma_handle);
+               cfg->mailboxes[j].virt_addr = NULL;
+               cfg->mailboxes[j].dma_handle = 0;
+       }
+       return ret;
+}
+
+/**
+ * cmh_mqi_cleanup() - Clean up all mailbox queues
+ * @cfg: CMH configuration describing the mailboxes to tear down
+ *
+ * Tears down each mailbox (flush + unlock) and frees the DMA queue
+ * buffers allocated by cmh_mqi_init().
+ */
+void cmh_mqi_cleanup(struct cmh_config *cfg)
+{
+       unsigned int i;
+
+       for (i = 0; i < cfg->mbx_count; i++) {
+               struct cmh_mbx_config *m = &cfg->mailboxes[i];
+
+               cmh_mbx_teardown_one(m);
+
+               if (m->virt_addr)
+                       cmh_dma_free(m->queue_size, m->virt_addr,
+                                    m->dma_handle);
+               m->virt_addr = NULL;
+               m->dma_handle = 0;
+       }
+
+       dev_info(cmh_dev(), "MQI cleanup complete\n");
+}
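
The Step 5 comment in cmh_mbx_setup_one() explains why TAIL is set to HEAD
rather than to 0. A minimal userspace sketch of the empty-queue rule behind
it follows. It assumes that HEAD and TAIL are slot indices wrapping modulo
2^slots_log2, which the comments imply but do not state. ring_pending() is
a hypothetical helper, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not driver code: number of slots a consumer
 * would still treat as pending, assuming HEAD and TAIL are slot
 * indices that wrap modulo 2^slots_log2.
 */
static uint32_t ring_pending(uint32_t head, uint32_t tail, uint32_t slots_log2)
{
	uint32_t mask;

	assert(slots_log2 < 32);	/* keep the shift below well defined */
	mask = (UINT32_C(1) << slots_log2) - 1;

	/* Unsigned subtraction wraps, so the mask yields the ring distance. */
	return (tail - head) & mask;
}
```

Under these assumptions, a 64-slot queue (slots_log2 = 6) with a stale
HEAD of 44 and TAIL reset to 0 reports 20 phantom entries, while
TAIL = HEAD reports 0. That is the condition the setup sequence
establishes before writing HOST_INFO.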
diff --git a/drivers/crypto/cmh/cmh_rh.c b/drivers/crypto/cmh/cmh_rh.c
new file mode 100644
index 000000000000..48cb51d24a5e
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_rh.c
@@ -0,0 +1,1145 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Response Handler
+ *
+ * IRQ-driven completion processing using request_threaded_irq():
+ *
+ *   Hardirq:  For each MBX, read R_MBX_INTERRUPT.  If any bit is set,
+ *             W1C-clear it and mark the MBX for threaded processing.
+ *             Return IRQ_WAKE_THREAD if any MBX had work.
+ *
+ *   Thread:   For each pending MBX, read R_MBX_QUEUE_HEAD.  Walk the
+ *             per-MBX transaction queue (oldest first): for every txn
+ *             whose last_vcq_id < new_head, check status, fire the
+ *             completion callback, and free the transaction object.
+ *
+ * The DT "cri,cmh" node declares one PLIC interrupt per mailbox,
+ * matching the real CMH ch_sys_interrupt_mbx[N-1:0] topology.
+ * Each MBX gets its own Linux virq; the same hardirq/thread pair
+ * is registered for all of them.  The handler still scans all
+ * mailboxes on every invocation -- this is intentional, as it
+ * provides robustness against coalesced or missed edges.
+ *
+ * IRQ source: resolved from the "cri,cmh" DT node at init time.
+ * The module's irq= parameter can override with a single shared IRQ.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/slab.h>
+#include <linux/atomic.h>
+#include <linux/of.h>
+#include <linux/of_irq.h>
+#include <linux/timer.h>
+#include <linux/jiffies.h>
+
+#include "cmh_rh.h"
+#include "cmh_txn.h"
+#include "cmh_registers.h"
+#include "cmh_config.h"
+#include "cmh_debugfs.h"
+#include "cmh_dma.h"
+
+/* Per-mailbox IRQ bookkeeping */
+struct cmh_rh_mbx {
+       u32        last_head;   /* last-observed MBX head position */
+       atomic_t   irq_bits;   /* interrupt bits saved by hardirq (atomic_or) */
+       bool       pending;     /* threaded handler should process this MBX */
+       bool       restart_pending; /* RESTART issued, awaiting eSW ack */
+       u32        restart_retries; /* watchdog ticks since RESTART issued */
+       u32        flush_count;     /* consecutive failed FLUSH escalations */
+       bool       wedged;          /* recovery failed, MBX offline */
+       u32        abort_stall_ticks; /* ticks since async timeout ABORT issued */
+};
+
+/* Module-level RH state */
+static struct {
+       struct cmh_config      *cfg;
+       int                     irqs[CMH_MAX_CONFIGURED_MBX]; /* per-MBX virqs */
+       u32                     nirqs;          /* number of registered IRQs */
+       struct cmh_rh_mbx      *mbx;            /* array[cfg->mbx_count] */
+       atomic_t                irq_count;      /* hardirq invocation counter */
+       bool                    active;
+} rh;
+
+/*
+ * Serialise the read-last_head / process_mbx / update-last_head
+ * sequence between the threaded IRQ handler (process context) and
+ * the watchdog timer (softirq context).  Without this, a timer
+ * softirq can preempt the kthread mid-sequence, causing both paths
+ * to process the same head advance and prematurely complete a
+ * subsequent transaction before the CMH eSW has written its DMA
+ * output -- leading to data corruption and SLAB freelist poisoning.
+ *
+ * The IRQ thread acquires the lock with spin_lock_bh() (disabling
+ * softirqs); the watchdog, already in softirq context, uses spin_lock().
+ */
+static DEFINE_SPINLOCK(rh_process_lock);
+
+/*
+ * Watchdog timer -- missed-IRQ recovery.
+ *
+ * Fires every watchdog_ms while rh.active.  Reads MBX head registers;
+ * if any head has advanced without an IRQ, processes completions and
+ * logs a notice.  Standard kernel pattern, analogous to NIC watchdog
+ * timers.
+ *
+ * Safe from timer/softirq context: cmh_reg_read32() is an MMIO read,
+ * cmh_tm_pop_transaction() uses spin_lock_irqsave(), and TM completion
+ * callbacks (crypto_request_complete et al.) are documented safe from
+ * any context including softirq.  rh_process_lock serialises the
+ * head-read / process / head-update sequence against the threaded
+ * IRQ handler to prevent double-processing of the same completion.
+ *
+ * Default 200 ms (5 fires/s) provides ~10 recovery attempts within
+ * the default vcq_timeout_ms (2 s).  Tune via debugfs config/watchdog_ms
+ * for platforms where interrupt delivery is more reliable (e.g. MSI on
+ * FPGA/silicon -- 500 ms--1 s may suffice as a safety net).
+ */
+#define CMH_RH_WATCHDOG_MS_DEFAULT  200
+
+/*
+ * Floor for watchdog_ms to prevent a zero/near-zero value from
+ * spinning the timer in a tight softirq loop.  Enforced at the
+ * point of use so debugfs writes are never rejected.
+ */
+#define CMH_RH_WATCHDOG_MS_MIN      10
+
+/*
+ * Maximum watchdog ticks to wait for the eSW to process RESTART
+ * before escalating to FLUSH.  At the default 200 ms interval,
+ * 5 retries = 1 s -- generous for an operation that should take
+ * microseconds.  If the eSW hasn't responded by then, issue
+ * MBX_COMMAND_FLUSH to hard-reset the mailbox state.
+ */
+#define CMH_RH_RESTART_MAX_RETRIES  5
+
+/*
+ * Maximum consecutive FLUSH escalations before marking the MBX as
+ * wedged.  Each FLUSH cycle takes RESTART_MAX_RETRIES watchdog ticks
+ * (~1 s at default interval).  Two failed FLUSHes (~2 s total)
+ * strongly indicate the eSW is not processing MBX commands at all.
+ */
+#define CMH_RH_FLUSH_MAX_FAILURES   2
+
+/*
+ * Time budget (ms) after an async timeout ABORT before escalating
+ * to FLUSH + force-drain.  Converted to watchdog ticks at runtime
+ * via abort_stall_ms / watchdog_ms, so the actual wall-clock bound
+ * stays constant regardless of watchdog_ms tuning.
+ *
+ * The stall detector fires when:
+ *   - The head-of-queue transaction is in TXN_TIMED_OUT state
+ *   - HEAD hasn't advanced (eSW didn't process the ABORT)
+ *   - abort_stall_ticks exceeds the derived threshold
+ *
+ * At that point we issue FLUSH + force-drain, completing all pending
+ * transactions with -ETIMEDOUT and waking any blocked waiters.
+ *
+ * Default 5000 ms bounds worst-case D-state to
+ * async_timeout (2 s) + abort_stall (5 s) = ~7 s.
+ */
+#define CMH_RH_ABORT_STALL_MS       5000
+
+static unsigned int watchdog_ms = CMH_RH_WATCHDOG_MS_DEFAULT;
+
+/*
+ * Re-poke R_MBX_QUEUE_TAIL to generate a fresh interrupt to the eSW.
+ * Writing the current value back is a queue no-op but guarantees a
+ * SIC interrupt edge, ensuring the eSW wakes from WFI.
+ */
+static void cmh_rh_poke_tail(void __iomem *base)
+{
+       u32 tail = cmh_reg_read32(base, R_MBX_QUEUE_TAIL);
+
+       cmh_reg_write32(tail, base, R_MBX_QUEUE_TAIL);
+}
+
+/*
+ * Drain all remaining in-flight transactions for a mailbox, completing
+ * each with the given error code.  Called after FLUSH (which discards
+ * all queued VCQs) or when marking a mailbox as wedged.  Updates
+ * last_head to the current hardware HEAD so subsequent polls don't
+ * re-process the same (now-dead) VCQ IDs as successful completions.
+ *
+ * Caller must hold rh_process_lock.
+ */
+static void cmh_rh_drain_mbx(u32 mbx_idx, int error)
+{
+       struct transaction_obj *txn;
+
+       while ((txn = cmh_tm_pop_transaction(mbx_idx)) != NULL) {
+               dev_dbg(cmh_dev(), "rh: mbx[%u] drain vcq=%u..%u err=%d\n",
+                       mbx_idx, txn->first_vcq_id,
+                       txn->last_vcq_id, error);
+               cmh_txn_finish(txn, error);
+               cmh_tm_txq_completion_notify();
+       }
+
+       rh.mbx[mbx_idx].last_head =
+               cmh_reg_read32(rh.cfg->mailboxes[mbx_idx].reg_base,
+                              R_MBX_QUEUE_HEAD);
+}
+
+/**
+ * cmh_rh_force_drain_mbx() - FLUSH + drain a mailbox from external context
+ * @mbx_idx: Mailbox index to drain
+ *
+ * Issues MBX_COMMAND_FLUSH to the eSW, drains all pending transactions
+ * (completing each with -ECANCELED), and resets all recovery bookkeeping
+ * including the wedged flag.  This is an administrative last-resort
+ * recovery path exposed via debugfs.
+ *
+ * Context: process context.  Acquires rh_process_lock internally.
+ */
+void cmh_rh_force_drain_mbx(u32 mbx_idx)
+{
+       void __iomem *base;
+
+       if (!rh.cfg || !rh.mbx || mbx_idx >= rh.cfg->mbx_count)
+               return;
+
+       base = rh.cfg->mailboxes[mbx_idx].reg_base;
+
+       dev_warn(cmh_dev(), "rh: force-drain mbx[%u] (debugfs)\n", mbx_idx);
+       spin_lock_bh(&rh_process_lock);
+       cmh_reg_write32(MBX_IRQ_MASK, base, R_MBX_INTERRUPT);
+       cmh_reg_write32(MBX_COMMAND_FLUSH, base, R_MBX_COMMAND);
+       cmh_rh_poke_tail(base);
+       cmh_rh_drain_mbx(mbx_idx, -ECANCELED);
+       rh.mbx[mbx_idx].abort_stall_ticks = 0;
+       WRITE_ONCE(rh.mbx[mbx_idx].restart_pending, false);
+       rh.mbx[mbx_idx].restart_retries = 0;
+       rh.mbx[mbx_idx].flush_count = 0;
+       WRITE_ONCE(rh.mbx[mbx_idx].wedged, false);
+       spin_unlock_bh(&rh_process_lock);
+}
+
+/**
+ * cmh_rh_mbx_is_wedged() - Check if a mailbox is permanently wedged
+ * @mbx_idx: Mailbox index to check
+ *
+ * Return: true if the mailbox has failed recovery and is offline.
+ */
+bool cmh_rh_mbx_is_wedged(u32 mbx_idx)
+{
+       if (!rh.mbx || !rh.cfg || mbx_idx >= rh.cfg->mbx_count)
+               return false;
+
+       return READ_ONCE(rh.mbx[mbx_idx].wedged);
+}
+
+/**
+ * cmh_rh_abort_mbx() - Issue MBX_COMMAND_ABORT under rh_process_lock
+ * @mbx_idx: Mailbox index to abort
+ *
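+ * Serialises the ABORT write with RESTART/FLUSH commands issued by the
+ * watchdog, preventing command-register clobber races.
+ *
+ * Context: process or softirq context.  Takes rh_process_lock with
+ * spin_lock_bh(), so it must not be called from hardirq context or
+ * with interrupts disabled.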
+ */
+void cmh_rh_abort_mbx(u32 mbx_idx)
+{
+       void __iomem *base;
+
+       if (!rh.cfg || !rh.mbx || mbx_idx >= rh.cfg->mbx_count)
+               return;
+
+       base = rh.cfg->mailboxes[mbx_idx].reg_base;
+
+       spin_lock_bh(&rh_process_lock);
+       cmh_reg_write32(MBX_COMMAND_ABORT, base, R_MBX_COMMAND);
+       spin_unlock_bh(&rh_process_lock);
+}
+
+static struct timer_list rh_watchdog;
+
+/*
+ * Hardirq handler -- runs with interrupts disabled.
+ *
+ * Read and W1C-clear R_MBX_INTERRUPT for each mailbox.
+ * If any MBX had a pending interrupt, return IRQ_WAKE_THREAD.
+ * Shared-IRQ safe: returns IRQ_NONE if we didn't handle anything.
+ */
+static irqreturn_t cmh_rh_hardirq(int irq, void *data)
+{
+       struct cmh_config *cfg = data;
+       bool handled = false;
+       u32 i;
+
+       for (i = 0; i < cfg->mbx_count; i++) {
+               void __iomem *base = cfg->mailboxes[i].reg_base;
+               u32 bits;
+
+               bits = cmh_reg_read32(base, R_MBX_INTERRUPT);
+               if (!bits)
+                       continue;
+
+               /* W1C: write back the set bits to clear them */
+               cmh_reg_write32(bits, base, R_MBX_INTERRUPT);
+
+               /*
+                * Accumulate bits atomically so a second hardirq
+                * firing while the threaded handler runs does not
+                * overwrite the first set of bits.
+                */
+               atomic_or((int)bits, &rh.mbx[i].irq_bits);
+               WRITE_ONCE(rh.mbx[i].pending, true);
+               handled = true;
+       }
+
+       /*
+        * Ordering: the kernel IRQ threading infrastructure
+        * performs a full barrier between hardirq return and
+        * the threaded handler invocation.
+        */
+       if (handled)
+               atomic_inc(&rh.irq_count);
+
+       return handled ? IRQ_WAKE_THREAD : IRQ_NONE;
+}
+
+/*
+ * Process completions for a single mailbox.
+ *
+ * Walk the per-MBX transaction queue FIFO.  For each transaction
+ * whose last_vcq_id is strictly less than the new head, fire the
+ * completion callback and free the object.
+ *
+ * "Strictly less than" using signed (s32) arithmetic handles wrap-around:
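+ * the CMH eSW uses monotonically increasing 32-bit VCQ IDs.  For example,
+ * new_head = 0x00000001 and last_vcq_id = 0xffffffff give
+ * (s32)(new_head - last_vcq_id) = 2 > 0, so the transaction is complete.
+ * The comparison is valid while the two IDs are less than 2^31 apart.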
+ */
+static void cmh_rh_process_mbx(u32 mbx_idx, u32 new_head, u32 irq_bits)
+{
+       struct transaction_obj *txn;
+       int error = 0;
+
+       /* Determine error state from saved IRQ bits */
+       if (irq_bits & MBX_ERROR_IRQ) {
+               void __iomem *base = rh.cfg->mailboxes[mbx_idx].reg_base;
+               u32 status = cmh_reg_read32(base, R_MBX_STATUS);
+
+               error = -EIO;
+               dev_dbg(cmh_dev(), "rh: mbx[%u] error status=0x%08x (code=%u cmd_idx=%u)\n",
+                       mbx_idx, status,
+                       MBX_STATUS_ERROR_CODE(status),
+                       MBX_STATUS_CMD_INDEX(status));
+
+               /*
+                * ECHILD (10) in the parent status means a child VCQ
+                * failed internally.  Read R_MBX_CHILD for the actual
+                * root cause (real errno, child core ID, child cmd idx).
+                */
+               if (MBX_STATUS_ERROR_CODE(status) == ECHILD) {
+                       u32 child = cmh_reg_read32(base, R_MBX_CHILD);
+
+                       dev_dbg(cmh_dev(),
+                               "rh: mbx[%u] child error=0x%08x (core=%u code=%u cmd_idx=%u)\n",
+                               mbx_idx, child,
+                               MBX_STATUS_CORE_ID(child),
+                               MBX_STATUS_ERROR_CODE(child),
+                               MBX_STATUS_CMD_INDEX(child));
+               }
+
+               /*
+                * CMH eSW does not advance head on error -- the MBX is
+                * stuck in ERROR state until the host issues a recovery
+                * command.  However, HEAD may have advanced past one or
+                * more already-completed transactions before the error
+                * occurred (their completion IRQ may not have been
+                * processed yet).  Retire those normally first, then
+                * force-complete the NEXT transaction (the one that
+                * actually failed) with -EIO.
+                *
+                * MBX command semantics after ERROR:
+                *   CONTINUE -- re-run the same VCQ at HEAD (retry)
+                *   RESTART  -- advance HEAD+1, skip failed, resume
+                *   FLUSH    -- HEAD=TAIL, flush all HWCs, discard queue
+                */
+
+               /* First: retire transactions completed before the error */
+               while ((txn = cmh_tm_peek_transaction(mbx_idx)) != NULL) {
+                       if ((s32)(new_head - txn->last_vcq_id) <= 0)
+                               break;
+                       txn = cmh_tm_pop_transaction(mbx_idx);
+                       if (!txn)
+                               break;
+                       dev_dbg(cmh_dev(),
+                               "rh: mbx[%u] pre-error complete vcq=%u..%u\n",
+                               mbx_idx, txn->first_vcq_id,
+                               txn->last_vcq_id);
+                       cmh_txn_finish(txn, 0);
+                       cmh_tm_txq_completion_notify();
+               }
+
+               /* Now pop and fail the transaction that actually errored */
+               txn = cmh_tm_pop_transaction(mbx_idx);
+               if (txn) {
+                       dev_dbg(cmh_dev(), "rh: mbx[%u] error-complete vcq=%u..%u\n",
+                               mbx_idx, txn->first_vcq_id,
+                               txn->last_vcq_id);
+                       cmh_txn_finish(txn, error);
+                       cmh_tm_txq_completion_notify();
+               } else {
+                       u32 head_reg, tail_reg;
+
+                       head_reg = cmh_reg_read32(base, R_MBX_QUEUE_HEAD);
+                       tail_reg = cmh_reg_read32(base, R_MBX_QUEUE_TAIL);
+                       dev_warn_ratelimited(cmh_dev(),
+                                            "rh: mbx[%u] ERROR with empty txn queue (orphaned) status=0x%08x head=%u tail=%u core=%u ecode=%u cmd_idx=%u\n",
+                                            mbx_idx, status,
+                                            head_reg, tail_reg,
+                                            MBX_STATUS_CORE_ID(status),
+                                            MBX_STATUS_ERROR_CODE(status),
+                                            MBX_STATUS_CMD_INDEX(status));
+               }
+               {
+                       struct cmh_mbx_stats *s = cmh_debugfs_mbx_stats(mbx_idx);
+
+                       if (s)
+                               atomic64_inc(&s->vcqs_errors);
+               }
+
+               /*
+                * W1C-clear R_MBX_INTERRUPT before issuing RESTART.
+                *
+                * The eSW sets MBX_ERROR_IRQ in R_MBX_INTERRUPT when
+                * it writes ERROR status.  On platforms where the
+                * hardirq handler runs (IRQ wired to GIC), this bit
+                * is cleared there.  On polling-only platforms (no
+                * IRQ line), it must be cleared explicitly before
+                * issuing a recovery command to de-assert the
+                * MBX-to-SIC interrupt line.
+                */
+               cmh_reg_write32(MBX_IRQ_MASK, base, R_MBX_INTERRUPT);
+               cmh_reg_write32(MBX_COMMAND_RESTART, base, R_MBX_COMMAND);
+
+               /*
+                * Poke R_MBX_QUEUE_TAIL to guarantee the eSW receives
+                * an interrupt.
+                *
+                * Writing R_MBX_COMMAND alone may not produce a new
+                * SIC interrupt edge if the MBX-to-SIC line is still
+                * asserted from prior error processing.  The eSW RUN
+                * handler re-writes ERROR_IRQ to R_MBX_INTERRUPT on
+                * every spurious wakeup while in ERROR state, which
+                * can keep the SIC line high on level-triggered HW.
+                *
+                * R_MBX_QUEUE_TAIL writes always generate a fresh
+                * interrupt to the eSW (this is the normal VCQ
+                * submission path).  Writing the current TAIL value
+                * back is a no-op from the queue perspective but
+                * ensures the eSW wakes from WFI and processes the
+                * RESTART command.
+                */
+               cmh_rh_poke_tail(base);
+               WRITE_ONCE(rh.mbx[mbx_idx].restart_pending, true);
+               rh.mbx[mbx_idx].restart_retries = 0;
+               return;
+       }
+
+       /*
+        * Pop completed transactions.  A transaction is complete when
+        * the CMH eSW has advanced head past its last VCQ ID:
+        *   (s32)(new_head - txn->last_vcq_id) > 0
+        * Using signed comparison for correct wrap-around handling.
+        *
+        * Multi-VCQ note: transactions spanning multiple slots (e.g.
+        * SLH-DSA with 3+ VCQs) are treated atomically -- either the
+        * head has passed all of them or none.  The CMH eSW processes
+        * multi-VCQ groups sequentially within a single mailbox and
+        * only advances HEAD after the entire group completes.  Per-slot
+        * progress validation (checking intermediate HEAD positions
+        * within a multi-VCQ group) is not implemented because:
+        *   1. The eSW guarantees atomic group completion semantics
+        *   2. Partial progress is only observable during processing,
+        *      never at a completion boundary
+        *   3. Adding intermediate checks would require tracking
+        *      per-slot status with no correctness benefit
+        *
+        * A defensive WARN_ON_ONCE detects eSW misbehavior: if HEAD
+        * lands between first_vcq_id and last_vcq_id of a multi-VCQ
+        * transaction, the eSW violated its atomic group contract.
+        */
+       while ((txn = cmh_tm_peek_transaction(mbx_idx)) != NULL) {
+               if ((s32)(new_head - txn->last_vcq_id) <= 0) {
+                       /*
+                        * Not yet complete.  For multi-VCQ transactions,
+                        * assert HEAD hasn't partially advanced into the
+                        * group -- that would indicate an eSW firmware bug.
+                        */
+                       WARN_ON_ONCE(txn->first_vcq_id != txn->last_vcq_id &&
+                                    (s32)(new_head - txn->first_vcq_id) > 0);
+                       break;
+               }
+
+               txn = cmh_tm_pop_transaction(mbx_idx);
+               if (!txn)
+                       break;
+
+               dev_dbg(cmh_dev(), "rh: mbx[%u] complete vcq=%u..%u err=%d\n",
+                       mbx_idx, txn->first_vcq_id, txn->last_vcq_id,
+                       error);
+
+               {
+                       struct cmh_mbx_stats *s = cmh_debugfs_mbx_stats(mbx_idx);
+
+                       if (s) {
+                               u32 n = txn->last_vcq_id -
+                                       txn->first_vcq_id + 1;
+
+                               atomic64_add(n, &s->vcqs_completed);
+                       }
+               }
+
+               cmh_txn_finish(txn, error);
+               cmh_tm_txq_completion_notify();
+       }
+}
+
+/*
+ * Threaded IRQ handler -- runs in process context.
+ *
+ * Walk all MBXes that had pending interrupts.  After processing the
+ * pending set, do a final hardware poll of all MBX head registers to
+ * catch completions whose PLIC interrupt was consumed during an
+ * earlier register access (e.g. an inline interrupt notification
+ * during MMIO can cause the PLIC edge to be claimed before the
+ * hardirq sees it).
+ */
+static irqreturn_t cmh_rh_thread(int irq, void *data)
+{
+       struct cmh_config *cfg = data;
+       u32 i;
+       bool recheck;
+
+       do {
+               recheck = false;
+
+               for (i = 0; i < cfg->mbx_count; i++) {
+                       u32 new_head, irq_bits;
+
+                       if (!READ_ONCE(rh.mbx[i].pending))
+                               continue;
+
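+                       /*
+                        * Clear pending before consuming irq_bits so that a
+                        * hardirq firing in between re-sets pending and is
+                        * picked up by the re-check below instead of lost.
+                        */
+                       WRITE_ONCE(rh.mbx[i].pending, false);
+                       irq_bits = (u32)atomic_xchg(&rh.mbx[i].irq_bits, 0);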
+
+                       spin_lock_bh(&rh_process_lock);
+                       new_head = cmh_reg_read32(cfg->mailboxes[i].reg_base,
+                                                 R_MBX_QUEUE_HEAD);
+
+                       if (new_head == rh.mbx[i].last_head && !irq_bits) {
+                               spin_unlock_bh(&rh_process_lock);
+                               continue;
+                       }
+
+                       cmh_rh_process_mbx(i, new_head, irq_bits);
+                       rh.mbx[i].last_head = new_head;
+                       spin_unlock_bh(&rh_process_lock);
+               }
+
+               /*
+                * Re-check: if the hardirq fired again while we were
+                * processing, pending flags will be set again.
+                */
+               for (i = 0; i < cfg->mbx_count; i++) {
+                       if (READ_ONCE(rh.mbx[i].pending)) {
+                               recheck = true;
+                               break;
+                       }
+               }
+       } while (recheck);
+
+       /*
+        * Final hardware poll: read every MBX head register and status
+        * to catch completions or errors whose interrupt was missed.
+        */
+       for (i = 0; i < cfg->mbx_count; i++) {
+               u32 new_head;
+               u32 status;
+               u32 poll_irq_bits = 0;
+
+               spin_lock_bh(&rh_process_lock);
+               new_head = cmh_reg_read32(cfg->mailboxes[i].reg_base,
+                                         R_MBX_QUEUE_HEAD);
+               status = cmh_reg_read32(cfg->mailboxes[i].reg_base,
+                                       R_MBX_STATUS);
+
+               if (MBX_STATUS_CODE(status) == MBX_STATUS_ERROR) {
+                       if (READ_ONCE(rh.mbx[i].wedged)) {
+                               spin_unlock_bh(&rh_process_lock);
+                               continue;
+                       }
+                       if (READ_ONCE(rh.mbx[i].restart_pending)) {
+                               /*
+                                * HEAD advanced while restart_pending means
+                                * RESTART worked but next VCQ also failed.
+                                * Clear restart state and process new error.
+                                */
+                               if (new_head != rh.mbx[i].last_head) {
+                                       WRITE_ONCE(rh.mbx[i].restart_pending,
+                                                  false);
+                                       rh.mbx[i].restart_retries = 0;
+                               } else {
+                                       spin_unlock_bh(&rh_process_lock);
+                                       continue;
+                               }
+                       }
+                       poll_irq_bits = MBX_ERROR_IRQ;
+               } else {
+                       WRITE_ONCE(rh.mbx[i].restart_pending, false);
+                       rh.mbx[i].restart_retries = 0;
+                       rh.mbx[i].flush_count = 0;
+               }
+
+               if (new_head != rh.mbx[i].last_head || poll_irq_bits) {
+                       cmh_rh_process_mbx(i, new_head, poll_irq_bits);
+                       rh.mbx[i].last_head = new_head;
+               }
+               spin_unlock_bh(&rh_process_lock);
+       }
+
+       return IRQ_HANDLED;
+}
+
+/*
+ * Watchdog timer callback -- missed-IRQ recovery.
+ *
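+ * Reads all MBX head and status registers.  If any head advanced
+ * without a corresponding IRQ, process the completions here.  Also
+ * drives error recovery (RESTART, escalating to FLUSH) and the
+ * abort-stall detector.  Re-arms itself while rh.active is true.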
+ */
+static void cmh_rh_watchdog_fn(struct timer_list *t)
+{
+       u32 i;
+
+       if (!rh.active || !rh.cfg || !rh.mbx)
+               return;
+
+       for (i = 0; i < rh.cfg->mbx_count; i++) {
+               u32 new_head;
+               u32 status;
+               u32 irq_bits = 0;
+
+               spin_lock(&rh_process_lock);
+               new_head = cmh_reg_read32(rh.cfg->mailboxes[i].reg_base,
+                                         R_MBX_QUEUE_HEAD);
+               status = cmh_reg_read32(rh.cfg->mailboxes[i].reg_base,
+                                       R_MBX_STATUS);
+
+               if (MBX_STATUS_CODE(status) == MBX_STATUS_ERROR) {
+                       if (READ_ONCE(rh.mbx[i].wedged)) {
+                               spin_unlock(&rh_process_lock);
+                               continue;
+                       }
+                       /*
+                        * Back-to-back failure scenario: the crypto API
+                        * (e.g. testmgr) may submit requests continuously.
+                        * If RESTART succeeds but the next VCQ also fails,
+                        * the entire RESTART->IDLE->RUN->ERROR cycle can
+                        * complete within a single 200ms watchdog period.
+                        * Without the HEAD-advance check below, the watchdog
+                        * would mistake the new error for a failed RESTART,
+                        * increment restart_retries, and eventually escalate
+                        * to FLUSH -- wedging the mailbox unnecessarily.
+                        */
+                       if (READ_ONCE(rh.mbx[i].restart_pending)) {
+                               void __iomem *base =
+                                       rh.cfg->mailboxes[i].reg_base;
+
+                               /*
+                                * HEAD advanced since RESTART was issued:
+                                * RESTART succeeded, this is a fresh error.
+                                * Clear recovery state and process normally.
+                                */
+                               if (new_head != rh.mbx[i].last_head) {
+                                       dev_dbg(cmh_dev(),
+                                               "rh: watchdog: mbx[%u] head advanced %u->%u during restart -- new error\n",
+                                               i, rh.mbx[i].last_head,
+                                               new_head);
+                                       WRITE_ONCE(rh.mbx[i].restart_pending,
+                                                  false);
+                                       rh.mbx[i].restart_retries = 0;
+                                       goto new_error;
+                               }
+
+                               rh.mbx[i].restart_retries++;
+                               if (rh.mbx[i].restart_retries >
+                                   CMH_RH_RESTART_MAX_RETRIES) {
+                                       rh.mbx[i].flush_count++;
+                                       if (rh.mbx[i].flush_count >=
+                                           CMH_RH_FLUSH_MAX_FAILURES) {
+                                               u32 hb, ei, cmd;
+
+                                               cmd = cmh_reg_read32(base, R_MBX_COMMAND);
+                                               hb = cmh_reg_read32(rh.cfg->sic_mapped,
+                                                                   R_SIC_SW_HEARTBEAT);
+                                               ei = cmh_reg_read32(rh.cfg->sic_mapped,
+                                                                   R_SIC_SW_ERROR_INFO);
+                                               dev_crit(cmh_dev(),
+                                                        "rh: mbx[%u] wedged after %u FLUSHes (cmd=0x%x status=0x%x hb=0x%x err=0x%x)\n",
+                                                        i,
+                                                        rh.mbx[i].flush_count,
+                                                        cmd, status,
+                                                        hb, ei);
+                                               WRITE_ONCE(rh.mbx[i].wedged,
+                                                          true);
+                                               cmh_rh_drain_mbx(i, -EIO);
+                                               spin_unlock(&rh_process_lock);
+                                               continue;
+                                       }
+                                       /*
+                                        * Backstop: eSW did not respond
+                                        * to RESTART within the retry
+                                        * budget.  Escalate to FLUSH
+                                        * which is a harder reset of
+                                        * the eSW mailbox state.
+                                        */
+                                       dev_err(cmh_dev(),
+                                               "rh: watchdog: mbx[%u] RESTART unresponsive after %u ticks, escalating to FLUSH (attempt %u/%u)\n",
+                                               i, rh.mbx[i].restart_retries,
+                                               rh.mbx[i].flush_count,
+                                               CMH_RH_FLUSH_MAX_FAILURES);
+                                       cmh_reg_write32(MBX_IRQ_MASK,
+                                                       base,
+                                                       R_MBX_INTERRUPT);
+                                       cmh_reg_write32(MBX_COMMAND_FLUSH,
+                                                       base,
+                                                       R_MBX_COMMAND);
+                                       cmh_rh_poke_tail(base);
+                                       cmh_rh_drain_mbx(i, -EIO);
+                                       WRITE_ONCE(rh.mbx[i].restart_pending,
+                                                  false);
+                                       rh.mbx[i].restart_retries = 0;
+                                       spin_unlock(&rh_process_lock);
+                                       continue;
+                               }
+                               /*
+                                * RESTART was already issued on a prior
+                                * tick but the eSW hasn't cleared the
+                                * ERROR status yet.  Do NOT pop another
+                                * transaction -- that would cascade-kill
+                                * unrelated in-flight work.  Re-poke TAIL
+                                * in case the eSW missed the interrupt.
+                                */
+                               cmh_rh_poke_tail(base);
+                               dev_dbg_ratelimited(cmh_dev(),
+                                                   "rh: watchdog: mbx[%u] restart pending (%u/%u) status=0x%08x, re-poke\n",
+                                                   i,
+                                                   rh.mbx[i].restart_retries,
+                                                   CMH_RH_RESTART_MAX_RETRIES,
+                                                   status);
+                               spin_unlock(&rh_process_lock);
+                               continue;
+                       }
+new_error:
+                       dev_dbg_ratelimited(cmh_dev(),
+                                           "rh: watchdog: mbx[%u] error status=0x%08x (missed error IRQ) head=%u tail=%u core=%u ecode=%u cmd_idx=%u\n",
+                                           i, status, new_head,
+                                           cmh_reg_read32(rh.cfg->mailboxes[i].reg_base,
+                                                          R_MBX_QUEUE_TAIL),
+                                           MBX_STATUS_CORE_ID(status),
+                                           MBX_STATUS_ERROR_CODE(status),
+                                           MBX_STATUS_CMD_INDEX(status));
+                       irq_bits = MBX_ERROR_IRQ;
+               } else {
+                       /* eSW cleared ERROR -- recovery succeeded */
+                       WRITE_ONCE(rh.mbx[i].restart_pending, false);
+                       rh.mbx[i].restart_retries = 0;
+                       rh.mbx[i].flush_count = 0;
+               }
+
+               if (new_head != rh.mbx[i].last_head || irq_bits) {
+                       if (new_head != rh.mbx[i].last_head)
+                               dev_dbg_ratelimited(cmh_dev(),
+                                                   "rh: watchdog: mbx[%u] head %u->%u (missed IRQ recovery)\n",
+                                                   i, rh.mbx[i].last_head,
+                                                   new_head);
+                       cmh_rh_process_mbx(i, new_head, irq_bits);
+                       rh.mbx[i].last_head = new_head;
+                       rh.mbx[i].abort_stall_ticks = 0;
+               }
+
+               /*
+                * Abort-stall detector: if the head-of-queue transaction
+                * timed out (state == TXN_TIMED_OUT) but the eSW hasn't
+                * responded (HEAD didn't advance, no ERROR status):
+                *
+                *   tick 1:        issue MBX_COMMAND_ABORT (serialised
+                *                  under rh_process_lock -- safe against
+                *                  concurrent RESTART/FLUSH)
+                *   ticks 2..N-1:  wait for eSW to respond with ERROR
+                *   tick N:        escalate to FLUSH + force-drain
+                *
+                * If the eSW responds with ERROR between ticks, the ERROR
+                * path above handles RESTART recovery and the watchdog
+                * resets abort_stall_ticks after processing that error.
+                * The restart_pending guard below then skips this detector
+                * while the RESTART is outstanding.
+                */
+               if (!READ_ONCE(rh.mbx[i].wedged) &&
+                   !READ_ONCE(rh.mbx[i].restart_pending)) {
+                       struct transaction_obj *head_txn;
+
+                       head_txn = cmh_tm_peek_transaction(i);
+                       if (head_txn &&
+                           atomic_read(&head_txn->state) == TXN_TIMED_OUT) {
+                               unsigned int stall_max;
+                               void __iomem *base =
+                                       rh.cfg->mailboxes[i].reg_base;
+
+                               rh.mbx[i].abort_stall_ticks++;
+
+                               if (rh.mbx[i].abort_stall_ticks == 1) {
+                                       dev_warn(cmh_dev(),
+                                                "rh: watchdog: mbx[%u] head txn timed out, issuing ABORT\n",
+                                                i);
+                                       cmh_reg_write32(MBX_COMMAND_ABORT,
+                                                       base,
+                                                       R_MBX_COMMAND);
+                               }
+
+                               stall_max = DIV_ROUND_UP(CMH_RH_ABORT_STALL_MS,
+                                                        max(watchdog_ms,
+                                                            CMH_RH_WATCHDOG_MS_MIN));
+                               if (rh.mbx[i].abort_stall_ticks >=
+                                   stall_max) {
+                                       dev_err(cmh_dev(),
+                                               "rh: watchdog: mbx[%u] abort stall (%u ticks) -- FLUSH + drain\n",
+                                               i, rh.mbx[i].abort_stall_ticks);
+                                       cmh_reg_write32(MBX_COMMAND_FLUSH,
+                                                       base, R_MBX_COMMAND);
+                                       cmh_rh_drain_mbx(i, -ETIMEDOUT);
+                                       rh.mbx[i].abort_stall_ticks = 0;
+                               }
+                       } else {
+                               rh.mbx[i].abort_stall_ticks = 0;
+                       }
+               }
+               spin_unlock(&rh_process_lock);
+       }
+
+       if (rh.active) {
+               unsigned int wdog = max(watchdog_ms, CMH_RH_WATCHDOG_MS_MIN);
+
+               mod_timer(&rh_watchdog,
+                         jiffies + msecs_to_jiffies(wdog));
+       }
+}
+
+/*
+ * Resolve per-MBX Linux virqs for the CMH interrupt lines.
+ *
+ * Strategy:
+ *   1. If cfg->irq >= 0, use it as a shared IRQ for all MBXes
+ *   2. Otherwise, use the "cri,cmh" DT node (cfg->of_node) and map one
+ *      IRQ per active mailbox using cfg->mailboxes[i].instance as the
+ *      DT interrupt index (matching the per-MBX PLIC wiring)
+ *
+ * Populates rh.irqs[] and rh.nirqs.  Returns 0 on success, or -ENODEV
+ * if there is no DT node or any per-MBX IRQ fails to map; rh.nirqs is
+ * then left at 0 and the caller falls back to polling-only mode.
+ */
+static int cmh_rh_resolve_irqs(struct cmh_config *cfg)
+{
+       struct device_node *np;
+       u32 i;
+
+       rh.nirqs = 0;
+
+       /* Single legacy IRQ from DT: shared across all MBXes */
+       if (cfg->irq >= 0) {
+               rh.irqs[0] = cfg->irq;
+               rh.nirqs = 1;
+               dev_dbg(cmh_dev(), "rh: using single DT IRQ %d for all %u MBXes\n",
+                       cfg->irq, cfg->mbx_count);
+               return 0;
+       }
+
+       np = cfg->of_node;
+       if (!np) {
+               dev_warn(cmh_dev(), "rh: no DT node -- IRQ disabled\n");
+               return -ENODEV;
+       }
+
+       for (i = 0; i < cfg->mbx_count; i++) {
+               int dt_idx = cfg->mailboxes[i].instance;
+               int virq = of_irq_get(np, dt_idx);
+
+               if (virq <= 0) {
+                       dev_warn(cmh_dev(), "rh: failed to map IRQ for MBX%u (DT index %d, rc=%d)\n",
+                                i, dt_idx, virq);
+                       return -ENODEV;
+               }
+               rh.irqs[i] = virq;
+               dev_dbg(cmh_dev(), "rh: MBX%u -> IRQ %d (DT index %d)\n",
+                       i, virq, dt_idx);
+       }
+
+       rh.nirqs = cfg->mbx_count;
+       return 0;
+}
+
+/**
+ * cmh_rh_init() - Initialize the response handler
+ * @cfg: Device configuration (mailbox count, MMIO bases, IRQ info)
+ *
+ * Resolve per-mailbox IRQs from the device tree (or module parameter
+ * override), register threaded IRQ handlers (hardirq + kthread), and
+ * arm the missed-IRQ software watchdog timer.  If no IRQs can be
+ * resolved, falls back to watchdog-only polling mode.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_rh_init(struct cmh_config *cfg)
+{
+       int ret;
+       u32 i;
+
+       rh.cfg = cfg;
+       rh.nirqs = 0;
+       rh.active = false;
+       atomic_set(&rh.irq_count, 0);
+
+       /* Allocate per-MBX tracking */
+       rh.mbx = kcalloc(cfg->mbx_count, sizeof(*rh.mbx), GFP_KERNEL);
+       if (!rh.mbx)
+               return -ENOMEM;
+
+       /* Resolve per-MBX IRQs */
+       if (cmh_rh_resolve_irqs(cfg) < 0) {
+               /*
+                * No IRQs available.  The watchdog timer provides
+                * a polling fallback: it reads MBX head registers
+                * periodically and processes completions.  This is
+                * slower than IRQ-driven completion but functional.
+                *
+                * Completion latency in polling-only mode is bounded
+                * by the watchdog interval (default 200 ms, tunable
+                * via debugfs config/watchdog_ms).
+                */
+               dev_warn(cmh_dev(),
+                        "rh: no IRQs -- using watchdog polling (interval %u ms)\n",
+                        watchdog_ms);
+
+               /* Seed last_head from HW before first watchdog tick */
+               for (i = 0; i < cfg->mbx_count; i++)
+                       rh.mbx[i].last_head =
+                               cmh_reg_read32(cfg->mailboxes[i].reg_base,
+                                              R_MBX_QUEUE_HEAD);
+
+               rh.active = true;
+               timer_setup(&rh_watchdog, cmh_rh_watchdog_fn, 0);
+               mod_timer(&rh_watchdog, jiffies +
+                         msecs_to_jiffies(max(watchdog_ms,
+                                              CMH_RH_WATCHDOG_MS_MIN)));
+               return 0;
+       }
+
+       /* Initialize per-MBX state: read current head positions */
+       for (i = 0; i < cfg->mbx_count; i++)
+               rh.mbx[i].last_head = cmh_reg_read32(rh.cfg->mailboxes[i].reg_base,
+                                                    R_MBX_QUEUE_HEAD);
+
+       /*
+        * Register threaded IRQ handlers.
+        *
+        * DT per-MBX path: one distinct virq per MBX, nirqs == mbx_count.
+        * DT single-IRQ path: one shared IRQ, nirqs == 1.  The handler
+        * scans all mailboxes unconditionally, so a single registration
+        * suffices.
+        *
+        * Use IRQF_SHARED only for the single-IRQ path where one line
+        * is shared across all MBXes.  Dedicated per-MBX virqs need no
+        * sharing flag.
+        */
+       {
+               unsigned long irqflags = (rh.nirqs == 1 && cfg->mbx_count > 1)
+                                         ? IRQF_SHARED : 0;
+
+               for (i = 0; i < rh.nirqs; i++) {
+                       ret = request_threaded_irq(rh.irqs[i],
+                                                  cmh_rh_hardirq,
+                                                  cmh_rh_thread,
+                                                  irqflags,
+                                                  "cmh", cfg);
+                       if (ret) {
+                               dev_err(cmh_dev(), "rh: request_threaded_irq(%d) for MBX%u failed (rc=%d)\n",
+                                       rh.irqs[i], i, ret);
+                               /* Unwind previously registered IRQs */
+                               while (i--)
+                                       free_irq(rh.irqs[i], cfg);
+                               rh.nirqs = 0;
+                               kfree(rh.mbx);
+                               rh.mbx = NULL;
+                               return ret;
+                       }
+               }
+       }
+
+       rh.active = true;
+
+       /* Enable MBX completion interrupts (DONE + ERROR) */
+       for (i = 0; i < cfg->mbx_count; i++) {
+               u32 stale;
+
+               /*
+                * W1C any interrupt bits that accumulated between
+                * MQI setup and now (e.g. CMH eSW processing stale
+                * commands) before enabling the mask.
+                */
+               stale = cmh_reg_read32(cfg->mailboxes[i].reg_base,
+                                      R_MBX_INTERRUPT);
+               if (stale)
+                       cmh_reg_write32(stale, cfg->mailboxes[i].reg_base,
+                                       R_MBX_INTERRUPT);
+
+               cmh_reg_write32(MBX_IRQ_MASK,
+                               cfg->mailboxes[i].reg_base,
+                               R_MBX_INTERRUPT_MASK);
+       }
+
+       dev_info(cmh_dev(), "rh: initialized (%u IRQs, %u mailboxes, watchdog %u ms)\n",
+                rh.nirqs, cfg->mbx_count, watchdog_ms);
+
+       /* Arm missed-IRQ watchdog timer */
+       timer_setup(&rh_watchdog, cmh_rh_watchdog_fn, 0);
+       mod_timer(&rh_watchdog, jiffies +
+                 msecs_to_jiffies(max(watchdog_ms,
+                                      CMH_RH_WATCHDOG_MS_MIN)));
+
+       return 0;
+}
+
+/**
+ * cmh_rh_suspend() - Suspend the response handler
+ * @cfg: Device configuration
+ *
+ * Stop the watchdog timer and mask mailbox interrupts at the hardware
+ * level.  The IRQ handlers remain registered so that resume can
+ * re-enable them without re-requesting.
+ */
+void cmh_rh_suspend(struct cmh_config *cfg)
+{
+       u32 i;
+
+       if (!rh.active)
+               return;
+
+       /* Stop the watchdog before masking HW interrupts */
+       timer_delete_sync(&rh_watchdog);
+
+       /* Mask MBX interrupts at the hardware level */
+       for (i = 0; i < cfg->mbx_count; i++)
+               cmh_reg_write32(0, cfg->mailboxes[i].reg_base,
+                               R_MBX_INTERRUPT_MASK);
+
+       /*
+        * Ensure no threaded IRQ handler is still in-flight.
+        * After masking, a handler may already have been scheduled.
+        * synchronize_irq() waits for it to complete before we
+        * proceed with suspend (which tears down TM state).
+        */
+       for (i = 0; i < rh.nirqs; i++)
+               synchronize_irq(rh.irqs[i]);
+
+       rh.active = false;
+       dev_dbg(cmh_dev(), "rh: suspended\n");
+}
+
+/**
+ * cmh_rh_resume() - Resume the response handler after suspend
+ * @cfg: Device configuration
+ *
+ * Re-synchronize per-mailbox head tracking with hardware, clear stale
+ * interrupt bits accumulated during the power transition, re-enable
+ * mailbox completion interrupts, and re-arm the watchdog timer.
+ */
+void cmh_rh_resume(struct cmh_config *cfg)
+{
+       u32 i;
+
+       if (!rh.mbx || !cfg)
+               return;
+
+       /* Re-sync per-MBX head tracking with hardware */
+       for (i = 0; i < cfg->mbx_count; i++) {
+               u32 stale;
+
+               rh.mbx[i].last_head =
+                       cmh_reg_read32(cfg->mailboxes[i].reg_base,
+                                      R_MBX_QUEUE_HEAD);
+
+               /* W1C any stale interrupt bits from the power transition */
+               stale = cmh_reg_read32(cfg->mailboxes[i].reg_base,
+                                      R_MBX_INTERRUPT);
+               if (stale)
+                       cmh_reg_write32(stale, cfg->mailboxes[i].reg_base,
+                                       R_MBX_INTERRUPT);
+
+               /* Re-enable MBX completion interrupts */
+               cmh_reg_write32(MBX_IRQ_MASK, cfg->mailboxes[i].reg_base,
+                               R_MBX_INTERRUPT_MASK);
+       }
+
+       rh.active = true;
+
+       /* Re-arm the watchdog */
+       mod_timer(&rh_watchdog, jiffies +
+                 msecs_to_jiffies(max(watchdog_ms,
+                                      CMH_RH_WATCHDOG_MS_MIN)));
+       dev_dbg(cmh_dev(), "rh: resumed\n");
+}
+
+/**
+ * cmh_rh_cleanup() - Clean up the response handler
+ * @cfg: Device configuration
+ *
+ * Stop the watchdog timer, mask mailbox interrupts at the hardware
+ * level, release all registered IRQ handlers, and free per-mailbox
+ * tracking state.  Safe to call even if init was never completed.
+ */
+void cmh_rh_cleanup(struct cmh_config *cfg)
+{
+       if (rh.active) {
+               u32 i;
+
+               /* Cancel watchdog before disabling interrupts */
+               timer_delete_sync(&rh_watchdog);
+
+               /* Disable MBX interrupts before releasing handlers */
+               for (i = 0; i < cfg->mbx_count; i++)
+                       cmh_reg_write32(0,
+                                       cfg->mailboxes[i].reg_base,
+                                       R_MBX_INTERRUPT_MASK);
+
+               /* Release all per-MBX IRQs */
+               for (i = 0; i < rh.nirqs; i++)
+                       free_irq(rh.irqs[i], cfg);
+               dev_dbg(cmh_dev(), "rh: %u IRQs released\n", rh.nirqs);
+               rh.nirqs = 0;
+               rh.active = false;
+       }
+
+       dev_dbg(cmh_dev(), "rh: %d IRQs handled\n",
+               atomic_read(&rh.irq_count));
+
+       kfree(rh.mbx);
+       rh.mbx = NULL;
+
+       dev_info(cmh_dev(), "rh: cleaned up\n");
+}
+
+/* -- debugfs timeout accessor ------------------------------------------ */
+
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+/**
+ * cmh_rh_timeout_watchdog_ptr() - Return pointer to watchdog_ms for debugfs
+ *
+ * Exposes the Response Handler watchdog timeout for runtime tuning
+ * via debugfs config/ directory.
+ *
+ * Return: pointer to the static watchdog_ms variable.
+ */
+unsigned int *cmh_rh_timeout_watchdog_ptr(void) { return &watchdog_ms; }
+#endif
diff --git a/drivers/crypto/cmh/cmh_sysfs.c b/drivers/crypto/cmh/cmh_sysfs.c
new file mode 100644
index 000000000000..ab482a222167
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_sysfs.c
@@ -0,0 +1,108 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- sysfs Device Attributes
+ *
+ * Exposes hardware identity and status as read-only sysfs attributes
+ * under /sys/devices/platform/cmh/.  Wired via .dev_groups in the
+ * platform_driver struct -- the driver core creates and removes these
+ * automatically around .probe() / .remove().
+ *
+ * Because .dev_groups is used (not manual sysfs_create_group), the
+ * driver core guarantees that attributes are created after .probe()
+ * sets drvdata and removed before .remove() clears it.  Therefore
+ * platform_get_drvdata() cannot return NULL in any show callback and
+ * no NULL check is needed.  Same pattern as caam/ctrl.c and
+ * ccree/cc_sysfs.c.
+ */
+
+#include <linux/device.h>
+#include <linux/platform_device.h>
+#include <linux/sysfs.h>
+
+#include "cmh.h"
+#include "cmh_registers.h"
+#include "cmh_sysfs.h"
+
+static ssize_t fw_version_show(struct device *dev,
+                              struct device_attribute *attr, char *buf)
+{
+       struct cmh_device *cmh = platform_get_drvdata(to_platform_device(dev));
+       struct cmh_config *cfg = &cmh->config;
+
+       if (!cfg->sic_mapped)
+               return -ENODEV;
+
+       return sysfs_emit(buf, "0x%08x\n",
+                         cmh_reg_read32(cfg->sic_mapped, R_SIC_SW_VERSION));
+}
+static DEVICE_ATTR_RO(fw_version);
+
+static ssize_t hw_version_show(struct device *dev,
+                              struct device_attribute *attr, char *buf)
+{
+       struct cmh_device *cmh = platform_get_drvdata(to_platform_device(dev));
+       struct cmh_config *cfg = &cmh->config;
+
+       if (!cfg->sic_mapped)
+               return -ENODEV;
+
+       return sysfs_emit(buf, "0x%08x\n",
+                         cmh_reg_read32(cfg->sic_mapped, R_SIC_HW_VERSION0));
+}
+static DEVICE_ATTR_RO(hw_version);
+
+static ssize_t boot_status_show(struct device *dev,
+                               struct device_attribute *attr, char *buf)
+{
+       struct cmh_device *cmh = platform_get_drvdata(to_platform_device(dev));
+       struct cmh_config *cfg = &cmh->config;
+
+       if (!cfg->sic_mapped)
+               return -ENODEV;
+
+       return sysfs_emit(buf, "0x%08x\n",
+                         cmh_reg_read32(cfg->sic_mapped, R_SIC_BOOT_STATUS));
+}
+static DEVICE_ATTR_RO(boot_status);
+
+static ssize_t mbx_available_show(struct device *dev,
+                                 struct device_attribute *attr, char *buf)
+{
+       struct cmh_device *cmh = platform_get_drvdata(to_platform_device(dev));
+       struct cmh_config *cfg = &cmh->config;
+
+       if (!cfg->sic_mapped)
+               return -ENODEV;
+
+       return sysfs_emit(buf, "0x%08x\n",
+                         cmh_reg_read32(cfg->sic_mapped, R_SIC_MBX_AVAILABILITY));
+}
+static DEVICE_ATTR_RO(mbx_available);
+
+static ssize_t mbx_count_show(struct device *dev,
+                             struct device_attribute *attr, char *buf)
+{
+       struct cmh_device *cmh = platform_get_drvdata(to_platform_device(dev));
+
+       return sysfs_emit(buf, "%u\n", cmh->config.mbx_count);
+}
+static DEVICE_ATTR_RO(mbx_count);
+
+static struct attribute *cmh_sysfs_attrs[] = {
+       &dev_attr_fw_version.attr,
+       &dev_attr_hw_version.attr,
+       &dev_attr_boot_status.attr,
+       &dev_attr_mbx_available.attr,
+       &dev_attr_mbx_count.attr,
+       NULL,
+};
+
+static const struct attribute_group cmh_sysfs_group = {
+       .attrs = cmh_sysfs_attrs,
+};
+
+const struct attribute_group *cmh_sysfs_groups[] = {
+       &cmh_sysfs_group,
+       NULL,
+};
diff --git a/drivers/crypto/cmh/cmh_txn.c b/drivers/crypto/cmh/cmh_txn.c
new file mode 100644
index 000000000000..3c696a8baac5
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_txn.c
@@ -0,0 +1,1978 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Transaction Manager
+ *
+ * Dedicated kthread that dequeues command messages, builds VCQs in
+ * DMA queue slots, and rings the MBX doorbell.
+ *
+ * Command flow:
+ *   1. Caller posts command_msg via cmh_tm_post_command()
+ *   2. TM thread wakes, dequeues msg from CMQ
+ *   3. Selects mailbox (core-to-MBX affinity, or caller-pinned)
+ *   4. Copies pre-built VCQ entries into DMA slot at tail
+ *   5. Creates transaction_obj, appends to per-MBX txn queue
+ *   6. Writes tail+1 -> R_MBX_QUEUE_TAIL (doorbell)
+ *
+ * The Response Handler (cmh_rh.c) walks per-MBX txn queues
+ * when an IRQ fires and the head advances, firing completion callbacks.
+ *
+ * Transaction state machine
+ * -------------------------
+ * Each async transaction moves through the following states.  DMA
+ * buffers remain mapped and owned by the HW until the COMPLETE state
+ * is reached -- only then are they safe to unmap/free.
+ *
+ *   QUEUED --[TM posts to HW]--> INFLIGHT
+ *   (cmq)  |                       |      \
+ *          |                       |       \--[timer fires]-->
+ *          |                       |            TIMED_OUT
+ *          |                       |               |
+ *          |                  [HW completes /   [HW completes /
+ *          |                   RH pops txn]      RH pops txn]
+ *          |                       |               |
+ *          |                       v               v
+ *          |                    COMPLETE        COMPLETE
+ *          |                   (err=HW rc)    (err=-ETIMEDOUT)
+ *          |
+ *          +--[pre-submit fail]--> freed (callback never fires)
+ *
+ * Note: QUEUED is the command_msg phase (sitting in the CMQ list,
+ * not yet a transaction_obj).  The transaction_obj states tracked
+ * by atomic_cmpxchg are INFLIGHT, TIMED_OUT, and COMPLETE only.
+ *
+ * Completion callback context guarantee:
+ *   The crypto_request_complete() callback is invoked from one of:
+ *     - The RH threaded IRQ handler (process context, BH disabled)
+ *     - The watchdog timer (softirq / timer context)
+ *     - The TM kthread during queue drain/cleanup (process context)
+ *
+ *   It is NEVER invoked from hardirq context.
+ *
+ *   The watchdog path runs from timer softirq because it must recover
+ *   missed IRQs without sleeping.  This is crypto-API-compliant:
+ *   crypto_request_complete() is documented safe from any context
+ *   (including softirq).  Callers must NOT assume process context in
+ *   their completion callbacks -- all operations therein must be
+ *   softirq-safe (no mutex, no GFP_KERNEL, no sleeping locks).
+ *
+ *   For backlog promotion (-EINPROGRESS callbacks), the callback runs
+ *   under the CMQ spinlock with IRQs disabled -- callers must handle
+ *   this per the crypto API backlog contract.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/slab.h>
+#include <linux/delay.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/string.h>
+#include <linux/completion.h>
+#include <linux/overflow.h>
+#include <linux/refcount.h>
+
+#include "cmh_txn.h"
+#include "cmh_rh.h"
+#include "cmh_registers.h"
+#include "cmh_config.h"
+#include "cmh_vcq.h"
+#include "cmh_debugfs.h"
+#include "cmh_dma.h"
+
+/* Module State */
+
+static struct {
+       struct cmh_config      *cfg;
+       struct task_struct     *thread;
+       bool                    running;
+
+       /* Command Message Queue (CMQ) */
+       struct list_head        cmq;
+       spinlock_t              cmq_lock;   /* protects cmq + backlog lists */
+       wait_queue_head_t       cmq_waitq;
+
+       /* Backlog queue for CRYPTO_TFM_REQ_MAY_BACKLOG requests */
+       struct list_head        backlog;
+       u32                     backlog_depth;
+
+       /* Per-mailbox transaction queues */
+       struct cmh_mbx_txq     *txqs;       /* array[cfg->mbx_count] */
+
+       /* Round-robin mailbox selector */
+       u32                     next_mbx;
+} tm;
+
+static unsigned int cmq_max_depth = 256;
+module_param(cmq_max_depth, uint, 0444);
+MODULE_PARM_DESC(cmq_max_depth,
+                "Max pending commands in the Command Message Queue (default: 256)");
+
+static unsigned int backlog_max_depth = 1024;
+module_param(backlog_max_depth, uint, 0444);
+MODULE_PARM_DESC(backlog_max_depth,
+                "Max pending commands in the backlog queue (0 = disable backlog, default: 1024)");
+
+static unsigned int async_timeout_ms = 2000;
+
+#define CMH_TM_BACKOFF_MIN_US   100  /* queue-full backoff range (us) */
+#define CMH_TM_BACKOFF_MAX_US   500
+static unsigned int cmq_depth;       /* current CMQ depth, protected by tm.cmq_lock */
+
+/*
+ * Monotonically increasing counter bumped by cmh_tm_txq_completion_notify().
+ * Used as a generation check in the queue-full backoff predicate so that
+ * wait_event_interruptible_timeout() returns immediately when a TXQ
+ * completion frees a slot, rather than sleeping for the full timeout.
+ */
+static atomic_t txq_completion_gen;
+
+/* -- Debugfs stat helpers (avoid anonymous compound blocks) ------------- */
+
+static void cmh_stat_inc_mbx_queue_full(u32 mbx_idx)
+{
+       struct cmh_mbx_stats *s = cmh_debugfs_mbx_stats(mbx_idx);
+
+       if (s)
+               atomic64_inc(&s->queue_full_count);
+}
+
+static void cmh_stat_record_vcq_submit(u32 mbx_idx, u32 num_vcqs, u32 depth)
+{
+       struct cmh_mbx_stats *s = cmh_debugfs_mbx_stats(mbx_idx);
+
+       if (s) {
+               atomic64_add(num_vcqs, &s->vcqs_submitted);
+               cmh_stat_update_max(&s->max_queue_depth, (s64)depth);
+       }
+}
+
+static void cmh_stat_inc_tm_backoff(void)
+{
+       struct cmh_tm_stats *s = cmh_debugfs_tm_stats();
+
+       if (s)
+               atomic64_inc(&s->backoff_count);
+}
+
+static void cmh_stat_inc_cmq_eagain(void)
+{
+       struct cmh_tm_stats *s = cmh_debugfs_tm_stats();
+
+       if (s)
+               atomic64_inc(&s->cmq_eagain_count);
+}
+
+static void cmh_stat_record_cmq_post(u32 depth)
+{
+       struct cmh_tm_stats *s = cmh_debugfs_tm_stats();
+
+       if (s) {
+               atomic64_inc(&s->cmq_posts);
+               cmh_stat_update_max(&s->cmq_depth_max, (s64)depth);
+       }
+}
+
+static void cmh_stat_inc_async_timeout(void)
+{
+       struct cmh_tm_stats *s = cmh_debugfs_tm_stats();
+
+       if (s)
+               atomic64_inc(&s->async_timeout_count);
+}
+
+/*
+ * Drop one reference on a command_msg; free when the last ref is dropped.
+ * Used by cmh_tm_submit_sync() to share msg ownership between the
+ * waiter (caller) and the TM subsystem (thread or cleanup drain).
+ */
+static void command_msg_put(struct command_msg *msg)
+{
+       if (refcount_dec_and_test(&msg->refs)) {
+               kfree(msg->vcq_data);
+               kfree(msg);
+       }
+}
+
+/*
+ * Drop one reference on a transaction_obj; free when the last ref drops.
+ * Two references are held when the per-request timeout timer is armed:
+ * one for the TXQ owner (RH/cleanup), one for the timer callback.
+ * When no timer is armed, only the owner ref exists.
+ */
+static void txn_put(struct transaction_obj *txn)
+{
+       if (refcount_dec_and_test(&txn->refs))
+               kfree(txn);
+}
+
+/*
+ * Per-request async timeout callback (runs in softirq / timer context).
+ *
+ * This function ONLY marks the transaction state as TIMED_OUT via
+ * atomic cmpxchg and drops the timer reference.  It does NOT fire
+ * the completion callback, does NOT touch DMA buffers, and does NOT
+ * write any MBX registers.
+ *
+ * Rationale: the HW may still be writing to DMA buffers at this
+ * point.  Unmapping or freeing them here would be a use-after-free.
+ * The actual -ETIMEDOUT completion fires later, from process
+ * context, when the RH threaded IRQ pops the transaction after the
+ * HW finishes (or after MBX abort/drain on rmmod/suspend).
+ *
+ * MBX_COMMAND_ABORT is NOT issued here.  It is issued by the RH
+ * watchdog abort-stall detector under rh_process_lock, which
+ * serialises it against RESTART/FLUSH recovery commands.  Writing
+ * ABORT from timer softirq without the lock caused a race where
+ * concurrent timeouts clobbered an in-progress RESTART, wedging
+ * the mailbox.
+ *
+ * Context: softirq (timer).  Must not sleep.
+ */
+static void txn_timeout_fn(struct timer_list *t)
+{
+       struct transaction_obj *txn = timer_container_of(txn, t, timeout_timer);
+       int old;
+
+       old = atomic_cmpxchg(&txn->state, TXN_INFLIGHT, TXN_TIMED_OUT);
+       if (old == TXN_INFLIGHT) {
+               dev_err_ratelimited(cmh_dev(),
+                                   "tm: async timeout vcq=%u..%u mbx=%u cmd_id=0x%08x\n",
+                                   txn->first_vcq_id,
+                                   txn->last_vcq_id, txn->mailbox_idx,
+                                   txn->command_id);
+               cmh_stat_inc_async_timeout();
+       }
+
+       txn_put(txn); /* drop timer ref */
+}
+
+/**
+ * cmh_txn_finish() - Complete a popped transaction with FSM + timer cleanup
+ * @txn: Transaction object to complete
+ * @error: Error code from HW (0 on success)
+ *
+ * Three cases:
+ *   1. Normal: state INFLIGHT -> COMPLETE.  Fire callback with HW error.
+ *   2. Timed out: state already TXN_TIMED_OUT (timer marked it).
+ *      Fire callback with -ETIMEDOUT.  DMA is now safe because the
+ *      HW has finished and HEAD has advanced past this VCQ.
+ *   3. Force-cancel (drain/quiesce): handled by caller, not here.
+ */
+void cmh_txn_finish(struct transaction_obj *txn, int error)
+{
+       int old;
+
+       old = atomic_cmpxchg(&txn->state, TXN_INFLIGHT, TXN_COMPLETE);
+
+       /* Dequeue the timer if still pending; drop timer ref if we did */
+       if (timer_delete(&txn->timeout_timer))
+               txn_put(txn);
+
+       if (old == TXN_INFLIGHT) {
+               /* HW completion (may carry error) */
+               if (txn->complete)
+                       txn->complete(txn->completion_data, error);
+       } else if (old == TXN_TIMED_OUT) {
+               /* Timer won earlier; now HW is done -- deliver -ETIMEDOUT */
+               if (txn->complete)
+                       txn->complete(txn->completion_data, -ETIMEDOUT);
+       }
+
+       txn_put(txn); /* drop owner ref */
+}
+
+/* Mailbox Slot Addressing */
+
+/*
+ * Return a kernel-virtual pointer to the VCQ slot for the given vcqid.
+ * Mirrors CMH eSW's mbx_queue_addr() but uses the kernel virt_addr.
+ */
+static void *mbx_slot_ptr(struct cmh_mbx_config *mbx, u32 vcqid)
+{
+       u32 slot_mask = (1U << mbx->slots_log2) - 1U;
+       u32 slot_offset = (vcqid & slot_mask) << mbx->stride_log2;
+
+       return (u8 *)mbx->virt_addr + slot_offset;
+}
+
+/*
+ * Return the number of free slots in a mailbox queue.
+ */
+static u32 mbx_free_slots(struct cmh_mbx_config *mbx)
+{
+       u32 head = cmh_reg_read32(mbx->reg_base, R_MBX_QUEUE_HEAD);
+       u32 tail = cmh_reg_read32(mbx->reg_base, R_MBX_QUEUE_TAIL);
+       u32 size = 1U << mbx->slots_log2;
+
+       return size - (u32)(tail - head);
+}
+
+/**
+ * cmh_tm_max_cmds_per_vcq() - Return max commands per VCQ slot
+ *
+ * Scans all mailbox configurations and returns the minimum number of
+ * VCQ command entries that fit in a single slot, clamped to the
+ * MIN_VCQ_CMDS..MAX_VCQ_CMDS range.
+ *
+ * Return: Maximum usable VCQ command count per slot.
+ */
+u32 cmh_tm_max_cmds_per_vcq(void)
+{
+       u32 i, min_cmds = MAX_VCQ_CMDS;
+
+       for (i = 0; i < tm.cfg->mbx_count; i++) {
+               u32 stride = 1U << tm.cfg->mailboxes[i].stride_log2;
+               u32 cmds = stride / (u32)sizeof(struct vcq_cmd);
+
+               if (cmds < min_cmds)
+                       min_cmds = cmds;
+       }
+
+       if (min_cmds < MIN_VCQ_CMDS)
+               min_cmds = MIN_VCQ_CMDS;
+
+       return min_cmds;
+}
+
+/**
+ * cmh_tm_mbx_count() - Return the number of configured mailboxes
+ *
+ * Return: Number of mailboxes in the current configuration.
+ */
+u32 cmh_tm_mbx_count(void)
+{
+       return tm.cfg->mbx_count;
+}
+
+/* Core-to-MBX Affinity -- Config-Driven Multi-Instance Support */
+
+/*
+ * Per-core-type configuration table.  Each entry holds one or more
+ * (core_id, mbx_idx) instances.  Defaults: single instance per core
+ * type with the standard CORE_ID_* and MBX auto-assigned on first use
+ * (mbx_idx = -1).  Module params can override for explicit assignment
+ * and multi-instance support.
+ *
+ * Instances are selected round-robin, one per new crypto operation.
+ */
+
+struct core_instance_info {
+       u32             core_id;        /* VCQ dispatch core_id */
+       /*
+        * Assigned MBX index, or -1 (sentinel) for auto-assign on first
+        * use.  Uses atomic_t for a lockless once-only latch: the first
+        * caller does atomic_cmpxchg(&mbx_idx, -1, new_mbx); all later
+        * callers see the winning value via atomic_read().
+        */
+       atomic_t        mbx_idx;
+};
+
+struct core_type_info {
+       u32             num_instances;
+       struct core_instance_info instances[CMH_MAX_CORE_INSTANCES];
+       atomic_t        next_instance;  /* round-robin counter */
+};
+
+static struct core_type_info core_types[CMH_NUM_CORE_TYPES] = {
+       [CMH_CORE_HC]  = { .num_instances = 1,
+         .instances = { { .core_id = CORE_ID_HC,
+                          .mbx_idx = ATOMIC_INIT(-1) } } },
+       [CMH_CORE_AES] = { .num_instances = 1,
+         .instances = { { .core_id = CORE_ID_AES,
+                          .mbx_idx = ATOMIC_INIT(-1) } } },
+       [CMH_CORE_SM4] = { .num_instances = 1,
+         .instances = { { .core_id = CORE_ID_SM4,
+                          .mbx_idx = ATOMIC_INIT(-1) } } },
+       [CMH_CORE_SM3] = { .num_instances = 1,
+         .instances = { { .core_id = CORE_ID_SM3,
+                          .mbx_idx = ATOMIC_INIT(-1) } } },
+       [CMH_CORE_CCP] = { .num_instances = 1,
+         .instances = { { .core_id = CORE_ID_CCP,
+                          .mbx_idx = ATOMIC_INIT(-1) } } },
+       [CMH_CORE_PKE] = { .num_instances = 1,
+         .instances = { { .core_id = CORE_ID_PKE,
+                          .mbx_idx = ATOMIC_INIT(-1) } } },
+       [CMH_CORE_QSE] = { .num_instances = 1,
+         .instances = { { .core_id = CORE_ID_QSE,
+                          .mbx_idx = ATOMIC_INIT(-1) } } },
+       [CMH_CORE_HCQ] = { .num_instances = 1,
+         .instances = { { .core_id = CORE_ID_HCQ,
+                          .mbx_idx = ATOMIC_INIT(-1) } } },
+};
+
+/* Round-robin counter for auto-assigning MBXes to core instances */
+static atomic_t affinity_next_mbx = ATOMIC_INIT(0);
+
+/**
+ * cmh_tm_affinity_reset() - Reset core-to-MBX affinity state
+ *
+ * Clears all auto-assigned MBX bindings and resets round-robin
+ * counters for both the global MBX allocator and per-core-type
+ * instance selectors.
+ */
+void cmh_tm_affinity_reset(void)
+{
+       u32 i, j;
+
+       atomic_set(&affinity_next_mbx, 0);
+
+       /* Reset multi-instance table */
+       for (i = 0; i < CMH_NUM_CORE_TYPES; i++) {
+               struct core_type_info *ct = &core_types[i];
+
+               atomic_set(&ct->next_instance, 0);
+               for (j = 0; j < ct->num_instances; j++)
+                       atomic_set(&ct->instances[j].mbx_idx, -1);
+       }
+}
+
+/**
+ * cmh_core_default_id() - Return default core_id for a core type
+ * @type: Core type selector
+ *
+ * Returns the first-instance core_id for @type without advancing the
+ * round-robin counter.  Used by callers pinned to a fixed MBX (e.g.
+ * mgmt ioctls on MGMT_MBX) that only need the VCQ core_id field.
+ *
+ * Return: VCQ core_id value for the default instance of @type.
+ */
+u32 cmh_core_default_id(enum cmh_core_type type)
+{
+       if (WARN_ON_ONCE(type >= CMH_NUM_CORE_TYPES))
+               return 0;
+
+       return core_types[type].instances[0].core_id;
+}
+
+/**
+ * cmh_core_select_instance() - Select a core instance via round-robin
+ * @type: Core type selector
+ *
+ * Round-robin across configured instances, each permanently pinned to
+ * its MBX (auto-assigned on first use if mbx_idx was -1).
+ *
+ * Uses atomic_inc_return (pre-increment), so the very first call for a
+ * given type returns instance[1 % N], not instance[0].  That only shifts
+ * where the rotation starts; each instance is still picked once per N
+ * calls (a counter wrap may repeat or skip one pick if 2^32 % N != 0).
+ *
+ * The (u32) cast before the modulo ensures correct behaviour across
+ * the INT_MAX -> INT_MIN wraparound of atomic_t: (u32)INT_MIN =
+ * 0x80000000, and 0x80000000 % N still yields a valid index.
+ *
+ * Return: A core_dispatch with (core_id, mbx_idx) for the selected
+ *         instance.
+ */
+struct core_dispatch cmh_core_select_instance(enum cmh_core_type type)
+{
+       struct core_type_info *ct;
+       struct core_instance_info *inst;
+       struct core_dispatch d;
+       u32 idx, count;
+       s32 mbx, new_mbx, old;
+
+       if (WARN_ON_ONCE(type >= CMH_NUM_CORE_TYPES))
+               return (struct core_dispatch){ .core_id = 0, .mbx_idx = -1 };
+
+       ct = &core_types[type];
+       idx = (u32)atomic_inc_return(&ct->next_instance) % ct->num_instances;
+       inst = &ct->instances[idx];
+
+       d.core_id = inst->core_id;
+
+       mbx = atomic_read(&inst->mbx_idx);
+       if (mbx >= 0) {
+               d.mbx_idx = mbx;
+               return d;
+       }
+
+       /* Auto-assign on first use */
+       count = tm.cfg->mbx_count;
+       new_mbx = (s32)((u32)atomic_inc_return(&affinity_next_mbx) % count);
+       old = atomic_cmpxchg(&inst->mbx_idx, -1, new_mbx);
+
+       if (old >= 0) {
+               d.mbx_idx = old;
+       } else {
+               d.mbx_idx = new_mbx;
+               dev_info(cmh_dev(),
+                        "tm: core 0x%02x -> mbx %d (auto)\n",
+                        inst->core_id, new_mbx);
+       }
+
+       return d;
+}
+
+/**
+ * cmh_core_num_instances() - Return instance count for a core type
+ * @type: Core type selector
+ *
+ * Return: Number of configured instances for @type.
+ */
+u32 cmh_core_num_instances(enum cmh_core_type type)
+{
+       if (WARN_ON_ONCE(type >= CMH_NUM_CORE_TYPES))
+               return 1;
+
+       return core_types[type].num_instances;
+}
+
+/**
+ * cmh_core_get_instance() - Get dispatch info for a specific instance
+ * @type: Core type selector
+ * @idx: Instance index within @type
+ *
+ * Returns (core_id, mbx_idx) for a specific instance by index,
+ * without advancing the round-robin counter.  Triggers MBX auto-assign
+ * on first use if the instance has no MBX yet.
+ *
+ * Return: A core_dispatch with (core_id, mbx_idx) for instance @idx.
+ */
+struct core_dispatch cmh_core_get_instance(enum cmh_core_type type, u32 idx)
+{
+       struct core_type_info *ct;
+       struct core_instance_info *inst;
+       struct core_dispatch d;
+       u32 count;
+       s32 mbx, new_mbx, old;
+
+       if (WARN_ON_ONCE(type >= CMH_NUM_CORE_TYPES))
+               return (struct core_dispatch){ .core_id = 0, .mbx_idx = -1 };
+
+       ct = &core_types[type];
+       if (WARN_ON_ONCE(idx >= ct->num_instances))
+               return (struct core_dispatch){ .core_id = 0, .mbx_idx = -1 };
+
+       inst = &ct->instances[idx];
+       d.core_id = inst->core_id;
+
+       mbx = atomic_read(&inst->mbx_idx);
+       if (mbx >= 0) {
+               d.mbx_idx = mbx;
+               return d;
+       }
+
+       /* Auto-assign on first use */
+       count = tm.cfg->mbx_count;
+       new_mbx = (s32)((u32)atomic_inc_return(&affinity_next_mbx) % count);
+       old = atomic_cmpxchg(&inst->mbx_idx, -1, new_mbx);
+
+       if (old >= 0) {
+               d.mbx_idx = old;
+       } else {
+               d.mbx_idx = new_mbx;
+               dev_info(cmh_dev(),
+                        "tm: core 0x%02x -> mbx %d (auto)\n",
+                        inst->core_id, new_mbx);
+       }
+
+       return d;
+}
+
+/**
+ * cmh_tm_txq_completion_notify() - Wake TM thread after RH completion
+ *
+ * Wakes the TM thread after the Response Handler completes a
+ * transaction.  This unblocks the TM if it is waiting for a free MBX
+ * slot.  The generation counter bump ensures the wait_event predicate
+ * evaluates to true on the next check.
+ */
+void cmh_tm_txq_completion_notify(void)
+{
+       atomic_inc(&txq_completion_gen);
+       wake_up_interruptible(&tm.cmq_waitq);
+}
+
+/* Mailbox Selection */
+
+/*
+ * Select a mailbox with at least @slots_needed free slots (round-robin).
+ * Returns mailbox index, or -EAGAIN if no mailbox qualifies.
+ *
+ * Note: the free-slot check here is advisory -- actual slot availability
+ * is enforced by the ring arithmetic under dispatch_lock in submit_vcq().
+ * A TOCTOU gap exists between this check and the subsequent slot write,
+ * but it is safe: the worst case is a spurious -EAGAIN / backoff, never
+ * a ring overcommit.
+ */
+static int select_mailbox(u32 slots_needed)
+{
+       u32 count = tm.cfg->mbx_count;
+       u32 start = tm.next_mbx;
+       u32 i;
+
+       for (i = 0; i < count; i++) {
+               u32 idx = (start + i) % count;
+
+               if (cmh_rh_mbx_is_wedged(idx))
+                       continue;
+
+               if (mbx_free_slots(&tm.cfg->mailboxes[idx]) >= slots_needed) {
+                       tm.next_mbx = (idx + 1) % count;
+                       return (int)idx;
+               }
+               cmh_stat_inc_mbx_queue_full(idx);
+       }
+
+       return -EAGAIN;
+}
+
+/*
+ * Resolve the target mailbox for a command message.
+ *
+ * A message pinned to a valid MBX uses only that MBX: -EAGAIN if it is
+ * wedged or lacks free slots.  Unpinned messages use round-robin
+ * selection.  Returns mailbox index, or -EAGAIN if none qualifies.
+ */
+static int resolve_mbx(struct command_msg *msg)
+{
+       u32 slots = msg->num_vcqs > 0 ? msg->num_vcqs : 1;
+
+       if (msg->target_mbx >= 0 &&
+           (u32)msg->target_mbx < tm.cfg->mbx_count) {
+               if (cmh_rh_mbx_is_wedged((u32)msg->target_mbx))
+                       return -EAGAIN;
+               if (mbx_free_slots(&tm.cfg->mailboxes[msg->target_mbx]) >=
+                   slots)
+                       return msg->target_mbx;
+               return -EAGAIN; /* pinned MBX full, retry */
+       }
+
+       return select_mailbox(slots);
+}
+
+/* VCQ Submission */
+
+/*
+ * Write VCQ(s) into consecutive DMA slots and ring the doorbell.
+ *
+ * A command_msg may carry one or more VCQs (num_vcqs field).  For a
+ * multi-VCQ message the flat vcq_data array contains N VCQs laid out
+ * contiguously, each starting with its own header whose cmds field
+ * gives that VCQ's entry count.  All VCQs are written to consecutive
+ * MBX slots and tracked by a single transaction_obj.
+ *
+ * Returns 0 on success, negative errno on failure.
+ */
+static int submit_vcq(struct command_msg *msg, u32 mbx_idx)
+{
+       struct cmh_mbx_config *mbx = &tm.cfg->mailboxes[mbx_idx];
+       struct cmh_mbx_txq *txq = &tm.txqs[mbx_idx];
+       struct transaction_obj *txn;
+       const struct vcq_cmd *cmds = msg->vcq_data;
+       u32 num_vcqs = msg->num_vcqs > 0 ? msg->num_vcqs : 1;
+       u32 tail, stride_bytes, offset = 0;
+       unsigned long flags;
+       u32 v;
+
+       mutex_lock(&txq->dispatch_lock);
+
+       /* Read current tail (first VCQ ID) */
+       tail = cmh_reg_read32(mbx->reg_base, R_MBX_QUEUE_TAIL);
+       stride_bytes = 1U << mbx->stride_log2;
+
+       /* Allocate transaction tracking object */
+       txn = kzalloc_obj(*txn, GFP_KERNEL);
+       if (!txn) {
+               mutex_unlock(&txq->dispatch_lock);
+               return -ENOMEM;
+       }
+
+       /* Write each VCQ into a consecutive DMA slot */
+       for (v = 0; v < num_vcqs; v++) {
+               u32 vcq_cmds, copy_size;
+               void *slot;
+
+               /*
+                * For single-VCQ messages (backward compat) use the
+                * msg-level vcq_count.  For multi-VCQ, parse the per-VCQ
+                * header to find each VCQ's command count.
+                */
+               if (num_vcqs == 1) {
+                       vcq_cmds = msg->vcq_count;
+               } else {
+                       const struct vcq_hdr *hdr =
+                               (const struct vcq_hdr *)&cmds[offset].hwc;
+                       vcq_cmds = hdr->cmds;
+               }
+
+               copy_size = vcq_cmds * sizeof(struct vcq_cmd);
+               if (copy_size > stride_bytes) {
+                       dev_err(cmh_dev(), "tm: VCQ %u too large (%u bytes > stride %u)\n",
+                               v, copy_size, stride_bytes);
+                       mutex_unlock(&txq->dispatch_lock);
+                       kfree(txn);
+                       return -EMSGSIZE;
+               }
+
+               if (vcq_cmds < MIN_VCQ_CMDS || vcq_cmds > MAX_VCQ_CMDS) {
+                       dev_err(cmh_dev(), "tm: invalid vcq_count %u (range %u..%u)\n",
+                               vcq_cmds, MIN_VCQ_CMDS, MAX_VCQ_CMDS);
+                       mutex_unlock(&txq->dispatch_lock);
+                       kfree(txn);
+                       return -EINVAL;
+               }
+
+               /* Copy pre-built VCQ into DMA slot */
+               slot = mbx_slot_ptr(mbx, tail + v);
+               cmh_dma_write(slot, &cmds[offset], copy_size);
+
+               /* Zero remaining slot bytes to avoid stale data */
+               if (copy_size < stride_bytes)
+                       cmh_dma_zero((u8 *)slot + copy_size,
+                                    stride_bytes - copy_size);
+
+               offset += vcq_cmds;
+       }
+
+       /* Ensure VCQ data is visible in memory before advancing tail */
+       wmb();
+       /* FPGA: confirm DRAM accepted writes before SIC doorbell (cross-slave) */
+       cmh_dma_fence(mbx_slot_ptr(mbx, tail + num_vcqs - 1));
+
+       /* Fill in transaction spanning all VCQs */
+       txn->first_vcq_id = tail;
+       txn->last_vcq_id = tail + num_vcqs - 1;
+       txn->mailbox_idx = mbx_idx;
+       txn->command_id = msg->command_id;
+       txn->error_code = 0;
+       txn->complete = msg->complete;
+       txn->completion_data = msg->completion_data;
+       atomic_set(&txn->state, TXN_INFLIGHT);
+       timer_setup(&txn->timeout_timer, txn_timeout_fn, 0);
+       INIT_LIST_HEAD(&txn->list);
+
+       /*
+        * Set refcount: 2 if a per-txn timer will be armed (one ref for
+        * the TXQ owner that pops it, one for the timer callback), or 1
+        * if no timer (sync paths, or async_timeout_ms == 0).
+        */
+       if (msg->timeout_jiffies)
+               refcount_set(&txn->refs, 2);
+       else
+               refcount_set(&txn->refs, 1);
+
+       /* Enqueue transaction under spinlock */
+       spin_lock_irqsave(&txq->lock, flags);
+       list_add_tail(&txn->list, &txq->head);
+       txq->depth++;
+       spin_unlock_irqrestore(&txq->lock, flags);
+
+       /* Ring doorbell: advance tail by number of VCQs submitted */
+       cmh_reg_write32(tail + num_vcqs, mbx->reg_base, R_MBX_QUEUE_TAIL);
+
+       /* Arm per-request timeout after doorbell (async only) */
+       if (msg->timeout_jiffies)
+               mod_timer(&txn->timeout_timer,
+                         jiffies + msg->timeout_jiffies);
+
+       mutex_unlock(&txq->dispatch_lock);
+
+       cmh_stat_record_vcq_submit(mbx_idx, num_vcqs, txq->depth);
+
+       dev_dbg(cmh_dev(), "tm: submitted %u vcq(s) id=%u..%u to mbx[%u] tail_now=%u\n",
+               num_vcqs, tail, tail + num_vcqs - 1, mbx_idx,
+               tail + num_vcqs);
+
+       return 0;
+}
+
+/* TM Thread */
+
+static int cmh_tm_thread(void *data)
+{
+       struct command_msg *msg;
+       unsigned long flags;
+       int mbx_idx, ret;
+
+       dev_info(cmh_dev(), "tm: thread started\n");
+
+       while (!kthread_should_stop()) {
+               /* Wait for work or stop signal */
+               wait_event_interruptible(tm.cmq_waitq,
+                                        !list_empty(&tm.cmq) || kthread_should_stop());
+
+               if (kthread_should_stop())
+                       break;
+
+               /* Dequeue one command message */
+               spin_lock_irqsave(&tm.cmq_lock, flags);
+               if (list_empty(&tm.cmq)) {
+                       spin_unlock_irqrestore(&tm.cmq_lock, flags);
+                       continue;
+               }
+               msg = list_first_entry(&tm.cmq, struct command_msg, list);
+               list_del_init(&msg->list);
+               cmq_depth--;
+
+               /*
+                * Promote one backlogged request into the CMQ now that
+                * there is room.  Notify the crypto consumer with
+                * -EINPROGRESS so it knows the request has left backlog.
+                */
+               if (!list_empty(&tm.backlog)) {
+                       struct command_msg *bl;
+
+                       bl = list_first_entry(&tm.backlog,
+                                             struct command_msg, list);
+                       list_move_tail(&bl->list, &tm.cmq);
+                       tm.backlog_depth--;
+                       cmq_depth++;
+                       cmh_stat_record_cmq_post(cmq_depth);
+                       /*
+                        * Signal -EINPROGRESS while still under cmq_lock
+                        * so the consumer sees it before the final
+                        * completion.  The callback must be IRQ-safe
+                        * (required by the async contract anyway).
+                        */
+                       if (bl->complete)
+                               bl->complete(bl->completion_data,
+                                            -EINPROGRESS);
+               }
+
+               spin_unlock_irqrestore(&tm.cmq_lock, flags);
+
+               /* Select a mailbox: pinned or round-robin */
+               mbx_idx = resolve_mbx(msg);
+
+               if (mbx_idx < 0) {
+                       /*
+                        * No usable MBX -- re-enqueue at front and wait.
+                        *
+                        * Sleep on cmq_waitq with a short timeout.  The RH
+                        * calls cmh_tm_txq_completion_notify() after each
+                        * completed transaction, which bumps the generation
+                        * counter and wakes us immediately.  The timeout is
+                        * a safety net for missed wakeups.
+                        */
+                       int gen = atomic_read(&txq_completion_gen);
+                       unsigned long tmo;
+
+                       spin_lock_irqsave(&tm.cmq_lock, flags);
+                       list_add(&msg->list, &tm.cmq);
+                       cmq_depth++;
+                       spin_unlock_irqrestore(&tm.cmq_lock, flags);
+
+                       tmo = usecs_to_jiffies(CMH_TM_BACKOFF_MAX_US);
+                       wait_event_interruptible_timeout(tm.cmq_waitq,
+                                                        kthread_should_stop() ||
+                                                        atomic_read(&txq_completion_gen) != gen,
+                                                        tmo ?: 1);
+                       cmh_stat_inc_tm_backoff();
+                       continue;
+               }
+
+               /* Submit VCQ to selected mailbox */
+               WRITE_ONCE(msg->actual_mbx, mbx_idx);
+               ret = submit_vcq(msg, mbx_idx);
+               if (ret && msg->complete)
+                       msg->complete(msg->completion_data, ret);
+               command_msg_put(msg);
+       }
+
+       dev_info(cmh_dev(), "tm: thread stopped\n");
+       return 0;
+}
+
+/* Public Interface */
+
+/**
+ * cmh_tm_init() - Initialize the Transaction Manager subsystem
+ * @cfg: Hardware configuration describing mailboxes and core types
+ *
+ * Allocates per-mailbox transaction queues, applies core-type
+ * configuration, and starts the TM kthread.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_tm_init(struct cmh_config *cfg)
+{
+       u32 i, j;
+
+       if (cmq_max_depth == 0) {
+               dev_warn(cmh_dev(),
+                        "tm: cmq_max_depth=0 invalid, clamping to 1\n");
+               cmq_max_depth = 1;
+       }
+
+       tm.cfg = cfg;
+       tm.next_mbx = 0;
+       cmq_depth = 0;
+
+       cmh_tm_affinity_reset();
+
+       /* Apply per-core-type config from DT child nodes */
+       for (i = 0; i < CMH_NUM_CORE_TYPES; i++) {
+               struct cmh_core_type_cfg *src = &cfg->core_types[i];
+               struct core_type_info *ct = &core_types[i];
+
+               ct->num_instances = src->num_instances;
+               for (j = 0; j < src->num_instances; j++) {
+                       ct->instances[j].core_id = src->core_ids[j];
+                       if (src->mbx[j] >= 0)
+                               atomic_set(&ct->instances[j].mbx_idx,
+                                          src->mbx[j]);
+               }
+       }
+
+       /* Initialize CMQ and backlog */
+       INIT_LIST_HEAD(&tm.cmq);
+       INIT_LIST_HEAD(&tm.backlog);
+       tm.backlog_depth = 0;
+       spin_lock_init(&tm.cmq_lock);
+       init_waitqueue_head(&tm.cmq_waitq);
+
+       /* Allocate per-mailbox transaction queues */
+       tm.txqs = kcalloc(cfg->mbx_count, sizeof(*tm.txqs), GFP_KERNEL);
+       if (!tm.txqs)
+               return -ENOMEM;
+
+       for (i = 0; i < cfg->mbx_count; i++) {
+               INIT_LIST_HEAD(&tm.txqs[i].head);
+               spin_lock_init(&tm.txqs[i].lock);
+               mutex_init(&tm.txqs[i].dispatch_lock);
+               tm.txqs[i].depth = 0;
+       }
+
+       /* Start TM thread */
+       tm.thread = kthread_run(cmh_tm_thread, NULL, "cmh_tm");
+       if (IS_ERR(tm.thread)) {
+               int ret = PTR_ERR(tm.thread);
+
+               dev_err(cmh_dev(), "tm: failed to start thread (rc=%d)\n", ret);
+               tm.thread = NULL;
+               kfree(tm.txqs);
+               tm.txqs = NULL;
+               return ret;
+       }
+
+       WRITE_ONCE(tm.running, true);
+       dev_info(cmh_dev(),
+                "tm: initialized (%u mailboxes, cmq_depth=%u backlog=%u)\n",
+                cfg->mbx_count, cmq_max_depth, backlog_max_depth);
+
+       return 0;
+}
+
+/*
+ * cmh_tm_stop_and_drain_cmq() - Stop TM thread and drain CMQ/backlog
+ *
+ * Shared preamble for cmh_tm_cleanup() and cmh_tm_quiesce(): stops the
+ * kthread, marks the TM as not running, then splices the CMQ and backlog
+ * to local lists and cancels every pending command_msg outside the lock.
+ */
+static void cmh_tm_stop_and_drain_cmq(void)
+{
+       struct command_msg *msg, *tmp_msg;
+       unsigned long flags;
+       LIST_HEAD(cmq_drain);
+       LIST_HEAD(backlog_drain);
+
+       if (tm.thread) {
+               kthread_stop(tm.thread);
+               tm.thread = NULL;
+       }
+       WRITE_ONCE(tm.running, false);
+
+       spin_lock_irqsave(&tm.cmq_lock, flags);
+       list_splice_init(&tm.cmq, &cmq_drain);
+       cmq_depth = 0;
+       list_splice_init(&tm.backlog, &backlog_drain);
+       tm.backlog_depth = 0;
+       spin_unlock_irqrestore(&tm.cmq_lock, flags);
+
+       list_for_each_entry_safe(msg, tmp_msg, &cmq_drain, list) {
+               list_del(&msg->list);
+               if (msg->complete)
+                       msg->complete(msg->completion_data, -ECANCELED);
+               command_msg_put(msg);
+       }
+       list_for_each_entry_safe(msg, tmp_msg, &backlog_drain, list) {
+               list_del(&msg->list);
+               if (msg->complete)
+                       msg->complete(msg->completion_data, -ECANCELED);
+               command_msg_put(msg);
+       }
+}
+
+/**
+ * cmh_tm_cleanup() - Tear down the Transaction Manager subsystem
+ *
+ * Stops the TM kthread, drains the CMQ, backlog, and all per-mailbox
+ * transaction queues, notifying waiters with -ECANCELED or -ETIMEDOUT.
+ * Frees all TM-owned resources.
+ */
+void cmh_tm_cleanup(void)
+{
+       struct transaction_obj *txn, *tmp_txn;
+       unsigned long flags;
+       u32 i;
+
+       cmh_tm_stop_and_drain_cmq();
+
+       /* Drain per-mailbox transaction queues */
+       if (tm.txqs) {
+               for (i = 0; i < tm.cfg->mbx_count; i++) {
+                       LIST_HEAD(drain);
+                       int old;
+
+                       spin_lock_irqsave(&tm.txqs[i].lock, flags);
+                       list_splice_init(&tm.txqs[i].head, &drain);
+                       tm.txqs[i].depth = 0;
+                       spin_unlock_irqrestore(&tm.txqs[i].lock, flags);
+
+                       list_for_each_entry_safe(txn, tmp_txn, &drain, list) {
+                               list_del(&txn->list);
+
+                               if (timer_delete_sync(&txn->timeout_timer))
+                                       txn_put(txn);
+
+                               old = atomic_cmpxchg(&txn->state,
+                                                    TXN_INFLIGHT,
+                                                    TXN_COMPLETE);
+                               if (txn->complete) {
+                                       if (old == TXN_INFLIGHT)
+                                               txn->complete(txn->completion_data,
+                                                             -ECANCELED);
+                                       else if (old == TXN_TIMED_OUT)
+                                               txn->complete(txn->completion_data,
+                                                             -ETIMEDOUT);
+                               }
+
+                               txn_put(txn);
+                       }
+               }
+               kfree(tm.txqs);
+               tm.txqs = NULL;
+       }
+
+       dev_info(cmh_dev(), "tm: cleaned up\n");
+}
+
+/*
+ * Default drain timeout for suspend/quiesce (milliseconds).
+ * Covers all symmetric + PKE operations.  PQC callers (SLH-DSA sign
+ * at up to 120 s) should complete before system suspend is requested.
+ */
+static unsigned int drain_timeout_ms = 10000;
+
+/**
+ * cmh_tm_quiesce() - Quiesce the TM for suspend or shutdown
+ *
+ * Stops the TM kthread, drains the CMQ and backlog, then waits up to
+ * drain_timeout_ms for in-flight transactions to complete via the
+ * Response Handler.  Any remaining transactions after the deadline
+ * are force-cancelled.
+ */
+void cmh_tm_quiesce(void)
+{
+       struct transaction_obj *txn, *tmp_txn;
+       unsigned long deadline;
+       unsigned long flags;
+       u32 i;
+       bool drained = true;
+
+       cmh_tm_stop_and_drain_cmq();
+
+       /* Wait for in-flight TXQ transactions to complete via RH */
+       if (!tm.txqs)
+               goto out;
+
+       deadline = jiffies + msecs_to_jiffies(drain_timeout_ms);
+       do {
+               drained = true;
+               for (i = 0; i < tm.cfg->mbx_count; i++) {
+                       if (READ_ONCE(tm.txqs[i].depth)) {
+                               drained = false;
+                               break;
+                       }
+               }
+               if (drained)
+                       break;
+               usleep_range(1000, 2000);
+       } while (time_before(jiffies, deadline));
+
+       if (!drained) {
+               dev_warn(cmh_dev(),
+                        "tm: quiesce drain timeout (%u ms), cancelling remaining transactions\n",
+                        drain_timeout_ms);
+               for (i = 0; i < tm.cfg->mbx_count; i++) {
+                       LIST_HEAD(drain);
+                       int old;
+
+                       spin_lock_irqsave(&tm.txqs[i].lock, flags);
+                       list_splice_init(&tm.txqs[i].head, &drain);
+                       tm.txqs[i].depth = 0;
+                       spin_unlock_irqrestore(&tm.txqs[i].lock, flags);
+
+                       list_for_each_entry_safe(txn, tmp_txn, &drain, list) {
+                               list_del(&txn->list);
+
+                               if (timer_delete_sync(&txn->timeout_timer))
+                                       txn_put(txn);
+
+                               old = atomic_cmpxchg(&txn->state,
+                                                    TXN_INFLIGHT,
+                                                    TXN_COMPLETE);
+                               if (txn->complete) {
+                                       if (old == TXN_INFLIGHT)
+                                               txn->complete(txn->completion_data,
+                                                             -ECANCELED);
+                                       else if (old == TXN_TIMED_OUT)
+                                               txn->complete(txn->completion_data,
+                                                             -ETIMEDOUT);
+                               }
+
+                               txn_put(txn);
+                       }
+               }
+       }
+
+out:
+       dev_info(cmh_dev(), "tm: quiesced%s\n",
+                drained ? "" : " (forced)");
+}
+
+/**
+ * cmh_tm_resume() - Resume the TM after suspend
+ *
+ * Restarts the TM kthread after a prior cmh_tm_quiesce().
+ *
+ * Return: 0 on success, negative errno if kthread creation fails.
+ */
+int cmh_tm_resume(void)
+{
+       if (tm.thread || !tm.cfg)
+               return 0;
+
+       tm.thread = kthread_run(cmh_tm_thread, NULL, "cmh_tm");
+       if (IS_ERR(tm.thread)) {
+               int ret = PTR_ERR(tm.thread);
+
+               dev_err(cmh_dev(), "tm: resume kthread_run failed (%d)\n",
+                       ret);
+               tm.thread = NULL;
+               return ret;
+       }
+       WRITE_ONCE(tm.running, true);
+       dev_info(cmh_dev(), "tm: resumed\n");
+       return 0;
+}
+
+/**
+ * cmh_tm_try_cancel_command() - Cancel a queued command message
+ * @msg: Command message to cancel
+ *
+ * Attempts to remove @msg from the CMQ before the TM thread dequeues
+ * it.  Must be called while @msg is still valid (before the caller's
+ * stack frame that owns it is freed).
+ *
+ * Return: true if @msg was removed, false if already consumed by TM.
+ */
+bool cmh_tm_try_cancel_command(struct command_msg *msg)
+{
+       unsigned long flags;
+       bool cancelled = false;
+
+       spin_lock_irqsave(&tm.cmq_lock, flags);
+       if (!list_empty(&msg->list)) {
+               list_del_init(&msg->list);
+               cmq_depth--;
+               cancelled = true;
+       }
+       spin_unlock_irqrestore(&tm.cmq_lock, flags);
+
+       return cancelled;
+}
+
+/**
+ * cmh_tm_post_command() - Post a command message to the CMQ
+ * @msg: Pre-built command message to enqueue
+ *
+ * Enqueues @msg on the Command Message Queue and wakes the TM thread.
+ * If the CMQ is full, the message may be placed on the backlog queue
+ * (returning -EBUSY) if @msg->backlog_ok is set, or rejected with
+ * -EAGAIN.
+ *
+ * Return: 0 on success, -EBUSY if backlogged, -EAGAIN if full,
+ *         -ENODEV if TM is not running.
+ */
+int cmh_tm_post_command(struct command_msg *msg)
+{
+       unsigned long flags;
+
+       if (!READ_ONCE(tm.running))
+               return -ENODEV;
+
+       spin_lock_irqsave(&tm.cmq_lock, flags);
+       if (cmq_depth >= cmq_max_depth) {
+               if (msg->backlog_ok &&
+                   tm.backlog_depth < backlog_max_depth) {
+                       list_add_tail(&msg->list, &tm.backlog);
+                       tm.backlog_depth++;
+                       spin_unlock_irqrestore(&tm.cmq_lock, flags);
+                       return -EBUSY;
+               }
+               spin_unlock_irqrestore(&tm.cmq_lock, flags);
+               cmh_stat_inc_cmq_eagain();
+               return -EAGAIN;
+       }
+       INIT_LIST_HEAD(&msg->list);
+       list_add_tail(&msg->list, &tm.cmq);
+       cmq_depth++;
+       cmh_stat_record_cmq_post(cmq_depth);
+       spin_unlock_irqrestore(&tm.cmq_lock, flags);
+
+       wake_up_interruptible(&tm.cmq_waitq);
+       return 0;
+}
+
+/* Synchronous Submit (refcounted completion + timeout) */
+
+/*
+ * Heap-allocated sync context with refcounting.
+ *
+ * The completion callback may fire after the waiter has timed out and
+ * returned (e.g. during cmh_tm_cleanup on rmmod).  If the struct lived
+ * on the waiter's stack, the callback would touch freed memory --
+ * triggering a "BUG: spinlock bad magic" on the completion's spinlock.
+ *
+ * Two references are held: one by the waiter, one by the callback.
+ * Whichever runs last frees the struct.
+ */
+struct cmh_sync_ctx {
+       struct completion   done;
+       int                 error;
+       refcount_t          refs;   /* 2: waiter + callback */
+
+       /* Optional orphan cleanup -- called when the last ref drops after
+        * the waiter abandoned an in-flight VCQ (noabort path).  Lets the
+        * caller defer DMA-buffer cleanup until the eSW finishes writing.
+        */
+       void (*orphan_cb)(void *data);
+       void *orphan_data;
+};
+
+static void cmh_sync_ctx_put(struct cmh_sync_ctx *ctx)
+{
+       if (refcount_dec_and_test(&ctx->refs)) {
+               if (ctx->orphan_cb)
+                       ctx->orphan_cb(ctx->orphan_data);
+               kfree(ctx);
+       }
+}
+
+static void cmh_sync_complete(void *data, int error)
+{
+       struct cmh_sync_ctx *ctx = data;
+
+       ctx->error = error;
+       complete(&ctx->done);
+       cmh_sync_ctx_put(ctx);
+}
+
+/*
+ * Default VCQ completion timeout (milliseconds), tunable via debugfs
+ * config/vcq_timeout_ms.  Only affects the default used by
+ * cmh_tm_submit_sync() and cmh_tm_submit_sync_mbx(); callers that pass an
+ * explicit timeout_hz (e.g. RSA keygen) are not affected.
+ */
+static unsigned int vcq_timeout_ms = 2000;
+
+/*
+ * Extended timeout for slow crypto operations: RSA keygen, PQC
+ * keygen/sign/verify.  Tunable via debugfs config/slow_op_timeout_ms.
+ */
+static unsigned int slow_op_timeout_ms = 300000;
+
+/**
+ * cmh_tm_submit_sync_tmo() - Synchronous VCQ submit with timeout
+ * @vcq_cmds: Array of pre-built VCQ command entries
+ * @vcq_count: Total number of entries in @vcq_cmds
+ * @num_vcqs: Number of VCQs packed in @vcq_cmds
+ * @target_mbx: Pinned mailbox index, or -1 for round-robin
+ * @timeout_hz: Completion timeout in jiffies
+ *
+ * Posts a VCQ command to the TM, waits for completion up to
+ * @timeout_hz.  On timeout, issues MBX_COMMAND_ABORT if the VCQ is
+ * already in-flight.  Must be called from process context.
+ *
+ * Return: 0 on success, -ETIMEDOUT, or negative errno.
+ */
+int cmh_tm_submit_sync_tmo(struct vcq_cmd *vcq_cmds, u32 vcq_count,
+                          u32 num_vcqs, s32 target_mbx,
+                          unsigned long timeout_hz)
+{
+       struct cmh_sync_ctx *sync;
+       struct command_msg *msg;
+       unsigned long left;
+       int ret;
+
+       /*
+        * This path sleeps (GFP_KERNEL allocations + wait_for_completion)
+        * and is not safe from atomic / non-sleepable contexts.  All
+        * current callers run in process context (crypto API userspace or
+        * ioctl), so this is never violated today.  Catch it loudly if
+        * a future caller gets this wrong.
+        */
+       WARN_ON_ONCE(!in_task());
+
+       sync = kzalloc_obj(*sync, GFP_KERNEL);
+       if (!sync)
+               return -ENOMEM;
+
+       msg = kzalloc_obj(*msg, GFP_KERNEL);
+       if (!msg) {
+               kfree(sync);
+               return -ENOMEM;
+       }
+
+       init_completion(&sync->done);
+       sync->error = 0;
+       refcount_set(&sync->refs, 2);  /* waiter + callback */
+
+       /*
+        * Heap-copy the caller's VCQ array so the msg owns its data.
+        * This decouples VCQ lifetime from the caller's stack frame,
+        * which matters when the TM thread backs off (resolve_mbx()
+        * returns -EAGAIN) and re-enqueues the msg after the caller's
+        * wait_for_completion_timeout() expires.
+        */
+       msg->vcq_data = kmemdup(vcq_cmds, vcq_count * sizeof(*vcq_cmds),
+                               GFP_KERNEL);
+       if (!msg->vcq_data) {
+               kfree(msg);
+               kfree(sync);
+               return -ENOMEM;
+       }
+
+       INIT_LIST_HEAD(&msg->list);
+       if (WARN_ON_ONCE(vcq_count < MIN_VCQ_CMDS)) {
+               ret = -EINVAL;
+               goto err_free;
+       }
+       msg->command_id = vcq_cmds[1].id;  /* first real command's ID */
+       msg->vcq_count  = vcq_count;
+       msg->num_vcqs   = num_vcqs;
+       msg->target_mbx = target_mbx;
+       msg->actual_mbx = -1;
+       msg->complete   = cmh_sync_complete;
+       msg->completion_data = sync;
+       refcount_set(&msg->refs, 2);       /* waiter + TM subsystem */
+
+       ret = cmh_tm_post_command(msg);
+       if (ret) {
+err_free:
+               kfree(msg->vcq_data);
+               kfree(msg);
+               kfree(sync);  /* callback will never fire */
+               return ret;
+       }
+
+       dev_dbg(cmh_dev(), "tm: submit_sync posted cmd 0x%08x, waiting...\n",
+               msg->command_id);
+
+       left = wait_for_completion_timeout(&sync->done, timeout_hz);
+       if (!left) {
+               dev_err(cmh_dev(),
+                       "tm: submit_sync timeout (%lums) cmd=0x%08x\n",
+                       timeout_hz * 1000 / HZ, msg->command_id);
+               if (cmh_tm_try_cancel_command(msg)) {
+                       /*
+                        * Msg was still queued -- TM never saw it.
+                        * No txn will fire the callback, so drop both
+                        * sync refs and both msg refs (waiter + TM).
+                        */
+                       cmh_sync_ctx_put(sync);  /* no txn -> drop cb ref */
+                       cmh_sync_ctx_put(sync);  /* drop waiter ref */
+                       command_msg_put(msg);    /* matches refcount_set(2) */
+                       command_msg_put(msg);
+               } else {
+                       /*
+                        * TM has dequeued msg and the VCQ is in-flight.
+                        * Issue MBX_COMMAND_ABORT to force-stop the VCQ;
+                        * the RH will fire MBX_ERROR_IRQ, complete the
+                        * transaction with -EIO, and issue RESTART.
+                        *
+                        * cmh_rh_abort_mbx() serialises the write under
+                        * rh_process_lock, preventing clobber of a
+                        * concurrent RESTART/FLUSH from the watchdog.
+                        */
+                       s32 abrt_mbx = READ_ONCE(msg->actual_mbx);
+
+                       if (abrt_mbx >= 0 &&
+                           (u32)abrt_mbx < tm.cfg->mbx_count) {
+                               dev_warn(cmh_dev(),
+                                        "tm: aborting mbx[%d] cmd=0x%08x\n",
+                                        abrt_mbx, msg->command_id);
+                               cmh_rh_abort_mbx((u32)abrt_mbx);
+                       }
+
+                       /*
+                        * Wait for the RH completion (ABORT triggers
+                        * MBX_ERROR_IRQ within microseconds).  Fixed
+                        * 5 s ceiling -- not configurable because if
+                        * ABORT doesn't complete in this window the
+                        * HW is wedged and more waiting won't help.
+                        */
+                       left = wait_for_completion_timeout(&sync->done,
+                                                          5 * HZ);
+                       if (!left) {
+                               /*
+                                * ABORT did not complete within 5 s -- HW
+                                * is wedged.  The eSW may still be writing
+                                * to DMA buffers owned by the caller, so we
+                                * cannot let the caller free them.  Transfer
+                                * ownership to the sync_ctx orphan mechanism;
+                                * the RH callback (if it ever fires) will
+                                * free via orphan_cb.  If it never fires, the
+                                * buffers leak -- acceptable for a wedged HW
+                                * path that should never occur in practice.
+                                */
+                               dev_err(cmh_dev(),
+                                       "tm: abort timeout (5s) cmd=0x%08x - DMA buffers orphaned\n",
+                                       msg->command_id);
+                       }
+                       cmh_sync_ctx_put(sync);  /* drop waiter ref */
+                       command_msg_put(msg);    /* drop waiter ref on msg */
+               }
+               return -ETIMEDOUT;
+       }
+
+       ret = sync->error;
+       cmh_sync_ctx_put(sync);  /* drop waiter ref */
+       command_msg_put(msg);    /* drop waiter ref on msg */
+       return ret;
+}
+
+/**
+ * cmh_tm_submit_sync_mbx() - Synchronous VCQ submit on a target MBX
+ * @vcq_cmds: Array of pre-built VCQ command entries
+ * @vcq_count: Total number of entries in @vcq_cmds
+ * @num_vcqs: Number of VCQs packed in @vcq_cmds
+ * @target_mbx: Pinned mailbox index, or -1 for round-robin
+ *
+ * Convenience wrapper around cmh_tm_submit_sync_tmo() using the
+ * default vcq_timeout_ms module parameter.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_tm_submit_sync_mbx(struct vcq_cmd *vcq_cmds, u32 vcq_count,
+                          u32 num_vcqs, s32 target_mbx)
+{
+       return cmh_tm_submit_sync_tmo(vcq_cmds, vcq_count, num_vcqs,
+                                    target_mbx,
+                                    msecs_to_jiffies(vcq_timeout_ms));
+}
+
+/**
+ * cmh_tm_async_timeout_jiffies() - Default async per-request timeout
+ *
+ * Return: Timeout in jiffies from the async_timeout_ms module param,
+ *         or 0 if async timeouts are disabled.
+ */
+unsigned long cmh_tm_async_timeout_jiffies(void)
+{
+       return async_timeout_ms ? msecs_to_jiffies(async_timeout_ms) : 0;
+}
+
+/**
+ * cmh_tm_slow_op_timeout_jiffies() - Timeout for slow crypto ops
+ *
+ * Returns the extended timeout used for RSA keygen, PQC keygen/sign,
+ * and similar long-running operations.
+ *
+ * Return: Timeout in jiffies from the slow_op_timeout_ms module param.
+ */
+unsigned long cmh_tm_slow_op_timeout_jiffies(void)
+{
+       return msecs_to_jiffies(slow_op_timeout_ms);
+}
+
+/**
+ * cmh_tm_submit_async() - Asynchronous VCQ submission
+ * @vcq_cmds: Array of pre-built VCQ command entries
+ * @vcq_count: Total number of entries in @vcq_cmds
+ * @num_vcqs: Number of VCQs packed in @vcq_cmds
+ * @target_mbx: Pinned mailbox index, or -1 for round-robin
+ * @callback: Completion callback (see context note below)
+ * @callback_data: Opaque data passed to @callback
+ * @backlog_ok: Allow backlogging if CMQ is full
+ * @timeout_jiffies: Per-request timeout (0 = no timeout)
+ *
+ * Builds a command_msg, heap-copies the VCQ data, and posts it to the
+ * CMQ via cmh_tm_post_command().
+ *
+ * Callback context guarantee:
+ *   The @callback may be invoked from one of:
+ *   - RH threaded IRQ handler (process context, BH disabled)
+ *   - RH watchdog timer (softirq / timer context)
+ *   - TM kthread if submit_vcq() fails post-dequeue
+ *   - cmh_tm_cleanup()/cmh_tm_quiesce() during drain (process context)
+ *   It is NEVER invoked from hardirq context.
+ *
+ *   Because the watchdog path runs from timer softirq, callbacks
+ *   MUST be safe in atomic/softirq context: no mutex, no GFP_KERNEL,
+ *   no sleeping locks.  crypto_request_complete() is safe (documented
+ *   callable from any context).  kfree_sensitive() and
+ *   scatterwalk_map_and_copy() are also safe (non-sleeping).
+ *   Callers must not assume thread affinity (callback may run on any CPU).
+ *
+ * Unlike the _sync variants, this function:
+ *   - Does NOT allocate a cmh_sync_ctx or wait for completion
+ *   - Uses GFP_ATOMIC for internal allocations because the crypto API
+ *     may call ->encrypt/->decrypt/->hash_final from softirq context
+ *     (e.g. network stack via IPsec/TLS), where sleeping is not allowed.
+ *
+ * The command_msg is single-owner (refcount 1) -- the TM subsystem
+ * owns it after post and frees it after dispatching to the HW.
+ *
+ * DMA buffer ownership: the caller transfers ownership to the callback
+ * on return of 0 or -EBUSY.  On any other return, the caller must
+ * clean up DMA buffers itself -- the callback will never fire.
+ *
+ * Return: 0 on successful post, -EBUSY if backlogged, -ENOMEM,
+ *         -EINVAL, -EAGAIN, or -ENODEV on failure.
+ */
+int cmh_tm_submit_async(struct vcq_cmd *vcq_cmds, u32 vcq_count,
+                       u32 num_vcqs, s32 target_mbx,
+                       cmh_completion_fn callback, void *callback_data,
+                       bool backlog_ok, unsigned long timeout_jiffies)
+{
+       struct command_msg *msg;
+       int ret;
+
+       msg = kzalloc_obj(*msg, GFP_ATOMIC);
+       if (!msg)
+               return -ENOMEM;
+
+       msg->vcq_data = kmemdup(vcq_cmds,
+                               array_size(vcq_count, sizeof(*vcq_cmds)),
+                               GFP_ATOMIC);
+       if (!msg->vcq_data) {
+               kfree(msg);
+               return -ENOMEM;
+       }
+
+       INIT_LIST_HEAD(&msg->list);
+       if (WARN_ON_ONCE(vcq_count < MIN_VCQ_CMDS)) {
+               kfree(msg->vcq_data);
+               kfree(msg);
+               return -EINVAL;
+       }
+       msg->command_id      = vcq_cmds[1].id;
+       msg->vcq_count       = vcq_count;
+       msg->num_vcqs        = num_vcqs;
+       msg->target_mbx      = target_mbx;
+       msg->actual_mbx      = -1;
+       msg->complete        = callback;
+       msg->completion_data = callback_data;
+       msg->backlog_ok      = backlog_ok;
+       msg->timeout_jiffies = timeout_jiffies;
+       refcount_set(&msg->refs, 1);  /* sole owner: TM subsystem */
+
+       ret = cmh_tm_post_command(msg);
+       if (ret && ret != -EBUSY) {
+               kfree(msg->vcq_data);
+               kfree(msg);
+       }
+       return ret;
+}
+
+/**
+ * cmh_tm_submit_sync_noabort() - Sync submit without MBX abort on timeout
+ * @vcq_cmds: Array of pre-built VCQ command entries
+ * @vcq_count: Total number of entries in @vcq_cmds
+ * @num_vcqs: Number of VCQs packed in @vcq_cmds
+ * @timeout_hz: Completion timeout in jiffies
+ * @orphan_cb: Optional cleanup callback for abandoned DMA buffers
+ * @orphan_data: Opaque data passed to @orphan_cb
+ *
+ * On timeout, if the command was still queued it is cancelled and
+ * -EAGAIN is returned (caller may free all resources).  If the VCQ is
+ * already in-flight, the waiter drops its refs and returns -EINPROGRESS
+ * -- the RH callback will fire when the eSW finishes the VCQ and free
+ * the sync_ctx / msg via the refcount mechanism.
+ *
+ * @orphan_cb is invoked when the last ref on the sync_ctx drops after
+ * the waiter abandoned an in-flight VCQ, allowing the caller to defer
+ * DMA-buffer cleanup until the eSW finishes writing.
+ *
+ * This prevents a short-timeout command (e.g. DRBG GENERATE from the
+ * hwrng kthread) from aborting the entire MBX and killing unrelated
+ * long-running operations (e.g. SLH-DSA sign at 120 s).
+ *
+ * Return: 0 on success, -EAGAIN if cancelled from queue,
+ *         -EINPROGRESS if left in-flight, or negative errno.
+ */
+int cmh_tm_submit_sync_noabort(struct vcq_cmd *vcq_cmds, u32 vcq_count,
+                              u32 num_vcqs, unsigned long timeout_hz,
+                              void (*orphan_cb)(void *), void *orphan_data)
+{
+       struct cmh_sync_ctx *sync;
+       struct command_msg *msg;
+       unsigned long left;
+       int ret;
+
+       WARN_ON_ONCE(!in_task());
+
+       sync = kzalloc_obj(*sync, GFP_KERNEL);
+       if (!sync)
+               return -ENOMEM;
+
+       msg = kzalloc_obj(*msg, GFP_KERNEL);
+       if (!msg) {
+               kfree(sync);
+               return -ENOMEM;
+       }
+
+       init_completion(&sync->done);
+       sync->error = 0;
+       refcount_set(&sync->refs, 2);
+
+       INIT_LIST_HEAD(&msg->list);
+       if (WARN_ON_ONCE(vcq_count < MIN_VCQ_CMDS)) {
+               kfree(msg);
+               kfree(sync);
+               return -EINVAL;
+       }
+       msg->command_id = vcq_cmds[1].id;
+       msg->vcq_data = kmemdup(vcq_cmds, array_size(vcq_count, sizeof(*vcq_cmds)),
+                               GFP_KERNEL);
+       if (!msg->vcq_data) {
+               kfree(msg);
+               kfree(sync);
+               return -ENOMEM;
+       }
+       msg->vcq_count  = vcq_count;
+       msg->num_vcqs   = num_vcqs;
+       msg->target_mbx = -1;
+       msg->actual_mbx = -1;
+       msg->complete   = cmh_sync_complete;
+       msg->completion_data = sync;
+       refcount_set(&msg->refs, 2);
+
+       ret = cmh_tm_post_command(msg);
+       if (ret) {
+               kfree(msg->vcq_data);
+               kfree(msg);
+               kfree(sync);
+               return ret;
+       }
+
+       left = wait_for_completion_timeout(&sync->done, timeout_hz);
+       if (!left) {
+               if (cmh_tm_try_cancel_command(msg)) {
+                       /* Still queued -- TM never saw it, clean up fully */
+                       cmh_sync_ctx_put(sync);  /* drop cb ref */
+                       cmh_sync_ctx_put(sync);  /* drop waiter ref */
+                       command_msg_put(msg);    /* matches refcount_set(2) */
+                       command_msg_put(msg);
+                       return -EAGAIN;
+               }
+
+               /*
+                * In-flight: skip ABORT.  Transfer orphan cleanup
+                * ownership to sync_ctx -- the RH callback will
+                * eventually complete this VCQ, and when the last
+                * ref drops, orphan_cb frees any DMA buffers the
+                * eSW was still writing to.
+                */
+               dev_dbg_ratelimited(cmh_dev(),
+                                   "tm: noabort timeout (%ums) cmd=0x%08x, leaving in-flight\n",
+                                   jiffies_to_msecs(timeout_hz),
+                                   msg->command_id);
+               sync->orphan_cb   = orphan_cb;
+               sync->orphan_data = orphan_data;
+               cmh_sync_ctx_put(sync);
+               command_msg_put(msg);
+               return -EINPROGRESS;
+       }
+
+       ret = sync->error;
+       cmh_sync_ctx_put(sync);
+       command_msg_put(msg);
+       return ret;
+}
+
+/**
+ * cmh_tm_submit_sync() - Synchronous VCQ submit with default timeout
+ * @vcq_cmds: Array of pre-built VCQ command entries
+ * @vcq_count: Total number of entries in @vcq_cmds
+ * @num_vcqs: Number of VCQs packed in @vcq_cmds
+ *
+ * Convenience wrapper: submits via round-robin MBX selection with the
+ * default vcq_timeout_ms.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_tm_submit_sync(struct vcq_cmd *vcq_cmds, u32 vcq_count,
+                      u32 num_vcqs)
+{
+       return cmh_tm_submit_sync_mbx(vcq_cmds, vcq_count, num_vcqs, -1);
+}
+
+#define MBX_FLUSH_TIMEOUT_MS   1000
+#define MBX_FLUSH_POLL_MIN_US  10
+#define MBX_FLUSH_POLL_MAX_US  50
+
+/**
+ * cmh_tm_flush_mbx() - Issue MBX_COMMAND_FLUSH and wait for completion
+ * @mbx_idx: Mailbox index to flush
+ *
+ * Resets the eSW child mailbox state: clears the VCQ command queue,
+ * resets head/tail, and -- critically -- resets the child temp stack
+ * via mbx_hdr_init() (sets hdr->temp back to &cmds[MAX_VCQ_CMDS]).
+ *
+ * Why this is needed:
+ *   KIC derivation commands that output to SYS_REF_TEMP allocate on the
+ *   per-MBX child temp LIFO stack (mbx_alloc_temp, each costing
+ *   ROUND_UP(len,4)+56 bytes).  These allocations persist across VCQ
+ *   completions because mbx_vcq_done() does NOT reset the temp stack.
+ *   Without an explicit flush, sequential KIC-TEMP ioctls exhaust the
+ *   ~960-byte temp area and subsequent derives fail with ENOMEM.
+ *
+ * What is NOT affected:
+ *   KIC HW keys, datastore objects, DRBG state -- these survive the
+ *   flush.  Only the queue pointers and temp stack are reset.
+ *
+ * Concurrency:
+ *   Acquires the per-MBX dispatch_lock mutex to serialise with VCQ
+ *   dispatch in submit_vcq().  This prevents the flush from resetting
+ *   head/tail while the TM kthread is writing a VCQ to a DMA slot on
+ *   the same MBX.  The eSW clears R_MBX_COMMAND to zero once the flush
+ *   completes.
+ *
+ * Return: 0 on success, -EINVAL, -ENODEV, -EBUSY, or -ETIMEDOUT.
+ */
+int cmh_tm_flush_mbx(s32 mbx_idx)
+{
+       struct cmh_mbx_config *mbx;
+       struct cmh_mbx_txq *txq;
+       void __iomem *base;
+       u32 reg;
+       int ret;
+
+       if (!tm.cfg || mbx_idx < 0 || (u32)mbx_idx >= tm.cfg->mbx_count)
+               return -EINVAL;
+
+       mbx = &tm.cfg->mailboxes[mbx_idx];
+       base = mbx->reg_base;
+       if (!base)
+               return -ENODEV;
+
+       txq = &tm.txqs[mbx_idx];
+       mutex_lock(&txq->dispatch_lock);
+
+       /* Ensure no command is already pending */
+       if (cmh_reg_read32(base, R_MBX_COMMAND) != 0) {
+               mutex_unlock(&txq->dispatch_lock);
+               return -EBUSY;
+       }
+
+       cmh_reg_write32(MBX_COMMAND_FLUSH, base, R_MBX_COMMAND);
+
+       /* Poll until eSW clears the command register */
+       ret = read_poll_timeout(cmh_reg_read32, reg, reg == 0,
+                               MBX_FLUSH_POLL_MIN_US,
+                               MBX_FLUSH_TIMEOUT_MS * 1000,
+                               true, base, R_MBX_COMMAND);
+       if (ret)
+               dev_err(cmh_dev(), "mbx %u flush timeout (cmd=0x%08x)\n",
+                       mbx->instance,
+                       cmh_reg_read32(base, R_MBX_COMMAND));
+
+       mutex_unlock(&txq->dispatch_lock);
+       return ret;
+}
+
+/**
+ * cmh_vcq_pack_and_submit() - Pack payload into VCQs and submit sync
+ * @payload: Array of VCQ command entries (without headers)
+ * @count: Number of entries in @payload
+ * @packed: Caller-provided output buffer for packed VCQ data
+ * @max_packed: Size of @packed buffer in vcq_cmd entries
+ * @target_mbx: Pinned mailbox index, or -1 for round-robin
+ *
+ * Splits @payload into VCQ-sized chunks, prepends headers, and submits
+ * synchronously.
+ *
+ * Return: 0 on success, -EMSGSIZE if @packed is too small, or
+ *         negative errno from submit.
+ */
+int cmh_vcq_pack_and_submit(const struct vcq_cmd *payload, u32 count,
+                           struct vcq_cmd *packed, u32 max_packed,
+                           s32 target_mbx)
+{
+       u32 max_per_vcq = cmh_tm_max_cmds_per_vcq();
+       u32 max_payload_per = max_per_vcq - 1;
+       u32 num_vcqs = 0, total = 0, i = 0;
+
+       while (i < count) {
+               u32 chunk = min_t(u32, count - i, max_payload_per);
+               u32 vcq_cmds = chunk + 1;
+
+               if (total + vcq_cmds > max_packed)
+                       return -EMSGSIZE;
+
+               vcq_set_header(&packed[total], vcq_cmds);
+               memcpy(&packed[total + 1], &payload[i],
+                      chunk * sizeof(struct vcq_cmd));
+
+               total += vcq_cmds;
+               i += chunk;
+               num_vcqs++;
+       }
+
+       return cmh_tm_submit_sync_mbx(packed, total, num_vcqs, target_mbx);
+}
+
+/**
+ * cmh_vcq_pack_and_submit_async() - Pack payload and submit async
+ * @payload: Array of VCQ command entries (without headers)
+ * @count: Number of entries in @payload
+ * @packed: Caller-provided output buffer for packed VCQ data
+ * @max_packed: Size of @packed buffer in vcq_cmd entries
+ * @target_mbx: Pinned mailbox index, or -1 for round-robin
+ * @callback: Completion callback
+ * @callback_data: Opaque data passed to @callback
+ * @backlog_ok: Allow backlogging if CMQ is full
+ * @timeout_jiffies: Per-request timeout (0 = no timeout)
+ *
+ * Asynchronous variant of cmh_vcq_pack_and_submit().  Splits @payload
+ * into VCQ-sized chunks, prepends headers, and submits via
+ * cmh_tm_submit_async().
+ *
+ * Return: 0 on success, -EBUSY if backlogged, -EMSGSIZE if @packed
+ *         is too small, or negative errno from submit.
+ */
+int cmh_vcq_pack_and_submit_async(const struct vcq_cmd *payload, u32 count,
+                                 struct vcq_cmd *packed, u32 max_packed,
+                                 s32 target_mbx,
+                                 cmh_completion_fn callback,
+                                 void *callback_data,
+                                 bool backlog_ok,
+                                 unsigned long timeout_jiffies)
+{
+       u32 max_per_vcq = cmh_tm_max_cmds_per_vcq();
+       u32 max_payload_per = max_per_vcq - 1;
+       u32 num_vcqs = 0, total = 0, i = 0;
+
+       while (i < count) {
+               u32 chunk = min_t(u32, count - i, max_payload_per);
+               u32 vcq_cmds = chunk + 1;
+
+               if (total + vcq_cmds > max_packed)
+                       return -EMSGSIZE;
+
+               vcq_set_header(&packed[total], vcq_cmds);
+               memcpy(&packed[total + 1], &payload[i],
+                      chunk * sizeof(struct vcq_cmd));
+
+               total += vcq_cmds;
+               i += chunk;
+               num_vcqs++;
+       }
+
+       return cmh_tm_submit_async(packed, total, num_vcqs, target_mbx,
+                                  callback, callback_data, backlog_ok,
+                                  timeout_jiffies);
+}
+
+/**
+ * cmh_tm_peek_transaction() - Peek at the head of a mailbox TXQ
+ * @mbx_idx: Mailbox index to inspect
+ *
+ * Returns a pointer to the oldest in-flight transaction without
+ * removing it from the queue.  The caller must not free the returned
+ * object.
+ *
+ * Return: Pointer to the head transaction, or NULL if empty.
+ */
+struct transaction_obj *cmh_tm_peek_transaction(u32 mbx_idx)
+{
+       struct cmh_mbx_txq *txq;
+       struct transaction_obj *txn = NULL;
+       unsigned long flags;
+
+       if (!tm.txqs || mbx_idx >= tm.cfg->mbx_count)
+               return NULL;
+
+       txq = &tm.txqs[mbx_idx];
+
+       spin_lock_irqsave(&txq->lock, flags);
+       if (!list_empty(&txq->head))
+               txn = list_first_entry(&txq->head, struct transaction_obj,
+                                      list);
+       spin_unlock_irqrestore(&txq->lock, flags);
+
+       return txn;
+}
+
+/**
+ * cmh_tm_pop_transaction() - Remove and return the head of a MBX TXQ
+ * @mbx_idx: Mailbox index to pop from
+ *
+ * Dequeues the oldest in-flight transaction from the per-mailbox
+ * transaction queue.  The caller takes ownership and must eventually
+ * call cmh_txn_finish() or txn_put().
+ *
+ * Return: Pointer to the dequeued transaction, or NULL if empty.
+ */
+struct transaction_obj *cmh_tm_pop_transaction(u32 mbx_idx)
+{
+       struct cmh_mbx_txq *txq;
+       struct transaction_obj *txn;
+       unsigned long flags;
+
+       if (!tm.txqs || mbx_idx >= tm.cfg->mbx_count)
+               return NULL;
+
+       txq = &tm.txqs[mbx_idx];
+
+       spin_lock_irqsave(&txq->lock, flags);
+       if (list_empty(&txq->head)) {
+               spin_unlock_irqrestore(&txq->lock, flags);
+               return NULL;
+       }
+       txn = list_first_entry(&txq->head, struct transaction_obj, list);
+       list_del_init(&txn->list);
+       txq->depth--;
+       spin_unlock_irqrestore(&txq->lock, flags);
+
+       return txn;
+}
+
+/* -- debugfs timeout accessors ----------------------------------------- */
+
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+/**
+ * cmh_tm_timeout_async_ptr() - Return pointer to async_timeout_ms for debugfs
+ *
+ * Return: pointer to the static async_timeout_ms variable.
+ */
+unsigned int *cmh_tm_timeout_async_ptr(void)    { return &async_timeout_ms; }
+
+/**
+ * cmh_tm_timeout_vcq_ptr() - Return pointer to vcq_timeout_ms for debugfs
+ *
+ * Return: pointer to the static vcq_timeout_ms variable.
+ */
+unsigned int *cmh_tm_timeout_vcq_ptr(void)      { return &vcq_timeout_ms; }
+
+/**
+ * cmh_tm_timeout_slow_op_ptr() - Return pointer to slow_op_timeout_ms for debugfs
+ *
+ * Return: pointer to the static slow_op_timeout_ms variable.
+ */
+unsigned int *cmh_tm_timeout_slow_op_ptr(void)  { return &slow_op_timeout_ms; }
+
+/**
+ * cmh_tm_timeout_drain_ptr() - Return pointer to drain_timeout_ms for debugfs
+ *
+ * Return: pointer to the static drain_timeout_ms variable.
+ */
+unsigned int *cmh_tm_timeout_drain_ptr(void)    { return &drain_timeout_ms; }
+#endif
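
The chunking arithmetic shared by cmh_vcq_pack_and_submit() and its async variant can be tried in userspace. The following is only a sketch under stated assumptions: the `struct vcq_cmd` layout, the header encoding in `set_header()`, and the `max_per_vcq` argument are placeholders for illustration, not the driver's definitions, and the `max_per_vcq < 2` guard is an addition that the patch does not have.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the driver's struct vcq_cmd. */
struct vcq_cmd {
	uint32_t id;
	uint32_t arg;
};

/* Hypothetical header encoding: id 0, arg = number of entries in this VCQ. */
static void set_header(struct vcq_cmd *hdr, uint32_t n_cmds)
{
	hdr->id = 0;
	hdr->arg = n_cmds;
}

/*
 * Split @payload into chunks of at most (max_per_vcq - 1) entries and
 * prepend one header entry to each chunk.
 *
 * Returns the number of entries written to @packed, -EMSGSIZE if
 * @packed is too small, or -EINVAL if a VCQ cannot hold any payload.
 */
static int pack_vcqs(const struct vcq_cmd *payload, uint32_t count,
		     struct vcq_cmd *packed, uint32_t max_packed,
		     uint32_t max_per_vcq, uint32_t *num_vcqs)
{
	uint32_t total = 0, i = 0;

	if (max_per_vcq < 2)	/* guard added for this sketch only */
		return -EINVAL;

	*num_vcqs = 0;
	while (i < count) {
		uint32_t chunk = count - i;

		if (chunk > max_per_vcq - 1)
			chunk = max_per_vcq - 1;
		if (total + chunk + 1 > max_packed)
			return -EMSGSIZE;

		set_header(&packed[total], chunk + 1);
		memcpy(&packed[total + 1], &payload[i],
		       chunk * sizeof(*payload));

		total += chunk + 1;
		i += chunk;
		(*num_vcqs)++;
	}
	return (int)total;
}
```

With five payload entries and a limit of four entries per VCQ, this produces two VCQs of four and three entries (seven packed entries in total), so the caller's output buffer needs room for one extra entry per chunk.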
diff --git a/drivers/crypto/cmh/include/cmh.h b/drivers/crypto/cmh/include/cmh.h
new file mode 100644
index 000000000000..18150ba39129
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Top-level Device Structure
+ */
+
+#ifndef CMH_H
+#define CMH_H
+
+#include <linux/device.h>
+
+#include "cmh_config.h"
+
+#define CMH_DRV_NAME   "cmh"
+#define CMH_VERSION    "1.0.0"
+
+/**
+ * struct cmh_device - Top-level driver state for a CMH hardware instance
+ * @config: Hardware configuration (core mappings, MBX layout, feature flags)
+ * @dev:    Platform or parent device used for DMA and logging
+ */
+struct cmh_device {
+       struct cmh_config       config;
+       struct device          *dev;
+};
+
+#endif /* CMH_H */
diff --git a/drivers/crypto/cmh/include/cmh_aes_abi.h b/drivers/crypto/cmh/include/cmh_aes_abi.h
new file mode 100644
index 000000000000..78405bdf70ff
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_aes_abi.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- AES Core ABI Definitions
+ *
+ * Kernel-side definitions for the CMH AES ABI.
+ * All constants and layouts derived from the CMH eSW ABI.
+ */
+
+#ifndef CMH_AES_ABI_H
+#define CMH_AES_ABI_H
+
+#include <linux/types.h>
+
+/* AES Block Size */
+
+#define CMH_AES_BLOCK_SIZE     16U
+#define CMH_AES_IV_SIZE                16U
+
+/* AES Modes (per CMH AES ABI) */
+
+#define AES_MODE_ECB           1U
+#define AES_MODE_CBC           2U
+#define AES_MODE_CTR           3U
+#define AES_MODE_CFB           4U
+#define AES_MODE_GCM           5U
+#define AES_MODE_CMAC          6U
+#define AES_MODE_CCM           7U
+#define AES_MODE_XTS           8U
+
+/* AES Operations (per CMH AES ABI) */
+
+#define AES_OP_DECRYPT         1U
+#define AES_OP_ENCRYPT         2U
+
+/* AES Command IDs */
+
+#define AES_CMD_INIT           0x01U
+#define AES_CMD_AAD_UPDATE     0x02U
+#define AES_CMD_AAD_FINAL      0x03U
+#define AES_CMD_UPDATE         0x04U
+#define AES_CMD_FINAL          0x05U
+#define AES_CMD_SCATTERGATHER  0x06U
+#define AES_CMD_CCM_INIT       0x0AU
+#define AES_CMD_AAD_FINAL_AUTH 0x0EU
+
+/* AES Command Structures */
+
+struct aes_cmd_init {
+       u64 key;        /* datastore reference for the key */
+       u64 iv;         /* DMA address of the IV (or nonce in CCM) */
+       u32 keylen;     /* key length in bytes */
+       u32 ivlen;      /* IV length in bytes (0..16) */
+       u32 mode;       /* AES mode (AES_MODE_*) */
+       u32 op;         /* AES operation (AES_OP_*) */
+       u32 aadlen;     /* AAD length or 0 */
+       u32 iolen;      /* plaintext/ciphertext length */
+       u32 taglen;     /* tag length or 0 */
+};
+
+struct aes_cmd_aad_final {
+       u64 data;       /* DMA address of AAD data */
+       u32 datalen;    /* AAD data length */
+};
+
+struct aes_cmd_aad_final_auth {
+       u64 data;       /* DMA address of final AAD data */
+       u32 datalen;    /* final AAD data length */
+       u64 tag;                /* DMA address of tag */
+       u32 taglen;     /* tag length */
+};
+
+struct aes_cmd_update {
+       u64 input;      /* DMA address of input data */
+       u64 output;     /* DMA address of output data */
+       u32 iolen;      /* input/output data length */
+};
+
+struct aes_cmd_final {
+       u64 input;      /* DMA address of last input data */
+       u64 output;     /* DMA address of last output data */
+       u64 tag;        /* DMA address of tag (AEAD only) */
+       u32 iolen;      /* last input/output data length */
+       u32 taglen;     /* tag length (AEAD only) */
+};
+
+/* AES Command Union */
+
+union aes_cmd {
+       struct aes_cmd_init             cmd_init;
+       struct aes_cmd_update           cmd_update;
+       struct aes_cmd_final            cmd_final;
+       struct aes_cmd_aad_final        cmd_aad_final;
+       struct aes_cmd_aad_final_auth   cmd_aad_final_auth;
+};
+
+#endif /* CMH_AES_ABI_H */
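
Several of the command structures above place a u64 after a u32, so the compiler may insert padding that the comments do not mention. The following userspace sketch mirrors struct aes_cmd_aad_final_auth with stdint types and reports how much padding the host compiler adds. Whether the eSW expects the padded or a packed layout is not stated in this patch, so the sketch makes no claim about which is correct; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct aes_cmd_aad_final_auth from the patch. */
struct aes_cmd_aad_final_auth {
	uint64_t data;
	uint32_t datalen;
	uint64_t tag;
	uint32_t taglen;
};

/* Sum of the member sizes with no padding at all: 8 + 4 + 8 + 4. */
#define AAD_FINAL_AUTH_MEMBER_BYTES 24U

/* Total compiler-inserted padding (interior plus tail), in bytes. */
static size_t aad_final_auth_padding(void)
{
	return sizeof(struct aes_cmd_aad_final_auth) -
	       AAD_FINAL_AUTH_MEMBER_BYTES;
}

/* Interior gap between the end of datalen and the start of tag. */
static size_t aad_final_auth_gap(void)
{
	return offsetof(struct aes_cmd_aad_final_auth, tag) -
	       (offsetof(struct aes_cmd_aad_final_auth, datalen) +
		sizeof(uint32_t));
}
```

On ABIs that align uint64_t to 8 bytes (typical for 64-bit targets), the gap is 4 bytes and the total padding is 8. On ABIs that align it to 4 bytes inside structures, both are 0. If the layout matters to the firmware, a static_assert() on sizeof() and offsetof() in the header would pin it down at build time.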
diff --git a/drivers/crypto/cmh/include/cmh_ccp_abi.h b/drivers/crypto/cmh/include/cmh_ccp_abi.h
new file mode 100644
index 000000000000..4e3eb9feaec9
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_ccp_abi.h
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- CCP Core ABI Definitions
+ *
+ * Kernel-side definitions for the CMH CCP ABI.
+ * All constants and layouts derived from the CMH eSW ABI.
+ *
+ * The CCP core provides three modes:
+ *   - ChaCha20 stream cipher (skcipher)
+ *   - Poly1305 one-time authenticator (shash)
+ *   - ChaCha20-Poly1305 AEAD (RFC 8439, obsoletes RFC 7539)
+ */
+
+#ifndef CMH_CCP_ABI_H
+#define CMH_CCP_ABI_H
+
+#include <linux/types.h>
+
+/* CCP Block Sizes */
+
+#define CCP_CHACHA_BLOCK_SIZE  64U     /* ChaCha20 block = 512 bits */
+#define CCP_POLY_BLOCK_SIZE    16U     /* Poly1305 block = 128 bits */
+#define CCP_CTRNONCE_SIZE      16U     /* 4-byte LE counter + 12-byte nonce */
+#define CCP_POLY_KEY_SIZE      16U     /* r_key and s_key each 16 bytes */
+#define CCP_POLY_TAG_SIZE      16U     /* Poly1305 tag = 128 bits */
+#define CCP_CHACHA_CTR_LEN     4U      /* 32-bit counter */
+
+/* CCP Operations (per CMH CCP ABI) */
+
+#define CCP_OP_DECRYPT         1U
+#define CCP_OP_ENCRYPT         2U
+
+/* CCP Command IDs */
+
+#define CCP_CMD_CHACHA20_INIT  0x01U
+#define CCP_CMD_POLY1305_INIT  0x02U
+#define CCP_CMD_AEAD_INIT      0x03U
+#define CCP_CMD_AAD_UPDATE     0x04U
+#define CCP_CMD_AAD_FINAL      0x05U
+#define CCP_CMD_UPDATE         0x06U
+#define CCP_CMD_FINAL          0x07U
+#define CCP_CMD_SCATTERGATHER  0x08U
+/* CCP_CMD_FLUSH = VCQ_CMD_FLUSH (0xFF) -- defined in cmh_vcq.h */
+
+/* CCP Command Structures */
+
+struct ccp_cmd_chacha {
+       u64 key;                /* datastore reference for the key */
+       u64 ctrnonce;           /* DMA address of the 16-byte counter+nonce */
+       u32 keylen;             /* key length: 16 or 32 bytes */
+       u32 ctrnoncelen;        /* always 16 */
+       u32 ctrlen;             /* counter length: 4 bytes */
+       u32 op;                 /* CCP_OP_ENCRYPT or CCP_OP_DECRYPT */
+};
+
+struct ccp_cmd_poly {
+       u64 rkey;               /* datastore reference for the r key */
+       u64 skey;               /* datastore reference for the s key */
+       u32 rkeylen;            /* always 16 */
+       u32 skeylen;            /* always 16 */
+};
+
+struct ccp_cmd_aead {
+       u64 key;                /* datastore reference for the key */
+       u64 ctrnonce;           /* DMA address of the 16-byte counter+nonce */
+       u32 keylen;             /* key length: 32 bytes */
+       u32 ctrnoncelen;        /* always 16 */
+       u32 op;                 /* CCP_OP_ENCRYPT or CCP_OP_DECRYPT */
+};
+
+struct ccp_cmd_aad_update {
+       u64 aad;                /* DMA address of AAD data */
+       u32 aadlen;             /* AAD length (must be multiple of 16) */
+};
+
+struct ccp_cmd_aad_final {
+       u64 aad;                /* DMA address of last AAD data */
+       u32 aadlen;             /* last AAD length (any size) */
+};
+
+struct ccp_cmd_update {
+       u64 input;              /* DMA address of input data */
+       u64 output;             /* DMA address of output data */
+       u32 iolen;              /* input/output length */
+};
+
+struct ccp_cmd_final {
+       u64 input;              /* DMA address of last input data */
+       u64 output;             /* DMA address of last output data */
+       u64 tag;                /* DMA address of the 16-byte tag */
+       u32 iolen;              /* last input/output data length */
+       u32 taglen;             /* tag length (always 16) */
+};
+
+/* CCP Command Union */
+
+union ccp_cmd {
+       struct ccp_cmd_chacha   cmd_chacha;
+       struct ccp_cmd_poly     cmd_poly;
+       struct ccp_cmd_aead     cmd_aead;
+       struct ccp_cmd_aad_update cmd_aad_update;
+       struct ccp_cmd_aad_final cmd_aad_final;
+       struct ccp_cmd_update   cmd_update;
+       struct ccp_cmd_final    cmd_final;
+};
+
+#endif /* CMH_CCP_ABI_H */
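
The CCP_CTRNONCE_SIZE comment describes the 16-byte buffer as a 4-byte little-endian counter plus a 12-byte nonce. Below is a minimal userspace sketch of filling such a buffer. It assumes the counter occupies the first four bytes, which is how the comment reads but is not confirmed elsewhere in this patch; `ccp_build_ctrnonce()` and `CCP_NONCE_LEN` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CCP_CTRNONCE_SIZE	16U	/* from the patch */
#define CCP_CHACHA_CTR_LEN	4U	/* from the patch */
#define CCP_NONCE_LEN		12U	/* hypothetical: 16 - 4 */

/*
 * Fill @out with a 4-byte little-endian block counter followed by a
 * 12-byte nonce.  The counter-first ordering is an assumption.
 */
static void ccp_build_ctrnonce(uint8_t out[CCP_CTRNONCE_SIZE],
			       uint32_t counter,
			       const uint8_t nonce[CCP_NONCE_LEN])
{
	/* Store the counter byte by byte so host endianness does not matter. */
	out[0] = (uint8_t)(counter & 0xff);
	out[1] = (uint8_t)((counter >> 8) & 0xff);
	out[2] = (uint8_t)((counter >> 16) & 0xff);
	out[3] = (uint8_t)((counter >> 24) & 0xff);

	memcpy(out + CCP_CHACHA_CTR_LEN, nonce, CCP_NONCE_LEN);
}
```

In the driver the resulting buffer would then have to be DMA-mapped, since the `ctrnonce` field of struct ccp_cmd_chacha carries a DMA address and not the bytes themselves.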
diff --git a/drivers/crypto/cmh/include/cmh_config.h b/drivers/crypto/cmh/include/cmh_config.h
new file mode 100644
index 000000000000..6a9e629ed353
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_config.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Configuration Structures and Defaults
+ */
+
+#ifndef CMH_CONFIG_H
+#define CMH_CONFIG_H
+
+#include <linux/types.h>
+#include <linux/dma-mapping.h>
+
+#include "cmh_registers.h"
+#include "cmh_vcq.h"
+
+/* Limits */
+
+/*
+ * Max mailboxes the driver manages simultaneously.  The hardware address
+ * space supports CMH_MAX_MBX_INSTANCES (64) instance indices, but this
+ * compile-time constant caps how many the driver allocates DMA queues,
+ * IRQ slots, and per-transform cache entries for.  To manage more
+ * mailboxes (up to the HW max), increase this value and rebuild the LKM
+ * -- it cannot be changed via module parameters at runtime.
+ */
+#define CMH_MAX_CONFIGURED_MBX    16
+#define CMH_MAX_CORE_INSTANCES    8
+
+/* MBX setup parameter ranges (per CMH hardware specification) */
+#define CMH_MBX_SLOTS_LOG2_MIN        1
+#define CMH_MBX_SLOTS_LOG2_MAX        15
+#define CMH_MBX_STRIDE_LOG2_MIN       7
+#define CMH_MBX_STRIDE_LOG2_MAX       10
+
+/* Default Configuration Values */
+
+#define CMH_DEFAULT_MBX_COUNT         2
+#define CMH_DEFAULT_SLOTS_LOG2        5   /* 2^5 = 32 slots */
+#define CMH_DEFAULT_STRIDE_LOG2       7   /* 2^7 = 128 bytes per slot */
+#define CMH_DEFAULT_IRQ               (-1) /* polling mode */
+#define CMH_DEFAULT_FW_READY_TIMEOUT_MS  5000 /* 5s for mission mode */
+
+/* Per-Core-Type Instance Configuration */
+
+struct cmh_core_type_cfg {
+       u32     num_instances;
+       u32     core_ids[CMH_MAX_CORE_INSTANCES];
+       s32     mbx[CMH_MAX_CORE_INSTANCES]; /* -1 = auto-assign */
+};
+
+/* Per-Mailbox Configuration */
+
+struct cmh_mbx_config {
+       u32             instance;       /* 0-based MBX instance index (0..63) */
+       u32             slots_log2;     /* log2(slot count), range 1..15 */
+       u32             stride_log2;    /* log2(bytes per slot), range 7..10 */
+       u32             lock_val;       /* MBX lock token (non-zero while held) */
+       dma_addr_t      dma_handle;     /* DMA bus address from dma_alloc_coherent */
+       void           *virt_addr;      /* kernel virtual address of MBXQ buffer */
+       size_t          queue_size;     /* total queue buffer size in bytes */
+       void __iomem   *reg_base;       /* ioremap'd register base for this instance */
+};
+
+/* Global Device Configuration */
+
+struct cmh_config {
+       phys_addr_t                 sic_base;
+       size_t                      sic_size;
+       void __iomem               *sic_mapped;    /* ioremap'd SIC region */
+       struct device_node         *of_node;       /* DT node (may be NULL) */
+       u32                         mbx_count;
+       struct cmh_mbx_config       mailboxes[CMH_MAX_CONFIGURED_MBX];
+       int                         irq;           /* -1 = poll, else IRQ line */
+       unsigned int                fw_ready_timeout_ms; /* FW mission-mode timeout */
+       struct cmh_core_type_cfg    core_types[CMH_NUM_CORE_TYPES];
+};
+
+/* Module Parameter Interface */
+
+struct platform_device;
+
+/**
+ * cmh_config_init() - Populate config from module params and device-tree
+ * @cfg: Configuration structure to fill
+ * @pdev: Platform device (for DT properties and IRQ lookup)
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_config_init(struct cmh_config *cfg, struct platform_device *pdev);
+
+#endif /* CMH_CONFIG_H */
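
To illustrate the log2 mailbox parameters in cmh_config.h, here is a minimal
userspace sketch that range-checks slots_log2 and stride_log2 and derives a
buffer size. It assumes queue_size is simply slot count times stride, which
the header does not state outright, and cmh_mbx_queue_bytes() is a
hypothetical helper rather than a function from this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Ranges copied from cmh_config.h in this patch. */
#define CMH_MBX_SLOTS_LOG2_MIN   1
#define CMH_MBX_SLOTS_LOG2_MAX   15
#define CMH_MBX_STRIDE_LOG2_MIN  7
#define CMH_MBX_STRIDE_LOG2_MAX  10

/*
 * Hypothetical helper: returns the queue buffer size in bytes, or 0 if
 * either parameter is out of range.  Assumes size = slots * stride.
 */
static size_t cmh_mbx_queue_bytes(uint32_t slots_log2, uint32_t stride_log2)
{
	if (slots_log2 < CMH_MBX_SLOTS_LOG2_MIN ||
	    slots_log2 > CMH_MBX_SLOTS_LOG2_MAX)
		return 0;
	if (stride_log2 < CMH_MBX_STRIDE_LOG2_MIN ||
	    stride_log2 > CMH_MBX_STRIDE_LOG2_MAX)
		return 0;

	/* Largest case is 2^(15+10) = 32 MiB, well within a 32-bit size_t. */
	return (size_t)1 << (slots_log2 + stride_log2);
}
```

Under that assumption the defaults (slots_log2 = 5, stride_log2 = 7) give
32 * 128 = 4096 bytes per mailbox, and the largest permitted combination
gives 32 MiB.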
diff --git a/drivers/crypto/cmh/include/cmh_debugfs.h b/drivers/crypto/cmh/include/cmh_debugfs.h
new file mode 100644
index 000000000000..abaa837470c5
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_debugfs.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- debugfs Per-MBX and TM Counters
+ *
+ * Exposes diagnostic counters under /sys/kernel/debug/cmh/:
+ *
+ *   mbxN/vcqs_submitted      Total VCQs sent to MBX N
+ *   mbxN/vcqs_completed      Total completions received
+ *   mbxN/vcqs_errors         Total error completions
+ *   mbxN/queue_full_count    Times select_mailbox() skipped this MBX
+ *   mbxN/max_queue_depth     High-water mark of in-flight transactions
+ *
+ *   tm/cmq_posts             Total cmh_tm_post_command() calls
+ *   tm/cmq_depth_max         High-water mark of CMQ length
+ *   tm/cmq_eagain_count      Times CMQ was full (-EAGAIN)
+ *   tm/backoff_count         Times TM backed off (all MBX queues full)
+ *   tm/async_timeout_count   Async requests that timed out
+ *
+ * Counters are atomic64_t -- safe to read from any context.
+ * When CONFIG_CRYPTO_DEV_CMH_DEBUG is off, all functions become no-ops and the
+ * compiler eliminates the counter code entirely.
+ */
+
+#ifndef CMH_DEBUGFS_H
+#define CMH_DEBUGFS_H
+
+#include <linux/types.h>
+#include <linux/atomic.h>
+
+/* Per-Mailbox Statistics */
+
+struct cmh_mbx_stats {
+       atomic64_t vcqs_submitted;
+       atomic64_t vcqs_completed;
+       atomic64_t vcqs_errors;
+       atomic64_t queue_full_count;
+       atomic64_t max_queue_depth;
+};
+
+/* TM-Level Statistics */
+
+struct cmh_tm_stats {
+       atomic64_t cmq_posts;
+       atomic64_t cmq_depth_max;
+       atomic64_t cmq_eagain_count;
+       atomic64_t backoff_count;
+       atomic64_t async_timeout_count;
+};
+
+/**
+ * cmh_stat_update_max() - Atomically update a high-water mark counter
+ * @counter: atomic64_t counter to update
+ * @val: New candidate value
+ *
+ * Updates @counter to @val if @val exceeds the current maximum.
+ * Lock-free via atomic cmpxchg loop.
+ */
+static inline void cmh_stat_update_max(atomic64_t *counter, s64 val)
+{
+       s64 cur;
+
+       do {
+               cur = atomic64_read(counter);
+               if (val <= cur)
+                       return;
+       } while (atomic64_cmpxchg(counter, cur, val) != cur);
+}
+
+/* Interface (stub when CONFIG_CRYPTO_DEV_CMH_DEBUG is off) */
+
+struct cmh_config;
+
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+
+int  cmh_debugfs_init(struct cmh_config *cfg);
+void cmh_debugfs_cleanup(void);
+
+struct cmh_mbx_stats *cmh_debugfs_mbx_stats(u32 mbx_idx);
+struct cmh_tm_stats  *cmh_debugfs_tm_stats(void);
+
+#else /* !CONFIG_CRYPTO_DEV_CMH_DEBUG */
+
+static inline int  cmh_debugfs_init(struct cmh_config *c) { return 0; }
+static inline void cmh_debugfs_cleanup(void) {}
+static inline struct cmh_mbx_stats *cmh_debugfs_mbx_stats(u32 i) { return NULL; }
+static inline struct cmh_tm_stats  *cmh_debugfs_tm_stats(void)   { return NULL; }
+
+#endif /* CONFIG_CRYPTO_DEV_CMH_DEBUG */
+#endif /* CMH_DEBUGFS_H */
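
The lock-free high-water-mark update in cmh_stat_update_max() can be tried
outside the kernel. The sketch below is a userspace analogue that swaps in
C11 <stdatomic.h> for the kernel's atomic64_t API; stat_update_max() is a
hypothetical name, and the loop is restructured slightly because the C11
compare-exchange writes the observed value back on failure.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Userspace analogue of cmh_stat_update_max(): raise *counter to val if
 * val is larger, without taking a lock.
 */
static void stat_update_max(_Atomic int64_t *counter, int64_t val)
{
	int64_t cur = atomic_load(counter);

	/*
	 * On failure, atomic_compare_exchange_weak() stores the value it
	 * found into cur, so the loop condition re-checks against the
	 * latest maximum written by another thread.
	 */
	while (val > cur) {
		if (atomic_compare_exchange_weak(counter, &cur, val))
			break;
	}
}
```

As in the kernel version, a caller whose value is not a new maximum returns
without writing, so concurrent updaters only contend when the mark actually
moves.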
diff --git a/drivers/crypto/cmh/include/cmh_dma.h b/drivers/crypto/cmh/include/cmh_dma.h
new file mode 100644
index 000000000000..7dd0d8311785
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_dma.h
@@ -0,0 +1,219 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- DMA Interface
+ *
+ * Platform-independent DMA operations for the CMH crypto accelerator.
+ * All functions are implemented in cmh_dma.c (standard kernel DMA API).
+ *
+ * Alternate backends may be linked in place of cmh_dma.c for
+ * non-standard platforms.  Such backends must implement the same
+ * symbol set and may use different allocation and mapping semantics
+ * (e.g. pool-based alloc/free instead of address translation).
+ */
+
+#ifndef CMH_DMA_H
+#define CMH_DMA_H
+
+#include <linux/dma-mapping.h>
+#include <linux/types.h>
+
+#include "cmh_vcq.h"
+
+struct platform_device;
+
+/**
+ * cmh_dma_init() - Initialize the DMA backend
+ * @pdev: Platform device (provides struct device for DMA ops)
+ *
+ * Called early in .probe().  The standard backend stores the device
+ * pointer; alternate backends may set up additional resources.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_dma_init(struct platform_device *pdev);
+
+/**
+ * cmh_dma_cleanup() - Tear down the DMA backend
+ *
+ * Called in .remove() and error paths.  Releases any resources
+ * allocated by cmh_dma_init().
+ */
+void cmh_dma_cleanup(void);
+
+/**
+ * cmh_dev() - Global device accessor
+ *
+ * Returns the struct device * of the bound platform device.
+ * Valid only between cmh_dma_init() and cmh_dma_cleanup().
+ *
+ * Return: struct device pointer, or NULL outside that lifecycle.
+ */
+struct device *cmh_dev(void);
+
+/* Streaming DMA map / unmap (short-lived per-request buffers) */
+
+dma_addr_t cmh_dma_map_single(void *buf, size_t size,
+                             enum dma_data_direction dir);
+void cmh_dma_unmap_single(dma_addr_t addr, size_t size,
+                         enum dma_data_direction dir);
+
+/*
+ * Sync a DMA_FROM_DEVICE buffer so the CPU sees device-written data.
+ *
+ * Required before reading *buf when SWIOTLB bounce buffering is active
+ * (e.g. arm64 without IOMMU): the device writes to the bounce buffer,
+ * not the original allocation, so the CPU must sync before access.
+ * On architectures without bounce buffers (e.g. rv64) this is a no-op.
+ *
+ * Call between cmh_tm_submit_sync() and the first CPU read of the buffer,
+ * while the mapping is still live (before cmh_dma_unmap_single).
+ */
+void cmh_dma_sync_for_cpu(dma_addr_t addr, size_t size,
+                         enum dma_data_direction dir);
+
+/*
+ * Sync a DMA_TO_DEVICE buffer so the device sees CPU-written data.
+ *
+ * Required after CPU writes to a mapped streaming buffer (e.g. SG
+ * descriptor arrays, which are mapped before being filled in because
+ * .lli is computed from items_dma).  Call before the device reads.
+ */
+void cmh_dma_sync_for_device(dma_addr_t addr, size_t size,
+                            enum dma_data_direction dir);
+
+int cmh_dma_map_error(dma_addr_t addr);
+
+/* Coherent DMA alloc / free (long-lived MBX queue buffers) */
+
+void *cmh_dma_alloc(size_t size, dma_addr_t *handle, gfp_t gfp);
+void cmh_dma_free(size_t size, void *virt, dma_addr_t handle);
+
+/**
+ * cmh_dma_write() - Copy data into a DMA-allocated buffer
+ * @dst: Destination pointer (from cmh_dma_alloc)
+ * @src: Source kernel buffer
+ * @len: Number of bytes to copy
+ *
+ * Copies @len bytes from @src to @dst.  @dst must have been obtained
+ * from cmh_dma_alloc().  Abstracted to allow platforms with non-standard
+ * DMA buffer access semantics.
+ */
+void cmh_dma_write(void *dst, const void *src, size_t len);
+
+/**
+ * cmh_dma_fence() - Fence preceding writes to DMA-allocated memory
+ * @ptr: Any pointer into the region that was written
+ *
+ * Ensures all preceding CPU writes to DMA memory are committed to the
+ * target memory controller before subsequent MMIO register writes.
+ *
+ * Required on FPGA platforms where DMA memory and device control
+ * registers reside on different AXI slaves -- a CPU-side wmb() only
+ * orders store dispatch, not arrival at the target.  A read from the
+ * DMA memory slave forces the memory controller to serialize behind
+ * all preceding writes from this CPU before responding, guaranteeing
+ * the data is committed before the doorbell register write is issued.
+ *
+ * On standard DMA API platforms (cache-coherent), this is a no-op.
+ */
+void cmh_dma_fence(void *ptr);
+
+/**
+ * cmh_dma_zero() - Zero a DMA-allocated buffer
+ * @dst: Destination pointer (from cmh_dma_alloc)
+ * @len: Number of bytes to zero
+ */
+void cmh_dma_zero(void *dst, size_t len);
+
+/*
+ * CMH eSW scatter-gather chain -- built with proper DMA mappings.
+ *
+ * The CMH eSW DMAC walks a linked list of dma_scattergather_item
+ * descriptors.  Each .src is the DMA address of an input buffer;
+ * each .lli is the DMA address of the next descriptor (0 = end).
+ *
+ * The descriptor array uses streaming DMA (kmalloc + dma_map_single)
+ * so that cmh_dma_free_sg() is safe from any context -- including
+ * BH-disabled completion callbacks where dma_free_coherent's
+ * vunmap() path would crash on non-coherent architectures.
+ */
+
+/* Input descriptor for cmh_dma_build_sg() -- one per data buffer */
+struct cmh_dma_buf {
+       void *data;
+       u32   len;
+};
+
+/* Opaque handle returned by cmh_dma_build_sg(); pass to cmh_dma_free_sg() */
+struct cmh_sg_map {
+       struct dma_scattergather_item *items;   /* CPU virtual address */
+       dma_addr_t  items_dma;                  /* DMA address (pass to GATHER cmd) */
+       size_t      items_size;                 /* allocation size */
+       u32         count;
+       struct {
+               dma_addr_t dma;
+               u32        len;
+       } bufs[];                               /* per-entry source DMA handles */
+};
+
+/**
+ * cmh_dma_build_sg() - Build a DMA-mapped CMH eSW SG chain
+ * @bufs: Array of kernel buffer descriptors (data pointer + length)
+ * @count: Number of entries in @bufs (must be > 0; returns NULL for 0)
+ * @gfp: Allocation flags (GFP_KERNEL or GFP_ATOMIC)
+ *
+ * Allocates a dma_scattergather_item chain using streaming DMA
+ * (kmalloc + dma_map_single), DMA-maps each source buffer, and
+ * links the descriptors.
+ * The returned cmh_sg_map->items_dma is the address to pass to
+ * vcq_add_hc_gather() (or any core's scatter-gather command).
+ *
+ * Caller contract:
+ *   - Each bufs[i].data must point to DMA-mappable memory (kmalloc,
+ *     page-allocated, or vmalloc with DMA support).  Stack buffers
+ *     are NOT safe.
+ *   - Each bufs[i].len must be > 0.
+ *   - The returned cmh_sg_map must remain alive (not freed) until
+ *     the hardware completes the scatter-gather operation.  Only then
+ *     may cmh_dma_free_sg() be called.
+ *   - There is no hardware-imposed limit on @count, but callers are
+ *     responsible for bounding it to avoid excessive DMA mappings.
+ *     In practice, hash uses <= 2 entries (partial + new data).
+ *
+ * Return: Opaque cmh_sg_map handle, or NULL on allocation/mapping failure.
+ */
+struct cmh_sg_map *cmh_dma_build_sg(const struct cmh_dma_buf *bufs, u32 count,
+                                   gfp_t gfp);
+
+/**
+ * cmh_dma_free_sg() - Unmap all buffers and free the SG chain
+ * @sgm: Handle from cmh_dma_build_sg(), or NULL (no-op)
+ */
+void cmh_dma_free_sg(struct cmh_sg_map *sgm);
+
+/*
+ * Orphan-DMA context -- generic helper for the noabort submit path.
+ *
+ * When cmh_tm_submit_sync_noabort() times out with a VCQ still
+ * in-flight, the eSW will continue writing to DMA buffers after the
+ * caller returns.  Callers wrap their DMA state in this struct and
+ * pass cmh_dma_orphan_free as the orphan_cb -- the RH callback frees
+ * the mapping + buffer when the VCQ eventually completes.
+ *
+ * Drain guarantee: cmh_tm_cleanup() calls timer_delete_sync() on each
+ * TXN timeout timer and splices all TXQ entries before invoking their
+ * completion callbacks.  This ensures no orphan callback can race with
+ * or run after TM cleanup completes -- by that point every in-flight
+ * transaction has been force-completed and its orphan_cb invoked.
+ */
+struct cmh_dma_orphan {
+       void                    *buf;
+       dma_addr_t               addr;
+       size_t                   len;
+       enum dma_data_direction  dir;
+};
+
+void cmh_dma_orphan_free(void *data);
+
+#endif /* CMH_DMA_H */
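
The scatter-gather comment in cmh_dma.h says each .src holds the DMA address
of an input buffer and each .lli holds the DMA address of the next
descriptor, with 0 marking the end. A minimal userspace sketch of that
linking step follows. The struct sg_item layout, the len field and sg_link()
are assumptions made for illustration; the real dma_scattergather_item is
defined elsewhere in the series and may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor layout, not the real dma_scattergather_item. */
struct sg_item {
	uint64_t src;	/* device address of this input buffer */
	uint64_t lli;	/* device address of the next descriptor, 0 = end */
	uint32_t len;	/* buffer length in bytes (assumed field) */
};

/*
 * Link count descriptors stored contiguously at device address items_dma.
 * src_dma[] and len[] stand in for buffers that were already DMA-mapped.
 */
static void sg_link(struct sg_item *items, uint64_t items_dma,
		    const uint64_t *src_dma, const uint32_t *len,
		    uint32_t count)
{
	uint32_t i;

	for (i = 0; i < count; i++) {
		items[i].src = src_dma[i];
		items[i].len = len[i];
		/* The next descriptor is one array element further on. */
		items[i].lli = (i + 1 < count) ?
			items_dma + (uint64_t)(i + 1) * sizeof(items[0]) : 0;
	}
}
```

Because .lli is derived from items_dma, the descriptor array has to be
mapped before its contents are written, which is the case the
cmh_dma_sync_for_device() comment describes.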
diff --git a/drivers/crypto/cmh/include/cmh_drbg_abi.h b/drivers/crypto/cmh/include/cmh_drbg_abi.h
new file mode 100644
index 000000000000..d4cebfe83d4b
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_drbg_abi.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- DRBG Core ABI Definitions
+ *
+ * Kernel-side definitions for the CMH DRBG ABI.
+ * All constants and layouts derived from the CMH eSW ABI.
+ */
+
+#ifndef CMH_DRBG_ABI_H
+#define CMH_DRBG_ABI_H
+
+#include <linux/types.h>
+
+/* DRBG Commands */
+
+#define DRBG_CMD_CONFIG         0x01U
+#define DRBG_CMD_GENERATE       0x02U
+#define DRBG_CMD_DATASTORE      0x03U
+#define DRBG_CMD_RESET          0x04U
+
+/* DRBG Entropy Ratio (per CMH DRBG ABI) */
+
+#define DRBG_ENTROPY_RATIO_ONE          0U
+#define DRBG_ENTROPY_RATIO_ONE_HALF     1U
+#define DRBG_ENTROPY_RATIO_ONE_THIRD    2U
+#define DRBG_ENTROPY_RATIO_ONE_FOURTH   3U
+
+/* DRBG Security Strength (per CMH DRBG ABI) */
+
+#define DRBG_SECURITY_STRENGTH_128      0x00U
+#define DRBG_SECURITY_STRENGTH_256      0x10U
+
+/* DRBG Personalization Data Length */
+
+#define DRBG_PADATA_LEN         16U
+
+/* DRBG Command Structures */
+
+struct drbg_cmd_config {
+       u32 entropy_ratio;      /* drbg_entropy_ratio value */
+       u32 security_strength;  /* drbg_security_strength value */
+       u8  padata[DRBG_PADATA_LEN];
+};
+
+struct drbg_cmd_generate {
+       u64 dst;                /* DMA physical address for output */
+       u32 len;                /* requested output length in bytes */
+       u8  padata[DRBG_PADATA_LEN];
+};
+
+struct drbg_cmd_datastore {
+       u64 ref;                /* datastore reference */
+       u32 len;                /* data length in bytes */
+       u32 type;               /* datastore type */
+       u8  padata[DRBG_PADATA_LEN];
+};
+
+/* DRBG Command Union */
+
+union drbg_cmd {
+       struct drbg_cmd_config    cmd_config;
+       struct drbg_cmd_generate  cmd_generate;
+       struct drbg_cmd_datastore cmd_datastore;
+};
+
+#endif /* CMH_DRBG_ABI_H */
diff --git a/drivers/crypto/cmh/include/cmh_eac_abi.h b/drivers/crypto/cmh/include/cmh_eac_abi.h
new file mode 100644
index 000000000000..f0ebd3de1fb4
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_eac_abi.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- EAC (Error and Alarm Controller) ABI Definitions
+ *
+ * Kernel-side definitions for the CMH EAC ABI.
+ * All constants and layouts derived from the CMH eSW ABI.
+ */
+
+#ifndef CMH_EAC_ABI_H
+#define CMH_EAC_ABI_H
+
+#include <linux/types.h>
+
+/* EAC Commands */
+
+#define EAC_CMD_READ           0x01U
+
+/* EAC Read Response -- eSW writes this to the DMA destination buffer */
+
+struct eac_read_rsp {
+       u64 mailbox_notification; /* bitmask: MBX that raised safety notif */
+       u32 hw_error;             /* bitmask: HWC that raised error */
+       u32 hw_nmi;               /* bitmask: HWC that raised NMI */
+       u32 hw_panic;             /* bitmask: HWC that raised HW panic */
+       u32 safety_fatal;         /* bitmask: HWC that raised fatal safety */
+       u32 safety_notification;  /* bitmask: HWC that raised safety notif */
+       u32 sw_info0;             /* eSW tracing information */
+       u32 sw_info1;             /* eSW tracing information */
+       u32 sram_bank_errors[4];  /* correctable ECC error counts per bank */
+};
+
+/* EAC Command Structures */
+
+struct eac_cmd_read {
+       u64 dst;        /* DMA destination for eac_read_rsp */
+       u32 len;        /* must be >= sizeof(struct eac_read_rsp) */
+};
+
+union eac_cmd {
+       struct eac_cmd_read cmd_read;
+};
+
+#endif /* CMH_EAC_ABI_H */
diff --git a/drivers/crypto/cmh/include/cmh_hc_abi.h b/drivers/crypto/cmh/include/cmh_hc_abi.h
new file mode 100644
index 000000000000..4e8c5ea3c69c
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_hc_abi.h
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Hash Core (HC) ABI Definitions
+ *
+ * Kernel-side definitions for the CMH HC (Hash Core) ABI.
+ * All constants and layouts derived from the CMH eSW ABI.
+ */
+
+#ifndef CMH_HC_ABI_H
+#define CMH_HC_ABI_H
+
+#include <linux/bits.h>
+#include <linux/types.h>
+
+/* HC Commands */
+
+#define HC_CMD_INIT             0x01U
+#define HC_CMD_HMAC             0x02U
+#define HC_CMD_UPDATE           0x03U
+#define HC_CMD_FINAL            0x04U
+#define HC_CMD_UPDATE2D         0x05U
+#define HC_CMD_SQUEEZE          0x07U
+#define HC_CMD_GATHER           0x08U
+#define HC_CMD_CSHAKE           0x09U
+#define HC_CMD_KMAC             0x0AU
+#define HC_CMD_SAVE             0x0BU
+#define HC_CMD_RESTORE          0x0CU
+
+/* HC Algorithms (per CMH HC ABI) */
+
+#define HC_ALGO_SHA2_224        1U
+#define HC_ALGO_SHA2_256        2U
+#define HC_ALGO_SHA2_384        3U
+#define HC_ALGO_SHA2_512        4U
+#define HC_ALGO_SHA3_224        5U
+#define HC_ALGO_SHA3_256        6U
+#define HC_ALGO_SHA3_384        7U
+#define HC_ALGO_SHA3_512        8U
+#define HC_ALGO_SHAKE128        9U
+#define HC_ALGO_SHAKE256        10U
+
+/* HC Algo Flags */
+
+#define HC_ALGO_FLAG_SCA_KEY    BIT(18)      /* SCA key in 2 shares */
+#define HC_ALGO_FLAG_SCA_OUT    BIT(19)      /* SCA output in 2 shares */
+
+#define HC_ALGO_SET(flags, algo)  (((flags) & 0xFF0000UL) | ((algo) & 0xFFUL))
+#define HC_ALGO_GET(algo)         ((algo) & 0xFFU)
+
+/* Hash Digest Sizes */
+
+#define CMH_SHA224_DIGEST_SIZE  28U
+#define CMH_SHA256_DIGEST_SIZE  32U
+#define CMH_SHA384_DIGEST_SIZE  48U
+#define CMH_SHA512_DIGEST_SIZE  64U
+
+/* SHA-3 digest sizes are the same as SHA-2 for matching output widths */
+#define CMH_SHA3_224_DIGEST_SIZE  28U
+#define CMH_SHA3_256_DIGEST_SIZE  32U
+#define CMH_SHA3_384_DIGEST_SIZE  48U
+#define CMH_SHA3_512_DIGEST_SIZE  64U
+
+/* SHAKE default output lengths (fixed-output ahash registration) */
+#define CMH_SHAKE128_DIGEST_SIZE  32U   /* 128-bit security -> 32 bytes */
+#define CMH_SHAKE256_DIGEST_SIZE  64U   /* 256-bit security -> 64 bytes */
+
+/* HC Context (for SAVE/RESTORE) */
+
+#define HC_CONTEXT_WORDS        149U
+#define HC_CONTEXT_SIZE         (HC_CONTEXT_WORDS * 4 + 4)  /* ctx[149] + crc */
+
+/* cSHAKE function name max length */
+
+#define HC_CSHAKE_MAX_NAMELEN   36U
+
+/*
+ * Maximum customization string (S) length for cSHAKE / KMAC.
+ *
+ * S is packed as inline VCQ data after the CSHAKE/KMAC command slot.
+ * The worst-case VCQ layout (KMAC with raw key + GATHER) uses 5 fixed
+ * slots out of CMH_KMAC_MAX_PAYLOAD (9), leaving 4 inline slots.
+ * Each VCQ slot is 64 bytes, so the safe limit is 4 * 64 = 256 bytes.
+ */
+#define HC_CSHAKE_MAX_CUSTOMLEN 256U
+
+/* HC Command Structures */
+
+struct hc_cmd_init {
+       u32 algo;       /* hc_algo value, optionally ORed with HC_ALGO_FLAG_* */
+};
+
+struct hc_cmd_hmac {
+       u64 key;        /* datastore reference for HMAC key */
+       u32 keylen;     /* key length in bytes */
+       u32 algo;       /* hc_algo value */
+};
+
+struct hc_cmd_update {
+       u64 input;      /* DMA physical address of input data */
+       u32 inlen;      /* input data length in bytes */
+};
+
+struct hc_cmd_final {
+       u64 digest;     /* DMA physical address for output digest */
+       u32 outlen;     /* digest length in bytes */
+};
+
+struct hc_cmd_update2d {
+       u64 input;      /* DMA source address for input data */
+       u64 output;     /* DMA destination address for pass-through data */
+       u32 iolen;      /* input/pass-through data length in bytes */
+};
+
+struct hc_cmd_gather {
+       u64 lista;      /* DMA address of dma_scattergather_item chain */
+       u32 sgcmd;      /* HC sub-command: HC_CMD_UPDATE or HC_CMD_UPDATE2D */
+};
+
+struct hc_cmd_cshake {
+       u64 custom;     /* DMA address for the customization string */
+       u32 customlen;  /* length of the customization string */
+       u32 algo;       /* HC_ALGO_SHAKE128 or HC_ALGO_SHAKE256 */
+       u32 namelen;    /* length of the function name string */
+       u8  name[HC_CSHAKE_MAX_NAMELEN]; /* function name string (inline) */
+};
+
+struct hc_cmd_kmac {
+       u64 key;        /* datastore reference for KMAC key */
+       u64 custom;     /* DMA address for the customization string */
+       u32 keylen;     /* key length in bytes */
+       u32 customlen;  /* length of the customization string */
+       u32 algo;       /* HC_ALGO_SHAKE128 or HC_ALGO_SHAKE256 */
+       u32 outlen;     /* requested output digest length */
+};
+
+struct hc_cmd_save {
+       u64 output;     /* DMA physical address for saved context */
+       u32 outlen;     /* must be HC_CONTEXT_SIZE */
+};
+
+struct hc_cmd_restore {
+       u64 input;      /* DMA physical address of saved context */
+       u32 inlen;      /* must be HC_CONTEXT_SIZE */
+};
+
+/* HC Command Union */
+
+union hc_cmd {
+       struct hc_cmd_init      cmd_init;
+       struct hc_cmd_hmac      cmd_hmac;
+       struct hc_cmd_cshake    cmd_cshake;
+       struct hc_cmd_kmac      cmd_kmac;
+       struct hc_cmd_update    cmd_update;
+       struct hc_cmd_final     cmd_final;
+       struct hc_cmd_update2d  cmd_update2d;
+       struct hc_cmd_gather    cmd_gather;
+       struct hc_cmd_save      cmd_save;
+       struct hc_cmd_restore   cmd_restore;
+};
+
+#endif /* CMH_HC_ABI_H */
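
For readers checking the HC_ALGO_SET()/HC_ALGO_GET() packing, the macros can
be exercised in userspace. The sketch below copies the definitions from
cmh_hc_abi.h and redefines BIT() locally; hc_algo_word() is a hypothetical
wrapper added only to make the result easy to inspect.

```c
#include <stdint.h>

/* Definitions copied from cmh_hc_abi.h; BIT() is redefined for userspace. */
#define BIT(n)                    (1UL << (n))
#define HC_ALGO_SHA3_256          6U
#define HC_ALGO_FLAG_SCA_KEY      BIT(18)
#define HC_ALGO_FLAG_SCA_OUT      BIT(19)
#define HC_ALGO_SET(flags, algo)  (((flags) & 0xFF0000UL) | ((algo) & 0xFFUL))
#define HC_ALGO_GET(algo)         ((algo) & 0xFFU)
#define HC_CONTEXT_WORDS          149U
#define HC_CONTEXT_SIZE           (HC_CONTEXT_WORDS * 4 + 4)

/* Hypothetical helper: build the 32-bit algo field for hc_cmd_init. */
static uint32_t hc_algo_word(uint32_t flags, uint32_t algo)
{
	/* Flags are confined to bits 16..23, the algorithm ID to bits 0..7. */
	return (uint32_t)HC_ALGO_SET(flags, algo);
}
```

With HC_ALGO_FLAG_SCA_KEY and HC_ALGO_SHA3_256 the packed word is
0x00040006, flag bits outside 16..23 are masked off, and HC_CONTEXT_SIZE
evaluates to 600 bytes (149 words plus the CRC word).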
diff --git a/drivers/crypto/cmh/include/cmh_hcq_abi.h b/drivers/crypto/cmh/include/cmh_hcq_abi.h
new file mode 100644
index 000000000000..b9fc2a80a408
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_hcq_abi.h
@@ -0,0 +1,221 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- HCQ Core ABI Definitions
+ *
+ * Kernel-side definitions for the CMH HCQ ABI.
+ * All constants and layouts derived from the CMH eSW ABI.
+ */
+
+#ifndef CMH_HCQ_ABI_H
+#define CMH_HCQ_ABI_H
+
+#include <linux/compiler_attributes.h>
+#include <linux/types.h>
+
+/* VCQ layout: header + [SYS cmds] + HCQ_CMD + [sys_read] + flush */
+#define HCQ_VCQ_CMDS_MIN       3       /* header + cmd + flush */
+#define HCQ_VCQ_CMDS_MAX       6       /* keygen: hdr+new+write+cmd+read+flush */
+
+/* HCQ Command IDs */
+#define HCQ_CMD_XMSS_VERIFY                    0x03U
+#define HCQ_CMD_LMS_VERIFY                     0x04U
+#define HCQ_CMD_SLHDSA_VERIFY_INTERNAL         0x05U
+#define HCQ_CMD_SLHDSA_VERIFY                  0x06U
+#define HCQ_CMD_SLHDSA_VERIFY_PREHASH          0x07U
+#define HCQ_CMD_SLHDSA_VERIFY_PREHASH_DIGEST   0x08U
+#define HCQ_CMD_SLHDSA_KEYGEN                  0x09U
+#define HCQ_CMD_SLHDSA_SIGN_INTERNAL           0x10U
+#define HCQ_CMD_SLHDSA_SIGN                    0x11U
+#define HCQ_CMD_SLHDSA_SIGN_PREHASH            0x12U
+#define HCQ_CMD_SLHDSA_SIGN_PREHASH_DIGEST     0x13U
+#define HCQ_CMD_SLHDSA_PUBGEN                  0x14U
+
+/* SLH-DSA Parameter Set IDs */
+#define HCQ_SLHDSA_SHAKE_128S  1U
+#define HCQ_SLHDSA_SHAKE_128F  2U
+#define HCQ_SLHDSA_SHAKE_192S  3U
+#define HCQ_SLHDSA_SHAKE_192F  4U
+#define HCQ_SLHDSA_SHAKE_256S  5U
+#define HCQ_SLHDSA_SHAKE_256F  6U
+#define HCQ_SLHDSA_SHA2_128S   7U
+#define HCQ_SLHDSA_SHA2_128F   8U
+#define HCQ_SLHDSA_SHA2_192S   9U
+#define HCQ_SLHDSA_SHA2_192F   10U
+#define HCQ_SLHDSA_SHA2_256S   11U
+#define HCQ_SLHDSA_SHA2_256F   12U
+#define HCQ_SLHDSA_PARAM_MAX   12U
+
+/* SLH-DSA Prehash Algorithm IDs */
+#define HCQ_SLHDSA_PREHASH_SHA256      1U
+#define HCQ_SLHDSA_PREHASH_SHA512      2U
+#define HCQ_SLHDSA_PREHASH_SHAKE128    3U
+#define HCQ_SLHDSA_PREHASH_SHAKE256    4U
+
+/* SLH-DSA size limits */
+#define SLHDSA_MAX_PK_SIZE     64U     /* 2*n, n=32 */
+#define SLHDSA_MAX_SK_SIZE     128U    /* 4*n, n=32 */
+#define SLHDSA_MAX_SEED_SIZE   96U     /* 3*n, n=32 */
+#define SLHDSA_MAX_SIG_SIZE    49856U  /* SHAKE-256f / SHA2-256f */
+#define SLHDSA_MAX_MSG_LEN     128U
+#define SLHDSA_MAX_CTX_LEN     255U
+
+/* LMS/HSS size limits -- derived from eSW HCQ ABI constraints */
+#define LMS_MAX_PK_LEN         60U     /* eSW public-key buffer */
+#define LMS_MAX_MSG_LEN                256U    /* SHS_LMS_MESSAGE_LEN_MAX */
+#define LMS_MAX_SIG_LEN                13364U  /* eSW signature buffer */
+
+/* XMSS/XMSS-MT size limits -- derived from eSW HCQ ABI constraints */
+#define XMSS_MAX_PK_LEN        136U    /* eSW public-key buffer */
+#define XMSS_MAX_MSG_LEN       64U     /* SHS_XMSS_MESSAGE_LEN_MAX */
+#define XMSS_MAX_SIG_LEN       27688U  /* eSW signature buffer */
+
+/* SLH-DSA n-value for each parameter set (index = param_set - 1) */
+extern const u32 slhdsa_n[];
+
+/* SLH-DSA signature sizes (index = param_set - 1) */
+extern const u32 slhdsa_sig_size[];
+
+/* Derive PK/SK/seed sizes from n */
+static inline u32 slhdsa_pk_size(u32 param_set)
+{
+       if (param_set < 1U || param_set > HCQ_SLHDSA_PARAM_MAX)
+               return 0;
+       return 2U * slhdsa_n[param_set - 1U];
+}
+
+static inline u32 slhdsa_sk_size(u32 param_set)
+{
+       if (param_set < 1U || param_set > HCQ_SLHDSA_PARAM_MAX)
+               return 0;
+       return 4U * slhdsa_n[param_set - 1U];
+}
+
+static inline u32 slhdsa_seed_size(u32 param_set)
+{
+       if (param_set < 1U || param_set > HCQ_SLHDSA_PARAM_MAX)
+               return 0;
+       return 3U * slhdsa_n[param_set - 1U];
+}
+
+static inline u32 slhdsa_get_sig_size(u32 param_set)
+{
+       if (param_set < 1U || param_set > HCQ_SLHDSA_PARAM_MAX)
+               return 0;
+       return slhdsa_sig_size[param_set - 1U];
+}
+
+/* HCQ Command Structures -- match CMH eSW ABI exactly */
+
+struct hcq_cmd_xmss_verify {
+       u32 xmss_mt;    /* 0 = XMSS, 1 = XMSS-MT */
+       u32 pk_len;
+       u32 sig_len;
+       u32 dig_len;
+       u64 pk;
+       u64 sig;
+       u64 dig;
+};
+
+struct hcq_cmd_lms_verify {
+       u32 lms_hss;    /* 0 = LMS, 1 = LMS-HSS */
+       u32 pk_len;
+       u32 sig_len;
+       u32 dig_len;
+       u64 pk;
+       u64 sig;
+       u64 dig;
+};
+
+struct hcq_cmd_slhdsa_verify_internal {
+       u32 parameter_set;
+       u32 message_len;
+       u64 message;
+       u64 pk;
+       u64 sig;
+};
+
+struct hcq_cmd_slhdsa_verify {
+       u32 parameter_set;
+       u32 message_len;
+       u64 message;
+       u64 context;
+       u64 pk;
+       u64 sig;
+       u32 context_len;
+};
+
+struct hcq_cmd_slhdsa_verify_prehash {
+       u32 parameter_set;
+       u32 prehash_algo;
+       u32 message_len;
+       u32 context_len;
+       u64 message;
+       u64 context;
+       u64 pk;
+       u64 sig;
+};
+
+struct hcq_cmd_slhdsa_keygen {
+       u32 parameter_set;
+       u32 seed_len;
+       u32 pk_len;
+       u32 sk_len;
+       u64 seed;       /* DS reference */
+       u64 pk;         /* extmem addr */
+       u64 sk;         /* DS reference */
+};
+
+struct hcq_cmd_slhdsa_sign_internal {
+       u32 parameter_set;
+       u32 message_len;
+       u64 add_random; /* extmem addr, 0 = none */
+       u64 message;
+       u64 sk;         /* DS reference */
+       u64 sig;        /* extmem addr */
+};
+
+struct hcq_cmd_slhdsa_sign {
+       u32 parameter_set;
+       u32 message_len;
+       u64 add_random;
+       u64 message;
+       u64 context;
+       u64 sk;         /* DS reference */
+       u64 sig;        /* extmem addr */
+       u32 context_len;
+};
+
+struct hcq_cmd_slhdsa_sign_prehash {
+       u32 parameter_set;
+       u32 prehash_algo;
+       u32 message_len;
+       u32 context_len;
+       u64 add_random;
+       u64 message;
+       u64 context;
+       u64 sk;         /* DS reference */
+       u64 sig;        /* extmem addr */
+};
+
+struct hcq_cmd_slhdsa_pubgen {
+       u32 parameter_set;
+       u32 sk_len;
+       u64 sk;         /* DS reference */
+       u64 pk;         /* extmem addr */
+};
+
+union hcq_cmd {
+       struct hcq_cmd_xmss_verify              cmd_xmss_verify;
+       struct hcq_cmd_lms_verify               cmd_lms_verify;
+       struct hcq_cmd_slhdsa_verify_internal   cmd_slhdsa_verify_internal;
+       struct hcq_cmd_slhdsa_verify            cmd_slhdsa_verify;
+       struct hcq_cmd_slhdsa_verify_prehash    cmd_slhdsa_verify_prehash;
+       struct hcq_cmd_slhdsa_keygen            cmd_slhdsa_keygen;
+       struct hcq_cmd_slhdsa_sign_internal     cmd_slhdsa_sign_internal;
+       struct hcq_cmd_slhdsa_sign              cmd_slhdsa_sign;
+       struct hcq_cmd_slhdsa_sign_prehash      cmd_slhdsa_sign_prehash;
+       struct hcq_cmd_slhdsa_pubgen            cmd_slhdsa_pubgen;
+};
+
+#endif /* CMH_HCQ_ABI_H */
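
The SLH-DSA size helpers in cmh_hcq_abi.h depend on slhdsa_n[], which is
only declared extern in this header. The userspace sketch below fills in a
plausible table so the 2n/4n arithmetic can be checked. The table contents
are an assumption based on the FIPS 205 parameter sets (n = 16, 24, 32 for
the 128-, 192- and 256-bit sets) and on the ordering of the HCQ_SLHDSA_*
IDs; the series' own table may be laid out differently.

```c
#include <stdint.h>

#define HCQ_SLHDSA_PARAM_MAX   12U

/*
 * Assumed contents of slhdsa_n[] (index = param_set - 1).  Not taken from
 * the patch, which defines the table outside this header.
 */
static const uint32_t slhdsa_n[HCQ_SLHDSA_PARAM_MAX] = {
	16, 16, 24, 24, 32, 32,	/* SHAKE 128s/f, 192s/f, 256s/f */
	16, 16, 24, 24, 32, 32,	/* SHA2  128s/f, 192s/f, 256s/f */
};

/* Same logic as the inline helpers in the patch: 0 means invalid ID. */
static uint32_t slhdsa_pk_size(uint32_t param_set)
{
	if (param_set < 1U || param_set > HCQ_SLHDSA_PARAM_MAX)
		return 0;
	return 2U * slhdsa_n[param_set - 1U];
}

static uint32_t slhdsa_sk_size(uint32_t param_set)
{
	if (param_set < 1U || param_set > HCQ_SLHDSA_PARAM_MAX)
		return 0;
	return 4U * slhdsa_n[param_set - 1U];
}
```

With this table the 256-bit sets reach the SLHDSA_MAX_PK_SIZE (64) and
SLHDSA_MAX_SK_SIZE (128) limits defined in the header, and an out-of-range
parameter set yields 0 instead of indexing past the array.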
diff --git a/drivers/crypto/cmh/include/cmh_kic_abi.h b/drivers/crypto/cmh/include/cmh_kic_abi.h
new file mode 100644
index 000000000000..7f4fe3b9fd89
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_kic_abi.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- KIC Core ABI Definitions
+ *
+ * Kernel-side definitions for the CMH KIC ABI (KIC commands only).
+ * Derived from the CMH eSW ABI.
+ */
+
+#ifndef CMH_KIC_ABI_H
+#define CMH_KIC_ABI_H
+
+#include <linux/types.h>
+
+/* KIC Commands */
+
+#define KIC_CMD_HKDF1          0x06U
+#define KIC_CMD_HKDF2          0x07U
+#define KIC_CMD_AES_CMAC_KDF   0x08U
+#define KIC_CMD_DKEK_DERIVE    0x09U
+
+/* Maximum key size for KIC operations (bytes) */
+#define KIC_KEY_SIZE           32U
+
+/*
+ * KIC Command Structures
+ *
+ * Field names (llen, len) mirror the CMH eSW ABI register layout.
+ * llen = label length, len = output key length.
+ */
+
+struct kic_cmd_hkdf1 {
+       u64 dst;        /* DS ref for derived key (SYS_REF_LAST) */
+       u64 base;       /* base key reference (e.g., KIC_KEY1) */
+       u64 label;      /* label pointer (0 for inline-next-slot) */
+       u32 llen;       /* label length */
+       u32 len;        /* output key length */
+       u32 type;       /* SYS_TYPE_SET(flags, core_id) */
+};
+
+struct kic_cmd_hkdf2 {
+       u64 dst;        /* DS ref for derived key */
+       u64 base;       /* base key reference */
+       u64 salt;       /* salt key reference (SYS_REF_NONE = no salt) */
+       u64 label;      /* label pointer */
+       u32 llen;       /* label length */
+       u32 len;        /* output key length */
+       u32 type;       /* SYS_TYPE_SET(flags, core_id) */
+};
+
+struct kic_cmd_aes_cmac_kdf {
+       u64 base_key;   /* KIC/DS reference for base key */
+       u64 out_key;    /* DS reference for derived key */
+       u64 label;      /* label DMA address */
+       u32 key_len;    /* base & output key length (must be 32) */
+       u32 label_len;  /* label length */
+       u32 type;       /* SYS_TYPE_SET(flags, core_id) for output */
+};
+
+struct kic_cmd_dkek_derive {
+       u64 base_key;           /* KIC base key reference */
+       u64 out_key;            /* DS reference for the derived KEK */
+       u32 host_id;            /* host ID (0 = caller's own) */
+       u32 metadata_len;       /* metadata length */
+       u64 metadata;           /* metadata DMA address */
+};
+
+/* KIC Command Union */
+
+union kic_cmd {
+       struct kic_cmd_hkdf1 cmd_hkdf1;
+       struct kic_cmd_hkdf2 cmd_hkdf2;
+       struct kic_cmd_aes_cmac_kdf cmd_aes_cmac_kdf;
+       struct kic_cmd_dkek_derive cmd_dkek_derive;
+};
+
+#endif /* CMH_KIC_ABI_H */
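
The HKDF1 command above might be filled roughly as follows. This is a
userspace sketch, not driver code: the u32/u64 typedefs stand in for
<linux/types.h>, kic_fill_hkdf1() is a hypothetical helper, and the
SYS_REF_LAST value is copied from cmh_sys_abi.h in this series. The type
word is passed through unchanged because SYS_TYPE_SET() is not shown here,
and rejecting a zero or oversized output length is an assumption about how
the KIC_KEY_SIZE bound would be enforced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;	/* stand-ins for <linux/types.h> */
typedef uint64_t u64;

#define KIC_KEY_SIZE	32U
#define SYS_REF_LAST	0xFFFFFFFFFFFFFFFFULL	/* from cmh_sys_abi.h */

struct kic_cmd_hkdf1 {
	u64 dst;	/* DS ref for derived key (SYS_REF_LAST) */
	u64 base;	/* base key reference */
	u64 label;	/* label pointer (0 for inline-next-slot) */
	u32 llen;	/* label length */
	u32 len;	/* output key length */
	u32 type;	/* SYS_TYPE_SET(flags, core_id) */
};

/* Three u64 fields precede llen, so it sits at byte offset 24. */
static_assert(offsetof(struct kic_cmd_hkdf1, llen) == 24,
	      "unexpected kic_cmd_hkdf1 layout");

/*
 * Hypothetical helper: populate an HKDF1 command.  Returns 0 on success,
 * -1 if the requested output length is zero or exceeds KIC_KEY_SIZE
 * (assumed policy, not taken from the patch).
 */
int kic_fill_hkdf1(struct kic_cmd_hkdf1 *cmd, u64 base, u64 label_addr,
		   u32 label_len, u32 out_len, u32 type)
{
	if (out_len == 0U || out_len > KIC_KEY_SIZE)
		return -1;

	memset(cmd, 0, sizeof(*cmd));	/* also clears any tail padding */
	cmd->dst = SYS_REF_LAST;
	cmd->base = base;
	cmd->label = label_addr;
	cmd->llen = label_len;
	cmd->len = out_len;
	cmd->type = type;
	return 0;
}
```

Zeroing the whole structure first keeps any compiler-inserted tail padding
from carrying stale stack contents into the command slot.
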
diff --git a/drivers/crypto/cmh/include/cmh_mqi.h b/drivers/crypto/cmh/include/cmh_mqi.h
new file mode 100644
index 000000000000..93b847859953
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_mqi.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Mailbox Queue Initializer
+ *
+ * Allocates DMA-capable queue buffers and programs MBX registers
+ * via the MBX lock/setup/enable/unlock register sequence.
+ */
+
+#ifndef CMH_MQI_H
+#define CMH_MQI_H
+
+#include "cmh_config.h"
+
+#define MBX_LOCK_TIMEOUT_MS     1000
+#define MBX_LOCK_POLL_MIN_US    10
+#define MBX_LOCK_POLL_MAX_US    50
+#define MBX_HOST_INFO_LKM       0x4C4B4D00U  /* "LKM\0" as host identifier */
+
+/**
+ * cmh_mqi_init() - Allocate MBX queue buffers and program registers
+ * @cfg: Global device configuration
+ *
+ * Performs the lock/setup/enable/unlock sequence for each configured MBX.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mqi_init(struct cmh_config *cfg);
+
+/**
+ * cmh_mqi_cleanup() - Free MBX queue buffers and release locks
+ * @cfg: Global device configuration
+ */
+void cmh_mqi_cleanup(struct cmh_config *cfg);
+
+#endif /* CMH_MQI_H */
diff --git a/drivers/crypto/cmh/include/cmh_pke_abi.h b/drivers/crypto/cmh/include/cmh_pke_abi.h
new file mode 100644
index 000000000000..e0e7b946b4e3
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_pke_abi.h
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- PKE Core ABI Definitions
+ *
+ * Kernel-side definitions for the CMH PKE ABI.
+ * All constants and layouts derived from the CMH eSW ABI.
+ */
+
+#ifndef CMH_PKE_ABI_H
+#define CMH_PKE_ABI_H
+
+#include <linux/types.h>
+
+/* PKE Command IDs */
+
+#define PKE_CMD_ECDSA_VERIFY           0x03U
+#define PKE_CMD_ECDSA_SIGN             0x04U
+#define PKE_CMD_ECDSA_PUBGEN           0x05U
+#define PKE_CMD_ECDSA_KEYGEN           0x06U
+#define PKE_CMD_EDDSA_VERIFY           0x07U
+#define PKE_CMD_EDDSA_SIGN             0x08U
+#define PKE_CMD_EDDSA_PUBGEN           0x09U
+#define PKE_CMD_ECDH_KEYGEN            0x0AU
+#define PKE_CMD_ECDH                   0x0BU
+#define PKE_CMD_RSA_ENC                0x0CU
+#define PKE_CMD_RSA_DEC                0x0DU
+#define PKE_CMD_RSA_KEYGEN             0x0EU
+#define PKE_CMD_RSA_CRT_DEC            0x0FU
+#define PKE_CMD_SM2_ECDH_KEYGEN        0x16U
+#define PKE_CMD_SM2_ECDH               0x17U
+#define PKE_CMD_SM2_DEC_POINT          0x18U
+#define PKE_CMD_SM2_ENC_POINT          0x19U
+#define PKE_CMD_SM2_ID_DIGEST          0x1AU
+#define PKE_CMD_SM2_ECDH_HASH          0x1BU
+#define PKE_CMD_SM2_DEC_HASH           0x1CU
+#define PKE_CMD_SM2_ENC_HASH           0x1DU
+#define PKE_CMD_EDDSA_PRIV_KEYGEN_SCA  0x21U
+#define PKE_CMD_FLUSH                  0xFFU
+
+/* EC Curve IDs (per CMH PKE ABI) */
+
+#define PKE_CURVE_P192                 0x01U
+#define PKE_CURVE_P224                 0x02U
+#define PKE_CURVE_P256                 0x03U
+#define PKE_CURVE_P384                 0x04U
+#define PKE_CURVE_P521                 0x05U
+#define PKE_CURVE_SECP256K1            0x07U
+#define PKE_CURVE_BP192R1              0x11U
+#define PKE_CURVE_BP224R1              0x12U
+#define PKE_CURVE_BP256R1              0x13U
+#define PKE_CURVE_BP320R1              0x14U
+#define PKE_CURVE_BP384R1              0x15U
+#define PKE_CURVE_BP512R1              0x16U
+#define PKE_CURVE_ANSSI_FRP256V1       0x17U
+#define PKE_CURVE_SM2                  0x18U
+#define PKE_CURVE_25519                0x21U
+#define PKE_CURVE_448                  0x22U
+
+/* PKE Command Structures -- match CMH eSW ABI exactly */
+
+struct pke_cmd_ecdsa_verify {
+       u32 curve;
+       u32 digest_len;
+       u64 public_key;
+       u64 digest;
+       u64 signature;
+       u64 rprime;
+};
+
+struct pke_cmd_ecdsa_sign {
+       u32 curve;
+       u32 secret_key_len;
+       u64 digest;
+       u64 signature;
+       u64 secret_key;         /* DS reference */
+       u32 digest_len;
+};
+
+struct pke_cmd_ecdsa_pubgen {
+       u32 curve;
+       u32 secret_key_len;
+       u64 public_key;
+       u64 secret_key;         /* DS reference */
+};
+
+struct pke_cmd_ecdsa_keygen {
+       u32 curve;
+       u32 secret_key_len;
+       u64 secret_key;         /* DS reference */
+       u32 secret_key_type;
+};
+
+struct pke_cmd_eddsa_verify {
+       u32 curve;
+       u32 digest_len;
+       u64 public_key_y;
+       u64 digest;
+       u64 signature;
+       u64 rprime;
+};
+
+struct pke_cmd_eddsa_sign {
+       u32 curve;
+       u32 secret_key_len;
+       u64 digest;
+       u64 signature;
+       u64 secret_key;         /* DS reference */
+       u32 digest_len;
+};
+
+struct pke_cmd_eddsa_pubgen {
+       u32 curve;
+       u32 secret_key_len;
+       u64 public_key_y;
+       u64 secret_key;         /* DS reference */
+};
+
+struct pke_cmd_ecdh_keygen {
+       u32 curve;
+       u32 secret_key_len;
+       u64 public_key_x;
+       u64 secret_key;         /* DS reference */
+};
+
+struct pke_cmd_ecdh {
+       u32 curve;
+       u32 secret_key_len;
+       u32 shared_secret_len;
+       u32 shared_secret_type;
+       u64 peer_key_x;
+       u64 secret_key;         /* DS reference */
+       u64 shared_secret;      /* DS reference for result */
+};
+
+struct pke_cmd_rsa_enc {
+       u32 bits;
+       u32 e_len;
+       u64 e;
+       u64 n;
+       u64 m;
+       u64 c;
+};
+
+struct pke_cmd_rsa_dec {
+       u32 bits;
+       u32 e_len;
+       u64 e;
+       u64 n;
+       u64 c;
+       u64 m;
+       u64 d;                  /* DS reference */
+};
+
+struct pke_cmd_rsa_crt_dec {
+       u32 bits;
+       u32 e_len;
+       u64 e;
+       u64 n;
+       u64 c;
+       u64 m;
+       u64 crt;                /* DS reference */
+};
+
+struct pke_cmd_rsa_keygen {
+       u32 bits;
+       u32 d_type;
+       u64 e;
+       u64 n;
+       u64 d;                  /* DS reference */
+       u64 crt;                /* DS reference */
+       u32 crt_type;
+};
+
+struct pke_cmd_eddsa_keygen_sca {
+       u32 curve;
+       u64 secret_key;         /* DS reference: input normal SK */
+       u64 sca_secret_key;     /* DS reference: output blinded SK */
+};
+
+/* SM2 Command Structures */
+
+struct pke_cmd_sm2_ecdh_keygen {
+       u64 nonce;              /* DMA addr (32B input or output) */
+       u64 session_key;        /* DMA addr output (64B) */
+       u32 nonce_len;          /* 0 = HW generates, 32 = caller provides */
+};
+
+struct pke_cmd_sm2_ecdh {
+       u32 nonce_len;          /* 0 or 32 */
+       u32 private_key_len;    /* must be 32 */
+       u64 nonce;              /* DMA addr (32B) */
+       u64 peer_public_key;    /* DMA addr (64B) */
+       u64 peer_session_key;   /* DMA addr (64B) */
+       u64 private_key;        /* DS reference */
+       u64 shared_point;       /* DS reference (output, 64B) */
+       u32 shared_point_type;  /* SYS_TYPE_SET(flags, CORE_ID_PKE) */
+};
+
+struct pke_cmd_sm2_dec_point {
+       u32 ciphertext_len;     /* total CT length (97..128) */
+       u32 private_key_len;    /* must be 32 */
+       u64 ciphertext;         /* DMA addr (64B: C1 point) */
+       u64 dec_point;          /* DMA addr output (64B) */
+       u64 private_key;        /* DS reference */
+};
+
+struct pke_cmd_sm2_enc_point {
+       u64 nonce;              /* DMA addr (32B, optional) */
+       u64 public_key;         /* DMA addr (64B) */
+       u64 ciphertext;         /* DMA addr output (64B: C1) */
+       u64 enc_point;          /* DMA addr output (64B) */
+       u32 nonce_len;          /* 0 or 32 */
+};
+
+struct pke_cmd_sm2_id_digest {
+       u64 id;                 /* DMA addr (identity, <=32B) */
+       u64 public_key;         /* DMA addr (64B) */
+       u64 digest;             /* DMA addr output (32B) */
+       u32 id_len;             /* identity length in bytes */
+};
+
+struct pke_cmd_sm2_ecdh_hash {
+       u64 peer_id_digest;     /* DMA addr (32B) */
+       u64 id_digest;          /* DMA addr (32B) */
+       u64 shared_point;       /* DS reference (64B input) */
+       u64 shared_key;         /* DS reference (16B output) */
+       u32 shared_key_type;    /* SYS_TYPE_SET(flags, CORE_ID_PKE) */
+};
+
+struct pke_cmd_sm2_dec_hash {
+       u64 ciphertext;         /* DMA addr (full ciphertext) */
+       u64 dec_point;          /* DMA addr (64B) */
+       u64 plaintext;          /* DMA addr output (ct_len - 96 bytes) */
+       u32 ciphertext_len;     /* 97..128 */
+};
+
+struct pke_cmd_sm2_enc_hash {
+       u64 message;            /* DMA addr (plaintext) */
+       u64 enc_point;          /* DMA addr (64B) */
+       u64 ciphertext;         /* DMA addr output (96 + msg_len) */
+       u32 message_len;        /* 1..32 */
+};
+
+/* PKE Command Union */
+
+union pke_cmd {
+       struct pke_cmd_ecdsa_verify     cmd_ecdsa_verify;
+       struct pke_cmd_ecdsa_sign       cmd_ecdsa_sign;
+       struct pke_cmd_ecdsa_pubgen     cmd_ecdsa_pubgen;
+       struct pke_cmd_ecdsa_keygen     cmd_ecdsa_keygen;
+       struct pke_cmd_eddsa_verify     cmd_eddsa_verify;
+       struct pke_cmd_eddsa_sign       cmd_eddsa_sign;
+       struct pke_cmd_eddsa_pubgen     cmd_eddsa_pubgen;
+       struct pke_cmd_ecdh_keygen      cmd_ecdh_keygen;
+       struct pke_cmd_ecdh             cmd_ecdh;
+       struct pke_cmd_rsa_enc          cmd_rsa_enc;
+       struct pke_cmd_rsa_dec          cmd_rsa_dec;
+       struct pke_cmd_rsa_crt_dec      cmd_rsa_crt_dec;
+       struct pke_cmd_rsa_keygen       cmd_rsa_keygen;
+       struct pke_cmd_eddsa_keygen_sca cmd_eddsa_keygen_sca;
+       struct pke_cmd_sm2_ecdh_keygen  cmd_sm2_ecdh_keygen;
+       struct pke_cmd_sm2_ecdh         cmd_sm2_ecdh;
+       struct pke_cmd_sm2_dec_point    cmd_sm2_dec_point;
+       struct pke_cmd_sm2_enc_point    cmd_sm2_enc_point;
+       struct pke_cmd_sm2_id_digest    cmd_sm2_id_digest;
+       struct pke_cmd_sm2_ecdh_hash    cmd_sm2_ecdh_hash;
+       struct pke_cmd_sm2_dec_hash     cmd_sm2_dec_hash;
+       struct pke_cmd_sm2_enc_hash     cmd_sm2_enc_hash;
+};
+
+#endif /* CMH_PKE_ABI_H */
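
Since these structures are meant to match the eSW ABI exactly, it may be
worth checking where the compiler inserts padding. The following is a
minimal userspace sketch, assuming an LP64 target where u64 is 8-byte
aligned (x86-64, arm64). The typedefs stand in for <linux/types.h> and the
two helper functions are hypothetical. Whether the firmware expects the
same padding is not stated in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;	/* stand-ins for <linux/types.h> */
typedef uint64_t u64;

static_assert(sizeof(u32) == 4 && sizeof(u64) == 8, "fixed-width types");

struct pke_cmd_eddsa_keygen_sca {
	u32 curve;
	u64 secret_key;		/* DS reference: input normal SK */
	u64 sca_secret_key;	/* DS reference: output blinded SK */
};

struct pke_cmd_ecdsa_keygen {
	u32 curve;
	u32 secret_key_len;
	u64 secret_key;		/* DS reference */
	u32 secret_key_type;
};

/* Bytes of implicit padding between curve and secret_key. */
size_t pke_keygen_sca_hole(void)
{
	return offsetof(struct pke_cmd_eddsa_keygen_sca, secret_key) -
	       (offsetof(struct pke_cmd_eddsa_keygen_sca, curve) + sizeof(u32));
}

/* Bytes of tail padding after secret_key_type. */
size_t pke_ecdsa_keygen_tail(void)
{
	return sizeof(struct pke_cmd_ecdsa_keygen) -
	       (offsetof(struct pke_cmd_ecdsa_keygen, secret_key_type) +
		sizeof(u32));
}
```

On such a target both helpers return 4. On 32-bit x86, where u64 is only
4-byte aligned inside structures, they would return 0, so an explicit
reserved field would make the layout independent of the host ABI.
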
diff --git a/drivers/crypto/cmh/include/cmh_qse_abi.h b/drivers/crypto/cmh/include/cmh_qse_abi.h
new file mode 100644
index 000000000000..9834620e21d7
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_qse_abi.h
@@ -0,0 +1,181 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- QSE Core ABI Definitions
+ *
+ * Kernel-side definitions for the CMH QSE ABI.
+ * All constants and layouts derived from the CMH eSW ABI.
+ */
+
+#ifndef CMH_QSE_ABI_H
+#define CMH_QSE_ABI_H
+
+#include <linux/bits.h>
+#include <linux/compiler_attributes.h>
+#include <linux/types.h>
+
+/* VCQ layout: header + [SYS_NEW] + QSE_CMD + flush */
+#define QSE_VCQ_CMDS_MIN       3       /* header + cmd + flush */
+#define QSE_VCQ_CMDS_MAX       4       /* header + sys_new + cmd + flush */
+
+/* QSE Flags */
+#define QSE_FLAG_USE_REF       BIT(0)
+#define QSE_FLAG_USE_RNG       BIT(1)
+
+/* QSE Command IDs */
+#define QSE_CMD_ML_KEM_KEYGEN          0x01U
+#define QSE_CMD_ML_KEM_ENC             0x02U
+#define QSE_CMD_ML_KEM_DEC             0x03U
+#define QSE_CMD_ML_DSA_KEYGEN          0x04U
+#define QSE_CMD_ML_DSA_SIGN            0x05U
+#define QSE_CMD_ML_DSA_VERIFY          0x06U
+#define QSE_CMD_ML_KEM_KEYGEN_MASKED   0x07U
+#define QSE_CMD_ML_KEM_ENC_MASKED      0x08U
+#define QSE_CMD_ML_KEM_DEC_MASKED      0x09U
+#define QSE_CMD_ML_DSA_KEYGEN_MASKED   0x0AU
+#define QSE_CMD_ML_DSA_SIGN_MASKED     0x0BU
+
+/* ML-KEM category values */
+#define ML_KEM_K_512           2U
+#define ML_KEM_K_768           3U
+#define ML_KEM_K_1024          4U
+
+/* ML-DSA mode values */
+#define ML_DSA_MODE_44         2U
+#define ML_DSA_MODE_65         3U
+#define ML_DSA_MODE_87         5U
+
+/* ML-DSA special message length for externalMu (pre-hashed 64-byte input) */
+#define ML_DSA_MLEN_EXTERNAL_MU        0xFFFFFFFFU
+#define ML_DSA_EXTMU_LEN       64U     /* actual copy size for externalMu */
+
+/* ML-DSA maximum message length */
+#define ML_DSA_MAX_MLEN        10240U
+
+/* Shared secret size */
+#define ML_KEM_SS_LEN          32U
+#define ML_KEM_SS_LEN_MASKED   64U
+
+/* Seed sizes */
+#define QSE_SEED_LEN           32U
+#define QSE_SEED_LEN_MASKED    64U
+
+/*
+ * ML-KEM size tables -- indexed by (k - 2).
+ *  [0] = ML-KEM-512 (k=2)
+ *  [1] = ML-KEM-768 (k=3)
+ *  [2] = ML-KEM-1024 (k=4)
+ */
+#define ML_KEM_LEVELS          3U
+
+#define ML_KEM_EK_SIZE(k)      (384U * (k) + 32U)
+#define ML_KEM_DK_SIZE(k)      (768U * (k) + 96U)
+#define ML_KEM_DK_SIZE_MASKED(k) (1152U * (k) + 128U)
+
+static inline u32 ml_kem_ct_size(u32 k)
+{
+       u32 du = (k == 4U) ? 11U : 10U;
+       u32 dv = (k == 4U) ? 5U : 4U;
+
+       return 32U * (k * du + dv);
+}
+
+#define ML_KEM_CT_SIZE(k)      ml_kem_ct_size(k)
+
+/*
+ * ML-DSA size tables -- indexed by ml_dsa_mode_idx(mode), not by mode.
+ * Mode values: 2 (ML-DSA-44), 3 (ML-DSA-65), 5 (ML-DSA-87).
+ */
+extern const u32 ml_dsa_pk_size[];
+extern const u32 ml_dsa_sk_size[];
+extern const u32 ml_dsa_sk_size_masked[];
+extern const u32 ml_dsa_sig_size[];
+
+/* Map ML-DSA mode (2/3/5) -> table index (0/1/2), or -1 if invalid */
+static inline int ml_dsa_mode_idx(u32 mode)
+{
+       switch (mode) {
+       case 2: return 0;
+       case 3: return 1;
+       case 5: return 2;
+       default: return -1;
+       }
+}
+
+/* Map ML-KEM k (2/3/4) -> table index (0/1/2), or -1 if invalid */
+static inline int ml_kem_k_idx(u32 k)
+{
+       if (k >= 2U && k <= 4U)
+               return (int)(k - 2U);
+       return -1;
+}
+
+/* QSE Command Structures -- match CMH eSW ABI exactly */
+
+struct qse_cmd_ml_kem_keygen {
+       u32 k;
+       u32 flags;
+       u64 seed;
+       u64 z;
+       u64 ek;
+       u64 dk;
+       u32 dk_type;
+};
+
+struct qse_cmd_ml_kem_enc {
+       u32 k;
+       u32 flags;
+       u64 coin;
+       u64 ek;
+       u64 ct;
+       u64 ss;
+       u32 ss_type;
+};
+
+struct qse_cmd_ml_kem_dec {
+       u32 k;
+       u32 flags;
+       u64 ct;
+       u64 dk;
+       u64 ss;
+       u32 ss_type;
+};
+
+struct qse_cmd_ml_dsa_keygen {
+       u32 mode;
+       u32 flags;
+       u64 seed;
+       u64 pk;
+       u64 sk;
+       u32 sk_type;
+};
+
+struct qse_cmd_ml_dsa_sign {
+       u32 mode;
+       u32 flags;
+       u64 rnd;
+       u64 m;
+       u64 sk;
+       u64 sig;
+       u32 mlen;
+};
+
+struct qse_cmd_ml_dsa_verify {
+       u32 mode;
+       u32 flags;
+       u64 m;
+       u64 pk;
+       u64 sig;
+       u32 mlen;
+};
+
+union qse_cmd {
+       struct qse_cmd_ml_kem_keygen cmd_ml_kem_keygen;
+       struct qse_cmd_ml_kem_enc    cmd_ml_kem_enc;
+       struct qse_cmd_ml_kem_dec    cmd_ml_kem_dec;
+       struct qse_cmd_ml_dsa_keygen cmd_ml_dsa_keygen;
+       struct qse_cmd_ml_dsa_sign   cmd_ml_dsa_sign;
+       struct qse_cmd_ml_dsa_verify cmd_ml_dsa_verify;
+};
+
+#endif /* CMH_QSE_ABI_H */
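
The ML-KEM size formulas and the ML-DSA mode lookup above can be checked
against the published parameter sets. The following is a userspace sketch:
the typedef stands in for <linux/types.h>, the macros and inline helpers
are copied from the header, and the example_* names are hypothetical.
The driver's real ml_dsa_pk_size[] table is not shown in this hunk, so the
table below uses the FIPS 204 public-key sizes as an assumption about its
contents.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;	/* stand-in for <linux/types.h> */

#define ML_KEM_EK_SIZE(k)	(384U * (k) + 32U)
#define ML_KEM_DK_SIZE(k)	(768U * (k) + 96U)

static inline u32 ml_kem_ct_size(u32 k)
{
	u32 du = (k == 4U) ? 11U : 10U;
	u32 dv = (k == 4U) ? 5U : 4U;

	return 32U * (k * du + dv);
}

static inline int ml_dsa_mode_idx(u32 mode)
{
	switch (mode) {
	case 2: return 0;
	case 3: return 1;
	case 5: return 2;
	default: return -1;
	}
}

/* FIPS 204 public-key sizes; assumed to match the driver's table. */
static const u32 example_ml_dsa_pk_size[] = { 1312U, 1952U, 2592U };

/* Return the ML-DSA public-key size for a mode, or 0 if the mode is invalid. */
u32 example_ml_dsa_pk_len(u32 mode)
{
	int idx = ml_dsa_mode_idx(mode);

	return idx < 0 ? 0U : example_ml_dsa_pk_size[idx];
}

/* Check the ML-KEM formulas against the FIPS 203 parameter-set sizes. */
void example_ml_kem_check(void)
{
	/* ML-KEM-512 */
	assert(ML_KEM_EK_SIZE(2U) == 800U);
	assert(ML_KEM_DK_SIZE(2U) == 1632U);
	assert(ml_kem_ct_size(2U) == 768U);
	/* ML-KEM-768 */
	assert(ML_KEM_EK_SIZE(3U) == 1184U);
	assert(ML_KEM_DK_SIZE(3U) == 2400U);
	assert(ml_kem_ct_size(3U) == 1088U);
	/* ML-KEM-1024 */
	assert(ML_KEM_EK_SIZE(4U) == 1568U);
	assert(ML_KEM_DK_SIZE(4U) == 3168U);
	assert(ml_kem_ct_size(4U) == 1568U);
}
```

Checking the index before the table access matters because mode 4 is not a
valid ML-DSA mode even though it lies between two valid ones.
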
diff --git a/drivers/crypto/cmh/include/cmh_registers.h b/drivers/crypto/cmh/include/cmh_registers.h
new file mode 100644
index 000000000000..9481b30b76d1
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_registers.h
@@ -0,0 +1,145 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Hardware Register Definitions
+ *
+ * Derived from the CMH hardware register specification.
+ * All offsets are taken directly from the hardware documentation.
+ */
+
+#ifndef CMH_REGISTERS_H
+#define CMH_REGISTERS_H
+
+#include <linux/io.h>
+#include <linux/types.h>
+
+/* MBX Instance Addressing */
+
+#define CMH_MBX_INSTANCE_SHIFT        12
+#define CMH_MBX_INSTANCE_SIZE         BIT(CMH_MBX_INSTANCE_SHIFT) /* 0x1000 */
+#define CMH_MAX_MBX_INSTANCES         64U
+
+/* MBX Per-Instance Register Offsets */
+
+#define R_MBX_LOCK                    0x000U
+#define R_MBX_HOST_INFO               0x004U
+#define R_MBX_QUEUE_LO                0x008U
+#define R_MBX_QUEUE_HI                0x00CU
+#define R_MBX_QUEUE_SLOTS             0x010U
+#define R_MBX_QUEUE_STRIDE            0x014U
+#define R_MBX_QUEUE_HEAD              0x018U
+#define R_MBX_QUEUE_TAIL              0x01CU
+#define R_MBX_INTERRUPT               0x020U
+#define R_MBX_INTERRUPT_MASK          0x024U
+#define R_MBX_COMMAND                 0x028U
+#define R_MBX_STATUS                  0x02CU
+#define R_MBX_CHILD                   0x030U
+#define R_MBX_ID                      0x034U
+#define R_MBX_HOST_CONFIG             0x038U
+#define R_MBX_SCRATCH                 0x03CU
+
+#define MBX_QUEUE_ALIGNMENT           0x4U
+
+/* MBX Interrupt Bits */
+
+#define MBX_DONE_IRQ                  BIT(0)
+#define MBX_ERROR_IRQ                 BIT(1)
+#define MBX_IRQ_MASK                  (MBX_DONE_IRQ | MBX_ERROR_IRQ)
+
+/* MBX Command Values */
+
+#define MBX_COMMAND_RUN               0x000U
+#define MBX_COMMAND_PAUSE             0xC2FU
+#define MBX_COMMAND_CONTINUE          0x5DBU
+#define MBX_COMMAND_RESTART           0xB78U
+#define MBX_COMMAND_ABORT             0x6F6U
+#define MBX_COMMAND_FLUSH             0x3A5U
+
+/* MBX Status Values */
+
+#define MBX_STATUS_IDLE               0x01U
+#define MBX_STATUS_BUSY               0x10U
+#define MBX_STATUS_HOLD               0x20U
+#define MBX_STATUS_PAUSED             0x28U
+#define MBX_STATUS_SUCCESS            0x40U
+#define MBX_STATUS_ERROR              0x80U
+#define MBX_STATUS_OFFLINE            0x88U  /* ERROR | 0x08: offline/stopped */
+
+#define MBX_MASK_DONE                 (MBX_STATUS_IDLE | MBX_STATUS_SUCCESS)
+#define MBX_MASK_RUNNING              (MBX_STATUS_BUSY | MBX_STATUS_HOLD)
+#define MBX_MASK_STOPPED              MBX_STATUS_OFFLINE
+
+/* MBX Status Field Extraction */
+
+#define MBX_STATUS_CODE(v)            ((v) & 0xFFU)
+#define MBX_STATUS_CORE_ID(v)         (((v) >> 8) & 0xFFU)
+#define MBX_STATUS_ERROR_CODE(v)      (((v) >> 16) & 0xFFU)
+#define MBX_STATUS_CMD_INDEX(v)       (((v) >> 24) & 0xFFU)
+
+/* SIC Register Offsets (relative to SIC base / instance 0 base) */
+
+#define R_SIC_BOOT_STATUS             0x100U
+#define SIC_BOOT_STATUS_MASK          0x77U
+#define SIC_BOOT_STATUS_PASS          0x66U
+
+#define R_SIC_MBX_AVAILABILITY        0x104U
+#define R_SIC_MBX_AVAILABILITY2       0x108U
+
+#define R_SIC_SW_BOOT_STATUS          0x12CU
+#define SIC_SW_BOOT_STATUS_STARTED    BIT(0)
+#define SIC_SW_BOOT_STATUS_READY      BIT(1)
+#define SIC_SW_BOOT_STATUS_MISSION    BIT(6)
+
+#define R_SIC_SW_ERROR_INFO           0x130U
+#define R_SIC_SW_HEARTBEAT            0x154U
+
+#define R_SIC_GPINTERRUPT             0x160U
+
+#define R_SIC_HW_VERSION0             0x200U
+#define R_SIC_SW_VERSION              0x218U
+#define R_SIC_CORE_ENABLE             0x22CU
+
+/* Register Access Helpers */
+
+static inline u32 cmh_reg_read32(void __iomem *base, u32 offset)
+{
+       return ioread32((u8 __iomem *)base + offset);
+}
+
+static inline void cmh_reg_write32(u32 value, void __iomem *base, u32 offset)
+{
+       iowrite32(value, (u8 __iomem *)base + offset);
+}
+
+/*
+ * 64-bit register access via two 32-bit reads/writes.  Only correct for
+ * register pairs where split access is defined (e.g. QUEUE_LO/HI).
+ * Do not use for registers requiring atomic 64-bit access.
+ *
+ * No explicit barrier between the two halves is needed: ioread32/iowrite32
+ * include implicit ordering guarantees on all supported architectures
+ * (MMIO accessors are strongly ordered with respect to each other).
+ */
+static inline u64 cmh_reg_read64(void __iomem *base, u32 offset)
+{
+       u32 lo = ioread32((u8 __iomem *)base + offset);
+       u32 hi = ioread32((u8 __iomem *)base + offset + 4);
+
+       return ((u64)hi << 32) | lo;
+}
+
+static inline void cmh_reg_write64(u64 value, void __iomem *base, u32 offset)
+{
+       iowrite32((u32)value, (u8 __iomem *)base + offset);
+       iowrite32((u32)(value >> 32), (u8 __iomem *)base + offset + 4);
+}
+
+/* Return the ioremap'd base for MBX instance N within the SIC region */
+static inline void __iomem *cmh_mbx_instance_base(void __iomem *sic_mapped,
+                                                 u32 instance)
+{
+       return (u8 __iomem *)sic_mapped +
+              ((unsigned long)instance << CMH_MBX_INSTANCE_SHIFT);
+}
+
+#endif /* CMH_REGISTERS_H */
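
The status field macros and the per-instance addressing above can be
exercised without hardware. This is a userspace sketch with no MMIO: the
typedefs stand in for <linux/types.h>, the example_* functions are
hypothetical, and the status word is a made-up value chosen only to show
which byte each macro extracts.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;	/* stand-ins for <linux/types.h> */
typedef uint64_t u64;

#define CMH_MBX_INSTANCE_SHIFT		12
#define MBX_STATUS_ERROR		0x80U

#define MBX_STATUS_CODE(v)		((v) & 0xFFU)
#define MBX_STATUS_CORE_ID(v)		(((v) >> 8) & 0xFFU)
#define MBX_STATUS_ERROR_CODE(v)	(((v) >> 16) & 0xFFU)
#define MBX_STATUS_CMD_INDEX(v)		(((v) >> 24) & 0xFFU)

/* Byte offset of MBX instance N from the start of the SIC mapping. */
unsigned long example_mbx_offset(u32 instance)
{
	return (unsigned long)instance << CMH_MBX_INSTANCE_SHIFT;
}

/* Combine LO/HI halves the same way cmh_reg_read64() does. */
u64 example_join64(u32 lo, u32 hi)
{
	return ((u64)hi << 32) | lo;
}

/* Decode a made-up status word: cmd index 3, error 0x12, core 5, ERROR. */
void example_decode_status(void)
{
	u32 status = 0x03120580U;

	assert(MBX_STATUS_CODE(status) == MBX_STATUS_ERROR);
	assert(MBX_STATUS_CORE_ID(status) == 0x05U);
	assert(MBX_STATUS_ERROR_CODE(status) == 0x12U);
	assert(MBX_STATUS_CMD_INDEX(status) == 0x03U);
}
```

With a 4 KiB stride per instance, instance 3 starts at offset 0x3000 and
the highest instance (63) at 0x3F000, so the mapping would need to cover
at least 0x40000 bytes if all 64 instances are present.
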
diff --git a/drivers/crypto/cmh/include/cmh_rh.h b/drivers/crypto/cmh/include/cmh_rh.h
new file mode 100644
index 000000000000..b182c203a475
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_rh.h
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Response Handler
+ *
+ * IRQ-driven completion processing.  Uses request_threaded_irq():
+ *   - Hardirq: read+clear MBX interrupt registers, wake thread
+ *   - Threaded handler: walk per-MBX transaction queues,
+ *     fire completion callbacks, free transaction objects
+ *
+ * The Response Handler consumes transaction_obj entries enqueued
+ * by the Transaction Manager (cmh_txn.c) on each per-mailbox txq.
+ */
+
+#ifndef CMH_RH_H
+#define CMH_RH_H
+
+#include "cmh_config.h"
+
+/**
+ * cmh_rh_init() - Register IRQ handler and start response processing
+ * @cfg: Global device configuration
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int  cmh_rh_init(struct cmh_config *cfg);
+
+/**
+ * cmh_rh_cleanup() - Free IRQ and stop response processing
+ * @cfg: Global device configuration
+ */
+void cmh_rh_cleanup(struct cmh_config *cfg);
+
+/**
+ * cmh_rh_suspend() - Quiesce RH for system suspend
+ * @cfg: Global device configuration
+ *
+ * Cancels the watchdog timer and masks MBX interrupts at the hardware
+ * level.  IRQ handlers remain registered (standard PM pattern).
+ * The threaded IRQ handler stays active so that cmh_tm_quiesce()
+ * (called after this) can still drain in-flight transactions via
+ * IRQ-driven completions.
+ */
+void cmh_rh_suspend(struct cmh_config *cfg);
+
+/**
+ * cmh_rh_resume() - Restart RH after system resume
+ * @cfg: Global device configuration
+ *
+ * Re-synchronises per-MBX head tracking with hardware, clears stale
+ * interrupt bits, re-enables MBX interrupt masks, and re-arms the
+ * watchdog timer.  Must be called before cmh_tm_resume().
+ */
+void cmh_rh_resume(struct cmh_config *cfg);
+
+/* debugfs timeout accessor (debug builds only) */
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+unsigned int *cmh_rh_timeout_watchdog_ptr(void);
+#endif
+
+/**
+ * cmh_rh_force_drain_mbx() - FLUSH + drain all pending transactions on a MBX
+ * @mbx_idx: Mailbox index to drain
+ *
+ * Issues MBX_COMMAND_FLUSH, drains all pending transactions with
+ * -ECANCELED, and resets all recovery bookkeeping (including the
+ * wedged flag).  Safe to call at any time; acquires rh_process_lock.
+ * Intended for debugfs last-resort recovery.
+ */
+void cmh_rh_force_drain_mbx(u32 mbx_idx);
+
+/**
+ * cmh_rh_mbx_is_wedged() - Check if a mailbox is permanently wedged
+ * @mbx_idx: Mailbox index to check
+ *
+ * Returns true if the mailbox has failed RESTART+FLUSH recovery and
+ * is offline.  Used by the TM to avoid submitting new work to a dead
+ * mailbox.
+ *
+ * Return: true if wedged, false otherwise (including out-of-range idx).
+ */
+bool cmh_rh_mbx_is_wedged(u32 mbx_idx);
+
+/**
+ * cmh_rh_abort_mbx() - Issue MBX_COMMAND_ABORT under rh_process_lock
+ * @mbx_idx: Mailbox index to abort
+ *
+ * Serialises the ABORT write with RESTART/FLUSH commands issued by the
+ * watchdog, preventing command-register clobber races.
+ */
+void cmh_rh_abort_mbx(u32 mbx_idx);
+
+#endif /* CMH_RH_H */
diff --git a/drivers/crypto/cmh/include/cmh_rng.h b/drivers/crypto/cmh/include/cmh_rng.h
new file mode 100644
index 000000000000..1a886a0d82c1
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_rng.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Hardware RNG (DRBG) Driver
+ *
+ * Registers a struct hwrng backed by the CMH DRBG core.
+ * Each .read() builds a VCQ with DRBG_CMD_GENERATE and submits it
+ * through the Transaction Manager for synchronous completion.
+ *
+ * The DRBG must be configured (CONFIG command) by the management host
+ * before the LKM is loaded -- the LKM only issues GENERATE requests.
+ *
+ * CRNG seeding control:
+ *   - Module param "hwrng_quality" (0=disabled, 1-1024=enable)
+ *   - Default: 0 (conservative -- no automatic kernel CRNG seeding)
+ */
+
+#ifndef CMH_RNG_H
+#define CMH_RNG_H
+
+struct platform_device;
+
+int  cmh_rng_register(struct platform_device *pdev);
+void cmh_rng_unregister(void);
+
+/* debugfs timeout accessor (debug builds only) */
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+unsigned int *cmh_rng_timeout_drbg_ptr(void);
+#endif
+
+#endif /* CMH_RNG_H */
diff --git a/drivers/crypto/cmh/include/cmh_sm3_abi.h b/drivers/crypto/cmh/include/cmh_sm3_abi.h
new file mode 100644
index 000000000000..cbbe80fe18d6
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_sm3_abi.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- SM3 Hash Core ABI Definitions
+ *
+ * Kernel-side definitions for the CMH SM3 ABI.
+ * All constants and layouts derived from the CMH eSW ABI.
+ */
+
+#ifndef CMH_SM3_ABI_H
+#define CMH_SM3_ABI_H
+
+#include <linux/types.h>
+
+/* SM3 Commands */
+
+#define SM3_CMD_INIT            0x01U
+#define SM3_CMD_UPDATE          0x02U
+#define SM3_CMD_FINAL           0x03U
+#define SM3_CMD_UPDATE2D        0x04U
+#define SM3_CMD_GATHER          0x06U
+#define SM3_CMD_SAVE            0x07U
+#define SM3_CMD_RESTORE         0x08U
+
+/* SM3 Digest / Block Sizes */
+
+#define CMH_SM3_DIGEST_SIZE     32U
+#define CMH_SM3_BLOCK_SIZE      64U
+
+/* SM3 Context (for SAVE/RESTORE) */
+
+#define SM3_CONTEXT_WORDS       29U
+#define SM3_CONTEXT_SIZE        (SM3_CONTEXT_WORDS * 4 + 4)  /* ctx[29] + crc */
+
+/* SM3 Command Structures */
+
+struct sm3_cmd_update {
+       u64 input;      /* DMA physical address of input data */
+       u32 inlen;      /* input data length in bytes */
+};
+
+struct sm3_cmd_final {
+       u64 digest;     /* DMA physical address for output digest */
+       u32 outlen;     /* digest length in bytes */
+};
+
+struct sm3_cmd_update2d {
+       u64 input;      /* DMA source address for input data */
+       u64 output;     /* DMA destination address for pass-through data */
+       u32 iolen;      /* input/pass-through data length in bytes */
+};
+
+struct sm3_cmd_gather {
+       u64 lista;      /* DMA address of dma_scattergather_item chain */
+       u32 sgcmd;      /* SM3 sub-command: SM3_CMD_UPDATE or SM3_CMD_UPDATE2D */
+};
+
+struct sm3_cmd_save {
+       u64 output;     /* DMA physical address for saved context */
+       u32 outlen;     /* must be SM3_CONTEXT_SIZE */
+};
+
+struct sm3_cmd_restore {
+       u64 input;      /* DMA physical address of saved context */
+       u32 inlen;      /* must be SM3_CONTEXT_SIZE */
+};
+
+/* SM3 Command Union */
+
+union sm3_cmd {
+       struct sm3_cmd_update   cmd_update;
+       struct sm3_cmd_final    cmd_final;
+       struct sm3_cmd_update2d cmd_update2d;
+       struct sm3_cmd_gather   cmd_gather;
+       struct sm3_cmd_save     cmd_save;
+       struct sm3_cmd_restore  cmd_restore;
+};
+
+#endif /* CMH_SM3_ABI_H */
diff --git a/drivers/crypto/cmh/include/cmh_sm4_abi.h b/drivers/crypto/cmh/include/cmh_sm4_abi.h
new file mode 100644
index 000000000000..a34faea613dc
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_sm4_abi.h
@@ -0,0 +1,101 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- SM4 Core ABI Definitions
+ *
+ * Kernel-side definitions for the CMH SM4 ABI.
+ * All constants and layouts derived from the CMH eSW ABI.
+ */
+
+#ifndef CMH_SM4_ABI_H
+#define CMH_SM4_ABI_H
+
+#include <linux/types.h>
+
+/* SM4 Block, IV and Key Sizes */
+
+#define CMH_SM4_BLOCK_SIZE     16U
+#define CMH_SM4_IV_SIZE        16U
+#define CMH_SM4_KEY_SIZE       16U     /* SM4 always uses 128-bit keys */
+
+/* SM4 Modes (per CMH SM4 ABI) */
+
+#define SM4_MODE_ECB           1U
+#define SM4_MODE_CBC           2U
+#define SM4_MODE_CTR           3U
+#define SM4_MODE_CFB           5U
+#define SM4_MODE_GCM           6U
+#define SM4_MODE_CMAC          7U
+#define SM4_MODE_CCM           8U
+#define SM4_MODE_XTS           9U
+#define SM4_MODE_XCBC          10U
+
+/* SM4 Operations (per CMH SM4 ABI) */
+
+#define SM4_OP_DECRYPT         1U
+#define SM4_OP_ENCRYPT         2U
+
+/* SM4 Command IDs */
+
+#define SM4_CMD_INIT           0x01U
+#define SM4_CMD_AAD_UPDATE     0x02U
+#define SM4_CMD_AAD_FINAL      0x03U
+#define SM4_CMD_UPDATE         0x04U
+#define SM4_CMD_FINAL          0x05U
+#define SM4_CMD_SCATTERGATHER  0x06U
+#define SM4_CMD_CCM_INIT       0x09U
+
+/* SM4 Command Structures */
+
+struct sm4_cmd_init {
+       u64 key;        /* datastore reference for the key */
+       u64 iv;         /* DMA address of the IV */
+       u32 keylen;     /* key length in bytes (16, or 32 for XTS) */
+       u32 ivlen;      /* IV length in bytes (0..16) */
+       u32 mode;       /* SM4 mode (SM4_MODE_*) */
+       u32 op;         /* SM4 operation (SM4_OP_*) */
+       u32 aadlen;     /* AAD length or 0 */
+       u32 iolen;      /* plaintext/ciphertext length */
+};
+
+struct sm4_cmd_update {
+       u64 input;      /* DMA address of input data */
+       u64 output;     /* DMA address of output data */
+       u32 iolen;      /* input/output data length */
+};
+
+struct sm4_cmd_final {
+       u64 input;      /* DMA address of last input data */
+       u64 output;     /* DMA address of last output data */
+       u64 tag;        /* DMA address of tag (AEAD only) */
+       u32 iolen;      /* last input/output data length */
+       u32 taglen;     /* tag length (AEAD only) */
+};
+
+struct sm4_cmd_aad_final {
+       u64 data;       /* DMA address of AAD data */
+       u32 datalen;    /* AAD data length */
+};
+
+struct sm4_cmd_ccm_init {
+       u64 key;        /* datastore reference for the key */
+       u64 nonce;      /* DMA address of the nonce */
+       u32 keylen;     /* key length in bytes (always 16) */
+       u32 noncelen;   /* nonce length (15 - L) */
+       u32 op;         /* SM4 operation (SM4_OP_*) */
+       u32 aadlen;     /* AAD length */
+       u32 iolen;      /* plaintext/ciphertext length */
+       u32 taglen;     /* tag length */
+};
+
+/* SM4 Command Union */
+
+union sm4_cmd {
+       struct sm4_cmd_init     cmd_init;
+       struct sm4_cmd_update   cmd_update;
+       struct sm4_cmd_final    cmd_final;
+       struct sm4_cmd_aad_final cmd_aad_final;
+       struct sm4_cmd_ccm_init cmd_ccm_init;
+};
+
+#endif /* CMH_SM4_ABI_H */
diff --git a/drivers/crypto/cmh/include/cmh_sys_abi.h b/drivers/crypto/cmh/include/cmh_sys_abi.h
new file mode 100644
index 000000000000..64110311e552
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_sys_abi.h
@@ -0,0 +1,148 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- SYS Core ABI Definitions
+ *
+ * Kernel-side definitions for the CMH SYS ABI.
+ * All constants and layouts derived from the CMH eSW ABI.
+ */
+
+#ifndef CMH_SYS_ABI_H
+#define CMH_SYS_ABI_H
+
+#include <linux/bits.h>
+#include <linux/types.h>
+
+/* SYS Commands (per CMH SYS ABI) */
+
+#define SYS_CMD_RUN            0x01U
+#define SYS_CMD_NOP            0x02U
+#define SYS_CMD_IMPORT         0x07U
+#define SYS_CMD_EXPORT         0x08U
+#define SYS_CMD_NEW            0x0AU
+#define SYS_CMD_READ           0x0BU
+#define SYS_CMD_WRITE          0x0CU
+#define SYS_CMD_GRANT          0x0DU
+#define SYS_CMD_LIST           0x0EU
+#define SYS_CMD_FIND           0x0FU
+#define SYS_CMD_DATA           0x11U
+
+/* SYS Reference Constants */
+
+#define SYS_REF_NONE           0x0000000000000000ULL
+#define SYS_REF_TEMP           0x1111111111111111ULL
+#define SYS_REF_LAST           0xFFFFFFFFFFFFFFFFULL
+
+typedef u64 sys_ref_t;
+
+/* SYS CID */
+
+#define SYS_CID_NONE           0x0000000000000000ULL
+
+/* SYS Type Encoding -- bits [7:0] = core_id, bits [23:16] = flags */
+
+#define SYS_TYPE_FLAG_PT       BIT(16)  /* can be read as plaintext */
+#define SYS_TYPE_FLAG_XC       BIT(17)  /* can be exported over XC bus */
+#define SYS_TYPE_FLAG_SCA      BIT(18)  /* SCA key in 2 shares */
+
+#define SYS_TYPE_SET(flags, core) \
+       (((flags) & 0xFF0000UL) | ((core) & 0xFFUL))
+#define SYS_TYPE_CORE(type)    ((type) & 0xFFU)
+#define SYS_TYPE_FLAGS(type)   ((type) & 0xFF0000U)
+#define SYS_TYPE_NONE          0U      /* DMA output, no DS storage */
+
+#define SYS_WRAP_HDR_SIZE      16      /* sys_read plaintext header */
+
+/* SYS Command Structures */
+
+struct sys_cmd_new {
+       u64 cid;        /* caller id (name) for the object */
+       u64 ref;        /* DMA address -- CMH eSW writes back reference here */
+       u32 len;        /* size of the new object in bytes */
+};
+
+struct sys_cmd_write {
+       u64 ref;        /* object datastore reference */
+       u64 src;        /* DMA source address of key data */
+       u64 key;        /* wrapping key reference (SYS_REF_NONE = plaintext) */
+       u32 len;        /* source buffer length */
+       u32 type;       /* SYS_TYPE_SET(flags, core_id) */
+};
+
+struct sys_cmd_read {
+       u64 ref;        /* object datastore reference */
+       u64 dst;        /* DMA destination for key data */
+       u64 key;        /* wrapping key reference (SYS_REF_NONE = plaintext) */
+       u32 len;        /* destination buffer length */
+};
+
+struct sys_cmd_data {
+       u64 ref;        /* object datastore reference */
+       u64 dst;        /* DMA destination for object data */
+       u32 len;        /* destination buffer length */
+};
+
+struct sys_cmd_find {
+       u64 cid;        /* caller id to search for */
+       u64 dst;        /* DMA destination for struct sys_list_item */
+       u32 len;        /* destination buffer length */
+};
+
+struct sys_cmd_list {
+       u64 ref;        /* starting DS reference (SYS_REF_NONE = first) */
+       u64 dst;        /* DMA destination for struct sys_list_item */
+       u32 len;        /* destination buffer length */
+};
+
+struct sys_cmd_grant {
+       u64 ref;        /* object datastore reference */
+       u64 read;       /* bitfield: allow read for mailboxes */
+       u64 write;      /* bitfield: allow write for mailboxes */
+       u64 execute;    /* bitfield: allow use for mailboxes */
+};
+
+struct sys_cmd_export {
+       u64 cid;        /* caller id for the response */
+       u64 dst;        /* DMA destination for the export blob */
+       u64 key;        /* wrapping key datastore reference */
+       u32 len;        /* destination buffer length */
+};
+
+struct sys_cmd_import {
+       u64 src;        /* DMA source address of import blob */
+       u64 key;        /* wrapping key datastore reference */
+       u32 len;        /* source buffer length */
+};
+
+/* SYS List/Find Response Item */
+
+struct sys_list_item {
+       u64 ref;        /* object datastore reference */
+       u64 cid;        /* caller id */
+       u32 len;        /* object length */
+       u32 type;       /* object type (SYS_TYPE_SET packed) */
+};
+
+/* Wrapped-read header (prepended to SYS_CMD_READ responses) */
+
+struct sys_wrap_hdr {
+       u64 cid;        /* caller id */
+       u32 wrap;       /* wrap data length following this header */
+       u32 len;        /* object data length following wrap data */
+};
+
+/* SYS Command Union */
+
+union sys_cmd {
+       struct sys_cmd_new      cmd_new;
+       struct sys_cmd_write    cmd_write;
+       struct sys_cmd_read     cmd_read;
+       struct sys_cmd_data     cmd_data;
+       struct sys_cmd_find     cmd_find;
+       struct sys_cmd_list     cmd_list;
+       struct sys_cmd_grant    cmd_grant;
+       struct sys_cmd_export   cmd_export;
+       struct sys_cmd_import   cmd_import;
+};
+
+#endif /* CMH_SYS_ABI_H */
diff --git a/drivers/crypto/cmh/include/cmh_sysfs.h b/drivers/crypto/cmh/include/cmh_sysfs.h
new file mode 100644
index 000000000000..864cf1c8fa00
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_sysfs.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- sysfs Device Attributes
+ */
+
+#ifndef CMH_SYSFS_H
+#define CMH_SYSFS_H
+
+struct attribute_group;
+
+extern const struct attribute_group *cmh_sysfs_groups[];
+
+#endif /* CMH_SYSFS_H */
diff --git a/drivers/crypto/cmh/include/cmh_txn.h b/drivers/crypto/cmh/include/cmh_txn.h
new file mode 100644
index 000000000000..6131f0b2224f
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_txn.h
@@ -0,0 +1,463 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Transaction Manager
+ *
+ * Dedicated kthread managing concurrent VCQ submissions.
+ *
+ * Callers post command_msg objects into the Command Message Queue (CMQ).
+ * The TM thread dequeues them, selects a mailbox, builds VCQ(s) in the
+ * DMA queue slot, creates a transaction_obj, and rings the doorbell.
+ *
+ * The Response Handler (cmh_rh.c) walks per-mailbox transaction queues
+ * when an IRQ fires and invokes completion callbacks.
+ */
+
+#ifndef CMH_TXN_H
+#define CMH_TXN_H
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/completion.h>
+#include <linux/jiffies.h>
+#include <linux/refcount.h>
+#include <linux/mutex.h>
+#include <linux/timer.h>
+#include <crypto/algapi.h>
+
+#include "cmh_config.h"
+#include "cmh_vcq.h"
+
+/* Command Message (caller -> TM) */
+
+typedef void (*cmh_completion_fn)(void *data, int error);
+
+struct command_msg {
+       struct list_head    list;           /* CMQ linked list node */
+       u32                 command_id;     /* VCQ_CMD_ID(core, flags, span, cmd) */
+       void               *vcq_data;      /* heap-owned copy of VCQ entries */
+       u32                 vcq_count;      /* total vcq_cmd entries across all VCQs */
+       u32                 num_vcqs;       /* how many VCQs in vcq_data (0 or 1 = single) */
+       s32                 target_mbx;     /* MBX index from core affinity, or -1 fallback */
+       s32                 actual_mbx;     /* MBX selected by TM thread, -1 until dispatched */
+       cmh_completion_fn   complete;       /* completion callback (may be NULL) */
+       void               *completion_data;
+       refcount_t          refs;           /* submit_sync: 2 = waiter + TM */
+       bool                backlog_ok;     /* accept into backlog when CMQ is full */
+       unsigned long       timeout_jiffies;/* per-txn async timeout (0 = none) */
+};
+
+/* Transaction Object (TM -> RH) */
+
+/* Per-transaction FSM states for async timeout resolution */
+#define TXN_INFLIGHT   0
+#define TXN_COMPLETE   1
+#define TXN_TIMED_OUT  2
+
+struct transaction_obj {
+       struct list_head    list;           /* per-mailbox txn queue node */
+       u32                 first_vcq_id;
+       u32                 last_vcq_id;
+       u32                 mailbox_idx;    /* index into cfg->mailboxes[] */
+       u32                 command_id;     /* VCQ_CMD_ID from first payload cmd */
+       int                 error_code;
+       cmh_completion_fn   complete;
+       void               *completion_data;
+       atomic_t            state;          /* TXN_INFLIGHT / COMPLETE / TIMED_OUT */
+       struct timer_list   timeout_timer;  /* per-request async timeout */
+       refcount_t          refs;           /* owner + timer (if armed) */
+};
+
+/* Per-Mailbox Transaction Queue */
+
+struct cmh_mbx_txq {
+       struct list_head    head;
+       spinlock_t          lock;           /* protects head list + depth */
+       u32                 depth;          /* number of in-flight transactions */
+       struct mutex        dispatch_lock;  /* serialises VCQ dispatch + MBX flush */
+};
+
+/* Public Interface */
+
+/**
+ * cmh_tm_init() - Initialise the Transaction Manager
+ * @cfg: Global device configuration (mailbox layout, IRQ, etc.)
+ *
+ * Starts the TM kthread and initialises per-mailbox transaction queues.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int  cmh_tm_init(struct cmh_config *cfg);
+
+/**
+ * cmh_tm_cleanup() - Stop the TM kthread and drain all queues
+ */
+void cmh_tm_cleanup(void);
+
+/**
+ * cmh_tm_quiesce() - Stop TM kthread and drain in-flight transactions
+ *
+ * Stops the TM kthread, rejects new posts, then waits (with a
+ * configurable timeout) for all per-MBX transaction queues to drain.
+ * If the timeout fires, remaining transactions are cancelled with
+ * -ECANCELED.
+ */
+void cmh_tm_quiesce(void);
+
+/**
+ * cmh_tm_resume() - Restart the TM kthread after resume
+ *
+ * Return: 0 on success, negative errno if the kthread fails to start.
+ */
+int  cmh_tm_resume(void);
+
+/**
+ * cmh_tm_post_command() - Post a command to the TM for submission
+ * @msg: Command message with pre-built VCQ data and completion callback
+ *
+ * Round-robin selects the next MBX with enough free slots for
+ * msg->num_vcqs VCQs.  All VCQs in a message are written to
+ * consecutive slots on the same MBX (back-to-back).
+ * The caller retains ownership of @msg until the completion callback fires.
+ *
+ * Return: 0 on success, -EAGAIN if queue full, -ENODEV if TM stopped.
+ */
+int  cmh_tm_post_command(struct command_msg *msg);
+
+/*
+ * Synchronous submit -- post one or more VCQs and wait for completion.
+ *
+ * Combines post_command + refcounted wait + timeout + cancel into one
+ * call.  This is the standard pattern for all synchronous crypto ops.
+ *
+ * Context: must be called from a sleepable (task) context.
+ *          Performs GFP_KERNEL allocations and sleeps on
+ *          wait_for_completion_timeout().  A WARN_ON_ONCE fires
+ *          if called from atomic / IRQ / softirq context.
+ *
+ * vcq_cmds:   pre-built VCQ array (headers + commands, contiguous)
+ * vcq_count:  total number of vcq_cmd entries across all VCQs
+ * num_vcqs:   number of VCQs in the array (0 or 1 = single VCQ)
+ *
+ * For multi-VCQ submissions, the array contains multiple VCQs laid
+ * out contiguously, each starting with its own header.  All VCQs are
+ * written to consecutive MBX slots and share one transaction object.
+ *
+ * Returns 0 on success, -ETIMEDOUT, or CMH eSW error code.
+ */
+int  cmh_tm_submit_sync(struct vcq_cmd *vcq_cmds, u32 vcq_count,
+                       u32 num_vcqs);
+
+/*
+ * Synchronous submit pinned to a specific mailbox.
+ * target_mbx: -1 = round-robin, >= 0 = pin to that MBX index.
+ */
+int  cmh_tm_submit_sync_mbx(struct vcq_cmd *vcq_cmds, u32 vcq_count,
+                           u32 num_vcqs, s32 target_mbx);
+
+/*
+ * Extended timeout for slow crypto operations: RSA keygen, PQC
+ * keygen/sign/verify.  Controlled by the slow_op_timeout_ms module
+ * parameter.
+ */
+unsigned long cmh_tm_slow_op_timeout_jiffies(void);
+
+/*
+ * Synchronous submit with explicit timeout.
+ * target_mbx: -1 = round-robin, >= 0 = pin to that MBX index.
+ * timeout_hz: completion timeout in jiffies (use msecs_to_jiffies()).
+ */
+int  cmh_tm_submit_sync_tmo(struct vcq_cmd *vcq_cmds, u32 vcq_count,
+                           u32 num_vcqs, s32 target_mbx,
+                           unsigned long timeout_hz);
+
+/*
+ * Synchronous submit that never issues MBX_COMMAND_ABORT on timeout.
+ * Returns -EAGAIN if cancelled from queue, -EINPROGRESS if the VCQ is
+ * left in-flight.  On -EINPROGRESS, @orphan_cb(@orphan_data) will be
+ * called when the VCQ eventually completes (RH callback fires and the
+ * last sync_ctx ref drops).  Use this to defer DMA cleanup.
+ * Safe for background/kthread callers that must not disrupt other MBX work.
+ */
+int  cmh_tm_submit_sync_noabort(struct vcq_cmd *vcq_cmds, u32 vcq_count,
+                               u32 num_vcqs, unsigned long timeout_hz,
+                               void (*orphan_cb)(void *),
+                               void *orphan_data);
+
+/*
+ * Asynchronous submit -- post VCQs and return immediately.
+ *
+ * On successful return (0), the provided @callback may be invoked from
+ * either the RH threaded IRQ context (normal completion path) or the TM
+ * kthread (if VCQ dispatch to the HW ring fails after the message was
+ * posted to the CMQ).  The caller must not assume a specific callback
+ * context.
+ *
+ * After a successful post, the caller must NOT touch VCQ buffers --
+ * ownership transfers to the TM.  If this function returns an error
+ * other than -EBUSY (see below), the message was not posted, the
+ * callback will NOT fire, and the caller must perform cleanup.
+ *
+ * Uses GFP_ATOMIC internally -- the crypto API may invoke driver ops
+ * from softirq context (e.g. IPsec), where sleeping allocations are illegal.
+ *
+ * If @backlog_ok is true and the CMQ is full, the message is placed on
+ * an overflow backlog queue and -EBUSY is returned.  The caller must
+ * treat -EBUSY as "accepted" (like -EINPROGRESS): the callback WILL
+ * fire once the request is promoted from backlog and completes.  When
+ * @backlog_ok is false, CMQ-full returns -EAGAIN (caller must clean up).
+ *
+ * Returns: 0 on successful post, -EBUSY (backlogged -- callback will
+ *          fire), -ENOMEM, -EINVAL (bad vcq_count), -EAGAIN (CMQ full,
+ *          no backlog), -ENODEV.
+ */
+int  cmh_tm_submit_async(struct vcq_cmd *vcq_cmds, u32 vcq_count,
+                        u32 num_vcqs, s32 target_mbx,
+                        cmh_completion_fn callback, void *callback_data,
+                        bool backlog_ok, unsigned long timeout_jiffies);
+
+/**
+ * cmh_tm_async_timeout_jiffies() - Default per-request async timeout
+ *
+ * Returns the debugfs-configurable timeout for symmetric data-path
+ * ops (async_timeout_ms converted to jiffies).  Akcipher/kpp callers
+ * should pass 0 instead (no per-request timeout; vcq_timeout_ms is the
+ * safety net).
+ */
+unsigned long cmh_tm_async_timeout_jiffies(void);
+
+/**
+ * cmh_tm_flush_mbx() - Issue MBX_COMMAND_FLUSH and wait for completion
+ * @mbx_idx: Mailbox index
+ *
+ * Resets the eSW child mailbox state including the temp stack.
+ * Must be called when no VCQ submission is in progress on @mbx_idx.
+ *
+ * Return: 0 on success, -ETIMEDOUT if eSW does not clear the command,
+ *         -EBUSY if a command is already pending.
+ */
+int  cmh_tm_flush_mbx(s32 mbx_idx);
+
+/**
+ * cmh_tm_try_cancel_command() - Try to cancel a queued command
+ * @msg: Command message to cancel
+ *
+ * Return: true if removed from CMQ, false if already consumed by the TM thread.
+ */
+bool cmh_tm_try_cancel_command(struct command_msg *msg);
+
+/**
+ * cmh_tm_peek_transaction() - Peek at the oldest transaction on a mailbox
+ * @mbx_idx: Mailbox index
+ *
+ * For use by the Response Handler.  Caller must hold txq->lock or call
+ * from a context where no concurrent pop is possible (e.g. threaded IRQ).
+ *
+ * Return: Pointer to the oldest transaction_obj, or NULL if empty.
+ */
+struct transaction_obj *cmh_tm_peek_transaction(u32 mbx_idx);
+
+/**
+ * cmh_tm_pop_transaction() - Remove and return the oldest transaction
+ * @mbx_idx: Mailbox index
+ *
+ * Return: Pointer to the removed transaction_obj, or NULL if empty.
+ */
+struct transaction_obj *cmh_tm_pop_transaction(u32 mbx_idx);
+
+/**
+ * cmh_txn_finish() - Complete a transaction with FSM + timer handling
+ * @txn: Transaction popped from the TXQ
+ * @error: Error code (0 for success, negative errno)
+ *
+ * Resolves the timer-vs-completion race via atomic cmpxchg, cancels
+ * the per-txn timeout timer if still pending, fires the completion
+ * callback (if this path wins the race), and drops the owner reference.
+ * The transaction is freed when the last reference is dropped.
+ *
+ * Called by the Response Handler after popping a completed transaction.
+ */
+void cmh_txn_finish(struct transaction_obj *txn, int error);
+
+/**
+ * cmh_tm_max_cmds_per_vcq() - Max vcq_cmd entries per MBX slot
+ *
+ * Returns the minimum across all configured MBXes so callers can pack
+ * VCQs without knowing which MBX will be selected.
+ *
+ * Return: At least MIN_VCQ_CMDS (2).
+ */
+u32  cmh_tm_max_cmds_per_vcq(void);
+
+/**
+ * cmh_tm_mbx_count() - Return the number of configured mailboxes
+ *
+ * Return: cfg->mbx_count.
+ */
+u32  cmh_tm_mbx_count(void);
+
+/**
+ * cmh_core_default_id() - Return the default core_id for a core type
+ * @type: Logical core type enum
+ *
+ * Returns the core_id of the first (index-0) instance without advancing
+ * the round-robin counter.  Intended for callers pinned to a fixed MBX
+ * (e.g. mgmt ioctls on MGMT_MBX) that only need the VCQ core_id field.
+ *
+ * In multi-instance configurations the returned core_id is always that
+ * of instance[0], regardless of which MBX instance[0] is assigned to.
+ * Mgmt callers submit on MGMT_MBX (0) -- the eSW accepts any valid
+ * core_id on any MBX for command dispatch.
+ *
+ * Return: u32 core_id.
+ */
+u32  cmh_core_default_id(enum cmh_core_type type);
+
+/**
+ * cmh_core_select_instance() - Multi-instance core dispatch selection
+ * @type: Logical core type enum
+ *
+ * Returns the next (core_id, mbx_idx) pair for @type using round-robin
+ * across configured instances.  On first use for an instance whose MBX
+ * is not pre-assigned, atomically assigns the next available MBX.
+ *
+ * With single-instance defaults, this degenerates to the same behaviour
+ * as the old single-entry core_to_mbx[] table -- one core type, one MBX.
+ *
+ * Return: struct core_dispatch with core_id and mbx_idx.
+ */
+struct core_dispatch cmh_core_select_instance(enum cmh_core_type type);
+
+/**
+ * cmh_core_num_instances() - Return count of configured instances
+ * @type: Logical core type enum
+ *
+ * Return: Number of instances (>= 1) for @type.
+ */
+u32  cmh_core_num_instances(enum cmh_core_type type);
+
+/**
+ * cmh_core_get_instance() - Get a specific instance by index
+ * @type: Logical core type enum
+ * @idx: Instance index (0-based, must be < cmh_core_num_instances())
+ *
+ * Returns (core_id, mbx_idx) for the given instance without advancing
+ * the round-robin counter.  Triggers auto-assign if the instance has
+ * no MBX yet.
+ *
+ * Return: struct core_dispatch with core_id and mbx_idx.
+ */
+struct core_dispatch cmh_core_get_instance(enum cmh_core_type type, u32 idx);
+
+/**
+ * cmh_tm_affinity_reset() - Reset all core-to-MBX assignments
+ *
+ * Called during init and cleanup.
+ */
+void cmh_tm_affinity_reset(void);
+
+/**
+ * cmh_tm_txq_completion_notify() - Wake TM thread after TXQ completion
+ *
+ * Called by the Response Handler after completing a transaction to
+ * unblock the TM thread if it is waiting for a free MBX slot.
+ */
+void cmh_tm_txq_completion_notify(void);
+
+/*
+ * Pack @count payload commands (no headers) into one or more VCQs
+ * respecting the per-slot size limit, then submit synchronously.
+ *
+ * @payload:    flat array of vcq_cmd entries (no headers)
+ * @count:      number of entries in @payload
+ * @packed:     caller-provided scratch buffer for the packed output
+ * @max_packed: size of @packed in vcq_cmd entries
+ * @target_mbx: -1 = round-robin, >= 0 = pin to this MBX index
+ *
+ * Each VCQ gets its own header.  All VCQs are submitted as a single
+ * back-to-back transaction on the same MBX.
+ */
+int  cmh_vcq_pack_and_submit(const struct vcq_cmd *payload, u32 count,
+                            struct vcq_cmd *packed, u32 max_packed,
+                            s32 target_mbx);
+
+/**
+ * cmh_vcq_pack_and_submit_async() - Pack payload commands and submit async
+ * @payload: Flat array of vcq_cmd entries (no headers)
+ * @count: Number of entries in @payload
+ * @packed: Caller-provided scratch buffer for the packed output
+ * @max_packed: Size of @packed in vcq_cmd entries
+ * @target_mbx: -1 = round-robin, >= 0 = pin to this MBX index
+ * @callback: Completion callback (may run from IRQ or TM context)
+ * @callback_data: Opaque pointer passed to @callback
+ * @backlog_ok: If true, accept into backlog when CMQ is full
+ * @timeout_jiffies: Per-request timeout (0 to disable)
+ *
+ * Async variant of cmh_vcq_pack_and_submit().  Returns 0 on successful
+ * post; after a successful post, @callback may run from RH threaded IRQ
+ * context on normal completion, from the TM kthread if VCQ dispatch
+ * fails after posting, or from TM teardown paths such as
+ * cmh_tm_cleanup() / cmh_tm_quiesce() when queued or in-flight work is
+ * cancelled.  Callers must not assume a single callback context.  On
+ * an error return other than -EBUSY, the callback will NOT fire.
+ *
+ * Return: 0 on successful post, -EBUSY (backlogged), negative errno on failure.
+ */
+int  cmh_vcq_pack_and_submit_async(const struct vcq_cmd *payload, u32 count,
+                                  struct vcq_cmd *packed, u32 max_packed,
+                                  s32 target_mbx,
+                                  cmh_completion_fn callback,
+                                  void *callback_data,
+                                  bool backlog_ok,
+                                  unsigned long timeout_jiffies);
+
+/* debugfs timeout accessors (debug builds only) */
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+unsigned int *cmh_tm_timeout_async_ptr(void);
+unsigned int *cmh_tm_timeout_vcq_ptr(void);
+unsigned int *cmh_tm_timeout_slow_op_ptr(void);
+unsigned int *cmh_tm_timeout_drain_ptr(void);
+#endif
+
+/* -- Crypto request completion helper ---------------------------------- */
+
+struct device *cmh_dev(void);
+
+/**
+ * cmh_complete() - Complete a crypto request with optional error logging
+ * @req: The async crypto request to complete
+ * @err: Error code (0 = success, -EINPROGRESS = backlog promotion signal)
+ *
+ * Logs a rate-limited diagnostic on genuine errors, then hands the
+ * request back to the crypto framework.  -EINPROGRESS is excluded from
+ * logging -- it is the crypto API's backlog promotion notification, not
+ * an error.  Centralizes error reporting so individual algorithm drivers
+ * do not need per-callback logging.
+ */
+static inline void cmh_complete(struct crypto_async_request *req, int err)
+{
+       if (err && err != -EINPROGRESS) {
+               /*
+                * For template instances (e.g. hmac(sha3-512-cmh)) the
+                * driver name will be the outer template's, not ours.
+                * Still useful for triage -- identifies the failing tfm.
+                */
+               dev_dbg_ratelimited(cmh_dev(), "op error: alg=%s err=%d\n",
+                                   crypto_tfm_alg_driver_name(req->tfm),
+                                   err);
+       }
+       crypto_request_complete(req, err);
+}
+
+#endif /* CMH_TXN_H */
diff --git a/drivers/crypto/cmh/include/cmh_vcq.h b/drivers/crypto/cmh/include/cmh_vcq.h
new file mode 100644
index 000000000000..a9d04635d819
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_vcq.h
@@ -0,0 +1,283 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- VCQ (Virtual Command Queue) Definitions
+ *
+ * Kernel-side definitions for the CMH VCQ and DMA scatter-gather ABI,
+ * so the LKM can build VCQs without depending on CMH eSW headers.
+ *
+ * All constants and layouts are derived from the CMH eSW ABI.
+ *
+ * Per-core command definitions live in their own ABI headers (cmh_hc_abi.h,
+ * cmh_aes_abi.h, etc.) and are included here to form the hwc_cmd union.
+ */
+
+#ifndef CMH_VCQ_H
+#define CMH_VCQ_H
+
+#include <linux/types.h>
+#include <linux/build_bug.h>
+#include <linux/string.h>
+#include <linux/bits.h>
+
+#include "cmh_hc_abi.h"
+#include "cmh_sm3_abi.h"
+#include "cmh_drbg_abi.h"
+#include "cmh_sys_abi.h"
+#include "cmh_kic_abi.h"
+#include "cmh_aes_abi.h"
+#include "cmh_sm4_abi.h"
+#include "cmh_ccp_abi.h"
+#include "cmh_pke_abi.h"
+#include "cmh_qse_abi.h"
+#include "cmh_hcq_abi.h"
+#include "cmh_eac_abi.h"
+
+/* VCQ Magic Numbers */
+
+#define VCQ_HDR_MAGIC           0x01514356U  /* 'V' 'C' 'Q' 0x01 */
+#define VCQ_CMD_MAGIC           0x01444D43U  /* 'C' 'M' 'D' 0x01 */
+
+/* VCQ Command ID Encoding */
+
+#define VCQ_CMD_MASK            0x000000FFU
+#define VCQ_SPAN_MASK           0x0000FF00U
+#define VCQ_FLAG_MASK           0x00FF0000U
+#define VCQ_CORE_MASK           0xFF000000U
+
+#define VCQ_CMD_ID(core, flags, span, cmd) \
+       (((u32)(core) << 24) | ((flags) & VCQ_FLAG_MASK) | \
+        (((u32)(span) << 8) & VCQ_SPAN_MASK) | ((cmd) & VCQ_CMD_MASK))
+
+/* Core IDs (per CMH hardware specification) */
+
+#define CORE_ID_SYS             0x00U
+#define CORE_ID_DMA             0x01U
+#define CORE_ID_HC              0x02U
+#define CORE_ID_AES             0x03U
+#define CORE_ID_SM4             0x04U
+#define CORE_ID_SM3             0x05U
+#define CORE_ID_XC              0x07U
+#define CORE_ID_HCQ             0x08U
+#define CORE_ID_QSE             0x09U
+#define CORE_ID_PKE             0x0AU
+#define CORE_ID_TIC             0x0BU
+#define CORE_ID_KIC             0x0CU
+#define CORE_ID_MPU             0x0EU
+#define CORE_ID_DRBG            0x0FU
+#define CORE_ID_EMC             0x11U
+#define CORE_ID_CCP             0x18U
+#define CORE_ID_EAC             0x1EU
+#define CORE_ID_NUM             0x1FU  /* eSW g_drvs[] array size sentinel */
+#define CORE_ID_MAX             0xFFU  /* VCQ encoding limit (8-bit field) */
+
+/**
+ * enum cmh_core_type - Logical core type for multi-instance dispatch
+ * @CMH_CORE_HC:         Hash / HMAC / CSHAKE / KMAC (CORE_ID_HC)
+ * @CMH_CORE_AES:        AES (CORE_ID_AES)
+ * @CMH_CORE_SM4:        SM4 (CORE_ID_SM4)
+ * @CMH_CORE_SM3:        SM3 (CORE_ID_SM3)
+ * @CMH_CORE_CCP:        ChaCha20 / Poly1305 (CORE_ID_CCP)
+ * @CMH_CORE_PKE:        RSA / ECDSA / ECDH / EdDSA / SM2 (CORE_ID_PKE)
+ * @CMH_CORE_QSE:        ML-KEM / ML-DSA (CORE_ID_QSE)
+ * @CMH_CORE_HCQ:        SLH-DSA / LMS / XMSS (CORE_ID_HCQ)
+ * @CMH_NUM_CORE_TYPES:  Number of core types (array sizing sentinel)
+ *
+ * Algorithm drivers use this enum (not raw CORE_ID_* constants) for
+ * MBX selection and VCQ dispatch.  Each value indexes into a config
+ * table that maps to one or more (core_id, mbx) pairs.
+ *
+ * Raw CORE_ID_* defines remain for:
+ *   - SYS_TYPE_SET() key-type tags in datastore operations
+ *   - DT child node ``reg`` values (hardware core identity for config lookup)
+ *   - Singleton system cores (SYS, KIC, DRBG, EAC) not in this enum
+ */
+enum cmh_core_type {
+       CMH_CORE_HC = 0,
+       CMH_CORE_AES,
+       CMH_CORE_SM4,
+       CMH_CORE_SM3,
+       CMH_CORE_CCP,
+       CMH_CORE_PKE,
+       CMH_CORE_QSE,
+       CMH_CORE_HCQ,
+       CMH_NUM_CORE_TYPES
+};
+
+/**
+ * struct core_dispatch - VCQ dispatch target returned by core selection
+ * @core_id: Hardware core ID to encode in VCQ_CMD_ID()
+ * @mbx_idx: Mailbox index to submit the VCQ to
+ */
+struct core_dispatch {
+       u32 core_id;
+       s32 mbx_idx;
+};
+
+/* Common VCQ Command (per CMH VCQ ABI) */
+
+#define VCQ_CMD_FLUSH           0xFFU
+
+/**
+ * struct vcq_hdr - VCQ header occupying the first slot of every VCQ
+ * @cmds: Total number of commands including the header itself
+ * @rsvd: Reserved -- used internally by CMH eSW firmware
+ */
+struct vcq_hdr {
+       u32 cmds;
+       u32 rsvd[13];
+};
+
+/* DMA Scatter-Gather Item (per CMH DMAC hardware specification) */
+
+/**
+ * struct dma_scattergather_item - DMA scatter-gather descriptor node
+ * @lli: Next descriptor address (0 = end of list)
+ * @src: Source address for input particle
+ * @dst: Destination address for output particle
+ * @len: Particle length (low 32 bits used by hardware)
+ *
+ * Linked-list node walked by the DMAC hardware.  @lli chains to the
+ * next item or is zero for end-of-list.
+ */
+struct dma_scattergather_item {
+       u64 lli;
+       u64 src;
+       u64 dst;
+       u64 len;
+};
+
+/* Unified HWC Command Union */
+/*
+ * Each per-core ABI header defines a union <core>_cmd.
+ * Add new cores here as they are implemented.
+ */
+
+union hwc_cmd {
+       struct vcq_hdr          hdr;
+       union hc_cmd            hc;
+       union sm3_cmd           sm3;
+       union drbg_cmd          drbg;
+       union sys_cmd           sys;
+       union kic_cmd           kic;
+       union aes_cmd           aes;
+       union sm4_cmd           sm4;
+       union ccp_cmd           ccp;
+       union pke_cmd           pke;
+       union qse_cmd           qse;
+       union hcq_cmd           hcq;
+       union eac_cmd           eac;
+};
+
+/**
+ * struct vcq_cmd - Single VCQ command entry (always 64 bytes)
+ * @magic: VCQ_HDR_MAGIC for the header slot, VCQ_CMD_MAGIC for commands
+ * @id:    Encoded command ID built via VCQ_CMD_ID(core, flags, span, cmd)
+ * @hwc:   Per-core command payload union
+ */
+struct vcq_cmd {
+       u32 magic;
+       u32 id;
+       union hwc_cmd hwc;
+};
+
+static_assert(sizeof(struct vcq_cmd) == 64,
+             "struct vcq_cmd must be exactly 64 bytes (one VCQ slot)");
+
+/**
+ * vcq_set_header() - Write the standard VCQ header at slot[0]
+ * @slot:       Pointer to the first VCQ slot
+ * @total_cmds: Total number of commands including the header
+ */
+static inline void vcq_set_header(struct vcq_cmd *slot, u32 total_cmds)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_HDR_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_SYS, 0, 1, SYS_CMD_RUN);
+       slot->hwc.hdr.cmds = total_cmds;
+}
+
+/* VCQ Command Limits */
+
+#define MIN_VCQ_CMDS            2U   /* header + at least one command */
+#define MAX_VCQ_CMDS            15U  /* including the header */
+#define MAX_VCQ_SIZE            (MAX_VCQ_CMDS * sizeof(struct vcq_cmd))
+
+/**
+ * vcq_add_inline_data() - Pack inline data into consecutive VCQ slots
+ * @slot:     Pointer to the command slot preceding the inline data
+ * @data:     Source data to copy into subsequent slots
+ * @data_len: Length of @data in bytes
+ *
+ * Appends data starting at slot+1 and updates the span field in
+ * slot->id.  The caller must ensure enough slots are reserved.
+ *
+ * Return: Total number of slots consumed (1 + inline slots).
+ */
+static inline u32 vcq_add_inline_data(struct vcq_cmd *slot,
+                                     const void *data, u32 data_len)
+{
+       u32 inline_slots, total_span;
+
+       if (!data_len)
+               return 1;
+
+       inline_slots = (data_len + sizeof(struct vcq_cmd) - 1) /
+                      sizeof(struct vcq_cmd);
+       total_span = 1 + inline_slots;
+
+       /* Zero the inline slots, then copy data */
+       memset(slot + 1, 0, inline_slots * sizeof(struct vcq_cmd));
+       memcpy(slot + 1, data, data_len);
+
+       /* Update span in the command's id field */
+       slot->id = (slot->id & ~VCQ_SPAN_MASK) |
+                  (((u32)total_span << 8) & VCQ_SPAN_MASK);
+
+       return total_span;
+}
+
+/**
+ * vcq_add_flush() - Build a generic VCQ_CMD_FLUSH command
+ * @slot:    Pointer to the VCQ slot to populate
+ * @core_id: Hardware core ID for the flush command
+ */
+static inline void vcq_add_flush(struct vcq_cmd *slot, u32 core_id)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, VCQ_CMD_FLUSH);
+}
+
+/* Shared HC VCQ Builders -- used by hash, hmac, cshake, kmac drivers */
+
+static inline void vcq_add_hc_init(struct vcq_cmd *slot, u32 core_id,
+                                  u32 algo)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HC_CMD_INIT);
+       slot->hwc.hc.cmd_init.algo = algo;
+}
+
+static inline void vcq_add_hc_final(struct vcq_cmd *slot, u32 core_id,
+                                   u64 digest_phys, u32 outlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HC_CMD_FINAL);
+       slot->hwc.hc.cmd_final.digest = digest_phys;
+       slot->hwc.hc.cmd_final.outlen = outlen;
+}
+
+static inline void vcq_add_hc_gather(struct vcq_cmd *slot, u32 core_id,
+                                    u64 lista_phys, u32 sgcmd)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HC_CMD_GATHER);
+       slot->hwc.hc.cmd_gather.lista = lista_phys;
+       slot->hwc.hc.cmd_gather.sgcmd = sgcmd;
+}
+
+#endif /* CMH_VCQ_H */
--
2.43.7



* [PATCH 16/19] crypto: cmh - add SLH-DSA/LMS/XMSS (HCQ)
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register SLH-DSA, LMS, LMS-HSS, XMSS, and XMSS-MT algorithms
using the CMH HCQ core (core ID 0x08).  SLH-DSA is registered as a
sig algorithm with sign and verify support.  LMS, LMS-HSS, XMSS,
and XMSS-MT are registered as verify-only sig algorithms: their
stateful signing semantics (one-time-key private state the signer
must track) are not modeled by the kernel crypto API, so only
verification is exposed.
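
To illustrate the state a stateful signer has to track, here is a
minimal sketch of one-time-key bookkeeping.  It is purely hypothetical:
struct ots_state and ots_reserve_leaf() are not part of this driver or
of the kernel crypto API; they only show why a sign callback with no
persistent state cannot safely express LMS or XMSS signing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL signer state; not part of the CMH driver or crypto API. */
struct ots_state {
	uint32_t next_leaf;	/* index of the next unused one-time key */
	uint32_t num_leaves;	/* total one-time keys under this public key */
};

/*
 * Reserve one leaf for a signature.  The caller must persist the updated
 * state before releasing the signature; otherwise a crash or a restored
 * snapshot could hand out the same leaf twice.
 */
static bool ots_reserve_leaf(struct ots_state *st, uint32_t *leaf)
{
	assert(st && leaf);

	if (st->next_leaf >= st->num_leaves)
		return false;	/* key exhausted: no further signatures */

	*leaf = st->next_leaf++;
	return true;
}
```

Verification needs none of this bookkeeping: it takes only the public
key, the message, and the signature, which is why it fits the stateless
verify callback.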

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 drivers/crypto/cmh/Makefile         |   6 +-
 drivers/crypto/cmh/cmh_hcq.c        | 313 +++++++++++++++++++++++
 drivers/crypto/cmh/cmh_main.c       |  24 ++
 drivers/crypto/cmh/cmh_pqc_lms.c    | 230 +++++++++++++++++
 drivers/crypto/cmh/cmh_pqc_slhdsa.c | 377 ++++++++++++++++++++++++++++
 drivers/crypto/cmh/cmh_pqc_xmss.c   | 230 +++++++++++++++++
 6 files changed, 1179 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_hcq.c
 create mode 100644 drivers/crypto/cmh/cmh_pqc_lms.c
 create mode 100644 drivers/crypto/cmh/cmh_pqc_slhdsa.c
 create mode 100644 drivers/crypto/cmh/cmh_pqc_xmss.c
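
The register functions in this patch (for example
cmh_pqc_lms_register()) register an array of algorithms and roll back
with "while (i--)" on failure.  Below is a small userspace sketch of
that rollback pattern, assuming mock stand-ins for
crypto_register_sig() and crypto_unregister_sig(); mock_register(),
mock_unregister(), fail_at, and register_all() are hypothetical names
used only for illustration.

```c
#include <assert.h>

#define N_ALGS 3

/* HYPOTHETICAL stand-ins for crypto_register_sig()/crypto_unregister_sig(). */
static int registered[N_ALGS];
static int fail_at = -1;	/* index that should fail, or -1 for none */

static int mock_register(int i)
{
	if (i == fail_at)
		return -1;
	registered[i] = 1;
	return 0;
}

static void mock_unregister(int i)
{
	assert(registered[i]);	/* only undo what was actually registered */
	registered[i] = 0;
}

/* Register every entry; on failure, undo the earlier ones in reverse. */
static int register_all(void)
{
	int ret, i;

	for (i = 0; i < N_ALGS; i++) {
		ret = mock_register(i);
		if (ret)
			goto err_unregister;
	}
	return 0;

err_unregister:
	while (i--)		/* skips the failed index, walks back to 0 */
		mock_unregister(i);
	return ret;
}
```

Because i still holds the failing index when the loop exits, the
post-decrement test starts the rollback at i - 1, so the entry that
failed to register is never unregistered.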

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index 3425eb65d653..c3332804a9d7 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -36,7 +36,11 @@ cmh-y := \
        cmh_pke_ecdh.o \
        cmh_qse.o \
        cmh_pqc_mldsa.o \
-       cmh_pqc_sizes.o
+       cmh_pqc_sizes.o \
+       cmh_hcq.o \
+       cmh_pqc_slhdsa.o \
+       cmh_pqc_lms.o \
+       cmh_pqc_xmss.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_hcq.c b/drivers/crypto/cmh/cmh_hcq.c
new file mode 100644
index 000000000000..8fc3a5cb0f9f
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_hcq.c
@@ -0,0 +1,313 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- HCQ Core VCQ Builders
+ *
+ * VCQ builder functions for SLH-DSA, LMS, and XMSS commands.
+ * Each function populates a single vcq_cmd slot.  Callers assemble
+ * complete VCQs with header + command(s) + flush, then submit via
+ * cmh_tm_submit_sync() or cmh_tm_submit_sync_tmo().
+ */
+
+#include <linux/string.h>
+
+#include "cmh_sys.h"
+
+/* -- HCQ flush -- */
+
+/**
+ * vcq_add_hcq_flush() - Build an HCQ flush VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ */
+void vcq_add_hcq_flush(struct vcq_cmd *slot, u32 core_id)
+{
+       vcq_add_flush(slot, core_id);
+}
+
+/* -- SLH-DSA -- */
+
+/**
+ * vcq_add_hcq_slhdsa_keygen() - Build an SLH-DSA key generation VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @param_set: SLH-DSA parameter set identifier
+ * @seed_len: Length of seed buffer in bytes
+ * @pk_len: Length of public key buffer in bytes
+ * @sk_len: Length of secret key buffer in bytes
+ * @seed: DMA address of seed input buffer
+ * @pk: DMA address of public key output buffer
+ * @sk: DMA address of secret key output buffer
+ */
+void vcq_add_hcq_slhdsa_keygen(struct vcq_cmd *slot, u32 core_id, u32 param_set,
+                              u32 seed_len, u32 pk_len, u32 sk_len,
+                              u64 seed, u64 pk, u64 sk)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HCQ_CMD_SLHDSA_KEYGEN);
+       slot->hwc.hcq.cmd_slhdsa_keygen.parameter_set = param_set;
+       slot->hwc.hcq.cmd_slhdsa_keygen.seed_len = seed_len;
+       slot->hwc.hcq.cmd_slhdsa_keygen.pk_len = pk_len;
+       slot->hwc.hcq.cmd_slhdsa_keygen.sk_len = sk_len;
+       slot->hwc.hcq.cmd_slhdsa_keygen.seed = seed;
+       slot->hwc.hcq.cmd_slhdsa_keygen.pk = pk;
+       slot->hwc.hcq.cmd_slhdsa_keygen.sk = sk;
+}
+
+/**
+ * vcq_add_hcq_slhdsa_sign() - Build an SLH-DSA signing VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @param_set: SLH-DSA parameter set identifier
+ * @msg_len: Length of message buffer in bytes
+ * @ctx_len: Length of context string in bytes
+ * @add_random: DMA address of additional randomness buffer
+ * @msg: DMA address of message buffer
+ * @ctx: DMA address of context string buffer
+ * @sk: DMA address of secret key buffer
+ * @sig: DMA address of signature output buffer
+ */
+void vcq_add_hcq_slhdsa_sign(struct vcq_cmd *slot, u32 core_id, u32 param_set,
+                            u32 msg_len, u32 ctx_len,
+                            u64 add_random, u64 msg, u64 ctx,
+                            u64 sk, u64 sig)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HCQ_CMD_SLHDSA_SIGN);
+       slot->hwc.hcq.cmd_slhdsa_sign.parameter_set = param_set;
+       slot->hwc.hcq.cmd_slhdsa_sign.message_len = msg_len;
+       slot->hwc.hcq.cmd_slhdsa_sign.add_random = add_random;
+       slot->hwc.hcq.cmd_slhdsa_sign.message = msg;
+       slot->hwc.hcq.cmd_slhdsa_sign.context = ctx;
+       slot->hwc.hcq.cmd_slhdsa_sign.sk = sk;
+       slot->hwc.hcq.cmd_slhdsa_sign.sig = sig;
+       slot->hwc.hcq.cmd_slhdsa_sign.context_len = ctx_len;
+}
+
+/**
+ * vcq_add_hcq_slhdsa_sign_internal() - Build an SLH-DSA internal signing VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @param_set: SLH-DSA parameter set identifier
+ * @msg_len: Length of message buffer in bytes
+ * @add_random: DMA address of additional randomness buffer
+ * @msg: DMA address of message buffer
+ * @sk: DMA address of secret key buffer
+ * @sig: DMA address of signature output buffer
+ */
+void vcq_add_hcq_slhdsa_sign_internal(struct vcq_cmd *slot, u32 core_id, u32 param_set,
+                                     u32 msg_len, u64 add_random,
+                                     u64 msg, u64 sk, u64 sig)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HCQ_CMD_SLHDSA_SIGN_INTERNAL);
+       slot->hwc.hcq.cmd_slhdsa_sign_internal.parameter_set = param_set;
+       slot->hwc.hcq.cmd_slhdsa_sign_internal.message_len = msg_len;
+       slot->hwc.hcq.cmd_slhdsa_sign_internal.add_random = add_random;
+       slot->hwc.hcq.cmd_slhdsa_sign_internal.message = msg;
+       slot->hwc.hcq.cmd_slhdsa_sign_internal.sk = sk;
+       slot->hwc.hcq.cmd_slhdsa_sign_internal.sig = sig;
+}
+
+/**
+ * vcq_add_hcq_slhdsa_verify() - Build an SLH-DSA verification VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @param_set: SLH-DSA parameter set identifier
+ * @msg_len: Length of message buffer in bytes
+ * @ctx_len: Length of context string in bytes
+ * @msg: DMA address of message buffer
+ * @ctx: DMA address of context string buffer
+ * @pk: DMA address of public key buffer
+ * @sig: DMA address of signature buffer to verify
+ */
+void vcq_add_hcq_slhdsa_verify(struct vcq_cmd *slot, u32 core_id, u32 param_set,
+                              u32 msg_len, u32 ctx_len,
+                              u64 msg, u64 ctx, u64 pk, u64 sig)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HCQ_CMD_SLHDSA_VERIFY);
+       slot->hwc.hcq.cmd_slhdsa_verify.parameter_set = param_set;
+       slot->hwc.hcq.cmd_slhdsa_verify.message_len = msg_len;
+       slot->hwc.hcq.cmd_slhdsa_verify.message = msg;
+       slot->hwc.hcq.cmd_slhdsa_verify.context = ctx;
+       slot->hwc.hcq.cmd_slhdsa_verify.pk = pk;
+       slot->hwc.hcq.cmd_slhdsa_verify.sig = sig;
+       slot->hwc.hcq.cmd_slhdsa_verify.context_len = ctx_len;
+}
+
+/**
+ * vcq_add_hcq_slhdsa_sign_prehash() - Build an SLH-DSA prehash signing VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @cmd: VCQ command ID (sign-prehash variant)
+ * @param_set: SLH-DSA parameter set identifier
+ * @prehash_algo: Prehash algorithm identifier
+ * @msg_len: Length of message buffer in bytes
+ * @ctx_len: Length of context string in bytes
+ * @add_random: DMA address of additional randomness buffer
+ * @msg: DMA address of message buffer
+ * @ctx: DMA address of context string buffer
+ * @sk: DMA address of secret key buffer
+ * @sig: DMA address of signature output buffer
+ */
+void vcq_add_hcq_slhdsa_sign_prehash(struct vcq_cmd *slot, u32 core_id,
+                                    u32 cmd, u32 param_set, u32 prehash_algo,
+                                    u32 msg_len, u32 ctx_len,
+                                    u64 add_random, u64 msg, u64 ctx,
+                                    u64 sk, u64 sig)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, cmd);
+       slot->hwc.hcq.cmd_slhdsa_sign_prehash.parameter_set = param_set;
+       slot->hwc.hcq.cmd_slhdsa_sign_prehash.prehash_algo = prehash_algo;
+       slot->hwc.hcq.cmd_slhdsa_sign_prehash.message_len = msg_len;
+       slot->hwc.hcq.cmd_slhdsa_sign_prehash.context_len = ctx_len;
+       slot->hwc.hcq.cmd_slhdsa_sign_prehash.add_random = add_random;
+       slot->hwc.hcq.cmd_slhdsa_sign_prehash.message = msg;
+       slot->hwc.hcq.cmd_slhdsa_sign_prehash.context = ctx;
+       slot->hwc.hcq.cmd_slhdsa_sign_prehash.sk = sk;
+       slot->hwc.hcq.cmd_slhdsa_sign_prehash.sig = sig;
+}
+
+/**
+ * vcq_add_hcq_slhdsa_verify_prehash() - Build an SLH-DSA prehash verify VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @cmd: VCQ command ID (verify-prehash variant)
+ * @param_set: SLH-DSA parameter set identifier
+ * @prehash_algo: Prehash algorithm identifier
+ * @msg_len: Length of message buffer in bytes
+ * @ctx_len: Length of context string in bytes
+ * @msg: DMA address of message buffer
+ * @ctx: DMA address of context string buffer
+ * @pk: DMA address of public key buffer
+ * @sig: DMA address of signature buffer to verify
+ */
+void vcq_add_hcq_slhdsa_verify_prehash(struct vcq_cmd *slot, u32 core_id,
+                                      u32 cmd, u32 param_set, u32 prehash_algo,
+                                      u32 msg_len, u32 ctx_len,
+                                      u64 msg, u64 ctx, u64 pk, u64 sig)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, cmd);
+       slot->hwc.hcq.cmd_slhdsa_verify_prehash.parameter_set = param_set;
+       slot->hwc.hcq.cmd_slhdsa_verify_prehash.prehash_algo = prehash_algo;
+       slot->hwc.hcq.cmd_slhdsa_verify_prehash.message_len = msg_len;
+       slot->hwc.hcq.cmd_slhdsa_verify_prehash.context_len = ctx_len;
+       slot->hwc.hcq.cmd_slhdsa_verify_prehash.message = msg;
+       slot->hwc.hcq.cmd_slhdsa_verify_prehash.context = ctx;
+       slot->hwc.hcq.cmd_slhdsa_verify_prehash.pk = pk;
+       slot->hwc.hcq.cmd_slhdsa_verify_prehash.sig = sig;
+}
+
+/**
+ * vcq_add_hcq_slhdsa_verify_internal() - Build an SLH-DSA internal verify VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @param_set: SLH-DSA parameter set identifier
+ * @msg_len: Length of message buffer in bytes
+ * @msg: DMA address of message buffer
+ * @pk: DMA address of public key buffer
+ * @sig: DMA address of signature buffer to verify
+ */
+void vcq_add_hcq_slhdsa_verify_internal(struct vcq_cmd *slot, u32 core_id, u32 param_set,
+                                       u32 msg_len, u64 msg, u64 pk, u64 sig)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1,
+                             HCQ_CMD_SLHDSA_VERIFY_INTERNAL);
+       slot->hwc.hcq.cmd_slhdsa_verify_internal.parameter_set = param_set;
+       slot->hwc.hcq.cmd_slhdsa_verify_internal.message_len = msg_len;
+       slot->hwc.hcq.cmd_slhdsa_verify_internal.message = msg;
+       slot->hwc.hcq.cmd_slhdsa_verify_internal.pk = pk;
+       slot->hwc.hcq.cmd_slhdsa_verify_internal.sig = sig;
+}
+
+/**
+ * vcq_add_hcq_slhdsa_pubgen() - Build an SLH-DSA public key generation VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @param_set: SLH-DSA parameter set identifier
+ * @sk_len: Length of secret key buffer in bytes
+ * @sk: DMA address of secret key input buffer
+ * @pk: DMA address of public key output buffer
+ */
+void vcq_add_hcq_slhdsa_pubgen(struct vcq_cmd *slot, u32 core_id, u32 param_set,
+                              u32 sk_len, u64 sk, u64 pk)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HCQ_CMD_SLHDSA_PUBGEN);
+       slot->hwc.hcq.cmd_slhdsa_pubgen.parameter_set = param_set;
+       slot->hwc.hcq.cmd_slhdsa_pubgen.sk_len = sk_len;
+       slot->hwc.hcq.cmd_slhdsa_pubgen.sk = sk;
+       slot->hwc.hcq.cmd_slhdsa_pubgen.pk = pk;
+}
+
+/* -- LMS -- */
+
+/**
+ * vcq_add_hcq_lms_verify() - Build an LMS/HSS signature verify VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @lms_hss: LMS/HSS mode flag (0 = LMS, 1 = HSS)
+ * @pk_len: Length of public key buffer in bytes
+ * @sig_len: Length of signature buffer in bytes
+ * @dig_len: Length of digest buffer in bytes
+ * @pk: DMA address of public key buffer
+ * @sig: DMA address of signature buffer
+ * @dig: DMA address of digest buffer
+ */
+void vcq_add_hcq_lms_verify(struct vcq_cmd *slot, u32 core_id, u32 lms_hss,
+                           u32 pk_len, u32 sig_len, u32 dig_len,
+                           u64 pk, u64 sig, u64 dig)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HCQ_CMD_LMS_VERIFY);
+       slot->hwc.hcq.cmd_lms_verify.lms_hss = lms_hss;
+       slot->hwc.hcq.cmd_lms_verify.pk_len = pk_len;
+       slot->hwc.hcq.cmd_lms_verify.sig_len = sig_len;
+       slot->hwc.hcq.cmd_lms_verify.dig_len = dig_len;
+       slot->hwc.hcq.cmd_lms_verify.pk = pk;
+       slot->hwc.hcq.cmd_lms_verify.sig = sig;
+       slot->hwc.hcq.cmd_lms_verify.dig = dig;
+}
+
+/* -- XMSS -- */
+
+/**
+ * vcq_add_hcq_xmss_verify() - Build an XMSS/XMSS^MT signature verify VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @xmss_mt: XMSS/XMSS^MT mode flag (0 = XMSS, 1 = XMSS^MT)
+ * @pk_len: Length of public key buffer in bytes
+ * @sig_len: Length of signature buffer in bytes
+ * @dig_len: Length of digest buffer in bytes
+ * @pk: DMA address of public key buffer
+ * @sig: DMA address of signature buffer
+ * @dig: DMA address of digest buffer
+ */
+void vcq_add_hcq_xmss_verify(struct vcq_cmd *slot, u32 core_id, u32 xmss_mt,
+                            u32 pk_len, u32 sig_len, u32 dig_len,
+                            u64 pk, u64 sig, u64 dig)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HCQ_CMD_XMSS_VERIFY);
+       slot->hwc.hcq.cmd_xmss_verify.xmss_mt = xmss_mt;
+       slot->hwc.hcq.cmd_xmss_verify.pk_len = pk_len;
+       slot->hwc.hcq.cmd_xmss_verify.sig_len = sig_len;
+       slot->hwc.hcq.cmd_xmss_verify.dig_len = dig_len;
+       slot->hwc.hcq.cmd_xmss_verify.pk = pk;
+       slot->hwc.hcq.cmd_xmss_verify.sig = sig;
+       slot->hwc.hcq.cmd_xmss_verify.dig = dig;
+}
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index df38f43dc179..03c127083507 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -297,6 +297,21 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_pqc_mldsa_register;

+       /* Register PQC SLH-DSA */
+       ret = cmh_pqc_slhdsa_register();
+       if (ret)
+               goto err_pqc_slhdsa_register;
+
+       /* Register PQC LMS */
+       ret = cmh_pqc_lms_register();
+       if (ret)
+               goto err_pqc_lms_register;
+
+       /* Register PQC XMSS */
+       ret = cmh_pqc_xmss_register();
+       if (ret)
+               goto err_pqc_xmss_register;
+
        /* Register key management device (/dev/cmh_mgmt) */
        ret = cmh_mgmt_register();
        if (ret)
@@ -309,6 +324,12 @@ static int cmh_probe(struct platform_device *pdev)
        return 0;

 err_mgmt_register:
+       cmh_pqc_xmss_unregister();
+err_pqc_xmss_register:
+       cmh_pqc_lms_unregister();
+err_pqc_lms_register:
+       cmh_pqc_slhdsa_unregister();
+err_pqc_slhdsa_register:
        cmh_pqc_mldsa_unregister();
 err_pqc_mldsa_register:
        cmh_pke_ecdh_unregister();
@@ -373,6 +394,9 @@ static void cmh_remove(struct platform_device *pdev)
        cfg = &dev->config;

        cmh_mgmt_unregister();
+       cmh_pqc_xmss_unregister();
+       cmh_pqc_lms_unregister();
+       cmh_pqc_slhdsa_unregister();
        cmh_pqc_mldsa_unregister();
        cmh_pke_ecdh_unregister();
        cmh_pke_ecdsa_unregister();
diff --git a/drivers/crypto/cmh/cmh_pqc_lms.c b/drivers/crypto/cmh/cmh_pqc_lms.c
new file mode 100644
index 000000000000..13b2e26aa7bd
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_pqc_lms.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- LMS/HSS Signature Driver (verify-only, sig_alg, synchronous)
+ *
+ * Registers "lms" and "lms-hss" sig algorithms with verify-only
+ * callbacks.  Sign is not supported (stateful signature -- key
+ * management must happen externally).
+ *
+ * Verify: src = raw signature, digest = message bytes
+ * Public key: raw pk bytes (variable length, set via set_pub_key)
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <crypto/sig.h>
+#include <crypto/internal/sig.h>
+
+#include "cmh_sys.h"
+#include "cmh_hcq_abi.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_pqc.h"
+
+#define LMS_VCQ_CMDS   3       /* header + cmd + flush */
+
+struct cmh_lms_tfm_ctx {
+       u8 *pub_key;
+       u32 pub_key_len;
+       u32 lms_hss;            /* 0 = LMS, 1 = LMS-HSS */
+};
+
+static inline struct cmh_lms_tfm_ctx *cmh_lms_ctx(struct crypto_sig *tfm)
+{
+       return crypto_sig_ctx(tfm);
+}
+
+/*
+ * LMS/HSS verify (synchronous sig_alg)
+ *
+ * @src:    raw signature
+ * @slen:   signature length
+ * @digest: message bytes
+ * @dlen:   message length
+ *
+ * Returns 0 on successful verification, negative errno on failure.
+ */
+static int cmh_lms_verify(struct crypto_sig *tfm,
+                         const void *src, unsigned int slen,
+                         const void *digest, unsigned int dlen)
+{
+       struct cmh_lms_tfm_ctx *ctx = cmh_lms_ctx(tfm);
+       struct core_dispatch d = cmh_core_select_instance(CMH_CORE_HCQ);
+       struct vcq_cmd vcq[LMS_VCQ_CMDS];
+       u8 *sig_buf = NULL, *m_buf = NULL, *pk_buf = NULL;
+       dma_addr_t sig_dma = DMA_MAPPING_ERROR;
+       dma_addr_t m_dma = DMA_MAPPING_ERROR;
+       dma_addr_t pk_dma = DMA_MAPPING_ERROR;
+       int ret;
+
+       if (!ctx->pub_key)
+               return -EINVAL;
+       if (!slen || slen > LMS_MAX_SIG_LEN)
+               return -EINVAL;
+       if (!dlen || dlen > LMS_MAX_MSG_LEN)
+               return -EINVAL;
+
+       sig_buf = kmemdup(src, slen, GFP_KERNEL);
+       m_buf = kmemdup(digest, dlen, GFP_KERNEL);
+       pk_buf = kmemdup(ctx->pub_key, ctx->pub_key_len, GFP_KERNEL);
+       if (!sig_buf || !m_buf || !pk_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       sig_dma = cmh_dma_map_single(sig_buf, slen, DMA_TO_DEVICE);
+       m_dma = cmh_dma_map_single(m_buf, dlen, DMA_TO_DEVICE);
+       pk_dma = cmh_dma_map_single(pk_buf, ctx->pub_key_len, DMA_TO_DEVICE);
+
+       if (cmh_dma_map_error(sig_dma) || cmh_dma_map_error(m_dma) ||
+           cmh_dma_map_error(pk_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], LMS_VCQ_CMDS);
+       vcq_add_hcq_lms_verify(&vcq[1], d.core_id, ctx->lms_hss,
+                              ctx->pub_key_len, slen, dlen,
+                              pk_dma, sig_dma, m_dma);
+       vcq_add_hcq_flush(&vcq[2], d.core_id);
+
+       /* LMS verify walks Winternitz chains and a Merkle path -- slow */
+       ret = cmh_tm_submit_sync_tmo(vcq, LMS_VCQ_CMDS, 1, d.mbx_idx,
+                                    cmh_tm_slow_op_timeout_jiffies());
+
+out_unmap:
+       if (!cmh_dma_map_error(pk_dma))
+               cmh_dma_unmap_single(pk_dma, ctx->pub_key_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(m_dma))
+               cmh_dma_unmap_single(m_dma, dlen, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, slen, DMA_TO_DEVICE);
+
+out_free:
+       kfree(pk_buf);
+       kfree(m_buf);
+       kfree(sig_buf);
+       return ret;
+}
+
+static int cmh_lms_set_pub_key(struct crypto_sig *tfm,
+                              const void *key, unsigned int keylen)
+{
+       struct cmh_lms_tfm_ctx *ctx = cmh_lms_ctx(tfm);
+
+       if (!keylen || keylen > LMS_MAX_PK_LEN)
+               return -EINVAL;
+
+       kfree(ctx->pub_key);
+       ctx->pub_key = NULL;
+       ctx->pub_key_len = 0;
+
+       ctx->pub_key = kmemdup(key, keylen, GFP_KERNEL);
+       if (!ctx->pub_key)
+               return -ENOMEM;
+
+       ctx->pub_key_len = keylen;
+       return 0;
+}
+
+static unsigned int cmh_lms_key_size(struct crypto_sig *tfm)
+{
+       struct cmh_lms_tfm_ctx *ctx = cmh_lms_ctx(tfm);
+
+       return ctx->pub_key_len * 8;
+}
+
+static int cmh_lms_init(struct crypto_sig *tfm)
+{
+       struct cmh_lms_tfm_ctx *ctx = cmh_lms_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       return 0;
+}
+
+static int cmh_lms_hss_init(struct crypto_sig *tfm)
+{
+       struct cmh_lms_tfm_ctx *ctx = cmh_lms_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       ctx->lms_hss = 1;
+       return 0;
+}
+
+static void cmh_lms_exit(struct crypto_sig *tfm)
+{
+       struct cmh_lms_tfm_ctx *ctx = cmh_lms_ctx(tfm);
+
+       kfree(ctx->pub_key);
+       ctx->pub_key = NULL;
+}
+
+static struct sig_alg cmh_lms_algs[] = {
+       {
+               .verify         = cmh_lms_verify,
+               .set_pub_key    = cmh_lms_set_pub_key,
+               .key_size       = cmh_lms_key_size,
+               .init           = cmh_lms_init,
+               .exit           = cmh_lms_exit,
+               .base = {
+                       .cra_name         = "lms",
+                       .cra_driver_name  = "cri-cmh-lms",
+                       .cra_priority     = 300,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_lms_tfm_ctx),
+               },
+       },
+       {
+               .verify         = cmh_lms_verify,
+               .set_pub_key    = cmh_lms_set_pub_key,
+               .key_size       = cmh_lms_key_size,
+               .init           = cmh_lms_hss_init,
+               .exit           = cmh_lms_exit,
+               .base = {
+                       .cra_name         = "lms-hss",
+                       .cra_driver_name  = "cri-cmh-lms-hss",
+                       .cra_priority     = 300,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_lms_tfm_ctx),
+               },
+       },
+};
+
+/**
+ * cmh_pqc_lms_register() - Register LMS/LMS-HSS sig algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_pqc_lms_register(void)
+{
+       int ret, i;
+
+       for (i = 0; i < ARRAY_SIZE(cmh_lms_algs); i++) {
+               ret = crypto_register_sig(&cmh_lms_algs[i]);
+               if (ret) {
+                       dev_err(cmh_dev(), "cmh: failed to register %s (%d)\n",
+                               cmh_lms_algs[i].base.cra_name, ret);
+                       goto err_unregister;
+               }
+       }
+
+       return 0;
+
+err_unregister:
+       while (i--)
+               crypto_unregister_sig(&cmh_lms_algs[i]);
+       return ret;
+}
+
+/**
+ * cmh_pqc_lms_unregister() - Unregister LMS/LMS-HSS sig algorithms from the crypto framework
+ */
+void cmh_pqc_lms_unregister(void)
+{
+       int i = ARRAY_SIZE(cmh_lms_algs);
+
+       while (i--)
+               crypto_unregister_sig(&cmh_lms_algs[i]);
+}
diff --git a/drivers/crypto/cmh/cmh_pqc_slhdsa.c b/drivers/crypto/cmh/cmh_pqc_slhdsa.c
new file mode 100644
index 000000000000..9cc8cdb442b2
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_pqc_slhdsa.c
@@ -0,0 +1,377 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- SLH-DSA Signature Driver (sig_alg, synchronous)
+ *
+ * Registers SLH-DSA sig algorithms for all 12 parameter sets
+ * (SHAKE/SHA2 x 128/192/256 x s/f) with sign and verify callbacks.
+ *
+ * Key format:
+ *   Public key  = raw pk bytes (2*n bytes)
+ *   Private key = raw sk bytes (4*n bytes)
+ *
+ * Sign: src = message, dst = raw signature
+ * Verify: src = raw signature, digest = message bytes
+ *
+ * Private keys are raw (written to SYS_REF_TEMP per-operation).
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/unaligned.h>
+#include <crypto/sig.h>
+#include <crypto/internal/sig.h>
+
+#include "cmh_sys.h"
+#include "cmh_qse_abi.h"
+#include "cmh_hcq_abi.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+#include "cmh_pqc.h"
+
+struct cmh_slhdsa_tfm_ctx {
+       struct cmh_key_ctx key;         /* private key (raw sk bytes) */
+       u8 *pub_key;
+       u32 pub_key_len;
+       u32 param_set;                  /* HCQ_SLHDSA_SHAKE_128S .. SHA2_256F */
+};
+
+static inline struct cmh_slhdsa_tfm_ctx *cmh_slhdsa_ctx(struct crypto_sig *tfm)
+{
+       return crypto_sig_ctx(tfm);
+}
+
+/*
+ * SLH-DSA sign (synchronous sig_alg)
+ *
+ * @src:  message bytes
+ * @slen: message length
+ * @dst:  signature output buffer
+ * @dlen: output buffer length
+ *
+ * Returns signature length on success, negative errno on failure.
+ * Uses raw private keys written to SYS_REF_TEMP per-operation.
+ */
+static int cmh_slhdsa_sign(struct crypto_sig *tfm,
+                          const void *src, unsigned int slen,
+                          void *dst, unsigned int dlen)
+{
+       struct cmh_slhdsa_tfm_ctx *ctx = cmh_slhdsa_ctx(tfm);
+       u32 sig_sz = slhdsa_get_sig_size(ctx->param_set);
+       u32 sk_sz = slhdsa_sk_size(ctx->param_set);
+       struct vcq_cmd vcq[HCQ_VCQ_CMDS_MAX]; /* raw: hdr+write+sign+flush */
+       u32 vcq_count;
+       u8 *m_buf = NULL, *sig_buf = NULL, *sk_buf = NULL;
+       dma_addr_t m_dma = DMA_MAPPING_ERROR;
+       dma_addr_t sig_dma = DMA_MAPPING_ERROR;
+       dma_addr_t sk_dma = DMA_MAPPING_ERROR;
+       int ret, idx;
+
+       if (ctx->key.mode == CMH_KEY_NONE)
+               return -EINVAL;
+       if (dlen < sig_sz)
+               return -EINVAL;
+       if (!slen || slen > SLHDSA_MAX_MSG_LEN)
+               return -EINVAL;
+
+       m_buf = kmemdup(src, slen, GFP_KERNEL);
+       sig_buf = kzalloc(sig_sz, GFP_KERNEL);
+       if (!m_buf || !sig_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       m_dma = cmh_dma_map_single(m_buf, slen, DMA_TO_DEVICE);
+       sig_dma = cmh_dma_map_single(sig_buf, sig_sz, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(m_dma) || cmh_dma_map_error(sig_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       sk_dma = DMA_MAPPING_ERROR;
+       idx = 0;
+
+       struct core_dispatch d;
+
+       d = cmh_core_select_instance(CMH_CORE_HCQ);
+
+       if (ctx->key.raw.len != sk_sz) {
+               ret = -EINVAL;
+               goto out_unmap;
+       }
+       sk_buf = kmemdup(ctx->key.raw.data, ctx->key.raw.len,
+                        GFP_KERNEL);
+       if (!sk_buf) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+       sk_dma = cmh_dma_map_single(sk_buf, sk_sz, DMA_TO_DEVICE);
+       if (cmh_dma_map_error(sk_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_count = HCQ_VCQ_CMDS_MIN + 1;
+       vcq_set_header(&vcq[idx++], vcq_count);
+       vcq_add_sys_write(&vcq[idx++], SYS_REF_TEMP, sk_dma,
+                         SYS_REF_NONE, sk_sz,
+                         ctx->key.raw.sys_type);
+       vcq_add_hcq_slhdsa_sign_internal(&vcq[idx++], d.core_id,
+                                        ctx->param_set,
+                                        slen, 0,
+                                        m_dma, SYS_REF_TEMP,
+                                        sig_dma);
+       vcq_add_hcq_flush(&vcq[idx++], d.core_id);
+
+       ret = cmh_tm_submit_sync_tmo(vcq, vcq_count, 1, d.mbx_idx,
+                                    cmh_tm_slow_op_timeout_jiffies());
+
+       if (!ret) {
+               /* Sync bounce buffer so CPU sees the DMA-written signature */
+               cmh_dma_sync_for_cpu(sig_dma, sig_sz, DMA_FROM_DEVICE);
+               memcpy(dst, sig_buf, sig_sz);
+               ret = sig_sz;
+       }
+
+out_unmap:
+       if (sk_buf) {
+               if (!cmh_dma_map_error(sk_dma))
+                       cmh_dma_unmap_single(sk_dma, sk_sz, DMA_TO_DEVICE);
+               kfree_sensitive(sk_buf);
+       }
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, sig_sz, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(m_dma))
+               cmh_dma_unmap_single(m_dma, slen, DMA_TO_DEVICE);
+
+out_free:
+       kfree(sig_buf);
+       kfree(m_buf);
+       return ret;
+}
+
+/*
+ * SLH-DSA verify (synchronous sig_alg)
+ *
+ * @src:    raw signature
+ * @slen:   signature length
+ * @digest: message bytes
+ * @dlen:   message length
+ *
+ * Returns 0 on successful verification, negative errno on failure.
+ */
+static int cmh_slhdsa_verify(struct crypto_sig *tfm,
+                            const void *src, unsigned int slen,
+                            const void *digest, unsigned int dlen)
+{
+       struct cmh_slhdsa_tfm_ctx *ctx = cmh_slhdsa_ctx(tfm);
+       u32 sig_sz = slhdsa_get_sig_size(ctx->param_set);
+       u32 pk_sz = slhdsa_pk_size(ctx->param_set);
+       struct core_dispatch d = cmh_core_select_instance(CMH_CORE_HCQ);
+       struct vcq_cmd vcq[HCQ_VCQ_CMDS_MIN];
+       u8 *sig_buf = NULL, *m_buf = NULL, *pk_buf = NULL;
+       dma_addr_t sig_dma = DMA_MAPPING_ERROR;
+       dma_addr_t m_dma = DMA_MAPPING_ERROR;
+       dma_addr_t pk_dma = DMA_MAPPING_ERROR;
+       int ret;
+
+       if (!ctx->pub_key)
+               return -EINVAL;
+       if (slen != sig_sz)
+               return -EINVAL;
+       if (!dlen || dlen > SLHDSA_MAX_MSG_LEN)
+               return -EINVAL;
+
+       sig_buf = kmemdup(src, slen, GFP_KERNEL);
+       m_buf = kmemdup(digest, dlen, GFP_KERNEL);
+       pk_buf = kmemdup(ctx->pub_key, pk_sz, GFP_KERNEL);
+       if (!sig_buf || !m_buf || !pk_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       sig_dma = cmh_dma_map_single(sig_buf, sig_sz, DMA_TO_DEVICE);
+       m_dma = cmh_dma_map_single(m_buf, dlen, DMA_TO_DEVICE);
+       pk_dma = cmh_dma_map_single(pk_buf, pk_sz, DMA_TO_DEVICE);
+
+       if (cmh_dma_map_error(sig_dma) || cmh_dma_map_error(m_dma) ||
+           cmh_dma_map_error(pk_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], HCQ_VCQ_CMDS_MIN);
+       vcq_add_hcq_slhdsa_verify_internal(&vcq[1], d.core_id, ctx->param_set,
+                                          dlen, m_dma, pk_dma, sig_dma);
+       vcq_add_hcq_flush(&vcq[2], d.core_id);
+
+       /* SLH-DSA verify recomputes hyper-tree hashes -- inherently slow */
+       ret = cmh_tm_submit_sync_tmo(vcq, HCQ_VCQ_CMDS_MIN, 1, d.mbx_idx,
+                                    cmh_tm_slow_op_timeout_jiffies());
+
+out_unmap:
+       if (!cmh_dma_map_error(pk_dma))
+               cmh_dma_unmap_single(pk_dma, pk_sz, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(m_dma))
+               cmh_dma_unmap_single(m_dma, dlen, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, sig_sz, DMA_TO_DEVICE);
+
+out_free:
+       kfree(pk_buf);
+       kfree(m_buf);
+       kfree(sig_buf);
+       return ret;
+}
+
+static int cmh_slhdsa_set_pub_key(struct crypto_sig *tfm,
+                                 const void *key, unsigned int keylen)
+{
+       struct cmh_slhdsa_tfm_ctx *ctx = cmh_slhdsa_ctx(tfm);
+       u32 expected = slhdsa_pk_size(ctx->param_set);
+
+       if (keylen != expected)
+               return -EINVAL;
+
+       kfree(ctx->pub_key);
+       ctx->pub_key = NULL;
+       ctx->pub_key_len = 0;
+
+       ctx->pub_key = kmemdup(key, keylen, GFP_KERNEL);
+       if (!ctx->pub_key)
+               return -ENOMEM;
+
+       ctx->pub_key_len = keylen;
+       return 0;
+}
+
+static int cmh_slhdsa_set_priv_key(struct crypto_sig *tfm,
+                                  const void *key, unsigned int keylen)
+{
+       struct cmh_slhdsa_tfm_ctx *ctx = cmh_slhdsa_ctx(tfm);
+
+       /* Raw sk (4*n bytes) */
+       if (keylen != slhdsa_sk_size(ctx->param_set))
+               return -EINVAL;
+
+       return cmh_key_setkey_raw(&ctx->key, key, keylen, CORE_ID_HCQ);
+}
+
+static unsigned int cmh_slhdsa_key_size(struct crypto_sig *tfm)
+{
+       struct cmh_slhdsa_tfm_ctx *ctx = cmh_slhdsa_ctx(tfm);
+
+       /* crypto_sig_keysize() returns bits, not bytes */
+       return slhdsa_pk_size(ctx->param_set) * 8;
+}
+
+static unsigned int cmh_slhdsa_max_size(struct crypto_sig *tfm)
+{
+       struct cmh_slhdsa_tfm_ctx *ctx = cmh_slhdsa_ctx(tfm);
+
+       return slhdsa_get_sig_size(ctx->param_set);
+}
+
+static void cmh_slhdsa_exit(struct crypto_sig *tfm)
+{
+       struct cmh_slhdsa_tfm_ctx *ctx = cmh_slhdsa_ctx(tfm);
+
+       cmh_key_destroy(&ctx->key);
+       kfree(ctx->pub_key);
+       ctx->pub_key = NULL;
+}
+
+/* Generate init functions for all 12 parameter sets */
+#define SLHDSA_INIT(ps_val)                                            \
+       static int cmh_slhdsa_init_##ps_val(struct crypto_sig *tfm)     \
+       {                                                               \
+               struct cmh_slhdsa_tfm_ctx *ctx = cmh_slhdsa_ctx(tfm);   \
+               memset(ctx, 0, sizeof(*ctx));                           \
+               ctx->param_set = ps_val;                                \
+               return 0;                                               \
+       }
+
+SLHDSA_INIT(HCQ_SLHDSA_SHAKE_128S)
+SLHDSA_INIT(HCQ_SLHDSA_SHAKE_128F)
+SLHDSA_INIT(HCQ_SLHDSA_SHAKE_192S)
+SLHDSA_INIT(HCQ_SLHDSA_SHAKE_192F)
+SLHDSA_INIT(HCQ_SLHDSA_SHAKE_256S)
+SLHDSA_INIT(HCQ_SLHDSA_SHAKE_256F)
+SLHDSA_INIT(HCQ_SLHDSA_SHA2_128S)
+SLHDSA_INIT(HCQ_SLHDSA_SHA2_128F)
+SLHDSA_INIT(HCQ_SLHDSA_SHA2_192S)
+SLHDSA_INIT(HCQ_SLHDSA_SHA2_192F)
+SLHDSA_INIT(HCQ_SLHDSA_SHA2_256S)
+SLHDSA_INIT(HCQ_SLHDSA_SHA2_256F)
+
+#define SLHDSA_ALG(name, drv, ps_val) {                                        \
+               .sign           = cmh_slhdsa_sign,                      \
+               .verify         = cmh_slhdsa_verify,                    \
+               .set_pub_key    = cmh_slhdsa_set_pub_key,               \
+               .set_priv_key   = cmh_slhdsa_set_priv_key,              \
+               .key_size       = cmh_slhdsa_key_size,                  \
+               .max_size       = cmh_slhdsa_max_size,                  \
+               .init           = cmh_slhdsa_init_##ps_val,             \
+               .exit           = cmh_slhdsa_exit,                      \
+               .base = {                                               \
+                       .cra_name         = name,                       \
+                       .cra_driver_name  = drv,                        \
+                       .cra_priority     = 300,                        \
+                       .cra_module       = THIS_MODULE,                \
+                       .cra_ctxsize      = sizeof(struct cmh_slhdsa_tfm_ctx), \
+               },                                                      \
+       }
+
+static struct sig_alg cmh_slhdsa_algs[] = {
+       SLHDSA_ALG("slh-dsa-shake-128s", "cri-cmh-slh-dsa-shake-128s", HCQ_SLHDSA_SHAKE_128S),
+       SLHDSA_ALG("slh-dsa-shake-128f", "cri-cmh-slh-dsa-shake-128f", HCQ_SLHDSA_SHAKE_128F),
+       SLHDSA_ALG("slh-dsa-shake-192s", "cri-cmh-slh-dsa-shake-192s", HCQ_SLHDSA_SHAKE_192S),
+       SLHDSA_ALG("slh-dsa-shake-192f", "cri-cmh-slh-dsa-shake-192f", HCQ_SLHDSA_SHAKE_192F),
+       SLHDSA_ALG("slh-dsa-shake-256s", "cri-cmh-slh-dsa-shake-256s", HCQ_SLHDSA_SHAKE_256S),
+       SLHDSA_ALG("slh-dsa-shake-256f", "cri-cmh-slh-dsa-shake-256f", HCQ_SLHDSA_SHAKE_256F),
+       SLHDSA_ALG("slh-dsa-sha2-128s",  "cri-cmh-slh-dsa-sha2-128s",  HCQ_SLHDSA_SHA2_128S),
+       SLHDSA_ALG("slh-dsa-sha2-128f",  "cri-cmh-slh-dsa-sha2-128f",  HCQ_SLHDSA_SHA2_128F),
+       SLHDSA_ALG("slh-dsa-sha2-192s",  "cri-cmh-slh-dsa-sha2-192s",  HCQ_SLHDSA_SHA2_192S),
+       SLHDSA_ALG("slh-dsa-sha2-192f",  "cri-cmh-slh-dsa-sha2-192f",  HCQ_SLHDSA_SHA2_192F),
+       SLHDSA_ALG("slh-dsa-sha2-256s",  "cri-cmh-slh-dsa-sha2-256s",  HCQ_SLHDSA_SHA2_256S),
+       SLHDSA_ALG("slh-dsa-sha2-256f",  "cri-cmh-slh-dsa-sha2-256f",  HCQ_SLHDSA_SHA2_256F),
+};
+
+/**
+ * cmh_pqc_slhdsa_register() - Register SLH-DSA sig algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_pqc_slhdsa_register(void)
+{
+       int ret, i;
+
+       for (i = 0; i < ARRAY_SIZE(cmh_slhdsa_algs); i++) {
+               ret = crypto_register_sig(&cmh_slhdsa_algs[i]);
+               if (ret) {
+                       dev_err(cmh_dev(), "cmh: failed to register %s (%d)\n",
+                               cmh_slhdsa_algs[i].base.cra_name, ret);
+                       goto err_unregister;
+               }
+       }
+
+       return 0;
+
+err_unregister:
+       while (i--)
+               crypto_unregister_sig(&cmh_slhdsa_algs[i]);
+       return ret;
+}
+
+/**
+ * cmh_pqc_slhdsa_unregister() - Unregister SLH-DSA sig algorithms from the crypto framework
+ */
+void cmh_pqc_slhdsa_unregister(void)
+{
+       int i = ARRAY_SIZE(cmh_slhdsa_algs);
+
+       while (i--)
+               crypto_unregister_sig(&cmh_slhdsa_algs[i]);
+}
diff --git a/drivers/crypto/cmh/cmh_pqc_xmss.c b/drivers/crypto/cmh/cmh_pqc_xmss.c
new file mode 100644
index 000000000000..433ffcd6463d
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_pqc_xmss.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- XMSS/XMSS-MT Signature Driver (verify-only, sig_alg, synchronous)
+ *
+ * Registers "xmss" and "xmss-mt" sig algorithms with verify-only
+ * callbacks.  Sign is not supported (stateful signature -- key
+ * management must happen externally).
+ *
+ * Verify: src = raw signature, digest = message bytes
+ * Public key: raw pk bytes (variable length, set via set_pub_key)
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <crypto/sig.h>
+#include <crypto/internal/sig.h>
+
+#include "cmh_sys.h"
+#include "cmh_hcq_abi.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_pqc.h"
+
+#define XMSS_VCQ_CMDS  3       /* header + cmd + flush */
+
+struct cmh_xmss_tfm_ctx {
+       u8 *pub_key;
+       u32 pub_key_len;
+       u32 xmss_mt;            /* 0 = XMSS, 1 = XMSS-MT */
+};
+
+static inline struct cmh_xmss_tfm_ctx *cmh_xmss_ctx(struct crypto_sig *tfm)
+{
+       return crypto_sig_ctx(tfm);
+}
+
+/*
+ * XMSS/XMSS-MT verify (synchronous sig_alg)
+ *
+ * @src:    raw signature
+ * @slen:   signature length
+ * @digest: message bytes
+ * @dlen:   message length
+ *
+ * Returns 0 on successful verification, negative errno on failure.
+ */
+static int cmh_xmss_verify(struct crypto_sig *tfm,
+                          const void *src, unsigned int slen,
+                          const void *digest, unsigned int dlen)
+{
+       struct cmh_xmss_tfm_ctx *ctx = cmh_xmss_ctx(tfm);
+       struct core_dispatch d = cmh_core_select_instance(CMH_CORE_HCQ);
+       struct vcq_cmd vcq[XMSS_VCQ_CMDS];
+       u8 *sig_buf = NULL, *m_buf = NULL, *pk_buf = NULL;
+       dma_addr_t sig_dma = DMA_MAPPING_ERROR;
+       dma_addr_t m_dma = DMA_MAPPING_ERROR;
+       dma_addr_t pk_dma = DMA_MAPPING_ERROR;
+       int ret;
+
+       if (!ctx->pub_key)
+               return -EINVAL;
+       if (!slen || slen > XMSS_MAX_SIG_LEN)
+               return -EINVAL;
+       if (!dlen || dlen > XMSS_MAX_MSG_LEN)
+               return -EINVAL;
+
+       sig_buf = kmemdup(src, slen, GFP_KERNEL);
+       m_buf = kmemdup(digest, dlen, GFP_KERNEL);
+       pk_buf = kmemdup(ctx->pub_key, ctx->pub_key_len, GFP_KERNEL);
+       if (!sig_buf || !m_buf || !pk_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       sig_dma = cmh_dma_map_single(sig_buf, slen, DMA_TO_DEVICE);
+       m_dma = cmh_dma_map_single(m_buf, dlen, DMA_TO_DEVICE);
+       pk_dma = cmh_dma_map_single(pk_buf, ctx->pub_key_len, DMA_TO_DEVICE);
+
+       if (cmh_dma_map_error(sig_dma) || cmh_dma_map_error(m_dma) ||
+           cmh_dma_map_error(pk_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], XMSS_VCQ_CMDS);
+       vcq_add_hcq_xmss_verify(&vcq[1], d.core_id, ctx->xmss_mt,
+                               ctx->pub_key_len, slen, dlen,
+                               pk_dma, sig_dma, m_dma);
+       vcq_add_hcq_flush(&vcq[2], d.core_id);
+
+       /* XMSS verify traverses Merkle hash chains -- inherently slow */
+       ret = cmh_tm_submit_sync_tmo(vcq, XMSS_VCQ_CMDS, 1, d.mbx_idx,
+                                    cmh_tm_slow_op_timeout_jiffies());
+
+out_unmap:
+       if (!cmh_dma_map_error(pk_dma))
+               cmh_dma_unmap_single(pk_dma, ctx->pub_key_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(m_dma))
+               cmh_dma_unmap_single(m_dma, dlen, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, slen, DMA_TO_DEVICE);
+
+out_free:
+       kfree(pk_buf);
+       kfree(m_buf);
+       kfree(sig_buf);
+       return ret;
+}
+
+static int cmh_xmss_set_pub_key(struct crypto_sig *tfm,
+                               const void *key, unsigned int keylen)
+{
+       struct cmh_xmss_tfm_ctx *ctx = cmh_xmss_ctx(tfm);
+
+       if (!keylen || keylen > XMSS_MAX_PK_LEN)
+               return -EINVAL;
+
+       kfree(ctx->pub_key);
+       ctx->pub_key = NULL;
+       ctx->pub_key_len = 0;
+
+       ctx->pub_key = kmemdup(key, keylen, GFP_KERNEL);
+       if (!ctx->pub_key)
+               return -ENOMEM;
+
+       ctx->pub_key_len = keylen;
+       return 0;
+}
+
+static unsigned int cmh_xmss_key_size(struct crypto_sig *tfm)
+{
+       struct cmh_xmss_tfm_ctx *ctx = cmh_xmss_ctx(tfm);
+
+       return ctx->pub_key_len * 8;
+}
+
+static int cmh_xmss_init(struct crypto_sig *tfm)
+{
+       struct cmh_xmss_tfm_ctx *ctx = cmh_xmss_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       return 0;
+}
+
+static int cmh_xmss_mt_init(struct crypto_sig *tfm)
+{
+       struct cmh_xmss_tfm_ctx *ctx = cmh_xmss_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       ctx->xmss_mt = 1;
+       return 0;
+}
+
+static void cmh_xmss_exit(struct crypto_sig *tfm)
+{
+       struct cmh_xmss_tfm_ctx *ctx = cmh_xmss_ctx(tfm);
+
+       kfree(ctx->pub_key);
+       ctx->pub_key = NULL;
+}
+
+static struct sig_alg cmh_xmss_algs[] = {
+       {
+               .verify         = cmh_xmss_verify,
+               .set_pub_key    = cmh_xmss_set_pub_key,
+               .key_size       = cmh_xmss_key_size,
+               .init           = cmh_xmss_init,
+               .exit           = cmh_xmss_exit,
+               .base = {
+                       .cra_name         = "xmss",
+                       .cra_driver_name  = "cri-cmh-xmss",
+                       .cra_priority     = 300,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_xmss_tfm_ctx),
+               },
+       },
+       {
+               .verify         = cmh_xmss_verify,
+               .set_pub_key    = cmh_xmss_set_pub_key,
+               .key_size       = cmh_xmss_key_size,
+               .init           = cmh_xmss_mt_init,
+               .exit           = cmh_xmss_exit,
+               .base = {
+                       .cra_name         = "xmss-mt",
+                       .cra_driver_name  = "cri-cmh-xmss-mt",
+                       .cra_priority     = 300,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_xmss_tfm_ctx),
+               },
+       },
+};
+
+/**
+ * cmh_pqc_xmss_register() - Register XMSS/XMSS-MT sig algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_pqc_xmss_register(void)
+{
+       int ret, i;
+
+       for (i = 0; i < ARRAY_SIZE(cmh_xmss_algs); i++) {
+               ret = crypto_register_sig(&cmh_xmss_algs[i]);
+               if (ret) {
+                       dev_err(cmh_dev(), "cmh: failed to register %s (%d)\n",
+                               cmh_xmss_algs[i].base.cra_name, ret);
+                       goto err_unregister;
+               }
+       }
+
+       return 0;
+
+err_unregister:
+       while (i--)
+               crypto_unregister_sig(&cmh_xmss_algs[i]);
+       return ret;
+}
+
+/**
+ * cmh_pqc_xmss_unregister() - Unregister XMSS/XMSS-MT sig algorithms from the crypto framework
+ */
+void cmh_pqc_xmss_unregister(void)
+{
+       int i = ARRAY_SIZE(cmh_xmss_algs);
+
+       while (i--)
+               crypto_unregister_sig(&cmh_xmss_algs[i]);
+}
--
2.43.7
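
Note on the register/unregister pair above: both the SLH-DSA and XMSS
register functions walk the algorithm array forward and, on the first
failure, unregister only the entries that were already registered, in
reverse order. A minimal userspace sketch of that unwind pattern follows.
It is not driver code; struct fake_alg, fake_register(), fake_unregister()
and register_all() are hypothetical stand-ins for struct sig_alg,
crypto_register_sig() and crypto_unregister_sig().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sig_alg; it only tracks state. */
struct fake_alg {
	const char *name;
	int registered;
	int fail;	/* nonzero: registering this entry fails */
};

/* Hypothetical stand-in for crypto_register_sig(). */
static int fake_register(struct fake_alg *alg)
{
	if (alg->fail)
		return -22;	/* mimics -EINVAL */
	alg->registered = 1;
	return 0;
}

/* Hypothetical stand-in for crypto_unregister_sig(). */
static void fake_unregister(struct fake_alg *alg)
{
	/* The unwind must never touch an entry that was not registered. */
	assert(alg->registered);
	alg->registered = 0;
}

/* Register every entry, or leave none registered on failure. */
static int register_all(struct fake_alg *algs, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = fake_register(&algs[i]);
		if (ret)
			goto err_unregister;
	}
	return 0;

err_unregister:
	/* i indexes the failed entry; unwind entries i-1 down to 0. */
	while (i--)
		fake_unregister(&algs[i]);
	return ret;
}
```

Because the loop index is post-decremented in the condition, the entry
that failed is skipped and a failure on the very first entry unwinds
nothing.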


* [PATCH 12/19] crypto: cmh - add RSA akcipher
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register the RSA akcipher algorithm using the CMH PKE core (core ID
0x0a).  Supports encrypt, decrypt, sign, and verify operations with
2048-, 3072-, and 4096-bit keys.  512- and 1024-bit keys are also
accepted for legacy/test interoperability.  Includes common PKE
helpers shared by subsequent ECDSA and ECDH patches.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 drivers/crypto/cmh/Makefile         |   4 +-
 drivers/crypto/cmh/cmh_main.c       |   9 +
 drivers/crypto/cmh/cmh_pke_common.c | 578 +++++++++++++++++++++++++
 drivers/crypto/cmh/cmh_pke_rsa.c    | 642 ++++++++++++++++++++++++++++
 4 files changed, 1232 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_pke_common.c
 create mode 100644 drivers/crypto/cmh/cmh_pke_rsa.c
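
For readers skimming the commit message, the accepted modulus sizes it
lists can be summarized with the small userspace sketch below. This is an
illustration only: rsa_modulus_bits_ok() and rsa_modulus_bytes() are
hypothetical names, and the sketch assumes that sizes outside the listed
set are rejected, which the commit message does not state explicitly.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: returns true for the
 * modulus sizes named in the commit message (2048, 3072, 4096, plus
 * 512 and 1024 for legacy/test interoperability).
 */
static bool rsa_modulus_bits_ok(unsigned int bits)
{
	static const unsigned int accepted[] = { 512, 1024, 2048, 3072, 4096 };
	size_t i;

	for (i = 0; i < sizeof(accepted) / sizeof(accepted[0]); i++) {
		if (bits == accepted[i])
			return true;
	}
	return false;
}

/* Length in bytes of a modulus (and of an RSA result) for a given size. */
static unsigned int rsa_modulus_bytes(unsigned int bits)
{
	return bits / 8;
}
```

A caller would typically derive the bit count from the modulus length
after stripping leading zero bytes and reject the key early if the check
fails.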

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index 1c4cb817424c..7afd9852c337 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -29,7 +29,9 @@ cmh-y := \
        cmh_ccp.o \
        cmh_ccp_aead.o \
        cmh_ccp_poly.o \
-       cmh_rng.o
+       cmh_rng.o \
+       cmh_pke_common.o \
+       cmh_pke_rsa.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index f31c50168e4a..8535453342d7 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -38,6 +38,7 @@
 #include "cmh_aes.h"
 #include "cmh_sm4.h"
 #include "cmh_ccp.h"
+#include "cmh_pke.h"
 #include "cmh_mgmt.h"
 #include "cmh_registers.h"
 #include "cmh_debugfs.h"
@@ -275,6 +276,11 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_ccp_poly_register;

+       /* Register PKE RSA akcipher */
+       ret = cmh_pke_rsa_register();
+       if (ret)
+               goto err_pke_rsa_register;
+
        /* Register key management device (/dev/cmh_mgmt) */
        ret = cmh_mgmt_register();
        if (ret)
@@ -287,6 +293,8 @@ static int cmh_probe(struct platform_device *pdev)
        return 0;

 err_mgmt_register:
+       cmh_pke_rsa_unregister();
+err_pke_rsa_register:
        cmh_ccp_poly_unregister();
 err_ccp_poly_register:
        cmh_ccp_aead_unregister();
@@ -343,6 +351,7 @@ static void cmh_remove(struct platform_device *pdev)
        cfg = &dev->config;

        cmh_mgmt_unregister();
+       cmh_pke_rsa_unregister();
        cmh_ccp_poly_unregister();
        cmh_ccp_aead_unregister();
        cmh_ccp_unregister();
diff --git a/drivers/crypto/cmh/cmh_pke_common.c b/drivers/crypto/cmh/cmh_pke_common.c
new file mode 100644
index 000000000000..ab3e2eb7d3f8
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_pke_common.c
@@ -0,0 +1,578 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- PKE Common VCQ Builders
+ *
+ * VCQ builder functions for all PKE core commands.  Each builder
+ * populates a single vcq_cmd slot with the appropriate magic,
+ * command ID, byte-swap flags, and command-specific payload.
+ *
+ * RSA commands always use PKE_SWAP_FLAGS (VCQ_FLAG_SWAP_BYTES |
+ * VCQ_FLAG_SWAP_WORDS).  EC Weierstrass curves (NIST P-*, Brainpool,
+ * secp256k1, SM2) use PKE_SWAP_FLAGS; Edwards curves (25519, 448)
+ * use no swap flags.  SM2 commands use per-command flags documented
+ * in the eSW ABI.
+ *
+ * Callers combine these with vcq_set_header() + vcq_add_flush()
+ * and submit via cmh_tm_submit_sync().
+ */
+
+#include <linux/string.h>
+
+#include "cmh_pke.h"
+
+/**
+ * vcq_add_pke_flush() - Add a PKE flush command to a VCQ slot
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ *
+ * Populates @slot with a flush command for the specified PKE core.
+ */
+void vcq_add_pke_flush(struct vcq_cmd *slot, u32 core_id)
+{
+       vcq_add_flush(slot, core_id);
+}
+
+/* RSA */
+
+/**
+ * vcq_add_pke_rsa_enc() - Build a VCQ command for RSA public-key encryption
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @bits: RSA key size in bits
+ * @e_len: Length of the public exponent in bytes
+ * @e_dma: DMA address of public exponent buffer
+ * @n_dma: DMA address of modulus buffer
+ * @m_dma: DMA address of plaintext message buffer
+ * @c_dma: DMA address of ciphertext output buffer
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_rsa_enc(struct vcq_cmd *slot, u32 core_id, u32 bits, u32 e_len,
+                        u64 e_dma, u64 n_dma, u64 m_dma, u64 c_dma,
+                        u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_RSA_ENC);
+       slot->hwc.pke.cmd_rsa_enc.bits = bits;
+       slot->hwc.pke.cmd_rsa_enc.e_len = e_len;
+       slot->hwc.pke.cmd_rsa_enc.e = e_dma;
+       slot->hwc.pke.cmd_rsa_enc.n = n_dma;
+       slot->hwc.pke.cmd_rsa_enc.m = m_dma;
+       slot->hwc.pke.cmd_rsa_enc.c = c_dma;
+}
+
+/**
+ * vcq_add_pke_rsa_dec() - Build a VCQ command for RSA private-key decryption
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @bits: RSA key size in bits
+ * @e_len: Length of the public exponent in bytes
+ * @e_dma: DMA address of public exponent buffer
+ * @n_dma: DMA address of modulus buffer
+ * @c_dma: DMA address of ciphertext input buffer
+ * @m_dma: DMA address of plaintext output buffer
+ * @d_ref: Datastore reference for the private exponent
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_rsa_dec(struct vcq_cmd *slot, u32 core_id, u32 bits, u32 e_len,
+                        u64 e_dma, u64 n_dma, u64 c_dma, u64 m_dma,
+                        u64 d_ref, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_RSA_DEC);
+       slot->hwc.pke.cmd_rsa_dec.bits = bits;
+       slot->hwc.pke.cmd_rsa_dec.e_len = e_len;
+       slot->hwc.pke.cmd_rsa_dec.e = e_dma;
+       slot->hwc.pke.cmd_rsa_dec.n = n_dma;
+       slot->hwc.pke.cmd_rsa_dec.c = c_dma;
+       slot->hwc.pke.cmd_rsa_dec.m = m_dma;
+       slot->hwc.pke.cmd_rsa_dec.d = d_ref;
+}
+
+/**
+ * vcq_add_pke_rsa_crt_dec() - Build a VCQ command for RSA-CRT decryption
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @bits: RSA key size in bits
+ * @e_len: Length of the public exponent in bytes
+ * @e_dma: DMA address of public exponent buffer
+ * @n_dma: DMA address of modulus buffer
+ * @c_dma: DMA address of ciphertext input buffer
+ * @m_dma: DMA address of plaintext output buffer
+ * @crt_ref: Datastore reference for CRT private key components
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_rsa_crt_dec(struct vcq_cmd *slot, u32 core_id, u32 bits, u32 e_len,
+                            u64 e_dma, u64 n_dma, u64 c_dma, u64 m_dma,
+                            u64 crt_ref, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_RSA_CRT_DEC);
+       slot->hwc.pke.cmd_rsa_crt_dec.bits = bits;
+       slot->hwc.pke.cmd_rsa_crt_dec.e_len = e_len;
+       slot->hwc.pke.cmd_rsa_crt_dec.e = e_dma;
+       slot->hwc.pke.cmd_rsa_crt_dec.n = n_dma;
+       slot->hwc.pke.cmd_rsa_crt_dec.c = c_dma;
+       slot->hwc.pke.cmd_rsa_crt_dec.m = m_dma;
+       slot->hwc.pke.cmd_rsa_crt_dec.crt = crt_ref;
+}
+
+/* ECDSA */
+
+/**
+ * vcq_add_pke_ecdsa_verify() - Build a VCQ command for ECDSA signature verification
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @curve: Curve identifier (e.g. NIST P-256, P-384, P-521)
+ * @dlen: Digest length in bytes
+ * @pk_dma: DMA address of public key buffer
+ * @dig_dma: DMA address of digest buffer
+ * @sig_dma: DMA address of signature buffer
+ * @rp_dma: DMA address of r-prime verification output buffer
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_ecdsa_verify(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 dlen,
+                             u64 pk_dma, u64 dig_dma, u64 sig_dma,
+                             u64 rp_dma, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_ECDSA_VERIFY);
+       slot->hwc.pke.cmd_ecdsa_verify.curve = curve;
+       slot->hwc.pke.cmd_ecdsa_verify.digest_len = dlen;
+       slot->hwc.pke.cmd_ecdsa_verify.public_key = pk_dma;
+       slot->hwc.pke.cmd_ecdsa_verify.digest = dig_dma;
+       slot->hwc.pke.cmd_ecdsa_verify.signature = sig_dma;
+       slot->hwc.pke.cmd_ecdsa_verify.rprime = rp_dma;
+}
+
+/**
+ * vcq_add_pke_ecdsa_sign() - Build a VCQ command for ECDSA signing
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @curve: Curve identifier (e.g. NIST P-256, P-384, P-521)
+ * @sklen: Secret key length in bytes
+ * @dig_dma: DMA address of digest buffer
+ * @sig_dma: DMA address of signature output buffer
+ * @sk_ref: Datastore reference for the secret key
+ * @dlen: Digest length in bytes
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_ecdsa_sign(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                           u64 dig_dma, u64 sig_dma, u64 sk_ref,
+                           u32 dlen, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_ECDSA_SIGN);
+       slot->hwc.pke.cmd_ecdsa_sign.curve = curve;
+       slot->hwc.pke.cmd_ecdsa_sign.secret_key_len = sklen;
+       slot->hwc.pke.cmd_ecdsa_sign.digest = dig_dma;
+       slot->hwc.pke.cmd_ecdsa_sign.signature = sig_dma;
+       slot->hwc.pke.cmd_ecdsa_sign.secret_key = sk_ref;
+       slot->hwc.pke.cmd_ecdsa_sign.digest_len = dlen;
+}
+
+/**
+ * vcq_add_pke_ecdsa_pubgen() - Build a VCQ command for ECDSA public key generation
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @curve: Curve identifier (e.g. NIST P-256, P-384, P-521)
+ * @sklen: Secret key length in bytes
+ * @pk_dma: DMA address of public key output buffer
+ * @sk_ref: Datastore reference for the secret key
+ * @flags: VCQ command flags
+ *
+ * Generates the public key from an existing private key stored in the
+ * datastore.
+ */
+void vcq_add_pke_ecdsa_pubgen(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                             u64 pk_dma, u64 sk_ref, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_ECDSA_PUBGEN);
+       slot->hwc.pke.cmd_ecdsa_pubgen.curve = curve;
+       slot->hwc.pke.cmd_ecdsa_pubgen.secret_key_len = sklen;
+       slot->hwc.pke.cmd_ecdsa_pubgen.public_key = pk_dma;
+       slot->hwc.pke.cmd_ecdsa_pubgen.secret_key = sk_ref;
+}
+
+/**
+ * vcq_add_pke_ecdsa_keygen() - Build a VCQ command for ECDSA key pair generation
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @curve: Curve identifier (e.g. NIST P-256, P-384, P-521)
+ * @sklen: Secret key length in bytes
+ * @sk_ref: Datastore reference for the generated secret key
+ * @sk_type: Datastore type for the secret key object
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_ecdsa_keygen(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                             u64 sk_ref, u32 sk_type, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_ECDSA_KEYGEN);
+       slot->hwc.pke.cmd_ecdsa_keygen.curve = curve;
+       slot->hwc.pke.cmd_ecdsa_keygen.secret_key_len = sklen;
+       slot->hwc.pke.cmd_ecdsa_keygen.secret_key = sk_ref;
+       slot->hwc.pke.cmd_ecdsa_keygen.secret_key_type = sk_type;
+}
+
+/* ECDH */
+
+/**
+ * vcq_add_pke_ecdh_keygen() - Build a VCQ command for ECDH key pair generation
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @curve: Curve identifier (e.g. NIST P-256, P-384, P-521, X25519, X448)
+ * @sklen: Secret key length in bytes
+ * @pkx_dma: DMA address of public key X-coordinate output buffer
+ * @sk_ref: Datastore reference for the generated secret key
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_ecdh_keygen(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                            u64 pkx_dma, u64 sk_ref, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_ECDH_KEYGEN);
+       slot->hwc.pke.cmd_ecdh_keygen.curve = curve;
+       slot->hwc.pke.cmd_ecdh_keygen.secret_key_len = sklen;
+       slot->hwc.pke.cmd_ecdh_keygen.public_key_x = pkx_dma;
+       slot->hwc.pke.cmd_ecdh_keygen.secret_key = sk_ref;
+}
+
+/**
+ * vcq_add_pke_ecdh() - Build a VCQ command for ECDH shared secret computation
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @curve: Curve identifier (e.g. NIST P-256, P-384, P-521, X25519, X448)
+ * @sklen: Secret key length in bytes
+ * @sslen: Shared secret length in bytes
+ * @ss_type: Datastore type for the shared secret object
+ * @peer_dma: DMA address of peer public key buffer
+ * @sk_ref: Datastore reference for the local secret key
+ * @ss_ref: Datastore reference for the computed shared secret
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_ecdh(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                     u32 sslen, u32 ss_type, u64 peer_dma, u64 sk_ref,
+                     u64 ss_ref, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_ECDH);
+       slot->hwc.pke.cmd_ecdh.curve = curve;
+       slot->hwc.pke.cmd_ecdh.secret_key_len = sklen;
+       slot->hwc.pke.cmd_ecdh.shared_secret_len = sslen;
+       slot->hwc.pke.cmd_ecdh.shared_secret_type = ss_type;
+       slot->hwc.pke.cmd_ecdh.peer_key_x = peer_dma;
+       slot->hwc.pke.cmd_ecdh.secret_key = sk_ref;
+       slot->hwc.pke.cmd_ecdh.shared_secret = ss_ref;
+}
+
+/* EdDSA */
+
+/**
+ * vcq_add_pke_eddsa_verify() - Build a VCQ command for EdDSA signature verification
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @curve: Curve identifier (Ed25519 or Ed448)
+ * @dlen: Digest (message) length in bytes
+ * @pky_dma: DMA address of public key Y-coordinate buffer
+ * @dig_dma: DMA address of digest buffer
+ * @sig_dma: DMA address of signature buffer
+ * @rp_dma: DMA address of r-prime verification output buffer
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_eddsa_verify(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 dlen,
+                             u64 pky_dma, u64 dig_dma, u64 sig_dma,
+                             u64 rp_dma, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_EDDSA_VERIFY);
+       slot->hwc.pke.cmd_eddsa_verify.curve = curve;
+       slot->hwc.pke.cmd_eddsa_verify.digest_len = dlen;
+       slot->hwc.pke.cmd_eddsa_verify.public_key_y = pky_dma;
+       slot->hwc.pke.cmd_eddsa_verify.digest = dig_dma;
+       slot->hwc.pke.cmd_eddsa_verify.signature = sig_dma;
+       slot->hwc.pke.cmd_eddsa_verify.rprime = rp_dma;
+}
+
+/**
+ * vcq_add_pke_eddsa_sign() - Build a VCQ command for EdDSA signing
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @curve: Curve identifier (Ed25519 or Ed448)
+ * @sklen: Secret key length in bytes
+ * @dig_dma: DMA address of digest (message) buffer
+ * @sig_dma: DMA address of signature output buffer
+ * @sk_ref: Datastore reference for the secret key
+ * @dlen: Digest (message) length in bytes
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_eddsa_sign(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                           u64 dig_dma, u64 sig_dma, u64 sk_ref,
+                           u32 dlen, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_EDDSA_SIGN);
+       slot->hwc.pke.cmd_eddsa_sign.curve = curve;
+       slot->hwc.pke.cmd_eddsa_sign.secret_key_len = sklen;
+       slot->hwc.pke.cmd_eddsa_sign.digest = dig_dma;
+       slot->hwc.pke.cmd_eddsa_sign.signature = sig_dma;
+       slot->hwc.pke.cmd_eddsa_sign.secret_key = sk_ref;
+       slot->hwc.pke.cmd_eddsa_sign.digest_len = dlen;
+}
+
+/**
+ * vcq_add_pke_eddsa_pubgen() - Build a VCQ command for EdDSA public key generation
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @curve: Curve identifier (Ed25519 or Ed448)
+ * @sklen: Secret key length in bytes
+ * @pky_dma: DMA address of public key Y-coordinate output buffer
+ * @sk_ref: Datastore reference for the secret key
+ * @flags: VCQ command flags
+ *
+ * Generates the public key from an existing private key stored in the
+ * datastore.
+ */
+void vcq_add_pke_eddsa_pubgen(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                             u64 pky_dma, u64 sk_ref, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_EDDSA_PUBGEN);
+       slot->hwc.pke.cmd_eddsa_pubgen.curve = curve;
+       slot->hwc.pke.cmd_eddsa_pubgen.secret_key_len = sklen;
+       slot->hwc.pke.cmd_eddsa_pubgen.public_key_y = pky_dma;
+       slot->hwc.pke.cmd_eddsa_pubgen.secret_key = sk_ref;
+}
+
+/**
+ * vcq_add_pke_eddsa_keygen_sca() - Build a VCQ command for EdDSA SCA key generation
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @curve: Curve identifier (Ed448)
+ * @sk_ref: Datastore reference for the input secret key
+ * @sca_sk_ref: Datastore reference for the SCA-masked output key
+ *
+ * Blinds an Ed448 private key into a side-channel-protected masked
+ * form.  No byte-swap flags are used (CRI reference uses flags=0).
+ */
+void vcq_add_pke_eddsa_keygen_sca(struct vcq_cmd *slot, u32 core_id, u32 curve,
+                                 u64 sk_ref, u64 sca_sk_ref)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1,
+                             PKE_CMD_EDDSA_PRIV_KEYGEN_SCA);
+       slot->hwc.pke.cmd_eddsa_keygen_sca.curve = curve;
+       slot->hwc.pke.cmd_eddsa_keygen_sca.secret_key = sk_ref;
+       slot->hwc.pke.cmd_eddsa_keygen_sca.sca_secret_key = sca_sk_ref;
+}
+
+/* SM2 */
+
+/**
+ * vcq_add_pke_sm2_ecdh_keygen() - Build a VCQ command for SM2 ECDH ephemeral key generation
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @nonce_dma: DMA address of nonce input buffer
+ * @session_key_dma: DMA address of session key output buffer
+ * @nonce_len: Nonce length in bytes
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_sm2_ecdh_keygen(struct vcq_cmd *slot, u32 core_id, u64 nonce_dma,
+                                u64 session_key_dma, u32 nonce_len, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1,
+                             PKE_CMD_SM2_ECDH_KEYGEN);
+       slot->hwc.pke.cmd_sm2_ecdh_keygen.nonce = nonce_dma;
+       slot->hwc.pke.cmd_sm2_ecdh_keygen.session_key = session_key_dma;
+       slot->hwc.pke.cmd_sm2_ecdh_keygen.nonce_len = nonce_len;
+}
+
+/**
+ * vcq_add_pke_sm2_ecdh() - Build a VCQ command for SM2 ECDH shared secret computation
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @nonce_len: Nonce length in bytes
+ * @private_key_len: Private key length in bytes
+ * @nonce_dma: DMA address of nonce buffer
+ * @peer_pk_dma: DMA address of peer public key buffer
+ * @peer_sk_dma: DMA address of peer session key buffer
+ * @priv_ref: Datastore reference for the local private key
+ * @sp_ref: Datastore reference for the shared point output
+ * @sp_type: Datastore type for the shared point object
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_sm2_ecdh(struct vcq_cmd *slot, u32 core_id, u32 nonce_len,
+                         u32 private_key_len, u64 nonce_dma,
+                         u64 peer_pk_dma, u64 peer_sk_dma,
+                         u64 priv_ref, u64 sp_ref, u32 sp_type, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_SM2_ECDH);
+       slot->hwc.pke.cmd_sm2_ecdh.nonce_len = nonce_len;
+       slot->hwc.pke.cmd_sm2_ecdh.private_key_len = private_key_len;
+       slot->hwc.pke.cmd_sm2_ecdh.nonce = nonce_dma;
+       slot->hwc.pke.cmd_sm2_ecdh.peer_public_key = peer_pk_dma;
+       slot->hwc.pke.cmd_sm2_ecdh.peer_session_key = peer_sk_dma;
+       slot->hwc.pke.cmd_sm2_ecdh.private_key = priv_ref;
+       slot->hwc.pke.cmd_sm2_ecdh.shared_point = sp_ref;
+       slot->hwc.pke.cmd_sm2_ecdh.shared_point_type = sp_type;
+}
+
+/**
+ * vcq_add_pke_sm2_dec_point() - Build a VCQ command for SM2 decryption point multiplication
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @ct_len: Ciphertext length in bytes
+ * @pk_len: Private key length in bytes
+ * @ct_dma: DMA address of ciphertext input buffer
+ * @dp_dma: DMA address of decryption point output buffer
+ * @priv_ref: Datastore reference for the private key
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_sm2_dec_point(struct vcq_cmd *slot, u32 core_id, u32 ct_len,
+                              u32 pk_len, u64 ct_dma, u64 dp_dma,
+                              u64 priv_ref, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_SM2_DEC_POINT);
+       slot->hwc.pke.cmd_sm2_dec_point.ciphertext_len = ct_len;
+       slot->hwc.pke.cmd_sm2_dec_point.private_key_len = pk_len;
+       slot->hwc.pke.cmd_sm2_dec_point.ciphertext = ct_dma;
+       slot->hwc.pke.cmd_sm2_dec_point.dec_point = dp_dma;
+       slot->hwc.pke.cmd_sm2_dec_point.private_key = priv_ref;
+}
+
+/**
+ * vcq_add_pke_sm2_enc_point() - Build a VCQ command for SM2 encryption point multiplication
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @nonce_dma: DMA address of nonce buffer
+ * @pk_dma: DMA address of public key buffer
+ * @ct_dma: DMA address of ciphertext header output buffer
+ * @ep_dma: DMA address of encryption point output buffer
+ * @nonce_len: Nonce length in bytes
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_sm2_enc_point(struct vcq_cmd *slot, u32 core_id, u64 nonce_dma,
+                              u64 pk_dma, u64 ct_dma, u64 ep_dma,
+                              u32 nonce_len, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_SM2_ENC_POINT);
+       slot->hwc.pke.cmd_sm2_enc_point.nonce = nonce_dma;
+       slot->hwc.pke.cmd_sm2_enc_point.public_key = pk_dma;
+       slot->hwc.pke.cmd_sm2_enc_point.ciphertext = ct_dma;
+       slot->hwc.pke.cmd_sm2_enc_point.enc_point = ep_dma;
+       slot->hwc.pke.cmd_sm2_enc_point.nonce_len = nonce_len;
+}
+
+/**
+ * vcq_add_pke_sm2_id_digest() - Build a VCQ command for SM2 identity digest computation
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @id_dma: DMA address of identity string buffer
+ * @pk_dma: DMA address of public key buffer
+ * @dig_dma: DMA address of digest output buffer
+ * @id_len: Identity string length in bytes
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_sm2_id_digest(struct vcq_cmd *slot, u32 core_id, u64 id_dma,
+                              u64 pk_dma, u64 dig_dma, u32 id_len,
+                              u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_SM2_ID_DIGEST);
+       slot->hwc.pke.cmd_sm2_id_digest.id = id_dma;
+       slot->hwc.pke.cmd_sm2_id_digest.public_key = pk_dma;
+       slot->hwc.pke.cmd_sm2_id_digest.digest = dig_dma;
+       slot->hwc.pke.cmd_sm2_id_digest.id_len = id_len;
+}
+
+/**
+ * vcq_add_pke_sm2_ecdh_hash() - Build a VCQ command for SM2 ECDH key derivation hash
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @peer_dig_dma: DMA address of peer identity digest buffer
+ * @dig_dma: DMA address of local identity digest buffer
+ * @sp_ref: Datastore reference for the shared point
+ * @sk_ref: Datastore reference for the derived shared key output
+ * @sk_type: Datastore type for the shared key object
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_sm2_ecdh_hash(struct vcq_cmd *slot, u32 core_id, u64 peer_dig_dma,
+                              u64 dig_dma, u64 sp_ref, u64 sk_ref,
+                              u32 sk_type, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_SM2_ECDH_HASH);
+       slot->hwc.pke.cmd_sm2_ecdh_hash.peer_id_digest = peer_dig_dma;
+       slot->hwc.pke.cmd_sm2_ecdh_hash.id_digest = dig_dma;
+       slot->hwc.pke.cmd_sm2_ecdh_hash.shared_point = sp_ref;
+       slot->hwc.pke.cmd_sm2_ecdh_hash.shared_key = sk_ref;
+       slot->hwc.pke.cmd_sm2_ecdh_hash.shared_key_type = sk_type;
+}
+
+/**
+ * vcq_add_pke_sm2_dec_hash() - Build a VCQ command for SM2 decryption hash verification
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @ct_dma: DMA address of ciphertext input buffer
+ * @dp_dma: DMA address of decryption point buffer
+ * @pt_dma: DMA address of plaintext output buffer
+ * @ct_len: Ciphertext length in bytes
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_sm2_dec_hash(struct vcq_cmd *slot, u32 core_id, u64 ct_dma,
+                             u64 dp_dma, u64 pt_dma, u32 ct_len, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_SM2_DEC_HASH);
+       slot->hwc.pke.cmd_sm2_dec_hash.ciphertext = ct_dma;
+       slot->hwc.pke.cmd_sm2_dec_hash.dec_point = dp_dma;
+       slot->hwc.pke.cmd_sm2_dec_hash.plaintext = pt_dma;
+       slot->hwc.pke.cmd_sm2_dec_hash.ciphertext_len = ct_len;
+}
+
+/**
+ * vcq_add_pke_sm2_enc_hash() - Build a VCQ command for SM2 encryption hash computation
+ * @slot: VCQ command slot to populate
+ * @core_id: PKE hardware core ID
+ * @msg_dma: DMA address of plaintext message buffer
+ * @ep_dma: DMA address of encryption point buffer
+ * @ct_dma: DMA address of ciphertext output buffer
+ * @msg_len: Message length in bytes
+ * @flags: VCQ command flags
+ */
+void vcq_add_pke_sm2_enc_hash(struct vcq_cmd *slot, u32 core_id, u64 msg_dma,
+                             u64 ep_dma, u64 ct_dma, u32 msg_len, u32 flags)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, flags, 1, PKE_CMD_SM2_ENC_HASH);
+       slot->hwc.pke.cmd_sm2_enc_hash.message = msg_dma;
+       slot->hwc.pke.cmd_sm2_enc_hash.enc_point = ep_dma;
+       slot->hwc.pke.cmd_sm2_enc_hash.ciphertext = ct_dma;
+       slot->hwc.pke.cmd_sm2_enc_hash.message_len = msg_len;
+}
diff --git a/drivers/crypto/cmh/cmh_pke_rsa.c b/drivers/crypto/cmh/cmh_pke_rsa.c
new file mode 100644
index 000000000000..010f8bd98f0d
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_pke_rsa.c
@@ -0,0 +1,642 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- RSA akcipher Driver
+ *
+ * Registers "rsa" akcipher algorithm with the Linux crypto subsystem
+ * (priority 300, overrides software rsa-generic at 100).
+ *
+ * Raw RSA operations only (m^e mod n / c^d mod n).  The kernel's
+ * pkcs1pad() template wraps this for PKCS#1 v1.5 padding.
+ *
+ * Key format: DER-encoded ASN.1, parsed by kernel rsa_parse_pub_key()
+ * / rsa_parse_priv_key() helpers.
+ *
+ * Private key: held in a cmh_key_ctx.  Raw keys are written to the
+ * device via SYS_REF_TEMP; datastore-referenced keys are only
+ * reachable through the ioctl path (cmh_mgmt.c).
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/scatterlist.h>
+#include <crypto/akcipher.h>
+#include <crypto/internal/akcipher.h>
+#include <crypto/internal/rsa.h>
+
+#include "cmh_pke.h"
+#include "cmh_sys.h"
+#include "cmh_sys_abi.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+struct cmh_rsa_tfm_ctx {
+       struct cmh_key_ctx key;         /* private key (raw d only) */
+       u8 *n;                          /* modulus (big-endian) */
+       u8 *e;                          /* public exponent (big-endian) */
+       size_t n_sz;
+       size_t e_sz;
+       u32 bits;                       /* key size in bits */
+};
+
+static inline struct cmh_rsa_tfm_ctx *cmh_rsa_ctx(struct crypto_akcipher *tfm)
+{
+       return akcipher_tfm_ctx(tfm);
+}
+
+struct cmh_rsa_reqctx {
+       u8 *e_buf;
+       u8 *n_buf;
+       u8 *m_buf;
+       u8 *c_buf;
+       u8 *d_buf;              /* dec only: private key copy */
+       dma_addr_t e_dma;
+       dma_addr_t n_dma;
+       dma_addr_t m_dma;
+       dma_addr_t c_dma;
+       dma_addr_t d_dma;
+       u32 key_bytes;
+       u32 e_padded;
+       u32 n_sz;
+       u32 d_len;              /* dec only */
+};
+
+static u32 cmh_rsa_key_bits(size_t n_sz)
+{
+       /*
+        * Only accept exact modulus sizes supported by the hardware.
+        * The programmed RSA width must match the actual modulus buffer
+        * length; rounding a shorter modulus up to the next size would
+        * let the device read past the end of the DMA buffer.
+        */
+       switch (n_sz) {
+       case 64:
+               return 512;
+       case 128:
+               return 1024;
+       case 256:
+               return 2048;
+       case 384:
+               return 3072;
+       case 512:
+               return 4096;
+       default:
+               return 0;
+       }
+}
+
+static void cmh_rsa_enc_complete(void *data, int error)
+{
+       struct akcipher_request *req = data;
+       struct cmh_rsa_reqctx *rctx = akcipher_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       if (!cmh_dma_map_error(rctx->c_dma))
+               cmh_dma_unmap_single(rctx->c_dma, rctx->key_bytes,
+                                    DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(rctx->m_dma))
+               cmh_dma_unmap_single(rctx->m_dma, rctx->key_bytes,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->n_dma))
+               cmh_dma_unmap_single(rctx->n_dma, rctx->n_sz,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->e_dma))
+               cmh_dma_unmap_single(rctx->e_dma, rctx->e_padded,
+                                    DMA_TO_DEVICE);
+
+       if (!error) {
+               int nents;
+
+               nents = sg_nents_for_len(req->dst, rctx->key_bytes);
+               if (nents < 0 ||
+                   sg_copy_from_buffer(req->dst, nents,
+                                       rctx->c_buf,
+                                       rctx->key_bytes) != rctx->key_bytes)
+                       error = -EINVAL;
+               else
+                       req->dst_len = rctx->key_bytes;
+       }
+
+       kfree(rctx->c_buf);
+       rctx->c_buf = NULL;
+       kfree_sensitive(rctx->m_buf);
+       rctx->m_buf = NULL;
+       kfree(rctx->n_buf);
+       rctx->n_buf = NULL;
+       kfree(rctx->e_buf);
+       rctx->e_buf = NULL;
+       cmh_complete(&req->base, error);
+}
+
+/*
+ * RSA encrypt: c = m^e mod n (public key operation)
+ * Also used for signature verification (verify = encrypt for raw RSA).
+ */
+static int cmh_rsa_enc(struct akcipher_request *req)
+{
+       struct crypto_akcipher *tfm = crypto_akcipher_reqtfm(req);
+       struct cmh_rsa_tfm_ctx *ctx = cmh_rsa_ctx(tfm);
+       struct cmh_rsa_reqctx *rctx = akcipher_request_ctx(req);
+       u32 key_bytes = ctx->bits / 8;
+       u32 e_padded = ALIGN(ctx->e_sz, 4);
+       struct core_dispatch d = cmh_core_select_instance(CMH_CORE_PKE);
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       int ret, nents;
+       gfp_t gfp;
+
+       if (!ctx->n || !ctx->e)
+               return -EINVAL;
+       if (req->src_len > key_bytes || req->dst_len < key_bytes)
+               return -EINVAL;
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->key_bytes = key_bytes;
+       rctx->e_padded = e_padded;
+       rctx->n_sz = ctx->n_sz;
+       rctx->e_dma = DMA_MAPPING_ERROR;
+       rctx->n_dma = DMA_MAPPING_ERROR;
+       rctx->m_dma = DMA_MAPPING_ERROR;
+       rctx->c_dma = DMA_MAPPING_ERROR;
+
+       rctx->e_buf = kzalloc(e_padded, gfp);
+       rctx->n_buf = kmemdup(ctx->n, ctx->n_sz, gfp);
+       rctx->m_buf = kzalloc(key_bytes, gfp);
+       rctx->c_buf = kzalloc(key_bytes, gfp);
+       if (!rctx->e_buf || !rctx->n_buf || !rctx->m_buf || !rctx->c_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       memcpy(rctx->e_buf + e_padded - ctx->e_sz, ctx->e, ctx->e_sz);
+
+       nents = sg_nents_for_len(req->src, req->src_len);
+       if (nents < 0 ||
+           sg_pcopy_to_buffer(req->src, nents,
+                              rctx->m_buf + key_bytes - req->src_len,
+                              req->src_len, 0) != req->src_len) {
+               ret = -EINVAL;
+               goto out_free;
+       }
+
+       rctx->e_dma = cmh_dma_map_single(rctx->e_buf, e_padded,
+                                        DMA_TO_DEVICE);
+       rctx->n_dma = cmh_dma_map_single(rctx->n_buf, ctx->n_sz,
+                                        DMA_TO_DEVICE);
+       rctx->m_dma = cmh_dma_map_single(rctx->m_buf, key_bytes,
+                                        DMA_TO_DEVICE);
+       rctx->c_dma = cmh_dma_map_single(rctx->c_buf, key_bytes,
+                                        DMA_FROM_DEVICE);
+
+       if (cmh_dma_map_error(rctx->e_dma) ||
+           cmh_dma_map_error(rctx->n_dma) ||
+           cmh_dma_map_error(rctx->m_dma) ||
+           cmh_dma_map_error(rctx->c_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_rsa_enc(&vcq[1], d.core_id, ctx->bits, e_padded,
+                           rctx->e_dma, rctx->n_dma, rctx->m_dma,
+                           rctx->c_dma, PKE_SWAP_FLAGS);
+       vcq_add_pke_flush(&vcq[2], d.core_id);
+
+       ret = cmh_tm_submit_async(vcq, PKE_VCQ_CMDS_MIN, 1, d.mbx_idx,
+                                 cmh_rsa_enc_complete, req,
+                                 !!(req->base.flags &
+                                    CRYPTO_TFM_REQ_MAY_BACKLOG), 0);
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (!ret)
+               return -EINPROGRESS;
+
+out_unmap:
+       if (!cmh_dma_map_error(rctx->c_dma))
+               cmh_dma_unmap_single(rctx->c_dma, key_bytes,
+                                    DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(rctx->m_dma))
+               cmh_dma_unmap_single(rctx->m_dma, key_bytes,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->n_dma))
+               cmh_dma_unmap_single(rctx->n_dma, ctx->n_sz,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->e_dma))
+               cmh_dma_unmap_single(rctx->e_dma, e_padded,
+                                    DMA_TO_DEVICE);
+
+out_free:
+       kfree(rctx->c_buf);
+       kfree_sensitive(rctx->m_buf);
+       kfree(rctx->n_buf);
+       kfree(rctx->e_buf);
+       return ret;
+}
+
+static void cmh_rsa_dec_complete(void *data, int error)
+{
+       struct akcipher_request *req = data;
+       struct cmh_rsa_reqctx *rctx = akcipher_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       if (!cmh_dma_map_error(rctx->d_dma))
+               cmh_dma_unmap_single(rctx->d_dma, rctx->d_len,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->m_dma))
+               cmh_dma_unmap_single(rctx->m_dma, rctx->key_bytes,
+                                    DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(rctx->c_dma))
+               cmh_dma_unmap_single(rctx->c_dma, rctx->key_bytes,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->n_dma))
+               cmh_dma_unmap_single(rctx->n_dma, rctx->n_sz,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->e_dma))
+               cmh_dma_unmap_single(rctx->e_dma, rctx->e_padded,
+                                    DMA_TO_DEVICE);
+
+       if (!error) {
+               int nents;
+
+               nents = sg_nents_for_len(req->dst, rctx->key_bytes);
+               if (nents < 0 ||
+                   sg_copy_from_buffer(req->dst, nents,
+                                       rctx->m_buf,
+                                       rctx->key_bytes) != rctx->key_bytes)
+                       error = -EINVAL;
+               else
+                       req->dst_len = rctx->key_bytes;
+       }
+
+       kfree_sensitive(rctx->d_buf);
+       rctx->d_buf = NULL;
+       kfree_sensitive(rctx->m_buf);
+       rctx->m_buf = NULL;
+       kfree(rctx->c_buf);
+       rctx->c_buf = NULL;
+       kfree(rctx->n_buf);
+       rctx->n_buf = NULL;
+       kfree(rctx->e_buf);
+       rctx->e_buf = NULL;
+       cmh_complete(&req->base, error);
+}
+
+/*
+ * RSA decrypt: m = c^d mod n (private key operation)
+ * Also used for signing (sign = decrypt for raw RSA).
+ *
+ * Private key 'd' is written via SYS_REF_TEMP inline.
+ */
+static int cmh_rsa_dec(struct akcipher_request *req)
+{
+       struct crypto_akcipher *tfm = crypto_akcipher_reqtfm(req);
+       struct cmh_rsa_tfm_ctx *ctx = cmh_rsa_ctx(tfm);
+       struct cmh_rsa_reqctx *rctx = akcipher_request_ctx(req);
+       u32 key_bytes = ctx->bits / 8;
+       u32 e_padded = ALIGN(ctx->e_sz, 4);
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MAX];
+       struct core_dispatch dd;
+       int ret, idx, nents;
+       gfp_t gfp;
+
+       if (ctx->key.mode != CMH_KEY_RAW)
+               return -EINVAL;
+       if (!ctx->n || !ctx->e)
+               return -EINVAL;
+       if (req->src_len > key_bytes || req->dst_len < key_bytes)
+               return -EINVAL;
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->key_bytes = key_bytes;
+       rctx->e_padded = e_padded;
+       rctx->n_sz = ctx->n_sz;
+       rctx->e_dma = DMA_MAPPING_ERROR;
+       rctx->n_dma = DMA_MAPPING_ERROR;
+       rctx->m_dma = DMA_MAPPING_ERROR;
+       rctx->c_dma = DMA_MAPPING_ERROR;
+       rctx->d_dma = DMA_MAPPING_ERROR;
+
+       rctx->e_buf = kzalloc(e_padded, gfp);
+       rctx->n_buf = kmemdup(ctx->n, ctx->n_sz, gfp);
+       rctx->c_buf = kzalloc(key_bytes, gfp);
+       rctx->m_buf = kzalloc(key_bytes, gfp);
+       if (!rctx->e_buf || !rctx->n_buf || !rctx->c_buf || !rctx->m_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       memcpy(rctx->e_buf + e_padded - ctx->e_sz, ctx->e, ctx->e_sz);
+
+       nents = sg_nents_for_len(req->src, req->src_len);
+       if (nents < 0 ||
+           sg_pcopy_to_buffer(req->src, nents,
+                              rctx->c_buf + key_bytes - req->src_len,
+                              req->src_len, 0) != req->src_len) {
+               ret = -EINVAL;
+               goto out_free;
+       }
+
+       rctx->e_dma = cmh_dma_map_single(rctx->e_buf, e_padded,
+                                        DMA_TO_DEVICE);
+       rctx->n_dma = cmh_dma_map_single(rctx->n_buf, ctx->n_sz,
+                                        DMA_TO_DEVICE);
+       rctx->c_dma = cmh_dma_map_single(rctx->c_buf, key_bytes,
+                                        DMA_TO_DEVICE);
+       rctx->m_dma = cmh_dma_map_single(rctx->m_buf, key_bytes,
+                                        DMA_FROM_DEVICE);
+
+       if (cmh_dma_map_error(rctx->e_dma) ||
+           cmh_dma_map_error(rctx->n_dma) ||
+           cmh_dma_map_error(rctx->c_dma) ||
+           cmh_dma_map_error(rctx->m_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       dd = cmh_core_select_instance(CMH_CORE_PKE);
+
+       rctx->d_buf = kmemdup(ctx->key.raw.data, ctx->key.raw.len, gfp);
+       if (!rctx->d_buf) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+       rctx->d_len = ctx->key.raw.len;
+
+       rctx->d_dma = cmh_dma_map_single(rctx->d_buf, ctx->key.raw.len,
+                                        DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->d_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       idx = 1;
+       vcq_add_sys_write(&vcq[idx], SYS_REF_TEMP, rctx->d_dma,
+                         SYS_REF_NONE, ctx->key.raw.len,
+                         ctx->key.raw.sys_type);
+       vcq[idx].id |= PKE_SWAP_FLAGS;
+       idx++;
+       vcq_add_pke_rsa_dec(&vcq[idx++], dd.core_id, ctx->bits, e_padded,
+                           rctx->e_dma, rctx->n_dma, rctx->c_dma,
+                           rctx->m_dma, SYS_REF_TEMP, PKE_SWAP_FLAGS);
+       vcq_add_pke_flush(&vcq[idx++], dd.core_id);
+       vcq_set_header(&vcq[0], idx);
+
+       ret = cmh_tm_submit_async(vcq, idx, 1, dd.mbx_idx,
+                                 cmh_rsa_dec_complete, req,
+                                 !!(req->base.flags &
+                                    CRYPTO_TFM_REQ_MAY_BACKLOG), 0);
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (!ret)
+               return -EINPROGRESS;
+
+out_unmap:
+       if (!cmh_dma_map_error(rctx->d_dma))
+               cmh_dma_unmap_single(rctx->d_dma, rctx->d_len,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->m_dma))
+               cmh_dma_unmap_single(rctx->m_dma, key_bytes,
+                                    DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(rctx->c_dma))
+               cmh_dma_unmap_single(rctx->c_dma, key_bytes,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->n_dma))
+               cmh_dma_unmap_single(rctx->n_dma, ctx->n_sz,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->e_dma))
+               cmh_dma_unmap_single(rctx->e_dma, e_padded,
+                                    DMA_TO_DEVICE);
+
+out_free:
+       kfree_sensitive(rctx->d_buf);
+       kfree_sensitive(rctx->m_buf);
+       kfree(rctx->c_buf);
+       kfree(rctx->n_buf);
+       kfree(rctx->e_buf);
+       return ret;
+}
+
+static int cmh_rsa_set_pub_key(struct crypto_akcipher *tfm,
+                              const void *key, unsigned int keylen)
+{
+       struct cmh_rsa_tfm_ctx *ctx = cmh_rsa_ctx(tfm);
+       struct rsa_key rsa = {};
+       int ret;
+
+       ret = rsa_parse_pub_key(&rsa, key, keylen);
+       if (ret)
+               return ret;
+
+       /* Strip ASN.1 leading zero padding from modulus */
+       while (rsa.n_sz > 0 && rsa.n[0] == 0) {
+               rsa.n++;
+               rsa.n_sz--;
+       }
+
+       if (!cmh_rsa_key_bits(rsa.n_sz))
+               return -EINVAL;
+       ctx->bits = cmh_rsa_key_bits(rsa.n_sz);
+
+       kfree(ctx->n);
+       kfree(ctx->e);
+       ctx->n = NULL;
+       ctx->e = NULL;
+       ctx->n_sz = 0;
+       ctx->e_sz = 0;
+
+       ctx->n = kmemdup(rsa.n, rsa.n_sz, GFP_KERNEL);
+       ctx->e = kmemdup(rsa.e, rsa.e_sz, GFP_KERNEL);
+       if (!ctx->n || !ctx->e) {
+               kfree(ctx->n);
+               kfree(ctx->e);
+               ctx->n = NULL;
+               ctx->e = NULL;
+               return -ENOMEM;
+       }
+
+       ctx->n_sz = rsa.n_sz;
+       ctx->e_sz = rsa.e_sz;
+
+       return 0;
+}
+
+static int cmh_rsa_set_priv_key(struct crypto_akcipher *tfm,
+                               const void *key, unsigned int keylen)
+{
+       struct cmh_rsa_tfm_ctx *ctx = cmh_rsa_ctx(tfm);
+       struct rsa_key rsa = {};
+       u32 key_bytes;
+       u8 *d_padded;
+       int ret;
+
+       ret = rsa_parse_priv_key(&rsa, key, keylen);
+       if (ret)
+               return ret;
+
+       /* Strip ASN.1 leading zero padding from modulus */
+       while (rsa.n_sz > 0 && rsa.n[0] == 0) {
+               rsa.n++;
+               rsa.n_sz--;
+       }
+
+       ctx->bits = cmh_rsa_key_bits(rsa.n_sz);
+       if (!ctx->bits || !rsa.d_sz)
+               return -EINVAL;
+
+       key_bytes = ctx->bits / 8;
+
+       /* Strip ASN.1 leading zero padding from private exponent */
+       while (rsa.d_sz > 0 && rsa.d[0] == 0) {
+               rsa.d++;
+               rsa.d_sz--;
+       }
+
+       if (!rsa.d_sz || rsa.d_sz > key_bytes)
+               return -EINVAL;
+
+       kfree(ctx->n);
+       kfree(ctx->e);
+       ctx->n = NULL;
+       ctx->e = NULL;
+       ctx->n_sz = 0;
+       ctx->e_sz = 0;
+
+       ctx->n = kmemdup(rsa.n, rsa.n_sz, GFP_KERNEL);
+       ctx->e = kmemdup(rsa.e, rsa.e_sz, GFP_KERNEL);
+       if (!ctx->n || !ctx->e) {
+               ret = -ENOMEM;
+               goto err;
+       }
+
+       ctx->n_sz = rsa.n_sz;
+       ctx->e_sz = rsa.e_sz;
+
+       /*
+        * Left-pad d to key_bytes (big-endian alignment).
+        * The CMH eSW resolves SYS_REF_TEMP by checking
+        * hdr->len >= key_bytes, so the written buffer must
+        * be at least key_bytes wide.
+        */
+       d_padded = kzalloc(key_bytes, GFP_KERNEL);
+       if (!d_padded) {
+               ret = -ENOMEM;
+               goto err;
+       }
+       memcpy(d_padded + key_bytes - rsa.d_sz, rsa.d, rsa.d_sz);
+
+       ret = cmh_key_setkey_raw(&ctx->key, d_padded, key_bytes,
+                                CORE_ID_PKE);
+       kfree_sensitive(d_padded);
+       if (ret)
+               goto err;
+
+       return 0;
+err:
+       kfree(ctx->n);
+       kfree(ctx->e);
+       ctx->n = NULL;
+       ctx->e = NULL;
+       ctx->n_sz = 0;
+       ctx->e_sz = 0;
+       ctx->bits = 0;
+       return ret;
+}
+
+static unsigned int cmh_rsa_max_size(struct crypto_akcipher *tfm)
+{
+       struct cmh_rsa_tfm_ctx *ctx = cmh_rsa_ctx(tfm);
+
+       return ctx->n_sz;
+}
+
+static int cmh_rsa_init_tfm(struct crypto_akcipher *tfm)
+{
+       struct cmh_rsa_tfm_ctx *ctx = cmh_rsa_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       tfm->reqsize = sizeof(struct cmh_rsa_reqctx);
+       return 0;
+}
+
+static void cmh_rsa_exit_tfm(struct crypto_akcipher *tfm)
+{
+       struct cmh_rsa_tfm_ctx *ctx = cmh_rsa_ctx(tfm);
+
+       cmh_key_destroy(&ctx->key);
+       kfree(ctx->n);
+       kfree(ctx->e);
+       ctx->n = NULL;
+       ctx->e = NULL;
+}
+
+/*
+ * Raw RSA stays as akcipher (encrypt/decrypt only).  The kernel's
+ * rsassa-pkcs1 sig template wraps our akcipher for sign/verify,
+ * matching the upstream split (rsa.c = akcipher,
+ * rsassa-pkcs1.c = sig template).
+ */
+static struct akcipher_alg cmh_rsa_alg = {
+       .encrypt        = cmh_rsa_enc,
+       .decrypt        = cmh_rsa_dec,
+       .set_pub_key    = cmh_rsa_set_pub_key,
+       .set_priv_key   = cmh_rsa_set_priv_key,
+       .max_size       = cmh_rsa_max_size,
+       .init           = cmh_rsa_init_tfm,
+       .exit           = cmh_rsa_exit_tfm,
+       .base = {
+               .cra_name         = "rsa",
+               .cra_driver_name  = "cri-cmh-rsa",
+               .cra_priority     = 300,
+               .cra_flags        = CRYPTO_ALG_ASYNC,
+               .cra_module       = THIS_MODULE,
+               .cra_ctxsize      = sizeof(struct cmh_rsa_tfm_ctx),
+       },
+};
+
+static bool cmh_rsa_registered;
+
+/**
+ * cmh_pke_rsa_register() - Register RSA akcipher algorithm with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_pke_rsa_register(void)
+{
+       int ret;
+
+       ret = crypto_register_akcipher(&cmh_rsa_alg);
+       if (ret) {
+               dev_err(cmh_dev(),
+                       "cmh: failed to register rsa akcipher (%d)\n",
+                       ret);
+               return ret;
+       }
+
+       cmh_rsa_registered = true;
+       return 0;
+}
+
+/**
+ * cmh_pke_rsa_unregister() - Unregister RSA akcipher algorithm from the crypto framework
+ */
+void cmh_pke_rsa_unregister(void)
+{
+       if (cmh_rsa_registered)
+               crypto_unregister_akcipher(&cmh_rsa_alg);
+       cmh_rsa_registered = false;
+}
--
2.43.7
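
Illustration only, not part of the patch: cmh_rsa_set_priv_key() strips
leading zero bytes from d and then right-aligns it in a zeroed buffer
that is key_bytes wide. A minimal userspace sketch of that big-endian
left-padding is given here. The helper name be_left_pad() and the -1
error return are assumptions made for the sketch, and plain libc calls
stand in for kzalloc()/kfree_sensitive().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not in the driver): strip leading zero bytes
 * from a big-endian integer and right-align it in a zeroed buffer of
 * out_len bytes.  Returns 0 on success, -1 if the value is empty after
 * stripping or is wider than out_len.
 */
static int be_left_pad(const uint8_t *in, size_t in_len,
                       uint8_t *out, size_t out_len)
{
        /* Drop ASN.1-style leading zero padding. */
        while (in_len > 0 && in[0] == 0) {
                in++;
                in_len--;
        }

        if (in_len == 0 || in_len > out_len)
                return -1;

        /* Zero-fill, then copy so the last byte lands at out[out_len - 1]. */
        memset(out, 0, out_len);
        memcpy(out + out_len - in_len, in, in_len);
        return 0;
}
```

With a 4-byte target, the 3-byte input 00 01 02 becomes 00 00 01 02. An
input wider than the target is rejected, mirroring the driver's
rsa.d_sz > key_bytes check.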


^ permalink raw reply related

* [PATCH 10/19] crypto: cmh - add ChaCha20-Poly1305
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register ChaCha20-Poly1305 AEAD and ChaCha20 skcipher algorithms
using the CMH CCP core (core ID 0x18).  Also register the Poly1305
ahash for standalone use.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
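Illustration only, not part of the patch: cmh_ccp_complete() advances
the 32-bit little-endian block counter in IV[0..3] by
DIV_ROUND_UP(cryptlen, 64) and leaves the nonce in IV[4..15] untouched.
A minimal userspace sketch of that arithmetic is given here. The helper
names load_le32(), store_le32() and ctrnonce_advance() are hypothetical
stand-ins for get_unaligned_le32()/put_unaligned_le32(). Wrap-around
modulo 2^32 is a property of the sketch, not something the patch
documents.

```c
#include <assert.h>
#include <stdint.h>

#define CHACHA_BLOCK_BYTES 64u  /* ChaCha20 keystream block size */

/* Read a 32-bit little-endian value from a possibly unaligned pointer. */
static uint32_t load_le32(const uint8_t *p)
{
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value in little-endian byte order. */
static void store_le32(uint8_t *p, uint32_t v)
{
        p[0] = (uint8_t)v;
        p[1] = (uint8_t)(v >> 8);
        p[2] = (uint8_t)(v >> 16);
        p[3] = (uint8_t)(v >> 24);
}

/*
 * Advance the counter word iv[0..3] by the number of 64-byte blocks
 * needed to cover len bytes.  iv[4..15] (the nonce) is not modified.
 */
static void ctrnonce_advance(uint8_t iv[16], uint32_t len)
{
        /* Round up without the overflow risk of (len + 63) / 64. */
        uint32_t nblocks = len / CHACHA_BLOCK_BYTES +
                           (len % CHACHA_BLOCK_BYTES != 0);

        store_le32(iv, load_le32(iv) + nblocks);
}
```

A 65-byte request starting at counter 1 therefore leaves the counter at
3, so a follow-up request continues the keystream at the next unused
block.
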
 drivers/crypto/cmh/Makefile          |   5 +-
 drivers/crypto/cmh/cmh_ccp.c         | 364 +++++++++++++++++
 drivers/crypto/cmh/cmh_ccp_aead.c    | 583 +++++++++++++++++++++++++++
 drivers/crypto/cmh/cmh_ccp_poly.c    | 528 ++++++++++++++++++++++++
 drivers/crypto/cmh/cmh_main.c        |  25 ++
 drivers/crypto/cmh/include/cmh_ccp.h |  24 ++
 6 files changed, 1528 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_ccp.c
 create mode 100644 drivers/crypto/cmh/cmh_ccp_aead.c
 create mode 100644 drivers/crypto/cmh/cmh_ccp_poly.c
 create mode 100644 drivers/crypto/cmh/include/cmh_ccp.h
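
Illustration only, not part of the patch: cmh_ccp_aead_crypt() builds
the 16-byte ctrnonce as counter(4) | req->iv(12) for rfc7539 and as
counter(4) | salt(4) | req->iv(8) for rfc7539esp, with the counter set
to zero. A minimal userspace sketch of that layout is given here. The
helper name build_ctrnonce(), the local size macros, and the -1 error
return are assumptions made for the sketch; the driver uses its own
CCP_* constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CTRNONCE_SIZE 16u  /* 4-byte counter + 12-byte nonce */
#define CTR_LEN        4u
#define AEAD_IV_SIZE  12u  /* rfc7539: full nonce in the IV */
#define ESP_IV_SIZE    8u  /* rfc7539esp: 8-byte IV ... */
#define ESP_SALT_SIZE  4u  /* ... plus a 4-byte salt */

/*
 * Hypothetical helper: fill out[0..15] with a zero counter followed by
 * the 12-byte nonce.  salt is only read for the 8-byte (ESP) IV case.
 * Returns 0 on success, -1 on an unsupported IV length or missing salt.
 */
static int build_ctrnonce(uint8_t out[CTRNONCE_SIZE], const uint8_t *salt,
                          const uint8_t *iv, size_t iv_len)
{
        memset(out, 0, CTRNONCE_SIZE);  /* counter starts at 0 */

        if (iv_len == AEAD_IV_SIZE) {
                memcpy(out + CTR_LEN, iv, AEAD_IV_SIZE);
                return 0;
        }

        if (iv_len == ESP_IV_SIZE && salt) {
                memcpy(out + CTR_LEN, salt, ESP_SALT_SIZE);
                memcpy(out + CTR_LEN + ESP_SALT_SIZE, iv, ESP_IV_SIZE);
                return 0;
        }

        return -1;
}
```

In both variants the nonce occupies bytes 4..15, so the device sees the
same layout regardless of which template supplied the IV.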

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index 1f36cd9c0b98..4ebd0e1d10bc 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -25,7 +25,10 @@ cmh-y := \
        cmh_aes_cmac.o \
        cmh_sm4_skcipher.o \
        cmh_sm4_aead.o \
-       cmh_sm4_cmac.o
+       cmh_sm4_cmac.o \
+       cmh_ccp.o \
+       cmh_ccp_aead.o \
+       cmh_ccp_poly.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_ccp.c b/drivers/crypto/cmh/cmh_ccp.c
new file mode 100644
index 000000000000..deb1db9200f8
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_ccp.c
@@ -0,0 +1,364 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API ChaCha20 (skcipher) Driver
+ *
+ * Registers the "chacha20" skcipher algorithm with the Linux crypto
+ * subsystem, backed by the CMH CCP core.
+ *
+ * VCQ sequence:
+ *   [SYS_CMD_WRITE] + CCP_CMD_CHACHA20_INIT + CCP_CMD_FINAL + CCP_CMD_FLUSH
+ *
+ * The CCP core expects a 16-byte counter+nonce (ctrnonce):
+ *   bytes [0..3]  = 32-bit LE counter
+ *   bytes [4..15] = 12-byte nonce
+ *
+ * The Linux chacha20 skcipher interface passes a 16-byte IV in the
+ * same format, so we forward it directly.
+ *
+ * ChaCha20 is a stream cipher -- arbitrary plaintext lengths are
+ * supported (no block-alignment requirement).
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/scatterwalk.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/unaligned.h>
+
+#include "cmh_ccp.h"
+#include "cmh_vcq.h"
+#include "cmh_ccp_abi.h"
+#include "cmh_sys_abi.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+/* Per-transform context */
+
+struct cmh_ccp_tfm_ctx {
+       struct cmh_key_ctx key;
+};
+
+/* Per-request context (lives in skcipher_request::__ctx) */
+
+/*
+ * Maximum payload commands:
+ *   [SYS_CMD_WRITE] + CCP_CMD_CHACHA20_INIT + CCP_CMD_FINAL + FLUSH = 4
+ */
+#define CMH_CCP_MAX_PAYLOAD    4
+#define CMH_CCP_MAX_PACKED     (CMH_CCP_MAX_PAYLOAD * 2)
+
+struct cmh_ccp_reqctx {
+       dma_addr_t in_dma;
+       dma_addr_t out_dma;
+       dma_addr_t iv_dma;
+       dma_addr_t key_dma;
+       u8 *in_buf;
+       u8 *out_buf;
+       u8 *iv_buf;
+       u32 cryptlen;
+       u32 keylen;
+       struct vcq_cmd packed[CMH_CCP_MAX_PACKED];
+};
+
+/* VCQ Builders -- ChaCha20-specific */
+
+static void vcq_add_ccp_chacha_init(struct vcq_cmd *slot, u32 core_id, u64 key_ref,
+                                   u64 ctrnonce_dma, u32 keylen, u32 op)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, CCP_CMD_CHACHA20_INIT);
+       slot->hwc.ccp.cmd_chacha.key = key_ref;
+       slot->hwc.ccp.cmd_chacha.ctrnonce = ctrnonce_dma;
+       slot->hwc.ccp.cmd_chacha.keylen = keylen;
+       slot->hwc.ccp.cmd_chacha.ctrnoncelen = CCP_CTRNONCE_SIZE;
+       slot->hwc.ccp.cmd_chacha.ctrlen = CCP_CHACHA_CTR_LEN;
+       slot->hwc.ccp.cmd_chacha.op = op;
+}
+
+static void vcq_add_ccp_final(struct vcq_cmd *slot, u32 core_id, u64 input_dma,
+                             u64 output_dma, u32 iolen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, CCP_CMD_FINAL);
+       slot->hwc.ccp.cmd_final.input = input_dma;
+       slot->hwc.ccp.cmd_final.output = output_dma;
+       slot->hwc.ccp.cmd_final.tag = 0;
+       slot->hwc.ccp.cmd_final.iolen = iolen;
+       slot->hwc.ccp.cmd_final.taglen = 0;
+}
+
+/* skcipher Operations */
+static int cmh_ccp_setkey(struct crypto_skcipher *tfm, const u8 *key,
+                         unsigned int keylen)
+{
+       struct cmh_ccp_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+       /* ChaCha20 requires a 32-byte key per RFC 8439 */
+       if (keylen != 32)
+               return -EINVAL;
+
+       return cmh_key_setkey_raw(&tctx->key, key, keylen, CORE_ID_CCP);
+}
+
+static int cmh_ccp_init_tfm(struct crypto_skcipher *tfm)
+{
+       struct cmh_ccp_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+
+       memset(tctx, 0, sizeof(*tctx));
+       crypto_skcipher_set_reqsize(tfm, sizeof(struct cmh_ccp_reqctx));
+       return 0;
+}
+
+static void cmh_ccp_exit_tfm(struct crypto_skcipher *tfm)
+{
+       struct cmh_ccp_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+
+       cmh_key_destroy(&tctx->key);
+}
+
+/* DMA unmap helper */
+static void cmh_ccp_unmap_dma(struct cmh_ccp_reqctx *rctx)
+{
+       cmh_dma_unmap_single(rctx->iv_dma, CCP_CTRNONCE_SIZE, DMA_TO_DEVICE);
+       cmh_dma_unmap_single(rctx->out_dma, rctx->cryptlen, DMA_FROM_DEVICE);
+       cmh_dma_unmap_single(rctx->in_dma, rctx->cryptlen, DMA_TO_DEVICE);
+}
+
+static void cmh_ccp_free_bufs(struct cmh_ccp_reqctx *rctx)
+{
+       kfree(rctx->iv_buf);
+       rctx->iv_buf = NULL;
+       kfree_sensitive(rctx->out_buf);
+       rctx->out_buf = NULL;
+       kfree_sensitive(rctx->in_buf);
+       rctx->in_buf = NULL;
+}
+
+static void cmh_ccp_complete(void *data, int error)
+{
+       struct skcipher_request *req = data;
+       struct cmh_ccp_reqctx *rctx = skcipher_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       cmh_ccp_unmap_dma(rctx);
+
+       if (!error) {
+               u32 counter, nblocks;
+
+               scatterwalk_map_and_copy(rctx->out_buf, req->dst,
+                                        0, rctx->cryptlen, 1);
+
+               /*
+                * Update the 32-bit LE block counter at IV[0..3].
+                * ChaCha20 processes 64-byte blocks; the nonce at
+                * IV[4..15] is unchanged.
+                */
+               counter = get_unaligned_le32(req->iv);
+               nblocks = DIV_ROUND_UP(rctx->cryptlen, 64);
+               put_unaligned_le32(counter + nblocks, req->iv);
+       }
+
+       cmh_ccp_free_bufs(rctx);
+       cmh_complete(&req->base, error);
+}
+
+/*
+ * Core encrypt/decrypt -- builds a VCQ transaction and submits async.
+ *
+ * ChaCha20 is a stream cipher: encrypt and decrypt use the same
+ * underlying XOR operation.
+ */
+static int cmh_ccp_crypt(struct skcipher_request *req, u32 ccp_op)
+{
+       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+       struct cmh_ccp_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+       struct cmh_ccp_reqctx *rctx = skcipher_request_ctx(req);
+       struct vcq_cmd cmds[CMH_CCP_MAX_PAYLOAD];
+       u64 key_ref;
+       u32 keylen;
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       if (tctx->key.mode == CMH_KEY_NONE)
+               return -ENOKEY;
+
+       if (!req->cryptlen)
+               return 0;
+
+       /* Limit linearisation buffers to avoid large allocations. */
+       if (req->cryptlen > SZ_1M)
+               return -EINVAL;
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->cryptlen = req->cryptlen;
+
+       /* Linearise input from scatterlist */
+       rctx->in_buf = kmalloc(req->cryptlen, gfp);
+       if (!rctx->in_buf)
+               return -ENOMEM;
+
+       scatterwalk_map_and_copy(rctx->in_buf, req->src, 0, req->cryptlen, 0);
+
+       rctx->in_dma = cmh_dma_map_single(rctx->in_buf, req->cryptlen,
+                                         DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->in_dma)) {
+               ret = -ENOMEM;
+               goto out_free_in;
+       }
+
+       rctx->out_buf = kmalloc(req->cryptlen, gfp);
+       if (!rctx->out_buf) {
+               ret = -ENOMEM;
+               goto out_unmap_in;
+       }
+
+       rctx->out_dma = cmh_dma_map_single(rctx->out_buf, req->cryptlen,
+                                          DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->out_dma)) {
+               ret = -ENOMEM;
+               goto out_free_out;
+       }
+
+       rctx->iv_buf = kmemdup(req->iv, CCP_CTRNONCE_SIZE, gfp);
+       if (!rctx->iv_buf) {
+               ret = -ENOMEM;
+               goto out_unmap_out;
+       }
+
+       rctx->iv_dma = cmh_dma_map_single(rctx->iv_buf, CCP_CTRNONCE_SIZE,
+                                         DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->iv_dma)) {
+               ret = -ENOMEM;
+               goto out_free_iv;
+       }
+
+       /* Resolve key reference */
+       idx = 0;
+
+       rctx->key_dma = tctx->key.raw.dma;
+       rctx->keylen = tctx->key.raw.len;
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)rctx->key_dma, SYS_REF_NONE,
+                         tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+       d = cmh_core_select_instance(CMH_CORE_CCP);
+       target_mbx = d.mbx_idx;
+       core_id = d.core_id;
+
+       vcq_add_ccp_chacha_init(&cmds[idx++], core_id, key_ref,
+                               (u64)rctx->iv_dma, keylen, ccp_op);
+
+       vcq_add_ccp_final(&cmds[idx++], core_id, (u64)rctx->in_dma,
+                         (u64)rctx->out_dma, req->cryptlen);
+
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_CCP_MAX_PACKED, target_mbx,
+                                           cmh_ccp_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       /* -EBUSY = backlogged; ownership transferred to callback. */
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_cleanup_all;
+
+       return -EINPROGRESS;
+
+out_cleanup_all:
+       cmh_dma_unmap_single(rctx->iv_dma, CCP_CTRNONCE_SIZE, DMA_TO_DEVICE);
+out_free_iv:
+       kfree(rctx->iv_buf);
+out_unmap_out:
+       cmh_dma_unmap_single(rctx->out_dma, req->cryptlen, DMA_FROM_DEVICE);
+out_free_out:
+       kfree_sensitive(rctx->out_buf);
+out_unmap_in:
+       cmh_dma_unmap_single(rctx->in_dma, req->cryptlen, DMA_TO_DEVICE);
+out_free_in:
+       kfree_sensitive(rctx->in_buf);
+       return ret;
+}
+
+static int cmh_ccp_encrypt(struct skcipher_request *req)
+{
+       return cmh_ccp_crypt(req, CCP_OP_ENCRYPT);
+}
+
+static int cmh_ccp_decrypt(struct skcipher_request *req)
+{
+       return cmh_ccp_crypt(req, CCP_OP_DECRYPT);
+}
+
+/* Registration */
+
+static struct skcipher_alg cmh_chacha20_alg = {
+       .setkey      = cmh_ccp_setkey,
+       .encrypt     = cmh_ccp_encrypt,
+       .decrypt     = cmh_ccp_decrypt,
+       .init        = cmh_ccp_init_tfm,
+       .exit        = cmh_ccp_exit_tfm,
+       .min_keysize = 32,
+       .max_keysize = 32,
+       .ivsize      = CCP_CTRNONCE_SIZE,
+       .base        = {
+               .cra_name        = "chacha20",
+               .cra_driver_name = "cri-cmh-chacha20",
+               .cra_priority    = 300,
+               .cra_flags       = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                  CRYPTO_ALG_ASYNC,
+               .cra_blocksize   = 1,   /* stream cipher */
+               .cra_ctxsize     = sizeof(struct cmh_ccp_tfm_ctx),
+               .cra_module      = THIS_MODULE,
+       },
+};
+
+/**
+ * cmh_ccp_register() - Register ChaCha20 skcipher algorithm with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_ccp_register(void)
+{
+       int ret;
+
+       ret = crypto_register_skcipher(&cmh_chacha20_alg);
+       if (ret)
+               dev_err(cmh_dev(), "cmh_ccp: failed to register chacha20 (rc=%d)\n", ret);
+       else
+               dev_dbg(cmh_dev(), "cmh_ccp: registered chacha20\n");
+
+       return ret;
+}
+
+/**
+ * cmh_ccp_unregister() - Unregister ChaCha20 skcipher algorithm from the crypto framework
+ */
+void cmh_ccp_unregister(void)
+{
+       crypto_unregister_skcipher(&cmh_chacha20_alg);
+       dev_dbg(cmh_dev(), "cmh_ccp: unregistered chacha20\n");
+}
diff --git a/drivers/crypto/cmh/cmh_ccp_aead.c b/drivers/crypto/cmh/cmh_ccp_aead.c
new file mode 100644
index 000000000000..20b6f9d1746a
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_ccp_aead.c
@@ -0,0 +1,583 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API ChaCha20-Poly1305 AEAD Driver (RFC 7539)
+ *
+ * Registers "rfc7539(chacha20,poly1305)" as an AEAD algorithm with the
+ * Linux crypto subsystem, backed by the CMH CCP core.
+ *
+ * VCQ sequence:
+ *   [SYS_CMD_WRITE] + CCP_CMD_AEAD_INIT + [CCP_CMD_AAD_FINAL]
+ *   + CCP_CMD_FINAL + CCP_CMD_FLUSH
+ *
+ * The RFC 7539 AEAD interface passes a 12-byte nonce via req->iv.
+ * The CCP core expects a 16-byte ctrnonce (4-byte LE counter + 12-byte
+ * nonce), so the driver prepends a zero counter.  Per RFC 7539
+ * Section 2.8, counter 0 generates the Poly1305 key and counter 1 starts
+ * encryption; the CMH eSW handles this internally from the initial 0.
+ *
+ * Tag is always 16 bytes (Poly1305 authenticator).
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/chacha.h>
+#include <crypto/internal/aead.h>
+#include <crypto/scatterwalk.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include "cmh_ccp.h"
+#include "cmh_vcq.h"
+#include "cmh_ccp_abi.h"
+#include "cmh_sys_abi.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+#define CCP_AEAD_IV_SIZE       12U     /* RFC 7539 nonce */
+#define CCP_ESP_IV_SIZE                8U      /* RFC 7539 ESP nonce (4-byte salt at setkey) */
+#define CCP_ESP_SALT_SIZE      4U
+#define CCP_AEAD_TAG_SIZE      16U     /* Poly1305 tag */
+
+struct cmh_ccp_aead_tfm_ctx {
+       struct cmh_key_ctx key;
+       u32 authsize;
+       u8 salt[CCP_ESP_SALT_SIZE];     /* ESP salt (unused for rfc7539) */
+};
+
+/* Per-request context (lives in aead_request::__ctx) */
+
+/*
+ * Maximum payload commands:
+ *   [SYS_CMD_WRITE] + CCP_CMD_AEAD_INIT + CCP_CMD_AAD_FINAL
+ *   + CCP_CMD_FINAL + FLUSH = 5
+ */
+#define CMH_CCP_AEAD_MAX_PAYLOAD       5
+#define CMH_CCP_AEAD_MAX_PACKED                (CMH_CCP_AEAD_MAX_PAYLOAD * 2)
+
+struct cmh_ccp_aead_reqctx {
+       dma_addr_t in_dma;
+       dma_addr_t out_dma;
+       dma_addr_t iv_dma;
+       dma_addr_t key_dma;
+       dma_addr_t aad_dma;
+       dma_addr_t tag_dma;
+       u8 *in_buf;
+       u8 *out_buf;
+       u8 *iv_buf;
+       u8 *aad_buf;
+       u8 *tag_buf;
+       u32 cryptlen;
+       u32 assoclen;
+       u32 authsize;
+       u32 keylen;
+       bool encrypting;
+       struct vcq_cmd packed[CMH_CCP_AEAD_MAX_PACKED];
+};
+
+/* VCQ Builders -- CCP AEAD-specific */
+
+static void vcq_add_ccp_aead_init(struct vcq_cmd *slot, u32 core_id, u64 key_ref,
+                                 u64 ctrnonce_dma, u32 keylen, u32 op)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, CCP_CMD_AEAD_INIT);
+       slot->hwc.ccp.cmd_aead.key = key_ref;
+       slot->hwc.ccp.cmd_aead.ctrnonce = ctrnonce_dma;
+       slot->hwc.ccp.cmd_aead.keylen = keylen;
+       slot->hwc.ccp.cmd_aead.ctrnoncelen = CCP_CTRNONCE_SIZE;
+       slot->hwc.ccp.cmd_aead.op = op;
+}
+
+static void vcq_add_ccp_aad_final(struct vcq_cmd *slot, u32 core_id, u64 aad_dma,
+                                 u32 aadlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, CCP_CMD_AAD_FINAL);
+       slot->hwc.ccp.cmd_aad_final.aad = aad_dma;
+       slot->hwc.ccp.cmd_aad_final.aadlen = aadlen;
+}
+
+static void vcq_add_ccp_aead_final(struct vcq_cmd *slot, u32 core_id, u64 input_dma,
+                                  u64 output_dma, u64 tag_dma,
+                                  u32 iolen, u32 taglen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, CCP_CMD_FINAL);
+       slot->hwc.ccp.cmd_final.input = input_dma;
+       slot->hwc.ccp.cmd_final.output = output_dma;
+       slot->hwc.ccp.cmd_final.tag = tag_dma;
+       slot->hwc.ccp.cmd_final.iolen = iolen;
+       slot->hwc.ccp.cmd_final.taglen = taglen;
+}
+
+/* setkey */
+static int cmh_ccp_aead_setkey(struct crypto_aead *tfm, const u8 *key,
+                              unsigned int keylen)
+{
+       struct cmh_ccp_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       /* RFC 7539 AEAD requires a 32-byte key */
+       if (keylen != CHACHA_KEY_SIZE)
+               return -EINVAL;
+
+       return cmh_key_setkey_raw(&tctx->key, key, keylen, CORE_ID_CCP);
+}
+
+static int cmh_ccp_aead_setauthsize(struct crypto_aead *tfm,
+                                   unsigned int authsize)
+{
+       struct cmh_ccp_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+
+       /* Poly1305 tag is always 16 bytes */
+       if (authsize != CCP_AEAD_TAG_SIZE)
+               return -EINVAL;
+
+       tctx->authsize = authsize;
+       return 0;
+}
+
+static int cmh_ccp_aead_init_tfm(struct crypto_aead *tfm)
+{
+       struct cmh_ccp_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+
+       memset(tctx, 0, sizeof(*tctx));
+       tctx->authsize = CCP_AEAD_TAG_SIZE;
+       crypto_aead_set_reqsize(tfm, sizeof(struct cmh_ccp_aead_reqctx));
+       return 0;
+}
+
+static void cmh_ccp_aead_exit_tfm(struct crypto_aead *tfm)
+{
+       struct cmh_ccp_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+
+       cmh_key_destroy(&tctx->key);
+}
+
+/* DMA unmap helper */
+static void cmh_ccp_aead_unmap_dma(struct cmh_ccp_aead_reqctx *rctx)
+{
+       cmh_dma_unmap_single(rctx->iv_dma, CCP_CTRNONCE_SIZE, DMA_TO_DEVICE);
+       cmh_dma_unmap_single(rctx->tag_dma, rctx->authsize,
+                            rctx->encrypting ? DMA_FROM_DEVICE :
+                                              DMA_TO_DEVICE);
+       if (rctx->cryptlen > 0) {
+               cmh_dma_unmap_single(rctx->out_dma, rctx->cryptlen,
+                                    DMA_FROM_DEVICE);
+               cmh_dma_unmap_single(rctx->in_dma, rctx->cryptlen,
+                                    DMA_TO_DEVICE);
+       }
+       if (rctx->assoclen > 0)
+               cmh_dma_unmap_single(rctx->aad_dma, rctx->assoclen,
+                                    DMA_TO_DEVICE);
+}
+
+static void cmh_ccp_aead_free_bufs(struct cmh_ccp_aead_reqctx *rctx)
+{
+       kfree(rctx->iv_buf);
+       rctx->iv_buf = NULL;
+       kfree(rctx->tag_buf);
+       rctx->tag_buf = NULL;
+       kfree_sensitive(rctx->out_buf);
+       rctx->out_buf = NULL;
+       kfree_sensitive(rctx->in_buf);
+       rctx->in_buf = NULL;
+       kfree(rctx->aad_buf);
+       rctx->aad_buf = NULL;
+}
+
+static void cmh_ccp_aead_complete(void *data, int error)
+{
+       struct aead_request *req = data;
+       struct cmh_ccp_aead_reqctx *rctx = aead_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       cmh_ccp_aead_unmap_dma(rctx);
+
+       /*
+        * Map HW error on decrypt to -EBADMSG.  The eSW CCP core uses a
+        * single error code (-EIO) for both authentication failures and
+        * other core errors (e.g. DMA timeout), so we cannot distinguish
+        * them from the MBX_STATUS alone.  In practice the only error
+        * during a well-formed AEAD decrypt is auth-tag mismatch; a DMA
+        * timeout would indicate a fatal HW problem where -EBADMSG vs
+        * -EIO is moot.  The kernel crypto API requires -EBADMSG for
+        * AEAD authentication failures.
+        */
+       if (error == -EIO && !rctx->encrypting)
+               error = -EBADMSG;
+
+       if (!error) {
+               if (rctx->cryptlen > 0)
+                       scatterwalk_map_and_copy(rctx->out_buf, req->dst,
+                                                req->assoclen,
+                                               rctx->cryptlen, 1);
+               if (rctx->encrypting)
+                       scatterwalk_map_and_copy(rctx->tag_buf, req->dst,
+                                                req->assoclen +
+                                               rctx->cryptlen,
+                                               rctx->authsize, 1);
+       }
+
+       cmh_ccp_aead_free_bufs(rctx);
+       cmh_complete(&req->base, error);
+}
+
+/*
+ * Core AEAD encrypt/decrypt -- async path.
+ *
+ * Encrypt: plaintext -> ciphertext + 16-byte tag
+ * Decrypt: ciphertext + tag -> plaintext (tag verified by CMH eSW)
+ *
+ * VCQ: [SYS_CMD_WRITE] + AEAD_INIT + [AAD_FINAL] + FINAL + FLUSH
+ */
+static int cmh_ccp_aead_crypt(struct aead_request *req, u32 ccp_op)
+{
+       struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+       struct cmh_ccp_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       struct cmh_ccp_aead_reqctx *rctx = aead_request_ctx(req);
+       struct vcq_cmd cmds[CMH_CCP_AEAD_MAX_PAYLOAD];
+       u64 key_ref;
+       u32 keylen, authsize, cryptlen;
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       if (tctx->key.mode == CMH_KEY_NONE)
+               return -ENOKEY;
+
+       authsize = tctx->authsize;
+
+       if (ccp_op == CCP_OP_ENCRYPT) {
+               cryptlen = req->cryptlen;
+       } else {
+               if (req->cryptlen < authsize)
+                       return -EINVAL;
+               cryptlen = req->cryptlen - authsize;
+       }
+
+       /*
+        * HW uses a proprietary LLI scatter-gather format that is
+        * incompatible with struct scatterlist, so the payload is
+        * linearised into contiguous buffers for DMA.  Cap total
+        * size to prevent excessive memory consumption.
+        */
+       if ((u64)cryptlen + req->assoclen > SZ_1M)
+               return -EINVAL;
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->cryptlen = cryptlen;
+       rctx->assoclen = req->assoclen;
+       rctx->authsize = authsize;
+       rctx->encrypting = (ccp_op == CCP_OP_ENCRYPT);
+
+       /*
+        * rfc7539esp: the last ivsize (8) bytes of the AAD region are the
+        * IV/nonce, not actual associated data.  Subtract them so HW only
+        * authenticates the real AAD.
+        */
+       if (crypto_aead_ivsize(tfm) == CCP_ESP_IV_SIZE) {
+               if (rctx->assoclen < CCP_ESP_IV_SIZE)
+                       return -EINVAL;
+               rctx->assoclen -= CCP_ESP_IV_SIZE;
+       }
+
+       /* Linearise AAD */
+       if (rctx->assoclen > 0) {
+               rctx->aad_buf = kmalloc(rctx->assoclen, gfp);
+               if (!rctx->aad_buf)
+                       return -ENOMEM;
+               scatterwalk_map_and_copy(rctx->aad_buf, req->src,
+                                        0, rctx->assoclen, 0);
+               rctx->aad_dma = cmh_dma_map_single(rctx->aad_buf,
+                                                  rctx->assoclen,
+                                                   DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->aad_dma)) {
+                       ret = -ENOMEM;
+                       goto out_free_aad;
+               }
+       }
+
+       /* Linearise input */
+       if (cryptlen > 0) {
+               rctx->in_buf = kmalloc(cryptlen, gfp);
+               if (!rctx->in_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_aad;
+               }
+               scatterwalk_map_and_copy(rctx->in_buf, req->src,
+                                        req->assoclen, cryptlen, 0);
+               rctx->in_dma = cmh_dma_map_single(rctx->in_buf, cryptlen,
+                                                 DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->in_dma)) {
+                       ret = -ENOMEM;
+                       goto out_free_in;
+               }
+       }
+
+       /* Allocate output buffer */
+       if (cryptlen > 0) {
+               rctx->out_buf = kmalloc(cryptlen, gfp);
+               if (!rctx->out_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_in;
+               }
+               rctx->out_dma = cmh_dma_map_single(rctx->out_buf, cryptlen,
+                                                  DMA_FROM_DEVICE);
+               if (cmh_dma_map_error(rctx->out_dma)) {
+                       ret = -ENOMEM;
+                       goto out_free_out;
+               }
+       }
+
+       /* Tag buffer */
+       rctx->tag_buf = kmalloc(authsize, gfp);
+       if (!rctx->tag_buf) {
+               ret = -ENOMEM;
+               goto out_unmap_out;
+       }
+
+       if (!rctx->encrypting) {
+               scatterwalk_map_and_copy(rctx->tag_buf, req->src,
+                                        req->assoclen + cryptlen,
+                                       authsize, 0);
+       } else {
+               memset(rctx->tag_buf, 0, authsize);
+       }
+
+       rctx->tag_dma = cmh_dma_map_single(rctx->tag_buf, authsize,
+                                          rctx->encrypting ?
+                                           DMA_FROM_DEVICE : DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->tag_dma)) {
+               ret = -ENOMEM;
+               goto out_free_tag;
+       }
+
+       /* Build 16-byte ctrnonce: 4-byte zero counter + 12-byte nonce.
+        * rfc7539:    counter(4) | req->iv(12)
+        * rfc7539esp: counter(4) | salt(4) | req->iv(8)
+        */
+       rctx->iv_buf = kzalloc(CCP_CTRNONCE_SIZE, gfp);
+       if (!rctx->iv_buf) {
+               ret = -ENOMEM;
+               goto out_unmap_tag;
+       }
+       if (crypto_aead_ivsize(tfm) == CCP_ESP_IV_SIZE) {
+               memcpy(rctx->iv_buf + CCP_CHACHA_CTR_LEN,
+                      tctx->salt, CCP_ESP_SALT_SIZE);
+               memcpy(rctx->iv_buf + CCP_CHACHA_CTR_LEN + CCP_ESP_SALT_SIZE,
+                      req->iv, CCP_ESP_IV_SIZE);
+       } else {
+               memcpy(rctx->iv_buf + CCP_CHACHA_CTR_LEN,
+                      req->iv, CCP_AEAD_IV_SIZE);
+       }
+
+       rctx->iv_dma = cmh_dma_map_single(rctx->iv_buf, CCP_CTRNONCE_SIZE,
+                                         DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->iv_dma)) {
+               ret = -ENOMEM;
+               goto out_free_iv;
+       }
+
+       /* Resolve key reference */
+       idx = 0;
+
+       rctx->key_dma = tctx->key.raw.dma;
+       rctx->keylen = tctx->key.raw.len;
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)rctx->key_dma, SYS_REF_NONE,
+                         tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+       d = cmh_core_select_instance(CMH_CORE_CCP);
+       target_mbx = d.mbx_idx;
+       core_id = d.core_id;
+
+       /* AEAD_INIT */
+       vcq_add_ccp_aead_init(&cmds[idx++], core_id, key_ref,
+                             (u64)rctx->iv_dma, keylen, ccp_op);
+
+       /* AAD_FINAL if we have associated data */
+       if (rctx->assoclen > 0)
+               vcq_add_ccp_aad_final(&cmds[idx++], core_id,
+                                     (u64)rctx->aad_dma, rctx->assoclen);
+
+       /* FINAL with tag */
+       vcq_add_ccp_aead_final(&cmds[idx++], core_id,
+                              cryptlen > 0 ? (u64)rctx->in_dma : 0,
+                              cryptlen > 0 ? (u64)rctx->out_dma : 0,
+                              (u64)rctx->tag_dma, cryptlen, authsize);
+
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_CCP_AEAD_MAX_PACKED,
+                                           target_mbx,
+                                           cmh_ccp_aead_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_cleanup_all;
+
+       return -EINPROGRESS;
+
+out_cleanup_all:
+       cmh_dma_unmap_single(rctx->iv_dma, CCP_CTRNONCE_SIZE, DMA_TO_DEVICE);
+out_free_iv:
+       kfree(rctx->iv_buf);
+out_unmap_tag:
+       cmh_dma_unmap_single(rctx->tag_dma, authsize,
+                            rctx->encrypting ? DMA_FROM_DEVICE :
+                                              DMA_TO_DEVICE);
+out_free_tag:
+       kfree(rctx->tag_buf);
+out_unmap_out:
+       if (cryptlen > 0)
+               cmh_dma_unmap_single(rctx->out_dma, cryptlen, DMA_FROM_DEVICE);
+out_free_out:
+       kfree_sensitive(rctx->out_buf);
+out_unmap_in:
+       if (cryptlen > 0)
+               cmh_dma_unmap_single(rctx->in_dma, cryptlen, DMA_TO_DEVICE);
+out_free_in:
+       kfree_sensitive(rctx->in_buf);
+out_unmap_aad:
+       if (rctx->assoclen > 0)
+               cmh_dma_unmap_single(rctx->aad_dma, rctx->assoclen,
+                                    DMA_TO_DEVICE);
+out_free_aad:
+       kfree(rctx->aad_buf);
+       return ret;
+}
+
+static int cmh_ccp_aead_encrypt(struct aead_request *req)
+{
+       return cmh_ccp_aead_crypt(req, CCP_OP_ENCRYPT);
+}
+
+static int cmh_ccp_aead_decrypt(struct aead_request *req)
+{
+       return cmh_ccp_aead_crypt(req, CCP_OP_DECRYPT);
+}
+
+/* -- rfc7539esp: ESP variant with 4-byte salt + 8-byte IV --------------- */
+
+/*
+ * ESP setkey: 36 bytes = 32-byte ChaCha20 key + 4-byte salt.
+ * The salt is prepended to the 8-byte per-packet IV from the ESP header
+ * to form the 12-byte RFC 7539 nonce.
+ */
+static int cmh_ccp_esp_setkey(struct crypto_aead *tfm, const u8 *key,
+                             unsigned int keylen)
+{
+       struct cmh_ccp_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+
+       if (keylen != CHACHA_KEY_SIZE + CCP_ESP_SALT_SIZE)
+               return -EINVAL;
+
+       memcpy(tctx->salt, key + CHACHA_KEY_SIZE, CCP_ESP_SALT_SIZE);
+       return cmh_key_setkey_raw(&tctx->key, key, CHACHA_KEY_SIZE, CORE_ID_CCP);
+}
+
+/* Registration */
+
+static struct aead_alg cmh_rfc7539_alg = {
+       .setkey      = cmh_ccp_aead_setkey,
+       .setauthsize = cmh_ccp_aead_setauthsize,
+       .encrypt     = cmh_ccp_aead_encrypt,
+       .decrypt     = cmh_ccp_aead_decrypt,
+       .init        = cmh_ccp_aead_init_tfm,
+       .exit        = cmh_ccp_aead_exit_tfm,
+       .ivsize      = CCP_AEAD_IV_SIZE,
+       .maxauthsize = CCP_AEAD_TAG_SIZE,
+       .base        = {
+               .cra_name        = "rfc7539(chacha20,poly1305)",
+               .cra_driver_name = "cri-cmh-rfc7539-chacha20-poly1305",
+               .cra_priority    = 300,
+               .cra_flags       = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                  CRYPTO_ALG_ASYNC,
+               .cra_blocksize   = 1,
+               .cra_ctxsize     = sizeof(struct cmh_ccp_aead_tfm_ctx),
+               .cra_module      = THIS_MODULE,
+       },
+};
+
+static struct aead_alg cmh_rfc7539esp_alg = {
+       .setkey      = cmh_ccp_esp_setkey,
+       .setauthsize = cmh_ccp_aead_setauthsize,
+       .encrypt     = cmh_ccp_aead_encrypt,
+       .decrypt     = cmh_ccp_aead_decrypt,
+       .init        = cmh_ccp_aead_init_tfm,
+       .exit        = cmh_ccp_aead_exit_tfm,
+       .ivsize      = CCP_ESP_IV_SIZE,
+       .maxauthsize = CCP_AEAD_TAG_SIZE,
+       .base        = {
+               .cra_name        = "rfc7539esp(chacha20,poly1305)",
+               .cra_driver_name = "cri-cmh-rfc7539esp-chacha20-poly1305",
+               .cra_priority    = 300,
+               .cra_flags       = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                  CRYPTO_ALG_ASYNC,
+               .cra_blocksize   = 1,
+               .cra_ctxsize     = sizeof(struct cmh_ccp_aead_tfm_ctx),
+               .cra_module      = THIS_MODULE,
+       },
+};
+
+/**
+ * cmh_ccp_aead_register() - Register ChaCha20-Poly1305 AEAD algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_ccp_aead_register(void)
+{
+       int ret;
+
+       ret = crypto_register_aead(&cmh_rfc7539_alg);
+       if (ret) {
+               dev_err(cmh_dev(), "cmh_ccp_aead: failed to register rfc7539 (rc=%d)\n",
+                       ret);
+               return ret;
+       }
+       dev_dbg(cmh_dev(), "cmh_ccp_aead: registered rfc7539(chacha20,poly1305)\n");
+
+       ret = crypto_register_aead(&cmh_rfc7539esp_alg);
+       if (ret) {
+               dev_err(cmh_dev(), "cmh_ccp_aead: failed to register rfc7539esp (rc=%d)\n",
+                       ret);
+               crypto_unregister_aead(&cmh_rfc7539_alg);
+               return ret;
+       }
+       dev_dbg(cmh_dev(), "cmh_ccp_aead: registered rfc7539esp(chacha20,poly1305)\n");
+
+       return 0;
+}
+
+/**
+ * cmh_ccp_aead_unregister() - Unregister ChaCha20-Poly1305 AEAD algorithms
+ */
+void cmh_ccp_aead_unregister(void)
+{
+       crypto_unregister_aead(&cmh_rfc7539esp_alg);
+       crypto_unregister_aead(&cmh_rfc7539_alg);
+       dev_dbg(cmh_dev(), "cmh_ccp_aead: unregistered rfc7539/rfc7539esp\n");
+}
diff --git a/drivers/crypto/cmh/cmh_ccp_poly.c b/drivers/crypto/cmh/cmh_ccp_poly.c
new file mode 100644
index 000000000000..020a98fbe607
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_ccp_poly.c
@@ -0,0 +1,528 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API Poly1305 (ahash) Driver
+ *
+ * Registers "poly1305" as an ahash algorithm with the Linux crypto
+ * subsystem, backed by the CMH CCP core.
+ *
+ * Poly1305 is a one-time authenticator that produces a 16-byte MAC.
+ * It requires two 16-byte keys: r (clamped multiplier) and s (nonce).
+ *
+ * Key format: 32 bytes = r_key[0..15] || s_key[16..31]
+ * This matches the Poly1305 key layout in RFC 7539, Section 2.5.
+ *
+ * VCQ sequence:
+ *   SYS_CMD_WRITE(s_key) + SYS_CMD_WRITE(r_key)
+ *   + CCP_CMD_POLY1305_INIT + CCP_CMD_FINAL + CCP_CMD_FLUSH
+ *
+ * Both keys are written to SYS_REF_TEMP; the CMH eSW stacks them
+ * so that POLY1305_INIT finds r_key (most recent) as rkey and
+ * s_key (previous) as skey.
+ *
+ * The ahash interface accumulates data via .update() and submits the
+ * full VCQ asynchronously in .final().
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/hash.h>
+#include <crypto/scatterwalk.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include "cmh_ccp.h"
+#include "cmh_vcq.h"
+#include "cmh_ccp_abi.h"
+#include "cmh_sys_abi.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+#define POLY1305_DIGEST_SIZE   16U
+#define POLY1305_BLOCK_SIZE    16U
+#define POLY1305_KEY_SIZE      32U     /* r(16) + s(16) */
+
+/*
+ * Maximum accumulated data for Poly1305 -- driver-imposed, not HW.
+ *
+ * The CCP core does not expose external save/restore VCQ commands,
+ * so the driver must accumulate all data in kernel memory via
+ * .update() and submit it atomically in .final().  This cap limits
+ * the per-request kernel allocation.
+ */
+#define POLY_MAX_DATA          (64 * 1024)
+
+/*
+ * Per-transform context -- stores the raw 32-byte key (r || s).
+ *
+ * Only the raw-key path is supported for standalone Poly1305.
+ */
+struct cmh_poly_tfm_ctx {
+       u8  key[POLY1305_KEY_SIZE];
+       dma_addr_t rkey_dma;
+       dma_addr_t skey_dma;
+       u32 keylen;
+       bool has_key;
+       spinlock_t         chunk_lock;  /* protects all_chunks */
+       struct list_head   all_chunks;  /* orphan-safe chunk tracking */
+};
+
+/* Chunk node for O(1) update() appends */
+struct cmh_poly_chunk {
+       struct list_head list;
+       struct list_head tfm_node; /* per-tfm orphan tracking */
+       u32 len;
+       u8  data[];
+};
+
+/* Per-request context (lives in ahash_request::__ctx) */
+
+/*
+ * Maximum payload commands:
+ *   SYS_CMD_WRITE(s) + SYS_CMD_WRITE(r) + POLY1305_INIT
+ *   + CCP_CMD_FINAL + FLUSH = 5
+ */
+#define CMH_POLY_MAX_PAYLOAD   5
+#define CMH_POLY_MAX_PACKED    (CMH_POLY_MAX_PAYLOAD * 2)
+
+struct cmh_poly_reqctx {
+       struct list_head chunks;
+       u32  total_len;
+       u8  *buf;               /* linearised in final() */
+       /* DMA state for async final */
+       dma_addr_t in_dma;
+       dma_addr_t tag_dma;
+       u8 *tag_buf;
+       struct vcq_cmd packed[CMH_POLY_MAX_PACKED];
+};
+
+/*
+ * Export/import: not supported.
+ *
+ * The CCP core lacks external save/restore VCQ commands, so there is
+ * no way to checkpoint intermediate Poly1305 state to host memory.
+ * Pending eSW ABI extension to add save/restore for the CCP core.
+ */
+
+static void vcq_add_ccp_poly_init(struct vcq_cmd *slot, u32 core_id,
+                                 u64 rkey_ref, u32 rkeylen,
+                                 u64 skey_ref, u32 skeylen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, CCP_CMD_POLY1305_INIT);
+       slot->hwc.ccp.cmd_poly.rkey = rkey_ref;
+       slot->hwc.ccp.cmd_poly.rkeylen = rkeylen;
+       slot->hwc.ccp.cmd_poly.skey = skey_ref;
+       slot->hwc.ccp.cmd_poly.skeylen = skeylen;
+}
+
+static void vcq_add_ccp_poly_final(struct vcq_cmd *slot, u32 core_id,
+                                  u64 input_dma, u64 tag_dma,
+                                  u32 iolen, u32 taglen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, CCP_CMD_FINAL);
+       slot->hwc.ccp.cmd_final.input = input_dma;
+       slot->hwc.ccp.cmd_final.output = 0;
+       slot->hwc.ccp.cmd_final.tag = tag_dma;
+       slot->hwc.ccp.cmd_final.iolen = iolen;
+       slot->hwc.ccp.cmd_final.taglen = taglen;
+}
+
+static int cmh_poly_setkey(struct crypto_ahash *tfm, const u8 *key,
+                          unsigned int keylen)
+{
+       struct cmh_poly_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+
+       /* Poly1305: exactly 32 bytes (r[16] + s[16]) */
+       if (keylen != POLY1305_KEY_SIZE)
+               return -EINVAL;
+
+       /* Unmap old key DMA if re-keying */
+       if (tctx->has_key) {
+               cmh_dma_unmap_single(tctx->rkey_dma, CCP_POLY_KEY_SIZE,
+                                    DMA_TO_DEVICE);
+               cmh_dma_unmap_single(tctx->skey_dma, CCP_POLY_KEY_SIZE,
+                                    DMA_TO_DEVICE);
+       }
+
+       memcpy(tctx->key, key, POLY1305_KEY_SIZE);
+       tctx->keylen = POLY1305_KEY_SIZE;
+
+       /*
+        * Pre-map both key halves for DMA.  The key buffer lives in
+        * the tfm context and is stable until exit_tfm() or re-setkey.
+        */
+       tctx->skey_dma = cmh_dma_map_single(tctx->key + CCP_POLY_KEY_SIZE,
+                                           CCP_POLY_KEY_SIZE,
+                                            DMA_TO_DEVICE);
+       if (cmh_dma_map_error(tctx->skey_dma)) {
+               tctx->has_key = false;
+               return -ENOMEM;
+       }
+
+       tctx->rkey_dma = cmh_dma_map_single(tctx->key, CCP_POLY_KEY_SIZE,
+                                           DMA_TO_DEVICE);
+       if (cmh_dma_map_error(tctx->rkey_dma)) {
+               cmh_dma_unmap_single(tctx->skey_dma, CCP_POLY_KEY_SIZE,
+                                    DMA_TO_DEVICE);
+               tctx->has_key = false;
+               return -ENOMEM;
+       }
+
+       tctx->has_key = true;
+       return 0;
+}
+
+static void cmh_poly_free_chunks(struct cmh_poly_reqctx *rctx,
+                                struct cmh_poly_tfm_ctx *tctx)
+{
+       struct cmh_poly_chunk *c, *tmp;
+
+       spin_lock_bh(&tctx->chunk_lock);
+       list_for_each_entry_safe(c, tmp, &rctx->chunks, list) {
+               list_del(&c->list);
+               list_del(&c->tfm_node);
+               kfree_sensitive(c);
+       }
+       spin_unlock_bh(&tctx->chunk_lock);
+}
+
+static int cmh_poly_init(struct ahash_request *req)
+{
+       struct cmh_poly_reqctx *rctx = ahash_request_ctx(req);
+
+       memset(rctx, 0, sizeof(*rctx));
+       INIT_LIST_HEAD(&rctx->chunks);
+       return 0;
+}
+
+static int cmh_poly_update(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_poly_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_poly_reqctx *rctx = ahash_request_ctx(req);
+       struct cmh_poly_chunk *chunk;
+       gfp_t gfp;
+       int ret;
+
+       if (!req->nbytes)
+               return 0;
+
+       if (req->nbytes > POLY_MAX_DATA - rctx->total_len) {
+               ret = -EINVAL;
+               goto err_free_chunks;
+       }
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+       chunk = kmalloc(sizeof(*chunk) + req->nbytes, gfp);
+       if (!chunk) {
+               ret = -ENOMEM;
+               goto err_free_chunks;
+       }
+
+       chunk->len = req->nbytes;
+       if (req->base.flags & CRYPTO_AHASH_REQ_VIRT)
+               memcpy(chunk->data, req->svirt, req->nbytes);
+       else
+               scatterwalk_map_and_copy(chunk->data, req->src,
+                                        0, req->nbytes, 0);
+       list_add_tail(&chunk->list, &rctx->chunks);
+       spin_lock_bh(&tctx->chunk_lock);
+       list_add_tail(&chunk->tfm_node, &tctx->all_chunks);
+       spin_unlock_bh(&tctx->chunk_lock);
+       rctx->total_len += req->nbytes;
+       return 0;
+
+err_free_chunks:
+       /*
+        * Terminal error -- free all previously accumulated chunks.
+        * Callers may not call .final() on error, so they would leak.
+        */
+       cmh_poly_free_chunks(rctx, tctx);
+       return ret;
+}
+
+static void cmh_poly_complete(void *data, int error)
+{
+       struct ahash_request *req = data;
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_poly_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_poly_reqctx *rctx = ahash_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       if (rctx->total_len > 0)
+               cmh_dma_unmap_single(rctx->in_dma, rctx->total_len,
+                                    DMA_TO_DEVICE);
+       cmh_dma_unmap_single(rctx->tag_dma, POLY1305_DIGEST_SIZE,
+                            DMA_FROM_DEVICE);
+
+       if (!error)
+               memcpy(req->result, rctx->tag_buf, POLY1305_DIGEST_SIZE);
+
+       kfree(rctx->tag_buf);
+       rctx->tag_buf = NULL;
+       cmh_poly_free_chunks(rctx, tctx);
+       kfree_sensitive(rctx->buf);
+       rctx->buf = NULL;
+       rctx->total_len = 0;
+       cmh_complete(&req->base, error);
+}
+
+static int cmh_poly_final(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_poly_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_poly_reqctx *rctx = ahash_request_ctx(req);
+       struct vcq_cmd cmds[CMH_POLY_MAX_PAYLOAD];
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       if (!tctx->has_key) {
+               ret = -ENOKEY;
+               goto out_free_chunks;
+       }
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       /* Linearise chunks into a single contiguous buffer for DMA */
+       if (rctx->total_len > 0) {
+               struct cmh_poly_chunk *c;
+               u32 off = 0;
+
+               rctx->buf = kmalloc(rctx->total_len, gfp);
+               if (!rctx->buf) {
+                       ret = -ENOMEM;
+                       goto out_free_chunks;
+               }
+               list_for_each_entry(c, &rctx->chunks, list) {
+                       memcpy(rctx->buf + off, c->data, c->len);
+                       off += c->len;
+               }
+       }
+
+       /* Tag output buffer */
+       rctx->tag_buf = kzalloc(POLY1305_DIGEST_SIZE, gfp);
+       if (!rctx->tag_buf) {
+               ret = -ENOMEM;
+               goto out_free_buf;
+       }
+
+       rctx->tag_dma = cmh_dma_map_single(rctx->tag_buf,
+                                          POLY1305_DIGEST_SIZE,
+                                           DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->tag_dma)) {
+               ret = -ENOMEM;
+               goto out_free_tag;
+       }
+
+       /* Map input data */
+       if (rctx->total_len > 0) {
+               rctx->in_dma = cmh_dma_map_single(rctx->buf, rctx->total_len,
+                                                 DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->in_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap_tag;
+               }
+       }
+
+       /*
+        * Key DMA handles are pre-mapped in setkey() and live in
+        * the tfm context.  Use them directly for the VCQ writes.
+        */
+
+       d = cmh_core_select_instance(CMH_CORE_CCP);
+       target_mbx = d.mbx_idx;
+       core_id = d.core_id;
+       idx = 0;
+
+       /* Write s_key to SYS_REF_TEMP first (bottom of stack) */
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)tctx->skey_dma, SYS_REF_NONE,
+                         CCP_POLY_KEY_SIZE,
+                         SYS_TYPE_SET(SYS_TYPE_FLAG_PT, CORE_ID_CCP));
+
+       /* Write r_key to SYS_REF_TEMP second (top of stack) */
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)tctx->rkey_dma, SYS_REF_NONE,
+                         CCP_POLY_KEY_SIZE,
+                         SYS_TYPE_SET(SYS_TYPE_FLAG_PT, CORE_ID_CCP));
+
+       /* POLY1305_INIT: rkey=TEMP (top), skey=TEMP (next) */
+       vcq_add_ccp_poly_init(&cmds[idx++], core_id, SYS_REF_TEMP,
+                             CCP_POLY_KEY_SIZE, SYS_REF_TEMP,
+                             CCP_POLY_KEY_SIZE);
+
+       /* FINAL: data -> tag */
+       vcq_add_ccp_poly_final(&cmds[idx++], core_id,
+                              rctx->total_len > 0 ? (u64)rctx->in_dma : 0,
+                              (u64)rctx->tag_dma, rctx->total_len,
+                              POLY1305_DIGEST_SIZE);
+
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_POLY_MAX_PACKED, target_mbx,
+                                           cmh_poly_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_unmap_in;
+
+       return -EINPROGRESS;
+
+out_unmap_in:
+       if (rctx->total_len > 0 && rctx->in_dma)
+               cmh_dma_unmap_single(rctx->in_dma, rctx->total_len,
+                                    DMA_TO_DEVICE);
+out_unmap_tag:
+       cmh_dma_unmap_single(rctx->tag_dma, POLY1305_DIGEST_SIZE,
+                            DMA_FROM_DEVICE);
+out_free_tag:
+       kfree(rctx->tag_buf);
+out_free_buf:
+       kfree_sensitive(rctx->buf);
+       rctx->buf = NULL;
+out_free_chunks:
+       cmh_poly_free_chunks(rctx, tctx);
+       rctx->total_len = 0;
+       return ret;
+}
+
+static int cmh_poly_export(struct ahash_request *req, void *out)
+{
+       return -EOPNOTSUPP;
+}
+
+static int cmh_poly_import(struct ahash_request *req, const void *in)
+{
+       return -EOPNOTSUPP;
+}
+
+static int cmh_poly_finup(struct ahash_request *req)
+{
+       int err;
+
+       err = cmh_poly_update(req);
+       if (err)
+               return err;
+       return cmh_poly_final(req);
+}
+
+static int cmh_poly_digest(struct ahash_request *req)
+{
+       int err;
+
+       err = cmh_poly_init(req);
+       if (err)
+               return err;
+       return cmh_poly_finup(req);
+}
+
+static int cmh_poly_init_tfm(struct crypto_ahash *tfm)
+{
+       struct cmh_poly_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+
+       memset(tctx, 0, sizeof(*tctx));
+       spin_lock_init(&tctx->chunk_lock);
+       INIT_LIST_HEAD(&tctx->all_chunks);
+       crypto_ahash_set_reqsize(tfm, sizeof(struct cmh_poly_reqctx));
+       return 0;
+}
+
+static void cmh_poly_exit_tfm(struct crypto_ahash *tfm)
+{
+       struct cmh_poly_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_poly_chunk *c, *tmp;
+
+       /* Free any orphaned chunks (e.g. testmgr export/reimport poison) */
+       spin_lock_bh(&tctx->chunk_lock);
+       list_for_each_entry_safe(c, tmp, &tctx->all_chunks, tfm_node) {
+               list_del(&c->tfm_node);
+               kfree_sensitive(c);
+       }
+       spin_unlock_bh(&tctx->chunk_lock);
+
+       if (tctx->has_key) {
+               cmh_dma_unmap_single(tctx->rkey_dma, CCP_POLY_KEY_SIZE,
+                                    DMA_TO_DEVICE);
+               cmh_dma_unmap_single(tctx->skey_dma, CCP_POLY_KEY_SIZE,
+                                    DMA_TO_DEVICE);
+       }
+       memzero_explicit(tctx->key, POLY1305_KEY_SIZE);
+}
+
+static struct ahash_alg cmh_poly1305_alg = {
+       .init           = cmh_poly_init,
+       .update         = cmh_poly_update,
+       .final          = cmh_poly_final,
+       .finup          = cmh_poly_finup,
+       .digest         = cmh_poly_digest,
+       .export         = cmh_poly_export,
+       .import         = cmh_poly_import,
+       .setkey         = cmh_poly_setkey,
+       .init_tfm       = cmh_poly_init_tfm,
+       .exit_tfm       = cmh_poly_exit_tfm,
+       .halg           = {
+               .digestsize     = POLY1305_DIGEST_SIZE,
+               .statesize      = sizeof(struct cmh_poly_reqctx),
+               .base           = {
+                       .cra_name        = "poly1305",
+                       .cra_driver_name = "cri-cmh-poly1305",
+                       .cra_priority    = 300,
+                       .cra_flags       = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                          CRYPTO_ALG_NO_FALLBACK |
+                                          CRYPTO_ALG_ASYNC |
+                                          CRYPTO_ALG_REQ_VIRT,
+                       .cra_blocksize   = POLY1305_BLOCK_SIZE,
+                       .cra_ctxsize     = sizeof(struct cmh_poly_tfm_ctx),
+                       .cra_module      = THIS_MODULE,
+               },
+       },
+};
+
+/**
+ * cmh_ccp_poly_register() - Register Poly1305 hash algorithm with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_ccp_poly_register(void)
+{
+       int ret;
+
+       ret = crypto_register_ahash(&cmh_poly1305_alg);
+       if (ret)
+               dev_err(cmh_dev(), "cmh_ccp_poly: failed to register poly1305 (rc=%d)\n",
+                       ret);
+       else
+               dev_dbg(cmh_dev(), "cmh_ccp_poly: registered poly1305\n");
+
+       return ret;
+}
+
+/**
+ * cmh_ccp_poly_unregister() - Unregister Poly1305 hash algorithm from the crypto framework
+ */
+void cmh_ccp_poly_unregister(void)
+{
+       crypto_unregister_ahash(&cmh_poly1305_alg);
+       dev_dbg(cmh_dev(), "cmh_ccp_poly: unregistered poly1305\n");
+}
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index 5d67a4a12333..79df27d43e7e 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -36,6 +36,7 @@
 #include "cmh_sm3.h"
 #include "cmh_aes.h"
 #include "cmh_sm4.h"
+#include "cmh_ccp.h"
 #include "cmh_mgmt.h"
 #include "cmh_registers.h"
 #include "cmh_debugfs.h"
@@ -253,6 +254,21 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_sm4_cmac_register;

+       /* Register CCP ChaCha20 skcipher algorithm */
+       ret = cmh_ccp_register();
+       if (ret)
+               goto err_ccp_register;
+
+       /* Register CCP ChaCha20-Poly1305 AEAD (RFC 7539) */
+       ret = cmh_ccp_aead_register();
+       if (ret)
+               goto err_ccp_aead_register;
+
+       /* Register CCP Poly1305 ahash algorithm */
+       ret = cmh_ccp_poly_register();
+       if (ret)
+               goto err_ccp_poly_register;
+
        /* Register key management device (/dev/cmh_mgmt) */
        ret = cmh_mgmt_register();
        if (ret)
@@ -265,6 +281,12 @@ static int cmh_probe(struct platform_device *pdev)
        return 0;

 err_mgmt_register:
+       cmh_ccp_poly_unregister();
+err_ccp_poly_register:
+       cmh_ccp_aead_unregister();
+err_ccp_aead_register:
+       cmh_ccp_unregister();
+err_ccp_register:
        cmh_sm4_cmac_unregister();
 err_sm4_cmac_register:
        cmh_sm4_aead_unregister();
@@ -313,6 +335,9 @@ static void cmh_remove(struct platform_device *pdev)
        cfg = &dev->config;

        cmh_mgmt_unregister();
+       cmh_ccp_poly_unregister();
+       cmh_ccp_aead_unregister();
+       cmh_ccp_unregister();
        cmh_sm4_cmac_unregister();
        cmh_sm4_aead_unregister();
        cmh_sm4_unregister();
diff --git a/drivers/crypto/cmh/include/cmh_ccp.h b/drivers/crypto/cmh/include/cmh_ccp.h
new file mode 100644
index 000000000000..363d208cbceb
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_ccp.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- CCP Crypto API Drivers
+ *
+ * Registers CCP algorithms with the Linux crypto subsystem:
+ *   skcipher: chacha20
+ *   ahash:    poly1305
+ *   aead:     rfc7539(chacha20,poly1305), rfc7539esp(chacha20,poly1305)
+ */
+
+#ifndef CMH_CCP_H
+#define CMH_CCP_H
+
+int  cmh_ccp_register(void);
+void cmh_ccp_unregister(void);
+
+int  cmh_ccp_aead_register(void);
+void cmh_ccp_aead_unregister(void);
+
+int  cmh_ccp_poly_register(void);
+void cmh_ccp_poly_unregister(void);
+
+#endif /* CMH_CCP_H */
--
2.43.7
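
For readers following the nonce handling in cmh_ccp_aead_crypt() in this
patch, the ctrnonce layout can be sketched in standalone userspace C. This
is only an illustration, not driver code: build_ctrnonce() and the
unprefixed size macros are hypothetical, and their values are assumed from
the patch comments (4-byte zero counter, 4-byte salt, 8-byte ESP IV,
12-byte RFC 7539 IV).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Sizes assumed from the comments in the patch; the real CCP_* macros
 * live in driver headers that are not shown in this message.
 */
#define CTRNONCE_SIZE  16
#define CTR_LEN         4
#define ESP_SALT_SIZE   4
#define ESP_IV_SIZE     8
#define AEAD_IV_SIZE   12

/*
 * Hypothetical helper: fill out[16] with a zero 4-byte counter followed
 * by the 12-byte nonce.  For the ESP variant the nonce is salt || iv.
 */
static void build_ctrnonce(uint8_t out[CTRNONCE_SIZE], const uint8_t *iv,
                           size_t ivsize, const uint8_t salt[ESP_SALT_SIZE])
{
        assert(ivsize == ESP_IV_SIZE || ivsize == AEAD_IV_SIZE);

        /* Zero everything first so the counter field starts at 0 */
        memset(out, 0, CTRNONCE_SIZE);

        if (ivsize == ESP_IV_SIZE) {
                /* rfc7539esp: counter(4) | salt(4) | iv(8) */
                memcpy(out + CTR_LEN, salt, ESP_SALT_SIZE);
                memcpy(out + CTR_LEN + ESP_SALT_SIZE, iv, ESP_IV_SIZE);
        } else {
                /* rfc7539: counter(4) | iv(12) */
                memcpy(out + CTR_LEN, iv, AEAD_IV_SIZE);
        }
}
```

In the sketch the salt argument is ignored for the 12-byte case, which
mirrors how the patch only consults tctx->salt when the transform's ivsize
is the ESP one.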




* [PATCH 15/19] crypto: cmh - add ML-KEM/ML-DSA (QSE)
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register ML-KEM (Kyber) and ML-DSA (Dilithium) algorithms using
the CMH QSE core (core ID 0x09).  ML-KEM is ioctl-only (keygen,
encaps, decaps).  ML-DSA is registered as a sig algorithm with
priority 5001 to override the kernel's verify-only mldsa
implementation at priority 5000.  This follows the established
pattern where hardware drivers override software-only fallbacks
(e.g. ccp at 300 over generic AES at 100, qat similarly).  The
CMH driver provides full HW-accelerated sign + verify vs the
kernel's verify-only software implementation.

Includes cmh_pqc_sizes.c with compile-time tables of PQC key and
signature sizes for all supported parameter sets.
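
As a rough illustration of what such a compile-time size table can look
like, here is a minimal userspace sketch for ML-DSA. The struct layout,
the names and the lookup helper are hypothetical and are not taken from
cmh_pqc_sizes.c; the byte counts are the public key, private key and
signature sizes from FIPS 204, and it is an assumption that all three
parameter sets are the ones the driver supports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: sizes in bytes for one ML-DSA parameter set */
struct mldsa_sizes {
        const char *name;
        size_t pk_len;   /* public key */
        size_t sk_len;   /* private key */
        size_t sig_len;  /* signature */
};

/* Values from FIPS 204; fixed at compile time */
static const struct mldsa_sizes mldsa_table[] = {
        { "ml-dsa-44", 1312, 2560, 2420 },
        { "ml-dsa-65", 1952, 4032, 3309 },
        { "ml-dsa-87", 2592, 4896, 4627 },
};

/* Return the entry for @name, or NULL if the parameter set is unknown */
static const struct mldsa_sizes *mldsa_lookup(const char *name)
{
        size_t i;

        assert(name != NULL);

        for (i = 0; i < sizeof(mldsa_table) / sizeof(mldsa_table[0]); i++) {
                if (strcmp(mldsa_table[i].name, name) == 0)
                        return &mldsa_table[i];
        }
        return NULL;
}
```

A caller could use such a lookup to validate buffer lengths before handing
a request to hardware, rejecting anything whose length does not match the
selected parameter set.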

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 drivers/crypto/cmh/Makefile        |   5 +-
 drivers/crypto/cmh/cmh_main.c      |   9 +
 drivers/crypto/cmh/cmh_pqc_mldsa.c | 394 +++++++++++++++++++++++++++++
 drivers/crypto/cmh/cmh_pqc_sizes.c |  39 +++
 drivers/crypto/cmh/cmh_qse.c       | 211 +++++++++++++++
 5 files changed, 657 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_pqc_mldsa.c
 create mode 100644 drivers/crypto/cmh/cmh_pqc_sizes.c
 create mode 100644 drivers/crypto/cmh/cmh_qse.c

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index a4cea0a56fc1..3425eb65d653 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -33,7 +33,10 @@ cmh-y := \
        cmh_pke_common.o \
        cmh_pke_rsa.o \
        cmh_pke_ecdsa.o \
-       cmh_pke_ecdh.o
+       cmh_pke_ecdh.o \
+       cmh_qse.o \
+       cmh_pqc_mldsa.o \
+       cmh_pqc_sizes.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index ea0f32b941f5..df38f43dc179 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -39,6 +39,7 @@
 #include "cmh_sm4.h"
 #include "cmh_ccp.h"
 #include "cmh_pke.h"
+#include "cmh_pqc.h"
 #include "cmh_mgmt.h"
 #include "cmh_registers.h"
 #include "cmh_debugfs.h"
@@ -291,6 +292,11 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_pke_ecdh_register;

+       /* Register PQC ML-KEM/ML-DSA */
+       ret = cmh_pqc_mldsa_register();
+       if (ret)
+               goto err_pqc_mldsa_register;
+
        /* Register key management device (/dev/cmh_mgmt) */
        ret = cmh_mgmt_register();
        if (ret)
@@ -303,6 +309,8 @@ static int cmh_probe(struct platform_device *pdev)
        return 0;

 err_mgmt_register:
+       cmh_pqc_mldsa_unregister();
+err_pqc_mldsa_register:
        cmh_pke_ecdh_unregister();
 err_pke_ecdh_register:
        cmh_pke_ecdsa_unregister();
@@ -365,6 +373,7 @@ static void cmh_remove(struct platform_device *pdev)
        cfg = &dev->config;

        cmh_mgmt_unregister();
+       cmh_pqc_mldsa_unregister();
        cmh_pke_ecdh_unregister();
        cmh_pke_ecdsa_unregister();
        cmh_pke_rsa_unregister();
diff --git a/drivers/crypto/cmh/cmh_pqc_mldsa.c b/drivers/crypto/cmh/cmh_pqc_mldsa.c
new file mode 100644
index 000000000000..cbe63c34a1c8
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_pqc_mldsa.c
@@ -0,0 +1,394 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- ML-DSA Signature Driver (sig_alg, synchronous)
+ *
+ * Registers "mldsa44", "mldsa65", "mldsa87" sig algorithms
+ * with sign, verify, set_pub_key, and set_priv_key callbacks.
+ *
+ * Key format:
+ *   Public key  = raw pk bytes (1312 / 1952 / 2592 bytes)
+ *   Private key = raw sk bytes (2560 / 4032 / 4896 bytes)
+ *
+ * Sign: src = message bytes (up to 10240 bytes), dst = raw signature
+ * Verify: src = raw signature, digest = message bytes
+ *
+ * Non-masked mode only for sig_alg API.
+ * Masked mode available through /dev/cmh_mgmt ioctl.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <crypto/sig.h>
+#include <crypto/internal/sig.h>
+
+#include "cmh_sys.h"
+#include "cmh_qse_abi.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+#include "cmh_pqc.h"
+
+struct cmh_mldsa_tfm_ctx {
+       struct cmh_key_ctx key;         /* private key (raw only) */
+       u8 *pub_key;
+       u32 pub_key_len;
+       u32 mode;                       /* ML_DSA_MODE_44/65/87 */
+       int mode_idx;                   /* index into size tables */
+};
+
+static inline struct cmh_mldsa_tfm_ctx *cmh_mldsa_ctx(struct crypto_sig *tfm)
+{
+       return crypto_sig_ctx(tfm);
+}
+
+/*
+ * ML-DSA sign (synchronous sig_alg)
+ *
+ * @src:  message bytes
+ * @slen: message length
+ * @dst:  signature output buffer
+ * @dlen: output buffer length
+ *
+ * Returns signature length on success, negative errno on failure.
+ */
+static int cmh_mldsa_sign(struct crypto_sig *tfm,
+                         const void *src, unsigned int slen,
+                         void *dst, unsigned int dlen)
+{
+       struct cmh_mldsa_tfm_ctx *ctx = cmh_mldsa_ctx(tfm);
+       int mi = ctx->mode_idx;
+       u32 sig_size = ml_dsa_sig_size[mi];
+       u32 sk_size = ml_dsa_sk_size[mi];
+       struct vcq_cmd vcq[QSE_VCQ_CMDS_MIN];
+       struct core_dispatch dd;
+       u8 *m_buf = NULL, *sig_buf = NULL, *sk_buf = NULL;
+       dma_addr_t m_dma = DMA_MAPPING_ERROR;
+       dma_addr_t sig_dma = DMA_MAPPING_ERROR;
+       dma_addr_t sk_dma = DMA_MAPPING_ERROR;
+       int ret, idx;
+
+       if (ctx->key.mode != CMH_KEY_RAW)
+               return -EINVAL;
+       if (dlen < sig_size)
+               return -EINVAL;
+       if (!slen || slen > ML_DSA_MAX_MLEN)
+               return -EINVAL;
+
+       m_buf = kmemdup(src, slen, GFP_KERNEL);
+       sig_buf = kzalloc(sig_size, GFP_KERNEL);
+       if (!m_buf || !sig_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (ctx->key.raw.len != sk_size) {
+               ret = -EINVAL;
+               goto out_free;
+       }
+
+       sk_buf = kmemdup(ctx->key.raw.data, ctx->key.raw.len, GFP_KERNEL);
+       if (!sk_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       m_dma = cmh_dma_map_single(m_buf, slen, DMA_TO_DEVICE);
+       sig_dma = cmh_dma_map_single(sig_buf, sig_size, DMA_FROM_DEVICE);
+       sk_dma = cmh_dma_map_single(sk_buf, sk_size, DMA_TO_DEVICE);
+
+       if (cmh_dma_map_error(m_dma) || cmh_dma_map_error(sig_dma) ||
+           cmh_dma_map_error(sk_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       dd = cmh_core_select_instance(CMH_CORE_QSE);
+
+       vcq_set_header(&vcq[0], QSE_VCQ_CMDS_MIN);
+       idx = 1;
+       vcq_add_qse_ml_dsa_sign(&vcq[idx++], dd.core_id, ctx->mode,
+                               QSE_FLAG_USE_RNG,
+                               0, m_dma, sk_dma, sig_dma, slen, false);
+       vcq_add_qse_flush(&vcq[idx++], dd.core_id);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, QSE_VCQ_CMDS_MIN, 1,
+                                    dd.mbx_idx);
+       if (!ret) {
+               /* Sync bounce buffer so CPU sees the DMA-written signature */
+               cmh_dma_sync_for_cpu(sig_dma, sig_size, DMA_FROM_DEVICE);
+               memcpy(dst, sig_buf, sig_size);
+               ret = sig_size;
+       }
+
+out_unmap:
+       if (!cmh_dma_map_error(sk_dma))
+               cmh_dma_unmap_single(sk_dma, sk_size, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, sig_size, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(m_dma))
+               cmh_dma_unmap_single(m_dma, slen, DMA_TO_DEVICE);
+
+out_free:
+       kfree_sensitive(sk_buf);
+       kfree(sig_buf);
+       kfree(m_buf);
+       return ret;
+}
+
+/*
+ * ML-DSA verify (synchronous sig_alg)
+ *
+ * @src:    raw signature
+ * @slen:   signature length
+ * @digest: message bytes
+ * @dlen:   message length
+ *
+ * Returns 0 on successful verification, negative errno on failure.
+ */
+static int cmh_mldsa_verify(struct crypto_sig *tfm,
+                           const void *src, unsigned int slen,
+                           const void *digest, unsigned int dlen)
+{
+       struct cmh_mldsa_tfm_ctx *ctx = cmh_mldsa_ctx(tfm);
+       int mi = ctx->mode_idx;
+       u32 sig_size = ml_dsa_sig_size[mi];
+       u32 pk_size = ml_dsa_pk_size[mi];
+       struct core_dispatch d = cmh_core_select_instance(CMH_CORE_QSE);
+       struct vcq_cmd vcq[QSE_VCQ_CMDS_MIN];
+       u8 *sig_buf = NULL, *m_buf = NULL, *pk_buf = NULL;
+       dma_addr_t sig_dma = DMA_MAPPING_ERROR;
+       dma_addr_t m_dma = DMA_MAPPING_ERROR;
+       dma_addr_t pk_dma = DMA_MAPPING_ERROR;
+       int ret;
+
+       if (!ctx->pub_key)
+               return -EINVAL;
+       if (slen != sig_size)
+               return -EINVAL;
+       if (!dlen || dlen > ML_DSA_MAX_MLEN)
+               return -EINVAL;
+
+       sig_buf = kmemdup(src, slen, GFP_KERNEL);
+       m_buf = kmemdup(digest, dlen, GFP_KERNEL);
+       pk_buf = kmemdup(ctx->pub_key, pk_size, GFP_KERNEL);
+       if (!sig_buf || !m_buf || !pk_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       sig_dma = cmh_dma_map_single(sig_buf, sig_size, DMA_TO_DEVICE);
+       m_dma = cmh_dma_map_single(m_buf, dlen, DMA_TO_DEVICE);
+       pk_dma = cmh_dma_map_single(pk_buf, pk_size, DMA_TO_DEVICE);
+
+       if (cmh_dma_map_error(sig_dma) || cmh_dma_map_error(m_dma) ||
+           cmh_dma_map_error(pk_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], QSE_VCQ_CMDS_MIN);
+       vcq_add_qse_ml_dsa_verify(&vcq[1], d.core_id, ctx->mode, 0,
+                                 m_dma, pk_dma, sig_dma, dlen);
+       vcq_add_qse_flush(&vcq[2], d.core_id);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, QSE_VCQ_CMDS_MIN, 1, d.mbx_idx);
+
+out_unmap:
+       if (!cmh_dma_map_error(pk_dma))
+               cmh_dma_unmap_single(pk_dma, pk_size, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(m_dma))
+               cmh_dma_unmap_single(m_dma, dlen, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, sig_size, DMA_TO_DEVICE);
+
+out_free:
+       kfree(pk_buf);
+       kfree(m_buf);
+       kfree(sig_buf);
+       return ret;
+}
+
+static int cmh_mldsa_set_pub_key(struct crypto_sig *tfm,
+                                const void *key, unsigned int keylen)
+{
+       struct cmh_mldsa_tfm_ctx *ctx = cmh_mldsa_ctx(tfm);
+       u32 expected = ml_dsa_pk_size[ctx->mode_idx];
+
+       if (keylen != expected)
+               return -EINVAL;
+
+       kfree(ctx->pub_key);
+       ctx->pub_key = NULL;
+       ctx->pub_key_len = 0;
+
+       ctx->pub_key = kmemdup(key, keylen, GFP_KERNEL);
+       if (!ctx->pub_key)
+               return -ENOMEM;
+
+       ctx->pub_key_len = keylen;
+       return 0;
+}
+
+static int cmh_mldsa_set_priv_key(struct crypto_sig *tfm,
+                                 const void *key, unsigned int keylen)
+{
+       struct cmh_mldsa_tfm_ctx *ctx = cmh_mldsa_ctx(tfm);
+       u32 expected = ml_dsa_sk_size[ctx->mode_idx];
+
+       if (keylen != expected)
+               return -EINVAL;
+
+       return cmh_key_setkey_raw(&ctx->key, key, keylen, CORE_ID_QSE);
+}
+
+static unsigned int cmh_mldsa_key_size(struct crypto_sig *tfm)
+{
+       struct cmh_mldsa_tfm_ctx *ctx = cmh_mldsa_ctx(tfm);
+
+       /* crypto_sig_keysize() returns bits, not bytes */
+       return ml_dsa_pk_size[ctx->mode_idx] * 8;
+}
+
+static unsigned int cmh_mldsa_max_size(struct crypto_sig *tfm)
+{
+       struct cmh_mldsa_tfm_ctx *ctx = cmh_mldsa_ctx(tfm);
+
+       return ml_dsa_sig_size[ctx->mode_idx];
+}
+
+static int cmh_mldsa_44_init(struct crypto_sig *tfm)
+{
+       struct cmh_mldsa_tfm_ctx *ctx = cmh_mldsa_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       ctx->mode = ML_DSA_MODE_44;
+       ctx->mode_idx = 0;
+       return 0;
+}
+
+static int cmh_mldsa_65_init(struct crypto_sig *tfm)
+{
+       struct cmh_mldsa_tfm_ctx *ctx = cmh_mldsa_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       ctx->mode = ML_DSA_MODE_65;
+       ctx->mode_idx = 1;
+       return 0;
+}
+
+static int cmh_mldsa_87_init(struct crypto_sig *tfm)
+{
+       struct cmh_mldsa_tfm_ctx *ctx = cmh_mldsa_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       ctx->mode = ML_DSA_MODE_87;
+       ctx->mode_idx = 2;
+       return 0;
+}
+
+static void cmh_mldsa_exit(struct crypto_sig *tfm)
+{
+       struct cmh_mldsa_tfm_ctx *ctx = cmh_mldsa_ctx(tfm);
+
+       cmh_key_destroy(&ctx->key);
+       kfree(ctx->pub_key);
+       ctx->pub_key = NULL;
+}
+
+/*
+ * Priority 5001: the kernel's software ML-DSA (crypto/mldsa.c) registers
+ * at priority 5000 but only implements verify -- sign returns -EOPNOTSUPP.
+ * We provide full HW-accelerated sign + verify, so we must override.
+ */
+static struct sig_alg cmh_mldsa_algs[] = {
+       {
+               .sign           = cmh_mldsa_sign,
+               .verify         = cmh_mldsa_verify,
+               .set_pub_key    = cmh_mldsa_set_pub_key,
+               .set_priv_key   = cmh_mldsa_set_priv_key,
+               .key_size       = cmh_mldsa_key_size,
+               .max_size       = cmh_mldsa_max_size,
+               .init           = cmh_mldsa_44_init,
+               .exit           = cmh_mldsa_exit,
+               .base = {
+                       .cra_name         = "mldsa44",
+                       .cra_driver_name  = "cri-cmh-mldsa44",
+                       .cra_priority     = 5001,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_mldsa_tfm_ctx),
+               },
+       },
+       {
+               .sign           = cmh_mldsa_sign,
+               .verify         = cmh_mldsa_verify,
+               .set_pub_key    = cmh_mldsa_set_pub_key,
+               .set_priv_key   = cmh_mldsa_set_priv_key,
+               .key_size       = cmh_mldsa_key_size,
+               .max_size       = cmh_mldsa_max_size,
+               .init           = cmh_mldsa_65_init,
+               .exit           = cmh_mldsa_exit,
+               .base = {
+                       .cra_name         = "mldsa65",
+                       .cra_driver_name  = "cri-cmh-mldsa65",
+                       .cra_priority     = 5001,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_mldsa_tfm_ctx),
+               },
+       },
+       {
+               .sign           = cmh_mldsa_sign,
+               .verify         = cmh_mldsa_verify,
+               .set_pub_key    = cmh_mldsa_set_pub_key,
+               .set_priv_key   = cmh_mldsa_set_priv_key,
+               .key_size       = cmh_mldsa_key_size,
+               .max_size       = cmh_mldsa_max_size,
+               .init           = cmh_mldsa_87_init,
+               .exit           = cmh_mldsa_exit,
+               .base = {
+                       .cra_name         = "mldsa87",
+                       .cra_driver_name  = "cri-cmh-mldsa87",
+                       .cra_priority     = 5001,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_mldsa_tfm_ctx),
+               },
+       },
+};
+
+/**
+ * cmh_pqc_mldsa_register() - Register ML-DSA sig algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_pqc_mldsa_register(void)
+{
+       int ret, i;
+
+       for (i = 0; i < ARRAY_SIZE(cmh_mldsa_algs); i++) {
+               ret = crypto_register_sig(&cmh_mldsa_algs[i]);
+               if (ret) {
+                       dev_err(cmh_dev(), "cmh: failed to register %s (%d)\n",
+                               cmh_mldsa_algs[i].base.cra_name, ret);
+                       goto err_unregister;
+               }
+       }
+
+       return 0;
+
+err_unregister:
+       while (i--)
+               crypto_unregister_sig(&cmh_mldsa_algs[i]);
+       return ret;
+}
+
+/**
+ * cmh_pqc_mldsa_unregister() - Unregister ML-DSA sig algorithms from the crypto framework
+ */
+void cmh_pqc_mldsa_unregister(void)
+{
+       int i = ARRAY_SIZE(cmh_mldsa_algs);
+
+       while (i--)
+               crypto_unregister_sig(&cmh_mldsa_algs[i]);
+}
diff --git a/drivers/crypto/cmh/cmh_pqc_sizes.c b/drivers/crypto/cmh/cmh_pqc_sizes.c
new file mode 100644
index 000000000000..39e3d56f4312
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_pqc_sizes.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- PQC Algorithm Size Tables
+ *
+ * Centralised ML-DSA and SLH-DSA parameter-size arrays.  Declared
+ * extern in cmh_qse_abi.h / cmh_hcq_abi.h, defined here once to
+ * avoid per-TU duplication.
+ */
+
+#include <linux/build_bug.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+
+#include "cmh_qse_abi.h"
+#include "cmh_hcq_abi.h"
+
+/* ML-DSA size tables (indexed by ml_dsa_mode_idx()) */
+const u32 ml_dsa_pk_size[3]  = { 1312U, 1952U, 2592U };
+const u32 ml_dsa_sk_size[3]  = { 2560U, 4032U, 4896U };
+const u32 ml_dsa_sk_size_masked[3] = { 3360U, 5472U, 6368U };
+const u32 ml_dsa_sig_size[3] = { 2420U, 3309U, 4627U };
+
+static_assert(ARRAY_SIZE(ml_dsa_pk_size) == ARRAY_SIZE(ml_dsa_sk_size));
+static_assert(ARRAY_SIZE(ml_dsa_pk_size) == ARRAY_SIZE(ml_dsa_sk_size_masked));
+static_assert(ARRAY_SIZE(ml_dsa_pk_size) == ARRAY_SIZE(ml_dsa_sig_size));
+
+/* SLH-DSA n-values and signature sizes (indexed by param_set - 1) */
+const u32 slhdsa_n[12] = {
+       16, 16, 24, 24, 32, 32,         /* SHAKE 128s/f, 192s/f, 256s/f */
+       16, 16, 24, 24, 32, 32,         /* SHA2 128s/f, 192s/f, 256s/f */
+};
+
+const u32 slhdsa_sig_size[12] = {
+       7856,  17088, 16224, 35664, 29792, 49856,       /* SHAKE */
+       7856,  17088, 16224, 35664, 29792, 49856,       /* SHA2 */
+};
+
+static_assert(ARRAY_SIZE(slhdsa_n) == ARRAY_SIZE(slhdsa_sig_size));
diff --git a/drivers/crypto/cmh/cmh_qse.c b/drivers/crypto/cmh/cmh_qse.c
new file mode 100644
index 000000000000..257dc3ee29a8
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_qse.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- QSE Core VCQ Builders
+ *
+ * VCQ builder functions for ML-KEM and ML-DSA commands (plain and masked).
+ * Each function populates a single vcq_cmd slot.  Callers assemble
+ * complete VCQs with header + command(s) + flush, then submit via
+ * cmh_tm_submit_sync().
+ */
+
+#include <linux/string.h>
+
+#include "cmh_sys.h"
+
+/* -- QSE flush -- */
+
+/**
+ * vcq_add_qse_flush() - Build a QSE flush VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ */
+void vcq_add_qse_flush(struct vcq_cmd *slot, u32 core_id)
+{
+       vcq_add_flush(slot, core_id);
+}
+
+/* -- ML-KEM -- */
+
+/**
+ * vcq_add_qse_ml_kem_keygen() - Build an ML-KEM key generation VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @k: ML-KEM security parameter (k = 2, 3, or 4)
+ * @flags: Command flags
+ * @seed: DMA address of seed input buffer
+ * @z: DMA address of implicit rejection value buffer
+ * @ek: DMA address of encapsulation key output buffer
+ * @dk: DMA address of decapsulation key output buffer
+ * @dk_type: Decapsulation key datastore type
+ * @masked: Use masked (side-channel protected) variant
+ */
+void vcq_add_qse_ml_kem_keygen(struct vcq_cmd *slot, u32 core_id, u32 k, u32 flags,
+                              u64 seed, u64 z, u64 ek, u64 dk, u32 dk_type,
+                              bool masked)
+{
+       u32 cmd_id = masked ? QSE_CMD_ML_KEM_KEYGEN_MASKED
+                           : QSE_CMD_ML_KEM_KEYGEN;
+
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, cmd_id);
+       slot->hwc.qse.cmd_ml_kem_keygen.k = k;
+       slot->hwc.qse.cmd_ml_kem_keygen.flags = flags;
+       slot->hwc.qse.cmd_ml_kem_keygen.seed = seed;
+       slot->hwc.qse.cmd_ml_kem_keygen.z = z;
+       slot->hwc.qse.cmd_ml_kem_keygen.ek = ek;
+       slot->hwc.qse.cmd_ml_kem_keygen.dk = dk;
+       slot->hwc.qse.cmd_ml_kem_keygen.dk_type = dk_type;
+}
+
+/**
+ * vcq_add_qse_ml_kem_enc() - Build an ML-KEM encapsulation VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @k: ML-KEM security parameter (k = 2, 3, or 4)
+ * @flags: Command flags
+ * @coin: DMA address of encapsulation coin/randomness buffer
+ * @ek: DMA address of encapsulation key input buffer
+ * @ct: DMA address of ciphertext output buffer
+ * @ss: DMA address of shared secret output buffer
+ * @ss_type: Shared secret datastore type
+ * @masked: Use masked (side-channel protected) variant
+ */
+void vcq_add_qse_ml_kem_enc(struct vcq_cmd *slot, u32 core_id, u32 k, u32 flags,
+                           u64 coin, u64 ek, u64 ct, u64 ss, u32 ss_type,
+                           bool masked)
+{
+       u32 cmd_id = masked ? QSE_CMD_ML_KEM_ENC_MASKED
+                           : QSE_CMD_ML_KEM_ENC;
+
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, cmd_id);
+       slot->hwc.qse.cmd_ml_kem_enc.k = k;
+       slot->hwc.qse.cmd_ml_kem_enc.flags = flags;
+       slot->hwc.qse.cmd_ml_kem_enc.coin = coin;
+       slot->hwc.qse.cmd_ml_kem_enc.ek = ek;
+       slot->hwc.qse.cmd_ml_kem_enc.ct = ct;
+       slot->hwc.qse.cmd_ml_kem_enc.ss = ss;
+       slot->hwc.qse.cmd_ml_kem_enc.ss_type = ss_type;
+}
+
+/**
+ * vcq_add_qse_ml_kem_dec() - Build an ML-KEM decapsulation VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @k: ML-KEM security parameter (k = 2, 3, or 4)
+ * @flags: Command flags
+ * @ct: DMA address of ciphertext input buffer
+ * @dk: DMA address of decapsulation key input buffer
+ * @ss: DMA address of shared secret output buffer
+ * @ss_type: Shared secret datastore type
+ * @masked: Use masked (side-channel protected) variant
+ */
+void vcq_add_qse_ml_kem_dec(struct vcq_cmd *slot, u32 core_id, u32 k, u32 flags,
+                           u64 ct, u64 dk, u64 ss, u32 ss_type,
+                           bool masked)
+{
+       u32 cmd_id = masked ? QSE_CMD_ML_KEM_DEC_MASKED
+                           : QSE_CMD_ML_KEM_DEC;
+
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, cmd_id);
+       slot->hwc.qse.cmd_ml_kem_dec.k = k;
+       slot->hwc.qse.cmd_ml_kem_dec.flags = flags;
+       slot->hwc.qse.cmd_ml_kem_dec.ct = ct;
+       slot->hwc.qse.cmd_ml_kem_dec.dk = dk;
+       slot->hwc.qse.cmd_ml_kem_dec.ss = ss;
+       slot->hwc.qse.cmd_ml_kem_dec.ss_type = ss_type;
+}
+
+/* -- ML-DSA -- */
+
+/**
+ * vcq_add_qse_ml_dsa_keygen() - Build an ML-DSA key generation VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @mode: ML-DSA mode (44, 65, or 87)
+ * @flags: Command flags
+ * @seed: DMA address of seed input buffer
+ * @pk: DMA address of public key output buffer
+ * @sk: DMA address of secret key output buffer
+ * @sk_type: Secret key datastore type
+ * @masked: Use masked (side-channel protected) variant
+ */
+void vcq_add_qse_ml_dsa_keygen(struct vcq_cmd *slot, u32 core_id, u32 mode, u32 flags,
+                              u64 seed, u64 pk, u64 sk, u32 sk_type,
+                              bool masked)
+{
+       u32 cmd_id = masked ? QSE_CMD_ML_DSA_KEYGEN_MASKED
+                           : QSE_CMD_ML_DSA_KEYGEN;
+
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, cmd_id);
+       slot->hwc.qse.cmd_ml_dsa_keygen.mode = mode;
+       slot->hwc.qse.cmd_ml_dsa_keygen.flags = flags;
+       slot->hwc.qse.cmd_ml_dsa_keygen.seed = seed;
+       slot->hwc.qse.cmd_ml_dsa_keygen.pk = pk;
+       slot->hwc.qse.cmd_ml_dsa_keygen.sk = sk;
+       slot->hwc.qse.cmd_ml_dsa_keygen.sk_type = sk_type;
+}
+
+/**
+ * vcq_add_qse_ml_dsa_sign() - Build an ML-DSA signing VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @mode: ML-DSA mode (44, 65, or 87)
+ * @flags: Command flags
+ * @rnd: DMA address of signing randomness buffer
+ * @m: DMA address of message buffer
+ * @sk: DMA address of secret key buffer
+ * @sig: DMA address of signature output buffer
+ * @mlen: Length of message in bytes
+ * @masked: Use masked (side-channel protected) variant
+ */
+void vcq_add_qse_ml_dsa_sign(struct vcq_cmd *slot, u32 core_id, u32 mode, u32 flags,
+                            u64 rnd, u64 m, u64 sk, u64 sig, u32 mlen,
+                            bool masked)
+{
+       u32 cmd_id = masked ? QSE_CMD_ML_DSA_SIGN_MASKED
+                           : QSE_CMD_ML_DSA_SIGN;
+
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, cmd_id);
+       slot->hwc.qse.cmd_ml_dsa_sign.mode = mode;
+       slot->hwc.qse.cmd_ml_dsa_sign.flags = flags;
+       slot->hwc.qse.cmd_ml_dsa_sign.rnd = rnd;
+       slot->hwc.qse.cmd_ml_dsa_sign.m = m;
+       slot->hwc.qse.cmd_ml_dsa_sign.sk = sk;
+       slot->hwc.qse.cmd_ml_dsa_sign.sig = sig;
+       slot->hwc.qse.cmd_ml_dsa_sign.mlen = mlen;
+}
+
+/**
+ * vcq_add_qse_ml_dsa_verify() - Build an ML-DSA signature verify VCQ command
+ * @slot: VCQ command slot to populate
+ * @core_id: Hardware core ID for dispatch
+ * @mode: ML-DSA mode (44, 65, or 87)
+ * @flags: Command flags
+ * @m: DMA address of message buffer
+ * @pk: DMA address of public key buffer
+ * @sig: DMA address of signature buffer to verify
+ * @mlen: Length of message in bytes
+ */
+void vcq_add_qse_ml_dsa_verify(struct vcq_cmd *slot, u32 core_id, u32 mode, u32 flags,
+                              u64 m, u64 pk, u64 sig, u32 mlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, QSE_CMD_ML_DSA_VERIFY);
+       slot->hwc.qse.cmd_ml_dsa_verify.mode = mode;
+       slot->hwc.qse.cmd_ml_dsa_verify.flags = flags;
+       slot->hwc.qse.cmd_ml_dsa_verify.m = m;
+       slot->hwc.qse.cmd_ml_dsa_verify.pk = pk;
+       slot->hwc.qse.cmd_ml_dsa_verify.sig = sig;
+       slot->hwc.qse.cmd_ml_dsa_verify.mlen = mlen;
+}
--
2.43.7
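
Note for readers, not part of the patch: the length checks that
cmh_mldsa_sign() performs before touching the hardware can be tried out
in user space with the sketch below.  The size values are copied from
cmh_pqc_sizes.c.  The demo_* names are hypothetical, DEMO_MAX_MLEN
assumes that ML_DSA_MAX_MLEN equals the "up to 10240 bytes" mentioned in
the file header, and the mode_idx range check is an addition the driver
does not need because it fixes the index at init time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes in bytes; index 0/1/2 corresponds to ML-DSA-44/65/87. */
static const uint32_t demo_pk_size[3]  = { 1312U, 1952U, 2592U };
static const uint32_t demo_sk_size[3]  = { 2560U, 4032U, 4896U };
static const uint32_t demo_sig_size[3] = { 2420U, 3309U, 4627U };

static_assert(sizeof(demo_pk_size) == sizeof(demo_sk_size),
	      "pk and sk tables must have the same number of entries");
static_assert(sizeof(demo_pk_size) == sizeof(demo_sig_size),
	      "pk and sig tables must have the same number of entries");

/* Assumption: taken from the "up to 10240 bytes" note in the file header. */
#define DEMO_MAX_MLEN 10240U

/*
 * Hypothetical helper that mirrors the order of the length checks in
 * cmh_mldsa_sign().  Returns the signature size on success or a
 * negative errno value, the same convention the driver callback uses.
 */
int demo_mldsa_sign_check(int mode_idx, size_t sk_len, size_t msg_len,
			  size_t out_len)
{
	if (mode_idx < 0 || mode_idx > 2)
		return -EINVAL;	/* extra guard, not present in the driver */
	if (out_len < demo_sig_size[mode_idx])
		return -EINVAL;	/* output buffer too small for the signature */
	if (msg_len == 0 || msg_len > DEMO_MAX_MLEN)
		return -EINVAL;	/* empty or oversized message */
	if (sk_len != demo_sk_size[mode_idx])
		return -EINVAL;	/* private key length does not match the mode */
	return (int)demo_sig_size[mode_idx];
}

/* Public key size in bits, the unit cmh_mldsa_key_size() reports. */
unsigned int demo_mldsa_key_bits(int mode_idx)
{
	return demo_pk_size[mode_idx] * 8U;
}
```

For ML-DSA-65 (index 1) with a 4032-byte private key, a 32-byte message
and a 3309-byte output buffer, the helper returns 3309.  Shrinking the
output buffer or passing an empty message yields -EINVAL.
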




* [PATCH 17/19] Documentation: ioctl: add CMH ioctl documentation and register 'J'
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Add Documentation/userspace-api/ioctl/cmh_mgmt.rst documenting the
ioctl commands on the /dev/cmh_mgmt misc device for the CRI
CryptoManager Hub (CMH) hardware crypto accelerator driver.  Covers
key management, KIC key derivation, PKE (RSA, ECDSA, ECDH, EdDSA),
PQC (ML-KEM, ML-DSA, SLH-DSA), SM2, EAC, and DRBG.

Register ioctl magic number 'J' (0x4A) in ioctl-number.rst.  The
driver uses ioctls 0x01-0x40.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
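
Note for readers, not part of the commit: the sketch below shows how an
ioctl number carrying the 'J' type letter is composed and decoded with
the standard _IOWR/_IOC_TYPE/_IOC_NR macros, and how the key flag bits
documented by this patch combine into a mask.  Everything prefixed
demo_/DEMO_ is hypothetical: the real structures and command macros
live in <uapi/linux/cmh_mgmt_ioctl.h>, which is not reproduced here, and
the flag macros assume that "bit 16" means (1U << 16).

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

#define DEMO_CMH_IOC_MAGIC 'J'	/* 0x4A, the letter registered by this patch */

static_assert(DEMO_CMH_IOC_MAGIC == 0x4A, "'J' is 0x4A in ASCII");

/*
 * Hypothetical argument layout.  The documentation only states that
 * every argument structure carries a version field set to 1.
 */
struct demo_cmh_arg {
	uint32_t version;	/* user space sets this to 1 (CMH_MGMT_V1) */
	uint32_t flags;
};

/* Hypothetical command at the low end of the documented 0x01-0x40 range. */
#define DEMO_CMH_IOC_FIRST _IOWR(DEMO_CMH_IOC_MAGIC, 0x01, struct demo_cmh_arg)

/* Assumed encoding of the flag bit positions from the Key Flags table. */
#define DEMO_FLAG_PT	(1U << 16)
#define DEMO_FLAG_XC	(1U << 17)
#define DEMO_FLAG_SCA	(1U << 18)

/* Non-zero if cmd has type 'J' and a sequence number within 0x01-0x40. */
int demo_cmh_cmd_in_range(unsigned int cmd)
{
	return _IOC_TYPE(cmd) == DEMO_CMH_IOC_MAGIC &&
	       _IOC_NR(cmd) >= 0x01 && _IOC_NR(cmd) <= 0x40;
}
```

A command built with a different type letter, or with a sequence number
above 0x40, falls outside the range this driver claims in
ioctl-number.rst.
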
 .../userspace-api/ioctl/cmh_mgmt.rst          | 941 ++++++++++++++++++
 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 2 files changed, 942 insertions(+)
 create mode 100644 Documentation/userspace-api/ioctl/cmh_mgmt.rst

diff --git a/Documentation/userspace-api/ioctl/cmh_mgmt.rst b/Documentation/userspace-api/ioctl/cmh_mgmt.rst
new file mode 100644
index 000000000000..b0968ba6b153
--- /dev/null
+++ b/Documentation/userspace-api/ioctl/cmh_mgmt.rst
@@ -0,0 +1,941 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============================================
+CMH Key Management ioctl Interface (cmh_mgmt)
+=============================================
+
+:Author: Cryptography Research, Inc. (CRI)
+:Maintainer: linux-crypto@vger.kernel.org
+
+Introduction
+============
+
+The ``/dev/cmh_mgmt`` character device provides user-space access to key
+management, key derivation, public-key, and post-quantum cryptographic
+operations on the CryptoManager Hub (CMH) hardware accelerator.
+
+The device is created by the ``cmh`` kernel module as a ``misc_device``.
+All operations are synchronous -- the ioctl blocks until the hardware
+completes.  Opening the device requires ``CAP_SYS_ADMIN``.
+
+All ioctl argument structures are versioned: user space sets the
+``version`` field to ``CMH_MGMT_V1`` (currently 1).  This allows the
+driver to extend structures in the future without breaking the ABI.
+
+Data types and ioctl numbers are defined in
+``<uapi/linux/cmh_mgmt_ioctl.h>``.  The ioctl type letter is ``'J'``
+(0x4A).
+
+Error Handling
+==============
+
+Unless otherwise noted, all ioctls return 0 on success.  On failure they
+return -1 and set ``errno``.  Common error codes:
+
+========== =============================================================
+``EINVAL`` Invalid ``version`` field, unsupported parameter, or
+           out-of-range length.
+``EFAULT`` Failed to copy data to/from user space.
+``ENOMEM`` Kernel memory allocation failed.
+``EIO``    Hardware returned an error (eSW command failure).
+``ENOENT`` Key not found (``KEY_FIND``, ``KEY_LIST``).
+========== =============================================================
+
+Datastore Concepts
+==================
+
+The CMH hardware maintains an embedded datastore managed by the eSW
+firmware.  Objects in the datastore are identified by a 64-bit reference
+(``ref``) and optionally by a 64-bit Content ID (``cid``).
+
+Two storage classes exist:
+
+**Temporary (SYS_REF_TEMP)**
+  Lifetime is scoped to a single mailbox slot.  The eSW firmware
+  reclaims the object when the slot is reused.  Used for raw-key
+  provisioning via ``KEY_NEW`` + ``KEY_WRITE``.
+
+**Persistent (SYS_REF_PERSIST)**
+  Survives across mailbox slots.  Requires explicit deletion via
+  ``KEY_DELETE``.  Identified by CID; resolved to a per-mailbox ref
+  via ``KEY_FIND``.
+
+Mailbox Dispatch
+================
+
+All ``/dev/cmh_mgmt`` ioctls are submitted on a single management
+mailbox.  This is a structural requirement of the eSW datastore model,
+not a tunable:
+
+* Datastore access control is **per-mailbox**.  ``KEY_NEW`` grants the
+  creating mailbox read/write/execute access; other mailboxes have none
+  until granted.  The returned 64-bit ``ref`` encodes a randomised
+  offset and does **not** carry the owning mailbox, so an operation that
+  receives only a ``ref`` (``KEY_GRANT``, ``KEY_READ``, ``KEY_DELETE``,
+  ``DS_EXPORT``) cannot itself determine which mailbox owns the object.
+  Using one fixed management mailbox guarantees that a key's create,
+  modify, grant, read and hardware-held-key compute steps all share the
+  mailbox that holds its access rights, without exposing mailbox
+  identity in the UABI.  User space may still widen a key's access to
+  additional mailboxes via ``KEY_GRANT``.
+
+* The eSW ``SYS_REF_TEMP`` scratch store is per-mailbox and persists
+  across ioctl calls, so a multi-step flow that derives into
+  ``SYS_REF_TEMP`` (for example a ``KIC_*`` derivation) and later
+  consumes it (``DS_EXPORT`` with ``wrap_key = SYS_REF_TEMP``) requires
+  both calls to use the same mailbox.
+
+Per-core ``cri,mbx`` device-tree affinity applies to the *stateless*
+in-kernel crypto API path, which carries no datastore state between
+requests and is balanced across mailboxes by the driver.
+
+Key Types
+=========
+
+The ``ds_type`` field in ``KEY_NEW`` and ``KEY_WRITE`` selects the
+datastore object type.  Values are defined as ``CMH_DS_*`` constants:
+
+=================================  =====  ==============================
+Constant                           Value  Description
+=================================  =====  ==============================
+``CMH_DS_RAW_VALUE``               1      Raw byte array
+``CMH_DS_AES_KEY``                 2      AES key (128/192/256-bit)
+``CMH_DS_AES_XTS_KEY``             3      AES-XTS key (256/512-bit)
+``CMH_DS_HMAC_KEY``                4      HMAC key
+``CMH_DS_KMAC_KEY``                5      KMAC key
+``CMH_DS_SM4_KEY``                 6      SM4 key (128-bit)
+``CMH_DS_CHACHA20_KEY``            7      ChaCha20 key (256-bit)
+``CMH_DS_RSA_PRIV_KEY``            10     RSA private key
+``CMH_DS_RSA_PUB_KEY``             11     RSA public key
+``CMH_DS_RSA_CRT_KEY``             12     RSA CRT private key
+``CMH_DS_ECDSA_PRIV_KEY``          13     ECDSA private key
+``CMH_DS_ECDSA_PUB_KEY``           14     ECDSA public key
+``CMH_DS_ECDH_PRIV_KEY``           15     ECDH private key
+``CMH_DS_EDDSA_PRIV_KEY``          16     EdDSA private key
+``CMH_DS_SHARED_SECRET``           17     Shared secret
+``CMH_DS_SM2_PRIV_KEY``            18     SM2 private key
+``CMH_DS_ML_KEM_DK``               20     ML-KEM decapsulation key
+``CMH_DS_ML_DSA_SK``               21     ML-DSA secret key
+``CMH_DS_SLHDSA_SK``               25     SLH-DSA secret key
+=================================  =====  ==============================
+
+Key Flags
+=========
+
+The ``flags`` field in ``KEY_NEW`` and ``KEY_WRITE`` is a bitmask:
+
+==================  ===========  ========================================
+Flag                Bit          Description
+==================  ===========  ========================================
+``CMH_FLAG_PT``     16           Key can be read as plaintext
+``CMH_FLAG_XC``     17           Key can be exported over XC bus
+``CMH_FLAG_SCA``    18           SCA key stored in 2 shares
+==================  ===========  ========================================
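+
+As a minimal sketch, assuming each ``Bit`` entry above is a bit
+position in the 32-bit ``flags`` word (so ``CMH_FLAG_PT`` would be
+``1 << 16``), a plaintext-readable, XC-exportable key could be requested
+as shown.  The ``EX_`` names are local stand-ins for illustration; real
+code should take the constants from the UAPI header.
+
+::
+
+  #include <stdint.h>
+
+  /* Illustrative stand-ins; assumes "Bit" is a bit position. */
+  #define EX_FLAG_PT   (UINT32_C(1) << 16)  /* plaintext read allowed */
+  #define EX_FLAG_XC   (UINT32_C(1) << 17)  /* export over XC bus */
+  #define EX_FLAG_SCA  (UINT32_C(1) << 18)  /* stored in 2 shares */
+
+  /* Flags word for a plaintext-readable, XC-exportable key. */
+  static inline uint32_t ex_flags_pt_xc(void)
+  {
+      return EX_FLAG_PT | EX_FLAG_XC;
+  }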
+
+Elliptic Curve IDs
+==================
+
+Curve identifiers for PKE operations (``curve`` field):
+
+==========================  =====
+Constant                    Value
+==========================  =====
+``CMH_CURVE_P192``          0x01
+``CMH_CURVE_P224``          0x02
+``CMH_CURVE_P256``          0x03
+``CMH_CURVE_P384``          0x04
+``CMH_CURVE_P521``          0x05
+``CMH_CURVE_SECP256K1``     0x07
+``CMH_CURVE_BP192R1``       0x11
+``CMH_CURVE_BP224R1``       0x12
+``CMH_CURVE_BP256R1``       0x13
+``CMH_CURVE_BP320R1``       0x14
+``CMH_CURVE_BP384R1``       0x15
+``CMH_CURVE_BP512R1``       0x16
+``CMH_CURVE_SM2``           0x18
+``CMH_CURVE_25519``         0x21
+``CMH_CURVE_448``           0x22
+==========================  =====
+
+Key Management ioctls
+=====================
+
+CMH_IOCTL_KEY_NEW
+-----------------
+
+Create a new empty datastore object.
+
+:Direction: ``_IOWR``
+:Number: 0x01
+:Argument: ``struct cmh_ioctl_key_new``
+
+::
+
+  struct cmh_ioctl_key_new {
+      __u32 version;     /* must be CMH_MGMT_V1 */
+      __u32 ds_type;     /* CMH_DS_* key type */
+      __u32 len;         /* key length in bytes */
+      __u32 flags;       /* CMH_FLAG_* */
+      __u64 cid;         /* caller ID (name) for the key */
+      __u64 ref;         /* [out] key reference */
+  };
+
+The returned ``ref`` is used in subsequent ``KEY_WRITE``, ``KEY_READ``,
+and crypto operation ioctls.
+
+CMH_IOCTL_KEY_NEW_RANDOM
+------------------------
+
+Create a new datastore object filled with hardware-generated random data.
+
+:Direction: ``_IOWR``
+:Number: 0x0B
+:Argument: ``struct cmh_ioctl_key_new``
+
+Same structure as ``KEY_NEW``.  The hardware DRBG fills the object with
+``len`` random bytes.
+
+CMH_IOCTL_KEY_WRITE
+-------------------
+
+Write key material into a previously created datastore object.
+
+:Direction: ``_IOW``
+:Number: 0x02
+:Argument: ``struct cmh_ioctl_key_write``
+
+::
+
+  struct cmh_ioctl_key_write {
+      __u32 version;
+      __u32 len;         /* key data length */
+      __u32 ds_type;     /* CMH_DS_* key type */
+      __u32 flags;       /* CMH_FLAG_* */
+      __u64 ref;         /* key reference from KEY_NEW */
+      __u64 wrap_key;    /* wrapping key ref (CMH_REF_NONE = plaintext) */
+      __u64 data;        /* user-space pointer to key material */
+  };
+
+If ``wrap_key`` is ``CMH_REF_NONE`` (0), key material is written in
+plaintext.  Otherwise, the data is unwrapped using the specified
+wrapping key.
+
+CMH_IOCTL_KEY_READ
+------------------
+
+Read key material from a datastore object.
+
+:Direction: ``_IOWR``
+:Number: 0x03
+:Argument: ``struct cmh_ioctl_key_read``
+
+::
+
+  struct cmh_ioctl_key_read {
+      __u32 version;
+      __u32 len;         /* buffer length */
+      __u64 ref;         /* key reference */
+      __u64 wrap_key;    /* wrapping key ref (CMH_REF_NONE = plaintext) */
+      __u64 data;        /* user-space pointer to output buffer */
+      __u32 out_len;     /* [out] actual bytes written */
+      __u32 __reserved;
+  };
+
+Plaintext reads require the ``CMH_FLAG_PT`` attribute on the key.
+The eSW prepends a 16-byte header (``CMH_SYS_WRAP_HDR_SIZE``) even
+for plaintext reads; the output buffer must accommodate this.  The
+output overhead is ``CMH_DS_EXPORT_OVERHEAD_PLAIN`` (16 bytes) for
+plaintext reads and ``CMH_DS_EXPORT_OVERHEAD_WRAPPED`` (48 bytes:
+16-byte header + 16-byte nonce + 16-byte tag) for wrapped reads.
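+
+As a worked example of the sizes above, a minimal sketch of sizing the
+``KEY_READ`` output buffer follows.  It assumes the output is exactly
+the key length plus the stated overhead, with no additional padding;
+the ``EX_`` names are local stand-ins for the UAPI macros.
+
+::
+
+  #include <stddef.h>
+
+  /* Values taken from the text above (assumed additive overhead). */
+  #define EX_OVERHEAD_PLAIN    16u  /* 16-byte header */
+  #define EX_OVERHEAD_WRAPPED  48u  /* header + nonce + tag */
+
+  /* Minimum KEY_READ output buffer for a key of key_len bytes. */
+  static inline size_t ex_key_read_buf_len(size_t key_len, int wrapped)
+  {
+      return key_len +
+             (wrapped ? EX_OVERHEAD_WRAPPED : EX_OVERHEAD_PLAIN);
+  }
+
+Under these assumptions a 32-byte key needs a 48-byte buffer for a
+plaintext read and an 80-byte buffer for a wrapped read.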
+
+CMH_IOCTL_KEY_FIND
+------------------
+
+Resolve a caller ID (CID) to a datastore reference.
+
+:Direction: ``_IOWR``
+:Number: 0x04
+:Argument: ``struct cmh_ioctl_key_find``
+
+::
+
+  struct cmh_ioctl_key_find {
+      __u32 version;
+      __u32 __reserved;
+      __u64 cid;         /* caller ID to search for */
+      __u64 ref;         /* [out] resolved key reference */
+      __u32 len;         /* [out] key length */
+      __u32 type;        /* [out] key type */
+  };
+
+Returns ``-ENOENT`` if no object with the given CID exists.
+
+CMH_IOCTL_KEY_LIST
+------------------
+
+Iterate datastore objects.
+
+:Direction: ``_IOWR``
+:Number: 0x0E
+:Argument: ``struct cmh_ioctl_key_list``
+
+::
+
+  struct cmh_ioctl_key_list {
+      __u32 version;
+      __u32 __reserved;
+      __u64 start_ref;   /* starting DS reference (0 = first) */
+      __u64 ref;         /* [out] object reference */
+      __u64 cid;         /* [out] caller ID */
+      __u32 len;         /* [out] object length */
+      __u32 type;        /* [out] object type */
+  };
+
+Pass ``start_ref=0`` to begin from the first object.  On return, pass
+the returned ``ref`` as ``start_ref`` in the next call.  Iteration ends
+when ``ref == 0``.
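+
+The iteration protocol above can be sketched as follows.  The struct is
+a local mirror of the layout shown above, and the command number assumes
+the ``'J'`` ioctl magic with ``_IOWR`` and sequence 0x0E; real code
+should include the UAPI header instead of redefining either.
+
+::
+
+  #include <string.h>
+  #include <sys/ioctl.h>
+  #include <linux/types.h>
+
+  /* Local mirror of the documented layout, for illustration only. */
+  struct ex_key_list {
+      __u32 version;
+      __u32 __reserved;
+      __u64 start_ref;
+      __u64 ref;
+      __u64 cid;
+      __u32 len;
+      __u32 type;
+  };
+
+  #define EX_MGMT_V1   1u
+  #define EX_KEY_LIST  _IOWR('J', 0x0E, struct ex_key_list)
+
+  /* Count datastore objects; returns -1 if an ioctl fails. */
+  static int ex_count_keys(int fd)
+  {
+      struct ex_key_list req;
+      __u64 next = 0;            /* 0 starts from the first object */
+      int count = 0;
+
+      for (;;) {
+          memset(&req, 0, sizeof(req));
+          req.version = EX_MGMT_V1;
+          req.start_ref = next;
+          if (ioctl(fd, EX_KEY_LIST, &req) < 0)
+              return -1;
+          if (req.ref == 0)      /* end of iteration */
+              break;
+          count++;
+          next = req.ref;        /* resume after this object */
+      }
+      return count;
+  }
+
+Zeroing the request on every pass keeps ``__reserved`` and the output
+fields clean between calls.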
+
+CMH_IOCTL_KEY_GRANT
+-------------------
+
+Set per-mailbox access permissions on a datastore object.
+
+:Direction: ``_IOW``
+:Number: 0x05
+:Argument: ``struct cmh_ioctl_key_grant``
+
+::
+
+  struct cmh_ioctl_key_grant {
+      __u32 version;
+      __u32 __reserved;
+      __u64 ref;         /* key reference */
+      __u64 read;        /* per-MBX read permission bitfield */
+      __u64 write;       /* per-MBX write permission bitfield */
+      __u64 execute;     /* per-MBX execute permission bitfield */
+  };
+
+CMH_IOCTL_KEY_DELETE
+--------------------
+
+Delete a datastore object (persistent keys only).
+
+:Direction: ``_IOW``
+:Number: 0x06
+:Argument: ``struct cmh_ioctl_key_grant``
+
+Uses the same structure as ``KEY_GRANT``; only the ``ref`` field is
+used.
+
+Datastore Export/Import ioctls
+==============================
+
+CMH_IOCTL_DS_EXPORT
+-------------------
+
+Export the entire datastore as an encrypted blob.
+
+:Direction: ``_IOWR``
+:Number: 0x07
+:Argument: ``struct cmh_ioctl_ds_export``
+
+::
+
+  struct cmh_ioctl_ds_export {
+      __u32 version;
+      __u32 len;         /* buffer length */
+      __u64 cid;         /* caller ID for response tagging */
+      __u64 wrap_key;    /* wrapping key ref (CMH_REF_NONE = plaintext) */
+      __u64 data;        /* user-space pointer to output buffer */
+      __u32 out_len;     /* [out] actual bytes written */
+      __u32 __reserved;
+  };
+
+CMH_IOCTL_DS_IMPORT
+-------------------
+
+Import a previously exported datastore blob.
+
+:Direction: ``_IOW``
+:Number: 0x08
+:Argument: ``struct cmh_ioctl_ds_import``
+
+::
+
+  struct cmh_ioctl_ds_import {
+      __u32 version;
+      __u32 len;         /* blob length */
+      __u64 wrap_key;    /* wrapping key ref (CMH_REF_NONE = plaintext) */
+      __u64 data;        /* user-space pointer to import blob */
+  };
+
+Key Derivation ioctls (KIC)
+===========================
+
+The Key Initialization Core (KIC) provides hardware key derivation from
+OTP-provisioned base keys.  Up to 8 base keys are available
+(``CMH_KIC_KEY1`` through ``CMH_KIC_KEY8``).
+
+CMH_IOCTL_KIC_HKDF1
+--------------------
+
+HKDF-based key derivation (single-step, label only).
+
+:Direction: ``_IOWR``
+:Number: 0x09
+:Argument: ``struct cmh_ioctl_kic_hkdf1``
+
+::
+
+  struct cmh_ioctl_kic_hkdf1 {
+      __u32 version;
+      __u32 key_len;     /* output key length */
+      __u64 base_key;    /* KIC base key reference */
+      __u64 cid;         /* CID for the new DS entry */
+      __u64 label;       /* user-space pointer to label data */
+      __u32 label_len;   /* label length in bytes */
+      __u32 flags;       /* CMH_KIC_FLAG_* */
+      __u64 ref;         /* [out] derived key reference */
+  };
+
+If ``CMH_KIC_FLAG_TEMP`` is set, the result is stored in the temporary
+datastore (not persistent).
+
+CMH_IOCTL_KIC_HKDF2
+--------------------
+
+HKDF-based key derivation (two-step, with salt key).
+
+:Direction: ``_IOWR``
+:Number: 0x0A
+:Argument: ``struct cmh_ioctl_kic_hkdf2``
+
+::
+
+  struct cmh_ioctl_kic_hkdf2 {
+      __u32 version;
+      __u32 key_len;
+      __u64 base_key;
+      __u64 salt_key;    /* salt key reference (CMH_REF_NONE = no salt) */
+      __u64 cid;
+      __u64 label;
+      __u32 label_len;
+      __u32 flags;
+      __u64 ref;         /* [out] derived key reference */
+  };
+
+CMH_IOCTL_KIC_AES_CMAC_KDF
+---------------------------
+
+AES-CMAC-based key derivation (NIST SP 800-108).
+
+:Direction: ``_IOWR``
+:Number: 0x0C
+:Argument: ``struct cmh_ioctl_kic_aes_cmac_kdf``
+
+::
+
+  struct cmh_ioctl_kic_aes_cmac_kdf {
+      __u32 version;
+      __u32 key_len;     /* base & output key length (must be 32) */
+      __u64 base_key;
+      __u64 cid;
+      __u64 label;
+      __u32 label_len;
+      __u32 flags;
+      __u64 ref;         /* [out] derived key reference */
+  };
+
+CMH_IOCTL_KIC_DKEK_DERIVE
+--------------------------
+
+Derive a Device Key Encryption Key (DKEK) for secure key export.
+
+:Direction: ``_IOWR``
+:Number: 0x0D
+:Argument: ``struct cmh_ioctl_kic_dkek_derive``
+
+::
+
+  struct cmh_ioctl_kic_dkek_derive {
+      __u32 version;
+      __u32 host_id;     /* target host ID (0 = caller's own) */
+      __u64 base_key;
+      __u64 cid;
+      __u64 metadata;    /* user-space pointer to metadata */
+      __u32 metadata_len;
+      __u32 flags;
+      __u64 ref;         /* [out] derived KEK reference */
+  };
+
+PKE (Public Key Engine) ioctls
+==============================
+
+RSA Operations
+--------------
+
+CMH_IOCTL_PKE_RSA_ENC
+~~~~~~~~~~~~~~~~~~~~~~
+
+RSA public-key encryption.
+
+:Direction: ``_IOWR``
+:Number: 0x10
+:Argument: ``struct cmh_ioctl_pke_rsa_enc``
+
+The public key (e, n) is passed as raw user-space buffers.
+
+CMH_IOCTL_PKE_RSA_DEC
+~~~~~~~~~~~~~~~~~~~~~~
+
+RSA private-key decryption using a datastore key reference.
+
+:Direction: ``_IOWR``
+:Number: 0x11
+:Argument: ``struct cmh_ioctl_pke_rsa_dec``
+
+CMH_IOCTL_PKE_RSA_CRT_DEC
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+RSA CRT private-key decryption (faster, uses CRT key format).
+
+:Direction: ``_IOWR``
+:Number: 0x12
+:Argument: ``struct cmh_ioctl_pke_rsa_crt_dec``
+
+CMH_IOCTL_PKE_RSA_KEYGEN
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Generate an RSA key pair in hardware.
+
+:Direction: ``_IOWR``
+:Number: 0x13
+:Argument: ``struct cmh_ioctl_pke_rsa_keygen``
+
+Returns the private key and, optionally, the CRT key as datastore
+references.  The modulus is written back to user space.
+
+ECDSA Operations
+----------------
+
+CMH_IOCTL_PKE_ECDSA_SIGN
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ECDSA signature generation using a datastore private key.
+
+:Direction: ``_IOWR``
+:Number: 0x14
+:Argument: ``struct cmh_ioctl_pke_ecdsa_sign``
+
+CMH_IOCTL_PKE_ECDH
+~~~~~~~~~~~~~~~~~~~
+
+Compute ECDH shared secret from a peer public key and a datastore
+private key.
+
+:Direction: ``_IOWR``
+:Number: 0x16
+:Argument: ``struct cmh_ioctl_pke_ecdh``
+
+If ``CMH_PKE_FLAG_DS_RESULT`` is set, the shared secret is stored in
+the datastore and a reference is returned instead of raw bytes.
+
+CMH_IOCTL_PKE_ECDH_KEYGEN
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Derive a public key from a datastore private key.
+
+:Direction: ``_IOWR``
+:Number: 0x17
+:Argument: ``struct cmh_ioctl_pke_ecdh_keygen``
+
+EdDSA Operations
+----------------
+
+CMH_IOCTL_PKE_EDDSA_SIGN
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+EdDSA (Ed25519/Ed448) signature generation.
+
+:Direction: ``_IOWR``
+:Number: 0x18
+:Argument: ``struct cmh_ioctl_pke_eddsa_sign``
+
+Note: the ``digest`` field is the full message (pure EdDSA), not a
+pre-computed hash.
+
+CMH_IOCTL_PKE_EDDSA_VERIFY
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+EdDSA signature verification.
+
+:Direction: ``_IOW``
+:Number: 0x19
+:Argument: ``struct cmh_ioctl_pke_eddsa_verify``
+
+EC Key Management
+-----------------
+
+CMH_IOCTL_PKE_EC_KEYGEN
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Generate an EC private key in the hardware datastore.
+
+:Direction: ``_IOWR``
+:Number: 0x1A
+:Argument: ``struct cmh_ioctl_pke_ec_keygen``
+
+CMH_IOCTL_PKE_EC_PUBGEN
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Derive the public key from a datastore private key.
+
+:Direction: ``_IOWR``
+:Number: 0x1B
+:Argument: ``struct cmh_ioctl_pke_ec_pubgen``
+
+CMH_IOCTL_PKE_EDDSA_KEYGEN_SCA
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Generate a 2-share SCA-protected Ed448 private key.
+
+:Direction: ``_IOWR``
+:Number: 0x1C
+:Argument: ``struct cmh_ioctl_pke_eddsa_keygen_sca``
+
+Post-Quantum Cryptography (PQC) ioctls
+=======================================
+
+PQC operations support the following flags in the ``flags`` field:
+
+============================  ====  ====================================
+Flag                          Bit   Description
+============================  ====  ====================================
+``CMH_QSE_FLAG_MASKED``       0     Use masked (SCA-resistant) HW path
+``CMH_QSE_FLAG_DS_REF``       1     Store key output in DS, return ref
+``CMH_QSE_FLAG_HW_RNG``       2     Use HW RNG for seed/randomness
+============================  ====  ====================================
+
+ML-KEM (FIPS 203)
+-----------------
+
+CMH_IOCTL_ML_KEM_KEYGEN
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Generate an ML-KEM key pair.
+
+:Direction: ``_IOWR``
+:Number: 0x20
+:Argument: ``struct cmh_ioctl_ml_kem_keygen``
+
+Security parameter ``k`` selects the strength: 2 (ML-KEM-512),
+3 (ML-KEM-768), or 4 (ML-KEM-1024).
+
+CMH_IOCTL_ML_KEM_ENC
+~~~~~~~~~~~~~~~~~~~~~
+
+ML-KEM encapsulation.  Produces ciphertext and shared secret.
+
+:Direction: ``_IOWR``
+:Number: 0x21
+:Argument: ``struct cmh_ioctl_ml_kem_enc``
+
+CMH_IOCTL_ML_KEM_DEC
+~~~~~~~~~~~~~~~~~~~~~
+
+ML-KEM decapsulation.  Recovers shared secret from ciphertext.
+
+:Direction: ``_IOWR``
+:Number: 0x22
+:Argument: ``struct cmh_ioctl_ml_kem_dec``
+
+ML-DSA (FIPS 204)
+-----------------
+
+CMH_IOCTL_ML_DSA_KEYGEN
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Generate an ML-DSA key pair.
+
+:Direction: ``_IOWR``
+:Number: 0x23
+:Argument: ``struct cmh_ioctl_ml_dsa_keygen``
+
+Security parameter ``mode`` selects the strength: 2 (ML-DSA-44),
+3 (ML-DSA-65), or 5 (ML-DSA-87).
+
+.. note::
+
+   When ``CMH_QSE_FLAG_DS_REF`` keeps the secret key in the datastore,
+   the public key returned in ``pk`` is the only copy: there is no
+   operation to derive the public key from the secret-key reference
+   for ML-DSA.  User space must persist ``pk`` at keygen time.
+
+CMH_IOCTL_ML_DSA_SIGN
+~~~~~~~~~~~~~~~~~~~~~~
+
+ML-DSA signature generation.
+
+:Direction: ``_IOWR``
+:Number: 0x24
+:Argument: ``struct cmh_ioctl_ml_dsa_sign``
+
+If ``mlen`` is set to ``CMH_ML_DSA_MLEN_EXTERNAL_MU`` (0xFFFFFFFF),
+the ``m`` pointer is interpreted as a 64-byte pre-hashed mu value
+(ExternalMu mode).
+
+SLH-DSA (FIPS 205)
+------------------
+
+CMH_IOCTL_SLHDSA_KEYGEN
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Generate an SLH-DSA key pair.
+
+:Direction: ``_IOWR``
+:Number: 0x28
+:Argument: ``struct cmh_ioctl_slhdsa_keygen``
+
+.. note::
+
+   When ``CMH_QSE_FLAG_DS_REF`` keeps the secret key in the datastore,
+   the public key returned in ``pk`` is the only copy: there is no
+   operation to derive the public key from the secret-key reference
+   for SLH-DSA.  User space must persist ``pk`` at keygen time.
+
+CMH_IOCTL_SLHDSA_SIGN
+~~~~~~~~~~~~~~~~~~~~~~
+
+SLH-DSA signature generation (pure mode).
+
+:Direction: ``_IOWR``
+:Number: 0x29
+:Argument: ``struct cmh_ioctl_slhdsa_sign``
+
+CMH_IOCTL_SLHDSA_SIGN_PREHASH
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+SLH-DSA pre-hash signature generation.
+
+:Direction: ``_IOWR``
+:Number: 0x2D
+:Argument: ``struct cmh_ioctl_slhdsa_sign_prehash``
+
+The ``prehash_algo`` field selects the hash algorithm
+(``CMH_SLHDSA_PREHASH_SHA256``, etc.).
+
+SM2 Encryption and Decryption
+-----------------------------
+
+CMH_IOCTL_SM2_ENC_POINT
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+:Direction: ``_IOWR``
+:Number: 0x33
+:Argument: ``struct cmh_ioctl_sm2_enc_point``
+
+CMH_IOCTL_SM2_ENC_HASH
+~~~~~~~~~~~~~~~~~~~~~~~
+
+:Direction: ``_IOWR``
+:Number: 0x37
+:Argument: ``struct cmh_ioctl_sm2_enc_hash``
+
+CMH_IOCTL_SM2_DEC_POINT
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+:Direction: ``_IOWR``
+:Number: 0x32
+:Argument: ``struct cmh_ioctl_sm2_dec_point``
+
+CMH_IOCTL_SM2_DEC_HASH
+~~~~~~~~~~~~~~~~~~~~~~~
+
+:Direction: ``_IOWR``
+:Number: 0x36
+:Argument: ``struct cmh_ioctl_sm2_dec_hash``
+
+SM2 Key Exchange (GM/T 0003.3)
+------------------------------
+
+The key exchange protocol is a multi-step flow:
+
+1. ``EC_KEYGEN(CMH_CURVE_SM2)`` -- generate a long-lived private key.
+2. ``EC_PUBGEN`` -- derive the public key.
+3. ``SM2_ID_DIGEST`` -- compute the SM3 identity digest (ZA).
+4. ``SM2_ECDH_KEYGEN`` -- generate an ephemeral session key.
+5. Exchange session keys with the peer.
+6. ``SM2_ECDH`` -- compute the shared point.
+7. ``SM2_ECDH_HASH`` -- derive the shared key from the shared point
+   and both parties' ZA digests.
+
+CMH_IOCTL_SM2_ECDH_KEYGEN
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+:Direction: ``_IOWR``
+:Number: 0x30
+:Argument: ``struct cmh_ioctl_sm2_ecdh_keygen``
+
+``nonce_len`` must be 0 or 32.  If ``nonce_len=0``, the hardware
+generates the ephemeral scalar and writes it back to the ``nonce``
+buffer.
+
+CMH_IOCTL_SM2_ECDH
+~~~~~~~~~~~~~~~~~~~
+
+:Direction: ``_IOWR``
+:Number: 0x31
+:Argument: ``struct cmh_ioctl_sm2_ecdh``
+
+If ``shared_point_ref`` points to a non-zero value, the shared point
+is kept in the datastore for use by ``SM2_ECDH_HASH``.
+
+CMH_IOCTL_SM2_ID_DIGEST
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Compute the SM3 identity digest (ZA) for a public key and identity
+string.
+
+:Direction: ``_IOWR``
+:Number: 0x34
+:Argument: ``struct cmh_ioctl_sm2_id_digest``
+
+CMH_IOCTL_SM2_ECDH_HASH
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Derive the shared key from the shared point and ZA digests.
+
+:Direction: ``_IOWR``
+:Number: 0x35
+:Argument: ``struct cmh_ioctl_sm2_ecdh_hash``
+
+.. important::
+
+   The digest fields use **absolute** ordering per GM/T 0003.3, not
+   relative own/peer ordering.  Both parties must pass:
+
+   - ``peer_id_digest`` = Z_A (initiator's digest) -- hashed first
+   - ``id_digest`` = Z_B (responder's digest) -- hashed second
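+
+The absolute ordering can be sketched as a small helper that maps a
+party's own and peer digests onto the two slots.  The struct and
+function below are hypothetical and exist only for illustration; only
+the field names ``peer_id_digest`` and ``id_digest`` come from the
+text above, and their pointer types are an assumption.
+
+::
+
+  /* Digest slots as named above; pointer types are assumed. */
+  struct ex_sm2_digests {
+      const unsigned char *peer_id_digest;  /* always Z_A */
+      const unsigned char *id_digest;       /* always Z_B */
+  };
+
+  /* Map own/peer digests to the absolute Z_A/Z_B slots. */
+  static struct ex_sm2_digests
+  ex_sm2_order(const unsigned char *own_z, const unsigned char *peer_z,
+               int is_initiator)
+  {
+      struct ex_sm2_digests d;
+
+      d.peer_id_digest = is_initiator ? own_z : peer_z;
+      d.id_digest = is_initiator ? peer_z : own_z;
+      return d;
+  }
+
+With this mapping the initiator places its own digest in
+``peer_id_digest``, which is the counter-intuitive case the note above
+warns about.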
+
+Hardware Management ioctls
+==========================
+
+CMH_IOCTL_EAC_READ
+-------------------
+
+Read and clear the hardware Error and Alarm Controller registers.
+
+:Direction: ``_IOWR``
+:Number: 0x0F
+:Argument: ``struct cmh_ioctl_eac_read``
+
+::
+
+  struct cmh_ioctl_eac_read {
+      __u32 version;
+      __u32 __reserved;
+      __u64 mailbox_notification;
+      __u32 hw_error;
+      __u32 hw_nmi;
+      __u32 hw_panic;
+      __u32 safety_fatal;
+      __u32 safety_notification;
+      __u32 sw_info0;
+      __u32 sw_info1;
+      __u32 sram_bank_errors[4];
+      __u32 __pad;
+  };
+
+The eSW atomically reads and clears the registers on each call.
+Successive reads show only new events since the last read.
+
+CMH_IOCTL_DRBG_CONFIG
+----------------------
+
+Configure the hardware DRBG before first use.
+
+:Direction: ``_IOW``
+:Number: 0x40
+:Argument: ``struct cmh_ioctl_drbg_config``
+
+::
+
+  struct cmh_ioctl_drbg_config {
+      __u32 version;
+      __u32 entropy_ratio;       /* CMH_DRBG_RATIO_* */
+      __u32 security_strength;   /* CMH_DRBG_STRENGTH_* */
+      __u32 __reserved;
+  };
+
+This is a management operation, normally performed once at system
+startup.  It must be called before any ``hwrng`` reads or DRBG generate
+operations.
+
+ioctl Number Summary
+====================
+
+======================================  ====  ====  =========================================
+ioctl                                   Dir   Seq   Argument
+======================================  ====  ====  =========================================
+``CMH_IOCTL_KEY_NEW``                   IOWR  0x01  ``cmh_ioctl_key_new``
+``CMH_IOCTL_KEY_WRITE``                 IOW   0x02  ``cmh_ioctl_key_write``
+``CMH_IOCTL_KEY_READ``                  IOWR  0x03  ``cmh_ioctl_key_read``
+``CMH_IOCTL_KEY_FIND``                  IOWR  0x04  ``cmh_ioctl_key_find``
+``CMH_IOCTL_KEY_GRANT``                 IOW   0x05  ``cmh_ioctl_key_grant``
+``CMH_IOCTL_KEY_DELETE``                IOW   0x06  ``cmh_ioctl_key_grant``
+``CMH_IOCTL_DS_EXPORT``                 IOWR  0x07  ``cmh_ioctl_ds_export``
+``CMH_IOCTL_DS_IMPORT``                 IOW   0x08  ``cmh_ioctl_ds_import``
+``CMH_IOCTL_KIC_HKDF1``                 IOWR  0x09  ``cmh_ioctl_kic_hkdf1``
+``CMH_IOCTL_KIC_HKDF2``                 IOWR  0x0A  ``cmh_ioctl_kic_hkdf2``
+``CMH_IOCTL_KEY_NEW_RANDOM``            IOWR  0x0B  ``cmh_ioctl_key_new``
+``CMH_IOCTL_KIC_AES_CMAC_KDF``          IOWR  0x0C  ``cmh_ioctl_kic_aes_cmac_kdf``
+``CMH_IOCTL_KIC_DKEK_DERIVE``           IOWR  0x0D  ``cmh_ioctl_kic_dkek_derive``
+``CMH_IOCTL_KEY_LIST``                  IOWR  0x0E  ``cmh_ioctl_key_list``
+``CMH_IOCTL_EAC_READ``                  IOWR  0x0F  ``cmh_ioctl_eac_read``
+``CMH_IOCTL_PKE_RSA_ENC``               IOWR  0x10  ``cmh_ioctl_pke_rsa_enc``
+``CMH_IOCTL_PKE_RSA_DEC``               IOWR  0x11  ``cmh_ioctl_pke_rsa_dec``
+``CMH_IOCTL_PKE_RSA_CRT_DEC``           IOWR  0x12  ``cmh_ioctl_pke_rsa_crt_dec``
+``CMH_IOCTL_PKE_RSA_KEYGEN``            IOWR  0x13  ``cmh_ioctl_pke_rsa_keygen``
+``CMH_IOCTL_PKE_ECDSA_SIGN``            IOWR  0x14  ``cmh_ioctl_pke_ecdsa_sign``
+``CMH_IOCTL_PKE_ECDH``                  IOWR  0x16  ``cmh_ioctl_pke_ecdh``
+``CMH_IOCTL_PKE_ECDH_KEYGEN``           IOWR  0x17  ``cmh_ioctl_pke_ecdh_keygen``
+``CMH_IOCTL_PKE_EDDSA_SIGN``            IOWR  0x18  ``cmh_ioctl_pke_eddsa_sign``
+``CMH_IOCTL_PKE_EDDSA_VERIFY``          IOW   0x19  ``cmh_ioctl_pke_eddsa_verify``
+``CMH_IOCTL_PKE_EC_KEYGEN``             IOWR  0x1A  ``cmh_ioctl_pke_ec_keygen``
+``CMH_IOCTL_PKE_EC_PUBGEN``             IOWR  0x1B  ``cmh_ioctl_pke_ec_pubgen``
+``CMH_IOCTL_PKE_EDDSA_KEYGEN_SCA``      IOWR  0x1C  ``cmh_ioctl_pke_eddsa_keygen_sca``
+``CMH_IOCTL_ML_KEM_KEYGEN``             IOWR  0x20  ``cmh_ioctl_ml_kem_keygen``
+``CMH_IOCTL_ML_KEM_ENC``                IOWR  0x21  ``cmh_ioctl_ml_kem_enc``
+``CMH_IOCTL_ML_KEM_DEC``                IOWR  0x22  ``cmh_ioctl_ml_kem_dec``
+``CMH_IOCTL_ML_DSA_KEYGEN``             IOWR  0x23  ``cmh_ioctl_ml_dsa_keygen``
+``CMH_IOCTL_ML_DSA_SIGN``               IOWR  0x24  ``cmh_ioctl_ml_dsa_sign``
+``CMH_IOCTL_SLHDSA_KEYGEN``             IOWR  0x28  ``cmh_ioctl_slhdsa_keygen``
+``CMH_IOCTL_SLHDSA_SIGN``               IOWR  0x29  ``cmh_ioctl_slhdsa_sign``
+``CMH_IOCTL_SLHDSA_SIGN_PREHASH``       IOWR  0x2D  ``cmh_ioctl_slhdsa_sign_prehash``
+``CMH_IOCTL_SM2_ECDH_KEYGEN``           IOWR  0x30  ``cmh_ioctl_sm2_ecdh_keygen``
+``CMH_IOCTL_SM2_ECDH``                  IOWR  0x31  ``cmh_ioctl_sm2_ecdh``
+``CMH_IOCTL_SM2_DEC_POINT``             IOWR  0x32  ``cmh_ioctl_sm2_dec_point``
+``CMH_IOCTL_SM2_ENC_POINT``             IOWR  0x33  ``cmh_ioctl_sm2_enc_point``
+``CMH_IOCTL_SM2_ID_DIGEST``             IOWR  0x34  ``cmh_ioctl_sm2_id_digest``
+``CMH_IOCTL_SM2_ECDH_HASH``             IOWR  0x35  ``cmh_ioctl_sm2_ecdh_hash``
+``CMH_IOCTL_SM2_DEC_HASH``              IOWR  0x36  ``cmh_ioctl_sm2_dec_hash``
+``CMH_IOCTL_SM2_ENC_HASH``              IOWR  0x37  ``cmh_ioctl_sm2_enc_hash``
+``CMH_IOCTL_DRBG_CONFIG``               IOW   0x40  ``cmh_ioctl_drbg_config``
+======================================  ====  ====  =========================================
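+
+As a minimal sketch of how one row of this table maps onto a command
+number, the example below encodes ``KEY_NEW`` with the standard Linux
+ioctl macros.  It assumes the ``'J'`` magic and a local mirror of
+``struct cmh_ioctl_key_new``; the ``EX_`` and ``ex_`` names are
+illustrative, and the authoritative definitions are those in the UAPI
+header.
+
+::
+
+  #include <linux/ioctl.h>
+  #include <linux/types.h>
+
+  /* Local mirror of struct cmh_ioctl_key_new as documented above. */
+  struct ex_key_new {
+      __u32 version;
+      __u32 ds_type;
+      __u32 len;
+      __u32 flags;
+      __u64 cid;
+      __u64 ref;
+  };
+
+  /* Dir = IOWR, Seq = 0x01, magic 'J' (assumed encoding). */
+  #define EX_IOCTL_KEY_NEW  _IOWR('J', 0x01, struct ex_key_new)
+
+The fields can be decoded again with ``_IOC_TYPE()``, ``_IOC_NR()``,
+``_IOC_DIR()`` and ``_IOC_SIZE()``, which is useful when checking a
+command number seen in a trace against the table.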
+
+Migration Plan
+==============
+
+Several ioctl commands provide operations that may gain dedicated kernel
+crypto API bindings in the future.  When those APIs land, the driver will
+register through them and the corresponding ioctls will be deprecated
+(retained for backward compatibility but no longer the primary interface):
+
+- **EdDSA** (``CMH_IOCTL_PKE_EDDSA_*``): will migrate to the kernel ``sig``
+  API once ed25519/ed448 algorithm types are accepted upstream.
+
+- **ML-KEM** (``CMH_IOCTL_ML_KEM_*``): will migrate to the kernel KEM API
+  once the in-flight KEM subsystem series lands.
+
+- **Key lifecycle** (``CMH_IOCTL_KEY_*``): integration with the kernel
+  KEYS subsystem (trusted-keys / encrypted-keys) will be evaluated in a
+  follow-up series.
+
+Operations that are inherently vendor-specific (EAC error and alarm
+reads, KIC key derivation, SM2 key exchange, DRBG configuration,
+datastore export/import) will remain as ioctls permanently -- they have
+no corresponding kernel abstraction and are not expected to gain one.
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 29a08bc059dd..4a9ba12ee138 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -170,6 +170,7 @@ Code  Seq#    Include File                                             Comments
 'I'   all    linux/isdn.h                                              conflict!
 'I'   00-0F  drivers/isdn/divert/isdn_divert.h                         conflict!
 'I'   40-4F  linux/mISDNif.h                                           conflict!
+'J'   01-40  uapi/linux/cmh_mgmt_ioctl.h                               CRI CryptoManager Hub (CMH)
 'K'   all    linux/kd.h
 'L'   00-1F  linux/loop.h                                              conflict!
 'L'   10-1F  drivers/scsi/mpt3sas/mpt3sas_ctl.h                        conflict!
--
2.43.7



* [PATCH 03/19] crypto: cmh - add key provisioning and management
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Add the CMH key management subsystem:

- Key provisioning: create, import, derive, and destroy hardware keys
  stored in the CMH datastore
- System object management: allocate and free CMH system objects
- Management ioctl interface (/dev/cmh_mgmt): ioctl commands
  covering key lifecycle, KIC key derivation, PKE operations (RSA,
  ECDSA, ECDH, EdDSA), PQC operations (ML-KEM, ML-DSA, SLH-DSA),
  SM2, EAC, and DRBG configuration
- SM2 ioctl handlers: SM2 encrypt, decrypt, sign, and key exchange
  -- operations that require multi-step protocol flows not
  expressible through the standard crypto API sig interface
- UAPI header: cmh_mgmt_ioctl.h (ioctl definitions and structures)

The misc device requires CAP_SYS_ADMIN for open().

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 Documentation/ABI/testing/cmh-mgmt       |  136 ++
 drivers/crypto/cmh/Kconfig               |   19 +
 drivers/crypto/cmh/Makefile              |   11 +-
 drivers/crypto/cmh/cmh_key.c             |  164 +++
 drivers/crypto/cmh/cmh_main.c            |    9 +
 drivers/crypto/cmh/cmh_mgmt.c            | 1607 ++++++++++++++++++++++
 drivers/crypto/cmh/cmh_mgmt_pke.c        | 1100 +++++++++++++++
 drivers/crypto/cmh/cmh_mgmt_pqc.c        | 1279 +++++++++++++++++
 drivers/crypto/cmh/cmh_pke_sm2.c         |  827 +++++++++++
 drivers/crypto/cmh/cmh_sys.c             |  376 +++++
 drivers/crypto/cmh/include/cmh_key.h     |   82 ++
 drivers/crypto/cmh/include/cmh_mgmt.h    |   62 +
 drivers/crypto/cmh/include/cmh_pke.h     |  245 ++++
 drivers/crypto/cmh/include/cmh_pke_sm2.h |   30 +
 drivers/crypto/cmh/include/cmh_pqc.h     |   25 +
 drivers/crypto/cmh/include/cmh_sys.h     |  111 ++
 include/uapi/linux/cmh_mgmt_ioctl.h      |  895 ++++++++++++
 17 files changed, 6977 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/testing/cmh-mgmt
 create mode 100644 drivers/crypto/cmh/cmh_key.c
 create mode 100644 drivers/crypto/cmh/cmh_mgmt.c
 create mode 100644 drivers/crypto/cmh/cmh_mgmt_pke.c
 create mode 100644 drivers/crypto/cmh/cmh_mgmt_pqc.c
 create mode 100644 drivers/crypto/cmh/cmh_pke_sm2.c
 create mode 100644 drivers/crypto/cmh/cmh_sys.c
 create mode 100644 drivers/crypto/cmh/include/cmh_key.h
 create mode 100644 drivers/crypto/cmh/include/cmh_mgmt.h
 create mode 100644 drivers/crypto/cmh/include/cmh_pke.h
 create mode 100644 drivers/crypto/cmh/include/cmh_pke_sm2.h
 create mode 100644 drivers/crypto/cmh/include/cmh_pqc.h
 create mode 100644 drivers/crypto/cmh/include/cmh_sys.h
 create mode 100644 include/uapi/linux/cmh_mgmt_ioctl.h

diff --git a/Documentation/ABI/testing/cmh-mgmt b/Documentation/ABI/testing/cmh-mgmt
new file mode 100644
index 000000000000..2c6fce7ae009
--- /dev/null
+++ b/Documentation/ABI/testing/cmh-mgmt
@@ -0,0 +1,136 @@
+What:          /dev/cmh_mgmt
+Date:          June 2026
+KernelVersion: 7.1
+Contact:       linux-crypto@vger.kernel.org
+Description:
+               Character device (misc) providing a management and
+               key-operations ioctl interface to the CRI CryptoManager Hub
+               hardware crypto accelerator.  Used for operations that
+               cannot be represented through the standard in-kernel
+               crypto API: datastore key CRUD, key derivation,
+               asymmetric crypto (EdDSA, SM2), and post-quantum crypto
+               (ML-KEM, ML-DSA, SLH-DSA).
+
+               The ioctl magic is 'J'.  All struct arguments are
+               versioned via a leading __u32 version field set to
+               CMH_MGMT_V1 (1).
+
+               Ioctl commands are grouped by function:
+
+               **Key Management (0x01-0x0E):**
+
+               - CMH_IOCTL_KEY_NEW (0x01): Allocate a new datastore slot.
+                 Accepts ds_type (CMH_DS_* constant), key length, flags,
+                 and caller ID.  Returns a 64-bit key reference.
+
+               - CMH_IOCTL_KEY_WRITE (0x02): Write key material into
+                 a previously allocated datastore slot.  Supports
+                 plaintext or wrapped key import via a wrapping key ref.
+
+               - CMH_IOCTL_KEY_READ (0x03): Read key material from
+                 a datastore slot, optionally wrapped.  The output is
+                 prefixed with a 16-byte SYS header; wrapped reads add
+                 a 16-byte nonce and a 16-byte tag.
+
+               - CMH_IOCTL_KEY_FIND (0x04): Look up a key reference
+                 by caller ID (CID).
+
+               - CMH_IOCTL_KEY_GRANT (0x05): Grant access to a key.
+
+               - CMH_IOCTL_KEY_DELETE (0x06): Delete a datastore slot.
+
+               - CMH_IOCTL_DS_EXPORT (0x07): Export the entire datastore
+                 as an encrypted blob.
+
+               - CMH_IOCTL_DS_IMPORT (0x08): Import a previously exported
+                 datastore blob.
+
+               - CMH_IOCTL_KEY_NEW_RANDOM (0x0B): Allocate a datastore
+                 slot and fill it with hardware-generated random data.
+
+               - CMH_IOCTL_KEY_LIST (0x0E): List active datastore entries,
+                 returning CIDs, types, lengths, and flags.
+
+               **Key Derivation -- KIC (0x09-0x0D):**
+
+               - CMH_IOCTL_KIC_HKDF1 (0x09): HKDF-Extract step.
+               - CMH_IOCTL_KIC_HKDF2 (0x0A): HKDF-Expand step.
+               - CMH_IOCTL_KIC_AES_CMAC_KDF (0x0C): AES-CMAC KDF.
+               - CMH_IOCTL_KIC_DKEK_DERIVE (0x0D): DKEK derivation.
+
+               **EAC -- Error and Alarm (0x0F):**
+
+               - CMH_IOCTL_EAC_READ (0x0F): Read and clear hardware
+                 error, alarm, and safety notification registers.
+
+               **PKE -- Public Key Engine (0x10-0x1C):**
+
+               - CMH_IOCTL_PKE_RSA_ENC (0x10): RSA public-key encrypt.
+               - CMH_IOCTL_PKE_RSA_DEC (0x11): RSA private-key decrypt.
+               - CMH_IOCTL_PKE_RSA_CRT_DEC (0x12): RSA-CRT decrypt.
+               - CMH_IOCTL_PKE_RSA_KEYGEN (0x13): RSA key pair generation.
+               - CMH_IOCTL_PKE_ECDSA_SIGN (0x14): ECDSA sign.
+               - CMH_IOCTL_PKE_ECDH (0x16): ECDH shared secret.
+               - CMH_IOCTL_PKE_ECDH_KEYGEN (0x17): ECDH key pair generation.
+               - CMH_IOCTL_PKE_EDDSA_SIGN (0x18): EdDSA sign (Ed25519/Ed448).
+               - CMH_IOCTL_PKE_EDDSA_VERIFY (0x19): EdDSA verify.
+               - CMH_IOCTL_PKE_EC_KEYGEN (0x1A): EC key pair generation.
+               - CMH_IOCTL_PKE_EC_PUBGEN (0x1B): EC public key derivation.
+               - CMH_IOCTL_PKE_EDDSA_KEYGEN_SCA (0x1C): EdDSA SCA-protected
+                 key generation.
+
+               **PQC -- Post-Quantum Crypto (0x20-0x2D):**
+
+               - CMH_IOCTL_ML_KEM_KEYGEN (0x20): ML-KEM key pair generation
+                 (modes 512/768/1024).
+               - CMH_IOCTL_ML_KEM_ENC (0x21): ML-KEM encapsulation.
+               - CMH_IOCTL_ML_KEM_DEC (0x22): ML-KEM decapsulation.
+               - CMH_IOCTL_ML_DSA_KEYGEN (0x23): ML-DSA key pair generation
+                 (modes 44/65/87).
+               - CMH_IOCTL_ML_DSA_SIGN (0x24): ML-DSA sign.
+               - CMH_IOCTL_SLHDSA_KEYGEN (0x28): SLH-DSA key pair generation
+                 (12 parameter sets).
+               - CMH_IOCTL_SLHDSA_SIGN (0x29): SLH-DSA sign.
+               - CMH_IOCTL_SLHDSA_SIGN_PREHASH (0x2D): SLH-DSA prehash sign.
+
+               **SM2 Operations (0x30-0x37):**
+
+               - CMH_IOCTL_SM2_ECDH_KEYGEN (0x30): SM2 ephemeral key gen.
+               - CMH_IOCTL_SM2_ECDH (0x31): SM2 key exchange.
+               - CMH_IOCTL_SM2_DEC_POINT (0x32): SM2 decrypt (point step).
+               - CMH_IOCTL_SM2_ENC_POINT (0x33): SM2 encrypt (point step).
+               - CMH_IOCTL_SM2_ID_DIGEST (0x34): SM2 ID digest (ZA).
+               - CMH_IOCTL_SM2_ECDH_HASH (0x35): SM2 key exchange hash step.
+               - CMH_IOCTL_SM2_DEC_HASH (0x36): SM2 decrypt (hash step).
+               - CMH_IOCTL_SM2_ENC_HASH (0x37): SM2 encrypt (hash step).
+
+               The SM2 encrypt/decrypt hash-step ioctls accept payloads
+               of at most 32 bytes.  The underlying hardware KDF emits a
+               single 32-byte SM3 block, so longer messages cannot be
+               processed in a single command and are rejected with
+               -EINVAL.
+
+               **DRBG Management (0x40):**
+
+               - CMH_IOCTL_DRBG_CONFIG (0x40): Configure the hardware
+                 DRBG entropy ratio and security strength.  Normally
+                 called once at system start-up before hwrng reads.
+
+               All structs contain ``__reserved`` fields that must be
+               zero; the driver returns ``-EINVAL`` if any reserved field
+               is non-zero.  This ensures forward compatibility when
+               reserved fields gain meaning in future versions.
+
+               All ioctls return 0 on success.  On failure, ioctl() returns
+               -1 and sets errno.  Common errors:
+
+               - EINVAL:  Invalid version, parameter, key type, or
+                 non-zero reserved field.
+               - ENOENT:  Key reference not found in datastore.
+               - ENOMEM:  DMA allocation failure.
+               - EBUSY:   Hardware mailbox full.
+               - ETIMEDOUT: VCQ operation timed out.
+               - EFAULT:  Bad user-space pointer.
+
+               The ioctl UAPI header is <linux/cmh_mgmt_ioctl.h>.
+               All structures, constants, and type definitions are
+               documented in that header file.
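
The versioning and reserved-field rules in the ABI text above can be illustrated from the caller's side. What follows is a minimal userspace sketch under stated assumptions, not the driver's code: struct demo_req, DEMO_V1, demo_validate() and demo_req_init() are hypothetical stand-ins, because the real layouts live in <linux/cmh_mgmt_ioctl.h>. It only shows why zeroing the whole struct before filling it in is the simplest way to satisfy the "reserved must be zero" rule.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a CMH_MGMT_V1 request; not the real UAPI layout. */
#define DEMO_V1 1u

struct demo_req {
	uint32_t version;	/* leading version field, must be DEMO_V1 */
	uint32_t reserved;	/* must be zero */
	uint64_t ref;		/* example payload field */
};

/* Mirrors the documented checks: bad version or non-zero reserved -> -EINVAL. */
static int demo_validate(const struct demo_req *req)
{
	assert(req != NULL);
	if (req->version != DEMO_V1)
		return -EINVAL;
	if (req->reserved)
		return -EINVAL;
	return 0;
}

/* Caller-side helper: zero everything first so reserved fields start at 0. */
static void demo_req_init(struct demo_req *req, uint64_t ref)
{
	memset(req, 0, sizeof(*req));
	req->version = DEMO_V1;
	req->ref = ref;
}
```

A caller that builds requests on the stack without the memset() would pass whatever happened to be in the reserved field, which the documented check is meant to reject.
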
diff --git a/drivers/crypto/cmh/Kconfig b/drivers/crypto/cmh/Kconfig
index fa5adeca2512..c607014f8fbc 100644
--- a/drivers/crypto/cmh/Kconfig
+++ b/drivers/crypto/cmh/Kconfig
@@ -44,3 +44,22 @@ config CRYPTO_DEV_CMH_DEBUG
          Useful for bringup, validation, and performance analysis.
          Not recommended for production.

+
+config CRYPTO_DEV_CMH_MGMT
+       bool "CMH management ioctl device (/dev/cmh_mgmt)"
+       depends on CRYPTO_DEV_CMH
+       default n
+       help
+         Expose /dev/cmh_mgmt, a misc device providing ioctl commands
+         for operations that have no kernel crypto API binding: hardware
+         key lifecycle (create, import, derive, destroy), KIC key
+         derivation, PQC key generation, ML-KEM encaps/decaps, ML-DSA
+         and SLH-DSA signing, EdDSA sign/verify, SM2 key exchange, and
+         DRBG configuration.
+
+         The device requires CAP_SYS_ADMIN.  Disabling this option
+         removes the ioctl interface but all kernel crypto API
+         algorithms (consumed by in-kernel users and validated by the
+         crypto test manager) remain fully functional.
+
+         If unsure, say N.
diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index 0a4591c9fd86..1492e575598c 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -12,7 +12,16 @@ cmh-y := \
        cmh_txn.o \
        cmh_rh.o \
        cmh_dma.o \
-       cmh_sysfs.o
+       cmh_sysfs.o \
+       cmh_key.o \
+       cmh_sys.o
+
+# Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
+cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
+       cmh_mgmt.o \
+       cmh_mgmt_pke.o \
+       cmh_mgmt_pqc.o \
+       cmh_pke_sm2.o

 ccflags-y += -I$(src)/include

diff --git a/drivers/crypto/cmh/cmh_key.c b/drivers/crypto/cmh/cmh_key.c
new file mode 100644
index 000000000000..fde8be50b25c
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_key.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Dual Key Path Implementation
+ *
+ * Two key provisioning paths are supported:
+ *
+ * Raw key:  key bytes -> stored in tfm context ->
+ *   SYS_CMD_WRITE(SYS_REF_TEMP) packed into every crypto VCQ.
+ *   The raw key buffer is DMA-mapped once at setkey time and remains
+ *   mapped for the lifetime of the transform (unmapped in destroy).
+ *
+ * Raw key DMA lifetime rationale
+ * ------------------------------
+ * Raw keys are DMA-mapped at setkey time and the mapping persists
+ * until the transform is destroyed (cmh_key_destroy).  This is a
+ * deliberate design choice, consistent with upstream HW crypto
+ * drivers (CAAM, ccree, CCP) that also map keys at setkey for
+ * transform-lifetime reuse:
+ *
+ *   - The Linux crypto framework expects setkey to prepare the
+ *     transform for repeated encrypt/decrypt calls.  Remapping the
+ *     same key on every request would add DMA API overhead per crypto
+ *     operation with no security benefit.
+ *   - On destroy, kfree_sensitive() scrubs the key buffer and the
+ *     DMA mapping is released.  For key-by-ID (persistent), the
+ *     per-MBX ref cache is zeroed with memzero_explicit().
+ *   - No key material is ever logged; dev_dbg() messages only show
+ *     CIDs (caller IDs), not key bytes.
+ *
+ * Hardware-required behaviors (not driver policy)
+ * ------------------------------------------------
+ * - SYS_REF_TEMP lifetime:  the eSW firmware reclaims temporary
+ *   datastore objects when the mailbox slot is reused.  This is a
+ *   hardware constraint; the driver packs SYS_CMD_WRITE into every
+ *   VCQ to re-provision the raw key for each operation.
+ * - Mailbox flush (SYS_CMD_FLUSH):  reclaims temp-stack space on the
+ *   target MBX.  Required by HW to prevent temp-stack exhaustion
+ *   across multi-VCQ operations.
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include "cmh_key.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_sys_abi.h"
+#include <uapi/linux/cmh_mgmt_ioctl.h>
+
+/**
+ * cmh_ds_type_to_core_id() - Map a datastore type to a logical core ID
+ * @ds_type: Datastore type constant (e.g. CMH_DS_AES_KEY, CMH_DS_SM4_KEY)
+ *
+ * Returns the algorithm-family identity (e.g. CORE_ID_AES = 0x03), NOT the
+ * VCQ dispatch core_id.  With multi-instance, a second AES engine dispatches
+ * at CORE_ID_AES2 (0x06) but keys are still tagged with CORE_ID_AES (0x03)
+ * -- the eSW validates against the logical identity, not the dispatch ID.
+ *
+ * Return: Logical core ID on success, CORE_ID_NUM for unknown @ds_type.
+ */
+u32 cmh_ds_type_to_core_id(u32 ds_type)
+{
+       switch (ds_type) {
+       case CMH_DS_AES_KEY:
+       case CMH_DS_AES_XTS_KEY:
+               return CORE_ID_AES;
+       case CMH_DS_SM4_KEY:
+               return CORE_ID_SM4;
+       case CMH_DS_HMAC_KEY:
+       case CMH_DS_KMAC_KEY:
+               return CORE_ID_HC;
+       case CMH_DS_CHACHA20_KEY:
+               return CORE_ID_CCP;
+       case CMH_DS_RSA_PRIV_KEY:
+       case CMH_DS_RSA_PUB_KEY:
+       case CMH_DS_RSA_CRT_KEY:
+       case CMH_DS_ECDSA_PRIV_KEY:
+       case CMH_DS_ECDSA_PUB_KEY:
+       case CMH_DS_ECDH_PRIV_KEY:
+       case CMH_DS_EDDSA_PRIV_KEY:
+       case CMH_DS_SHARED_SECRET:
+       case CMH_DS_SM2_PRIV_KEY:
+               return CORE_ID_PKE;
+       case CMH_DS_ML_KEM_DK:
+       case CMH_DS_ML_DSA_SK:
+               return CORE_ID_QSE;
+       case CMH_DS_SLHDSA_SK:
+               return CORE_ID_HCQ;
+       default:
+               return CORE_ID_NUM;
+       }
+}
+
+/**
+ * cmh_key_setkey_raw() - Store a raw key in the key context
+ * @ctx: Key context to populate
+ * @key: Pointer to the raw key bytes
+ * @keylen: Length of @key in bytes
+ * @core_id: Logical core ID for SYS_TYPE tagging
+ *
+ * Duplicates the raw key, DMA-maps the copy for the lifetime of the
+ * transform, and stores the mapping in @ctx.  Any previously held key
+ * is destroyed first.
+ *
+ * The DMA mapping persists until cmh_key_destroy() is called (typically
+ * from the algorithm .exit_tfm callback).  This avoids per-request DMA
+ * mapping overhead and matches the setkey-to-destroy lifetime model used
+ * by other upstream HW crypto drivers (CAAM, ccree, CCP).  The key
+ * buffer is freed via kfree_sensitive() on destroy.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_key_setkey_raw(struct cmh_key_ctx *ctx, const u8 *key,
+                      u32 keylen, u32 core_id)
+{
+       dma_addr_t dma;
+       u8 *copy;
+
+       if (!keylen || !key)
+               return -EINVAL;
+
+       copy = kmemdup(key, keylen, GFP_KERNEL);
+       if (!copy)
+               return -ENOMEM;
+
+       /* Pre-map for the lifetime of the transform */
+       dma = cmh_dma_map_single(copy, keylen, DMA_TO_DEVICE);
+       if (cmh_dma_map_error(dma)) {
+               kfree_sensitive(copy);
+               return -ENOMEM;
+       }
+
+       /* Clean up any previous key */
+       cmh_key_destroy(ctx);
+
+       ctx->mode = CMH_KEY_RAW;
+       ctx->raw.data = copy;
+       ctx->raw.len = keylen;
+       ctx->raw.dma = dma;
+       ctx->raw.sys_type = SYS_TYPE_SET(SYS_TYPE_FLAG_PT, core_id);
+
+       return 0;
+}
+
+/**
+ * cmh_key_destroy() - Destroy and zero-fill a key context
+ * @ctx: Key context to destroy
+ *
+ * For raw keys, unmaps the DMA buffer and securely frees the key material.
+ * Resets the key mode to CMH_KEY_NONE.
+ */
+void cmh_key_destroy(struct cmh_key_ctx *ctx)
+{
+       if (ctx->mode == CMH_KEY_RAW && ctx->raw.data) {
+               cmh_dma_unmap_single(ctx->raw.dma, ctx->raw.len,
+                                    DMA_TO_DEVICE);
+               kfree_sensitive(ctx->raw.data);
+               memzero_explicit(&ctx->raw, sizeof(ctx->raw));
+       }
+       ctx->mode = CMH_KEY_NONE;
+}
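
The replace-then-destroy ordering in cmh_key_setkey_raw() above is worth spelling out: the new copy is built first and the old key is dropped only afterwards, so a failed setkey leaves the previous key in place. The sketch below is a userspace analogue under stated assumptions. struct demo_key, demo_wipe(), demo_key_set() and demo_key_destroy() are hypothetical names, malloc()/free() stand in for kmemdup()/kfree_sensitive(), and the DMA mapping step is omitted entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct cmh_key_ctx (raw mode only). */
struct demo_key {
	unsigned char *data;
	size_t len;
};

/* Zero through a volatile pointer so the compiler cannot drop the stores. */
static void demo_wipe(void *p, size_t n)
{
	volatile unsigned char *v = p;

	while (n--)
		*v++ = 0;
}

/* Analogue of cmh_key_destroy(): scrub, free, and reset the context. */
static void demo_key_destroy(struct demo_key *k)
{
	assert(k != NULL);
	if (k->data) {
		demo_wipe(k->data, k->len);
		free(k->data);
	}
	k->data = NULL;
	k->len = 0;
}

/*
 * Analogue of cmh_key_setkey_raw(): build the new copy first and only
 * then drop the old key, so a failed call leaves the old key usable.
 */
static int demo_key_set(struct demo_key *k, const unsigned char *key,
			size_t len)
{
	unsigned char *copy;

	if (!key || !len)
		return -EINVAL;

	copy = malloc(len);
	if (!copy)
		return -ENOMEM;
	memcpy(copy, key, len);

	demo_key_destroy(k);
	k->data = copy;
	k->len = len;
	return 0;
}
```

Calling demo_key_set() with a NULL key returns -EINVAL before demo_key_destroy() is reached, which is the same early-return shape the driver function has for its argument, allocation, and mapping failures.
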
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index 452b8272908f..307bd7dd304b 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -29,6 +29,7 @@
 #include "cmh_mqi.h"
 #include "cmh_txn.h"
 #include "cmh_rh.h"
+#include "cmh_mgmt.h"
 #include "cmh_registers.h"
 #include "cmh_debugfs.h"
 #include "cmh_sysfs.h"
@@ -190,12 +191,19 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_rh_init;

+       /* Register key management device (/dev/cmh_mgmt) */
+       ret = cmh_mgmt_register();
+       if (ret)
+               goto err_mgmt_register;
+
        g_cmh_dev = dev;
        platform_set_drvdata(pdev, dev);

        dev_info(cmh_dev(), "initialized successfully\n");
        return 0;

+err_mgmt_register:
+       cmh_rh_cleanup(cfg);
 err_rh_init:
        cmh_tm_cleanup();
 err_tm_init:
@@ -220,6 +228,7 @@ static void cmh_remove(struct platform_device *pdev)

        cfg = &dev->config;

+       cmh_mgmt_unregister();
        cmh_rh_cleanup(cfg);
        cmh_tm_cleanup();
        cmh_mqi_cleanup(cfg);
diff --git a/drivers/crypto/cmh/cmh_mgmt.c b/drivers/crypto/cmh/cmh_mgmt.c
new file mode 100644
index 000000000000..d228213f7850
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_mgmt.c
@@ -0,0 +1,1607 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Key Management misc_device (/dev/cmh_mgmt)
+ *
+ * Provides ioctl interface for key provisioning (NEW, NEW_RANDOM, WRITE, READ,
+ * FIND, GRANT, DELETE) and datastore lifecycle (EXPORT, IMPORT).
+ *
+ * Each ioctl handler: copy_from_user -> validate -> DMA alloc ->
+ * build VCQ -> cmh_tm_submit_sync_mbx -> copy_to_user -> DMA free.
+ *
+ * Access requires CAP_SYS_ADMIN (checked in open()).  The device node
+ * is mode 0660; DAC further limits access to owner/group.
+ * CMH eSW enforces per-MBX access control on top of this.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/miscdevice.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/capability.h>
+#include <linux/overflow.h>
+
+#include "cmh_mgmt.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_key.h"
+#include "cmh_dma.h"
+#include "cmh_config.h"
+#include "cmh_sys_abi.h"
+#include "cmh_pke.h"
+#include "cmh_pke_sm2.h"
+#include "cmh_qse_abi.h"
+#include "cmh_hcq_abi.h"
+#include <uapi/linux/cmh_mgmt_ioctl.h>
+
+#include <crypto/utils.h>
+
+/*
+ * Pin all mgmt ioctls to a single management mailbox (MBX 0).
+ *
+ * This is a deliberate, structural choice -- not a performance default.
+ * The /dev/cmh_mgmt path is *stateful* with respect to the eSW datastore,
+ * and that state is per-mailbox, so every step of a key's lifecycle must
+ * land on the same mailbox:
+ *
+ * 1. Datastore access control is per-mailbox AND opaque to the driver.
+ *    SYS_CMD_NEW grants the creating mailbox a (1 << mbx_id) access mask
+ *    (read/write/execute).  Crucially, the returned 64-bit ref encodes a
+ *    randomised offset -- NOT the owning mailbox -- so given only a ref
+ *    (as KEY_GRANT/READ/DELETE/DS_EXPORT receive), the driver cannot
+ *    recover which mailbox owns the object.  A fixed management mailbox
+ *    is therefore the only way to guarantee that NEW, WRITE, GRANT, READ
+ *    and the subsequent hardware-held-key compute ops all share the
+ *    mailbox that holds the access rights, without exposing mailbox
+ *    identity in the UABI.  (User space may still widen access to other
+ *    mailboxes explicitly via KEY_GRANT.)
+ *
+ * 2. The eSW SYS_REF_TEMP scratch store is per-mailbox and persists
+ *    across ioctl calls.  A derivation that writes SYS_REF_TEMP (e.g. a
+ *    KIC_* derive) must be consumed by a later ioctl on the *same*
+ *    mailbox (e.g. DS_EXPORT with wrap_key=SYS_REF_TEMP).
+ *
+ * Device-tree per-core ``cri,mbx`` affinity applies to the *stateless*
+ * registered crypto API path (cmh_core_select_instance()), which carries
+ * no datastore state across calls and is free to balance across mailboxes.
+ *
+ * Note: MBX 0 is NOT reserved exclusively for mgmt -- registered crypto
+ * operations may also land here via TM round-robin (target_mbx = -1).
+ * This is safe because those ops do not allocate from the temp store.
+ */
+
+/* VCQ layout: header + command + flush = 3 entries */
+#define MGMT_VCQ_CMDS          3
+
+/*
+ * Tracks whether any operation has left residual state in the device's
+ * per-mailbox temporary key store since the last flush.  The device
+ * reclaims temp storage only on a full mailbox flush (MBX_COMMAND_FLUSH),
+ * which also terminates any executing command queue with -EPIPE.
+ *
+ * To avoid killing concurrent in-flight operations, the flush in
+ * cmh_mgmt_ioctl() is conditional: it fires only when this flag is set.
+ * Operations that allocate temp storage (currently: KIC derivations
+ * targeting SYS_REF_TEMP) set this flag on success.
+ */
+static atomic_t mgmt_temp_dirty = ATOMIC_INIT(0);
+
+/* -- KEY_NEW -------------------------- */
+
+static int cmh_mgmt_key_new(void __user *argp)
+{
+       struct cmh_ioctl_key_new req;
+       struct vcq_cmd vcq[MGMT_VCQ_CMDS];
+       u64 *ref_buf;
+       dma_addr_t ref_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (!req.len)
+               return -EINVAL;
+
+       /* DMA buffer for CMH eSW to write back the ref */
+       ref_buf = kmalloc_obj(*ref_buf, GFP_KERNEL);
+       if (!ref_buf)
+               return -ENOMEM;
+
+       *ref_buf = 0;
+       ref_dma = cmh_dma_map_single(ref_buf, sizeof(*ref_buf),
+                                    DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(ref_dma)) {
+               kfree(ref_buf);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], MGMT_VCQ_CMDS);
+       vcq_add_sys_new(&vcq[1], req.cid, ref_dma, req.len);
+       vcq_add_sys_flush(&vcq[2]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, MGMT_VCQ_CMDS, 1, MGMT_MBX);
+
+       /*
+        * Unmap before CPU read: single-phase operation (no re-use of
+        * the DMA mapping), so unmap transfers ownership back to the
+        * CPU.  On SWIOTLB systems the unmap copies the bounce buffer
+        * to the original allocation.  This is the correct pattern for
+        * single-shot sync submits where the buffer is not re-mapped.
+        */
+       cmh_dma_unmap_single(ref_dma, sizeof(*ref_buf), DMA_FROM_DEVICE);
+
+       if (ret) {
+               kfree(ref_buf);
+               return ret;
+       }
+
+       req.ref = *ref_buf;
+       kfree(ref_buf);
+
+       if (copy_to_user(argp, &req, sizeof(req)))
+               return -EFAULT;
+
+       dev_dbg(cmh_dev(), "mgmt: KEY_NEW cid=0x%llx len=%u -> ref=0x%llx\n",
+               req.cid, req.len, req.ref);
+       return 0;
+}
+
+/* -- KEY_WRITE ------------------------- */
+
+static int cmh_mgmt_key_write(void __user *argp)
+{
+       struct cmh_ioctl_key_write req;
+       struct vcq_cmd vcq[MGMT_VCQ_CMDS];
+       void *dmabuf;
+       dma_addr_t dma_addr;
+       u32 core_id, sys_type;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (!req.len || req.len > CMH_MGMT_MAX_DATA_LEN)
+               return -EINVAL;
+
+       core_id = cmh_ds_type_to_core_id(req.ds_type);
+       if (core_id == CORE_ID_NUM)
+               return -EINVAL;
+       sys_type = SYS_TYPE_SET(req.flags, core_id);
+
+       dmabuf = kmalloc(req.len, GFP_KERNEL);
+       if (!dmabuf)
+               return -ENOMEM;
+
+       if (copy_from_user(dmabuf, u64_to_user_ptr(req.data),
+                          req.len)) {
+               kfree_sensitive(dmabuf);
+               return -EFAULT;
+       }
+
+       dma_addr = cmh_dma_map_single(dmabuf, req.len, DMA_TO_DEVICE);
+       if (cmh_dma_map_error(dma_addr)) {
+               kfree_sensitive(dmabuf);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], MGMT_VCQ_CMDS);
+       vcq_add_sys_write(&vcq[1], req.ref, dma_addr, req.wrap_key,
+                         req.len, sys_type);
+       /*
+        * PKE keys on Weierstrass curves and RSA keys must be byte-swapped
+        * when stored in the DS so they match the internal big-endian
+        * representation used by the PKE sidecar.  Edwards curve keys
+        * (EdDSA) use native byte order and must NOT be swapped.
+        */
+       switch (req.ds_type) {
+       case CMH_DS_RSA_PRIV_KEY:
+       case CMH_DS_RSA_PUB_KEY:
+       case CMH_DS_RSA_CRT_KEY:
+       case CMH_DS_ECDSA_PRIV_KEY:
+       case CMH_DS_ECDSA_PUB_KEY:
+       case CMH_DS_ECDH_PRIV_KEY:
+       case CMH_DS_SHARED_SECRET:
+       case CMH_DS_SM2_PRIV_KEY:
+               vcq[1].id |= PKE_SWAP_FLAGS;
+               break;
+       default:
+               /* EdDSA, symmetric keys -- no swap */
+               break;
+       }
+       vcq_add_sys_flush(&vcq[2]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, MGMT_VCQ_CMDS, 1, MGMT_MBX);
+
+       cmh_dma_unmap_single(dma_addr, req.len, DMA_TO_DEVICE);
+       kfree_sensitive(dmabuf);
+
+       if (ret)
+               return ret;
+
+       dev_dbg(cmh_dev(), "mgmt: KEY_WRITE ref=0x%llx len=%u type=0x%x\n",
+               req.ref, req.len, sys_type);
+       return 0;
+}
+
+/* -- KEY_READ -------------------------- */
+
+static int cmh_mgmt_key_read(void __user *argp)
+{
+       struct cmh_ioctl_key_read req;
+       struct vcq_cmd vcq[MGMT_VCQ_CMDS];
+       void *dmabuf;
+       dma_addr_t dma_addr;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       if (!req.len || req.len > CMH_MGMT_MAX_DATA_LEN)
+               return -EINVAL;
+
+       dmabuf = kzalloc(req.len, GFP_KERNEL);
+       if (!dmabuf)
+               return -ENOMEM;
+
+       dma_addr = cmh_dma_map_single(dmabuf, req.len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(dma_addr)) {
+               kfree(dmabuf);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], MGMT_VCQ_CMDS);
+       vcq_add_sys_read(&vcq[1], req.ref, dma_addr, req.wrap_key, req.len);
+       vcq_add_sys_flush(&vcq[2]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, MGMT_VCQ_CMDS, 1, MGMT_MBX);
+
+       cmh_dma_unmap_single(dma_addr, req.len, DMA_FROM_DEVICE);
+
+       if (ret) {
+               kfree_sensitive(dmabuf);
+               return ret;
+       }
+
+       if (copy_to_user(u64_to_user_ptr(req.data),
+                        dmabuf, req.len)) {
+               kfree_sensitive(dmabuf);
+               return -EFAULT;
+       }
+
+       req.out_len = req.len;
+       kfree_sensitive(dmabuf);
+
+       if (copy_to_user(argp, &req, sizeof(req)))
+               return -EFAULT;
+
+       dev_dbg(cmh_dev(), "mgmt: KEY_READ ref=0x%llx len=%u\n",
+               req.ref, req.out_len);
+       return 0;
+}
+
+/* -- KEY_FIND -------------------------- */
+
+static int cmh_mgmt_key_find(void __user *argp)
+{
+       struct cmh_ioctl_key_find req;
+       struct vcq_cmd vcq[MGMT_VCQ_CMDS];
+       struct sys_list_item *item;
+       dma_addr_t item_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+
+       item = kzalloc_obj(*item, GFP_KERNEL);
+       if (!item)
+               return -ENOMEM;
+
+       item_dma = cmh_dma_map_single(item, sizeof(*item), DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(item_dma)) {
+               kfree(item);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], MGMT_VCQ_CMDS);
+       vcq_add_sys_find(&vcq[1], req.cid, item_dma, sizeof(*item));
+       vcq_add_sys_flush(&vcq[2]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, MGMT_VCQ_CMDS, 1, MGMT_MBX);
+
+       cmh_dma_unmap_single(item_dma, sizeof(*item), DMA_FROM_DEVICE);
+
+       if (ret) {
+               kfree(item);
+               return ret;
+       }
+
+       req.ref = item->ref;
+       req.len = item->len;
+       req.type = item->type;
+       kfree(item);
+
+       if (copy_to_user(argp, &req, sizeof(req)))
+               return -EFAULT;
+
+       dev_dbg(cmh_dev(), "mgmt: KEY_FIND cid=0x%llx -> ref=0x%llx\n",
+               req.cid, req.ref);
+       return 0;
+}
+
+/* -- KEY_LIST ------------------------- */
+
+static int cmh_mgmt_key_list(void __user *argp)
+{
+       struct cmh_ioctl_key_list req;
+       struct vcq_cmd vcq[MGMT_VCQ_CMDS];
+       struct sys_list_item *item;
+       dma_addr_t item_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+
+       if (req.__reserved)
+               return -EINVAL;
+
+       item = kzalloc_obj(*item, GFP_KERNEL);
+       if (!item)
+               return -ENOMEM;
+
+       item_dma = cmh_dma_map_single(item, sizeof(*item), DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(item_dma)) {
+               kfree(item);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], MGMT_VCQ_CMDS);
+       vcq_add_sys_list(&vcq[1], req.start_ref, item_dma, sizeof(*item));
+       vcq_add_sys_flush(&vcq[2]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, MGMT_VCQ_CMDS, 1, MGMT_MBX);
+
+       cmh_dma_unmap_single(item_dma, sizeof(*item), DMA_FROM_DEVICE);
+
+       if (ret) {
+               kfree(item);
+               return ret;
+       }
+
+       req.ref = item->ref;
+       req.cid = item->cid;
+       req.len = item->len;
+       req.type = item->type;
+       kfree(item);
+
+       if (copy_to_user(argp, &req, sizeof(req)))
+               return -EFAULT;
+
+       return 0;
+}
+
+/* -- KEY_GRANT / KEY_DELETE --------------------- */
+
+static int cmh_mgmt_key_grant(void __user *argp, bool is_delete)
+{
+       struct cmh_ioctl_key_grant req;
+       struct vcq_cmd vcq[MGMT_VCQ_CMDS];
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+
+       /* DELETE = GRANT with all permissions zeroed */
+       if (is_delete) {
+               req.read = 0;
+               req.write = 0;
+               req.execute = 0;
+       }
+
+       vcq_set_header(&vcq[0], MGMT_VCQ_CMDS);
+       vcq_add_sys_grant(&vcq[1], req.ref, req.read, req.write, req.execute);
+       vcq_add_sys_flush(&vcq[2]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, MGMT_VCQ_CMDS, 1, MGMT_MBX);
+       if (ret)
+               return ret;
+
+       dev_dbg(cmh_dev(), "mgmt: KEY_%s ref=0x%llx r=0x%llx w=0x%llx x=0x%llx\n",
+               is_delete ? "DELETE" : "GRANT",
+               req.ref, req.read, req.write, req.execute);
+       return 0;
+}
+
+/* -- DS_EXPORT ------------------------- */
+
+static int cmh_mgmt_ds_export(void __user *argp)
+{
+       struct cmh_ioctl_ds_export req;
+       struct vcq_cmd vcq[MGMT_VCQ_CMDS];
+       void *dmabuf;
+       dma_addr_t dma_addr;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       if (!req.len || req.len > CMH_MGMT_MAX_DATA_LEN)
+               return -EINVAL;
+
+       /*
+        * req.len is the exact DMA buffer size given to the eSW.
+        * Userspace must size it to at least the export blob:
+        *
+        *   wrapped:   sizeof(sys_wrap_hdr) + 2*AES_BLOCK_SIZE + obj_len
+        *              = 16 + 32 + obj_len  = 48 + obj_len
+        *   plaintext: sizeof(sys_wrap_hdr) + obj_len
+        *              = 16 + obj_len
+        *
+        * obj_len is known from KEY_NEW or KEY_FIND.  If req.len is
+        * too small, the eSW rejects the command and we return -EIO.
+        */
+       dmabuf = kzalloc(req.len, GFP_KERNEL);
+       if (!dmabuf)
+               return -ENOMEM;
+
+       dma_addr = cmh_dma_map_single(dmabuf, req.len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(dma_addr)) {
+               kfree(dmabuf);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], MGMT_VCQ_CMDS);
+       vcq_add_sys_export(&vcq[1], req.cid, dma_addr, req.wrap_key, req.len);
+       vcq_add_sys_flush(&vcq[2]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, MGMT_VCQ_CMDS, 1, MGMT_MBX);
+
+       cmh_dma_unmap_single(dma_addr, req.len, DMA_FROM_DEVICE);
+
+       if (ret) {
+               kfree_sensitive(dmabuf);
+               return ret;
+       }
+
+       /* Parse actual blob size from the eSW-written header */
+       {
+               struct sys_wrap_hdr *hdr = (struct sys_wrap_hdr *)dmabuf;
+               u64 actual;
+
+               if (check_add_overflow((u64)sizeof(*hdr), (u64)hdr->wrap,
+                                      &actual) ||
+                   check_add_overflow(actual, (u64)hdr->len, &actual) ||
+                   actual > req.len) {
+                       kfree_sensitive(dmabuf);
+                       return -EIO;
+               }
+               req.out_len = (u32)actual;
+       }
+
+       if (copy_to_user(u64_to_user_ptr(req.data),
+                        dmabuf, req.out_len)) {
+               kfree_sensitive(dmabuf);
+               return -EFAULT;
+       }
+
+       kfree_sensitive(dmabuf);
+
+       if (copy_to_user(argp, &req, sizeof(req)))
+               return -EFAULT;
+
+       dev_dbg(cmh_dev(), "mgmt: DS_EXPORT wrap_key=0x%llx len=%u\n",
+               req.wrap_key, req.out_len);
+       return 0;
+}
+
+/* -- DS_IMPORT ------------------------- */
+
+static int cmh_mgmt_ds_import(void __user *argp)
+{
+       struct cmh_ioctl_ds_import req;
+       struct vcq_cmd vcq[MGMT_VCQ_CMDS];
+       void *dmabuf;
+       dma_addr_t dma_addr;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (!req.len || req.len > CMH_MGMT_MAX_DATA_LEN)
+               return -EINVAL;
+
+       dmabuf = kmalloc(req.len, GFP_KERNEL);
+       if (!dmabuf)
+               return -ENOMEM;
+
+       if (copy_from_user(dmabuf, u64_to_user_ptr(req.data),
+                          req.len)) {
+               kfree_sensitive(dmabuf);
+               return -EFAULT;
+       }
+
+       dma_addr = cmh_dma_map_single(dmabuf, req.len, DMA_TO_DEVICE);
+       if (cmh_dma_map_error(dma_addr)) {
+               kfree_sensitive(dmabuf);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], MGMT_VCQ_CMDS);
+       vcq_add_sys_import(&vcq[1], dma_addr, req.wrap_key, req.len);
+       vcq_add_sys_flush(&vcq[2]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, MGMT_VCQ_CMDS, 1, MGMT_MBX);
+
+       cmh_dma_unmap_single(dma_addr, req.len, DMA_TO_DEVICE);
+       kfree_sensitive(dmabuf);
+
+       if (ret)
+               return ret;
+
+       dev_dbg(cmh_dev(), "mgmt: DS_IMPORT wrap_key=0x%llx len=%u\n",
+               req.wrap_key, req.len);
+       return 0;
+}
+
+/* -- KIC key derivation ioctls --------
+ *
+ * All four KIC derivation handlers (HKDF1, HKDF2, AES-CMAC-KDF,
+ * DKEK-derive) share the same two-mode structure and temp-flush pattern.
+ *
+ * Temp-storage flush rationale:
+ *
+ *   The device maintains a small per-mailbox temporary key store
+ *   (~960 bytes, LIFO).  A derivation targeting SYS_REF_TEMP allocates
+ *   from this store; the allocation persists across command-queue
+ *   boundaries until either (a) a subsequent command consumes it or
+ *   (b) a mailbox flush resets the store.
+ *
+ *   Our single-derivation ioctls produce a temp key with no consumer
+ *   in the same queue -- the key is consumed by a *later* ioctl
+ *   (e.g. DS_EXPORT with wrap_key=SYS_REF_TEMP).  If no consumer
+ *   follows, the allocation persists.  Sequential temp derivations
+ *   accumulate allocations until the store is exhausted (3 to 8 calls
+ *   depending on key size), after which the device returns ENOMEM.
+ *
+ *   A mailbox flush (cmh_tm_flush_mbx / MBX_COMMAND_FLUSH) resets the
+ *   temp store.  It does NOT destroy persistent keys, datastore
+ *   objects, or DRBG state -- only the command queue and temp store.
+ *
+ *   This is safe for cross-ioctl temp flows (e.g. export-to-file:
+ *   HKDF1->TEMP in ioctl 1, then DS_EXPORT with wrap_key=TEMP in
+ *   ioctl 2): the flush is issued only by the derivation handlers
+ *   and the pre-PKE dispatch path, never by DS_EXPORT/DS_IMPORT, so
+ *   the temp key survives until it is consumed.
+ *
+ * The ioctl dispatch also flushes before PKE/SM2/PQC ioctls to
+ * protect them from temp residue left by earlier derivations on the
+ * same mailbox.  The per-handler flushes here remain necessary
+ * because sequential temp derivations (without an intervening
+ * PKE/SM2/PQC ioctl) would still exhaust the store.
+ */
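
The exhaustion behaviour described in the comment above can be pictured with a toy model. This is only a sketch: the capacity is taken from the "~960 bytes" figure in the comment, the store is modelled as a bare bump pointer, and the per-allocation footprints used in the example are made up for illustration (the device's real accounting is not described in this patch). All `demo_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_TEMP_CAPACITY 960	/* "~960 bytes" per the comment; exact value assumed */

/* Toy model of a per-mailbox LIFO temp store: a bump pointer and nothing else. */
struct demo_temp_store {
	size_t used;
};

/* A derivation targeting the temp store allocates from the top of the stack. */
static bool demo_temp_alloc(struct demo_temp_store *s, size_t bytes)
{
	if (bytes == 0 || bytes > DEMO_TEMP_CAPACITY - s->used)
		return false;	/* the device would report ENOMEM here */
	s->used += bytes;
	return true;
}

/* A mailbox flush resets the store; persistent state is not modelled. */
static void demo_temp_flush(struct demo_temp_store *s)
{
	s->used = 0;
}

/* Count how many back-to-back allocations fit before the store is exhausted. */
static unsigned int demo_allocs_until_full(size_t bytes)
{
	struct demo_temp_store s = { .used = 0 };
	unsigned int n = 0;

	while (demo_temp_alloc(&s, bytes))
		n++;
	return n;
}
```

With an assumed footprint of 320 bytes per allocation the model fills up after three allocations, and with 120 bytes after eight. A call to `demo_temp_flush()` makes the full capacity available again, which is the effect the per-handler flush relies on.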
+
+/* -- KIC_HKDF1 ------------------------- */
+
+/*
+ * Derive a key from a KIC base key via one-step HKDF.
+ *
+ * Two modes controlled by CMH_KIC_FLAG_TEMP:
+ *
+ *   TEMP (flag set) -- 3-command VCQ:
+ *     [0] SYS header
+ *     [1] KIC_CMD_HKDF1 (dst=SYS_REF_TEMP)
+ *     [2] flush
+ *   Returns SYS_REF_TEMP as ref.  No DS entry created.
+ *
+ *   Persistent (flag clear) -- 4-command VCQ:
+ *     [0] SYS header
+ *     [1] SYS_CMD_NEW   (allocate DS slot, CMH eSW writes ref)
+ *     [2] KIC_CMD_HKDF1 (dst=SYS_REF_LAST = just-allocated slot)
+ *     [3] flush
+ *   Returns the new DS reference.
+ */
+#define KDF_VCQ_MAX            4
+#define KDF_MAX_KEY_LEN                64
+#define KDF_MAX_LABEL_LEN      56
+
+static int cmh_mgmt_kic_hkdf1(void __user *argp)
+{
+       struct cmh_ioctl_kic_hkdf1 req;
+       struct vcq_cmd vcq[KDF_VCQ_MAX];
+       bool temp;
+       u64 *ref_buf = NULL;
+       void *label_buf = NULL;
+       dma_addr_t ref_dma = DMA_MAPPING_ERROR, label_dma = DMA_MAPPING_ERROR;
+       unsigned int n_cmds;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (!req.key_len || req.key_len > KDF_MAX_KEY_LEN)
+               return -EINVAL;
+       if (req.label_len > KDF_MAX_LABEL_LEN)
+               return -EINVAL;
+
+       temp = !!(req.flags & CMH_KIC_FLAG_TEMP);
+
+       /*
+        * Persistent path: a DMA buffer is needed so that CMH eSW can
+        * write back the newly allocated DS reference.
+        */
+       if (!temp) {
+               ref_buf = kmalloc_obj(*ref_buf, GFP_KERNEL);
+               if (!ref_buf)
+                       return -ENOMEM;
+               *ref_buf = 0;
+               ref_dma = cmh_dma_map_single(ref_buf, sizeof(*ref_buf),
+                                            DMA_FROM_DEVICE);
+               if (cmh_dma_map_error(ref_dma)) {
+                       kfree(ref_buf);
+                       return -ENOMEM;
+               }
+       }
+
+       /* DMA buffer for label data (CMH eSW DMA-reads it) */
+       if (req.label_len > 0) {
+               label_buf = kzalloc(req.label_len, GFP_KERNEL);
+               if (!label_buf) {
+                       ret = -ENOMEM;
+                       goto out_ref;
+               }
+               if (copy_from_user(label_buf,
+                                  u64_to_user_ptr(req.label),
+                                  req.label_len)) {
+                       ret = -EFAULT;
+                       goto out_label;
+               }
+               label_dma = cmh_dma_map_single(label_buf, req.label_len,
+                                              DMA_TO_DEVICE);
+               if (cmh_dma_map_error(label_dma)) {
+                       ret = -ENOMEM;
+                       goto out_label;
+               }
+       }
+
+       /* Build VCQ */
+       memset(vcq, 0, sizeof(vcq));
+
+       if (temp) {
+               /* Flush MBX to reset temp store -- see KIC section comment */
+               ret = cmh_tm_flush_mbx(MGMT_MBX);
+               if (ret)
+                       goto out_unmap_label;
+
+               n_cmds = 3;
+               vcq_set_header(&vcq[0], n_cmds);
+               vcq_add_kic_hkdf1(&vcq[1], SYS_REF_TEMP, req.base_key,
+                                 label_dma, req.key_len, req.label_len,
+                                 SYS_TYPE_SET(0, CORE_ID_AES));
+               vcq_add_sys_flush(&vcq[2]);
+       } else {
+               n_cmds = 4;
+               vcq_set_header(&vcq[0], n_cmds);
+               vcq_add_sys_new(&vcq[1], req.cid, ref_dma, req.key_len);
+               vcq_add_kic_hkdf1(&vcq[2], SYS_REF_LAST, req.base_key,
+                                 label_dma, req.key_len, req.label_len,
+                                 SYS_TYPE_SET(0, CORE_ID_AES));
+               vcq_add_sys_flush(&vcq[3]);
+       }
+
+       ret = cmh_tm_submit_sync_mbx(vcq, n_cmds, 1, MGMT_MBX);
+
+       /* Cleanup label DMA */
+       if (label_buf) {
+               cmh_dma_unmap_single(label_dma, req.label_len, DMA_TO_DEVICE);
+               kfree(label_buf);
+               label_buf = NULL;
+       }
+
+       if (ret)
+               goto out_ref;
+
+       if (temp) {
+               req.ref = SYS_REF_TEMP;
+               atomic_set(&mgmt_temp_dirty, 1);
+       } else {
+               cmh_dma_unmap_single(ref_dma, sizeof(*ref_buf),
+                                    DMA_FROM_DEVICE);
+               req.ref = *ref_buf;
+               kfree(ref_buf);
+               ref_buf = NULL;
+       }
+
+       if (copy_to_user(argp, &req, sizeof(req)))
+               return -EFAULT;
+
+       dev_dbg(cmh_dev(),
+               "mgmt: KIC_HKDF1 base=0x%llx len=%u flags=0x%x -> ref=0x%llx\n",
+               req.base_key, req.key_len, req.flags, req.ref);
+       return 0;
+
+out_unmap_label:
+       if (label_buf && !cmh_dma_map_error(label_dma) && label_dma)
+               cmh_dma_unmap_single(label_dma, req.label_len, DMA_TO_DEVICE);
+out_label:
+       kfree(label_buf);
+out_ref:
+       if (ref_buf) {
+               cmh_dma_unmap_single(ref_dma, sizeof(*ref_buf),
+                                    DMA_FROM_DEVICE);
+               kfree(ref_buf);
+       }
+       return ret;
+}
+
+/* -- KIC_HKDF2 ------------------------- */
+
+/*
+ * Two-step HKDF key derivation.  Same as HKDF1 but adds a salt key
+ * reference:
+ *
+ *   Step 1: HMAC(salt, base) -> PRK
+ *   Step 2: HMAC(PRK, label) -> key
+ */
+
+static int cmh_mgmt_kic_hkdf2(void __user *argp)
+{
+       struct cmh_ioctl_kic_hkdf2 req;
+       struct vcq_cmd vcq[KDF_VCQ_MAX];
+       bool temp;
+       u64 *ref_buf = NULL;
+       void *label_buf = NULL;
+       dma_addr_t ref_dma = DMA_MAPPING_ERROR, label_dma = DMA_MAPPING_ERROR;
+       unsigned int n_cmds;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (!req.key_len || req.key_len > KDF_MAX_KEY_LEN)
+               return -EINVAL;
+       if (req.label_len > KDF_MAX_LABEL_LEN)
+               return -EINVAL;
+
+       temp = !!(req.flags & CMH_KIC_FLAG_TEMP);
+
+       if (!temp) {
+               ref_buf = kmalloc_obj(*ref_buf, GFP_KERNEL);
+               if (!ref_buf)
+                       return -ENOMEM;
+               *ref_buf = 0;
+               ref_dma = cmh_dma_map_single(ref_buf, sizeof(*ref_buf),
+                                            DMA_FROM_DEVICE);
+               if (cmh_dma_map_error(ref_dma)) {
+                       kfree(ref_buf);
+                       return -ENOMEM;
+               }
+       }
+
+       if (req.label_len > 0) {
+               label_buf = kzalloc(req.label_len, GFP_KERNEL);
+               if (!label_buf) {
+                       ret = -ENOMEM;
+                       goto out_ref2;
+               }
+               if (copy_from_user(label_buf,
+                                  u64_to_user_ptr(req.label),
+                                  req.label_len)) {
+                       ret = -EFAULT;
+                       goto out_label2;
+               }
+               label_dma = cmh_dma_map_single(label_buf, req.label_len,
+                                              DMA_TO_DEVICE);
+               if (cmh_dma_map_error(label_dma)) {
+                       ret = -ENOMEM;
+                       goto out_label2;
+               }
+       }
+
+       memset(vcq, 0, sizeof(vcq));
+
+       if (temp) {
+               /* Flush MBX to reset temp store -- see KIC section comment */
+               ret = cmh_tm_flush_mbx(MGMT_MBX);
+               if (ret)
+                       goto out_unmap_label2;
+
+               n_cmds = 3;
+               vcq_set_header(&vcq[0], n_cmds);
+               vcq_add_kic_hkdf2(&vcq[1], SYS_REF_TEMP, req.base_key,
+                                 req.salt_key, label_dma,
+                                 req.key_len, req.label_len,
+                                 SYS_TYPE_SET(0, CORE_ID_AES));
+               vcq_add_sys_flush(&vcq[2]);
+       } else {
+               n_cmds = 4;
+               vcq_set_header(&vcq[0], n_cmds);
+               vcq_add_sys_new(&vcq[1], req.cid, ref_dma, req.key_len);
+               vcq_add_kic_hkdf2(&vcq[2], SYS_REF_LAST, req.base_key,
+                                 req.salt_key, label_dma,
+                                 req.key_len, req.label_len,
+                                 SYS_TYPE_SET(0, CORE_ID_AES));
+               vcq_add_sys_flush(&vcq[3]);
+       }
+
+       ret = cmh_tm_submit_sync_mbx(vcq, n_cmds, 1, MGMT_MBX);
+
+       if (label_buf) {
+               cmh_dma_unmap_single(label_dma, req.label_len, DMA_TO_DEVICE);
+               kfree(label_buf);
+               label_buf = NULL;
+       }
+
+       if (ret)
+               goto out_ref2;
+
+       if (temp) {
+               req.ref = SYS_REF_TEMP;
+               atomic_set(&mgmt_temp_dirty, 1);
+       } else {
+               cmh_dma_unmap_single(ref_dma, sizeof(*ref_buf),
+                                    DMA_FROM_DEVICE);
+               req.ref = *ref_buf;
+               kfree(ref_buf);
+               ref_buf = NULL;
+       }
+
+       if (copy_to_user(argp, &req, sizeof(req)))
+               return -EFAULT;
+
+       dev_dbg(cmh_dev(),
+               "mgmt: KIC_HKDF2 base=0x%llx salt=0x%llx len=%u flags=0x%x -> ref=0x%llx\n",
+               req.base_key, req.salt_key, req.key_len, req.flags, req.ref);
+       return 0;
+
+out_unmap_label2:
+       if (label_buf && !cmh_dma_map_error(label_dma) && label_dma)
+               cmh_dma_unmap_single(label_dma, req.label_len, DMA_TO_DEVICE);
+out_label2:
+       kfree(label_buf);
+out_ref2:
+       if (ref_buf) {
+               cmh_dma_unmap_single(ref_dma, sizeof(*ref_buf),
+                                    DMA_FROM_DEVICE);
+               kfree(ref_buf);
+       }
+       return ret;
+}
+
+/* -- KIC_AES_CMAC_KDF ------------------ */
+
+/*
+ * Derive a key using an AES-CMAC-based KDF (NIST SP 800-108 style).
+ * Base key must be 32 bytes.  Output is always non-PT (the hub driver
+ * rejects SYS_TYPE_FLAG_PT).
+ *
+ * VCQ layout matches HKDF: TEMP mode uses 3 commands, persistent uses 4.
+ */
+#define CMAC_KDF_KEY_LEN       32
+
+static int cmh_mgmt_kic_aes_cmac_kdf(void __user *argp)
+{
+       struct cmh_ioctl_kic_aes_cmac_kdf req;
+       struct vcq_cmd vcq[KDF_VCQ_MAX];
+       bool temp;
+       u64 *ref_buf = NULL;
+       void *label_buf = NULL;
+       dma_addr_t ref_dma = DMA_MAPPING_ERROR, label_dma = DMA_MAPPING_ERROR;
+       unsigned int n_cmds;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.key_len != CMAC_KDF_KEY_LEN)
+               return -EINVAL;
+       if (req.label_len > KDF_MAX_LABEL_LEN)
+               return -EINVAL;
+
+       temp = !!(req.flags & CMH_KIC_FLAG_TEMP);
+
+       if (!temp) {
+               ref_buf = kmalloc_obj(*ref_buf, GFP_KERNEL);
+               if (!ref_buf)
+                       return -ENOMEM;
+               *ref_buf = 0;
+               ref_dma = cmh_dma_map_single(ref_buf, sizeof(*ref_buf),
+                                            DMA_FROM_DEVICE);
+               if (cmh_dma_map_error(ref_dma)) {
+                       kfree(ref_buf);
+                       return -ENOMEM;
+               }
+       }
+
+       if (req.label_len > 0) {
+               label_buf = kzalloc(req.label_len, GFP_KERNEL);
+               if (!label_buf) {
+                       ret = -ENOMEM;
+                       goto out_ref_cmac;
+               }
+               if (copy_from_user(label_buf,
+                                  u64_to_user_ptr(req.label),
+                                  req.label_len)) {
+                       ret = -EFAULT;
+                       goto out_label_cmac;
+               }
+               label_dma = cmh_dma_map_single(label_buf, req.label_len,
+                                              DMA_TO_DEVICE);
+               if (cmh_dma_map_error(label_dma)) {
+                       ret = -ENOMEM;
+                       goto out_label_cmac;
+               }
+       }
+
+       memset(vcq, 0, sizeof(vcq));
+
+       if (temp) {
+               /* Flush MBX to reset temp store -- see KIC section comment */
+               ret = cmh_tm_flush_mbx(MGMT_MBX);
+               if (ret)
+                       goto out_unmap_label_cmac;
+
+               n_cmds = 3;
+               vcq_set_header(&vcq[0], n_cmds);
+               vcq_add_kic_aes_cmac_kdf(&vcq[1], SYS_REF_TEMP,
+                                        req.base_key, label_dma,
+                                        req.key_len, req.label_len,
+                                        SYS_TYPE_SET(0, CORE_ID_AES));
+               vcq_add_sys_flush(&vcq[2]);
+       } else {
+               n_cmds = 4;
+               vcq_set_header(&vcq[0], n_cmds);
+               vcq_add_sys_new(&vcq[1], req.cid, ref_dma, req.key_len);
+               vcq_add_kic_aes_cmac_kdf(&vcq[2], SYS_REF_LAST,
+                                        req.base_key, label_dma,
+                                        req.key_len, req.label_len,
+                                        SYS_TYPE_SET(0, CORE_ID_AES));
+               vcq_add_sys_flush(&vcq[3]);
+       }
+
+       ret = cmh_tm_submit_sync_mbx(vcq, n_cmds, 1, MGMT_MBX);
+
+       if (label_buf) {
+               cmh_dma_unmap_single(label_dma, req.label_len, DMA_TO_DEVICE);
+               kfree(label_buf);
+               label_buf = NULL;
+       }
+
+       if (ret)
+               goto out_ref_cmac;
+
+       if (temp) {
+               req.ref = SYS_REF_TEMP;
+               atomic_set(&mgmt_temp_dirty, 1);
+       } else {
+               cmh_dma_unmap_single(ref_dma, sizeof(*ref_buf),
+                                    DMA_FROM_DEVICE);
+               req.ref = *ref_buf;
+               kfree(ref_buf);
+               ref_buf = NULL;
+       }
+
+       if (copy_to_user(argp, &req, sizeof(req)))
+               return -EFAULT;
+
+       dev_dbg(cmh_dev(),
+               "mgmt: KIC_AES_CMAC_KDF base=0x%llx len=%u flags=0x%x -> ref=0x%llx\n",
+               req.base_key, req.key_len, req.flags, req.ref);
+       return 0;
+
+out_unmap_label_cmac:
+       if (label_buf && !cmh_dma_map_error(label_dma) && label_dma)
+               cmh_dma_unmap_single(label_dma, req.label_len, DMA_TO_DEVICE);
+out_label_cmac:
+       kfree(label_buf);
+out_ref_cmac:
+       if (ref_buf) {
+               cmh_dma_unmap_single(ref_dma, sizeof(*ref_buf),
+                                    DMA_FROM_DEVICE);
+               kfree(ref_buf);
+       }
+       return ret;
+}
+
+/* -- KIC_DKEK_DERIVE ------------------- */
+
+/*
+ * Derive a Key Encryption Key (KEK) from a KIC base key.
+ * Output is tagged CORE_ID_KIC (usable for further derivation only).
+ * host_id=0 means the caller's own host; non-zero requires management
+ * host privilege (eSW enforces this).
+ */
+#define DKEK_VCQ_MAX           4
+
+static int cmh_mgmt_kic_dkek_derive(void __user *argp)
+{
+       struct cmh_ioctl_kic_dkek_derive req;
+       struct vcq_cmd vcq[DKEK_VCQ_MAX];
+       bool temp;
+       u64 *ref_buf = NULL;
+       void *meta_buf = NULL;
+       dma_addr_t ref_dma = DMA_MAPPING_ERROR, meta_dma = DMA_MAPPING_ERROR;
+       unsigned int n_cmds;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.metadata_len > KIC_DKEK_MAX_METADATA)
+               return -EINVAL;
+
+       temp = !!(req.flags & CMH_KIC_FLAG_TEMP);
+
+       if (!temp) {
+               ref_buf = kmalloc_obj(*ref_buf, GFP_KERNEL);
+               if (!ref_buf)
+                       return -ENOMEM;
+               *ref_buf = 0;
+               ref_dma = cmh_dma_map_single(ref_buf, sizeof(*ref_buf),
+                                            DMA_FROM_DEVICE);
+               if (cmh_dma_map_error(ref_dma)) {
+                       kfree(ref_buf);
+                       return -ENOMEM;
+               }
+       }
+
+       if (req.metadata_len > 0) {
+               meta_buf = kzalloc(req.metadata_len, GFP_KERNEL);
+               if (!meta_buf) {
+                       ret = -ENOMEM;
+                       goto out_ref_dkek;
+               }
+               if (copy_from_user(meta_buf,
+                                  u64_to_user_ptr(req.metadata),
+                                  req.metadata_len)) {
+                       ret = -EFAULT;
+                       goto out_meta;
+               }
+               meta_dma = cmh_dma_map_single(meta_buf, req.metadata_len,
+                                             DMA_TO_DEVICE);
+               if (cmh_dma_map_error(meta_dma)) {
+                       ret = -ENOMEM;
+                       goto out_meta;
+               }
+       }
+
+       memset(vcq, 0, sizeof(vcq));
+
+       if (temp) {
+               /* Flush MBX to reset temp store -- see KIC section comment */
+               ret = cmh_tm_flush_mbx(MGMT_MBX);
+               if (ret)
+                       goto out_unmap_meta;
+
+               n_cmds = 3;
+               vcq_set_header(&vcq[0], n_cmds);
+               vcq_add_kic_dkek_derive(&vcq[1], SYS_REF_TEMP,
+                                       req.base_key, req.host_id,
+                                       meta_dma, req.metadata_len);
+               vcq_add_sys_flush(&vcq[2]);
+       } else {
+               n_cmds = 4;
+               vcq_set_header(&vcq[0], n_cmds);
+               vcq_add_sys_new(&vcq[1], req.cid, ref_dma, KIC_KEY_SIZE);
+               vcq_add_kic_dkek_derive(&vcq[2], SYS_REF_LAST,
+                                       req.base_key, req.host_id,
+                                       meta_dma, req.metadata_len);
+               vcq_add_sys_flush(&vcq[3]);
+       }
+
+       ret = cmh_tm_submit_sync_mbx(vcq, n_cmds, 1, MGMT_MBX);
+
+       if (meta_buf) {
+               cmh_dma_unmap_single(meta_dma, req.metadata_len,
+                                    DMA_TO_DEVICE);
+               kfree(meta_buf);
+               meta_buf = NULL;
+       }
+
+       if (ret)
+               goto out_ref_dkek;
+
+       if (temp) {
+               req.ref = SYS_REF_TEMP;
+               atomic_set(&mgmt_temp_dirty, 1);
+       } else {
+               cmh_dma_unmap_single(ref_dma, sizeof(*ref_buf),
+                                    DMA_FROM_DEVICE);
+               req.ref = *ref_buf;
+               kfree(ref_buf);
+               ref_buf = NULL;
+       }
+
+       if (copy_to_user(argp, &req, sizeof(req)))
+               return -EFAULT;
+
+       dev_dbg(cmh_dev(),
+               "mgmt: KIC_DKEK_DERIVE base=0x%llx host=%u meta_len=%u flags=0x%x -> ref=0x%llx\n",
+               req.base_key, req.host_id, req.metadata_len, req.flags,
+               req.ref);
+       return 0;
+
+out_unmap_meta:
+       if (meta_buf && !cmh_dma_map_error(meta_dma) && meta_dma)
+               cmh_dma_unmap_single(meta_dma, req.metadata_len, DMA_TO_DEVICE);
+out_meta:
+       kfree(meta_buf);
+out_ref_dkek:
+       if (ref_buf) {
+               cmh_dma_unmap_single(ref_dma, sizeof(*ref_buf),
+                                    DMA_FROM_DEVICE);
+               kfree(ref_buf);
+       }
+       return ret;
+}
+
+/* -- KEY_NEW_RANDOM -- DRBG-backed key generation --- */
+
+/*
+ * Allocate a new datastore slot and fill it with DRBG-generated
+ * random key material in a single atomic VCQ submission:
+ *
+ *   [0] SYS header(5)
+ *   [1] SYS_CMD_NEW   -- allocate DS slot (CMH eSW writes ref)
+ *   [2] DRBG_CMD_DATASTORE(SYS_REF_LAST) -- fill with random data
+ *   [3] DRBG flush -- release DRBG core ownership
+ *   [4] SYS flush
+ *
+ * The DRBG must be configured before this ioctl is used.
+ * Reuses struct cmh_ioctl_key_new (ds_type, flags, cid, len, ref).
+ */
+#define DRBG_KEYGEN_VCQ_CMDS   5
+
+static int cmh_mgmt_key_new_random(void __user *argp)
+{
+       struct cmh_ioctl_key_new req;
+       struct vcq_cmd vcq[DRBG_KEYGEN_VCQ_CMDS];
+       u64 *ref_buf;
+       dma_addr_t ref_dma;
+       u32 core_id, sys_type;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (!req.len)
+               return -EINVAL;
+
+       core_id = cmh_ds_type_to_core_id(req.ds_type);
+       if (core_id == CORE_ID_NUM)
+               return -EINVAL;
+       sys_type = SYS_TYPE_SET(req.flags, core_id);
+
+       ref_buf = kmalloc_obj(*ref_buf, GFP_KERNEL);
+       if (!ref_buf)
+               return -ENOMEM;
+
+       *ref_buf = 0;
+       ref_dma = cmh_dma_map_single(ref_buf, sizeof(*ref_buf),
+                                    DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(ref_dma)) {
+               kfree(ref_buf);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], DRBG_KEYGEN_VCQ_CMDS);
+       vcq_add_sys_new(&vcq[1], req.cid, ref_dma, req.len);
+       vcq_add_drbg_datastore(&vcq[2], SYS_REF_LAST, req.len, sys_type);
+       vcq_add_flush(&vcq[3], CORE_ID_DRBG);
+       vcq_add_sys_flush(&vcq[4]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, DRBG_KEYGEN_VCQ_CMDS, 1, MGMT_MBX);
+
+       cmh_dma_unmap_single(ref_dma, sizeof(*ref_buf), DMA_FROM_DEVICE);
+
+       if (ret) {
+               kfree(ref_buf);
+               return ret;
+       }
+
+       req.ref = *ref_buf;
+       kfree(ref_buf);
+
+       if (copy_to_user(argp, &req, sizeof(req)))
+               return -EFAULT;
+
+       dev_dbg(cmh_dev(),
+               "mgmt: KEY_NEW_RANDOM cid=0x%llx len=%u type=0x%x -> ref=0x%llx\n",
+               req.cid, req.len, sys_type, req.ref);
+       return 0;
+}
+
+#define EAC_VCQ_CMDS           3       /* header + EAC_READ + flush */
+
+static long cmh_mgmt_eac_read(void __user *argp)
+{
+       struct cmh_ioctl_eac_read req;
+       struct eac_read_rsp *rsp;
+       struct vcq_cmd vcq[EAC_VCQ_CMDS];
+       dma_addr_t rsp_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved != 0)
+               return -EINVAL;
+       if (req.__pad != 0)
+               return -EINVAL;
+
+       rsp = kmalloc_obj(*rsp, GFP_KERNEL);
+       if (!rsp)
+               return -ENOMEM;
+
+       rsp_dma = cmh_dma_map_single(rsp, sizeof(*rsp), DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rsp_dma)) {
+               kfree(rsp);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], EAC_VCQ_CMDS);
+       vcq_add_eac_read(&vcq[1], rsp_dma, sizeof(*rsp));
+       vcq_add_flush(&vcq[2], CORE_ID_EAC);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, EAC_VCQ_CMDS, 1, MGMT_MBX);
+
+       cmh_dma_unmap_single(rsp_dma, sizeof(*rsp), DMA_FROM_DEVICE);
+
+       if (ret) {
+               kfree(rsp);
+               return ret;
+       }
+
+       /* Copy response fields into ioctl struct */
+       req.mailbox_notification = rsp->mailbox_notification;
+       req.hw_error = rsp->hw_error;
+       req.hw_nmi = rsp->hw_nmi;
+       req.hw_panic = rsp->hw_panic;
+       req.safety_fatal = rsp->safety_fatal;
+       req.safety_notification = rsp->safety_notification;
+       req.sw_info0 = rsp->sw_info0;
+       req.sw_info1 = rsp->sw_info1;
+       memcpy(req.sram_bank_errors, rsp->sram_bank_errors,
+              sizeof(req.sram_bank_errors));
+       req.__pad = 0;
+
+       kfree(rsp);
+
+       if (copy_to_user(argp, &req, sizeof(req)))
+               return -EFAULT;
+
+       return 0;
+}
+
+/* -- DRBG CONFIG (management) ------------ */
+
+#define DRBG_CONFIG_VCQ_CMDS   4       /* header + RESET + CONFIG + flush */
+
+static long cmh_mgmt_drbg_config(void __user *argp)
+{
+       struct cmh_ioctl_drbg_config req;
+       struct vcq_cmd vcq[DRBG_CONFIG_VCQ_CMDS];
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved != 0)
+               return -EINVAL;
+       if (req.entropy_ratio > 3)
+               return -EINVAL;
+       if (req.security_strength != CMH_DRBG_STRENGTH_128 &&
+           req.security_strength != CMH_DRBG_STRENGTH_256)
+               return -EINVAL;
+
+       vcq_set_header(&vcq[0], DRBG_CONFIG_VCQ_CMDS);
+       vcq_add_drbg_reset(&vcq[1]);
+       vcq_add_drbg_config(&vcq[2], req.entropy_ratio,
+                           req.security_strength);
+       vcq_add_flush(&vcq[3], CORE_ID_DRBG);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, DRBG_CONFIG_VCQ_CMDS, 1, MGMT_MBX);
+       if (ret)
+               dev_warn(cmh_dev(), "mgmt: DRBG CONFIG failed (rc=%d)\n", ret);
+       else
+               dev_info(cmh_dev(), "mgmt: DRBG configured (ratio=%u strength=0x%x)\n",
+                        req.entropy_ratio, req.security_strength);
+
+       return ret;
+}
+
+/* -- ioctl dispatch ------------------------ */
+
+/*
+ * PKE, SM2, and PQC ioctls use device-internal temporary storage for
+ * intermediate results.  Residual allocations in the per-mailbox temp
+ * store (left by prior operations that targeted SYS_REF_TEMP) reduce
+ * the space available and can cause the device to return ENOMEM.
+ *
+ * Flush the mailbox before these operations to reset the temp store,
+ * but ONLY when the store is actually dirty (mgmt_temp_dirty flag).
+ * Unconditional flushing would kill in-flight command queues from
+ * concurrent callers on the same mailbox -- MBX_COMMAND_FLUSH
+ * terminates any executing queue with -EPIPE and discards all queued
+ * submissions.
+ *
+ * The conditional flush is safe: PKE/SM2/PQC ioctls do not consume
+ * SYS_REF_TEMP from a prior ioctl (unlike DS_EXPORT/DS_IMPORT which
+ * may reference a temp key produced by a preceding derivation), so
+ * clearing the temp store before them loses no needed state.
+ */
+static inline bool cmh_mgmt_needs_temp_flush(unsigned int cmd)
+{
+       unsigned int nr = _IOC_NR(cmd);
+
+       /*
+        * Range invariant: every PKE/SM2/PQC ioctl must have an NR value
+        * in the inclusive range PKE_RSA_ENC (0x10)..SM2_ENC_HASH (0x37).
+        * If a new ioctl is added outside this range, update the bounds
+        * and adjust these assertions.
+        */
+       BUILD_BUG_ON(_IOC_NR(CMH_IOCTL_PKE_RSA_ENC) != 0x10);
+       BUILD_BUG_ON(_IOC_NR(CMH_IOCTL_SM2_ENC_HASH) != 0x37);
+
+       return nr >= _IOC_NR(CMH_IOCTL_PKE_RSA_ENC) &&
+              nr <= _IOC_NR(CMH_IOCTL_SM2_ENC_HASH);
+}
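
The range check above pairs a runtime comparison with compile-time assertions on both bounds. A userspace sketch of the same idea, using C11 `_Static_assert` where the kernel uses `BUILD_BUG_ON()`, and hypothetical `DEMO_NR_*` constants standing in for the real ioctl numbers (which live in the driver's UAPI header, not shown here):

```c
#include <stdbool.h>

/* Hypothetical command numbers; the real values live in the driver's UAPI header. */
enum demo_nr {
	DEMO_NR_FIRST_HEAVY = 0x10,
	DEMO_NR_LAST_HEAVY  = 0x37,
};

/* Compile-time guard: renumbering either bound breaks the build. */
_Static_assert(DEMO_NR_FIRST_HEAVY == 0x10, "update range check: first NR moved");
_Static_assert(DEMO_NR_LAST_HEAVY == 0x37, "update range check: last NR moved");

/* True for any command number inside the inclusive range. */
static bool demo_needs_temp_flush(unsigned int nr)
{
	return nr >= DEMO_NR_FIRST_HEAVY && nr <= DEMO_NR_LAST_HEAVY;
}
```

The assertions only pin the two endpoints. They cannot detect a new command that is assigned a number outside the range, which is why the comment asks for the bounds to be updated by hand in that case.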
+
+static long cmh_mgmt_ioctl(struct file *file, unsigned int cmd,
+                          unsigned long arg)
+{
+       void __user *argp = (void __user *)arg;
+       int ret;
+
+       if (cmh_mgmt_needs_temp_flush(cmd) &&
+           atomic_xchg(&mgmt_temp_dirty, 0)) {
+               ret = cmh_tm_flush_mbx(MGMT_MBX);
+               if (ret)
+                       return ret;
+       }
+
+       switch (cmd) {
+       case CMH_IOCTL_KEY_NEW:
+               return cmh_mgmt_key_new(argp);
+       case CMH_IOCTL_KEY_WRITE:
+               return cmh_mgmt_key_write(argp);
+       case CMH_IOCTL_KEY_READ:
+               return cmh_mgmt_key_read(argp);
+       case CMH_IOCTL_KEY_FIND:
+               return cmh_mgmt_key_find(argp);
+       case CMH_IOCTL_KEY_GRANT:
+               return cmh_mgmt_key_grant(argp, false);
+       case CMH_IOCTL_KEY_DELETE:
+               return cmh_mgmt_key_grant(argp, true);
+       case CMH_IOCTL_DS_EXPORT:
+               return cmh_mgmt_ds_export(argp);
+       case CMH_IOCTL_DS_IMPORT:
+               return cmh_mgmt_ds_import(argp);
+       case CMH_IOCTL_KIC_HKDF1:
+               return cmh_mgmt_kic_hkdf1(argp);
+       case CMH_IOCTL_KIC_HKDF2:
+               return cmh_mgmt_kic_hkdf2(argp);
+       case CMH_IOCTL_KEY_NEW_RANDOM:
+               return cmh_mgmt_key_new_random(argp);
+       case CMH_IOCTL_KIC_AES_CMAC_KDF:
+               return cmh_mgmt_kic_aes_cmac_kdf(argp);
+       case CMH_IOCTL_KIC_DKEK_DERIVE:
+               return cmh_mgmt_kic_dkek_derive(argp);
+       case CMH_IOCTL_KEY_LIST:
+               return cmh_mgmt_key_list(argp);
+       case CMH_IOCTL_EAC_READ:
+               return cmh_mgmt_eac_read(argp);
+       /* PKE operations */
+       case CMH_IOCTL_PKE_RSA_ENC:
+               return cmh_mgmt_pke_rsa_enc(argp);
+       case CMH_IOCTL_PKE_RSA_DEC:
+               return cmh_mgmt_pke_rsa_dec(argp);
+       case CMH_IOCTL_PKE_RSA_CRT_DEC:
+               return cmh_mgmt_pke_rsa_crt_dec(argp);
+       case CMH_IOCTL_PKE_RSA_KEYGEN:
+               return cmh_mgmt_pke_rsa_keygen(argp);
+       case CMH_IOCTL_PKE_ECDSA_SIGN:
+               return cmh_mgmt_pke_ecdsa_sign(argp);
+       case CMH_IOCTL_PKE_ECDH:
+               return cmh_mgmt_pke_ecdh(argp);
+       case CMH_IOCTL_PKE_ECDH_KEYGEN:
+               return cmh_mgmt_pke_ecdh_keygen(argp);
+       case CMH_IOCTL_PKE_EDDSA_SIGN:
+               return cmh_mgmt_pke_eddsa_sign(argp);
+       case CMH_IOCTL_PKE_EDDSA_VERIFY:
+               return cmh_mgmt_pke_eddsa_verify(argp);
+       case CMH_IOCTL_PKE_EC_KEYGEN:
+               return cmh_mgmt_pke_ec_keygen(argp);
+       case CMH_IOCTL_PKE_EC_PUBGEN:
+               return cmh_mgmt_pke_ec_pubgen(argp);
+       case CMH_IOCTL_PKE_EDDSA_KEYGEN_SCA:
+               return cmh_mgmt_pke_eddsa_keygen_sca(argp);
+       /* SM2 operations */
+       case CMH_IOCTL_SM2_ECDH_KEYGEN:
+               return cmh_mgmt_sm2_ecdh_keygen(argp);
+       case CMH_IOCTL_SM2_ECDH:
+               return cmh_mgmt_sm2_ecdh(argp);
+       case CMH_IOCTL_SM2_DEC_POINT:
+               return cmh_mgmt_sm2_dec_point(argp);
+       case CMH_IOCTL_SM2_ENC_POINT:
+               return cmh_mgmt_sm2_enc_point(argp);
+       case CMH_IOCTL_SM2_ID_DIGEST:
+               return cmh_mgmt_sm2_id_digest(argp);
+       case CMH_IOCTL_SM2_ECDH_HASH:
+               return cmh_mgmt_sm2_ecdh_hash(argp);
+       case CMH_IOCTL_SM2_DEC_HASH:
+               return cmh_mgmt_sm2_dec_hash(argp);
+       case CMH_IOCTL_SM2_ENC_HASH:
+               return cmh_mgmt_sm2_enc_hash(argp);
+       /* PQC operations */
+       case CMH_IOCTL_ML_KEM_KEYGEN:
+               return cmh_mgmt_ml_kem_keygen(argp);
+       case CMH_IOCTL_ML_KEM_ENC:
+               return cmh_mgmt_ml_kem_enc(argp);
+       case CMH_IOCTL_ML_KEM_DEC:
+               return cmh_mgmt_ml_kem_dec(argp);
+       case CMH_IOCTL_ML_DSA_KEYGEN:
+               return cmh_mgmt_ml_dsa_keygen(argp);
+       case CMH_IOCTL_ML_DSA_SIGN:
+               return cmh_mgmt_ml_dsa_sign(argp);
+       case CMH_IOCTL_SLHDSA_KEYGEN:
+               return cmh_mgmt_slhdsa_keygen(argp);
+       case CMH_IOCTL_SLHDSA_SIGN:
+               return cmh_mgmt_slhdsa_sign(argp);
+       case CMH_IOCTL_SLHDSA_SIGN_PREHASH:
+               return cmh_mgmt_slhdsa_sign_prehash(argp);
+       /* DRBG management */
+       case CMH_IOCTL_DRBG_CONFIG:
+               return cmh_mgmt_drbg_config(argp);
+       default:
+               return -ENOTTY;
+       }
+}
+
+/* -- File operations ----------------------- */
+
+/*
+ * Capability is checked once at open time.  A privileged process may
+ * pass the resulting fd to an unprivileged helper -- this delegation
+ * model is intentional and mirrors /dev/kvm, /dev/loop-control, etc.
+ */
+static int cmh_mgmt_open(struct inode *inode, struct file *file)
+{
+       if (!capable(CAP_SYS_ADMIN))
+               return -EPERM;
+
+       return 0;
+}
+
+static const struct file_operations cmh_mgmt_fops = {
+       .owner          = THIS_MODULE,
+       .open           = cmh_mgmt_open,
+       .unlocked_ioctl = cmh_mgmt_ioctl,
+       .compat_ioctl   = compat_ptr_ioctl,
+};
+
+static struct miscdevice cmh_mgmt_dev = {
+       .minor = MISC_DYNAMIC_MINOR,
+       .name  = "cmh_mgmt",
+       .fops  = &cmh_mgmt_fops,
+       .mode  = 0660,
+};
+
+static bool cmh_mgmt_registered;
+
+/**
+ * cmh_mgmt_register() - Register the /dev/cmh_mgmt misc device
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_register(void)
+{
+       int ret;
+
+       /*
+        * ABI size guards -- catch silent layout changes at compile time.
+        * All ioctl structs use only __u32 and __u64 with explicit padding,
+        * guaranteeing identical layout on 32-bit and 64-bit (compat_ptr_ioctl).
+        */
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_key_new) != 32);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_key_write) != 40);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_key_read) != 40);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_key_find) != 32);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_key_list) != 40);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_key_grant) != 40);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_ds_export) != 40);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_ds_import) != 24);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_kic_hkdf1) != 48);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_kic_hkdf2) != 56);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_kic_aes_cmac_kdf) != 48);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_kic_dkek_derive) != 48);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_pke_rsa_enc) != 48);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_pke_rsa_dec) != 56);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_pke_rsa_crt_dec) != 56);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_pke_rsa_keygen) != 64);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_pke_ecdsa_sign) != 40);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_pke_ecdh) != 48);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_pke_ecdh_keygen) != 24);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_pke_eddsa_sign) != 40);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_pke_eddsa_verify) != 40);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_pke_ec_keygen) != 32);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_pke_ec_pubgen) != 24);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_pke_eddsa_keygen_sca) != 32);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_ml_kem_keygen) != 64);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_ml_kem_enc) != 64);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_ml_kem_dec) != 56);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_ml_dsa_keygen) != 56);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_ml_dsa_sign) != 48);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_slhdsa_keygen) != 56);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_slhdsa_sign) != 56);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_slhdsa_sign_prehash) != 64);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_sm2_ecdh_keygen) != 24);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_sm2_ecdh) != 56);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_sm2_dec_point) != 32);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_sm2_enc_point) != 40);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_sm2_id_digest) != 32);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_sm2_ecdh_hash) != 40);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_sm2_dec_hash) != 32);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_sm2_enc_hash) != 32);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_eac_read) != 64);
+       BUILD_BUG_ON(sizeof(struct cmh_ioctl_drbg_config) != 16);
+
+       ret = misc_register(&cmh_mgmt_dev);
+       if (ret) {
+               dev_err(cmh_dev(), "mgmt: misc_register failed (rc=%d)\n", ret);
+               return ret;
+       }
+
+       cmh_mgmt_registered = true;
+       dev_info(cmh_dev(), "mgmt: registered /dev/cmh_mgmt\n");
+       return 0;
+}
+
+/**
+ * cmh_mgmt_unregister() - Unregister the /dev/cmh_mgmt misc device
+ */
+void cmh_mgmt_unregister(void)
+{
+       if (!cmh_mgmt_registered)
+               return;
+
+       misc_deregister(&cmh_mgmt_dev);
+       cmh_mgmt_registered = false;
+       dev_info(cmh_dev(), "mgmt: unregistered /dev/cmh_mgmt\n");
+}
diff --git a/drivers/crypto/cmh/cmh_mgmt_pke.c b/drivers/crypto/cmh/cmh_mgmt_pke.c
new file mode 100644
index 000000000000..6954832fa8ac
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_mgmt_pke.c
@@ -0,0 +1,1100 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH -- PKE ioctl handlers for /dev/cmh_mgmt
+ *
+ * RSA encrypt/decrypt/CRT/keygen, ECDSA sign, ECDH/keygen,
+ * EdDSA sign/verify, EC keygen/pubgen.
+ *
+ * Split from cmh_mgmt.c for maintainability.
+ */
+
+#include <linux/kernel.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+#include <linux/overflow.h>
+
+#include "cmh_mgmt.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_key.h"
+#include "cmh_dma.h"
+#include "cmh_config.h"
+#include "cmh_pke.h"
+#include "cmh_pke_abi.h"
+#include "cmh_sys_abi.h"
+#include <uapi/linux/cmh_mgmt_ioctl.h>
+
+#include <crypto/utils.h>
+
+/* -- PKE ioctl helpers ------------------- */
+
+/*
+ * Maximum PKE operand size: 512 bytes (RSA 4096-bit),
+ * or 2 * 68 = 136 bytes (P-521 coordinate pair).
+ */
+#define PKE_MAX_OPERAND        512
+
+/* Validate curve ID and return coordinate length; 0 = invalid */
+static u32 cmh_pke_validate_curve(u32 curve)
+{
+       return pke_curve_clen(curve);
+}
+
+/**
+ * cmh_mgmt_pke_rsa_enc() - Handle CMH_IOCTL_PKE_RSA_ENC ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_pke_rsa_enc(void __user *argp)
+{
+       u32 pke_cid = cmh_core_default_id(CMH_CORE_PKE);
+
+       struct cmh_ioctl_pke_rsa_enc req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 n_len, e_padded;
+       u8 *e_buf, *n_buf, *m_buf, *c_buf;
+       dma_addr_t e_dma, n_dma, m_dma, c_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       if (req.bits < PKE_RSA_MIN_BITS || req.bits > PKE_RSA_MAX_BITS)
+               return -EINVAL;
+       if (!req.e_len || req.e_len > PKE_MAX_OPERAND)
+               return -EINVAL;
+
+       n_len = req.bits / 8;
+       e_padded = ALIGN(req.e_len, 4);
+
+       e_buf = kzalloc(e_padded, GFP_KERNEL);
+       n_buf = kmalloc(n_len, GFP_KERNEL);
+       m_buf = kmalloc(n_len, GFP_KERNEL);
+       c_buf = kzalloc(n_len, GFP_KERNEL);
+       if (!e_buf || !n_buf || !m_buf || !c_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       /* Right-align exponent in zero-padded buffer for DMA alignment */
+       if (copy_from_user(e_buf + e_padded - req.e_len,
+                          u64_to_user_ptr(req.e), req.e_len) ||
+           copy_from_user(n_buf, u64_to_user_ptr(req.n), n_len) ||
+           copy_from_user(m_buf, u64_to_user_ptr(req.input), n_len)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       e_dma = cmh_dma_map_single(e_buf, e_padded, DMA_TO_DEVICE);
+       n_dma = cmh_dma_map_single(n_buf, n_len, DMA_TO_DEVICE);
+       m_dma = cmh_dma_map_single(m_buf, n_len, DMA_TO_DEVICE);
+       c_dma = cmh_dma_map_single(c_buf, n_len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(e_dma) || cmh_dma_map_error(n_dma) ||
+           cmh_dma_map_error(m_dma) || cmh_dma_map_error(c_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_rsa_enc(&vcq[1], pke_cid, req.bits, e_padded,
+                           e_dma, n_dma, m_dma, c_dma, PKE_SWAP_FLAGS);
+       vcq_add_pke_flush(&vcq[2], pke_cid);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(c_dma))
+               cmh_dma_unmap_single(c_dma, n_len, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(m_dma))
+               cmh_dma_unmap_single(m_dma, n_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(n_dma))
+               cmh_dma_unmap_single(n_dma, n_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(e_dma))
+               cmh_dma_unmap_single(e_dma, e_padded, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.output), c_buf, n_len))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree(c_buf);
+       kfree_sensitive(m_buf);
+       kfree(n_buf);
+       kfree(e_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_pke_rsa_dec() - Handle CMH_IOCTL_PKE_RSA_DEC ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_pke_rsa_dec(void __user *argp)
+{
+       u32 pke_cid = cmh_core_default_id(CMH_CORE_PKE);
+
+       struct cmh_ioctl_pke_rsa_dec req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 n_len, e_padded;
+       u8 *e_buf, *n_buf, *c_buf, *m_buf;
+       dma_addr_t e_dma, n_dma, c_dma, m_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       if (req.bits < PKE_RSA_MIN_BITS || req.bits > PKE_RSA_MAX_BITS)
+               return -EINVAL;
+       if (!req.e_len || req.e_len > PKE_MAX_OPERAND)
+               return -EINVAL;
+
+       n_len = req.bits / 8;
+       e_padded = ALIGN(req.e_len, 4);
+
+       e_buf = kzalloc(e_padded, GFP_KERNEL);
+       n_buf = kmalloc(n_len, GFP_KERNEL);
+       c_buf = kmalloc(n_len, GFP_KERNEL);
+       m_buf = kzalloc(n_len, GFP_KERNEL);
+       if (!e_buf || !n_buf || !c_buf || !m_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       /* Right-align exponent in zero-padded buffer for DMA alignment */
+       if (copy_from_user(e_buf + e_padded - req.e_len,
+                          u64_to_user_ptr(req.e), req.e_len) ||
+           copy_from_user(n_buf, u64_to_user_ptr(req.n), n_len) ||
+           copy_from_user(c_buf, u64_to_user_ptr(req.input), n_len)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       e_dma = cmh_dma_map_single(e_buf, e_padded, DMA_TO_DEVICE);
+       n_dma = cmh_dma_map_single(n_buf, n_len, DMA_TO_DEVICE);
+       c_dma = cmh_dma_map_single(c_buf, n_len, DMA_TO_DEVICE);
+       m_dma = cmh_dma_map_single(m_buf, n_len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(e_dma) || cmh_dma_map_error(n_dma) ||
+           cmh_dma_map_error(c_dma) || cmh_dma_map_error(m_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_rsa_dec(&vcq[1], pke_cid, req.bits, e_padded,
+                           e_dma, n_dma, c_dma, m_dma, req.key_ref,
+                           PKE_SWAP_FLAGS);
+       vcq_add_pke_flush(&vcq[2], pke_cid);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(m_dma))
+               cmh_dma_unmap_single(m_dma, n_len, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(c_dma))
+               cmh_dma_unmap_single(c_dma, n_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(n_dma))
+               cmh_dma_unmap_single(n_dma, n_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(e_dma))
+               cmh_dma_unmap_single(e_dma, e_padded, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.output), m_buf, n_len))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree_sensitive(m_buf);
+       kfree(c_buf);
+       kfree(n_buf);
+       kfree(e_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_pke_rsa_crt_dec() - Handle CMH_IOCTL_PKE_RSA_CRT_DEC ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_pke_rsa_crt_dec(void __user *argp)
+{
+       u32 pke_cid = cmh_core_default_id(CMH_CORE_PKE);
+
+       struct cmh_ioctl_pke_rsa_crt_dec req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 n_len, e_padded;
+       u8 *e_buf, *n_buf, *c_buf, *m_buf;
+       dma_addr_t e_dma, n_dma, c_dma, m_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       if (req.bits < PKE_RSA_MIN_BITS || req.bits > PKE_RSA_MAX_BITS)
+               return -EINVAL;
+       if (!req.e_len || req.e_len > PKE_MAX_OPERAND)
+               return -EINVAL;
+
+       n_len = req.bits / 8;
+       e_padded = ALIGN(req.e_len, 4);
+
+       e_buf = kzalloc(e_padded, GFP_KERNEL);
+       n_buf = kmalloc(n_len, GFP_KERNEL);
+       c_buf = kmalloc(n_len, GFP_KERNEL);
+       m_buf = kzalloc(n_len, GFP_KERNEL);
+       if (!e_buf || !n_buf || !c_buf || !m_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       /* Right-align exponent in zero-padded buffer for DMA alignment */
+       if (copy_from_user(e_buf + e_padded - req.e_len,
+                          u64_to_user_ptr(req.e), req.e_len) ||
+           copy_from_user(n_buf, u64_to_user_ptr(req.n), n_len) ||
+           copy_from_user(c_buf, u64_to_user_ptr(req.input), n_len)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       e_dma = cmh_dma_map_single(e_buf, e_padded, DMA_TO_DEVICE);
+       n_dma = cmh_dma_map_single(n_buf, n_len, DMA_TO_DEVICE);
+       c_dma = cmh_dma_map_single(c_buf, n_len, DMA_TO_DEVICE);
+       m_dma = cmh_dma_map_single(m_buf, n_len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(e_dma) || cmh_dma_map_error(n_dma) ||
+           cmh_dma_map_error(c_dma) || cmh_dma_map_error(m_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_rsa_crt_dec(&vcq[1], pke_cid, req.bits, e_padded,
+                               e_dma, n_dma, c_dma, m_dma, req.crt_ref,
+                               PKE_SWAP_FLAGS);
+       vcq_add_pke_flush(&vcq[2], pke_cid);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(m_dma))
+               cmh_dma_unmap_single(m_dma, n_len, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(c_dma))
+               cmh_dma_unmap_single(c_dma, n_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(n_dma))
+               cmh_dma_unmap_single(n_dma, n_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(e_dma))
+               cmh_dma_unmap_single(e_dma, e_padded, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.output), m_buf, n_len))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree_sensitive(m_buf);
+       kfree(c_buf);
+       kfree(n_buf);
+       kfree(e_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_pke_rsa_keygen() - Handle CMH_IOCTL_PKE_RSA_KEYGEN ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_pke_rsa_keygen(void __user *argp)
+{
+       u32 pke_cid = cmh_core_default_id(CMH_CORE_PKE);
+
+       struct cmh_ioctl_pke_rsa_keygen req;
+       /*
+        * When has_crt, we use a two-VCQ approach (CRI pattern):
+        *   VCQ #1: header + SYS_NEW(d) + SYS_NEW(crt) + SYS_FLUSH  (4 slots)
+        *   VCQ #2: header + RSA_KEYGEN + PKE_FLUSH + SYS_FLUSH       (4 slots)
+        * Without CRT, single VCQ:
+        *   header + SYS_NEW(d) + RSA_KEYGEN + PKE_FLUSH + SYS_FLUSH  (5 slots)
+        */
+       struct vcq_cmd vcq[5];
+       u32 n_len, e_padded, key_flags, d_ds_len, crt_ds_len;
+       u8 *e_buf, *n_buf;
+       u64 *d_ref_buf, *crt_ref_buf;
+       dma_addr_t e_dma, n_dma, d_ref_dma, crt_ref_dma;
+       int idx, ret;
+       bool has_crt, is_sca;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.bits < PKE_RSA_MIN_BITS || req.bits > PKE_RSA_MAX_BITS)
+               return -EINVAL;
+       if (!req.e_len || req.e_len > PKE_MAX_OPERAND)
+               return -EINVAL;
+       if (req.flags & ~CMH_FLAG_MASK)
+               return -EINVAL;
+
+       n_len = req.bits / 8;
+       has_crt = (req.crt_cid != 0);
+       e_padded = ALIGN(req.e_len, 4);
+       key_flags = req.flags & CMH_FLAG_MASK;
+       is_sca = !!(req.flags & CMH_FLAG_SCA);
+
+       /*
+        * SCA keys are stored in 2 shares -- DS allocation must be enlarged.
+        * CRI reference formulas: cmh_pke_rsa_private_key_size().
+        */
+       if (is_sca) {
+               d_ds_len = n_len * 2;
+               crt_ds_len = (7 + n_len / 2) * 4;
+       } else {
+               d_ds_len = n_len;
+               crt_ds_len = 5 * (n_len / 2);
+       }
+
+       e_buf = kzalloc(e_padded, GFP_KERNEL);
+       n_buf = kzalloc(n_len, GFP_KERNEL);
+       d_ref_buf = kzalloc_obj(u64, GFP_KERNEL);
+       crt_ref_buf = kzalloc_obj(u64, GFP_KERNEL);
+       if (!e_buf || !n_buf || !d_ref_buf || !crt_ref_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(e_buf + e_padded - req.e_len,
+                          u64_to_user_ptr(req.e), req.e_len)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       e_dma = cmh_dma_map_single(e_buf, e_padded, DMA_TO_DEVICE);
+       n_dma = cmh_dma_map_single(n_buf, n_len, DMA_FROM_DEVICE);
+       d_ref_dma = cmh_dma_map_single(d_ref_buf, sizeof(u64), DMA_FROM_DEVICE);
+       crt_ref_dma = cmh_dma_map_single(crt_ref_buf, sizeof(u64),
+                                        DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(e_dma) || cmh_dma_map_error(n_dma) ||
+           cmh_dma_map_error(d_ref_dma) || cmh_dma_map_error(crt_ref_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       if (has_crt) {
+               /*
+                * Two-VCQ approach (CRI pattern): SYS_REF_LAST can only
+                * refer to the most recently created DS object.  When we
+                * need both d and crt refs, we must first allocate DS
+                * objects, read back the opaque refs, then pass them by
+                * value in the keygen VCQ.
+                *
+                * VCQ #1: allocate both DS objects.
+                */
+               idx = 0;
+               vcq_set_header(&vcq[idx++], 4);
+               vcq_add_sys_new(&vcq[idx++], req.d_cid, d_ref_dma, d_ds_len);
+               vcq_add_sys_new(&vcq[idx++], req.crt_cid, crt_ref_dma,
+                               crt_ds_len);
+               vcq_add_sys_flush(&vcq[idx++]);
+
+               ret = cmh_tm_submit_sync_mbx(vcq, 4, 1, MGMT_MBX);
+               if (ret)
+                       goto out_unmap;
+
+               /* Unmap (implies CPU sync) to read back the opaque refs */
+               cmh_dma_unmap_single(d_ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+               cmh_dma_unmap_single(crt_ref_dma, sizeof(u64),
+                                    DMA_FROM_DEVICE);
+               d_ref_dma = 0;
+               crt_ref_dma = 0;
+
+               /*
+                * VCQ #2: keygen with resolved refs.
+                */
+               idx = 0;
+               memset(vcq, 0, sizeof(vcq));
+               vcq_set_header(&vcq[idx++], 4);
+
+               vcq[idx].magic = VCQ_CMD_MAGIC;
+               vcq[idx].id = VCQ_CMD_ID(pke_cid, PKE_SWAP_FLAGS, 1,
+                                        PKE_CMD_RSA_KEYGEN);
+               vcq[idx].hwc.pke.cmd_rsa_keygen.bits = req.bits;
+               vcq[idx].hwc.pke.cmd_rsa_keygen.e = e_dma;
+               vcq[idx].hwc.pke.cmd_rsa_keygen.n = n_dma;
+               vcq[idx].hwc.pke.cmd_rsa_keygen.d = *d_ref_buf;
+               vcq[idx].hwc.pke.cmd_rsa_keygen.d_type =
+                       SYS_TYPE_SET(key_flags, CORE_ID_PKE);
+               vcq[idx].hwc.pke.cmd_rsa_keygen.crt = *crt_ref_buf;
+               vcq[idx].hwc.pke.cmd_rsa_keygen.crt_type =
+                       SYS_TYPE_SET(key_flags, CORE_ID_PKE);
+               idx++;
+
+               vcq_add_pke_flush(&vcq[idx++], pke_cid);
+               vcq_add_sys_flush(&vcq[idx++]);
+
+               ret = cmh_tm_submit_sync_tmo(vcq, 4, 1, MGMT_MBX,
+                                            cmh_tm_slow_op_timeout_jiffies());
+       } else {
+               /*
+                * Single-VCQ: only d, so SYS_REF_LAST is unambiguous.
+                */
+               idx = 0;
+               vcq_set_header(&vcq[idx++], 5);
+               vcq_add_sys_new(&vcq[idx++], req.d_cid, d_ref_dma, d_ds_len);
+
+               vcq[idx].magic = VCQ_CMD_MAGIC;
+               vcq[idx].id = VCQ_CMD_ID(pke_cid, PKE_SWAP_FLAGS, 1,
+                                        PKE_CMD_RSA_KEYGEN);
+               vcq[idx].hwc.pke.cmd_rsa_keygen.bits = req.bits;
+               vcq[idx].hwc.pke.cmd_rsa_keygen.e = e_dma;
+               vcq[idx].hwc.pke.cmd_rsa_keygen.n = n_dma;
+               vcq[idx].hwc.pke.cmd_rsa_keygen.d = SYS_REF_LAST;
+               vcq[idx].hwc.pke.cmd_rsa_keygen.d_type =
+                       SYS_TYPE_SET(key_flags, CORE_ID_PKE);
+               vcq[idx].hwc.pke.cmd_rsa_keygen.crt = SYS_REF_NONE;
+               vcq[idx].hwc.pke.cmd_rsa_keygen.crt_type = 0;
+               idx++;
+
+               vcq_add_pke_flush(&vcq[idx++], pke_cid);
+               vcq_add_sys_flush(&vcq[idx++]);
+
+               ret = cmh_tm_submit_sync_tmo(vcq, 5, 1, MGMT_MBX,
+                                            cmh_tm_slow_op_timeout_jiffies());
+       }
+
+out_unmap:
+       if (crt_ref_dma && !cmh_dma_map_error(crt_ref_dma))
+               cmh_dma_unmap_single(crt_ref_dma, sizeof(u64),
+                                    DMA_FROM_DEVICE);
+       if (d_ref_dma && !cmh_dma_map_error(d_ref_dma))
+               cmh_dma_unmap_single(d_ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(n_dma))
+               cmh_dma_unmap_single(n_dma, n_len, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(e_dma))
+               cmh_dma_unmap_single(e_dma, e_padded, DMA_TO_DEVICE);
+
+       if (!ret) {
+               /* Copy generated modulus and refs back */
+               if (copy_to_user(u64_to_user_ptr(req.n), n_buf, n_len)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+               req.d_ref = *d_ref_buf;
+               req.crt_ref = has_crt ? *crt_ref_buf : 0;
+               if (copy_to_user(argp, &req, sizeof(req)))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree(crt_ref_buf);
+       kfree(d_ref_buf);
+       kfree(n_buf);
+       kfree(e_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_pke_ecdsa_sign() - Handle CMH_IOCTL_PKE_ECDSA_SIGN ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_pke_ecdsa_sign(void __user *argp)
+{
+       u32 pke_cid = cmh_core_default_id(CMH_CORE_PKE);
+
+       struct cmh_ioctl_pke_ecdsa_sign req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 clen, sig_len, dig_map_len;
+       u8 *dig_buf, *sig_buf;
+       dma_addr_t dig_dma, sig_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       clen = cmh_pke_validate_curve(req.curve);
+       if (!clen || !req.digest_len ||
+           req.digest_len > CMH_MGMT_MAX_DATA_LEN)
+               return -EINVAL;
+
+       sig_len = 2 * clen;
+
+       /*
+        * eSW needs digest_len >= clen; shorter hashes get trailing zeros.
+        */
+       dig_map_len = max_t(u32, req.digest_len, clen);
+
+       dig_buf = kzalloc(dig_map_len, GFP_KERNEL);
+       sig_buf = kzalloc(sig_len, GFP_KERNEL);
+       if (!dig_buf || !sig_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(dig_buf, u64_to_user_ptr(req.digest),
+                          req.digest_len)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       dig_dma = cmh_dma_map_single(dig_buf, dig_map_len, DMA_TO_DEVICE);
+       sig_dma = cmh_dma_map_single(sig_buf, sig_len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(dig_dma) || cmh_dma_map_error(sig_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_ecdsa_sign(&vcq[1], pke_cid, req.curve, clen,
+                              dig_dma, sig_dma, req.key_ref,
+                              dig_map_len, pke_swap_flags(req.curve));
+       vcq_add_pke_flush(&vcq[2], pke_cid);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, sig_len, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(dig_dma))
+               cmh_dma_unmap_single(dig_dma, dig_map_len, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.signature),
+                                sig_buf, sig_len))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree(sig_buf);
+       kfree(dig_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_pke_ecdh() - Handle CMH_IOCTL_PKE_ECDH ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_pke_ecdh(void __user *argp)
+{
+       u32 pke_cid = cmh_core_default_id(CMH_CORE_PKE);
+
+       struct cmh_ioctl_pke_ecdh req;
+       /* Phase 1: hdr + sys_new + pke_ecdh + pke_flush; reused for Phase 2 */
+       struct vcq_cmd vcq[4];
+       u32 clen, swap, ss_type;
+       u8 *peer_buf, *ss_buf;
+       u64 *ref_buf;
+       dma_addr_t peer_dma, ss_dma, ref_dma;
+       int ret, idx;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       clen = cmh_pke_validate_curve(req.curve);
+       if (!clen)
+               return -EINVAL;
+
+       swap = PKE_SWAP_FLAGS;
+       ss_type = SYS_TYPE_SET(SYS_TYPE_FLAG_PT, CORE_ID_PKE);
+
+       peer_buf = kmalloc(clen, GFP_KERNEL);
+       ss_buf = kzalloc(clen, GFP_KERNEL);
+       ref_buf = kzalloc_obj(u64, GFP_KERNEL);
+       if (!peer_buf || !ss_buf || !ref_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(peer_buf, u64_to_user_ptr(req.peer_key_x), clen)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       peer_dma = cmh_dma_map_single(peer_buf, clen, DMA_TO_DEVICE);
+       ss_dma = cmh_dma_map_single(ss_buf, clen, DMA_FROM_DEVICE);
+       ref_dma = cmh_dma_map_single(ref_buf, sizeof(u64), DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(peer_dma) || cmh_dma_map_error(ss_dma) ||
+           cmh_dma_map_error(ref_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       idx = 0;
+       vcq_set_header(&vcq[idx++], 4);
+       vcq_add_sys_new(&vcq[idx++], 0, ref_dma, clen);
+       vcq_add_pke_ecdh(&vcq[idx++], pke_cid, req.curve, clen, clen,
+                        ss_type, peer_dma, req.key_ref,
+                        SYS_REF_LAST, swap);
+       vcq_add_pke_flush(&vcq[idx++], pke_cid);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, 4, 1, MGMT_MBX);
+       if (ret)
+               goto out_unmap;
+
+       /* Sync bounce buffer so CPU sees the DMA-written ref */
+       cmh_dma_sync_for_cpu(ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+
+       /* Phase 2: extract shared secret from DS via actual ref */
+       vcq_set_header(&vcq[0], 3);
+       vcq_add_sys_data(&vcq[1], *ref_buf, ss_dma, clen);
+       vcq[1].id |= pke_swap_flags(req.curve);
+       vcq_add_sys_flush(&vcq[2]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, 3, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(ref_dma))
+               cmh_dma_unmap_single(ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(ss_dma))
+               cmh_dma_unmap_single(ss_dma, clen, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(peer_dma))
+               cmh_dma_unmap_single(peer_dma, clen, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.output), ss_buf, clen))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree(ref_buf);
+       kfree_sensitive(ss_buf);
+       kfree(peer_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_pke_ecdh_keygen() - Handle CMH_IOCTL_PKE_ECDH_KEYGEN ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_pke_ecdh_keygen(void __user *argp)
+{
+       u32 pke_cid = cmh_core_default_id(CMH_CORE_PKE);
+
+       struct cmh_ioctl_pke_ecdh_keygen req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 clen, out_len;
+       u8 *pkx_buf;
+       dma_addr_t pkx_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       clen = cmh_pke_validate_curve(req.curve);
+       if (!clen)
+               return -EINVAL;
+
+       /*
+        * ECDH_KEYGEN always outputs both X and Y coordinates
+        * (2 * clen bytes total) even though only X is useful for
+        * the ECDH exchange.  Allocate the full output size to avoid
+        * a DMA buffer overflow, but copy only X back to userspace.
+        */
+       out_len = 2 * clen;
+
+       pkx_buf = kzalloc(out_len, GFP_KERNEL);
+       if (!pkx_buf)
+               return -ENOMEM;
+
+       pkx_dma = cmh_dma_map_single(pkx_buf, out_len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(pkx_dma)) {
+               kfree(pkx_buf);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_ecdh_keygen(&vcq[1], pke_cid, req.curve, clen,
+                               pkx_dma, req.key_ref,
+                               PKE_SWAP_FLAGS);
+       vcq_add_pke_flush(&vcq[2], pke_cid);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+       cmh_dma_unmap_single(pkx_dma, out_len, DMA_FROM_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.public_key_x),
+                                pkx_buf, clen))
+                       ret = -EFAULT;
+       }
+
+       kfree(pkx_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_pke_eddsa_sign() - Handle CMH_MGMT_IOC_PKE_EDDSA_SIGN ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_pke_eddsa_sign(void __user *argp)
+{
+       u32 pke_cid = cmh_core_default_id(CMH_CORE_PKE);
+
+       struct cmh_ioctl_pke_eddsa_sign req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 klen, sig_len;
+       u8 *msg_buf, *sig_buf;
+       dma_addr_t msg_dma, sig_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       if (!cmh_pke_validate_curve(req.curve) || !req.digest_len ||
+           req.digest_len > CMH_MGMT_MAX_DATA_LEN)
+               return -EINVAL;
+       if (!pke_curve_is_edwards(req.curve))
+               return -EINVAL;
+
+       klen = pke_eddsa_key_len(req.curve);
+       sig_len = 2 * klen;
+
+       msg_buf = kmalloc(req.digest_len, GFP_KERNEL);
+       sig_buf = kzalloc(sig_len, GFP_KERNEL);
+       if (!msg_buf || !sig_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(msg_buf, u64_to_user_ptr(req.digest),
+                          req.digest_len)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       msg_dma = cmh_dma_map_single(msg_buf, req.digest_len, DMA_TO_DEVICE);
+       sig_dma = cmh_dma_map_single(sig_buf, sig_len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(msg_dma) || cmh_dma_map_error(sig_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_eddsa_sign(&vcq[1], pke_cid, req.curve, klen,
+                              msg_dma, sig_dma, req.key_ref,
+                              req.digest_len, pke_swap_flags(req.curve));
+       vcq_add_pke_flush(&vcq[2], pke_cid);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, sig_len, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(msg_dma))
+               cmh_dma_unmap_single(msg_dma, req.digest_len, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.signature),
+                                sig_buf, sig_len))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree(sig_buf);
+       kfree(msg_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_pke_eddsa_verify() - Handle CMH_MGMT_IOC_PKE_EDDSA_VERIFY ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_pke_eddsa_verify(void __user *argp)
+{
+       u32 pke_cid = cmh_core_default_id(CMH_CORE_PKE);
+
+       struct cmh_ioctl_pke_eddsa_verify req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 clen, klen, sig_len;
+       u8 *msg_buf, *sig_buf, *pky_buf, *rp_buf;
+       dma_addr_t msg_dma, sig_dma, pky_dma, rp_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       clen = cmh_pke_validate_curve(req.curve);
+       if (!clen || !req.digest_len ||
+           req.digest_len > CMH_MGMT_MAX_DATA_LEN)
+               return -EINVAL;
+       if (!pke_curve_is_edwards(req.curve))
+               return -EINVAL;
+
+       klen = pke_eddsa_key_len(req.curve);
+       sig_len = 2 * klen;
+
+       msg_buf = kmalloc(req.digest_len, GFP_KERNEL);
+       sig_buf = kmalloc(sig_len, GFP_KERNEL);
+       pky_buf = kmalloc(klen, GFP_KERNEL);
+       rp_buf = kzalloc(clen, GFP_KERNEL);
+       if (!msg_buf || !sig_buf || !pky_buf || !rp_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(msg_buf, u64_to_user_ptr(req.digest),
+                          req.digest_len) ||
+           copy_from_user(sig_buf, u64_to_user_ptr(req.signature),
+                          sig_len) ||
+           copy_from_user(pky_buf, u64_to_user_ptr(req.public_key_y),
+                          klen)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       msg_dma = cmh_dma_map_single(msg_buf, req.digest_len, DMA_TO_DEVICE);
+       sig_dma = cmh_dma_map_single(sig_buf, sig_len, DMA_TO_DEVICE);
+       pky_dma = cmh_dma_map_single(pky_buf, klen, DMA_TO_DEVICE);
+       rp_dma = cmh_dma_map_single(rp_buf, clen, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(msg_dma) || cmh_dma_map_error(sig_dma) ||
+           cmh_dma_map_error(pky_dma) || cmh_dma_map_error(rp_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_eddsa_verify(&vcq[1], pke_cid, req.curve, req.digest_len,
+                                pky_dma, msg_dma, sig_dma, rp_dma,
+                                pke_swap_flags(req.curve));
+       vcq_add_pke_flush(&vcq[2], pke_cid);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(rp_dma))
+               cmh_dma_unmap_single(rp_dma, clen, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(pky_dma))
+               cmh_dma_unmap_single(pky_dma, klen, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, sig_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(msg_dma))
+               cmh_dma_unmap_single(msg_dma, req.digest_len, DMA_TO_DEVICE);
+
+out_free:
+       kfree(rp_buf);
+       kfree(pky_buf);
+       kfree(sig_buf);
+       kfree(msg_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_pke_ec_keygen() - Handle CMH_MGMT_IOC_PKE_EC_KEYGEN ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_pke_ec_keygen(void __user *argp)
+{
+       u32 pke_cid = cmh_core_default_id(CMH_CORE_PKE);
+
+       struct cmh_ioctl_pke_ec_keygen req;
+       /* header + SYS_NEW + ECDSA_KEYGEN + flush_pke + flush_sys */
+       struct vcq_cmd vcq[5];
+       u32 clen, key_flags, ds_len;
+       u64 *ref_buf;
+       dma_addr_t ref_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       if (req.flags & ~CMH_FLAG_MASK)
+               return -EINVAL;
+       clen = cmh_pke_validate_curve(req.curve);
+       if (!clen)
+               return -EINVAL;
+
+       key_flags = req.flags & CMH_FLAG_MASK;
+       /* SCA keys are stored in 2 shares -- allocate double the curve length */
+       ds_len = (req.flags & CMH_FLAG_SCA) ? clen * 2 : clen;
+
+       ref_buf = kzalloc_obj(u64, GFP_KERNEL);
+       if (!ref_buf)
+               return -ENOMEM;
+
+       ref_dma = cmh_dma_map_single(ref_buf, sizeof(u64), DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(ref_dma)) {
+               kfree(ref_buf);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], 5);
+       vcq_add_sys_new(&vcq[1], req.cid, ref_dma, ds_len);
+       vcq_add_pke_ecdsa_keygen(&vcq[2], pke_cid, req.curve, clen,
+                                SYS_REF_LAST,
+                                SYS_TYPE_SET(key_flags, CORE_ID_PKE),
+                                pke_swap_flags(req.curve));
+       vcq_add_pke_flush(&vcq[3], pke_cid);
+       vcq_add_sys_flush(&vcq[4]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, 5, 1, MGMT_MBX);
+
+       cmh_dma_unmap_single(ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+
+       if (!ret) {
+               req.ref = *ref_buf;
+               if (copy_to_user(argp, &req, sizeof(req)))
+                       ret = -EFAULT;
+       }
+
+       kfree(ref_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_pke_ec_pubgen() - Handle CMH_MGMT_IOC_PKE_EC_PUBGEN ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_pke_ec_pubgen(void __user *argp)
+{
+       u32 pke_cid = cmh_core_default_id(CMH_CORE_PKE);
+
+       struct cmh_ioctl_pke_ec_pubgen req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 clen, pk_len;
+       u8 *pk_buf;
+       dma_addr_t pk_dma;
+       bool is_ed;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       clen = cmh_pke_validate_curve(req.curve);
+       if (!clen)
+               return -EINVAL;
+
+       is_ed = pke_curve_is_edwards(req.curve);
+       pk_len = is_ed ? pke_eddsa_key_len(req.curve) : 2 * clen;
+
+       pk_buf = kzalloc(pk_len, GFP_KERNEL);
+       if (!pk_buf)
+               return -ENOMEM;
+
+       pk_dma = cmh_dma_map_single(pk_buf, pk_len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(pk_dma)) {
+               kfree(pk_buf);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       if (is_ed)
+               vcq_add_pke_eddsa_pubgen(&vcq[1], pke_cid, req.curve,
+                                        pk_len,
+                                        pk_dma, req.key_ref,
+                                        pke_swap_flags(req.curve));
+       else
+               vcq_add_pke_ecdsa_pubgen(&vcq[1], pke_cid, req.curve, clen,
+                                        pk_dma, req.key_ref,
+                                        pke_swap_flags(req.curve));
+       vcq_add_pke_flush(&vcq[2], pke_cid);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+       cmh_dma_unmap_single(pk_dma, pk_len, DMA_FROM_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.public_key),
+                                pk_buf, pk_len))
+                       ret = -EFAULT;
+       }
+
+       kfree(pk_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_pke_eddsa_keygen_sca() - Handle CMH_MGMT_IOC_PKE_EDDSA_KEYGEN_SCA ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_pke_eddsa_keygen_sca(void __user *argp)
+{
+       u32 pke_cid = cmh_core_default_id(CMH_CORE_PKE);
+
+       struct cmh_ioctl_pke_eddsa_keygen_sca req;
+       /* header + SYS_NEW + EDDSA_KEYGEN_SCA + flush_pke + flush_sys */
+       struct vcq_cmd vcq[5];
+       u64 *ref_buf;
+       dma_addr_t ref_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       /* EdDSA SCA keygen is only supported for Ed448 */
+       if (req.curve != PKE_CURVE_448)
+               return -EINVAL;
+
+       ref_buf = kzalloc_obj(u64, GFP_KERNEL);
+       if (!ref_buf)
+               return -ENOMEM;
+
+       ref_dma = cmh_dma_map_single(ref_buf, sizeof(u64), DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(ref_dma)) {
+               kfree(ref_buf);
+               return -ENOMEM;
+       }
+
+       vcq_set_header(&vcq[0], 5);
+       vcq_add_sys_new(&vcq[1], req.cid, ref_dma, PKE_ED448_SK_SCA_LEN);
+       vcq_add_pke_eddsa_keygen_sca(&vcq[2], pke_cid, req.curve, req.key_ref,
+                                    SYS_REF_LAST);
+       vcq_add_pke_flush(&vcq[3], pke_cid);
+       vcq_add_sys_flush(&vcq[4]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, 5, 1, MGMT_MBX);
+
+       cmh_dma_unmap_single(ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+
+       if (!ret) {
+               req.sca_ref = *ref_buf;
+               if (copy_to_user(argp, &req, sizeof(req)))
+                       ret = -EFAULT;
+       }
+
+       kfree(ref_buf);
+       return ret;
+}
diff --git a/drivers/crypto/cmh/cmh_mgmt_pqc.c b/drivers/crypto/cmh/cmh_mgmt_pqc.c
new file mode 100644
index 000000000000..db479e80326b
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_mgmt_pqc.c
@@ -0,0 +1,1279 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH -- PQC ioctl handlers for /dev/cmh_mgmt
+ *
+ * ML-KEM keygen/encapsulate/decapsulate, ML-DSA keygen/sign,
+ * SLH-DSA keygen/sign (pure + prehash).
+ *
+ * Split from cmh_mgmt.c for maintainability.
+ */
+
+#include <linux/kernel.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+#include <linux/overflow.h>
+
+#include "cmh_mgmt.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_key.h"
+#include "cmh_dma.h"
+#include "cmh_config.h"
+#include "cmh_pqc.h"
+#include "cmh_qse_abi.h"
+#include "cmh_sys_abi.h"
+#include <uapi/linux/cmh_mgmt_ioctl.h>
+
+#include <crypto/utils.h>
+
+/* -- PQC -- ML-KEM -- */
+
+/**
+ * cmh_mgmt_ml_kem_keygen() - Handle CMH_MGMT_IOC_ML_KEM_KEYGEN ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_ml_kem_keygen(void __user *argp)
+{
+       u32 qse_cid = cmh_core_default_id(CMH_CORE_QSE);
+
+       struct cmh_ioctl_ml_kem_keygen req;
+       struct vcq_cmd vcq[QSE_VCQ_CMDS_MAX];
+       u32 ek_len, dk_len, seed_len, key_flags;
+       u32 qse_flags = 0;
+       bool masked, ds_ref, hw_rng;
+       u8 *seed_buf = NULL, *z_buf = NULL, *ek_buf, *dk_buf = NULL;
+       u64 *ref_buf = NULL;
+       dma_addr_t seed_dma = DMA_MAPPING_ERROR, z_dma = DMA_MAPPING_ERROR;
+       dma_addr_t ek_dma, dk_dma = DMA_MAPPING_ERROR, ref_dma = DMA_MAPPING_ERROR;
+       int ret, idx;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       if (ml_kem_k_idx(req.k) < 0)
+               return -EINVAL;
+       if (req.flags & ~(CMH_QSE_FLAG_MASK | CMH_FLAG_MASK))
+               return -EINVAL;
+
+       masked = !!(req.flags & CMH_QSE_FLAG_MASKED);
+       ds_ref = !!(req.flags & CMH_QSE_FLAG_DS_REF);
+       hw_rng = !!(req.flags & CMH_QSE_FLAG_HW_RNG);
+
+       /*
+        * QSE keys only support PT storage -- the eSW dec/sign paths
+        * hardcode SYS_TYPE_FLAG_PT when reading the key back.
+        * QSE SCA protection uses masking (CMH_QSE_FLAG_MASKED),
+        * not the 2-share mechanism (CMH_FLAG_SCA).
+        */
+       key_flags = req.flags & CMH_FLAG_MASK;
+       if (key_flags && key_flags != CMH_FLAG_PT)
+               return -EINVAL;
+       key_flags = CMH_FLAG_PT;
+
+       /* Masked keygen must store dk in DS -- polynomial unmasking not supported */
+       if (masked && !ds_ref)
+               return -EINVAL;
+
+       ek_len = ML_KEM_EK_SIZE(req.k);
+       dk_len = masked ? ML_KEM_DK_SIZE_MASKED(req.k)
+                       : ML_KEM_DK_SIZE(req.k);
+       seed_len = masked ? QSE_SEED_LEN_MASKED : QSE_SEED_LEN;
+
+       if (hw_rng)
+               qse_flags |= QSE_FLAG_USE_RNG;
+       if (ds_ref)
+               qse_flags |= QSE_FLAG_USE_REF;
+
+       ek_buf = kzalloc(ek_len, GFP_KERNEL);
+       if (!ek_buf)
+               return -ENOMEM;
+
+       if (!hw_rng && req.seed && req.z) {
+               seed_buf = kmalloc(seed_len, GFP_KERNEL);
+               z_buf = kmalloc(seed_len, GFP_KERNEL);
+               if (!seed_buf || !z_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+               if (copy_from_user(seed_buf, u64_to_user_ptr(req.seed),
+                                  seed_len) ||
+                   copy_from_user(z_buf, u64_to_user_ptr(req.z), seed_len)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+       }
+
+       if (ds_ref) {
+               ref_buf = kzalloc_obj(u64, GFP_KERNEL);
+               if (!ref_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+       } else {
+               dk_buf = kzalloc(dk_len, GFP_KERNEL);
+               if (!dk_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+       }
+
+       /* Map the ek, optional seed/z, and dk-or-ref buffers for device DMA */
+       ek_dma = cmh_dma_map_single(ek_buf, ek_len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(ek_dma)) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (seed_buf) {
+               seed_dma = cmh_dma_map_single(seed_buf, seed_len,
+                                             DMA_TO_DEVICE);
+               z_dma = cmh_dma_map_single(z_buf, seed_len, DMA_TO_DEVICE);
+               if (cmh_dma_map_error(seed_dma) || cmh_dma_map_error(z_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+
+       if (ds_ref) {
+               ref_dma = cmh_dma_map_single(ref_buf, sizeof(u64),
+                                            DMA_FROM_DEVICE);
+               if (cmh_dma_map_error(ref_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       } else {
+               dk_dma = cmh_dma_map_single(dk_buf, dk_len, DMA_FROM_DEVICE);
+               if (cmh_dma_map_error(dk_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+
+       idx = 0;
+       if (ds_ref) {
+               vcq_set_header(&vcq[0], QSE_VCQ_CMDS_MAX);
+               idx++;
+               vcq_add_sys_new(&vcq[idx++], req.dk_cid, ref_dma, dk_len);
+               vcq_add_qse_ml_kem_keygen(&vcq[idx++], qse_cid, req.k, qse_flags,
+                                         seed_dma, z_dma,
+                                         ek_dma, SYS_REF_LAST,
+                                         SYS_TYPE_SET(key_flags,
+                                                      CORE_ID_QSE),
+                                         masked);
+               vcq_add_qse_flush(&vcq[idx++], qse_cid);
+               ret = cmh_tm_submit_sync_mbx(vcq, QSE_VCQ_CMDS_MAX,
+                                            1, MGMT_MBX);
+       } else {
+               vcq_set_header(&vcq[0], QSE_VCQ_CMDS_MIN);
+               idx++;
+               vcq_add_qse_ml_kem_keygen(&vcq[idx++], qse_cid, req.k, qse_flags,
+                                         seed_dma, z_dma,
+                                         ek_dma, dk_dma, 0, masked);
+               vcq_add_qse_flush(&vcq[idx++], qse_cid);
+               ret = cmh_tm_submit_sync_mbx(vcq, QSE_VCQ_CMDS_MIN,
+                                            1, MGMT_MBX);
+       }
+
+out_unmap:
+       if (ds_ref && !cmh_dma_map_error(ref_dma))
+               cmh_dma_unmap_single(ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+       if (!ds_ref && dk_buf && !cmh_dma_map_error(dk_dma))
+               cmh_dma_unmap_single(dk_dma, dk_len, DMA_FROM_DEVICE);
+       if (z_buf && !cmh_dma_map_error(z_dma))
+               cmh_dma_unmap_single(z_dma, seed_len, DMA_TO_DEVICE);
+       if (seed_buf && !cmh_dma_map_error(seed_dma))
+               cmh_dma_unmap_single(seed_dma, seed_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(ek_dma))
+               cmh_dma_unmap_single(ek_dma, ek_len, DMA_FROM_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.ek), ek_buf, ek_len)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+               if (ds_ref) {
+                       req.dk_ref = *ref_buf;
+               } else {
+                       if (copy_to_user(u64_to_user_ptr(req.dk),
+                                        dk_buf, dk_len)) {
+                               ret = -EFAULT;
+                               goto out_free;
+                       }
+               }
+               if (copy_to_user(argp, &req, sizeof(req)))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree_sensitive(dk_buf);
+       kfree(ref_buf);
+       kfree_sensitive(z_buf);
+       kfree_sensitive(seed_buf);
+       kfree(ek_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_ml_kem_enc() - Handle CMH_MGMT_IOC_ML_KEM_ENC ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_ml_kem_enc(void __user *argp)
+{
+       u32 qse_cid = cmh_core_default_id(CMH_CORE_QSE);
+
+       struct cmh_ioctl_ml_kem_enc req;
+       struct vcq_cmd vcq[QSE_VCQ_CMDS_MIN];
+       u32 ek_len, ct_len, ss_out_len;
+       u32 qse_flags = 0;
+       bool masked, hw_rng;
+       u8 *ek_buf, *coin_buf = NULL, *ct_buf, *ss_buf;
+       dma_addr_t ek_dma, coin_dma = DMA_MAPPING_ERROR, ct_dma, ss_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved || req.__reserved2[0] || req.__reserved2[1])
+               return -EINVAL;
+       if (ml_kem_k_idx(req.k) < 0)
+               return -EINVAL;
+
+       masked = !!(req.flags & CMH_QSE_FLAG_MASKED);
+       hw_rng = !!(req.flags & CMH_QSE_FLAG_HW_RNG);
+
+       ek_len = ML_KEM_EK_SIZE(req.k);
+       ct_len = ML_KEM_CT_SIZE(req.k);
+       ss_out_len = masked ? ML_KEM_SS_LEN_MASKED : ML_KEM_SS_LEN;
+
+       if (hw_rng)
+               qse_flags |= QSE_FLAG_USE_RNG;
+
+       ek_buf = kmalloc(ek_len, GFP_KERNEL);
+       ct_buf = kzalloc(ct_len, GFP_KERNEL);
+       ss_buf = kzalloc(ss_out_len, GFP_KERNEL);
+       if (!ek_buf || !ct_buf || !ss_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(ek_buf, u64_to_user_ptr(req.ek), ek_len)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       if (!hw_rng && req.coin) {
+               u32 coin_len = masked ? QSE_SEED_LEN_MASKED : QSE_SEED_LEN;
+
+               coin_buf = kmalloc(coin_len, GFP_KERNEL);
+               if (!coin_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+               if (copy_from_user(coin_buf, u64_to_user_ptr(req.coin),
+                                  coin_len)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+               coin_dma = cmh_dma_map_single(coin_buf, coin_len,
+                                             DMA_TO_DEVICE);
+               if (cmh_dma_map_error(coin_dma)) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+       }
+
+       ek_dma = cmh_dma_map_single(ek_buf, ek_len, DMA_TO_DEVICE);
+       ct_dma = cmh_dma_map_single(ct_buf, ct_len, DMA_FROM_DEVICE);
+       ss_dma = cmh_dma_map_single(ss_buf, ss_out_len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(ek_dma) || cmh_dma_map_error(ct_dma) ||
+           cmh_dma_map_error(ss_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], QSE_VCQ_CMDS_MIN);
+       vcq_add_qse_ml_kem_enc(&vcq[1], qse_cid, req.k, qse_flags,
+                              coin_dma, ek_dma, ct_dma, ss_dma, 0, masked);
+       vcq_add_qse_flush(&vcq[2], qse_cid);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, QSE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(ss_dma))
+               cmh_dma_unmap_single(ss_dma, ss_out_len, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(ct_dma))
+               cmh_dma_unmap_single(ct_dma, ct_len, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(ek_dma))
+               cmh_dma_unmap_single(ek_dma, ek_len, DMA_TO_DEVICE);
+       if (coin_buf && !cmh_dma_map_error(coin_dma))
+               cmh_dma_unmap_single(coin_dma,
+                                    masked ? QSE_SEED_LEN_MASKED
+                                           : QSE_SEED_LEN,
+                                    DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.ct), ct_buf, ct_len)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+               /* Unmask ss if masked: ss = share0 ^ share1 */
+               if (masked) {
+                       crypto_xor(ss_buf, ss_buf + ML_KEM_SS_LEN,
+                                  ML_KEM_SS_LEN);
+               }
+               if (copy_to_user(u64_to_user_ptr(req.ss), ss_buf,
+                                ML_KEM_SS_LEN)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+               if (copy_to_user(argp, &req, sizeof(req)))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree_sensitive(ss_buf);
+       kfree(ct_buf);
+       kfree_sensitive(coin_buf);
+       kfree(ek_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_ml_kem_dec() - Handle CMH_MGMT_IOC_ML_KEM_DEC ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_ml_kem_dec(void __user *argp)
+{
+       u32 qse_cid = cmh_core_default_id(CMH_CORE_QSE);
+
+       struct cmh_ioctl_ml_kem_dec req;
+       struct vcq_cmd vcq[QSE_VCQ_CMDS_MIN];
+       u32 ct_len, dk_len, ss_out_len;
+       u32 qse_flags = 0;
+       bool masked, ds_ref;
+       u8 *ct_buf, *dk_buf = NULL, *ss_buf;
+       dma_addr_t ct_dma, dk_dma = DMA_MAPPING_ERROR, ss_dma;
+       u64 dk_ref;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved || req.__reserved2[0] || req.__reserved2[1])
+               return -EINVAL;
+       if (ml_kem_k_idx(req.k) < 0)
+               return -EINVAL;
+
+       masked = !!(req.flags & CMH_QSE_FLAG_MASKED);
+       ds_ref = !!(req.flags & CMH_QSE_FLAG_DS_REF);
+
+       ct_len = ML_KEM_CT_SIZE(req.k);
+       dk_len = masked ? ML_KEM_DK_SIZE_MASKED(req.k)
+                       : ML_KEM_DK_SIZE(req.k);
+       ss_out_len = masked ? ML_KEM_SS_LEN_MASKED : ML_KEM_SS_LEN;
+
+       ct_buf = kmalloc(ct_len, GFP_KERNEL);
+       ss_buf = kzalloc(ss_out_len, GFP_KERNEL);
+       if (!ct_buf || !ss_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(ct_buf, u64_to_user_ptr(req.ct), ct_len)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       ct_dma = cmh_dma_map_single(ct_buf, ct_len, DMA_TO_DEVICE);
+       ss_dma = cmh_dma_map_single(ss_buf, ss_out_len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(ct_dma) || cmh_dma_map_error(ss_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       /*
+        * dk: if DS_REF flag is set, req.dk is a DS reference.
+        * Otherwise, copy raw dk from user-space and use extmem DMA.
+        * Masked decaps requires DS ref (polynomial unmasking not supported).
+        */
+       if (ds_ref) {
+               dk_ref = req.dk;
+               qse_flags |= QSE_FLAG_USE_REF;
+       } else {
+               if (masked) {
+                       ret = -EINVAL;
+                       goto out_unmap;
+               }
+               dk_buf = kmalloc(dk_len, GFP_KERNEL);
+               if (!dk_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+               if (copy_from_user(dk_buf, u64_to_user_ptr(req.dk), dk_len)) {
+                       ret = -EFAULT;
+                       goto out_unmap;
+               }
+               dk_dma = cmh_dma_map_single(dk_buf, dk_len, DMA_TO_DEVICE);
+               if (cmh_dma_map_error(dk_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+               dk_ref = dk_dma;
+       }
+
+       if (ds_ref) {
+               /*
+                * DS_REF decaps: CMH eSW resolves both dk and ss from DS.
+                * Phase 1: dec stores ss into SYS_REF_TEMP.
+                * Phase 2: sys_data reads ss from SYS_REF_TEMP to DMA.
+                */
+               vcq_set_header(&vcq[0], QSE_VCQ_CMDS_MIN);
+               vcq_add_qse_ml_kem_dec(&vcq[1], qse_cid, req.k, qse_flags,
+                                      ct_dma, dk_ref, SYS_REF_TEMP,
+                                      SYS_TYPE_SET(SYS_TYPE_FLAG_PT,
+                                                   CORE_ID_QSE),
+                                      masked);
+               vcq_add_qse_flush(&vcq[2], qse_cid);
+
+               ret = cmh_tm_submit_sync_mbx(vcq, QSE_VCQ_CMDS_MIN,
+                                            1, MGMT_MBX);
+               if (ret)
+                       goto out_unmap;
+
+               /* Phase 2: extract ss from SYS_REF_TEMP */
+               vcq_set_header(&vcq[0], QSE_VCQ_CMDS_MIN);
+               vcq_add_sys_data(&vcq[1], SYS_REF_TEMP, ss_dma,
+                                ss_out_len);
+               vcq_add_sys_flush(&vcq[2]);
+
+               ret = cmh_tm_submit_sync_mbx(vcq, QSE_VCQ_CMDS_MIN,
+                                            1, MGMT_MBX);
+       } else {
+               vcq_set_header(&vcq[0], QSE_VCQ_CMDS_MIN);
+               vcq_add_qse_ml_kem_dec(&vcq[1], qse_cid, req.k, qse_flags,
+                                      ct_dma, dk_ref, ss_dma, 0, masked);
+               vcq_add_qse_flush(&vcq[2], qse_cid);
+
+               ret = cmh_tm_submit_sync_mbx(vcq, QSE_VCQ_CMDS_MIN,
+                                            1, MGMT_MBX);
+       }
+
+out_unmap:
+       if (dk_buf && !cmh_dma_map_error(dk_dma))
+               cmh_dma_unmap_single(dk_dma, dk_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(ss_dma))
+               cmh_dma_unmap_single(ss_dma, ss_out_len, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(ct_dma))
+               cmh_dma_unmap_single(ct_dma, ct_len, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (masked) {
+                       crypto_xor(ss_buf, ss_buf + ML_KEM_SS_LEN,
+                                  ML_KEM_SS_LEN);
+               }
+               if (copy_to_user(u64_to_user_ptr(req.ss), ss_buf,
+                                ML_KEM_SS_LEN)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+               if (copy_to_user(argp, &req, sizeof(req)))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree_sensitive(dk_buf);
+       kfree_sensitive(ss_buf);
+       kfree(ct_buf);
+       return ret;
+}
+
+/* -- PQC -- ML-DSA -- */
+
+/**
+ * cmh_mgmt_ml_dsa_keygen() - Handle CMH_MGMT_IOC_ML_DSA_KEYGEN ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_ml_dsa_keygen(void __user *argp)
+{
+       u32 qse_cid = cmh_core_default_id(CMH_CORE_QSE);
+
+       struct cmh_ioctl_ml_dsa_keygen req;
+       struct vcq_cmd vcq[QSE_VCQ_CMDS_MAX];
+       u32 pk_size, sk_size, seed_len, key_flags;
+       u32 qse_flags = 0;
+       bool masked, ds_ref, hw_rng;
+       u8 *seed_buf = NULL, *pk_buf, *sk_buf = NULL;
+       u64 *ref_buf = NULL;
+       dma_addr_t seed_dma = DMA_MAPPING_ERROR, pk_dma;
+       dma_addr_t sk_dma = DMA_MAPPING_ERROR, ref_dma = DMA_MAPPING_ERROR;
+       int ret, idx, mi;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       mi = ml_dsa_mode_idx(req.mode);
+       if (mi < 0)
+               return -EINVAL;
+       if (req.flags & ~(CMH_QSE_FLAG_MASK | CMH_FLAG_MASK))
+               return -EINVAL;
+
+       masked = !!(req.flags & CMH_QSE_FLAG_MASKED);
+       ds_ref = !!(req.flags & CMH_QSE_FLAG_DS_REF);
+       hw_rng = !!(req.flags & CMH_QSE_FLAG_HW_RNG);
+
+       /*
+        * QSE keys only support PT storage -- the eSW sign path
+        * hardcodes SYS_TYPE_FLAG_PT when reading the key back.
+        * QSE SCA protection uses masking (CMH_QSE_FLAG_MASKED),
+        * not the 2-share mechanism (CMH_FLAG_SCA).
+        */
+       key_flags = req.flags & CMH_FLAG_MASK;
+       if (key_flags && key_flags != CMH_FLAG_PT)
+               return -EINVAL;
+       key_flags = CMH_FLAG_PT;
+
+       if (masked && !ds_ref)
+               return -EINVAL;
+
+       pk_size = ml_dsa_pk_size[mi];
+       sk_size = masked ? ml_dsa_sk_size_masked[mi] : ml_dsa_sk_size[mi];
+       seed_len = masked ? QSE_SEED_LEN_MASKED : QSE_SEED_LEN;
+
+       if (hw_rng)
+               qse_flags |= QSE_FLAG_USE_RNG;
+       if (ds_ref)
+               qse_flags |= QSE_FLAG_USE_REF;
+
+       pk_buf = kzalloc(pk_size, GFP_KERNEL);
+       if (!pk_buf)
+               return -ENOMEM;
+
+       if (!hw_rng && req.seed) {
+               seed_buf = kmalloc(seed_len, GFP_KERNEL);
+               if (!seed_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+               if (copy_from_user(seed_buf, u64_to_user_ptr(req.seed),
+                                  seed_len)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+       }
+
+       if (ds_ref) {
+               ref_buf = kzalloc_obj(u64, GFP_KERNEL);
+               if (!ref_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+       } else {
+               sk_buf = kzalloc(sk_size, GFP_KERNEL);
+               if (!sk_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+       }
+
+       pk_dma = cmh_dma_map_single(pk_buf, pk_size, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(pk_dma)) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (seed_buf) {
+               seed_dma = cmh_dma_map_single(seed_buf, seed_len,
+                                             DMA_TO_DEVICE);
+               if (cmh_dma_map_error(seed_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+
+       if (ds_ref) {
+               ref_dma = cmh_dma_map_single(ref_buf, sizeof(u64),
+                                            DMA_FROM_DEVICE);
+               if (cmh_dma_map_error(ref_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       } else {
+               sk_dma = cmh_dma_map_single(sk_buf, sk_size, DMA_FROM_DEVICE);
+               if (cmh_dma_map_error(sk_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+
+       idx = 0;
+       if (ds_ref) {
+               vcq_set_header(&vcq[0], QSE_VCQ_CMDS_MAX);
+               idx++;
+               vcq_add_sys_new(&vcq[idx++], req.sk_cid, ref_dma, sk_size);
+               vcq_add_qse_ml_dsa_keygen(&vcq[idx++], qse_cid, req.mode, qse_flags,
+                                         seed_dma, pk_dma,
+                                         SYS_REF_LAST,
+                                         SYS_TYPE_SET(key_flags,
+                                                      CORE_ID_QSE),
+                                         masked);
+               vcq_add_qse_flush(&vcq[idx++], qse_cid);
+               ret = cmh_tm_submit_sync_mbx(vcq, QSE_VCQ_CMDS_MAX,
+                                            1, MGMT_MBX);
+       } else {
+               vcq_set_header(&vcq[0], QSE_VCQ_CMDS_MIN);
+               idx++;
+               vcq_add_qse_ml_dsa_keygen(&vcq[idx++], qse_cid, req.mode, qse_flags,
+                                         seed_dma, pk_dma,
+                                         sk_dma, 0, masked);
+               vcq_add_qse_flush(&vcq[idx++], qse_cid);
+               ret = cmh_tm_submit_sync_mbx(vcq, QSE_VCQ_CMDS_MIN,
+                                            1, MGMT_MBX);
+       }
+
+out_unmap:
+       if (ds_ref && !cmh_dma_map_error(ref_dma))
+               cmh_dma_unmap_single(ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+       if (!ds_ref && sk_buf && !cmh_dma_map_error(sk_dma))
+               cmh_dma_unmap_single(sk_dma, sk_size, DMA_FROM_DEVICE);
+       if (seed_buf && !cmh_dma_map_error(seed_dma))
+               cmh_dma_unmap_single(seed_dma, seed_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(pk_dma))
+               cmh_dma_unmap_single(pk_dma, pk_size, DMA_FROM_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.pk), pk_buf, pk_size)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+               if (ds_ref) {
+                       req.sk_ref = *ref_buf;
+               } else {
+                       if (copy_to_user(u64_to_user_ptr(req.sk),
+                                        sk_buf, sk_size)) {
+                               ret = -EFAULT;
+                               goto out_free;
+                       }
+               }
+               if (copy_to_user(argp, &req, sizeof(req)))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree_sensitive(sk_buf);
+       kfree(ref_buf);
+       kfree_sensitive(seed_buf);
+       kfree(pk_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_ml_dsa_sign() - Handle CMH_MGMT_IOC_ML_DSA_SIGN ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_ml_dsa_sign(void __user *argp)
+{
+       u32 qse_cid = cmh_core_default_id(CMH_CORE_QSE);
+
+       struct cmh_ioctl_ml_dsa_sign req;
+       struct vcq_cmd vcq[QSE_VCQ_CMDS_MIN];
+       u32 sig_size, copy_len, rnd_len;
+       u32 qse_flags = 0;
+       bool masked;
+       u8 *m_buf, *sig_buf, *sk_buf = NULL, *rnd_buf = NULL;
+       dma_addr_t m_dma = DMA_MAPPING_ERROR, sig_dma = DMA_MAPPING_ERROR;
+       dma_addr_t sk_dma = DMA_MAPPING_ERROR, rnd_dma = DMA_MAPPING_ERROR;
+       u64 sk_ref;
+       int mi, ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       mi = ml_dsa_mode_idx(req.mode);
+       if (mi < 0)
+               return -EINVAL;
+       if (req.mlen > ML_DSA_MAX_MLEN && req.mlen != ML_DSA_MLEN_EXTERNAL_MU)
+               return -EINVAL;
+
+       masked = !!(req.flags & CMH_QSE_FLAG_MASKED);
+       rnd_len = masked ? QSE_SEED_LEN_MASKED : QSE_SEED_LEN;
+       sig_size = ml_dsa_sig_size[mi];
+       copy_len = (req.mlen == ML_DSA_MLEN_EXTERNAL_MU)
+                       ? ML_DSA_EXTMU_LEN : req.mlen;
+
+       /*
+        * sk: if DS_REF, req.sk is a DS reference (masked sk lives in DS).
+        * Otherwise, copy raw sk from user-space.
+        * Masked sign requires DS ref (polynomial unmasking not supported).
+        */
+       if (req.flags & CMH_QSE_FLAG_DS_REF) {
+               sk_ref = req.sk;
+               qse_flags |= QSE_FLAG_USE_REF;
+       } else {
+               u32 sk_size;
+
+               if (masked)
+                       return -EINVAL;
+               sk_size = ml_dsa_sk_size[mi];
+               sk_buf = kmalloc(sk_size, GFP_KERNEL);
+               if (!sk_buf)
+                       return -ENOMEM;
+               if (copy_from_user(sk_buf, u64_to_user_ptr(req.sk), sk_size)) {
+                       kfree_sensitive(sk_buf);
+                       return -EFAULT;
+               }
+       }
+
+       m_buf = kmalloc(max_t(u32, copy_len, 1), GFP_KERNEL);
+       sig_buf = kzalloc(sig_size, GFP_KERNEL);
+       if (!m_buf || !sig_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_len > 0 &&
+           copy_from_user(m_buf, u64_to_user_ptr(req.m), copy_len)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       if (req.rnd) {
+               rnd_buf = kmalloc(rnd_len, GFP_KERNEL);
+               if (!rnd_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+               if (copy_from_user(rnd_buf, u64_to_user_ptr(req.rnd),
+                                  rnd_len)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+       }
+
+       if (copy_len > 0) {
+               m_dma = cmh_dma_map_single(m_buf, copy_len, DMA_TO_DEVICE);
+               if (cmh_dma_map_error(m_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+       sig_dma = cmh_dma_map_single(sig_buf, sig_size, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(sig_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       if (sk_buf) {
+               sk_dma = cmh_dma_map_single(sk_buf, ml_dsa_sk_size[mi],
+                                           DMA_TO_DEVICE);
+               if (cmh_dma_map_error(sk_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+               sk_ref = sk_dma;
+       }
+
+       if (rnd_buf) {
+               rnd_dma = cmh_dma_map_single(rnd_buf, rnd_len,
+                                            DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rnd_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+
+       vcq_set_header(&vcq[0], QSE_VCQ_CMDS_MIN);
+       vcq_add_qse_ml_dsa_sign(&vcq[1], qse_cid, req.mode, qse_flags,
+                               rnd_dma, m_dma, sk_ref, sig_dma,
+                               req.mlen, masked);
+       vcq_add_qse_flush(&vcq[2], qse_cid);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, QSE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (rnd_buf && !cmh_dma_map_error(rnd_dma))
+               cmh_dma_unmap_single(rnd_dma, rnd_len, DMA_TO_DEVICE);
+       if (sk_buf && !cmh_dma_map_error(sk_dma))
+               cmh_dma_unmap_single(sk_dma, ml_dsa_sk_size[mi],
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, sig_size, DMA_FROM_DEVICE);
+       if (copy_len > 0 && !cmh_dma_map_error(m_dma))
+               cmh_dma_unmap_single(m_dma, copy_len, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.sig), sig_buf, sig_size))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree(rnd_buf);
+       kfree(sig_buf);
+       kfree(m_buf);
+       kfree_sensitive(sk_buf);
+       return ret;
+}
+
+/* -- PQC -- SLH-DSA -- */
+
+/**
+ * cmh_mgmt_slhdsa_keygen() - Handle CMH_MGMT_IOC_SLHDSA_KEYGEN ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_slhdsa_keygen(void __user *argp)
+{
+       u32 hcq_cid = cmh_core_default_id(CMH_CORE_HCQ);
+
+       struct cmh_ioctl_slhdsa_keygen req;
+       struct vcq_cmd vcq[HCQ_VCQ_CMDS_MAX];
+       u32 pk_sz, sk_sz, seed_sz, sk_alloc, vcq_cnt, key_flags;
+       bool ds_ref;
+       u8 *seed_buf, *pk_buf, *sk_buf = NULL;
+       u64 *ref_buf = NULL;
+       dma_addr_t seed_dma, pk_dma, sk_dma = DMA_MAPPING_ERROR, ref_dma = DMA_MAPPING_ERROR;
+       int ret, idx;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+       if (req.parameter_set < 1 || req.parameter_set > HCQ_SLHDSA_PARAM_MAX)
+               return -EINVAL;
+       if (req.flags & ~(CMH_QSE_FLAG_DS_REF | CMH_FLAG_MASK))
+               return -EINVAL;
+
+       ds_ref = !!(req.flags & CMH_QSE_FLAG_DS_REF);
+
+       /*
+        * QSE keys only support PT storage -- the eSW sign path
+        * hardcodes SYS_TYPE_FLAG_PT when reading the key back.
+        * HCQ core sets key type internally during keygen.
+        */
+       key_flags = req.flags & CMH_FLAG_MASK;
+       if (key_flags && key_flags != CMH_FLAG_PT)
+               return -EINVAL;
+       (void)key_flags;
+
+       pk_sz = slhdsa_pk_size(req.parameter_set);
+       sk_sz = slhdsa_sk_size(req.parameter_set);
+       seed_sz = slhdsa_seed_size(req.parameter_set);
+
+       seed_buf = kmalloc(seed_sz, GFP_KERNEL);
+       pk_buf = kzalloc(pk_sz, GFP_KERNEL);
+       if (!seed_buf || !pk_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(seed_buf, u64_to_user_ptr(req.seed), seed_sz)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       /*
+        * Both paths need ref_buf for the sys_new output.  The non-ds_ref path
+        * also needs sk_buf (sk_sz + SYS_WRAP_HDR_SIZE) for the sys_read.
+        */
+       ref_buf = kzalloc(sizeof(u64), GFP_KERNEL);
+       if (!ref_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+       if (!ds_ref) {
+               sk_alloc = sk_sz + SYS_WRAP_HDR_SIZE;
+               sk_buf = kzalloc(sk_alloc, GFP_KERNEL);
+               if (!sk_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+       }
+
+       seed_dma = cmh_dma_map_single(seed_buf, seed_sz, DMA_TO_DEVICE);
+       pk_dma = cmh_dma_map_single(pk_buf, pk_sz, DMA_FROM_DEVICE);
+       ref_dma = cmh_dma_map_single(ref_buf, sizeof(u64), DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(seed_dma) || cmh_dma_map_error(pk_dma) ||
+           cmh_dma_map_error(ref_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       if (!ds_ref) {
+               sk_dma = cmh_dma_map_single(sk_buf, sk_alloc,
+                                           DMA_FROM_DEVICE);
+               if (cmh_dma_map_error(sk_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+
+       /*
+        * SLH-DSA keygen requires seed and sk as DS references.
+        * VCQ: hdr + sys_new(sk) + sys_write(seed->TEMP) + keygen + [sys_read] + flush
+        */
+       idx = 0;
+       if (ds_ref) {
+               vcq_cnt = HCQ_VCQ_CMDS_MAX - 1; /* hdr+new+write+keygen+flush */
+               vcq_set_header(&vcq[idx++], vcq_cnt);
+               vcq_add_sys_new(&vcq[idx++], req.sk_cid, ref_dma,
+                               sk_sz);
+       } else {
+               vcq_cnt = HCQ_VCQ_CMDS_MAX; /* hdr+new+write+keygen+read+flush */
+               vcq_set_header(&vcq[idx++], vcq_cnt);
+               vcq_add_sys_new(&vcq[idx++], SYS_CID_NONE, ref_dma,
+                               sk_sz);
+       }
+       vcq_add_sys_write(&vcq[idx++], SYS_REF_TEMP, seed_dma, 0,
+                         seed_sz,
+                         SYS_TYPE_SET(SYS_TYPE_FLAG_PT, CORE_ID_HCQ));
+       vcq_add_hcq_slhdsa_keygen(&vcq[idx++], hcq_cid, req.parameter_set,
+                                 seed_sz, pk_sz, sk_sz,
+                                 SYS_REF_TEMP, pk_dma, SYS_REF_LAST);
+       if (!ds_ref)
+               vcq_add_sys_read(&vcq[idx++], SYS_REF_LAST, sk_dma,
+                                0, sk_alloc);
+       vcq_add_hcq_flush(&vcq[idx++], hcq_cid);
+
+       ret = cmh_tm_submit_sync_tmo(vcq, vcq_cnt, 1, MGMT_MBX,
+                                    cmh_tm_slow_op_timeout_jiffies());
+
+out_unmap:
+       if (!ds_ref && sk_buf && !cmh_dma_map_error(sk_dma))
+               cmh_dma_unmap_single(sk_dma, sk_alloc, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(ref_dma))
+               cmh_dma_unmap_single(ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(pk_dma))
+               cmh_dma_unmap_single(pk_dma, pk_sz, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(seed_dma))
+               cmh_dma_unmap_single(seed_dma, seed_sz, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.pk), pk_buf, pk_sz)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+               if (ds_ref) {
+                       req.sk_ref = *ref_buf;
+               } else {
+                       if (copy_to_user(u64_to_user_ptr(req.sk),
+                                        sk_buf + SYS_WRAP_HDR_SIZE,
+                                        sk_sz)) {
+                               ret = -EFAULT;
+                               goto out_free;
+                       }
+               }
+               if (copy_to_user(argp, &req, sizeof(req)))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree_sensitive(sk_buf);
+       kfree(ref_buf);
+       kfree(pk_buf);
+       kfree_sensitive(seed_buf);
+       return ret;
+}
+
+/**
+ * cmh_mgmt_slhdsa_sign() - Handle CMH_MGMT_IOC_SLHDSA_SIGN ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_slhdsa_sign(void __user *argp)
+{
+       u32 hcq_cid = cmh_core_default_id(CMH_CORE_HCQ);
+
+       struct cmh_ioctl_slhdsa_sign req;
+       struct vcq_cmd vcq[HCQ_VCQ_CMDS_MIN];
+       u32 sig_sz, n_val;
+       u8 *msg_buf, *ctx_buf = NULL, *sig_buf, *rnd_buf = NULL;
+       dma_addr_t msg_dma = DMA_MAPPING_ERROR, ctx_dma = DMA_MAPPING_ERROR;
+       dma_addr_t sig_dma, rnd_dma = DMA_MAPPING_ERROR;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.parameter_set < 1 || req.parameter_set > HCQ_SLHDSA_PARAM_MAX)
+               return -EINVAL;
+       if (req.msg_len > SLHDSA_MAX_MSG_LEN)
+               return -EINVAL;
+       if (req.ctx_len > SLHDSA_MAX_CTX_LEN)
+               return -EINVAL;
+
+       sig_sz = slhdsa_get_sig_size(req.parameter_set);
+       n_val = slhdsa_n[req.parameter_set - 1];
+
+       msg_buf = kmalloc(max_t(u32, req.msg_len, 1), GFP_KERNEL);
+       sig_buf = kzalloc(sig_sz, GFP_KERNEL);
+       if (!msg_buf || !sig_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (req.msg_len > 0 &&
+           copy_from_user(msg_buf, u64_to_user_ptr(req.msg), req.msg_len)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       if (req.ctx_len > 0 && req.ctx) {
+               ctx_buf = kmalloc(req.ctx_len, GFP_KERNEL);
+               if (!ctx_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+               if (copy_from_user(ctx_buf, u64_to_user_ptr(req.ctx),
+                                  req.ctx_len)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+       }
+
+       if (req.add_random) {
+               rnd_buf = kmalloc(n_val, GFP_KERNEL);
+               if (!rnd_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+               if (copy_from_user(rnd_buf, u64_to_user_ptr(req.add_random),
+                                  n_val)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+       }
+
+       sig_dma = cmh_dma_map_single(sig_buf, sig_sz, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(sig_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+       if (req.msg_len > 0) {
+               msg_dma = cmh_dma_map_single(msg_buf, req.msg_len,
+                                            DMA_TO_DEVICE);
+               if (cmh_dma_map_error(msg_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+
+       if (ctx_buf) {
+               ctx_dma = cmh_dma_map_single(ctx_buf, req.ctx_len,
+                                            DMA_TO_DEVICE);
+               if (cmh_dma_map_error(ctx_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+
+       if (rnd_buf) {
+               rnd_dma = cmh_dma_map_single(rnd_buf, n_val, DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rnd_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+
+       vcq_set_header(&vcq[0], HCQ_VCQ_CMDS_MIN);
+       vcq_add_hcq_slhdsa_sign(&vcq[1], hcq_cid, req.parameter_set,
+                               req.msg_len, req.ctx_len,
+                               rnd_dma, msg_dma, ctx_dma,
+                               req.sk, sig_dma);
+       vcq_add_hcq_flush(&vcq[2], hcq_cid);
+
+       ret = cmh_tm_submit_sync_tmo(vcq, HCQ_VCQ_CMDS_MIN, 1, MGMT_MBX,
+                                    cmh_tm_slow_op_timeout_jiffies());
+
+out_unmap:
+       if (rnd_buf && !cmh_dma_map_error(rnd_dma))
+               cmh_dma_unmap_single(rnd_dma, n_val, DMA_TO_DEVICE);
+       if (ctx_buf && !cmh_dma_map_error(ctx_dma))
+               cmh_dma_unmap_single(ctx_dma, req.ctx_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, sig_sz, DMA_FROM_DEVICE);
+       if (req.msg_len > 0 && !cmh_dma_map_error(msg_dma))
+               cmh_dma_unmap_single(msg_dma, req.msg_len, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.sig), sig_buf, sig_sz))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree(rnd_buf);
+       kfree(ctx_buf);
+       kfree(sig_buf);
+       kfree(msg_buf);
+       return ret;
+}
+
+/* -- PQC -- SLH-DSA prehash -- */
+
+/**
+ * cmh_mgmt_slhdsa_sign_prehash() - Handle CMH_MGMT_IOC_SLHDSA_SIGN_PREHASH ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_slhdsa_sign_prehash(void __user *argp)
+{
+       u32 hcq_cid = cmh_core_default_id(CMH_CORE_HCQ);
+
+       struct cmh_ioctl_slhdsa_sign_prehash req;
+       struct vcq_cmd vcq[HCQ_VCQ_CMDS_MIN];
+       u32 sig_sz, n_val, hcq_cmd;
+       u8 *msg_buf, *ctx_buf = NULL, *sig_buf, *rnd_buf = NULL;
+       dma_addr_t msg_dma = DMA_MAPPING_ERROR, ctx_dma = DMA_MAPPING_ERROR;
+       dma_addr_t sig_dma, rnd_dma = DMA_MAPPING_ERROR;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.parameter_set < 1 || req.parameter_set > HCQ_SLHDSA_PARAM_MAX)
+               return -EINVAL;
+       if (req.prehash_algo < 1 || req.prehash_algo > HCQ_SLHDSA_PREHASH_SHAKE256)
+               return -EINVAL;
+       if (req.msg_len > SLHDSA_MAX_MSG_LEN)
+               return -EINVAL;
+       if (req.ctx_len > SLHDSA_MAX_CTX_LEN)
+               return -EINVAL;
+
+       hcq_cmd = req.digest ? HCQ_CMD_SLHDSA_SIGN_PREHASH_DIGEST
+                            : HCQ_CMD_SLHDSA_SIGN_PREHASH;
+
+       sig_sz = slhdsa_get_sig_size(req.parameter_set);
+       n_val = slhdsa_n[req.parameter_set - 1];
+
+       msg_buf = kmalloc(max_t(u32, req.msg_len, 1), GFP_KERNEL);
+       sig_buf = kzalloc(sig_sz, GFP_KERNEL);
+       if (!msg_buf || !sig_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (req.msg_len > 0 &&
+           copy_from_user(msg_buf, u64_to_user_ptr(req.msg), req.msg_len)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       if (req.ctx_len > 0 && req.ctx) {
+               ctx_buf = kmalloc(req.ctx_len, GFP_KERNEL);
+               if (!ctx_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+               if (copy_from_user(ctx_buf, u64_to_user_ptr(req.ctx),
+                                  req.ctx_len)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+       }
+
+       if (req.add_random) {
+               rnd_buf = kmalloc(n_val, GFP_KERNEL);
+               if (!rnd_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+               if (copy_from_user(rnd_buf, u64_to_user_ptr(req.add_random),
+                                  n_val)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+       }
+
+       sig_dma = cmh_dma_map_single(sig_buf, sig_sz, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(sig_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+       if (req.msg_len > 0) {
+               msg_dma = cmh_dma_map_single(msg_buf, req.msg_len,
+                                            DMA_TO_DEVICE);
+               if (cmh_dma_map_error(msg_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+
+       if (ctx_buf) {
+               ctx_dma = cmh_dma_map_single(ctx_buf, req.ctx_len,
+                                            DMA_TO_DEVICE);
+               if (cmh_dma_map_error(ctx_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+
+       if (rnd_buf) {
+               rnd_dma = cmh_dma_map_single(rnd_buf, n_val, DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rnd_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap;
+               }
+       }
+
+       vcq_set_header(&vcq[0], HCQ_VCQ_CMDS_MIN);
+       vcq_add_hcq_slhdsa_sign_prehash(&vcq[1], hcq_cid,
+                                       hcq_cmd, req.parameter_set,
+                                       req.prehash_algo,
+                                       req.msg_len, req.ctx_len,
+                                       rnd_dma, msg_dma, ctx_dma,
+                                       req.sk, sig_dma);
+       vcq_add_hcq_flush(&vcq[2], hcq_cid);
+
+       ret = cmh_tm_submit_sync_tmo(vcq, HCQ_VCQ_CMDS_MIN, 1, MGMT_MBX,
+                                    cmh_tm_slow_op_timeout_jiffies());
+
+out_unmap:
+       if (rnd_buf && !cmh_dma_map_error(rnd_dma))
+               cmh_dma_unmap_single(rnd_dma, n_val, DMA_TO_DEVICE);
+       if (ctx_buf && !cmh_dma_map_error(ctx_dma))
+               cmh_dma_unmap_single(ctx_dma, req.ctx_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, sig_sz, DMA_FROM_DEVICE);
+       if (req.msg_len > 0 && !cmh_dma_map_error(msg_dma))
+               cmh_dma_unmap_single(msg_dma, req.msg_len, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.sig), sig_buf, sig_sz))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree(rnd_buf);
+       kfree(ctx_buf);
+       kfree(sig_buf);
+       kfree(msg_buf);
+       return ret;
+}
+
+/* -- EAC (Error and Alarm Controller) ---- */
+
diff --git a/drivers/crypto/cmh/cmh_pke_sm2.c b/drivers/crypto/cmh/cmh_pke_sm2.c
new file mode 100644
index 000000000000..9a6e30c7f5e5
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_pke_sm2.c
@@ -0,0 +1,827 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- SM2 PKE Ioctl Handlers
+ *
+ * SM2 (GM/T 0003) is the Chinese national public-key standard over the
+ * sm2p256v1 curve (256-bit).  It defines three protocols:
+ *
+ *   - Signature: reuses ECDSA sign/verify with SM2_CURVE (0x18), handled
+ *     by the existing cmh_mgmt_pke_ecdsa_{sign,verify}() paths.
+ *   - Encryption: two-step (ENC_POINT + ENC_HASH / DEC_POINT + DEC_HASH).
+ *   - Key Exchange: four-step (ECDH_KEYGEN + ID_DIGEST + ECDH + ECDH_HASH).
+ *
+ * This file implements the 8 SM2-specific ioctl handlers (0x16--0x1D).
+ * Sign/verify/keygen/pubgen use the existing ECDSA/EC paths unchanged.
+ *
+ * VCQ flag convention (from eSW API):
+ *   - Most SM2 commands use flags=0 (no swap).
+ *   - SM2_DEC_POINT and SM2_ECDH_HASH use PKE_SWAP_FLAGS on the
+ *     PKE command itself.
+ *   - SM2_ECDH and SM2_ECDH_HASH also apply PKE_SWAP_FLAGS on
+ *     their sys_new/sys_data VCQ phases (Weierstrass DS format).
+ */
+
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+
+#include "cmh_pke.h"
+#include "cmh_pke_sm2.h"
+#include "cmh_sys.h"
+#include "cmh_dma.h"
+#include "cmh_txn.h"
+#include "cmh_mgmt.h"
+#include "cmh_sys_abi.h"
+#include <uapi/linux/cmh_mgmt_ioctl.h>
+
+/* SM2 fixed sizes (sm2p256v1: 256-bit curve) */
+#define SM2_CLEN               32U     /* coordinate length */
+#define SM2_POINT_LEN          64U     /* uncompressed EC point (x||y) */
+#define SM2_SHARED_KEY_LEN     16U     /* ECDH shared key output */
+#define SM2_DIGEST_LEN         32U     /* SM3 ZA digest */
+#define SM2_NONCE_LEN          32U     /* nonce (when caller-provided) */
+/*
+ * SM2 enc_hash/dec_hash payload limit.
+ *
+ * The eSW PKE driver expands the GM/T 0003.4 KDF by issuing a single SM3
+ * invocation per command (one 32-byte block of key stream).  Messages
+ * longer than 32 bytes would require ceil(msg_len / 32) SM3 invocations
+ * with an incremented counter, which the eSW does not perform; longer
+ * inputs would silently produce incorrect ciphertext / plaintext.
+ *
+ * The eSW PKE SRAM can physically hold up to 4000 bytes of payload, but
+ * that capacity is unusable until a future eSW change implements the full
+ * KDF expansion.  Until then we cap the LKM at the 32-byte limit
+ * documented in Documentation/ABI/testing/cmh-mgmt.
+ */
+#define SM2_MAX_MSG_LEN                32U     /* max plaintext for encrypt/decrypt */
+#define SM2_MAX_ID_LEN         32U     /* max identity string */
+#define SM2_CT_OVERHEAD                96U     /* C1(64) + C3(32) */
+#define SM2_MAX_CT_LEN         (SM2_CT_OVERHEAD + SM2_MAX_MSG_LEN) /* 128 */
+
+/* -- SM2_ECDH_KEYGEN ------------------- */
+
+/**
+ * cmh_mgmt_sm2_ecdh_keygen() - Handle CMH_MGMT_IOC_SM2_ECDH_KEYGEN ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_sm2_ecdh_keygen(void __user *argp)
+{
+       struct cmh_ioctl_sm2_ecdh_keygen req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 core_id = cmh_core_default_id(CMH_CORE_PKE);
+       u8 *nonce_buf, *sk_buf;
+       dma_addr_t nonce_dma, sk_dma;
+       int nonce_dir;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.nonce_len != 0 && req.nonce_len != SM2_NONCE_LEN)
+               return -EINVAL;
+
+       sk_buf = kzalloc(SM2_POINT_LEN, GFP_KERNEL);
+       nonce_buf = kzalloc(SM2_NONCE_LEN, GFP_KERNEL);
+       if (!sk_buf || !nonce_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       /*
+        * nonce_len=32: caller provides ephemeral scalar r (DMA_TO_DEVICE).
+        * nonce_len=0:  HW generates r and writes it back (DMA_FROM_DEVICE).
+        * The caller MUST supply a valid nonce pointer in both cases.
+        */
+       if (req.nonce_len) {
+               if (copy_from_user(nonce_buf, u64_to_user_ptr(req.nonce),
+                                  SM2_NONCE_LEN)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+               nonce_dir = DMA_TO_DEVICE;
+       } else {
+               nonce_dir = DMA_FROM_DEVICE;
+       }
+
+       sk_dma = cmh_dma_map_single(sk_buf, SM2_POINT_LEN, DMA_FROM_DEVICE);
+       nonce_dma = cmh_dma_map_single(nonce_buf, SM2_NONCE_LEN, nonce_dir);
+       if (cmh_dma_map_error(sk_dma) || cmh_dma_map_error(nonce_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_sm2_ecdh_keygen(&vcq[1], core_id, nonce_dma, sk_dma,
+                                   req.nonce_len, 0);
+       vcq_add_pke_flush(&vcq[2], core_id);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(nonce_dma))
+               cmh_dma_unmap_single(nonce_dma, SM2_NONCE_LEN, nonce_dir);
+       if (!cmh_dma_map_error(sk_dma))
+               cmh_dma_unmap_single(sk_dma, SM2_POINT_LEN, DMA_FROM_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.session_key),
+                                sk_buf, SM2_POINT_LEN))
+                       ret = -EFAULT;
+               /* Write back HW-generated nonce when nonce_len=0 */
+               if (!ret && !req.nonce_len) {
+                       if (copy_to_user(u64_to_user_ptr(req.nonce),
+                                        nonce_buf, SM2_NONCE_LEN))
+                               ret = -EFAULT;
+               }
+       }
+
+out_free:
+       kfree_sensitive(nonce_buf);
+       kfree_sensitive(sk_buf);
+       return ret;
+}
+
+/* -- SM2_ECDH -------------------------- */
+
+/**
+ * cmh_mgmt_sm2_ecdh() - Handle CMH_MGMT_IOC_SM2_ECDH ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_sm2_ecdh(void __user *argp)
+{
+       struct cmh_ioctl_sm2_ecdh req;
+       /* Phase 1: hdr + sys_new + sm2_ecdh + pke_flush; reused for Phase 2 */
+       struct vcq_cmd vcq[4];
+       u32 sp_type, core_id;
+       u8 *nonce_buf, *peer_pk_buf, *peer_sk_buf, *sp_buf;
+       u64 *ref_buf;
+       dma_addr_t nonce_dma, peer_pk_dma, peer_sk_dma, sp_dma, ref_dma;
+       int nonce_dir, ret, idx;
+       bool keep_ds;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.nonce_len != 0 && req.nonce_len != SM2_NONCE_LEN)
+               return -EINVAL;
+
+       keep_ds = (req.shared_point_ref != 0);
+       sp_type = SYS_TYPE_SET(SYS_TYPE_FLAG_PT, CORE_ID_PKE);
+       core_id = cmh_core_default_id(CMH_CORE_PKE);
+
+       peer_pk_buf = kmalloc(SM2_POINT_LEN, GFP_KERNEL);
+       peer_sk_buf = kmalloc(SM2_POINT_LEN, GFP_KERNEL);
+       sp_buf = kzalloc(SM2_POINT_LEN, GFP_KERNEL);
+       ref_buf = kzalloc_obj(u64, GFP_KERNEL);
+       nonce_buf = kzalloc(SM2_NONCE_LEN, GFP_KERNEL);
+       if (!peer_pk_buf || !peer_sk_buf || !sp_buf || !ref_buf ||
+           !nonce_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(peer_pk_buf, u64_to_user_ptr(req.peer_public_key),
+                          SM2_POINT_LEN) ||
+           copy_from_user(peer_sk_buf, u64_to_user_ptr(req.peer_session_key),
+                          SM2_POINT_LEN)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       if (req.nonce_len) {
+               if (copy_from_user(nonce_buf, u64_to_user_ptr(req.nonce),
+                                  SM2_NONCE_LEN)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+               nonce_dir = DMA_TO_DEVICE;
+       } else {
+               nonce_dir = DMA_FROM_DEVICE;
+       }
+
+       peer_pk_dma = cmh_dma_map_single(peer_pk_buf, SM2_POINT_LEN,
+                                        DMA_TO_DEVICE);
+       peer_sk_dma = cmh_dma_map_single(peer_sk_buf, SM2_POINT_LEN,
+                                        DMA_TO_DEVICE);
+       sp_dma = cmh_dma_map_single(sp_buf, SM2_POINT_LEN, DMA_FROM_DEVICE);
+       ref_dma = cmh_dma_map_single(ref_buf, sizeof(u64), DMA_FROM_DEVICE);
+       nonce_dma = cmh_dma_map_single(nonce_buf, SM2_NONCE_LEN, nonce_dir);
+
+       if (cmh_dma_map_error(peer_pk_dma) || cmh_dma_map_error(peer_sk_dma) ||
+           cmh_dma_map_error(sp_dma) || cmh_dma_map_error(ref_dma) ||
+           cmh_dma_map_error(nonce_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       /* Phase 1: sys_new(shared_point_ref) + SM2_ECDH(->SYS_REF_LAST) */
+       idx = 0;
+       vcq_set_header(&vcq[idx++], 4);
+       vcq_add_sys_new(&vcq[idx], 0, ref_dma, SM2_POINT_LEN);
+       vcq[idx++].id |= PKE_SWAP_FLAGS;
+       vcq_add_pke_sm2_ecdh(&vcq[idx++], core_id, req.nonce_len, SM2_CLEN,
+                            nonce_dma, peer_pk_dma, peer_sk_dma,
+                            req.key_ref, SYS_REF_LAST, sp_type, 0);
+       vcq_add_pke_flush(&vcq[idx++], core_id);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, 4, 1, MGMT_MBX);
+       if (ret)
+               goto out_unmap;
+
+       if (!keep_ds) {
+               /* Sync bounce buffer so CPU sees the DMA-written ref */
+               cmh_dma_sync_for_cpu(ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+
+               /* Phase 2: read shared point from DS -> DMA, consuming the slot */
+               vcq_set_header(&vcq[0], 3);
+               vcq_add_sys_data(&vcq[1], *ref_buf, sp_dma, SM2_POINT_LEN);
+               vcq[1].id |= PKE_SWAP_FLAGS;
+               vcq_add_sys_flush(&vcq[2]);
+
+               ret = cmh_tm_submit_sync_mbx(vcq, 3, 1, MGMT_MBX);
+       }
+
+out_unmap:
+       if (!cmh_dma_map_error(nonce_dma))
+               cmh_dma_unmap_single(nonce_dma, SM2_NONCE_LEN, nonce_dir);
+       if (!cmh_dma_map_error(ref_dma))
+               cmh_dma_unmap_single(ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(sp_dma))
+               cmh_dma_unmap_single(sp_dma, SM2_POINT_LEN, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(peer_sk_dma))
+               cmh_dma_unmap_single(peer_sk_dma, SM2_POINT_LEN,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(peer_pk_dma))
+               cmh_dma_unmap_single(peer_pk_dma, SM2_POINT_LEN,
+                                    DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (!keep_ds) {
+                       if (copy_to_user(u64_to_user_ptr(req.shared_point),
+                                        sp_buf, SM2_POINT_LEN))
+                               ret = -EFAULT;
+               } else {
+                       /* Return DS ref for ECDH_HASH to consume */
+                       u64 __user *sp_refp = (__u64 __user *)
+                               u64_to_user_ptr(req.shared_point_ref);
+
+                       if (put_user(*ref_buf, sp_refp)) {
+                               /*
+                                * Failed to deliver the DS ref to
+                                * userspace.  Logically delete the
+                                * orphaned slot so it does not leak.
+                                */
+                               vcq_set_header(&vcq[0], 3);
+                               vcq_add_sys_grant(&vcq[1], *ref_buf,
+                                                 0, 0, 0);
+                               vcq_add_sys_flush(&vcq[2]);
+                               cmh_tm_submit_sync_mbx(vcq, 3, 1,
+                                                      MGMT_MBX);
+                               dev_warn(cmh_dev(), "SM2 ECDH put_user failed, DS slot cleanup attempted\n");
+                               ret = -EFAULT;
+                       }
+               }
+               /* Write back HW-generated nonce when nonce_len=0 */
+               if (!ret && !req.nonce_len) {
+                       if (copy_to_user(u64_to_user_ptr(req.nonce),
+                                        nonce_buf, SM2_NONCE_LEN))
+                               ret = -EFAULT;
+               }
+       }
+
+out_free:
+       kfree_sensitive(nonce_buf);
+       kfree(ref_buf);
+       kfree_sensitive(sp_buf);
+       kfree(peer_sk_buf);
+       kfree(peer_pk_buf);
+       return ret;
+}
+
+/* -- SM2_DEC_POINT --------------------- */
+
+/**
+ * cmh_mgmt_sm2_dec_point() - Handle CMH_MGMT_IOC_SM2_DEC_POINT ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_sm2_dec_point(void __user *argp)
+{
+       struct cmh_ioctl_sm2_dec_point req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 core_id = cmh_core_default_id(CMH_CORE_PKE);
+       u8 *ct_buf, *dp_buf;
+       dma_addr_t ct_dma, dp_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.ciphertext_len <= SM2_CT_OVERHEAD ||
+           req.ciphertext_len > SM2_MAX_CT_LEN)
+               return -EINVAL;
+
+       /* Only need C1 (first 64 bytes) for the sidecar */
+       ct_buf = kmalloc(SM2_POINT_LEN, GFP_KERNEL);
+       dp_buf = kzalloc(SM2_POINT_LEN, GFP_KERNEL);
+       if (!ct_buf || !dp_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(ct_buf, u64_to_user_ptr(req.ciphertext),
+                          SM2_POINT_LEN)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       ct_dma = cmh_dma_map_single(ct_buf, SM2_POINT_LEN, DMA_TO_DEVICE);
+       dp_dma = cmh_dma_map_single(dp_buf, SM2_POINT_LEN, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(ct_dma) || cmh_dma_map_error(dp_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_sm2_dec_point(&vcq[1], core_id, req.ciphertext_len, SM2_CLEN,
+                                 ct_dma, dp_dma, req.key_ref,
+                                 PKE_SWAP_FLAGS);
+       vcq_add_pke_flush(&vcq[2], core_id);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(dp_dma))
+               cmh_dma_unmap_single(dp_dma, SM2_POINT_LEN, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(ct_dma))
+               cmh_dma_unmap_single(ct_dma, SM2_POINT_LEN, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.dec_point),
+                                dp_buf, SM2_POINT_LEN))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree_sensitive(dp_buf);
+       kfree(ct_buf);
+       return ret;
+}
+
+/* -- SM2_ENC_POINT --------------------- */
+
+/**
+ * cmh_mgmt_sm2_enc_point() - Handle CMH_MGMT_IOC_SM2_ENC_POINT ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_sm2_enc_point(void __user *argp)
+{
+       struct cmh_ioctl_sm2_enc_point req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 core_id = cmh_core_default_id(CMH_CORE_PKE);
+       u8 *nonce_buf = NULL, *pk_buf, *ct_buf, *ep_buf;
+       dma_addr_t nonce_dma = DMA_MAPPING_ERROR, pk_dma, ct_dma, ep_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.nonce_len != 0 && req.nonce_len != SM2_NONCE_LEN)
+               return -EINVAL;
+
+       pk_buf = kmalloc(SM2_POINT_LEN, GFP_KERNEL);
+       ct_buf = kzalloc(SM2_POINT_LEN, GFP_KERNEL);
+       ep_buf = kzalloc(SM2_POINT_LEN, GFP_KERNEL);
+       if (!pk_buf || !ct_buf || !ep_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(pk_buf, u64_to_user_ptr(req.public_key),
+                          SM2_POINT_LEN)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       if (req.nonce_len) {
+               nonce_buf = kmalloc(SM2_NONCE_LEN, GFP_KERNEL);
+               if (!nonce_buf) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+               if (copy_from_user(nonce_buf, u64_to_user_ptr(req.nonce),
+                                  SM2_NONCE_LEN)) {
+                       ret = -EFAULT;
+                       goto out_free;
+               }
+       }
+
+       pk_dma = cmh_dma_map_single(pk_buf, SM2_POINT_LEN, DMA_TO_DEVICE);
+       ct_dma = cmh_dma_map_single(ct_buf, SM2_POINT_LEN, DMA_FROM_DEVICE);
+       ep_dma = cmh_dma_map_single(ep_buf, SM2_POINT_LEN, DMA_FROM_DEVICE);
+       if (nonce_buf)
+               nonce_dma = cmh_dma_map_single(nonce_buf, SM2_NONCE_LEN,
+                                              DMA_TO_DEVICE);
+       if (cmh_dma_map_error(pk_dma) || cmh_dma_map_error(ct_dma) ||
+           cmh_dma_map_error(ep_dma) ||
+           (nonce_buf && cmh_dma_map_error(nonce_dma))) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_sm2_enc_point(&vcq[1], core_id, nonce_dma, pk_dma, ct_dma,
+                                 ep_dma, req.nonce_len, 0);
+       vcq_add_pke_flush(&vcq[2], core_id);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (nonce_buf && !cmh_dma_map_error(nonce_dma))
+               cmh_dma_unmap_single(nonce_dma, SM2_NONCE_LEN, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(ep_dma))
+               cmh_dma_unmap_single(ep_dma, SM2_POINT_LEN, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(ct_dma))
+               cmh_dma_unmap_single(ct_dma, SM2_POINT_LEN, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(pk_dma))
+               cmh_dma_unmap_single(pk_dma, SM2_POINT_LEN, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.ciphertext),
+                                ct_buf, SM2_POINT_LEN) ||
+                   copy_to_user(u64_to_user_ptr(req.enc_point),
+                                ep_buf, SM2_POINT_LEN))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree_sensitive(nonce_buf);
+       kfree(ep_buf);
+       kfree(ct_buf);
+       kfree(pk_buf);
+       return ret;
+}
+
+/* -- SM2_ID_DIGEST --------------------- */
+
+/**
+ * cmh_mgmt_sm2_id_digest() - Handle CMH_MGMT_IOC_SM2_ID_DIGEST ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_sm2_id_digest(void __user *argp)
+{
+       struct cmh_ioctl_sm2_id_digest req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 core_id = cmh_core_default_id(CMH_CORE_PKE);
+       u8 *id_buf, *pk_buf, *dig_buf;
+       dma_addr_t id_dma, pk_dma, dig_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (!req.id_len || req.id_len > SM2_MAX_ID_LEN)
+               return -EINVAL;
+
+       id_buf = kmalloc(req.id_len, GFP_KERNEL);
+       pk_buf = kmalloc(SM2_POINT_LEN, GFP_KERNEL);
+       dig_buf = kzalloc(SM2_DIGEST_LEN, GFP_KERNEL);
+       if (!id_buf || !pk_buf || !dig_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(id_buf, u64_to_user_ptr(req.id), req.id_len) ||
+           copy_from_user(pk_buf, u64_to_user_ptr(req.public_key),
+                          SM2_POINT_LEN)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       id_dma = cmh_dma_map_single(id_buf, req.id_len, DMA_TO_DEVICE);
+       pk_dma = cmh_dma_map_single(pk_buf, SM2_POINT_LEN, DMA_TO_DEVICE);
+       dig_dma = cmh_dma_map_single(dig_buf, SM2_DIGEST_LEN,
+                                    DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(id_dma) || cmh_dma_map_error(pk_dma) ||
+           cmh_dma_map_error(dig_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_sm2_id_digest(&vcq[1], core_id, id_dma, pk_dma, dig_dma,
+                                 req.id_len, 0);
+       vcq_add_pke_flush(&vcq[2], core_id);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(dig_dma))
+               cmh_dma_unmap_single(dig_dma, SM2_DIGEST_LEN,
+                                    DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(pk_dma))
+               cmh_dma_unmap_single(pk_dma, SM2_POINT_LEN, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(id_dma))
+               cmh_dma_unmap_single(id_dma, req.id_len, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.digest),
+                                dig_buf, SM2_DIGEST_LEN))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree(dig_buf);
+       kfree(pk_buf);
+       kfree(id_buf);
+       return ret;
+}
+
+/* -- SM2_ECDH_HASH --------------------- */
+
+/**
+ * cmh_mgmt_sm2_ecdh_hash() - Handle CMH_MGMT_IOC_SM2_ECDH_HASH ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_sm2_ecdh_hash(void __user *argp)
+{
+       struct cmh_ioctl_sm2_ecdh_hash req;
+       /* Phase 1: hdr + sys_new + sm2_ecdh_hash + pke_flush; reused for Phase 2 */
+       struct vcq_cmd vcq[4];
+       u32 sk_type, core_id;
+       u8 *peer_dig_buf, *dig_buf, *sk_buf;
+       u64 *ref_buf;
+       dma_addr_t peer_dig_dma, dig_dma, sk_dma, ref_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.__reserved)
+               return -EINVAL;
+
+       sk_type = SYS_TYPE_SET(SYS_TYPE_FLAG_PT, CORE_ID_PKE);
+       core_id = cmh_core_default_id(CMH_CORE_PKE);
+
+       peer_dig_buf = kmalloc(SM2_DIGEST_LEN, GFP_KERNEL);
+       dig_buf = kmalloc(SM2_DIGEST_LEN, GFP_KERNEL);
+       sk_buf = kzalloc(SM2_SHARED_KEY_LEN, GFP_KERNEL);
+       ref_buf = kzalloc_obj(u64, GFP_KERNEL);
+       if (!peer_dig_buf || !dig_buf || !sk_buf || !ref_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(peer_dig_buf, u64_to_user_ptr(req.peer_id_digest),
+                          SM2_DIGEST_LEN) ||
+           copy_from_user(dig_buf, u64_to_user_ptr(req.id_digest),
+                          SM2_DIGEST_LEN)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       peer_dig_dma = cmh_dma_map_single(peer_dig_buf, SM2_DIGEST_LEN,
+                                         DMA_TO_DEVICE);
+       dig_dma = cmh_dma_map_single(dig_buf, SM2_DIGEST_LEN, DMA_TO_DEVICE);
+       sk_dma = cmh_dma_map_single(sk_buf, SM2_SHARED_KEY_LEN,
+                                   DMA_FROM_DEVICE);
+       ref_dma = cmh_dma_map_single(ref_buf, sizeof(u64), DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(peer_dig_dma) || cmh_dma_map_error(dig_dma) ||
+           cmh_dma_map_error(sk_dma) || cmh_dma_map_error(ref_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       /*
+        * Phase 1: sys_new(shared_key_ref) + SM2_ECDH_HASH
+        * The shared_point_ref from the ECDH step is passed directly
+        * as a DS reference -- the eSW hub reads it from DS.
+        */
+       vcq_set_header(&vcq[0], 4);
+       vcq_add_sys_new(&vcq[1], 0, ref_dma, SM2_SHARED_KEY_LEN);
+       vcq[1].id |= PKE_SWAP_FLAGS;
+       vcq_add_pke_sm2_ecdh_hash(&vcq[2], core_id, peer_dig_dma, dig_dma,
+                                 req.shared_point_ref, SYS_REF_LAST,
+                                 sk_type, PKE_SWAP_FLAGS);
+       vcq_add_pke_flush(&vcq[3], core_id);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, 4, 1, MGMT_MBX);
+       if (ret)
+               goto out_unmap;
+
+       /* Sync bounce buffer so CPU sees the DMA-written ref */
+       cmh_dma_sync_for_cpu(ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+
+       /* Phase 2: read shared key from DS -> DMA */
+       vcq_set_header(&vcq[0], 3);
+       vcq_add_sys_data(&vcq[1], *ref_buf, sk_dma, SM2_SHARED_KEY_LEN);
+       vcq_add_sys_flush(&vcq[2]);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, 3, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(ref_dma))
+               cmh_dma_unmap_single(ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(sk_dma))
+               cmh_dma_unmap_single(sk_dma, SM2_SHARED_KEY_LEN,
+                                    DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(dig_dma))
+               cmh_dma_unmap_single(dig_dma, SM2_DIGEST_LEN, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(peer_dig_dma))
+               cmh_dma_unmap_single(peer_dig_dma, SM2_DIGEST_LEN,
+                                    DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.shared_key),
+                                sk_buf, SM2_SHARED_KEY_LEN))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree(ref_buf);
+       kfree_sensitive(sk_buf);
+       kfree(dig_buf);
+       kfree(peer_dig_buf);
+       return ret;
+}
+
+/* -- SM2_DEC_HASH ---------------------- */
+
+/**
+ * cmh_mgmt_sm2_dec_hash() - Handle CMH_MGMT_IOC_SM2_DEC_HASH ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_sm2_dec_hash(void __user *argp)
+{
+       struct cmh_ioctl_sm2_dec_hash req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 msg_len, core_id;
+       u8 *ct_buf, *dp_buf, *pt_buf;
+       dma_addr_t ct_dma, dp_dma, pt_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (req.ciphertext_len <= SM2_CT_OVERHEAD ||
+           req.ciphertext_len > SM2_MAX_CT_LEN)
+               return -EINVAL;
+
+       msg_len = req.ciphertext_len - SM2_CT_OVERHEAD;
+       core_id = cmh_core_default_id(CMH_CORE_PKE);
+
+       ct_buf = kmalloc(req.ciphertext_len, GFP_KERNEL);
+       dp_buf = kmalloc(SM2_POINT_LEN, GFP_KERNEL);
+       pt_buf = kzalloc(msg_len, GFP_KERNEL);
+       if (!ct_buf || !dp_buf || !pt_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(ct_buf, u64_to_user_ptr(req.ciphertext),
+                          req.ciphertext_len) ||
+           copy_from_user(dp_buf, u64_to_user_ptr(req.dec_point),
+                          SM2_POINT_LEN)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       ct_dma = cmh_dma_map_single(ct_buf, req.ciphertext_len,
+                                   DMA_TO_DEVICE);
+       dp_dma = cmh_dma_map_single(dp_buf, SM2_POINT_LEN, DMA_TO_DEVICE);
+       pt_dma = cmh_dma_map_single(pt_buf, msg_len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(ct_dma) || cmh_dma_map_error(dp_dma) ||
+           cmh_dma_map_error(pt_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_sm2_dec_hash(&vcq[1], core_id, ct_dma, dp_dma, pt_dma,
+                                req.ciphertext_len, 0);
+       vcq_add_pke_flush(&vcq[2], core_id);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(pt_dma))
+               cmh_dma_unmap_single(pt_dma, msg_len, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(dp_dma))
+               cmh_dma_unmap_single(dp_dma, SM2_POINT_LEN, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(ct_dma))
+               cmh_dma_unmap_single(ct_dma, req.ciphertext_len,
+                                    DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.plaintext),
+                                pt_buf, msg_len))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree_sensitive(pt_buf);
+       kfree_sensitive(dp_buf);
+       kfree(ct_buf);
+       return ret;
+}
+
+/* -- SM2_ENC_HASH ---------------------- */
+
+/**
+ * cmh_mgmt_sm2_enc_hash() - Handle CMH_MGMT_IOC_SM2_ENC_HASH ioctl
+ * @argp: User-space ioctl argument pointer
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_mgmt_sm2_enc_hash(void __user *argp)
+{
+       struct cmh_ioctl_sm2_enc_hash req;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u32 ct_len, core_id;
+       u8 *msg_buf, *ep_buf, *ct_buf;
+       dma_addr_t msg_dma, ep_dma, ct_dma;
+       int ret;
+
+       if (copy_from_user(&req, argp, sizeof(req)))
+               return -EFAULT;
+       if (req.version != CMH_MGMT_V1)
+               return -EINVAL;
+       if (!req.message_len || req.message_len > SM2_MAX_MSG_LEN)
+               return -EINVAL;
+
+       ct_len = SM2_CT_OVERHEAD + req.message_len;
+       core_id = cmh_core_default_id(CMH_CORE_PKE);
+
+       msg_buf = kmalloc(req.message_len, GFP_KERNEL);
+       ep_buf = kmalloc(SM2_POINT_LEN, GFP_KERNEL);
+       ct_buf = kzalloc(ct_len, GFP_KERNEL);
+       if (!msg_buf || !ep_buf || !ct_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       if (copy_from_user(msg_buf, u64_to_user_ptr(req.message),
+                          req.message_len) ||
+           copy_from_user(ep_buf, u64_to_user_ptr(req.enc_point),
+                          SM2_POINT_LEN)) {
+               ret = -EFAULT;
+               goto out_free;
+       }
+
+       msg_dma = cmh_dma_map_single(msg_buf, req.message_len, DMA_TO_DEVICE);
+       ep_dma = cmh_dma_map_single(ep_buf, SM2_POINT_LEN, DMA_TO_DEVICE);
+       ct_dma = cmh_dma_map_single(ct_buf, ct_len, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(msg_dma) || cmh_dma_map_error(ep_dma) ||
+           cmh_dma_map_error(ct_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_sm2_enc_hash(&vcq[1], core_id, msg_dma, ep_dma, ct_dma,
+                                req.message_len, 0);
+       vcq_add_pke_flush(&vcq[2], core_id);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, MGMT_MBX);
+
+out_unmap:
+       if (!cmh_dma_map_error(ct_dma))
+               cmh_dma_unmap_single(ct_dma, ct_len, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(ep_dma))
+               cmh_dma_unmap_single(ep_dma, SM2_POINT_LEN, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(msg_dma))
+               cmh_dma_unmap_single(msg_dma, req.message_len, DMA_TO_DEVICE);
+
+       if (!ret) {
+               if (copy_to_user(u64_to_user_ptr(req.ciphertext),
+                                ct_buf, ct_len))
+                       ret = -EFAULT;
+       }
+
+out_free:
+       kfree(ct_buf);
+       kfree(ep_buf);
+       kfree_sensitive(msg_buf);
+       return ret;
+}
diff --git a/drivers/crypto/cmh/cmh_sys.c b/drivers/crypto/cmh/cmh_sys.c
new file mode 100644
index 000000000000..b01d058e6d89
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_sys.c
@@ -0,0 +1,376 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- SYS Core VCQ Builders
+ *
+ * VCQ builder functions for SYS core datastore commands.  Each function
+ * populates a single vcq_cmd slot.  Callers (cmh_mgmt.c, cmh_key.c)
+ * assemble complete VCQs by combining header + command(s) + flush,
+ * then submit via cmh_tm_submit_sync().
+ *
+ * Hardware-required datastore semantics
+ * --------------------------------------
+ * The commands below (NEW, WRITE, READ, DATA, FIND, LIST, DELETE,
+ * FLUSH) map directly onto the eSW firmware SYS core command set.
+ * The eSW maintains per-mailbox datastore namespaces with two object
+ * classes:
+ *
+ *   SYS_REF_TEMP   -- Temporary objects.  Lifetime is scoped to the
+ *                      current mailbox slot; reclaimed automatically
+ *                      when the slot is reused or on explicit FLUSH.
+ *                      Used for raw-key provisioning on every VCQ.
+ *
+ *   SYS_REF_PERSIST -- Persistent objects.  Survive across slots;
+ *                      require explicit DELETE to reclaim.  Identified
+ *                      by a 64-bit Content ID (CID) and resolved to
+ *                      a per-MBX ref via SYS_CMD_FIND.
+ *
+ * These semantics are hardware requirements, not driver policy.
+ * The per-MBX temp-stack and per-MBX ref namespace are eSW firmware
+ * design constraints that cannot be changed by the kernel driver.
+ */
+
+#include <linux/string.h>
+
+#include "cmh_sys.h"
+
+/**
+ * vcq_add_sys_flush() - Build a SYS_FLUSH VCQ command
+ * @slot: VCQ command slot to populate
+ */
+void vcq_add_sys_flush(struct vcq_cmd *slot)
+{
+       vcq_add_flush(slot, CORE_ID_SYS);
+}
+
+/**
+ * vcq_add_sys_new() - Build a SYS_NEW VCQ command
+ * @slot: VCQ command slot to populate
+ * @cid: Content identifier for the new datastore object
+ * @ref_dma: DMA address of the object reference buffer
+ * @len: Length of the object data in bytes
+ */
+void vcq_add_sys_new(struct vcq_cmd *slot, u64 cid, u64 ref_dma, u32 len)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_SYS, 0, 1, SYS_CMD_NEW);
+       slot->hwc.sys.cmd_new.cid = cid;
+       slot->hwc.sys.cmd_new.ref = ref_dma;
+       slot->hwc.sys.cmd_new.len = len;
+}
+
+/**
+ * vcq_add_sys_write() - Build a SYS_WRITE VCQ command
+ * @slot: VCQ command slot to populate
+ * @ref: Datastore object reference handle
+ * @src_dma: DMA address of source data buffer
+ * @wrap_key: Wrapping key reference (0 if none)
+ * @len: Length of data to write in bytes
+ * @sys_type: Datastore object type identifier
+ */
+void vcq_add_sys_write(struct vcq_cmd *slot, u64 ref, u64 src_dma,
+                      u64 wrap_key, u32 len, u32 sys_type)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_SYS, 0, 1, SYS_CMD_WRITE);
+       slot->hwc.sys.cmd_write.ref = ref;
+       slot->hwc.sys.cmd_write.src = src_dma;
+       slot->hwc.sys.cmd_write.key = wrap_key;
+       slot->hwc.sys.cmd_write.len = len;
+       slot->hwc.sys.cmd_write.type = sys_type;
+}
+
+/**
+ * vcq_add_sys_read() - Build a SYS_READ VCQ command
+ * @slot: VCQ command slot to populate
+ * @ref: Datastore object reference handle
+ * @dst_dma: DMA address of destination buffer
+ * @wrap_key: Wrapping key reference (0 if none)
+ * @len: Length of data to read in bytes
+ */
+void vcq_add_sys_read(struct vcq_cmd *slot, u64 ref, u64 dst_dma,
+                     u64 wrap_key, u32 len)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_SYS, 0, 1, SYS_CMD_READ);
+       slot->hwc.sys.cmd_read.ref = ref;
+       slot->hwc.sys.cmd_read.dst = dst_dma;
+       slot->hwc.sys.cmd_read.key = wrap_key;
+       slot->hwc.sys.cmd_read.len = len;
+}
+
+/**
+ * vcq_add_sys_data() - Build a SYS_DATA VCQ command
+ * @slot: VCQ command slot to populate
+ * @ref: Datastore object reference handle
+ * @dst_dma: DMA address of destination buffer
+ * @len: Length of data section to read in bytes
+ */
+void vcq_add_sys_data(struct vcq_cmd *slot, u64 ref, u64 dst_dma, u32 len)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_SYS, 0, 1, SYS_CMD_DATA);
+       slot->hwc.sys.cmd_data.ref = ref;
+       slot->hwc.sys.cmd_data.dst = dst_dma;
+       slot->hwc.sys.cmd_data.len = len;
+}
+
+/**
+ * vcq_add_sys_find() - Build a SYS_FIND VCQ command
+ * @slot: VCQ command slot to populate
+ * @cid: Content identifier to search for
+ * @dst_dma: DMA address of destination buffer for result
+ * @len: Length of destination buffer in bytes
+ */
+void vcq_add_sys_find(struct vcq_cmd *slot, u64 cid, u64 dst_dma, u32 len)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_SYS, 0, 1, SYS_CMD_FIND);
+       slot->hwc.sys.cmd_find.cid = cid;
+       slot->hwc.sys.cmd_find.dst = dst_dma;
+       slot->hwc.sys.cmd_find.len = len;
+}
+
+/**
+ * vcq_add_sys_list() - Build a SYS_LIST VCQ command
+ * @slot: VCQ command slot to populate
+ * @ref: Datastore object reference for enumeration start
+ * @dst_dma: DMA address of destination buffer for list
+ * @len: Length of destination buffer in bytes
+ */
+void vcq_add_sys_list(struct vcq_cmd *slot, u64 ref, u64 dst_dma, u32 len)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_SYS, 0, 1, SYS_CMD_LIST);
+       slot->hwc.sys.cmd_list.ref = ref;
+       slot->hwc.sys.cmd_list.dst = dst_dma;
+       slot->hwc.sys.cmd_list.len = len;
+}
+
+/**
+ * vcq_add_sys_grant() - Build a SYS_GRANT VCQ command
+ * @slot: VCQ command slot to populate
+ * @ref: Datastore object reference handle
+ * @read: Read permission bitmask
+ * @write: Write permission bitmask
+ * @execute: Execute permission bitmask
+ */
+void vcq_add_sys_grant(struct vcq_cmd *slot, u64 ref, u64 read,
+                      u64 write, u64 execute)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_SYS, 0, 1, SYS_CMD_GRANT);
+       slot->hwc.sys.cmd_grant.ref = ref;
+       slot->hwc.sys.cmd_grant.read = read;
+       slot->hwc.sys.cmd_grant.write = write;
+       slot->hwc.sys.cmd_grant.execute = execute;
+}
+
+/**
+ * vcq_add_sys_export() - Build a SYS_EXPORT VCQ command
+ * @slot: VCQ command slot to populate
+ * @cid: Content identifier of object to export
+ * @dst_dma: DMA address of destination buffer for wrapped blob
+ * @wrap_key: Wrapping key reference for export
+ * @len: Length of destination buffer in bytes
+ */
+void vcq_add_sys_export(struct vcq_cmd *slot, u64 cid, u64 dst_dma,
+                       u64 wrap_key, u32 len)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_SYS, 0, 1, SYS_CMD_EXPORT);
+       slot->hwc.sys.cmd_export.cid = cid;
+       slot->hwc.sys.cmd_export.dst = dst_dma;
+       slot->hwc.sys.cmd_export.key = wrap_key;
+       slot->hwc.sys.cmd_export.len = len;
+}
+
+/**
+ * vcq_add_sys_import() - Build a SYS_IMPORT VCQ command
+ * @slot: VCQ command slot to populate
+ * @src_dma: DMA address of wrapped datastore blob to import
+ * @wrap_key: Wrapping key reference for unwrapping
+ * @len: Length of wrapped blob in bytes
+ */
+void vcq_add_sys_import(struct vcq_cmd *slot, u64 src_dma,
+                       u64 wrap_key, u32 len)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_SYS, 0, 1, SYS_CMD_IMPORT);
+       slot->hwc.sys.cmd_import.src = src_dma;
+       slot->hwc.sys.cmd_import.key = wrap_key;
+       slot->hwc.sys.cmd_import.len = len;
+}
+
+/* -- KIC Core VCQ Builders --------------------- */
+
+/**
+ * vcq_add_kic_hkdf1() - Build a KIC HKDF-Expand VCQ command
+ * @slot: VCQ command slot to populate
+ * @dst: Datastore reference for derived key output
+ * @base: Datastore reference for base key input
+ * @label_dma: DMA address of HKDF label/info buffer
+ * @key_len: Derived key length in bytes
+ * @label_len: Length of label buffer in bytes
+ * @type: Derived key datastore type
+ */
+void vcq_add_kic_hkdf1(struct vcq_cmd *slot, u64 dst, u64 base,
+                      u64 label_dma, u32 key_len, u32 label_len, u32 type)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_KIC, 0, 1, KIC_CMD_HKDF1);
+       slot->hwc.kic.cmd_hkdf1.dst = dst;
+       slot->hwc.kic.cmd_hkdf1.base = base;
+       slot->hwc.kic.cmd_hkdf1.label = label_dma;
+       slot->hwc.kic.cmd_hkdf1.llen = label_len;
+       slot->hwc.kic.cmd_hkdf1.len = key_len;
+       slot->hwc.kic.cmd_hkdf1.type = type;
+}
+
+/**
+ * vcq_add_kic_hkdf2() - Build a KIC HKDF-with-salt VCQ command
+ * @slot: VCQ command slot to populate
+ * @dst: Datastore reference for derived key output
+ * @base: Datastore reference for base key input
+ * @salt: Datastore reference for HKDF salt key
+ * @label_dma: DMA address of HKDF label/info buffer
+ * @key_len: Derived key length in bytes
+ * @label_len: Length of label buffer in bytes
+ * @type: Derived key datastore type
+ */
+void vcq_add_kic_hkdf2(struct vcq_cmd *slot, u64 dst, u64 base, u64 salt,
+                      u64 label_dma, u32 key_len, u32 label_len, u32 type)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_KIC, 0, 1, KIC_CMD_HKDF2);
+       slot->hwc.kic.cmd_hkdf2.dst = dst;
+       slot->hwc.kic.cmd_hkdf2.base = base;
+       slot->hwc.kic.cmd_hkdf2.salt = salt;
+       slot->hwc.kic.cmd_hkdf2.label = label_dma;
+       slot->hwc.kic.cmd_hkdf2.llen = label_len;
+       slot->hwc.kic.cmd_hkdf2.len = key_len;
+       slot->hwc.kic.cmd_hkdf2.type = type;
+}
+
+/**
+ * vcq_add_kic_aes_cmac_kdf() - Build a KIC AES-CMAC KDF VCQ command
+ * @slot: VCQ command slot to populate
+ * @out_key: Datastore reference for derived key output
+ * @base_key: Datastore reference for base key input
+ * @label_dma: DMA address of KDF label buffer
+ * @key_len: Derived key length in bytes
+ * @label_len: Length of label buffer in bytes
+ * @type: Derived key datastore type
+ */
+void vcq_add_kic_aes_cmac_kdf(struct vcq_cmd *slot, u64 out_key, u64 base_key,
+                             u64 label_dma, u32 key_len, u32 label_len,
+                             u32 type)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_KIC, 0, 1, KIC_CMD_AES_CMAC_KDF);
+       slot->hwc.kic.cmd_aes_cmac_kdf.base_key = base_key;
+       slot->hwc.kic.cmd_aes_cmac_kdf.out_key = out_key;
+       slot->hwc.kic.cmd_aes_cmac_kdf.label = label_dma;
+       slot->hwc.kic.cmd_aes_cmac_kdf.key_len = key_len;
+       slot->hwc.kic.cmd_aes_cmac_kdf.label_len = label_len;
+       slot->hwc.kic.cmd_aes_cmac_kdf.type = type;
+}
+
+/**
+ * vcq_add_kic_dkek_derive() - Build a KIC DKEK derivation VCQ command
+ * @slot: VCQ command slot to populate
+ * @out_key: Datastore reference for derived DKEK output
+ * @base_key: Datastore reference for base key input
+ * @host_id: Host identifier for key binding
+ * @metadata_dma: DMA address of derivation metadata buffer
+ * @metadata_len: Length of metadata buffer in bytes
+ */
+void vcq_add_kic_dkek_derive(struct vcq_cmd *slot, u64 out_key, u64 base_key,
+                            u32 host_id, u64 metadata_dma, u32 metadata_len)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_KIC, 0, 1, KIC_CMD_DKEK_DERIVE);
+       slot->hwc.kic.cmd_dkek_derive.base_key = base_key;
+       slot->hwc.kic.cmd_dkek_derive.out_key = out_key;
+       slot->hwc.kic.cmd_dkek_derive.host_id = host_id;
+       slot->hwc.kic.cmd_dkek_derive.metadata = metadata_dma;
+       slot->hwc.kic.cmd_dkek_derive.metadata_len = metadata_len;
+}
+
+/* -- DRBG Core VCQ Builders -------------------- */
+
+/**
+ * vcq_add_drbg_reset() - Build a DRBG reset VCQ command
+ * @slot: VCQ command slot to populate
+ *
+ * Issues DRBG_CMD_RESET which clears the instantiated state, allowing
+ * a subsequent CONFIG to proceed without a double-instantiate error.
+ */
+void vcq_add_drbg_reset(struct vcq_cmd *slot)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_DRBG, 0, 1, DRBG_CMD_RESET);
+}
+
+/**
+ * vcq_add_drbg_config() - Build a DRBG configuration VCQ command
+ * @slot: VCQ command slot to populate
+ * @ratio: Entropy-to-output ratio
+ * @strength: Security strength in bits
+ */
+void vcq_add_drbg_config(struct vcq_cmd *slot, u32 ratio, u32 strength)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_DRBG, 0, 1, DRBG_CMD_CONFIG);
+       slot->hwc.drbg.cmd_config.entropy_ratio = ratio;
+       slot->hwc.drbg.cmd_config.security_strength = strength;
+}
+
+/**
+ * vcq_add_drbg_datastore() - Build a DRBG datastore setup VCQ command
+ * @slot: VCQ command slot to populate
+ * @ref: Datastore object reference handle
+ * @len: Length of datastore allocation in bytes
+ * @type: Datastore object type
+ */
+void vcq_add_drbg_datastore(struct vcq_cmd *slot, u64 ref, u32 len, u32 type)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_DRBG, 0, 1, DRBG_CMD_DATASTORE);
+       slot->hwc.drbg.cmd_datastore.ref = ref;
+       slot->hwc.drbg.cmd_datastore.len = len;
+       slot->hwc.drbg.cmd_datastore.type = type;
+}
+
+/* -- EAC Core VCQ Builder ---------------------- */
+
+/**
+ * vcq_add_eac_read() - Build an EAC read VCQ command
+ * @slot: VCQ command slot to populate
+ * @dst_dma: DMA address of destination buffer
+ * @len: Length of data to read in bytes
+ */
+void vcq_add_eac_read(struct vcq_cmd *slot, u64 dst_dma, u32 len)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_EAC, 0, 1, EAC_CMD_READ);
+       slot->hwc.eac.cmd_read.dst = dst_dma;
+       slot->hwc.eac.cmd_read.len = len;
+}
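
A note on the pattern shared by every builder above: each one zeroes the whole slot before filling in magic, id and payload, so a slot reused for a different command carries no stale payload words from its previous use. The user-space sketch below illustrates that under stated assumptions. struct demo_cmd, DEMO_CMD_MAGIC and the DEMO_CMD_* values are hypothetical stand-ins, since the real struct vcq_cmd, VCQ_CMD_MAGIC and VCQ_CMD_ID() live in cmh_vcq.h, which is not part of this excerpt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-ins for the definitions in cmh_vcq.h.  Field names
 * mirror the builders above; the magic value, command numbers and exact
 * layout are invented for illustration only.
 */
#define DEMO_CMD_MAGIC	0x56435131u
#define DEMO_CMD_WRITE	2u
#define DEMO_CMD_READ	3u

struct demo_cmd {
	uint32_t magic;
	uint32_t id;
	union {
		struct { uint64_t ref, src, key; uint32_t len, type; } cmd_write;
		struct { uint64_t ref, dst, key; uint32_t len; } cmd_read;
	} hwc;
};

/* Same shape as vcq_add_sys_write(): zero the slot, then fill it in. */
static inline void demo_add_write(struct demo_cmd *slot, uint64_t ref,
				  uint64_t src, uint64_t key,
				  uint32_t len, uint32_t type)
{
	assert(slot != NULL);
	memset(slot, 0, sizeof(*slot));
	slot->magic = DEMO_CMD_MAGIC;
	slot->id = DEMO_CMD_WRITE;
	slot->hwc.cmd_write.ref = ref;
	slot->hwc.cmd_write.src = src;
	slot->hwc.cmd_write.key = key;
	slot->hwc.cmd_write.len = len;
	slot->hwc.cmd_write.type = type;
}

/* Same shape as vcq_add_sys_read(): the memset clears the old payload. */
static inline void demo_add_read(struct demo_cmd *slot, uint64_t ref,
				 uint64_t dst, uint64_t key, uint32_t len)
{
	assert(slot != NULL);
	memset(slot, 0, sizeof(*slot));
	slot->magic = DEMO_CMD_MAGIC;
	slot->id = DEMO_CMD_READ;
	slot->hwc.cmd_read.ref = ref;
	slot->hwc.cmd_read.dst = dst;
	slot->hwc.cmd_read.key = key;
	slot->hwc.cmd_read.len = len;
}
```

Building a WRITE and then a READ into the same slot leaves the word that used to hold the WRITE type at zero, which is presumably what the hardware-facing layout wants for unused payload words.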
diff --git a/drivers/crypto/cmh/include/cmh_key.h b/drivers/crypto/cmh/include/cmh_key.h
new file mode 100644
index 000000000000..bad69c92b892
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_key.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Per-transform key context
+ *
+ * Per-transform key context used by all keyed crypto algorithms (AES,
+ * SM4, CCP, HMAC, KMAC).  Stores raw key bytes supplied via the crypto
+ * API .setkey() callback: the key is DMA-mapped once at setkey time and
+ * written to SYS_REF_TEMP in every VCQ.
+ *
+ * Each keyed algorithm driver embeds a struct cmh_key_ctx in its
+ * per-transform context and calls cmh_key_setkey_raw() from its
+ * .setkey() callback.
+ *
+ * Raw-key atomicity (SYS_REF_TEMP)
+ * ---------------------------------
+ * SYS_CMD_WRITE to SYS_REF_TEMP is packed into the same VCQ as the
+ * algorithm commands (AES_CMD_INIT, HC_CMD_HMAC, etc.).  SYS_REF_TEMP
+ * is per-MBX -- the CMH eSW allocates it in the tail of each mailbox's
+ * own VCQ buffer (mbx_alloc_temp), so concurrent raw-key requests on
+ * different MBXes do not interfere.
+ */
+
+#ifndef CMH_KEY_H
+#define CMH_KEY_H
+
+#include <linux/types.h>
+#include "cmh_config.h"
+#include "cmh_vcq.h"
+
+/* Key context mode */
+enum cmh_key_mode {
+       CMH_KEY_NONE = 0,       /* no key set yet */
+       CMH_KEY_RAW,            /* raw key bytes in memory */
+};
+
+/* Per-transform key context */
+struct cmh_key_ctx {
+       enum cmh_key_mode mode;
+       struct {
+               u8 *data;       /* kmemdup'd raw key bytes */
+               u32 len;        /* key length in bytes */
+               u32 sys_type;   /* SYS_TYPE_SET(flags, core_id) */
+               dma_addr_t dma; /* pre-mapped DMA addr (DMA_TO_DEVICE) */
+       } raw;
+};
+
+/**
+ * cmh_key_setkey_raw() - Store raw key bytes in the transform context
+ * @ctx: Per-transform key context
+ * @key: Raw key bytes
+ * @keylen: Key length in bytes
+ * @core_id: Target algorithm core (e.g. CORE_ID_AES)
+ *
+ * SYS_TYPE_FLAG_PT is set so the written temp key can be read back
+ * as plaintext if needed.  The actual SYS_CMD_WRITE to SYS_REF_TEMP
+ * is deferred to each request's VCQ, where it is packed inline with
+ * the algorithm commands for atomicity.
+ *
+ * Return: 0 on success, -ENOMEM on allocation failure.
+ */
+int cmh_key_setkey_raw(struct cmh_key_ctx *ctx, const u8 *key,
+                      u32 keylen, u32 core_id);
+
+/**
+ * cmh_key_destroy() - Free key resources
+ * @ctx: Per-transform key context
+ *
+ * Zeroises and frees the raw key buffer.
+ */
+void cmh_key_destroy(struct cmh_key_ctx *ctx);
+
+/**
+ * cmh_ds_type_to_core_id() - Map datastore key type to core ID
+ * @ds_type: CMH_DS_* key type constant
+ *
+ * Return: Corresponding CORE_ID_*, or CORE_ID_NUM (0x1F) on
+ *         unrecognised type (caller should return -EINVAL).
+ */
+u32 cmh_ds_type_to_core_id(u32 ds_type);
+
+#endif /* CMH_KEY_H */
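
For readers following the key context lifecycle described in this header, here is a minimal user-space sketch of the store-then-zeroise idea. It is not the driver's implementation: all demo_* names are hypothetical, malloc()/free() stand in for kmemdup()/kfree(), the DMA mapping and sys_type fields are left out, and the choice to copy the new key before dropping the old one is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space analogue of struct cmh_key_ctx (no DMA state). */
enum demo_key_mode { DEMO_KEY_NONE = 0, DEMO_KEY_RAW };

struct demo_key_ctx {
	enum demo_key_mode mode;
	uint8_t *data;		/* private copy of the raw key bytes */
	uint32_t len;		/* key length in bytes */
};

/* Wipe through a volatile pointer so the stores are not optimised away. */
static inline void demo_wipe(uint8_t *p, size_t n)
{
	volatile uint8_t *vp = p;

	while (n--)
		*vp++ = 0;
}

/* Zeroise and free the key copy, then return the context to "no key". */
static inline void demo_key_destroy(struct demo_key_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->data) {
		demo_wipe(ctx->data, ctx->len);
		free(ctx->data);
	}
	ctx->data = NULL;
	ctx->len = 0;
	ctx->mode = DEMO_KEY_NONE;
}

/* Copy the new key first, so a failed allocation keeps the old key. */
static inline int demo_key_setkey_raw(struct demo_key_ctx *ctx,
				      const uint8_t *key, uint32_t keylen)
{
	uint8_t *copy;

	assert(ctx != NULL);
	if (!key || keylen == 0)
		return -EINVAL;

	copy = malloc(keylen);
	if (!copy)
		return -ENOMEM;
	memcpy(copy, key, keylen);

	demo_key_destroy(ctx);	/* wipe any previously stored key */
	ctx->data = copy;
	ctx->len = keylen;
	ctx->mode = DEMO_KEY_RAW;
	return 0;
}
```

Calling demo_key_setkey_raw() twice on the same context wipes the first key before the second is installed, and demo_key_destroy() is safe on a context that never had a key set.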
diff --git a/drivers/crypto/cmh/include/cmh_mgmt.h b/drivers/crypto/cmh/include/cmh_mgmt.h
new file mode 100644
index 000000000000..b211014bd71d
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_mgmt.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH -- Key Management misc_device (/dev/cmh_mgmt)
+ *
+ * ioctl interface for key CRUD + datastore export/import,
+ * PKE operations (RSA, ECDSA, ECDH, EdDSA),
+ * and PQC operations (ML-KEM, ML-DSA, SLH-DSA).
+ *
+ * Registered alongside crypto algorithms in module_init,
+ * unregistered before them in module_exit.
+ */
+
+#ifndef CMH_MGMT_H
+#define CMH_MGMT_H
+
+#ifdef CONFIG_CRYPTO_DEV_CMH_MGMT
+
+/*
+ * Pin all mgmt ioctls to MBX 0 for DS ownership and SYS_REF_TEMP scope.
+ * Shared by cmh_mgmt.c, cmh_mgmt_pke.c, cmh_mgmt_pqc.c, cmh_pke_sm2.c.
+ */
+#define MGMT_MBX       0
+
+/* Maximum DMA buffer size for key data / datastore blobs */
+#define CMH_MGMT_MAX_DATA_LEN  (256 * 1024)  /* 256 KB */
+
+int  cmh_mgmt_register(void);
+void cmh_mgmt_unregister(void);
+
+/* -- PKE ioctl handlers (cmh_mgmt_pke.c) -- */
+int cmh_mgmt_pke_rsa_enc(void __user *argp);
+int cmh_mgmt_pke_rsa_dec(void __user *argp);
+int cmh_mgmt_pke_rsa_crt_dec(void __user *argp);
+int cmh_mgmt_pke_rsa_keygen(void __user *argp);
+int cmh_mgmt_pke_ecdsa_sign(void __user *argp);
+int cmh_mgmt_pke_ecdh(void __user *argp);
+int cmh_mgmt_pke_ecdh_keygen(void __user *argp);
+int cmh_mgmt_pke_eddsa_sign(void __user *argp);
+int cmh_mgmt_pke_eddsa_verify(void __user *argp);
+int cmh_mgmt_pke_ec_keygen(void __user *argp);
+int cmh_mgmt_pke_ec_pubgen(void __user *argp);
+int cmh_mgmt_pke_eddsa_keygen_sca(void __user *argp);
+
+/* -- PQC ioctl handlers (cmh_mgmt_pqc.c) -- */
+int cmh_mgmt_ml_kem_keygen(void __user *argp);
+int cmh_mgmt_ml_kem_enc(void __user *argp);
+int cmh_mgmt_ml_kem_dec(void __user *argp);
+int cmh_mgmt_ml_dsa_keygen(void __user *argp);
+int cmh_mgmt_ml_dsa_sign(void __user *argp);
+int cmh_mgmt_slhdsa_keygen(void __user *argp);
+int cmh_mgmt_slhdsa_sign(void __user *argp);
+int cmh_mgmt_slhdsa_sign_prehash(void __user *argp);
+
+#else /* !CONFIG_CRYPTO_DEV_CMH_MGMT */
+
+static inline int  cmh_mgmt_register(void) { return 0; }
+static inline void cmh_mgmt_unregister(void) { }
+
+#endif /* CONFIG_CRYPTO_DEV_CMH_MGMT */
+
+#endif /* CMH_MGMT_H */
diff --git a/drivers/crypto/cmh/include/cmh_pke.h b/drivers/crypto/cmh/include/cmh_pke.h
new file mode 100644
index 000000000000..dcfdb3fc3cd6
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_pke.h
@@ -0,0 +1,245 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- PKE Common Types and Helpers
+ *
+ * Shared definitions for RSA, ECDSA, ECDH, EdDSA, and SM2 drivers.
+ * Curve -> coordinate-length mapping, VCQ byte-swap flags, and
+ * common VCQ builder prototypes.
+ */
+
+#ifndef CMH_PKE_H
+#define CMH_PKE_H
+
+#include <linux/types.h>
+#include "cmh_vcq.h"
+#include "cmh_pke_abi.h"
+
+/* VCQ byte-swap flags for DMA transfers (per CMH VCQ ABI) */
+#define VCQ_FLAG_SWAP_BYTES    0x400000U
+#define VCQ_FLAG_SWAP_WORDS    0x200000U
+
+/* VCQ byte-swap flags for PKE -- big-endian data on LE bus */
+#define PKE_SWAP_FLAGS (VCQ_FLAG_SWAP_BYTES | VCQ_FLAG_SWAP_WORDS)
+
+/* VCQ layout: header + [SYS_WRITE] + PKE_CMD + flush */
+#define PKE_VCQ_CMDS_MIN       3       /* header + cmd + flush */
+#define PKE_VCQ_CMDS_MAX       4       /* header + SYS_WRITE + cmd + flush */
+
+/* Max RSA key size in bytes (4096 bits) */
+#define PKE_RSA_MAX_BYTES      512
+#define PKE_RSA_MIN_BITS       1024
+#define PKE_RSA_MAX_BITS       4096
+
+/* EdDSA SCA: Ed448 blinded private key length (bytes) */
+#define PKE_ED448_SK_SCA_LEN   226
+
+/**
+ * pke_curve_clen() - Get EC curve coordinate length in bytes
+ * @curve: PKE curve identifier (PKE_CURVE_*)
+ *
+ * Return: Coordinate length in bytes, or 0 for unknown curves.
+ */
+static inline u32 pke_curve_clen(u32 curve)
+{
+       switch (curve) {
+       case PKE_CURVE_P192:
+       case PKE_CURVE_BP192R1:
+               return 24;
+       case PKE_CURVE_P224:
+       case PKE_CURVE_BP224R1:
+               return 28;
+       case PKE_CURVE_P256:
+       case PKE_CURVE_SECP256K1:
+       case PKE_CURVE_BP256R1:
+       case PKE_CURVE_ANSSI_FRP256V1:
+       case PKE_CURVE_SM2:
+       case PKE_CURVE_25519:
+               return 32;
+       case PKE_CURVE_BP320R1:
+               return 40;
+       case PKE_CURVE_P384:
+       case PKE_CURVE_BP384R1:
+               return 48;
+       case PKE_CURVE_BP512R1:
+               return 64;
+       case PKE_CURVE_P521:
+               return 68; /* ceil(521/8) = 66, ABI uses ALIGN(66, 4) = 68 */
+       case PKE_CURVE_448:
+               return 56;
+       default:
+               return 0;
+       }
+}
+
+/**
+ * pke_curve_bits() - Get EC curve size in bits
+ * @curve: PKE curve identifier (PKE_CURVE_*)
+ *
+ * Return: Curve size in bits, or 0 for unknown curves.
+ */
+static inline u32 pke_curve_bits(u32 curve)
+{
+       switch (curve) {
+       case PKE_CURVE_P192:
+       case PKE_CURVE_BP192R1:
+               return 192;
+       case PKE_CURVE_P224:
+       case PKE_CURVE_BP224R1:
+               return 224;
+       case PKE_CURVE_P256:
+       case PKE_CURVE_SECP256K1:
+       case PKE_CURVE_BP256R1:
+       case PKE_CURVE_ANSSI_FRP256V1:
+       case PKE_CURVE_SM2:
+       case PKE_CURVE_25519:
+               return 256;
+       case PKE_CURVE_BP320R1:
+               return 320;
+       case PKE_CURVE_P384:
+       case PKE_CURVE_BP384R1:
+               return 384;
+       case PKE_CURVE_BP512R1:
+               return 512;
+       case PKE_CURVE_P521:
+               return 521;
+       case PKE_CURVE_448:
+               return 448;
+       default:
+               return 0;
+       }
+}
+
+/**
+ * pke_eddsa_key_len() - Get EdDSA key/pubkey length
+ * @curve: PKE curve identifier (PKE_CURVE_25519 or PKE_CURVE_448)
+ *
+ * Ed25519 uses 32 bytes (== clen), Ed448 uses 57 bytes (clen + 1
+ * flag byte per RFC 8032).  Signature length is 2 * pke_eddsa_key_len().
+ *
+ * Return: Key length in bytes, or 0 for unknown curves.
+ */
+static inline u32 pke_eddsa_key_len(u32 curve)
+{
+       u32 clen = pke_curve_clen(curve);
+
+       return (curve == PKE_CURVE_448) ? clen + 1 : clen;
+}
+
+/**
+ * pke_curve_is_edwards() - Check if curve uses Edwards form
+ * @curve: PKE curve identifier (PKE_CURVE_*)
+ *
+ * Return: true for Curve25519 and Curve448, false otherwise.
+ */
+static inline bool pke_curve_is_edwards(u32 curve)
+{
+       return curve == PKE_CURVE_25519 || curve == PKE_CURVE_448;
+}
+
+/**
+ * pke_swap_flags() - Get VCQ byte-swap flags for a given curve
+ * @curve: PKE curve identifier (PKE_CURVE_*)
+ *
+ * Weierstrass curves need byte+word swap; Edwards curves do not.
+ *
+ * Return: VCQ swap flags to OR into the command ID.
+ */
+static inline u32 pke_swap_flags(u32 curve)
+{
+       return pke_curve_is_edwards(curve) ? 0 : PKE_SWAP_FLAGS;
+}
+
+/* Common VCQ builder prototypes */
+
+void vcq_add_pke_flush(struct vcq_cmd *slot, u32 core_id);
+
+void vcq_add_pke_rsa_enc(struct vcq_cmd *slot, u32 core_id, u32 bits, u32 e_len,
+                        u64 e_dma, u64 n_dma, u64 m_dma, u64 c_dma,
+                        u32 flags);
+
+void vcq_add_pke_rsa_dec(struct vcq_cmd *slot, u32 core_id, u32 bits, u32 e_len,
+                        u64 e_dma, u64 n_dma, u64 c_dma, u64 m_dma,
+                        u64 d_ref, u32 flags);
+
+void vcq_add_pke_rsa_crt_dec(struct vcq_cmd *slot, u32 core_id, u32 bits, u32 e_len,
+                            u64 e_dma, u64 n_dma, u64 c_dma, u64 m_dma,
+                            u64 crt_ref, u32 flags);
+
+void vcq_add_pke_ecdsa_verify(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 dlen,
+                             u64 pk_dma, u64 dig_dma, u64 sig_dma,
+                             u64 rp_dma, u32 flags);
+
+void vcq_add_pke_ecdsa_sign(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                           u64 dig_dma, u64 sig_dma, u64 sk_ref,
+                           u32 dlen, u32 flags);
+
+void vcq_add_pke_ecdsa_pubgen(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                             u64 pk_dma, u64 sk_ref, u32 flags);
+
+void vcq_add_pke_ecdsa_keygen(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                             u64 sk_ref, u32 sk_type, u32 flags);
+
+void vcq_add_pke_ecdh_keygen(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                            u64 pkx_dma, u64 sk_ref, u32 flags);
+
+void vcq_add_pke_ecdh(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                     u32 sslen, u32 ss_type, u64 peer_dma, u64 sk_ref,
+                     u64 ss_ref, u32 flags);
+
+void vcq_add_pke_eddsa_verify(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 dlen,
+                             u64 pky_dma, u64 dig_dma, u64 sig_dma,
+                             u64 rp_dma, u32 flags);
+
+void vcq_add_pke_eddsa_sign(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                           u64 dig_dma, u64 sig_dma, u64 sk_ref,
+                           u32 dlen, u32 flags);
+
+void vcq_add_pke_eddsa_pubgen(struct vcq_cmd *slot, u32 core_id, u32 curve, u32 sklen,
+                             u64 pky_dma, u64 sk_ref, u32 flags);
+
+void vcq_add_pke_eddsa_keygen_sca(struct vcq_cmd *slot, u32 core_id, u32 curve,
+                                 u64 sk_ref, u64 sca_sk_ref);
+
+/* SM2 VCQ builders */
+
+void vcq_add_pke_sm2_ecdh_keygen(struct vcq_cmd *slot, u32 core_id, u64 nonce_dma,
+                                u64 session_key_dma, u32 nonce_len, u32 flags);
+
+void vcq_add_pke_sm2_ecdh(struct vcq_cmd *slot, u32 core_id, u32 nonce_len,
+                         u32 private_key_len, u64 nonce_dma,
+                         u64 peer_pk_dma, u64 peer_sk_dma,
+                         u64 priv_ref, u64 sp_ref, u32 sp_type, u32 flags);
+
+void vcq_add_pke_sm2_dec_point(struct vcq_cmd *slot, u32 core_id, u32 ct_len,
+                              u32 pk_len, u64 ct_dma, u64 dp_dma,
+                              u64 priv_ref, u32 flags);
+
+void vcq_add_pke_sm2_enc_point(struct vcq_cmd *slot, u32 core_id, u64 nonce_dma,
+                              u64 pk_dma, u64 ct_dma, u64 ep_dma,
+                              u32 nonce_len, u32 flags);
+
+void vcq_add_pke_sm2_id_digest(struct vcq_cmd *slot, u32 core_id, u64 id_dma,
+                              u64 pk_dma, u64 dig_dma, u32 id_len,
+                              u32 flags);
+
+void vcq_add_pke_sm2_ecdh_hash(struct vcq_cmd *slot, u32 core_id, u64 peer_dig_dma,
+                              u64 dig_dma, u64 sp_ref, u64 sk_ref,
+                              u32 sk_type, u32 flags);
+
+void vcq_add_pke_sm2_dec_hash(struct vcq_cmd *slot, u32 core_id, u64 ct_dma,
+                             u64 dp_dma, u64 pt_dma, u32 ct_len, u32 flags);
+
+void vcq_add_pke_sm2_enc_hash(struct vcq_cmd *slot, u32 core_id, u64 msg_dma,
+                             u64 ep_dma, u64 ct_dma, u32 msg_len, u32 flags);
+
+/* Registration */
+
+int cmh_pke_rsa_register(void);
+void cmh_pke_rsa_unregister(void);
+int cmh_pke_ecdsa_register(void);
+void cmh_pke_ecdsa_unregister(void);
+int cmh_pke_ecdh_register(void);
+void cmh_pke_ecdh_unregister(void);
+
+#endif /* CMH_PKE_H */
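
As a cross-check of the length tables in cmh_pke.h: every value returned by pke_curve_clen() equals ceil(bits / 8) rounded up to a multiple of 4, with P-521 being the only curve where the rounding matters (66 -> 68). The stand-alone sketch below derives the lengths that way for a few curves and adds the Ed448 extra byte. The demo_* names and enum values are hypothetical; the real PKE_CURVE_* constants come from cmh_pke_abi.h, which is not shown here, and the patch itself uses explicit switch tables rather than this derivation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical curve identifiers, not the PKE_CURVE_* ABI values. */
enum demo_curve { DEMO_P256, DEMO_P521, DEMO_ED25519, DEMO_ED448 };

/* Curve size in bits, as in pke_curve_bits(). */
static inline uint32_t demo_curve_bits(enum demo_curve c)
{
	switch (c) {
	case DEMO_P256:
	case DEMO_ED25519:
		return 256;
	case DEMO_P521:
		return 521;
	case DEMO_ED448:
		return 448;
	}
	return 0;
}

/* Coordinate length: ceil(bits / 8), rounded up to a 4-byte multiple. */
static inline uint32_t demo_curve_clen(enum demo_curve c)
{
	uint32_t bytes = (demo_curve_bits(c) + 7) / 8;

	return (bytes + 3u) & ~3u;
}

/* EdDSA key length: Ed448 carries one extra byte on top of clen. */
static inline uint32_t demo_eddsa_key_len(enum demo_curve c)
{
	assert(c == DEMO_ED25519 || c == DEMO_ED448);
	return c == DEMO_ED448 ? demo_curve_clen(c) + 1 : demo_curve_clen(c);
}

/* EdDSA signature length is twice the key length. */
static inline uint32_t demo_eddsa_sig_len(enum demo_curve c)
{
	return 2 * demo_eddsa_key_len(c);
}
```

With these helpers, P-521 yields a 68-byte coordinate, Ed25519 a 32-byte key and 64-byte signature, and Ed448 a 57-byte key and 114-byte signature, matching the values documented in the header.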
diff --git a/drivers/crypto/cmh/include/cmh_pke_sm2.h b/drivers/crypto/cmh/include/cmh_pke_sm2.h
new file mode 100644
index 000000000000..a2c7164b8d49
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_pke_sm2.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- SM2 PKE Ioctl Handler Declarations
+ *
+ * SM2 signature (GM/T 0003.2) requires the caller to compute
+ * ZA = SM3(ENTLA || IDA || a || b || xG || yG || xA || yA)
+ * and pass SM3(ZA || M) as the digest to the sign/verify path.
+ * The CMH eSW does NOT compute ZA internally; the full
+ * identity pre-hash is the caller's responsibility.
+ *
+ * For the in-kernel akcipher "sm2" algorithm this means the
+ * caller (e.g. asymmetric_key subsystem) must pre-hash with ZA
+ * before invoking verify.  The SM2_ID_DIGEST ioctl below can
+ * compute ZA for userspace callers of the misc-device path.
+ */
+
+#ifndef CMH_PKE_SM2_H
+#define CMH_PKE_SM2_H
+
+int cmh_mgmt_sm2_ecdh_keygen(void __user *argp);
+int cmh_mgmt_sm2_ecdh(void __user *argp);
+int cmh_mgmt_sm2_dec_point(void __user *argp);
+int cmh_mgmt_sm2_enc_point(void __user *argp);
+int cmh_mgmt_sm2_id_digest(void __user *argp);
+int cmh_mgmt_sm2_ecdh_hash(void __user *argp);
+int cmh_mgmt_sm2_dec_hash(void __user *argp);
+int cmh_mgmt_sm2_enc_hash(void __user *argp);
+
+#endif /* CMH_PKE_SM2_H */
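
To make the ZA requirement in the cmh_pke_sm2.h comment concrete, the sketch below assembles the byte string that is fed to SM3 to obtain ZA: ENTLA || IDA || a || b || xG || yG || xA || yA, where ENTLA is the bit length of IDA as a 2-byte big-endian value. It only builds the preimage; hashing it with SM3, and then hashing ZA || M, is left to whatever SM3 implementation the caller has. The struct and function names are hypothetical and not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SM2_FE_LEN 32u	/* SM2 field elements are 256 bits */

/* Hypothetical input bundle; each field element is SM2_FE_LEN bytes. */
struct sm2_za_input {
	const uint8_t *id;	/* signer identity IDA */
	size_t id_len;		/* identity length in bytes */
	const uint8_t *a, *b, *xg, *yg;	/* curve parameters */
	const uint8_t *xa, *ya;		/* signer public key */
};

/*
 * Write ENTLA || IDA || a || b || xG || yG || xA || yA into out.
 * Returns the number of bytes written, or 0 if the identity is too
 * long for a 16-bit bit count or the output buffer is too small.
 */
static inline size_t sm2_za_preimage(uint8_t *out, size_t out_len,
				     const struct sm2_za_input *in)
{
	const uint8_t *fe[6] = { in->a, in->b, in->xg, in->yg,
				 in->xa, in->ya };
	size_t need, off = 0, i;
	uint16_t entla;

	if (in->id_len > 0xFFFFu / 8)
		return 0;
	need = 2 + in->id_len + 6 * SM2_FE_LEN;
	if (out_len < need)
		return 0;

	entla = (uint16_t)(in->id_len * 8);	/* identity length in bits */
	out[off++] = (uint8_t)(entla >> 8);
	out[off++] = (uint8_t)(entla & 0xFF);
	if (in->id_len)
		memcpy(out + off, in->id, in->id_len);
	off += in->id_len;

	for (i = 0; i < 6; i++) {
		memcpy(out + off, fe[i], SM2_FE_LEN);
		off += SM2_FE_LEN;
	}
	assert(off == need);
	return off;
}
```

For a 16-byte identity the preimage is 2 + 16 + 192 = 210 bytes and starts with the bytes 0x00 0x80.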
diff --git a/drivers/crypto/cmh/include/cmh_pqc.h b/drivers/crypto/cmh/include/cmh_pqc.h
new file mode 100644
index 000000000000..cd4761a0ce5c
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_pqc.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- PQC Algorithm Registration
+ *
+ * Registration/unregistration functions for PQC akcipher algorithms:
+ * ML-DSA, SLH-DSA, LMS, XMSS.
+ */
+
+#ifndef CMH_PQC_H
+#define CMH_PQC_H
+
+int cmh_pqc_mldsa_register(void);
+void cmh_pqc_mldsa_unregister(void);
+
+int cmh_pqc_slhdsa_register(void);
+void cmh_pqc_slhdsa_unregister(void);
+
+int cmh_pqc_lms_register(void);
+void cmh_pqc_lms_unregister(void);
+
+int cmh_pqc_xmss_register(void);
+void cmh_pqc_xmss_unregister(void);
+
+#endif /* CMH_PQC_H */
diff --git a/drivers/crypto/cmh/include/cmh_sys.h b/drivers/crypto/cmh/include/cmh_sys.h
new file mode 100644
index 000000000000..dd336b67bd65
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_sys.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- SYS Core VCQ Builders
+ *
+ * VCQ builder functions for SYS core commands (NEW, WRITE, READ,
+ * DATA, FIND, LIST, GRANT, EXPORT, IMPORT).  Each builder populates
+ * one vcq_cmd slot with the appropriate magic, command ID, and payload.
+ *
+ * Callers combine these with vcq_set_header() + vcq_add_flush()
+ * and submit via cmh_tm_submit_sync().
+ */
+
+#ifndef CMH_SYS_H
+#define CMH_SYS_H
+
+#include "cmh_vcq.h"
+
+void vcq_add_sys_new(struct vcq_cmd *slot, u64 cid, u64 ref_dma, u32 len);
+void vcq_add_sys_write(struct vcq_cmd *slot, u64 ref, u64 src_dma,
+                      u64 wrap_key, u32 len, u32 sys_type);
+void vcq_add_sys_read(struct vcq_cmd *slot, u64 ref, u64 dst_dma,
+                     u64 wrap_key, u32 len);
+void vcq_add_sys_data(struct vcq_cmd *slot, u64 ref, u64 dst_dma, u32 len);
+void vcq_add_sys_find(struct vcq_cmd *slot, u64 cid, u64 dst_dma, u32 len);
+void vcq_add_sys_list(struct vcq_cmd *slot, u64 ref, u64 dst_dma, u32 len);
+void vcq_add_sys_grant(struct vcq_cmd *slot, u64 ref, u64 read,
+                      u64 write, u64 execute);
+void vcq_add_sys_export(struct vcq_cmd *slot, u64 cid, u64 dst_dma,
+                       u64 wrap_key, u32 len);
+void vcq_add_sys_import(struct vcq_cmd *slot, u64 src_dma,
+                       u64 wrap_key, u32 len);
+
+/* KIC core VCQ builders */
+void vcq_add_kic_hkdf1(struct vcq_cmd *slot, u64 dst, u64 base,
+                      u64 label_dma, u32 key_len, u32 label_len, u32 type);
+void vcq_add_kic_hkdf2(struct vcq_cmd *slot, u64 dst, u64 base, u64 salt,
+                      u64 label_dma, u32 key_len, u32 label_len, u32 type);
+void vcq_add_kic_aes_cmac_kdf(struct vcq_cmd *slot, u64 out_key, u64 base_key,
+                             u64 label_dma, u32 key_len, u32 label_len,
+                             u32 type);
+void vcq_add_kic_dkek_derive(struct vcq_cmd *slot, u64 out_key, u64 base_key,
+                            u32 host_id, u64 metadata_dma, u32 metadata_len);
+
+/* DRBG core VCQ builders */
+void vcq_add_drbg_reset(struct vcq_cmd *slot);
+void vcq_add_drbg_config(struct vcq_cmd *slot, u32 ratio, u32 strength);
+void vcq_add_drbg_datastore(struct vcq_cmd *slot, u64 ref, u32 len, u32 type);
+
+/* QSE core VCQ builders */
+void vcq_add_qse_flush(struct vcq_cmd *slot, u32 core_id);
+void vcq_add_qse_ml_kem_keygen(struct vcq_cmd *slot, u32 core_id, u32 k, u32 flags,
+                              u64 seed, u64 z, u64 ek, u64 dk, u32 dk_type,
+                              bool masked);
+void vcq_add_qse_ml_kem_enc(struct vcq_cmd *slot, u32 core_id, u32 k, u32 flags,
+                           u64 coin, u64 ek, u64 ct, u64 ss, u32 ss_type,
+                           bool masked);
+void vcq_add_qse_ml_kem_dec(struct vcq_cmd *slot, u32 core_id, u32 k, u32 flags,
+                           u64 ct, u64 dk, u64 ss, u32 ss_type,
+                           bool masked);
+void vcq_add_qse_ml_dsa_keygen(struct vcq_cmd *slot, u32 core_id, u32 mode, u32 flags,
+                              u64 seed, u64 pk, u64 sk, u32 sk_type,
+                              bool masked);
+void vcq_add_qse_ml_dsa_sign(struct vcq_cmd *slot, u32 core_id, u32 mode, u32 flags,
+                            u64 rnd, u64 m, u64 sk, u64 sig, u32 mlen,
+                            bool masked);
+void vcq_add_qse_ml_dsa_verify(struct vcq_cmd *slot, u32 core_id, u32 mode, u32 flags,
+                              u64 m, u64 pk, u64 sig, u32 mlen);
+
+/* HCQ core VCQ builders */
+void vcq_add_hcq_flush(struct vcq_cmd *slot, u32 core_id);
+void vcq_add_hcq_slhdsa_keygen(struct vcq_cmd *slot, u32 core_id, u32 param_set,
+                              u32 seed_len, u32 pk_len, u32 sk_len,
+                              u64 seed, u64 pk, u64 sk);
+void vcq_add_hcq_slhdsa_sign(struct vcq_cmd *slot, u32 core_id, u32 param_set,
+                            u32 msg_len, u32 ctx_len,
+                            u64 add_random, u64 msg, u64 ctx,
+                            u64 sk, u64 sig);
+void vcq_add_hcq_slhdsa_sign_internal(struct vcq_cmd *slot, u32 core_id, u32 param_set,
+                                     u32 msg_len, u64 add_random,
+                                     u64 msg, u64 sk, u64 sig);
+void vcq_add_hcq_slhdsa_verify(struct vcq_cmd *slot, u32 core_id, u32 param_set,
+                              u32 msg_len, u32 ctx_len,
+                              u64 msg, u64 ctx, u64 pk, u64 sig);
+void vcq_add_hcq_slhdsa_sign_prehash(struct vcq_cmd *slot, u32 core_id,
+                                    u32 cmd, u32 param_set, u32 prehash_algo,
+                                    u32 msg_len, u32 ctx_len,
+                                    u64 add_random, u64 msg, u64 ctx,
+                                    u64 sk, u64 sig);
+void vcq_add_hcq_slhdsa_verify_prehash(struct vcq_cmd *slot, u32 core_id,
+                                      u32 cmd, u32 param_set, u32 prehash_algo,
+                                      u32 msg_len, u32 ctx_len,
+                                      u64 msg, u64 ctx, u64 pk, u64 sig);
+void vcq_add_hcq_slhdsa_verify_internal(struct vcq_cmd *slot, u32 core_id, u32 param_set,
+                                       u32 msg_len, u64 msg, u64 pk, u64 sig);
+void vcq_add_hcq_slhdsa_pubgen(struct vcq_cmd *slot, u32 core_id, u32 param_set,
+                              u32 sk_len, u64 sk, u64 pk);
+void vcq_add_hcq_lms_verify(struct vcq_cmd *slot, u32 core_id, u32 lms_hss,
+                           u32 pk_len, u32 sig_len, u32 dig_len,
+                           u64 pk, u64 sig, u64 dig);
+void vcq_add_hcq_xmss_verify(struct vcq_cmd *slot, u32 core_id, u32 xmss_mt,
+                            u32 pk_len, u32 sig_len, u32 dig_len,
+                            u64 pk, u64 sig, u64 dig);
+
+/* SYS core flush */
+void vcq_add_sys_flush(struct vcq_cmd *slot);
+
+/* EAC core VCQ builder */
+void vcq_add_eac_read(struct vcq_cmd *slot, u64 dst_dma, u32 len);
+
+#endif /* CMH_SYS_H */
diff --git a/include/uapi/linux/cmh_mgmt_ioctl.h b/include/uapi/linux/cmh_mgmt_ioctl.h
new file mode 100644
index 000000000000..a690454fae69
--- /dev/null
+++ b/include/uapi/linux/cmh_mgmt_ioctl.h
@@ -0,0 +1,895 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Key Management ioctl Interface (User-Space API)
+ *
+ * ioctl commands for /dev/cmh_mgmt -- key CRUD, datastore
+ * export/import, KIC key derivation, PKE, SM2, and PQC operations.
+ *
+ * Relationship to the in-kernel crypto API
+ * -----------------------------------------
+ * Most commands here have no crypto API representation (no transform
+ * type or verb exists): keystore CRUD, key generation, KIC key
+ * derivation, ML-KEM encapsulate/decapsulate, SM2 multi-step
+ * encrypt/decrypt/key-exchange, EdDSA, EAC, and DRBG configuration.
+ * For these the character device is the only available UAPI.
+ *
+ * A bounded subset names primitives the driver ALSO registers with
+ * the crypto API, and the overlap is intentional:
+ *   - Hardware-held-key operations (RSA decrypt, ECDSA/ML-DSA/SLH-DSA
+ *     sign, ECDH) reference a private key by datastore handle.  The
+ *     crypto API set_priv_key()/set_secret() take only raw key bytes
+ *     and cannot name a key that never leaves the hardware; these
+ *     ioctls keep the key hardware-resident.  The registered
+ *     transforms serve raw-key in-kernel users -- the paths are
+ *     complementary.
+ *
+ * Multi-step protocol flows are documented above the PKE and SM2
+ * struct sections.  Single-command ioctls are self-documenting.
+ *
+ * Versioned structs: user space sets .version = CMH_MGMT_V1 so the
+ * driver can extend structs in the future without breaking ABI.
+ */
+
+#ifndef _UAPI_CMH_MGMT_IOCTL_H
+#define _UAPI_CMH_MGMT_IOCTL_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+#include <linux/const.h>
+
+#define CMH_MGMT_V1            1
+
+/* Special reference values */
+#define CMH_REF_NONE           0x0000000000000000ULL   /* no key (plaintext) */
+
+/* Flags for cmh_ioctl_key_new.flags / cmh_ioctl_key_write.flags */
+#define CMH_FLAG_PT            _BITUL(16)      /* key can be read as plaintext */
+#define CMH_FLAG_XC            _BITUL(17)      /* key can be exported over XC bus */
+#define CMH_FLAG_SCA           _BITUL(18)      /* SCA key stored in 2 shares */
+#define CMH_FLAG_MASK          (CMH_FLAG_PT | CMH_FLAG_XC | CMH_FLAG_SCA)
+
+/*
+ * Datastore key types -- the LKM maps these to core IDs internally.
+ * User space passes these in cmh_ioctl_key_new.ds_type.
+ */
+#define CMH_DS_RAW_VALUE               1
+#define CMH_DS_AES_KEY                 2
+#define CMH_DS_AES_XTS_KEY             3
+#define CMH_DS_HMAC_KEY                        4
+#define CMH_DS_KMAC_KEY                        5
+#define CMH_DS_SM4_KEY                 6
+#define CMH_DS_CHACHA20_KEY            7
+
+/* PKE key types -- all map to CORE_ID_PKE (0x0A) */
+#define CMH_DS_RSA_PRIV_KEY            10
+#define CMH_DS_RSA_PUB_KEY             11
+#define CMH_DS_RSA_CRT_KEY             12
+#define CMH_DS_ECDSA_PRIV_KEY          13
+#define CMH_DS_ECDSA_PUB_KEY           14
+#define CMH_DS_ECDH_PRIV_KEY           15
+#define CMH_DS_EDDSA_PRIV_KEY          16
+#define CMH_DS_SHARED_SECRET           17
+#define CMH_DS_SM2_PRIV_KEY            18
+
+/* QSE key types -- map to CORE_ID_QSE (0x09) */
+#define CMH_DS_ML_KEM_DK               20
+#define CMH_DS_ML_DSA_SK               21
+
+/* HCQ key types -- map to CORE_ID_HCQ (0x08) */
+#define CMH_DS_SLHDSA_SK               25
+
+/* ioctl argument structures */
+
+struct cmh_ioctl_key_new {
+       __u32 version;          /* must be CMH_MGMT_V1 */
+       __u32 ds_type;          /* CMH_DS_* key type */
+       __u32 len;              /* key length in bytes */
+       __u32 flags;            /* CMH_FLAG_* (e.g. CMH_FLAG_PT) */
+       __u64 cid;              /* caller ID (name) for the key */
+       __u64 ref;              /* [out] CMH eSW returns key_ref here */
+};
+
+struct cmh_ioctl_key_write {
+       __u32 version;
+       __u32 len;              /* key data length */
+       __u32 ds_type;          /* CMH_DS_* key type */
+       __u32 flags;            /* CMH_FLAG_* (e.g. CMH_FLAG_PT) */
+       __u64 ref;              /* key reference from KEY_NEW */
+       __u64 wrap_key;         /* wrapping key ref (CMH_REF_NONE = plaintext) */
+       __u64 data;             /* user-space pointer to key material */
+};
+
+struct cmh_ioctl_key_read {
+       __u32 version;
+       __u32 len;              /* buffer length */
+       __u64 ref;              /* key reference */
+       __u64 wrap_key;         /* wrapping key ref (CMH_REF_NONE = plaintext) */
+       __u64 data;             /* user-space pointer to output buffer */
+       __u32 out_len;          /* [out] actual bytes written */
+       __u32 __reserved;
+};
+
+struct cmh_ioctl_key_find {
+       __u32 version;
+       __u32 __reserved;
+       __u64 cid;              /* caller ID to search for */
+       __u64 ref;              /* [out] resolved key reference */
+       __u32 len;              /* [out] key length */
+       __u32 type;             /* [out] key type */
+};
+
+/*
+ * KEY_LIST -- iterate datastore objects.
+ *
+ * Pass start_ref=0 to begin from the first accessible object.
+ * On return, ref/cid/len/type describe that object.  Pass the
+ * returned ref as start_ref in the next call to advance.  Iteration
+ * ends when ref == 0 (no more objects).
+ */
+struct cmh_ioctl_key_list {
+       __u32 version;
+       __u32 __reserved;
+       __u64 start_ref;        /* starting DS reference (0 = first) */
+       __u64 ref;              /* [out] object reference */
+       __u64 cid;              /* [out] caller ID */
+       __u32 len;              /* [out] object length */
+       __u32 type;             /* [out] object type */
+};
+
+struct cmh_ioctl_key_grant {
+       __u32 version;
+       __u32 __reserved;
+       __u64 ref;              /* key reference */
+       __u64 read;             /* per-MBX read permission bitfield */
+       __u64 write;            /* per-MBX write permission bitfield */
+       __u64 execute;          /* per-MBX execute permission bitfield */
+};
+
+/* Export blob overhead beyond the raw object data (bytes) */
+#define CMH_DS_EXPORT_OVERHEAD_WRAPPED 48      /* 16B hdr + 16B nonce + 16B tag */
+#define CMH_DS_EXPORT_OVERHEAD_PLAIN   16      /* 16B hdr only */
+
+/**
+ * struct cmh_ioctl_ds_export - Export a datastore object (wrapped or plaintext)
+ * @version:   protocol version (CMH_MGMT_V1)
+ * @len:       DMA buffer size; must be >= export blob size:
+ *               wrapped:   CMH_DS_EXPORT_OVERHEAD_WRAPPED + object_len
+ *               plaintext: CMH_DS_EXPORT_OVERHEAD_PLAIN + object_len
+ *             object_len is known from KEY_NEW or KEY_FIND.
+ *             If too small, the eSW rejects the command (-EIO).
+ * @cid:       caller ID of the object to export
+ * @wrap_key:  wrapping key ref (CMH_REF_NONE = plaintext export)
+ * @data:      user-space pointer to output buffer (at least @len bytes)
+ * @out_len:   [out] actual blob bytes written on success
+ * @__reserved: must be zero
+ */
+struct cmh_ioctl_ds_export {
+       __u32 version;
+       __u32 len;              /* buffer length (see sizing rule above) */
+       __u64 cid;              /* caller ID of the object to export */
+       __u64 wrap_key;         /* wrapping key ref (CMH_REF_NONE = plaintext) */
+       __u64 data;             /* user-space pointer to output buffer */
+       __u32 out_len;          /* [out] actual bytes written */
+       __u32 __reserved;
+};
+
+struct cmh_ioctl_ds_import {
+       __u32 version;
+       __u32 len;              /* blob length */
+       __u64 wrap_key;         /* wrapping key ref (CMH_REF_NONE = plaintext) */
+       __u64 data;             /* user-space pointer to import blob */
+};
+
+/* Flags for cmh_ioctl_kic_hkdf1.flags / cmh_ioctl_kic_hkdf2.flags */
+#define CMH_KIC_FLAG_TEMP      0x01    /* store result in TEMP (not persistent DS) */
+
+/*
+ * KIC hardware base key references.
+ *
+ * Each CMH device has up to 8 hardware base keys provisioned in OTP/fuses.
+ * These values are passed in the base_key field of KIC ioctl structs.
+ * The key valid bitmask is visible via R_KIC_KEY_VALID (MMIO 0x100).
+ */
+#define CMH_KIC_KEY1           0x0000000100000001ULL
+#define CMH_KIC_KEY2           0x0000000200000002ULL
+#define CMH_KIC_KEY3           0x0000000300000003ULL
+#define CMH_KIC_KEY4           0x0000000400000004ULL
+#define CMH_KIC_KEY5           0x0000000500000005ULL
+#define CMH_KIC_KEY6           0x0000000600000006ULL
+#define CMH_KIC_KEY7           0x0000000700000007ULL
+#define CMH_KIC_KEY8           0x0000000800000008ULL
+
+struct cmh_ioctl_kic_hkdf1 {
+       __u32 version;
+       __u32 key_len;          /* output key length (e.g., 32) */
+       __u64 base_key;         /* KIC base key reference */
+       __u64 cid;              /* CID for the new DS entry (ignored if TEMP) */
+       __u64 label;            /* user-space pointer to label data */
+       __u32 label_len;        /* label length in bytes */
+       __u32 flags;            /* CMH_KIC_FLAG_* */
+       __u64 ref;              /* [out] derived key reference */
+};
+
+struct cmh_ioctl_kic_hkdf2 {
+       __u32 version;
+       __u32 key_len;          /* output key length (e.g., 32) */
+       __u64 base_key;         /* KIC base key reference */
+       __u64 salt_key;         /* salt key reference (CMH_REF_NONE = no salt) */
+       __u64 cid;              /* CID for the new DS entry (ignored if TEMP) */
+       __u64 label;            /* user-space pointer to label data */
+       __u32 label_len;        /* label length in bytes */
+       __u32 flags;            /* CMH_KIC_FLAG_* */
+       __u64 ref;              /* [out] derived key reference */
+};
+
+struct cmh_ioctl_kic_aes_cmac_kdf {
+       __u32 version;
+       __u32 key_len;          /* base & output key length (must be 32) */
+       __u64 base_key;         /* KIC base key or DS reference */
+       __u64 cid;              /* CID for the new DS entry (ignored if TEMP) */
+       __u64 label;            /* user-space pointer to label data */
+       __u32 label_len;        /* label length in bytes */
+       __u32 flags;            /* CMH_KIC_FLAG_* */
+       __u64 ref;              /* [out] derived key reference */
+};
+
+#define KIC_DKEK_MAX_METADATA  64      /* max metadata length for DKEK */
+
+struct cmh_ioctl_kic_dkek_derive {
+       __u32 version;
+       __u32 host_id;          /* target host ID (0 = caller's own) */
+       __u64 base_key;         /* KIC base key reference */
+       __u64 cid;              /* CID for the new DS entry (ignored if TEMP) */
+       __u64 metadata;         /* user-space pointer to metadata */
+       __u32 metadata_len;     /* metadata length in bytes */
+       __u32 flags;            /* CMH_KIC_FLAG_* */
+       __u64 ref;              /* [out] derived KEK reference */
+};
+
+/* -- PKE ioctl argument structures ----------- */
+
+/*
+ * PKE multi-step protocol flows
+ *
+ * RSA encrypt/decrypt:
+ *   1. KEY_NEW(CMH_DS_RSA_PRIV_KEY) + KEY_WRITE -> priv_ref (or RSA_KEYGEN -> priv_ref)
+ *   2. RSA_ENC(e, n, plaintext) -> ciphertext         (public key = raw e,n)
+ *   3. RSA_DEC(e, n, ciphertext, priv_ref) -> plaintext   (or RSA_CRT_DEC)
+ *
+ * ECDSA sign:
+ *   1. EC_KEYGEN(curve) -> priv_ref                    (or KEY_NEW + KEY_WRITE)
+ *   2. EC_PUBGEN(priv_ref) -> public_key               (raw x||y returned)
+ *   3. ECDSA_SIGN(digest, priv_ref) -> signature
+ *   SM2 sign uses the same path with curve=CMH_CURVE_SM2.
+ *
+ * ECDH shared secret:
+ *   1. EC_KEYGEN(curve) -> priv_ref                    (or KEY_NEW + KEY_WRITE)
+ *   2. ECDH_KEYGEN(priv_ref) -> public_key_x           (derive pub from priv)
+ *   3. Exchange public keys with peer
+ *   4. ECDH(peer_key_x, priv_ref) -> shared_secret     (raw or DS ref via FLAG_DS_RESULT)
+ *
+ * EdDSA sign/verify:
+ *   1. EC_KEYGEN(CMH_CURVE_25519 or CMH_CURVE_448) -> priv_ref
+ *   2. EC_PUBGEN(priv_ref) -> public_key
+ *   3. EDDSA_SIGN(message, priv_ref) -> signature      (pure EdDSA, not prehash)
+ *   4. EDDSA_VERIFY(message, signature, public_key_y)
+ *   For Ed448 SCA: EDDSA_KEYGEN_SCA(priv_ref) -> sca_ref (2-share blinded key)
+ *
+ * SM2 encryption (GM/T 0003.4):
+ *   1. EC_KEYGEN(CMH_CURVE_SM2) -> priv_ref            (or KEY_NEW + KEY_WRITE)
+ *   2. EC_PUBGEN(priv_ref) -> public_key
+ *   3. SM2_ENC_POINT(public_key) -> C1, enc_point      (nonce_len=0: HW ephemeral)
+ *   4. SM2_ENC_HASH(enc_point, message) -> ciphertext   (C1||C3||C2)
+ *   Decrypt:
+ *   5. SM2_DEC_POINT(C1, priv_ref) -> dec_point
+ *   6. SM2_DEC_HASH(ciphertext, dec_point) -> plaintext
+ *   enc_point and dec_point are raw DMA buffers (64B each), not DS refs.
+ *
+ * SM2 key exchange (GM/T 0003.3):
+ *   1. EC_KEYGEN(CMH_CURVE_SM2) -> priv_ref            (long-lived, persistent DS)
+ *   2. EC_PUBGEN(priv_ref) -> public_key
+ *   3. SM2_ID_DIGEST(id, public_key) -> ZA             (SM3-based identity digest)
+ *   4. SM2_ECDH_KEYGEN(nonce) -> session_key, r        (ephemeral scalar r)
+ *      - nonce_len=32: caller supplies r (deterministic)
+ *      - nonce_len=0:  HW generates r, writes it back to .nonce
+ *      Exchange session_key with peer.
+ *   5. SM2_ECDH(r, priv_ref, peer_pub, peer_sess) -> shared_point
+ *      - Must pass the same r from step 4 (nonce_len=32)
+ *      - shared_point_ref=0: reads back raw shared_point, destroys DS slot
+ *      - shared_point_ref=&ref: keeps DS slot alive, writes ref for SM2_ECDH_HASH
+ *   6. SM2_ECDH_HASH(shared_point_ref, ZA_self, ZA_peer) -> shared_key (16B)
+ *      - shared_point_ref is a persistent DS reference from step 5
+ *      - The DS slot is consumed by the hub; caller should delete it afterward
+ *   The nonce r is a raw 32-byte scalar in userspace memory between steps 4-5.
+ *   The shared_point is a persistent DS ref between steps 5-6.
+ *   The long-lived private key (priv_ref) persists independently.
+ */
+
+/* PKE operation flags */
+#define CMH_PKE_FLAG_DS_RESULT _BITUL(0)       /* store result in DS, return ref */
+
+struct cmh_ioctl_pke_rsa_enc {
+       __u32 version;
+       __u32 bits;             /* RSA key size in bits (512-4096) */
+       __u64 e;                /* user-space pointer to public exponent */
+       __u32 e_len;            /* exponent length in bytes */
+       __u32 __reserved;
+       __u64 n;                /* user-space pointer to modulus */
+       __u64 input;            /* user-space pointer to input data */
+       __u64 output;           /* user-space pointer to output buffer */
+};
+
+struct cmh_ioctl_pke_rsa_dec {
+       __u32 version;
+       __u32 bits;
+       __u64 e;                /* public exponent */
+       __u32 e_len;
+       __u32 __reserved;
+       __u64 n;                /* modulus */
+       __u64 input;            /* ciphertext */
+       __u64 output;           /* plaintext output */
+       __u64 key_ref;          /* private key DS reference */
+};
+
+struct cmh_ioctl_pke_rsa_crt_dec {
+       __u32 version;
+       __u32 bits;
+       __u64 e;
+       __u32 e_len;
+       __u32 __reserved;
+       __u64 n;
+       __u64 input;
+       __u64 output;
+       __u64 crt_ref;          /* CRT key DS reference */
+};
+
+struct cmh_ioctl_pke_rsa_keygen {
+       __u32 version;
+       __u32 bits;             /* key size in bits */
+       __u64 e;                /* user-space pointer to public exponent */
+       __u32 e_len;
+       __u32 flags;            /* CMH_FLAG_* */
+       __u64 n;                /* [out] user-space pointer to modulus buffer */
+       __u64 d_cid;            /* CID for private key DS entry */
+       __u64 d_ref;            /* [out] private key reference */
+       __u64 crt_cid;          /* CID for CRT key DS entry (0 = skip CRT) */
+       __u64 crt_ref;          /* [out] CRT key reference */
+};
+
+struct cmh_ioctl_pke_ecdsa_sign {
+       __u32 version;
+       __u32 curve;            /* CMH_CURVE_* ID (e.g. CMH_CURVE_P256) */
+       __u64 digest;           /* user-space pointer to hash digest */
+       __u32 digest_len;       /* digest length in bytes */
+       __u32 __reserved;
+       __u64 signature;        /* [out] user-space pointer to (r,s) */
+       __u64 key_ref;          /* private key DS reference */
+};
+
+struct cmh_ioctl_pke_ecdh {
+       __u32 version;
+       __u32 curve;
+       __u64 peer_key_x;       /* user-space pointer to peer public key X */
+       __u64 key_ref;          /* private key DS reference */
+       __u32 flags;            /* CMH_PKE_FLAG_DS_RESULT */
+       __u32 __reserved;
+       __u64 result_cid;       /* CID for DS result (if FLAG_DS_RESULT) */
+       __u64 output;           /* [out] raw shared secret or DS ref */
+};
+
+struct cmh_ioctl_pke_ecdh_keygen {
+       __u32 version;
+       __u32 curve;
+       __u64 key_ref;          /* private key DS reference */
+       __u64 public_key_x;     /* [out] user-space pointer to public key X */
+};
+
+struct cmh_ioctl_pke_eddsa_sign {
+       __u32 version;
+       __u32 curve;            /* CMH_CURVE_25519 or CMH_CURVE_448 */
+       __u64 digest;           /* user-space ptr to message (not digest) */
+       __u32 digest_len;
+       __u32 __reserved;
+       __u64 signature;        /* [out] user-space pointer to signature */
+       __u64 key_ref;          /* private key DS reference */
+};
+
+struct cmh_ioctl_pke_eddsa_verify {
+       __u32 version;
+       __u32 curve;
+       __u64 digest;           /* user-space ptr to message (not digest) */
+       __u32 digest_len;       /* message length in bytes */
+       __u32 __reserved;
+       __u64 signature;
+       __u64 public_key_y;     /* user-space pointer to public key Y */
+};
+
+struct cmh_ioctl_pke_ec_keygen {
+       __u32 version;
+       __u32 curve;
+       __u32 flags;            /* CMH_FLAG_* */
+       __u32 __reserved;
+       __u64 cid;              /* CID for the new key DS entry */
+       __u64 ref;              /* [out] private key reference */
+};
+
+struct cmh_ioctl_pke_ec_pubgen {
+       __u32 version;
+       __u32 curve;
+       __u64 key_ref;          /* private key DS reference */
+       __u64 public_key;       /* [out] user-space pointer to public key */
+};
+
+struct cmh_ioctl_pke_eddsa_keygen_sca {
+       __u32 version;
+       __u32 curve;            /* must be CMH_CURVE_448 */
+       __u64 key_ref;          /* input: normal Ed448 private key DS ref */
+       __u64 cid;              /* CID for the new SCA key DS entry */
+       __u64 sca_ref;          /* [out] SCA private key reference */
+};
+
+/*
+ * ioctl numbers -- type 'J', assigned in ascending order (with gaps).
+ * 'C' conflicts with OSS sound, CAPI/ISDN, and COSA WAN drivers;
+ * 'J' is unregistered in Documentation/userspace-api/ioctl/ioctl-number.rst.
+ */
+
+#define CMH_MGMT_IOC_MAGIC     'J'
+
+#define CMH_IOCTL_KEY_NEW      _IOWR(CMH_MGMT_IOC_MAGIC, 0x01, struct cmh_ioctl_key_new)
+#define CMH_IOCTL_KEY_WRITE    _IOW(CMH_MGMT_IOC_MAGIC,  0x02, struct cmh_ioctl_key_write)
+#define CMH_IOCTL_KEY_READ     _IOWR(CMH_MGMT_IOC_MAGIC, 0x03, struct cmh_ioctl_key_read)
+#define CMH_IOCTL_KEY_FIND     _IOWR(CMH_MGMT_IOC_MAGIC, 0x04, struct cmh_ioctl_key_find)
+#define CMH_IOCTL_KEY_GRANT    _IOW(CMH_MGMT_IOC_MAGIC,  0x05, struct cmh_ioctl_key_grant)
+#define CMH_IOCTL_KEY_DELETE   _IOW(CMH_MGMT_IOC_MAGIC,  0x06, struct cmh_ioctl_key_grant)
+#define CMH_IOCTL_DS_EXPORT    _IOWR(CMH_MGMT_IOC_MAGIC, 0x07, struct cmh_ioctl_ds_export)
+#define CMH_IOCTL_DS_IMPORT    _IOW(CMH_MGMT_IOC_MAGIC,  0x08, struct cmh_ioctl_ds_import)
+#define CMH_IOCTL_KIC_HKDF1    _IOWR(CMH_MGMT_IOC_MAGIC, 0x09, struct cmh_ioctl_kic_hkdf1)
+#define CMH_IOCTL_KIC_HKDF2    _IOWR(CMH_MGMT_IOC_MAGIC, 0x0A, struct cmh_ioctl_kic_hkdf2)
+#define CMH_IOCTL_KEY_NEW_RANDOM _IOWR(CMH_MGMT_IOC_MAGIC, 0x0B, struct cmh_ioctl_key_new)
+#define CMH_IOCTL_KIC_AES_CMAC_KDF _IOWR(CMH_MGMT_IOC_MAGIC, 0x0C, \
+                                       struct cmh_ioctl_kic_aes_cmac_kdf)
+#define CMH_IOCTL_KIC_DKEK_DERIVE _IOWR(CMH_MGMT_IOC_MAGIC, 0x0D, \
+                                       struct cmh_ioctl_kic_dkek_derive)
+#define CMH_IOCTL_KEY_LIST     _IOWR(CMH_MGMT_IOC_MAGIC, 0x0E, struct cmh_ioctl_key_list)
+
+/* PKE operation ioctls */
+#define CMH_IOCTL_PKE_RSA_ENC          _IOWR(CMH_MGMT_IOC_MAGIC, 0x10, \
+                                       struct cmh_ioctl_pke_rsa_enc)
+#define CMH_IOCTL_PKE_RSA_DEC          _IOWR(CMH_MGMT_IOC_MAGIC, 0x11, \
+                                       struct cmh_ioctl_pke_rsa_dec)
+#define CMH_IOCTL_PKE_RSA_CRT_DEC      _IOWR(CMH_MGMT_IOC_MAGIC, 0x12, \
+                                       struct cmh_ioctl_pke_rsa_crt_dec)
+#define CMH_IOCTL_PKE_RSA_KEYGEN       _IOWR(CMH_MGMT_IOC_MAGIC, 0x13, \
+                                       struct cmh_ioctl_pke_rsa_keygen)
+#define CMH_IOCTL_PKE_ECDSA_SIGN       _IOWR(CMH_MGMT_IOC_MAGIC, 0x14, \
+                                       struct cmh_ioctl_pke_ecdsa_sign)
+#define CMH_IOCTL_PKE_ECDH             _IOWR(CMH_MGMT_IOC_MAGIC, 0x16, \
+                                       struct cmh_ioctl_pke_ecdh)
+#define CMH_IOCTL_PKE_ECDH_KEYGEN      _IOWR(CMH_MGMT_IOC_MAGIC, 0x17, \
+                                       struct cmh_ioctl_pke_ecdh_keygen)
+#define CMH_IOCTL_PKE_EDDSA_SIGN       _IOWR(CMH_MGMT_IOC_MAGIC, 0x18, \
+                                       struct cmh_ioctl_pke_eddsa_sign)
+#define CMH_IOCTL_PKE_EDDSA_VERIFY     _IOW(CMH_MGMT_IOC_MAGIC,  0x19, \
+                                       struct cmh_ioctl_pke_eddsa_verify)
+#define CMH_IOCTL_PKE_EC_KEYGEN                _IOWR(CMH_MGMT_IOC_MAGIC, 0x1A, \
+                                       struct cmh_ioctl_pke_ec_keygen)
+#define CMH_IOCTL_PKE_EC_PUBGEN                _IOWR(CMH_MGMT_IOC_MAGIC, 0x1B, \
+                                       struct cmh_ioctl_pke_ec_pubgen)
+#define CMH_IOCTL_PKE_EDDSA_KEYGEN_SCA _IOWR(CMH_MGMT_IOC_MAGIC, 0x1C, \
+                                       struct cmh_ioctl_pke_eddsa_keygen_sca)
+
+/* -- PQC ioctl argument structures ----------- */
+
+/*
+ * PQC operation flags (bits [2:0]).
+ * PQC keygen ioctls accept CMH_FLAG_PT (in the CMH_FLAG_* field, bits
+ * [18:16]) to set the DS key storage attribute when CMH_QSE_FLAG_DS_REF is set.
+ * CMH_FLAG_SCA and CMH_FLAG_XC are rejected -- QSE SCA protection uses
+ * polynomial masking (CMH_QSE_FLAG_MASKED), not 2-share storage,
+ * and the eSW dec/sign paths hardcode SYS_TYPE_FLAG_PT.
+ * If no CMH_FLAG_* bits are set, DS keys default to CMH_FLAG_PT.
+ */
+#define CMH_QSE_FLAG_MASKED    _BITUL(0)       /* use masked (SCA-resistant) HW commands */
+#define CMH_QSE_FLAG_DS_REF    _BITUL(1)       /* store key output in DS, return ref */
+#define CMH_QSE_FLAG_HW_RNG    _BITUL(2)       /* use HW RNG for seed/randomness */
+#define CMH_QSE_FLAG_MASK      (_BITUL(0) | _BITUL(1) | _BITUL(2))
+
+/* -- SYS wrap header size -------------------- */
+/* sys_read prepends a 16-byte header even for plaintext reads */
+#define CMH_SYS_WRAP_HDR_SIZE  16
+
+/* -- Seed / randomness lengths --------------- */
+
+#define CMH_QSE_SEED_LEN               32      /* ML-KEM/ML-DSA seed size */
+#define CMH_QSE_SEED_LEN_MASKED                64      /* seed size for masked mode */
+
+/* -- ML-DSA ExternalMu sentinel -------------- */
+/* Pass this as mlen to use 64-byte pre-hashed mu instead of raw message */
+#define CMH_ML_DSA_MLEN_EXTERNAL_MU    0xFFFFFFFFU
+
+/* -- ML-KEM size macros ---------------------- */
+
+#define CMH_ML_KEM_EK_SIZE(k)          (384U * (k) + 32U)
+#define CMH_ML_KEM_DK_SIZE(k)          (768U * (k) + 96U)
+/* CT sizes: k=2 -> 768, k=3 -> 1088, k=4 -> 1568 */
+#define CMH_ML_KEM_CT_SIZE_512         768U
+#define CMH_ML_KEM_CT_SIZE_768         1088U
+#define CMH_ML_KEM_CT_SIZE_1024                1568U
+#define CMH_ML_KEM_SS_LEN              32U
+
+/* -- ML-DSA size macros ---------------------- */
+/* Sizes in bytes per parameter set: 44 (mode=2), 65 (mode=3), 87 (mode=5) */
+
+#define CMH_ML_DSA_44_PK_SIZE          1312U
+#define CMH_ML_DSA_44_SK_SIZE          2560U
+#define CMH_ML_DSA_44_SIG_SIZE         2420U
+#define CMH_ML_DSA_65_PK_SIZE          1952U
+#define CMH_ML_DSA_65_SK_SIZE          4032U
+#define CMH_ML_DSA_65_SIG_SIZE         3309U
+#define CMH_ML_DSA_87_PK_SIZE          2592U
+#define CMH_ML_DSA_87_SK_SIZE          4896U
+#define CMH_ML_DSA_87_SIG_SIZE         4627U
+
+/* -- SLH-DSA parameter set IDs --------------- */
+
+#define CMH_SLHDSA_SHAKE_128S          1U
+#define CMH_SLHDSA_SHAKE_128F          2U
+#define CMH_SLHDSA_SHAKE_192S          3U
+#define CMH_SLHDSA_SHAKE_192F          4U
+#define CMH_SLHDSA_SHAKE_256S          5U
+#define CMH_SLHDSA_SHAKE_256F          6U
+#define CMH_SLHDSA_SHA2_128S           7U
+#define CMH_SLHDSA_SHA2_128F           8U
+#define CMH_SLHDSA_SHA2_192S           9U
+#define CMH_SLHDSA_SHA2_192F           10U
+#define CMH_SLHDSA_SHA2_256S           11U
+#define CMH_SLHDSA_SHA2_256F           12U
+#define CMH_SLHDSA_PARAM_MAX           12U
+
+/* SLH-DSA prehash algorithm IDs */
+#define CMH_SLHDSA_PREHASH_SHA256      1U
+#define CMH_SLHDSA_PREHASH_SHA512      2U
+#define CMH_SLHDSA_PREHASH_SHAKE128    3U
+#define CMH_SLHDSA_PREHASH_SHAKE256    4U
+#define CMH_SLHDSA_PREHASH_MAX         4U
+
+/* SLH-DSA n values in bytes, one per security level (128/192/256) */
+#define CMH_SLHDSA_N_128               16U
+#define CMH_SLHDSA_N_192               24U
+#define CMH_SLHDSA_N_256               32U
+
+/* SLH-DSA key sizes: pk = 2*n, sk = 4*n, seed = 3*n */
+#define CMH_SLHDSA_PK_SIZE(n)          (2U * (n))
+#define CMH_SLHDSA_SK_SIZE(n)          (4U * (n))
+#define CMH_SLHDSA_SEED_SIZE(n)                (3U * (n))
+
+/* SLH-DSA signature sizes in bytes, one per parameter set */
+#define CMH_SLHDSA_SIG_SIZE_SHAKE_128S 7856U
+#define CMH_SLHDSA_SIG_SIZE_SHAKE_128F 17088U
+#define CMH_SLHDSA_SIG_SIZE_SHAKE_192S 16224U
+#define CMH_SLHDSA_SIG_SIZE_SHAKE_192F 35664U
+#define CMH_SLHDSA_SIG_SIZE_SHAKE_256S 29792U
+#define CMH_SLHDSA_SIG_SIZE_SHAKE_256F 49856U
+#define CMH_SLHDSA_SIG_SIZE_SHA2_128S  7856U
+#define CMH_SLHDSA_SIG_SIZE_SHA2_128F  17088U
+#define CMH_SLHDSA_SIG_SIZE_SHA2_192S  16224U
+#define CMH_SLHDSA_SIG_SIZE_SHA2_192F  35664U
+#define CMH_SLHDSA_SIG_SIZE_SHA2_256S  29792U
+#define CMH_SLHDSA_SIG_SIZE_SHA2_256F  49856U
+
+/* -- PKE curve IDs -------------- */
+
+#define CMH_CURVE_P192                 0x01U
+#define CMH_CURVE_P224                 0x02U
+#define CMH_CURVE_P256                 0x03U
+#define CMH_CURVE_P384                 0x04U
+#define CMH_CURVE_P521                 0x05U
+#define CMH_CURVE_SECP256K1            0x07U
+#define CMH_CURVE_BP192R1              0x11U
+#define CMH_CURVE_BP224R1              0x12U
+#define CMH_CURVE_BP256R1              0x13U
+#define CMH_CURVE_BP320R1              0x14U
+#define CMH_CURVE_BP384R1              0x15U
+#define CMH_CURVE_BP512R1              0x16U
+#define CMH_CURVE_SM2                  0x18U
+#define CMH_CURVE_25519                        0x21U
+#define CMH_CURVE_448                  0x22U
+
+/* ML-KEM */
+
+struct cmh_ioctl_ml_kem_keygen {
+       __u32 version;
+       __u32 k;                /* security parameter: 2/3/4 */
+       __u32 flags;            /* CMH_QSE_FLAG_* */
+       __u32 __reserved;
+       __u64 seed;             /* user-space pointer to seed (or 0 for HW RNG) */
+       __u64 z;                /* user-space pointer to z (or 0 for HW RNG) */
+       __u64 ek;               /* [out] user-space pointer to encapsulation key */
+       __u64 dk;               /* [out] user-space pointer to decapsulation key
+                                * or [out] DS ref if CMH_QSE_FLAG_DS_REF
+                                */
+       __u64 dk_cid;           /* CID for DS entry (if DS_REF) */
+       __u64 dk_ref;           /* [out] dk DS reference (if DS_REF) */
+};
+
+struct cmh_ioctl_ml_kem_enc {
+       __u32 version;
+       __u32 k;
+       __u32 flags;            /* CMH_QSE_FLAG_* */
+       __u32 __reserved;
+       __u64 coin;             /* user-space pointer to random coin (or 0) */
+       __u64 ek;               /* user-space pointer to encapsulation key */
+       __u64 ct;               /* [out] user-space pointer to ciphertext */
+       __u64 ss;               /* [out] user-space pointer to shared secret */
+       __u64 __reserved2[2];   /* reserved for future use */
+};
+
+struct cmh_ioctl_ml_kem_dec {
+       __u32 version;
+       __u32 k;
+       __u32 flags;            /* CMH_QSE_FLAG_* */
+       __u32 __reserved;
+       __u64 ct;               /* user-space pointer to ciphertext */
+       __u64 dk;               /* user-space pointer to dk or DS ref */
+       __u64 ss;               /* [out] user-space pointer to shared secret */
+       __u64 __reserved2[2];   /* reserved for future use */
+};
+
+/* ML-DSA */
+
+struct cmh_ioctl_ml_dsa_keygen {
+       __u32 version;
+       __u32 mode;             /* security parameter: 2/3/5 */
+       __u32 flags;            /* CMH_QSE_FLAG_* */
+       __u32 __reserved;
+       __u64 seed;             /* user-space pointer to seed (or 0 for HW RNG) */
+       __u64 pk;               /* [out] user-space pointer to public key */
+       __u64 sk;               /* [out] user-space pointer to secret key
+                                * or [out] DS ref if CMH_QSE_FLAG_DS_REF
+                                */
+       __u64 sk_cid;           /* CID for DS entry (if DS_REF) */
+       __u64 sk_ref;           /* [out] sk DS reference (if DS_REF) */
+};
+
+struct cmh_ioctl_ml_dsa_sign {
+       __u32 version;
+       __u32 mode;
+       __u32 flags;            /* CMH_QSE_FLAG_* */
+       __u32 mlen;             /* message length in bytes */
+       __u64 m;                /* user-space pointer to message */
+       __u64 sk;               /* user-space pointer to sk or DS ref */
+       __u64 sig;              /* [out] user-space pointer to signature */
+       __u64 rnd;              /* user-space pointer to randomness (or 0) */
+};
+
+/* SLH-DSA */
+
+struct cmh_ioctl_slhdsa_keygen {
+       __u32 version;
+       __u32 parameter_set;    /* HCQ_SLHDSA_SHAKE_128S .. SHA2_256F */
+       __u32 flags;            /* CMH_QSE_FLAG_DS_REF */
+       __u32 __reserved;
+       __u64 seed;             /* user-space pointer to seed */
+       __u64 pk;               /* [out] user-space pointer to public key */
+       __u64 sk;               /* [out] user-space pointer to secret key
+                                * or [out] DS ref if CMH_QSE_FLAG_DS_REF
+                                */
+       __u64 sk_cid;           /* CID for DS entry (if DS_REF) */
+       __u64 sk_ref;           /* [out] sk DS reference (if DS_REF) */
+};
+
+struct cmh_ioctl_slhdsa_sign {
+       __u32 version;
+       __u32 parameter_set;
+       __u32 msg_len;
+       __u32 ctx_len;
+       __u64 msg;              /* user-space pointer to message */
+       __u64 ctx;              /* user-space pointer to context (or 0) */
+       __u64 sk;               /* DS ref for secret key */
+       __u64 sig;              /* [out] user-space pointer to signature */
+       __u64 add_random;       /* user-space pointer to addl. randomness (or 0) */
+};
+
+struct cmh_ioctl_slhdsa_sign_prehash {
+       __u32 version;
+       __u32 parameter_set;
+       __u32 prehash_algo;     /* CMH_SLHDSA_PREHASH_* */
+       __u32 digest;           /* 0 = raw msg (eSW hashes), 1 = pre-computed digest */
+       __u32 msg_len;
+       __u32 ctx_len;
+       __u64 msg;              /* user-space pointer to message/digest */
+       __u64 ctx;              /* user-space pointer to context (or 0) */
+       __u64 sk;               /* DS ref for secret key */
+       __u64 sig;              /* [out] user-space pointer to signature */
+       __u64 add_random;       /* user-space pointer to addl. randomness (or 0) */
+};
+
+/* -- SM2 ioctl argument structures ----------- */
+
+/* SM2 fixed key sizes (sm2p256v1 curve, 256-bit) */
+#define CMH_SM2_CLEN                   32U     /* coordinate length */
+#define CMH_SM2_PUBKEY_LEN             64U     /* uncompressed (x||y) */
+#define CMH_SM2_POINT_LEN              64U     /* EC point (x||y) */
+#define CMH_SM2_SHARED_KEY_LEN         16U     /* ECDH shared key */
+#define CMH_SM2_DIGEST_LEN             32U     /* SM3 digest (ZA) */
+/*
+ * SM2 enc_hash/dec_hash payload limit.
+ *
+ * The eSW PKE driver implements only a single-block GM/T 0003.4 KDF
+ * (one SM3 invocation, 32 bytes of key stream).  Longer messages would
+ * silently produce incorrect ciphertext / plaintext, so the driver caps
+ * the payload at 32 bytes.  See Documentation/ABI/testing/cmh-mgmt.
+ */
+#define CMH_SM2_MAX_MSG_LEN            32U     /* encrypt/decrypt */
+#define CMH_SM2_MAX_ID_LEN             32U     /* identity string */
+#define CMH_SM2_CT_OVERHEAD            96U     /* C1(64) + C3(32) */
+#define CMH_SM2_MAX_CT_LEN             128U    /* 96 + max_msg = 128 */
+
+struct cmh_ioctl_sm2_ecdh_keygen {
+       __u32 version;
+       __u32 nonce_len;        /* 0 = HW generates r (written back), 32 = caller provides r */
+       __u64 nonce;            /* [in/out] user-space pointer to nonce buffer (32B) */
+       __u64 session_key;      /* [out] user-space pointer to session key R=r*G (64B) */
+};
+
+struct cmh_ioctl_sm2_ecdh {
+       __u32 version;
+       __u32 nonce_len;        /* 0 = HW generates (written back), 32 = caller provides */
+       __u64 nonce;            /* [in/out] user-space pointer to nonce r (32B) */
+       __u64 peer_public_key;  /* user-space pointer to peer pub key (64B) */
+       __u64 peer_session_key; /* user-space pointer to peer session key (64B) */
+       __u64 key_ref;          /* private key DS reference */
+       __u64 shared_point;     /* [out] user-space pointer to shared point (64B) */
+       __u64 shared_point_ref; /* [in/out] 0 = read-back mode; &ref = keep DS, write ref */
+};
+
+struct cmh_ioctl_sm2_dec_point {
+       __u32 version;
+       __u32 ciphertext_len;   /* total ciphertext length (97..128) */
+       __u64 ciphertext;       /* user-space pointer to ciphertext (64B: C1) */
+       __u64 dec_point;        /* [out] user-space pointer to dec point (64B) */
+       __u64 key_ref;          /* private key DS reference */
+};
+
+struct cmh_ioctl_sm2_enc_point {
+       __u32 version;
+       __u32 nonce_len;        /* 0 = HW generates, 32 = caller provides */
+       __u64 nonce;            /* user-space pointer to nonce (or 0) */
+       __u64 public_key;       /* user-space pointer to public key (64B) */
+       __u64 ciphertext;       /* [out] user-space pointer to C1 (64B) */
+       __u64 enc_point;        /* [out] user-space pointer to enc point (64B) */
+};
+
+struct cmh_ioctl_sm2_id_digest {
+       __u32 version;
+       __u32 id_len;           /* identity length in bytes (<=32) */
+       __u64 id;               /* user-space pointer to identity string */
+       __u64 public_key;       /* user-space pointer to public key (64B) */
+       __u64 digest;           /* [out] user-space pointer to ZA digest (32B) */
+};
+
+/*
+ * SM2 ECDH_HASH -- derive shared key from shared point + ZA digests.
+ *
+ * IMPORTANT: The digest fields use ABSOLUTE ordering per GM/T 0003.3,
+ * NOT relative own/peer ordering.  Both parties must pass:
+ *   peer_id_digest = Z_A (initiator's digest) -- hashed FIRST
+ *   id_digest      = Z_B (responder's digest) -- hashed SECOND
+ * The eSW computes: KDF(shared_point || peer_id_digest || id_digest).
+ */
+struct cmh_ioctl_sm2_ecdh_hash {
+       __u32 version;
+       __u32 __reserved;
+       __u64 peer_id_digest;   /* ptr to Z_A -- initiator's digest (32B) */
+       __u64 id_digest;        /* ptr to Z_B -- responder's digest (32B) */
+       __u64 shared_point_ref; /* DS reference from SM2_ECDH */
+       __u64 shared_key;       /* [out] ptr to shared key (16B) */
+};
+
+struct cmh_ioctl_sm2_dec_hash {
+       __u32 version;
+       __u32 ciphertext_len;   /* ciphertext length (97..128) */
+       __u64 ciphertext;       /* user-space pointer to full ciphertext */
+       __u64 dec_point;        /* user-space pointer to dec point (64B) */
+       __u64 plaintext;        /* [out] user-space pointer to plaintext */
+};
+
+struct cmh_ioctl_sm2_enc_hash {
+       __u32 version;
+       __u32 message_len;      /* message length (1..32) */
+       __u64 message;          /* user-space pointer to plaintext */
+       __u64 enc_point;        /* user-space pointer to enc point (64B) */
+       __u64 ciphertext;       /* [out] user-space pointer to ciphertext */
+};
+
+/* PQC ioctl numbers */
+#define CMH_IOCTL_ML_KEM_KEYGEN                _IOWR(CMH_MGMT_IOC_MAGIC, 0x20, \
+                                       struct cmh_ioctl_ml_kem_keygen)
+#define CMH_IOCTL_ML_KEM_ENC           _IOWR(CMH_MGMT_IOC_MAGIC, 0x21, \
+                                       struct cmh_ioctl_ml_kem_enc)
+#define CMH_IOCTL_ML_KEM_DEC           _IOWR(CMH_MGMT_IOC_MAGIC, 0x22, \
+                                       struct cmh_ioctl_ml_kem_dec)
+#define CMH_IOCTL_ML_DSA_KEYGEN                _IOWR(CMH_MGMT_IOC_MAGIC, 0x23, \
+                                       struct cmh_ioctl_ml_dsa_keygen)
+#define CMH_IOCTL_ML_DSA_SIGN          _IOWR(CMH_MGMT_IOC_MAGIC, 0x24, \
+                                       struct cmh_ioctl_ml_dsa_sign)
+#define CMH_IOCTL_SLHDSA_KEYGEN                _IOWR(CMH_MGMT_IOC_MAGIC, 0x28, \
+                                       struct cmh_ioctl_slhdsa_keygen)
+#define CMH_IOCTL_SLHDSA_SIGN          _IOWR(CMH_MGMT_IOC_MAGIC, 0x29, \
+                                       struct cmh_ioctl_slhdsa_sign)
+#define CMH_IOCTL_SLHDSA_SIGN_PREHASH  _IOWR(CMH_MGMT_IOC_MAGIC, 0x2D, \
+                                       struct cmh_ioctl_slhdsa_sign_prehash)
+
+/* SM2 operation ioctls */
+#define CMH_IOCTL_SM2_ECDH_KEYGEN      _IOWR(CMH_MGMT_IOC_MAGIC, 0x30, \
+                                       struct cmh_ioctl_sm2_ecdh_keygen)
+#define CMH_IOCTL_SM2_ECDH             _IOWR(CMH_MGMT_IOC_MAGIC, 0x31, \
+                                       struct cmh_ioctl_sm2_ecdh)
+#define CMH_IOCTL_SM2_DEC_POINT                _IOWR(CMH_MGMT_IOC_MAGIC, 0x32, \
+                                       struct cmh_ioctl_sm2_dec_point)
+#define CMH_IOCTL_SM2_ENC_POINT                _IOWR(CMH_MGMT_IOC_MAGIC, 0x33, \
+                                       struct cmh_ioctl_sm2_enc_point)
+#define CMH_IOCTL_SM2_ID_DIGEST                _IOWR(CMH_MGMT_IOC_MAGIC, 0x34, \
+                                       struct cmh_ioctl_sm2_id_digest)
+#define CMH_IOCTL_SM2_ECDH_HASH                _IOWR(CMH_MGMT_IOC_MAGIC, 0x35, \
+                                       struct cmh_ioctl_sm2_ecdh_hash)
+#define CMH_IOCTL_SM2_DEC_HASH         _IOWR(CMH_MGMT_IOC_MAGIC, 0x36, \
+                                       struct cmh_ioctl_sm2_dec_hash)
+#define CMH_IOCTL_SM2_ENC_HASH         _IOWR(CMH_MGMT_IOC_MAGIC, 0x37, \
+                                       struct cmh_ioctl_sm2_enc_hash)
+
+/*
+ * EAC (Error and Alarm Controller) -- read and clear error registers.
+ *
+ * Returns a snapshot of all hardware error/safety/notification registers.
+ * The eSW atomically reads and clears the registers on each call, so
+ * successive reads show only new events.
+ */
+struct cmh_ioctl_eac_read {
+       __u32 version;                  /* must be CMH_MGMT_V1 */
+       __u32 __reserved;
+       __u64 mailbox_notification;     /* [out] MBX safety notification bitmask */
+       __u32 hw_error;                 /* [out] HWC error bitmask */
+       __u32 hw_nmi;                   /* [out] HWC NMI bitmask */
+       __u32 hw_panic;                 /* [out] HWC panic bitmask */
+       __u32 safety_fatal;             /* [out] HWC fatal safety bitmask */
+       __u32 safety_notification;      /* [out] HWC safety notification bitmask */
+       __u32 sw_info0;                 /* [out] eSW tracing info */
+       __u32 sw_info1;                 /* [out] eSW tracing info */
+       __u32 sram_bank_errors[4];      /* [out] correctable ECC error counts */
+       __u32 __pad;                    /* explicit tail padding (prevent info leak) */
+};
+
+/*
+ * DRBG CONFIG -- configure the hardware DRBG before first use.
+ *
+ * This is a management operation normally performed once at system
+ * start-up.  Must be called before any hwrng reads or DRBG GENERATE
+ * operations.
+ */
+#define CMH_DRBG_RATIO_ONE             0       /* 1:1 entropy ratio */
+#define CMH_DRBG_RATIO_ONE_HALF                1       /* 1:2 */
+#define CMH_DRBG_RATIO_ONE_THIRD       2       /* 1:3 */
+#define CMH_DRBG_RATIO_ONE_FOURTH      3       /* 1:4 */
+
+#define CMH_DRBG_STRENGTH_128          0x00    /* 128-bit security */
+#define CMH_DRBG_STRENGTH_256          0x10    /* 256-bit security */
+
+struct cmh_ioctl_drbg_config {
+       __u32 version;                  /* must be CMH_MGMT_V1 */
+       __u32 entropy_ratio;            /* CMH_DRBG_RATIO_* */
+       __u32 security_strength;        /* CMH_DRBG_STRENGTH_* */
+       __u32 __reserved;
+};
+
+/* EAC ioctl number */
+#define CMH_IOCTL_EAC_READ             _IOWR(CMH_MGMT_IOC_MAGIC, 0x0F, \
+                                       struct cmh_ioctl_eac_read)
+
+/* DRBG management ioctl number */
+#define CMH_IOCTL_DRBG_CONFIG          _IOW(CMH_MGMT_IOC_MAGIC, 0x40, \
+                                       struct cmh_ioctl_drbg_config)
+
+#endif /* _UAPI_CMH_MGMT_IOCTL_H */
--
2.43.7
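
For reference, the SM2 size limits in the header above fit together as
C1(64) + C3(32) + message(1..32), which gives the 97..128 ciphertext range
quoted in the struct comments. A minimal user-space sketch of that
arithmetic follows. The macro values are copied from the CMH_SM2_* defines
in the patch; the shortened macro names and the helper sm2_ct_len() are
hypothetical and are not part of the proposed UAPI.

```c
#include <stdint.h>

/* Values copied from the CMH_SM2_* defines in the patch above. */
#define SM2_MAX_MSG_LEN  32U   /* CMH_SM2_MAX_MSG_LEN */
#define SM2_CT_OVERHEAD  96U   /* CMH_SM2_CT_OVERHEAD: C1(64) + C3(32) */
#define SM2_MAX_CT_LEN   128U  /* CMH_SM2_MAX_CT_LEN */

/*
 * Hypothetical helper: ciphertext length for a given plaintext length,
 * or 0 if the length is outside the documented 1..32 range.
 */
static uint32_t sm2_ct_len(uint32_t msg_len)
{
	if (msg_len == 0 || msg_len > SM2_MAX_MSG_LEN)
		return 0;
	return SM2_CT_OVERHEAD + msg_len;
}
```

A caller could use such a check to size the ciphertext buffer before
filling in struct cmh_ioctl_sm2_enc_hash, rather than relying on the
driver to reject an oversized message.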


^ permalink raw reply related

* [PATCH 09/19] crypto: cmh - add SM4 skcipher/aead/cmac/xcbc
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register SM4 algorithms using the CMH SM4 core (core ID 0x04):
- skcipher: SM4-ECB, SM4-CBC, SM4-CTR, SM4-XTS, SM4-CFB
- aead: SM4-GCM, SM4-CCM
- ahash: SM4-CMAC, SM4-XCBC

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 drivers/crypto/cmh/Makefile           |   5 +-
 drivers/crypto/cmh/cmh_main.c         |  25 +
 drivers/crypto/cmh/cmh_sm4_aead.c     | 870 ++++++++++++++++++++++++++
 drivers/crypto/cmh/cmh_sm4_cmac.c     | 672 ++++++++++++++++++++
 drivers/crypto/cmh/cmh_sm4_skcipher.c | 690 ++++++++++++++++++++
 drivers/crypto/cmh/include/cmh_sm4.h  |  24 +
 6 files changed, 2285 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_sm4_aead.c
 create mode 100644 drivers/crypto/cmh/cmh_sm4_cmac.c
 create mode 100644 drivers/crypto/cmh/cmh_sm4_skcipher.c
 create mode 100644 drivers/crypto/cmh/include/cmh_sm4.h
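
The block formatting that cmh_sm4_aead.c relies on for the empty-input
fallbacks (J0 = nonce || 0x00000001 for GCM, and the RFC 3610 B0 block for
CCM) can be checked in user space. The sketch below is only an
illustration under stated assumptions: the function names
ccm_authsize_valid(), ccm_build_b0_empty() and gcm_build_j0() are
hypothetical, the code does not touch the hardware or the kernel crypto
API, and it mirrors only the formatting steps described in the driver
comments.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE    16U  /* SM4 block size in bytes */
#define GCM_NONCE_LEN 12U  /* standard 96-bit GCM nonce */

/* CCM tag sizes allowed by RFC 3610: even values in 4..16. */
static int ccm_authsize_valid(unsigned int authsize)
{
	return authsize >= 4 && authsize <= 16 && !(authsize & 1);
}

/*
 * Build B0 for the empty-AAD, empty-payload case from a 16-byte
 * RFC 3610 style IV (iv[0] = L-1, followed by the nonce).
 * Returns 0 on success, -1 on invalid input.
 */
static int ccm_build_b0_empty(const uint8_t iv[BLOCK_SIZE],
			      unsigned int authsize, uint8_t b0[BLOCK_SIZE])
{
	unsigned int L;

	if (iv[0] < 1 || iv[0] > 7 || !ccm_authsize_valid(authsize))
		return -1;

	L = iv[0] + 1U;
	memset(b0, 0, BLOCK_SIZE);
	/* Flags: Adata = 0, (t - 2) / 2 in bits 5..3, L - 1 in bits 2..0. */
	b0[0] = (uint8_t)(8U * ((authsize - 2U) / 2U) + (L - 1U));
	/* Nonce is 15 - L bytes; the trailing length field stays zero. */
	memcpy(&b0[1], &iv[1], 15U - L);
	return 0;
}

/* J0 for a 96-bit GCM nonce: nonce || 0x00000001 (big-endian counter). */
static void gcm_build_j0(const uint8_t nonce[GCM_NONCE_LEN],
			 uint8_t j0[BLOCK_SIZE])
{
	memset(j0, 0, BLOCK_SIZE);
	memcpy(j0, nonce, GCM_NONCE_LEN);
	j0[15] = 0x01;
}
```

For the normal (non-empty) GCM path the driver comment says the hardware
sets the counter itself, so a J0 built this way would only be relevant to
the single-block ECB fallback.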

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index ced8d1748e6c..1f36cd9c0b98 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -22,7 +22,10 @@ cmh-y := \
        cmh_sm3.o \
        cmh_aes.o \
        cmh_aes_aead.o \
-       cmh_aes_cmac.o
+       cmh_aes_cmac.o \
+       cmh_sm4_skcipher.o \
+       cmh_sm4_aead.o \
+       cmh_sm4_cmac.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index 1edd8d14c666..5d67a4a12333 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -35,6 +35,7 @@
 #include "cmh_kmac.h"
 #include "cmh_sm3.h"
 #include "cmh_aes.h"
+#include "cmh_sm4.h"
 #include "cmh_mgmt.h"
 #include "cmh_registers.h"
 #include "cmh_debugfs.h"
@@ -237,6 +238,21 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_aes_cmac_register;

+       /* Register SM4 skcipher algorithms */
+       ret = cmh_sm4_register();
+       if (ret)
+               goto err_sm4_register;
+
+       /* Register SM4 AEAD algorithms (GCM, CCM) */
+       ret = cmh_sm4_aead_register();
+       if (ret)
+               goto err_sm4_aead_register;
+
+       /* Register SM4 CMAC/XCBC algorithms */
+       ret = cmh_sm4_cmac_register();
+       if (ret)
+               goto err_sm4_cmac_register;
+
        /* Register key management device (/dev/cmh_mgmt) */
        ret = cmh_mgmt_register();
        if (ret)
@@ -249,6 +265,12 @@ static int cmh_probe(struct platform_device *pdev)
        return 0;

 err_mgmt_register:
+       cmh_sm4_cmac_unregister();
+err_sm4_cmac_register:
+       cmh_sm4_aead_unregister();
+err_sm4_aead_register:
+       cmh_sm4_unregister();
+err_sm4_register:
        cmh_aes_cmac_unregister();
 err_aes_cmac_register:
        cmh_aes_aead_unregister();
@@ -291,6 +313,9 @@ static void cmh_remove(struct platform_device *pdev)
        cfg = &dev->config;

        cmh_mgmt_unregister();
+       cmh_sm4_cmac_unregister();
+       cmh_sm4_aead_unregister();
+       cmh_sm4_unregister();
        cmh_aes_cmac_unregister();
        cmh_aes_aead_unregister();
        cmh_aes_unregister();
diff --git a/drivers/crypto/cmh/cmh_sm4_aead.c b/drivers/crypto/cmh/cmh_sm4_aead.c
new file mode 100644
index 000000000000..478119bb9c08
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_sm4_aead.c
@@ -0,0 +1,870 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API SM4 AEAD Driver (GCM/CCM)
+ *
+ * Registers AEAD algorithms with the Linux crypto subsystem:
+ *   gcm(sm4), ccm(sm4)
+ *
+ * GCM: SM4_CMD_INIT(mode=GCM) + [AAD_FINAL] + SM4_CMD_FINAL + FLUSH
+ * CCM: SM4_CMD_CCM_INIT + [AAD_FINAL] + SM4_CMD_FINAL + FLUSH
+ *   - SM4 CCM uses a distinct sm4_cmd_ccm_init struct
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/utils.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include "cmh_sm4.h"
+#include "cmh_vcq.h"
+#include "cmh_sm4_abi.h"
+#include "cmh_sys_abi.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+/*
+ * GCM IV contract:
+ *
+ * The SM4 core requires exactly 16 bytes loaded into its IV register.
+ * For standard 96-bit nonce GCM, the driver passes:
+ *
+ *   IV[0..11]  = user-supplied 12-byte nonce
+ *   IV[12..15] = 0x00000000
+ *
+ * The hardware internally sets the last 32 bits to the big-endian
+ * counter value 1 (forming J0 = nonce || 0x00000001) before
+ * processing AAD.  The driver must NOT pre-set the counter.
+ *
+ * If the IV format is incorrect, GCM authentication will fail
+ * (encrypt produces wrong ciphertext/tag, decrypt rejects).
+ */
+#define SM4_GCM_IV_SIZE                12U     /* GCM nonce size (standard) */
+#define SM4_GCM_HW_IV_SIZE     16U     /* HW requires 16-byte IV buffer */
+#define SM4_GCM_TAG_SIZE       16U
+
+/* CCM: callers pass a 16-byte IV in RFC 3610 format:
+ * iv[0] = L-1, iv[1..14-iv[0]] = nonce, rest = counter (zeroed).
+ * Nonce length = 14 - iv[0], range 7..13.
+ */
+#define SM4_CCM_IV_SIZE        16U
+
+enum cmh_sm4_aead_type {
+       CMH_SM4_AEAD_GCM,
+       CMH_SM4_AEAD_CCM,
+};
+
+struct cmh_sm4_aead_info {
+       enum cmh_sm4_aead_type type;
+       u32         sm4_mode;
+       u32         ivsize;
+       u32         maxauthsize;
+       const char *alg_name;
+       const char *drv_name;
+};
+
+static const struct cmh_sm4_aead_info sm4_aead_algs[] = {
+       { CMH_SM4_AEAD_GCM, SM4_MODE_GCM, SM4_GCM_IV_SIZE,
+         SM4_GCM_TAG_SIZE, "gcm(sm4)", "cri-cmh-gcm-sm4" },
+       { CMH_SM4_AEAD_CCM, SM4_MODE_CCM, SM4_CCM_IV_SIZE,
+         SM4_GCM_TAG_SIZE, "ccm(sm4)", "cri-cmh-ccm-sm4" },
+};
+
+struct cmh_sm4_aead_tfm_ctx {
+       struct cmh_key_ctx key;
+       u32 authsize;
+       struct crypto_cipher *sw_cipher;        /* CCM empty-input fallback */
+};
+
+/* Per-request context (lives in aead_request::__ctx) */
+
+#define CMH_SM4_AEAD_MAX_PAYLOAD       5
+#define CMH_SM4_AEAD_MAX_PACKED                (CMH_SM4_AEAD_MAX_PAYLOAD * 2)
+
+struct cmh_sm4_aead_reqctx {
+       dma_addr_t in_dma;
+       dma_addr_t out_dma;
+       dma_addr_t iv_dma;
+       dma_addr_t key_dma;
+       dma_addr_t aad_dma;
+       dma_addr_t tag_dma;
+       u8 *in_buf;
+       u8 *out_buf;
+       u8 *iv_buf;
+       u8 *aad_buf;
+       u8 *tag_buf;
+       u32 cryptlen;
+       u32 assoclen;
+       u32 authsize;
+       u32 iv_map_len;
+       u32 keylen;
+       bool encrypting;
+       bool empty_gcm_fallback;
+       struct vcq_cmd packed[CMH_SM4_AEAD_MAX_PACKED];
+};
+
+struct cmh_sm4_aead_drv {
+       struct aead_alg                  alg;
+       const struct cmh_sm4_aead_info  *info;
+};
+
+static const struct cmh_sm4_aead_info *
+cmh_sm4_aead_get_info(struct crypto_aead *tfm)
+{
+       struct aead_alg *alg = crypto_aead_alg(tfm);
+
+       return container_of(alg, struct cmh_sm4_aead_drv, alg)->info;
+}
+
+/* VCQ Builders -- SM4 AEAD-specific */
+
+static void vcq_add_sm4_aead_init(struct vcq_cmd *slot, u32 core_id, u64 key_ref,
+                                 u64 iv_dma, u32 keylen, u32 ivlen,
+                                 u32 mode, u32 op, u32 aadlen, u32 iolen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, SM4_CMD_INIT);
+       slot->hwc.sm4.cmd_init.key = key_ref;
+       slot->hwc.sm4.cmd_init.iv = iv_dma;
+       slot->hwc.sm4.cmd_init.keylen = keylen;
+       slot->hwc.sm4.cmd_init.ivlen = ivlen;
+       slot->hwc.sm4.cmd_init.mode = mode;
+       slot->hwc.sm4.cmd_init.op = op;
+       slot->hwc.sm4.cmd_init.aadlen = aadlen;
+       slot->hwc.sm4.cmd_init.iolen = iolen;
+}
+
+static void vcq_add_sm4_ccm_init(struct vcq_cmd *slot, u32 core_id, u64 key_ref,
+                                u64 nonce_dma, u32 keylen, u32 noncelen,
+                                u32 op, u32 aadlen, u32 iolen, u32 taglen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, SM4_CMD_CCM_INIT);
+       slot->hwc.sm4.cmd_ccm_init.key = key_ref;
+       slot->hwc.sm4.cmd_ccm_init.nonce = nonce_dma;
+       slot->hwc.sm4.cmd_ccm_init.keylen = keylen;
+       slot->hwc.sm4.cmd_ccm_init.noncelen = noncelen;
+       slot->hwc.sm4.cmd_ccm_init.op = op;
+       slot->hwc.sm4.cmd_ccm_init.aadlen = aadlen;
+       slot->hwc.sm4.cmd_ccm_init.iolen = iolen;
+       slot->hwc.sm4.cmd_ccm_init.taglen = taglen;
+}
+
+static void vcq_add_sm4_aad_final(struct vcq_cmd *slot, u32 core_id, u64 aad_dma,
+                                 u32 aadlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, SM4_CMD_AAD_FINAL);
+       slot->hwc.sm4.cmd_aad_final.data = aad_dma;
+       slot->hwc.sm4.cmd_aad_final.datalen = aadlen;
+}
+
+static void vcq_add_sm4_aead_final(struct vcq_cmd *slot, u32 core_id, u64 input_dma,
+                                  u64 output_dma, u64 tag_dma,
+                                  u32 iolen, u32 taglen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, SM4_CMD_FINAL);
+       slot->hwc.sm4.cmd_final.input = input_dma;
+       slot->hwc.sm4.cmd_final.output = output_dma;
+       slot->hwc.sm4.cmd_final.tag = tag_dma;
+       slot->hwc.sm4.cmd_final.iolen = iolen;
+       slot->hwc.sm4.cmd_final.taglen = taglen;
+}
+
+/* setkey */
+static int cmh_sm4_aead_setkey(struct crypto_aead *tfm, const u8 *key,
+                              unsigned int keylen)
+{
+       struct cmh_sm4_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       /* SM4 always uses 128-bit keys */
+       if (keylen != CMH_SM4_KEY_SIZE)
+               return -EINVAL;
+
+       if (tctx->sw_cipher) {
+               int ret;
+
+               ret = crypto_cipher_setkey(tctx->sw_cipher, key, keylen);
+               if (ret)
+                       return ret;
+       }
+
+       return cmh_key_setkey_raw(&tctx->key, key, keylen, CORE_ID_SM4);
+}
+
+static int cmh_sm4_aead_setauthsize(struct crypto_aead *tfm,
+                                   unsigned int authsize)
+{
+       struct cmh_sm4_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       const struct cmh_sm4_aead_info *info = cmh_sm4_aead_get_info(tfm);
+
+       if (info->type == CMH_SM4_AEAD_GCM) {
+               /* eSW enforces taglen == 16 for SM4 GCM (EIP40_SM4_TAG_SIZE) */
+               if (authsize != 16)
+                       return -EINVAL;
+       } else {
+               /* CCM: accept 4, 6, 8, 10, 12, 14, 16 per RFC 3610 */
+               if (authsize < 4 || authsize > 16 || (authsize & 1))
+                       return -EINVAL;
+       }
+
+       tctx->authsize = authsize;
+       return 0;
+}
+
+static int cmh_sm4_aead_init_tfm(struct crypto_aead *tfm)
+{
+       struct cmh_sm4_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       const struct cmh_sm4_aead_info *info = cmh_sm4_aead_get_info(tfm);
+
+       memset(tctx, 0, sizeof(*tctx));
+       tctx->authsize = info->maxauthsize;
+
+       if (info->type == CMH_SM4_AEAD_CCM) {
+               struct crypto_cipher *ci;
+
+               ci = crypto_alloc_cipher("sm4", 0, 0);
+               if (IS_ERR(ci))
+                       return PTR_ERR(ci);
+               tctx->sw_cipher = ci;
+       }
+
+       crypto_aead_set_reqsize(tfm, sizeof(struct cmh_sm4_aead_reqctx));
+       return 0;
+}
+
+static void cmh_sm4_aead_exit_tfm(struct crypto_aead *tfm)
+{
+       struct cmh_sm4_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+
+       if (tctx->sw_cipher)
+               crypto_free_cipher(tctx->sw_cipher);
+       cmh_key_destroy(&tctx->key);
+}
+
+/* DMA unmap helper */
+static void cmh_sm4_aead_unmap_dma(struct cmh_sm4_aead_reqctx *rctx)
+{
+       u32 tag_map_len;
+
+       cmh_dma_unmap_single(rctx->iv_dma, rctx->iv_map_len, DMA_TO_DEVICE);
+       tag_map_len = rctx->empty_gcm_fallback ?
+                     SM4_GCM_HW_IV_SIZE : rctx->authsize;
+       cmh_dma_unmap_single(rctx->tag_dma, tag_map_len,
+                            (rctx->encrypting || rctx->empty_gcm_fallback) ?
+                             DMA_FROM_DEVICE : DMA_TO_DEVICE);
+       if (rctx->cryptlen > 0) {
+               cmh_dma_unmap_single(rctx->out_dma, rctx->cryptlen,
+                                    DMA_FROM_DEVICE);
+               cmh_dma_unmap_single(rctx->in_dma, rctx->cryptlen,
+                                    DMA_TO_DEVICE);
+       }
+       if (rctx->assoclen > 0)
+               cmh_dma_unmap_single(rctx->aad_dma, rctx->assoclen,
+                                    DMA_TO_DEVICE);
+}
+
+static void cmh_sm4_aead_free_bufs(struct cmh_sm4_aead_reqctx *rctx)
+{
+       kfree(rctx->iv_buf);
+       rctx->iv_buf = NULL;
+       kfree(rctx->tag_buf);
+       rctx->tag_buf = NULL;
+       kfree_sensitive(rctx->out_buf);
+       rctx->out_buf = NULL;
+       kfree_sensitive(rctx->in_buf);
+       rctx->in_buf = NULL;
+       kfree(rctx->aad_buf);
+       rctx->aad_buf = NULL;
+}
+
+static void cmh_sm4_aead_complete(void *data, int error)
+{
+       struct aead_request *req = data;
+       struct cmh_sm4_aead_reqctx *rctx = aead_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       cmh_sm4_aead_unmap_dma(rctx);
+
+       /*
+        * Map HW error on decrypt to -EBADMSG.  The eSW SM4 core uses a
+        * single error code (-EIO) for both authentication failures and
+        * other core errors (e.g. DMA timeout), so we cannot distinguish
+        * them from the MBX_STATUS alone.  In practice the only error
+        * during a well-formed AEAD decrypt is auth-tag mismatch; a DMA
+        * timeout would indicate a fatal HW problem where -EBADMSG vs
+        * -EIO is moot.  The kernel crypto API requires -EBADMSG for
+        * AEAD authentication failures.
+        */
+       if (error == -EIO && !rctx->encrypting)
+               error = -EBADMSG;
+
+       if (!error) {
+               if (rctx->empty_gcm_fallback && !rctx->encrypting) {
+                       if (crypto_memneq(rctx->tag_buf, rctx->in_buf,
+                                         rctx->authsize))
+                               error = -EBADMSG;
+               }
+               if (!error && rctx->cryptlen > 0)
+                       scatterwalk_map_and_copy(rctx->out_buf, req->dst,
+                                                req->assoclen,
+                                               rctx->cryptlen, 1);
+               if (!error && rctx->encrypting)
+                       scatterwalk_map_and_copy(rctx->tag_buf, req->dst,
+                                                req->assoclen +
+                                               rctx->cryptlen,
+                                               rctx->authsize, 1);
+       }
+
+       cmh_sm4_aead_free_bufs(rctx);
+       cmh_complete(&req->base, error);
+}
+
+/*
+ * GCM empty-input fallback (SM4).
+ *
+ * When both AAD and plaintext are empty, GCM reduces to:
+ *   tag = E(K, J0) where J0 = nonce || 0x00000001
+ *
+ * The eSW GCM engine rejects this degenerate case, so we compute it
+ * via a single ECB block encryption of J0.
+ *
+ * VCQ: [SYS_CMD_WRITE] + SM4_CMD_INIT(ECB) + SM4_CMD_FINAL + FLUSH
+ */
+static int cmh_sm4_gcm_empty(struct aead_request *req, u32 sm4_op)
+{
+       struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+       struct cmh_sm4_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       struct cmh_sm4_aead_reqctx *rctx = aead_request_ctx(req);
+       struct vcq_cmd cmds[CMH_SM4_AEAD_MAX_PAYLOAD];
+       u64 key_ref;
+       u32 keylen, authsize;
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       authsize = tctx->authsize;
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->cryptlen = 0;
+       rctx->assoclen = 0;
+       rctx->authsize = authsize;
+       rctx->encrypting = (sm4_op == SM4_OP_ENCRYPT);
+       rctx->empty_gcm_fallback = true;
+
+       /* Build J0 = nonce || 0x00000001 in iv_buf */
+       rctx->iv_buf = kzalloc(SM4_GCM_HW_IV_SIZE, gfp);
+       if (!rctx->iv_buf)
+               return -ENOMEM;
+       memcpy(rctx->iv_buf, req->iv, SM4_GCM_IV_SIZE);
+       rctx->iv_buf[15] = 0x01;
+       rctx->iv_map_len = SM4_GCM_HW_IV_SIZE;
+
+       rctx->iv_dma = cmh_dma_map_single(rctx->iv_buf, SM4_GCM_HW_IV_SIZE,
+                                         DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->iv_dma)) {
+               ret = -ENOMEM;
+               goto out_free_iv;
+       }
+
+       /* Tag buffer -- receives E(K, J0) output */
+       rctx->tag_buf = kzalloc(SM4_GCM_HW_IV_SIZE, gfp);
+       if (!rctx->tag_buf) {
+               ret = -ENOMEM;
+               goto out_unmap_iv;
+       }
+       rctx->tag_dma = cmh_dma_map_single(rctx->tag_buf, SM4_GCM_HW_IV_SIZE,
+                                          DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->tag_dma)) {
+               ret = -ENOMEM;
+               goto out_free_tag;
+       }
+
+       /* For decrypt: read expected tag from request */
+       if (!rctx->encrypting) {
+               rctx->in_buf = kmalloc(authsize, gfp);
+               if (!rctx->in_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_tag;
+               }
+               scatterwalk_map_and_copy(rctx->in_buf, req->src, 0,
+                                        authsize, 0);
+       }
+
+       /* Resolve key */
+       idx = 0;
+       rctx->key_dma = tctx->key.raw.dma;
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)rctx->key_dma, SYS_REF_NONE,
+                         tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+       d = cmh_core_select_instance(CMH_CORE_SM4);
+       target_mbx = d.mbx_idx;
+       core_id = d.core_id;
+
+       /* ECB INIT: single block encryption of J0 */
+       vcq_add_sm4_aead_init(&cmds[idx++], core_id, key_ref,
+                             0, keylen, 0, SM4_MODE_ECB, SM4_OP_ENCRYPT,
+                             0, SM4_GCM_HW_IV_SIZE);
+
+       /* FINAL: J0 in, E(K,J0) out */
+       vcq_add_sm4_aead_final(&cmds[idx++], core_id,
+                              (u64)rctx->iv_dma, (u64)rctx->tag_dma,
+                              0, SM4_GCM_HW_IV_SIZE, 0);
+
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_SM4_AEAD_MAX_PACKED,
+                                           target_mbx,
+                                           cmh_sm4_aead_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_free_in;
+
+       return -EINPROGRESS;
+
+out_free_in:
+       kfree_sensitive(rctx->in_buf);
+out_unmap_tag:
+       cmh_dma_unmap_single(rctx->tag_dma, SM4_GCM_HW_IV_SIZE,
+                            DMA_FROM_DEVICE);
+out_free_tag:
+       kfree(rctx->tag_buf);
+out_unmap_iv:
+       cmh_dma_unmap_single(rctx->iv_dma, SM4_GCM_HW_IV_SIZE, DMA_TO_DEVICE);
+out_free_iv:
+       kfree(rctx->iv_buf);
+       return ret;
+}
+
+/*
+ * CCM empty-input fallback (SM4).
+ *
+ * When both AAD and plaintext are empty, CCM reduces to:
+ *   T  = E(K, B0)    -- CBC-MAC of the single formatting block
+ *   S0 = E(K, A0)    -- CTR block zero
+ *   tag = (T XOR S0)[0..authsize-1]
+ *
+ * The eSW rejects this degenerate case, so the driver computes it
+ * synchronously via two crypto_cipher single-block encryptions.
+ */
+static int cmh_sm4_ccm_empty(struct aead_request *req, u32 sm4_op)
+{
+       struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+       struct cmh_sm4_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       u32 authsize = tctx->authsize;
+       u8 b0[CMH_SM4_BLOCK_SIZE], a0[CMH_SM4_BLOCK_SIZE];
+       u8 t[CMH_SM4_BLOCK_SIZE], s0[CMH_SM4_BLOCK_SIZE];
+       u8 tag[CMH_SM4_BLOCK_SIZE];
+       u8 L;
+       u32 i;
+
+       /* Defense-in-depth: iv[0] = L-1, valid L is 2..8 per RFC 3610 S2.1 */
+       if (WARN_ON_ONCE(req->iv[0] < 1 || req->iv[0] > 7))
+               return -EINVAL;
+
+       L = req->iv[0] + 1;
+
+       if (tctx->key.mode != CMH_KEY_RAW)
+               return -EOPNOTSUPP;
+
+       /* B0: flags || nonce || Q(=0).  Adata=0, t=authsize, q=L. */
+       memset(b0, 0, CMH_SM4_BLOCK_SIZE);
+       b0[0] = (u8)(8 * ((authsize - 2) / 2) + (L - 1));
+       memcpy(&b0[1], &req->iv[1], 15 - L);
+
+       /* A0: (L-1) || nonce || counter(=0) */
+       memset(a0, 0, CMH_SM4_BLOCK_SIZE);
+       a0[0] = (u8)(L - 1);
+       memcpy(&a0[1], &req->iv[1], 15 - L);
+
+       crypto_cipher_encrypt_one(tctx->sw_cipher, t, b0);
+       crypto_cipher_encrypt_one(tctx->sw_cipher, s0, a0);
+
+       for (i = 0; i < authsize; i++)
+               tag[i] = t[i] ^ s0[i];
+
+       if (sm4_op == SM4_OP_ENCRYPT) {
+               scatterwalk_map_and_copy(tag, req->dst,
+                                        req->assoclen, authsize, 1);
+       } else {
+               u8 expected[CMH_SM4_BLOCK_SIZE];
+
+               scatterwalk_map_and_copy(expected, req->src,
+                                        req->assoclen, authsize, 0);
+               if (crypto_memneq(tag, expected, authsize))
+                       return -EBADMSG;
+       }
+
+       return 0;
+}
+
+static int cmh_sm4_aead_crypt(struct aead_request *req, u32 sm4_op)
+{
+       struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+       struct cmh_sm4_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       const struct cmh_sm4_aead_info *info = cmh_sm4_aead_get_info(tfm);
+       struct cmh_sm4_aead_reqctx *rctx = aead_request_ctx(req);
+       struct vcq_cmd cmds[CMH_SM4_AEAD_MAX_PAYLOAD];
+       u64 key_ref;
+       u32 keylen, authsize, cryptlen;
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       if (tctx->key.mode == CMH_KEY_NONE)
+               return -ENOKEY;
+
+       authsize = tctx->authsize;
+
+       if (sm4_op == SM4_OP_ENCRYPT) {
+               cryptlen = req->cryptlen;
+       } else {
+               if (req->cryptlen < authsize)
+                       return -EINVAL;
+               cryptlen = req->cryptlen - authsize;
+       }
+
+       /*
+        * Validate CCM IV format early -- the empty-input fallback and
+        * nonce extraction both depend on iv[0] being in range [1,7].
+        */
+       if (info->type == CMH_SM4_AEAD_CCM) {
+               if (req->iv[0] < 1 || req->iv[0] > 7)
+                       return -EINVAL;
+       }
+
+       /*
+        * The CMH eSW rejects SM4 GCM/CCM when both aadlen and iolen
+        * are zero.  For GCM, the tag is simply E(K, J0) -- use ECB
+        * fallback.  For CCM, compute tag = E(K,B0) XOR E(K,A0) in SW.
+        */
+       if (cryptlen == 0 && req->assoclen == 0) {
+               if (info->type == CMH_SM4_AEAD_GCM)
+                       return cmh_sm4_gcm_empty(req, sm4_op);
+               return cmh_sm4_ccm_empty(req, sm4_op);
+       }
+
+       /*
+        * HW uses a proprietary LLI scatter-gather format that is
+        * incompatible with struct scatterlist, so the payload is
+        * linearised into contiguous buffers for DMA.  Cap total
+        * size to prevent excessive memory consumption.
+        */
+       if ((u64)cryptlen + req->assoclen > SZ_1M)
+               return -EINVAL;
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->cryptlen = cryptlen;
+       rctx->assoclen = req->assoclen;
+       rctx->authsize = authsize;
+       rctx->encrypting = (sm4_op == SM4_OP_ENCRYPT);
+
+       /* Linearise AAD */
+       if (req->assoclen > 0) {
+               rctx->aad_buf = kmalloc(req->assoclen, gfp);
+               if (!rctx->aad_buf)
+                       return -ENOMEM;
+               scatterwalk_map_and_copy(rctx->aad_buf, req->src,
+                                        0, req->assoclen, 0);
+               rctx->aad_dma = cmh_dma_map_single(rctx->aad_buf,
+                                                  req->assoclen,
+                                                  DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->aad_dma)) {
+                       ret = -ENOMEM;
+                       goto out_free_aad;
+               }
+       }
+
+       /* Linearise input */
+       if (cryptlen > 0) {
+               rctx->in_buf = kmalloc(cryptlen, gfp);
+               if (!rctx->in_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_aad;
+               }
+               scatterwalk_map_and_copy(rctx->in_buf, req->src,
+                                        req->assoclen, cryptlen, 0);
+               rctx->in_dma = cmh_dma_map_single(rctx->in_buf, cryptlen,
+                                                 DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->in_dma)) {
+                       ret = -ENOMEM;
+                       goto out_free_in;
+               }
+       }
+
+       /* Allocate output buffer */
+       if (cryptlen > 0) {
+               rctx->out_buf = kmalloc(cryptlen, gfp);
+               if (!rctx->out_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_in;
+               }
+               rctx->out_dma = cmh_dma_map_single(rctx->out_buf, cryptlen,
+                                                  DMA_FROM_DEVICE);
+               if (cmh_dma_map_error(rctx->out_dma)) {
+                       ret = -ENOMEM;
+                       goto out_free_out;
+               }
+       }
+
+       /* Tag buffer */
+       rctx->tag_buf = kmalloc(authsize, gfp);
+       if (!rctx->tag_buf) {
+               ret = -ENOMEM;
+               goto out_unmap_out;
+       }
+
+       if (!rctx->encrypting) {
+               scatterwalk_map_and_copy(rctx->tag_buf, req->src,
+                                        req->assoclen + cryptlen,
+                                        authsize, 0);
+       } else {
+               memset(rctx->tag_buf, 0, authsize);
+       }
+
+       rctx->tag_dma = cmh_dma_map_single(rctx->tag_buf, authsize,
+                                          rctx->encrypting ?
+                                          DMA_FROM_DEVICE : DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->tag_dma)) {
+               ret = -ENOMEM;
+               goto out_free_tag;
+       }
+
+       /* Map IV/nonce */
+       if (info->type == CMH_SM4_AEAD_GCM) {
+               rctx->iv_buf = kzalloc(SM4_GCM_HW_IV_SIZE, gfp);
+               if (!rctx->iv_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_tag;
+               }
+               memcpy(rctx->iv_buf, req->iv, SM4_GCM_IV_SIZE);
+               rctx->iv_map_len = SM4_GCM_HW_IV_SIZE;
+               rctx->iv_dma = cmh_dma_map_single(rctx->iv_buf,
+                                                 rctx->iv_map_len,
+                                                 DMA_TO_DEVICE);
+       } else {
+               u32 noncelen;
+
+               if (req->iv[0] < 1 || req->iv[0] > 7) {
+                       ret = -EINVAL;
+                       goto out_unmap_tag;
+               }
+               noncelen = 14 - req->iv[0];
+
+               rctx->iv_buf = kmemdup(req->iv + 1, noncelen, gfp);
+               if (!rctx->iv_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_tag;
+               }
+               rctx->iv_map_len = noncelen;
+               rctx->iv_dma = cmh_dma_map_single(rctx->iv_buf,
+                                                 rctx->iv_map_len,
+                                                 DMA_TO_DEVICE);
+       }
+       if (cmh_dma_map_error(rctx->iv_dma)) {
+               ret = -ENOMEM;
+               goto out_free_iv;
+       }
+
+       /* Resolve key reference */
+       idx = 0;
+
+       rctx->key_dma = tctx->key.raw.dma;
+       rctx->keylen = tctx->key.raw.len;
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)rctx->key_dma, SYS_REF_NONE,
+                         tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+       d = cmh_core_select_instance(CMH_CORE_SM4);
+       target_mbx = d.mbx_idx;
+       core_id = d.core_id;
+
+       /* Build INIT command */
+       if (info->type == CMH_SM4_AEAD_CCM) {
+               vcq_add_sm4_ccm_init(&cmds[idx++], core_id, key_ref,
+                                    (u64)rctx->iv_dma, keylen,
+                                    rctx->iv_map_len, sm4_op,
+                                    req->assoclen, cryptlen, authsize);
+       } else {
+               vcq_add_sm4_aead_init(&cmds[idx++], core_id, key_ref,
+                                     (u64)rctx->iv_dma, keylen,
+                                     SM4_GCM_HW_IV_SIZE, info->sm4_mode,
+                                     sm4_op, req->assoclen, cryptlen);
+       }
+
+       if (req->assoclen > 0)
+               vcq_add_sm4_aad_final(&cmds[idx++], core_id,
+                                     (u64)rctx->aad_dma, req->assoclen);
+
+       vcq_add_sm4_aead_final(&cmds[idx++], core_id,
+                              cryptlen > 0 ? (u64)rctx->in_dma : 0,
+                              cryptlen > 0 ? (u64)rctx->out_dma : 0,
+                              (u64)rctx->tag_dma, cryptlen, authsize);
+
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_SM4_AEAD_MAX_PACKED,
+                                           target_mbx,
+                                           cmh_sm4_aead_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_cleanup_all;
+
+       return -EINPROGRESS;
+
+out_cleanup_all:
+       cmh_dma_unmap_single(rctx->iv_dma, rctx->iv_map_len, DMA_TO_DEVICE);
+out_free_iv:
+       kfree(rctx->iv_buf);
+out_unmap_tag:
+       cmh_dma_unmap_single(rctx->tag_dma, authsize,
+                            rctx->encrypting ? DMA_FROM_DEVICE :
+                                              DMA_TO_DEVICE);
+out_free_tag:
+       kfree(rctx->tag_buf);
+out_unmap_out:
+       if (cryptlen > 0)
+               cmh_dma_unmap_single(rctx->out_dma, cryptlen, DMA_FROM_DEVICE);
+out_free_out:
+       kfree_sensitive(rctx->out_buf);
+out_unmap_in:
+       if (cryptlen > 0)
+               cmh_dma_unmap_single(rctx->in_dma, cryptlen, DMA_TO_DEVICE);
+out_free_in:
+       kfree_sensitive(rctx->in_buf);
+out_unmap_aad:
+       if (req->assoclen > 0)
+               cmh_dma_unmap_single(rctx->aad_dma, req->assoclen,
+                                    DMA_TO_DEVICE);
+out_free_aad:
+       kfree(rctx->aad_buf);
+       return ret;
+}
+
+static int cmh_sm4_aead_encrypt(struct aead_request *req)
+{
+       return cmh_sm4_aead_crypt(req, SM4_OP_ENCRYPT);
+}
+
+static int cmh_sm4_aead_decrypt(struct aead_request *req)
+{
+       return cmh_sm4_aead_crypt(req, SM4_OP_DECRYPT);
+}
+
+/* Registration */
+
+static struct cmh_sm4_aead_drv sm4_aead_drv_algs[ARRAY_SIZE(sm4_aead_algs)];
+
+/**
+ * cmh_sm4_aead_register() - Register SM4-GCM/CCM AEAD algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_sm4_aead_register(void)
+{
+       unsigned int i;
+       int ret;
+
+       for (i = 0; i < ARRAY_SIZE(sm4_aead_algs); i++) {
+               const struct cmh_sm4_aead_info *info = &sm4_aead_algs[i];
+               struct cmh_sm4_aead_drv *drv = &sm4_aead_drv_algs[i];
+               struct aead_alg *alg = &drv->alg;
+
+               drv->info = info;
+
+               memset(alg, 0, sizeof(*alg));
+
+               alg->setkey      = cmh_sm4_aead_setkey;
+               alg->setauthsize = cmh_sm4_aead_setauthsize;
+               alg->encrypt     = cmh_sm4_aead_encrypt;
+               alg->decrypt     = cmh_sm4_aead_decrypt;
+               alg->init        = cmh_sm4_aead_init_tfm;
+               alg->exit        = cmh_sm4_aead_exit_tfm;
+               alg->ivsize      = info->ivsize;
+               alg->maxauthsize = info->maxauthsize;
+
+               strscpy(alg->base.cra_name, info->alg_name,
+                       CRYPTO_MAX_ALG_NAME);
+               strscpy(alg->base.cra_driver_name, info->drv_name,
+                       CRYPTO_MAX_ALG_NAME);
+               alg->base.cra_priority  = 300;
+               alg->base.cra_flags     = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                         CRYPTO_ALG_ASYNC;
+               alg->base.cra_blocksize = 1;
+               alg->base.cra_ctxsize  = sizeof(struct cmh_sm4_aead_tfm_ctx);
+               alg->base.cra_module   = THIS_MODULE;
+
+               ret = crypto_register_aead(alg);
+               if (ret) {
+                       dev_err(cmh_dev(), "cmh_sm4_aead: failed to register %s (rc=%d)\n",
+                               info->alg_name, ret);
+                       goto err_unregister;
+               }
+
+               dev_dbg(cmh_dev(), "cmh_sm4_aead: registered %s\n", info->alg_name);
+       }
+
+       return 0;
+
+err_unregister:
+       while (i--)
+               crypto_unregister_aead(&sm4_aead_drv_algs[i].alg);
+       return ret;
+}
+
+/**
+ * cmh_sm4_aead_unregister() - Unregister SM4 AEAD algorithms from the crypto framework
+ */
+void cmh_sm4_aead_unregister(void)
+{
+       unsigned int i;
+
+       for (i = 0; i < ARRAY_SIZE(sm4_aead_algs); i++) {
+               crypto_unregister_aead(&sm4_aead_drv_algs[i].alg);
+               dev_dbg(cmh_dev(), "cmh_sm4_aead: unregistered %s\n",
+                       sm4_aead_algs[i].alg_name);
+       }
+}
diff --git a/drivers/crypto/cmh/cmh_sm4_cmac.c b/drivers/crypto/cmh/cmh_sm4_cmac.c
new file mode 100644
index 000000000000..9304dede3f68
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_sm4_cmac.c
@@ -0,0 +1,672 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API SM4-CMAC / SM4-XCBC (ahash) Driver
+ *
+ * Registers cmac(sm4) and xcbc(sm4) as ahash algorithms.
+ *
+ * Both produce a 16-byte tag (MAC) from a key and message.
+ * VCQ sequence: [SYS_CMD_WRITE] + SM4_CMD_INIT(CMAC/XCBC) +
+ *               SM4_CMD_AAD_FINAL + SM4_CMD_FINAL + FLUSH
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/hash.h>
+#include <crypto/scatterwalk.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include "cmh_sm4.h"
+#include "cmh_vcq.h"
+#include "cmh_sm4_abi.h"
+#include "cmh_sys_abi.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+#define SM4_MAC_DIGEST_SIZE    16U
+#define SM4_MAC_BLOCK_SIZE     16U
+/*
+ * Maximum accumulated data for SM4 MAC -- driver-imposed, not HW.
+ *
+ * The SM4 core does not expose external save/restore VCQ commands,
+ * so the driver must accumulate all data in kernel memory via
+ * .update() and submit it atomically in .final().  This cap limits
+ * the per-request kernel allocation.
+ */
+#define SM4_MAC_MAX_DATA       (64 * 1024)
+
+struct cmh_sm4_mac_alg_info {
+       u32         sm4_mode;   /* SM4_MODE_CMAC or SM4_MODE_XCBC */
+       const char *alg_name;
+       const char *drv_name;
+};
+
+static const struct cmh_sm4_mac_alg_info sm4_mac_algs[] = {
+       { SM4_MODE_CMAC, "cmac(sm4)", "cri-cmh-cmac-sm4" },
+       { SM4_MODE_XCBC, "xcbc(sm4)", "cri-cmh-xcbc-sm4" },
+};
+
+struct cmh_sm4_mac_tfm_ctx {
+       struct cmh_key_ctx key;
+       u32 sm4_mode;
+       struct crypto_cipher *sw_cipher;        /* XCBC empty-input fallback */
+       /* Cached XCBC subkeys (derived at setkey time for concurrency safety) */
+       u8 xcbc_k1[CMH_SM4_BLOCK_SIZE];         /* K1 = E(K, 0x01..01) */
+       u8 xcbc_k3[CMH_SM4_BLOCK_SIZE];         /* K3 = E(K, 0x03..03) */
+       bool xcbc_subkeys_valid;
+       spinlock_t         chunk_lock;  /* protects all_chunks */
+       struct list_head   all_chunks;  /* orphan-safe chunk tracking */
+};
+
+/* Per-request context (lives in ahash_request::__ctx) */
+/* Chunk node for O(1) update() appends */
+struct cmh_sm4_mac_chunk {
+       struct list_head list;
+       struct list_head tfm_node; /* per-tfm orphan tracking */
+       u32 len;
+       u8  data[];
+};
+
+/* Per-request context (lives in ahash_request::__ctx) */
+
+#define CMH_SM4_MAC_MAX_PAYLOAD                5
+#define CMH_SM4_MAC_MAX_PACKED         (CMH_SM4_MAC_MAX_PAYLOAD * 2)
+
+struct cmh_sm4_mac_reqctx {
+       struct list_head chunks;
+       u32  total_len;
+       u8  *buf;               /* linearised in final() */
+       /* DMA state for async final */
+       dma_addr_t key_dma;
+       dma_addr_t in_dma;
+       dma_addr_t tag_dma;
+       u8 *tag_buf;
+       u32 keylen;
+       struct vcq_cmd packed[CMH_SM4_MAC_MAX_PACKED];
+};
+
+/* Flat state for export/import -- holds accumulated input data only */
+struct cmh_sm4_mac_export_state {
+       u32 total_len;
+       u8  data[];
+};
+
+/*
+ * Flat state buffer for export/import.  The CMH SM4 core does not
+ * support save/restore of intermediate MAC state, so this driver
+ * accumulates input in SW and serialises the buffer on export.
+ *
+ * A fixed 4096-byte state size caps the exportable accumulated-data window.
+ * Full-range export is not feasible because the crypto subsystem
+ * pre-allocates statesize bytes per request.  Export returns -ENOSPC
+ * if the caller has accumulated more than CMH_SM4_MAC_EXPORT_MAX.
+ */
+#define CMH_SM4_MAC_STATE_SIZE 4096
+#define CMH_SM4_MAC_EXPORT_MAX \
+       (CMH_SM4_MAC_STATE_SIZE - sizeof(struct cmh_sm4_mac_export_state))
+
+struct cmh_sm4_mac_drv {
+       struct ahash_alg                   alg;
+       const struct cmh_sm4_mac_alg_info *info;
+};
+
+static int cmh_sm4_mac_setkey(struct crypto_ahash *tfm, const u8 *key,
+                             unsigned int keylen)
+{
+       struct cmh_sm4_mac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       int ret;
+
+       if (keylen != CMH_SM4_KEY_SIZE)
+               return -EINVAL;
+
+       if (tctx->sw_cipher) {
+               u8 const1[CMH_SM4_BLOCK_SIZE], const3[CMH_SM4_BLOCK_SIZE];
+
+               ret = crypto_cipher_setkey(tctx->sw_cipher, key, keylen);
+               if (ret)
+                       return ret;
+
+               /* Pre-derive XCBC subkeys for concurrent-safe final() */
+               memset(const1, 0x01, CMH_SM4_BLOCK_SIZE);
+               memset(const3, 0x03, CMH_SM4_BLOCK_SIZE);
+               crypto_cipher_encrypt_one(tctx->sw_cipher, tctx->xcbc_k1,
+                                         const1);
+               crypto_cipher_encrypt_one(tctx->sw_cipher, tctx->xcbc_k3,
+                                         const3);
+
+               /*
+                * Leave sw_cipher keyed with K1 permanently.
+                * final() only needs E(K1, block) and never touches the
+                * original key again, so the hot path never re-keys and
+                * the per-tfm concurrency race cannot occur.
+                */
+               ret = crypto_cipher_setkey(tctx->sw_cipher, tctx->xcbc_k1,
+                                          CMH_SM4_BLOCK_SIZE);
+               if (ret)
+                       return ret;
+       }
+
+       ret = cmh_key_setkey_raw(&tctx->key, key, keylen, CORE_ID_SM4);
+       if (ret)
+               return ret;
+
+       if (tctx->sw_cipher)
+               tctx->xcbc_subkeys_valid = true;
+
+       return 0;
+}
+
+static void cmh_sm4_mac_free_chunks(struct cmh_sm4_mac_reqctx *rctx,
+                                   struct cmh_sm4_mac_tfm_ctx *tctx)
+{
+       struct cmh_sm4_mac_chunk *c, *tmp;
+
+       spin_lock_bh(&tctx->chunk_lock);
+       list_for_each_entry_safe(c, tmp, &rctx->chunks, list) {
+               list_del(&c->list);
+               list_del(&c->tfm_node);
+               kfree_sensitive(c);
+       }
+       spin_unlock_bh(&tctx->chunk_lock);
+}
+
+static int cmh_sm4_mac_init(struct ahash_request *req)
+{
+       struct cmh_sm4_mac_reqctx *rctx = ahash_request_ctx(req);
+
+       memset(rctx, 0, sizeof(*rctx));
+       INIT_LIST_HEAD(&rctx->chunks);
+       return 0;
+}
+
+static int cmh_sm4_mac_update(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_sm4_mac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_sm4_mac_reqctx *rctx = ahash_request_ctx(req);
+       struct cmh_sm4_mac_chunk *chunk;
+       gfp_t gfp;
+       int ret;
+
+       if (!req->nbytes)
+               return 0;
+
+       if (req->nbytes > SM4_MAC_MAX_DATA - rctx->total_len) {
+               ret = -EINVAL;
+               goto err_free_chunks;
+       }
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+       chunk = kmalloc(sizeof(*chunk) + req->nbytes, gfp);
+       if (!chunk) {
+               ret = -ENOMEM;
+               goto err_free_chunks;
+       }
+
+       chunk->len = req->nbytes;
+       if (req->base.flags & CRYPTO_AHASH_REQ_VIRT)
+               memcpy(chunk->data, req->svirt, req->nbytes);
+       else
+               scatterwalk_map_and_copy(chunk->data, req->src,
+                                        0, req->nbytes, 0);
+       list_add_tail(&chunk->list, &rctx->chunks);
+       spin_lock_bh(&tctx->chunk_lock);
+       list_add_tail(&chunk->tfm_node, &tctx->all_chunks);
+       spin_unlock_bh(&tctx->chunk_lock);
+       rctx->total_len += req->nbytes;
+       return 0;
+
+err_free_chunks:
+       /*
+        * Terminal error -- free all previously accumulated chunks.
+        * Callers may skip .final() after an error, which would leak them.
+        */
+       cmh_sm4_mac_free_chunks(rctx, tctx);
+       return ret;
+}
+
+static void cmh_sm4_mac_complete(void *data, int error)
+{
+       struct ahash_request *req = data;
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_sm4_mac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_sm4_mac_reqctx *rctx = ahash_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       if (rctx->total_len > 0)
+               cmh_dma_unmap_single(rctx->in_dma, rctx->total_len,
+                                    DMA_TO_DEVICE);
+       cmh_dma_unmap_single(rctx->tag_dma, SM4_MAC_DIGEST_SIZE,
+                            DMA_FROM_DEVICE);
+
+       if (!error)
+               memcpy(req->result, rctx->tag_buf, SM4_MAC_DIGEST_SIZE);
+
+       kfree(rctx->tag_buf);
+       rctx->tag_buf = NULL;
+       cmh_sm4_mac_free_chunks(rctx, tctx);
+       kfree_sensitive(rctx->buf);
+       rctx->buf = NULL;
+       rctx->total_len = 0;
+       cmh_complete(&req->base, error);
+}
+
+static int cmh_sm4_mac_final(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_sm4_mac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_sm4_mac_reqctx *rctx = ahash_request_ctx(req);
+       struct vcq_cmd cmds[CMH_SM4_MAC_MAX_PAYLOAD];
+       u64 key_ref;
+       u32 keylen;
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       if (tctx->key.mode == CMH_KEY_NONE) {
+               ret = -ENOKEY;
+               goto out_free_chunks;
+       }
+
+       /*
+        * XCBC empty-input SW fallback (RFC 3566).
+        *
+        * For a zero-length message:
+        *   K1 = E(K, 0x01010101...)  -- encryption subkey
+        *   K3 = E(K, 0x03030303...)  -- incomplete-block subkey
+        *   pad = 0x80 00...00        -- single 1 bit + 127 zero bits
+        *   tag = E(K1, pad XOR K3)
+        *
+        * The eSW produces incorrect output for this case, so the driver
+        * computes it synchronously using crypto_cipher.
+        *
+        * For DS keys we cannot derive subkeys (no raw key material),
+        * and the HW also cannot handle empty XCBC correctly, so
+        * return -EOPNOTSUPP.
+        */
+       if (rctx->total_len == 0 && tctx->sm4_mode == SM4_MODE_XCBC) {
+               u8 block[CMH_SM4_BLOCK_SIZE];
+               u32 i;
+
+               if (tctx->key.mode != CMH_KEY_RAW ||
+                   !tctx->xcbc_subkeys_valid) {
+                       cmh_sm4_mac_free_chunks(rctx, tctx);
+                       return -EOPNOTSUPP;
+               }
+
+               /* block = pad XOR K3 */
+               memset(block, 0, CMH_SM4_BLOCK_SIZE);
+               block[0] = 0x80;
+               for (i = 0; i < CMH_SM4_BLOCK_SIZE; i++)
+                       block[i] ^= tctx->xcbc_k3[i];
+
+               /*
+                * tag = E(K1, block)
+                *
+                * sw_cipher is permanently keyed with K1 (set at setkey
+                * time), so this is safe for concurrent requests sharing
+                * the same tfm -- no re-keying, no race.
+                */
+               crypto_cipher_encrypt_one(tctx->sw_cipher, req->result,
+                                         block);
+
+               cmh_sm4_mac_free_chunks(rctx, tctx);
+               return 0;
+       }
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       /* Linearise chunks into a single contiguous buffer for DMA */
+       if (rctx->total_len > 0) {
+               struct cmh_sm4_mac_chunk *c;
+               u32 off = 0;
+
+               rctx->buf = kmalloc(rctx->total_len, gfp);
+               if (!rctx->buf) {
+                       ret = -ENOMEM;
+                       goto out_free_chunks;
+               }
+               list_for_each_entry(c, &rctx->chunks, list) {
+                       memcpy(rctx->buf + off, c->data, c->len);
+                       off += c->len;
+               }
+       }
+
+       rctx->tag_buf = kzalloc(SM4_MAC_DIGEST_SIZE, gfp);
+       if (!rctx->tag_buf) {
+               ret = -ENOMEM;
+               goto out_free_buf;
+       }
+
+       rctx->tag_dma = cmh_dma_map_single(rctx->tag_buf,
+                                          SM4_MAC_DIGEST_SIZE,
+                                          DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->tag_dma)) {
+               ret = -ENOMEM;
+               goto out_free_tag;
+       }
+
+       if (rctx->total_len > 0) {
+               rctx->in_dma = cmh_dma_map_single(rctx->buf, rctx->total_len,
+                                                 DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->in_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap_tag;
+               }
+       }
+
+       idx = 0;
+
+       rctx->key_dma = tctx->key.raw.dma;
+       rctx->keylen = tctx->key.raw.len;
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)rctx->key_dma, SYS_REF_NONE,
+                         tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+       d = cmh_core_select_instance(CMH_CORE_SM4);
+       target_mbx = d.mbx_idx;
+       core_id = d.core_id;
+
+       /*
+        * INIT: mode=CMAC or XCBC
+        * CMAC/XCBC data goes through the AAD path:
+        *   aadlen = total data length, iolen = 0
+        */
+       {
+               struct vcq_cmd *slot = &cmds[idx++];
+
+               memset(slot, 0, sizeof(*slot));
+               slot->magic = VCQ_CMD_MAGIC;
+               slot->id = VCQ_CMD_ID(core_id, 0, 1, SM4_CMD_INIT);
+               slot->hwc.sm4.cmd_init.key = key_ref;
+               slot->hwc.sm4.cmd_init.iv = 0;
+               slot->hwc.sm4.cmd_init.keylen = keylen;
+               slot->hwc.sm4.cmd_init.ivlen = 0;
+               slot->hwc.sm4.cmd_init.mode = tctx->sm4_mode;
+               slot->hwc.sm4.cmd_init.op = SM4_OP_ENCRYPT;
+               slot->hwc.sm4.cmd_init.aadlen = rctx->total_len;
+               slot->hwc.sm4.cmd_init.iolen = 0;
+       }
+
+       /* AAD_FINAL: send data through the AAD path */
+       if (rctx->total_len > 0) {
+               struct vcq_cmd *slot = &cmds[idx++];
+
+               memset(slot, 0, sizeof(*slot));
+               slot->magic = VCQ_CMD_MAGIC;
+               slot->id = VCQ_CMD_ID(core_id, 0, 1, SM4_CMD_AAD_FINAL);
+               slot->hwc.sm4.cmd_aad_final.data = (u64)rctx->in_dma;
+               slot->hwc.sm4.cmd_aad_final.datalen = rctx->total_len;
+       }
+
+       /* FINAL: tag extraction only (no data) */
+       {
+               struct vcq_cmd *slot = &cmds[idx++];
+
+               memset(slot, 0, sizeof(*slot));
+               slot->magic = VCQ_CMD_MAGIC;
+               slot->id = VCQ_CMD_ID(core_id, 0, 1, SM4_CMD_FINAL);
+               slot->hwc.sm4.cmd_final.input = 0;
+               slot->hwc.sm4.cmd_final.output = 0;
+               slot->hwc.sm4.cmd_final.tag = (u64)rctx->tag_dma;
+               slot->hwc.sm4.cmd_final.iolen = 0;
+               slot->hwc.sm4.cmd_final.taglen = SM4_MAC_DIGEST_SIZE;
+       }
+
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_SM4_MAC_MAX_PACKED,
+                                           target_mbx,
+                                           cmh_sm4_mac_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_cleanup_all;
+
+       return -EINPROGRESS;
+
+out_cleanup_all:
+       if (rctx->total_len > 0 && !cmh_dma_map_error(rctx->in_dma))
+               cmh_dma_unmap_single(rctx->in_dma, rctx->total_len,
+                                    DMA_TO_DEVICE);
+out_unmap_tag:
+       cmh_dma_unmap_single(rctx->tag_dma, SM4_MAC_DIGEST_SIZE,
+                            DMA_FROM_DEVICE);
+out_free_tag:
+       kfree(rctx->tag_buf);
+out_free_buf:
+       kfree_sensitive(rctx->buf);
+       rctx->buf = NULL;
+out_free_chunks:
+       cmh_sm4_mac_free_chunks(rctx, tctx);
+       rctx->total_len = 0;
+       return ret;
+}
+
+/*
+ * ahash .export()/.import(): serialize/deserialize the software
+ * accumulation buffer.  No HW state is involved.
+ */
+
+static int cmh_sm4_mac_export(struct ahash_request *req, void *out)
+{
+       struct cmh_sm4_mac_reqctx *rctx = ahash_request_ctx(req);
+       struct cmh_sm4_mac_export_state *state = out;
+       struct cmh_sm4_mac_chunk *chunk;
+       u32 offset = 0;
+
+       if (rctx->total_len > CMH_SM4_MAC_EXPORT_MAX)
+               return -ENOSPC;
+
+       state->total_len = rctx->total_len;
+       list_for_each_entry(chunk, &rctx->chunks, list) {
+               memcpy(state->data + offset, chunk->data, chunk->len);
+               offset += chunk->len;
+       }
+       return 0;
+}
+
+static int cmh_sm4_mac_import(struct ahash_request *req, const void *in)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_sm4_mac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_sm4_mac_reqctx *rctx = ahash_request_ctx(req);
+       const struct cmh_sm4_mac_export_state *state = in;
+       struct cmh_sm4_mac_chunk *chunk;
+
+       /*
+        * Do NOT call free_chunks() here: the crypto API does not
+        * guarantee the request context is in a valid state before
+        * import(), so the list pointers may be stale or invalid.
+        * Re-initialize from scratch instead.  Any pre-existing chunks
+        * are tracked on tctx->all_chunks and freed in exit_tfm.
+        */
+       memset(rctx, 0, sizeof(*rctx));
+       INIT_LIST_HEAD(&rctx->chunks);
+
+       if (state->total_len > CMH_SM4_MAC_EXPORT_MAX)
+               return -EINVAL;
+
+       if (state->total_len) {
+               chunk = kmalloc(sizeof(*chunk) + state->total_len, GFP_KERNEL);
+               if (!chunk)
+                       return -ENOMEM;
+               chunk->len = state->total_len;
+               memcpy(chunk->data, state->data, state->total_len);
+               list_add_tail(&chunk->list, &rctx->chunks);
+               spin_lock_bh(&tctx->chunk_lock);
+               list_add_tail(&chunk->tfm_node, &tctx->all_chunks);
+               spin_unlock_bh(&tctx->chunk_lock);
+               rctx->total_len = state->total_len;
+       }
+       return 0;
+}
+
+static int cmh_sm4_mac_finup(struct ahash_request *req)
+{
+       int err;
+
+       err = cmh_sm4_mac_update(req);
+       if (err)
+               return err;
+       return cmh_sm4_mac_final(req);
+}
+
+static int cmh_sm4_mac_digest(struct ahash_request *req)
+{
+       int err;
+
+       err = cmh_sm4_mac_init(req);
+       if (err)
+               return err;
+       return cmh_sm4_mac_finup(req);
+}
+
+/* Registration */
+
+static struct cmh_sm4_mac_drv sm4_mac_drv_algs[ARRAY_SIZE(sm4_mac_algs)];
+
+static int cmh_sm4_mac_init_tfm(struct crypto_ahash *tfm)
+{
+       struct cmh_sm4_mac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct ahash_alg *alg = crypto_ahash_alg(tfm);
+       struct cmh_sm4_mac_drv *drv =
+               container_of(alg, struct cmh_sm4_mac_drv, alg);
+
+       memset(tctx, 0, sizeof(*tctx));
+       tctx->sm4_mode = drv->info->sm4_mode;
+       spin_lock_init(&tctx->chunk_lock);
+       INIT_LIST_HEAD(&tctx->all_chunks);
+
+       /* Allocate SW cipher for XCBC empty-input fallback */
+       if (tctx->sm4_mode == SM4_MODE_XCBC) {
+               struct crypto_cipher *ci;
+
+               ci = crypto_alloc_cipher("sm4", 0, 0);
+               if (IS_ERR(ci))
+                       return PTR_ERR(ci);
+               tctx->sw_cipher = ci;
+       }
+
+       crypto_ahash_set_reqsize(tfm, sizeof(struct cmh_sm4_mac_reqctx));
+       return 0;
+}
+
+static void cmh_sm4_mac_exit_tfm(struct crypto_ahash *tfm)
+{
+       struct cmh_sm4_mac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_sm4_mac_chunk *c, *tmp;
+
+       /* Free chunks orphaned by import() over a poisoned request (testmgr) */
+       spin_lock_bh(&tctx->chunk_lock);
+       list_for_each_entry_safe(c, tmp, &tctx->all_chunks, tfm_node) {
+               list_del(&c->tfm_node);
+               kfree_sensitive(c);
+       }
+       spin_unlock_bh(&tctx->chunk_lock);
+
+       if (tctx->sw_cipher)
+               crypto_free_cipher(tctx->sw_cipher);
+       memzero_explicit(tctx->xcbc_k1, sizeof(tctx->xcbc_k1));
+       memzero_explicit(tctx->xcbc_k3, sizeof(tctx->xcbc_k3));
+       cmh_key_destroy(&tctx->key);
+}
+
+/**
+ * cmh_sm4_cmac_register() - Register SM4-CMAC/XCBC hash algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_sm4_cmac_register(void)
+{
+       unsigned int i;
+       int ret;
+
+       for (i = 0; i < ARRAY_SIZE(sm4_mac_algs); i++) {
+               const struct cmh_sm4_mac_alg_info *info = &sm4_mac_algs[i];
+               struct cmh_sm4_mac_drv *drv = &sm4_mac_drv_algs[i];
+               struct ahash_alg *alg = &drv->alg;
+
+               drv->info = info;
+
+               memset(alg, 0, sizeof(*alg));
+
+               alg->init       = cmh_sm4_mac_init;
+               alg->update     = cmh_sm4_mac_update;
+               alg->final      = cmh_sm4_mac_final;
+               alg->finup      = cmh_sm4_mac_finup;
+               alg->digest     = cmh_sm4_mac_digest;
+               alg->export     = cmh_sm4_mac_export;
+               alg->import     = cmh_sm4_mac_import;
+               alg->setkey     = cmh_sm4_mac_setkey;
+               alg->init_tfm   = cmh_sm4_mac_init_tfm;
+               alg->exit_tfm   = cmh_sm4_mac_exit_tfm;
+
+               alg->halg.digestsize = SM4_MAC_DIGEST_SIZE;
+               alg->halg.statesize = CMH_SM4_MAC_STATE_SIZE;
+
+               strscpy(alg->halg.base.cra_name, info->alg_name,
+                       CRYPTO_MAX_ALG_NAME);
+               strscpy(alg->halg.base.cra_driver_name, info->drv_name,
+                       CRYPTO_MAX_ALG_NAME);
+               alg->halg.base.cra_priority  = 300;
+               alg->halg.base.cra_flags     = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                               CRYPTO_ALG_NO_FALLBACK |
+                                               CRYPTO_ALG_ASYNC |
+                                               CRYPTO_ALG_REQ_VIRT;
+               alg->halg.base.cra_blocksize = SM4_MAC_BLOCK_SIZE;
+               alg->halg.base.cra_ctxsize  = sizeof(struct cmh_sm4_mac_tfm_ctx);
+               alg->halg.base.cra_module   = THIS_MODULE;
+
+               ret = crypto_register_ahash(alg);
+               if (ret) {
+                       dev_err(cmh_dev(), "cmh_sm4_mac: failed to register %s (rc=%d)\n",
+                               info->alg_name, ret);
+                       goto err_unregister;
+               }
+
+               dev_dbg(cmh_dev(), "cmh_sm4_mac: registered %s\n",
+                       info->alg_name);
+       }
+
+       return 0;
+
+err_unregister:
+       while (i--)
+               crypto_unregister_ahash(&sm4_mac_drv_algs[i].alg);
+       return ret;
+}
+
+/**
+ * cmh_sm4_cmac_unregister() - Unregister SM4-CMAC/XCBC hash algorithms from the crypto framework
+ */
+void cmh_sm4_cmac_unregister(void)
+{
+       unsigned int i;
+
+       for (i = 0; i < ARRAY_SIZE(sm4_mac_algs); i++) {
+               crypto_unregister_ahash(&sm4_mac_drv_algs[i].alg);
+               dev_dbg(cmh_dev(), "cmh_sm4_mac: unregistered %s\n",
+                       sm4_mac_algs[i].alg_name);
+       }
+}
diff --git a/drivers/crypto/cmh/cmh_sm4_skcipher.c b/drivers/crypto/cmh/cmh_sm4_skcipher.c
new file mode 100644
index 000000000000..8cd76cba9235
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_sm4_skcipher.c
@@ -0,0 +1,690 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API SM4 (skcipher) Driver
+ *
+ * Registers skcipher algorithms with the Linux crypto subsystem:
+ *   ecb(sm4), cbc(sm4), ctr(sm4), cfb(sm4), xts(sm4)
+ *
+ * Uses the CMH SM4 Core via VCQ commands:
+ *   [SYS_CMD_WRITE] + SM4_CMD_INIT + [SM4_CMD_UPDATE] + SM4_CMD_FINAL
+ *   + VCQ_CMD_FLUSH
+ * The SM4 core requires bidirectional DMA: each UPDATE/FINAL command
+ * carries both the mapped input buffer and the mapped output buffer.
+ *
+ * Raw-key atomicity: SYS_CMD_WRITE to SYS_REF_TEMP is packed into
+ * the same VCQ as SM4 commands (see cmh_key.h for details).
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/algapi.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/unaligned.h>
+
+#include "cmh_sm4.h"
+#include "cmh_vcq.h"
+#include "cmh_sm4_abi.h"
+#include "cmh_sys_abi.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+/* Algorithm Table */
+
+struct cmh_sm4_alg_info {
+       u32         sm4_mode;   /* SM4_MODE_* */
+       u32         ivsize;     /* bytes (0 for ECB) */
+       u32         min_keysize;
+       u32         max_keysize;
+       const char *alg_name;   /* Linux crypto name: "ecb(sm4)" */
+       const char *drv_name;   /* driver name: "cri-cmh-ecb-sm4" */
+};
+
+static const struct cmh_sm4_alg_info sm4_algs[] = {
+       { SM4_MODE_ECB, 0,               CMH_SM4_KEY_SIZE, CMH_SM4_KEY_SIZE,
+         "ecb(sm4)", "cri-cmh-ecb-sm4" },
+       { SM4_MODE_CBC, CMH_SM4_IV_SIZE, CMH_SM4_KEY_SIZE, CMH_SM4_KEY_SIZE,
+         "cbc(sm4)", "cri-cmh-cbc-sm4" },
+       { SM4_MODE_CTR, CMH_SM4_IV_SIZE, CMH_SM4_KEY_SIZE, CMH_SM4_KEY_SIZE,
+         "ctr(sm4)", "cri-cmh-ctr-sm4" },
+       { SM4_MODE_CFB, CMH_SM4_IV_SIZE, CMH_SM4_KEY_SIZE, CMH_SM4_KEY_SIZE,
+         "cfb(sm4)", "cri-cmh-cfb-sm4" },
+       { SM4_MODE_XTS, CMH_SM4_IV_SIZE, CMH_SM4_KEY_SIZE * 2,
+                                        CMH_SM4_KEY_SIZE * 2,
+         "xts(sm4)", "cri-cmh-xts-sm4" },
+};
+
+/* Per-transform context (allocated by crypto framework) */
+
+struct cmh_sm4_tfm_ctx {
+       struct cmh_key_ctx key;
+};
+
+/* Per-request context (lives in skcipher_request::__ctx) */
+
+/*
+ * Maximum payload commands:
+ *   [SYS_CMD_WRITE] + SM4_CMD_INIT + [SM4_CMD_UPDATE] + SM4_CMD_FINAL
+ *   + VCQ_CMD_FLUSH = 5
+ * UPDATE is used for XTS data > 2 blocks (see cmh_sm4_crypt).
+ */
+#define CMH_SM4_MAX_PAYLOAD    5
+#define CMH_SM4_MAX_PACKED     (CMH_SM4_MAX_PAYLOAD * 2)
+
+struct cmh_sm4_reqctx {
+       dma_addr_t in_dma;
+       dma_addr_t out_dma;
+       dma_addr_t iv_dma;
+       dma_addr_t iv2_dma;
+       dma_addr_t key_dma;
+       u8 *in_buf;
+       u8 *out_buf;
+       u8 *iv_buf;
+       u8 *iv2_buf;
+       u32 cryptlen;
+       u32 ivsize;
+       u32 keylen;
+       u32 sm4_mode;
+       u32 sm4_op;
+       /* CTR counter-wrap split state */
+       u32 ctr_chunk1_len;
+       u32 core_id;
+       s32 target_mbx;
+       u64 key_ref;
+       struct vcq_cmd packed[CMH_SM4_MAX_PACKED];
+};
+
+/* VCQ Builders -- SM4-specific */
+
+static void vcq_add_sm4_init(struct vcq_cmd *slot, u32 core_id, u64 key_ref, u64 iv_dma,
+                            u32 keylen, u32 ivlen, u32 mode, u32 op,
+                            u32 iolen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, SM4_CMD_INIT);
+       slot->hwc.sm4.cmd_init.key = key_ref;
+       slot->hwc.sm4.cmd_init.iv = iv_dma;
+       slot->hwc.sm4.cmd_init.keylen = keylen;
+       slot->hwc.sm4.cmd_init.ivlen = ivlen;
+       slot->hwc.sm4.cmd_init.mode = mode;
+       slot->hwc.sm4.cmd_init.op = op;
+       slot->hwc.sm4.cmd_init.aadlen = 0;
+       slot->hwc.sm4.cmd_init.iolen = iolen;
+}
+
+static void vcq_add_sm4_update(struct vcq_cmd *slot, u32 core_id, u64 input_dma,
+                              u64 output_dma, u32 iolen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, SM4_CMD_UPDATE);
+       slot->hwc.sm4.cmd_update.input = input_dma;
+       slot->hwc.sm4.cmd_update.output = output_dma;
+       slot->hwc.sm4.cmd_update.iolen = iolen;
+}
+
+static void vcq_add_sm4_final(struct vcq_cmd *slot, u32 core_id, u64 input_dma,
+                             u64 output_dma, u32 iolen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, SM4_CMD_FINAL);
+       slot->hwc.sm4.cmd_final.input = input_dma;
+       slot->hwc.sm4.cmd_final.output = output_dma;
+       slot->hwc.sm4.cmd_final.iolen = iolen;
+       slot->hwc.sm4.cmd_final.tag = 0;
+       slot->hwc.sm4.cmd_final.taglen = 0;
+}
+
+/*
+ * We wrap each skcipher_alg with its info pointer in a compound struct,
+ * then use container_of() in cmh_sm4_get_info() to recover it.
+ */
+struct cmh_sm4_alg_drv {
+       struct skcipher_alg              alg;
+       const struct cmh_sm4_alg_info   *info;
+};
+
+static bool sm4_is_stream_mode(u32 mode)
+{
+       return mode == SM4_MODE_CTR || mode == SM4_MODE_CFB;
+}
+
+/*
+ * Update req->iv after a successful encrypt/decrypt.
+ * Same semantics as cmh_aes_update_iv -- see cmh_aes.c.
+ */
+static void cmh_sm4_update_iv(struct skcipher_request *req, u32 mode,
+                             u32 op, const u8 *in_buf, const u8 *out_buf)
+{
+       u32 bs = CMH_SM4_BLOCK_SIZE;
+       u32 nblocks;
+
+       switch (mode) {
+       case SM4_MODE_CBC:
+               if (op == SM4_OP_ENCRYPT)
+                       memcpy(req->iv, out_buf + req->cryptlen - bs, bs);
+               else
+                       memcpy(req->iv, in_buf + req->cryptlen - bs, bs);
+               break;
+       case SM4_MODE_CTR:
+               /* Arithmetic big-endian 128-bit counter increment */
+               nblocks = DIV_ROUND_UP(req->cryptlen, bs);
+               {
+                       u8 *iv = req->iv;
+                       int i;
+
+                       for (i = bs - 1; i >= 0 && nblocks; i--) {
+                               u32 sum = (u32)iv[i] + (nblocks & 0xff);
+
+                               iv[i] = (u8)sum;
+                               nblocks = (nblocks >> 8) + (sum >> 8);
+                       }
+               }
+               break;
+       case SM4_MODE_CFB:
+               /*
+                * For sub-block requests (cryptlen < 16), there is no
+                * complete ciphertext block to chain, so the IV is left
+                * unchanged -- CFB-128 has no defined chaining semantic
+                * for partial blocks (shift-register CFB-n is a different
+                * mode).  Without this guard the pointer arithmetic
+                * underflows and reads before the buffer.
+                */
+               if (req->cryptlen >= bs) {
+                       if (op == SM4_OP_ENCRYPT)
+                               memcpy(req->iv, out_buf + req->cryptlen - bs,
+                                      bs);
+                       else
+                               memcpy(req->iv, in_buf + req->cryptlen - bs,
+                                      bs);
+               }
+               break;
+       default:
+               break;
+       }
+}
+
+/* skcipher Operations */
+
+static const struct cmh_sm4_alg_info *
+cmh_sm4_get_info(struct crypto_skcipher *tfm)
+{
+       struct skcipher_alg *alg = crypto_skcipher_alg(tfm);
+
+       return container_of(alg, struct cmh_sm4_alg_drv, alg)->info;
+}
+
+static int cmh_sm4_setkey(struct crypto_skcipher *tfm, const u8 *key,
+                         unsigned int keylen)
+{
+       struct cmh_sm4_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+       const struct cmh_sm4_alg_info *info = cmh_sm4_get_info(tfm);
+
+       if (info->sm4_mode == SM4_MODE_XTS) {
+               int err;
+
+               /* XTS: double key (32 bytes) */
+               if (keylen != CMH_SM4_KEY_SIZE * 2)
+                       return -EINVAL;
+               err = xts_verify_key(tfm, key, keylen);
+               if (err)
+                       return err;
+       } else {
+               /* SM4 always uses 128-bit (16-byte) keys */
+               if (keylen != CMH_SM4_KEY_SIZE)
+                       return -EINVAL;
+       }
+
+       return cmh_key_setkey_raw(&tctx->key, key, keylen, CORE_ID_SM4);
+}
+
+static int cmh_sm4_init_tfm(struct crypto_skcipher *tfm)
+{
+       struct cmh_sm4_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+
+       memset(tctx, 0, sizeof(*tctx));
+       crypto_skcipher_set_reqsize(tfm, sizeof(struct cmh_sm4_reqctx));
+       return 0;
+}
+
+static void cmh_sm4_exit_tfm(struct crypto_skcipher *tfm)
+{
+       struct cmh_sm4_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+
+       cmh_key_destroy(&tctx->key);
+}
+
+#define CMH_SM4_MAX_CRYPTLEN   SZ_32M
+
+/* DMA unmap helper */
+static void cmh_sm4_unmap_dma(struct cmh_sm4_reqctx *rctx)
+{
+       if (rctx->iv2_buf)
+               cmh_dma_unmap_single(rctx->iv2_dma, rctx->ivsize,
+                                    DMA_TO_DEVICE);
+       if (rctx->ivsize > 0)
+               cmh_dma_unmap_single(rctx->iv_dma, rctx->ivsize,
+                                    DMA_TO_DEVICE);
+       cmh_dma_unmap_single(rctx->out_dma, rctx->cryptlen, DMA_FROM_DEVICE);
+       cmh_dma_unmap_single(rctx->in_dma, rctx->cryptlen, DMA_TO_DEVICE);
+}
+
+static void cmh_sm4_free_bufs(struct cmh_sm4_reqctx *rctx)
+{
+       kfree(rctx->iv2_buf);
+       rctx->iv2_buf = NULL;
+       kfree(rctx->iv_buf);
+       rctx->iv_buf = NULL;
+       kfree_sensitive(rctx->out_buf);
+       rctx->out_buf = NULL;
+       kfree_sensitive(rctx->in_buf);
+       rctx->in_buf = NULL;
+}
+
+/*
+ * Submit the second CTR chunk after the first completes.
+ * Called from cmh_sm4_complete when ctr_chunk1_len > 0.
+ */
+static int cmh_sm4_ctr_submit_chunk2(struct skcipher_request *req);
+
+static void cmh_sm4_complete(void *data, int error)
+{
+       struct skcipher_request *req = data;
+       struct cmh_sm4_reqctx *rctx = skcipher_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       /*
+        * CTR counter-wrap: first chunk completed, submit second.
+        * DMA mappings remain valid (they cover the full buffer).
+        *
+        * Recursion depth bounded: chunk2 clears ctr_chunk1_len before
+        * submission, so the second cmh_sm4_complete invocation sees 0
+        * and finalizes (max depth = 2).
+        */
+       if (rctx->ctr_chunk1_len && !error) {
+               int ret = cmh_sm4_ctr_submit_chunk2(req);
+
+               if (!ret || ret == -EBUSY)
+                       return;
+               /* Submission failed; clean up below */
+               error = ret;
+       }
+
+       cmh_sm4_unmap_dma(rctx);
+
+       if (!error) {
+               scatterwalk_map_and_copy(rctx->out_buf, req->dst,
+                                        0, rctx->cryptlen, 1);
+               cmh_sm4_update_iv(req, rctx->sm4_mode, rctx->sm4_op,
+                                 rctx->in_buf, rctx->out_buf);
+       }
+
+       cmh_sm4_free_bufs(rctx);
+       cmh_complete(&req->base, error);
+}
+
+static int cmh_sm4_ctr_submit_chunk2(struct skcipher_request *req)
+{
+       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+       struct cmh_sm4_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+       struct cmh_sm4_reqctx *rctx = skcipher_request_ctx(req);
+       struct vcq_cmd cmds[CMH_SM4_MAX_PAYLOAD];
+       u32 chunk1 = rctx->ctr_chunk1_len;
+       u32 chunk2 = rctx->cryptlen - chunk1;
+       u64 key_ref;
+       u32 keylen;
+       u32 idx = 0;
+
+       /* Clear split flag so next completion is final */
+       rctx->ctr_chunk1_len = 0;
+
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)rctx->key_dma, SYS_REF_NONE,
+                         tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+
+       vcq_add_sm4_init(&cmds[idx++], rctx->core_id, key_ref,
+                        (u64)rctx->iv2_dma, keylen, rctx->ivsize,
+                        rctx->sm4_mode, rctx->sm4_op, chunk2);
+       vcq_add_sm4_final(&cmds[idx++], rctx->core_id,
+                         (u64)(rctx->in_dma + chunk1),
+                         (u64)(rctx->out_dma + chunk1), chunk2);
+       vcq_add_flush(&cmds[idx++], rctx->core_id);
+
+       return cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                            CMH_SM4_MAX_PACKED,
+                                            rctx->target_mbx,
+                                            cmh_sm4_complete, req,
+                                            !!(req->base.flags &
+                                               CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                            cmh_tm_async_timeout_jiffies());
+}
+
+static int cmh_sm4_crypt(struct skcipher_request *req, u32 sm4_op)
+{
+       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+       struct cmh_sm4_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+       const struct cmh_sm4_alg_info *info = cmh_sm4_get_info(tfm);
+       struct cmh_sm4_reqctx *rctx = skcipher_request_ctx(req);
+       struct vcq_cmd cmds[CMH_SM4_MAX_PAYLOAD];
+       u64 key_ref;
+       u32 keylen;
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       if (tctx->key.mode == CMH_KEY_NONE)
+               return -ENOKEY;
+
+       if (!req->cryptlen)
+               return 0;
+
+       if (req->cryptlen > CMH_SM4_MAX_CRYPTLEN)
+               return -EINVAL;
+
+       switch (info->sm4_mode) {
+       case SM4_MODE_CTR:
+       case SM4_MODE_CFB:
+               break;
+       case SM4_MODE_XTS:
+               if (req->cryptlen < CMH_SM4_BLOCK_SIZE)
+                       return -EINVAL;
+               break;
+       default:
+               if (req->cryptlen & (CMH_SM4_BLOCK_SIZE - 1))
+                       return -EINVAL;
+               break;
+       }
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->cryptlen = req->cryptlen;
+       rctx->ivsize = info->ivsize;
+       rctx->sm4_mode = info->sm4_mode;
+       rctx->sm4_op = sm4_op;
+       rctx->iv2_buf = NULL;
+
+       rctx->in_buf = kmalloc(req->cryptlen, gfp);
+       if (!rctx->in_buf)
+               return -ENOMEM;
+
+       scatterwalk_map_and_copy(rctx->in_buf, req->src, 0, req->cryptlen, 0);
+
+       rctx->in_dma = cmh_dma_map_single(rctx->in_buf, req->cryptlen,
+                                         DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->in_dma)) {
+               ret = -ENOMEM;
+               goto out_free_in;
+       }
+
+       rctx->out_buf = kmalloc(req->cryptlen, gfp);
+       if (!rctx->out_buf) {
+               ret = -ENOMEM;
+               goto out_unmap_in;
+       }
+
+       rctx->out_dma = cmh_dma_map_single(rctx->out_buf, req->cryptlen,
+                                          DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->out_dma)) {
+               ret = -ENOMEM;
+               goto out_free_out;
+       }
+
+       if (info->ivsize > 0) {
+               rctx->iv_buf = kmemdup(req->iv, info->ivsize, gfp);
+               if (!rctx->iv_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_out;
+               }
+               rctx->iv_dma = cmh_dma_map_single(rctx->iv_buf, info->ivsize,
+                                                 DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->iv_dma)) {
+                       ret = -ENOMEM;
+                       goto out_free_iv;
+               }
+       }
+
+       idx = 0;
+
+       rctx->key_dma = tctx->key.raw.dma;
+       rctx->keylen = tctx->key.raw.len;
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)rctx->key_dma, SYS_REF_NONE,
+                         tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+       d = cmh_core_select_instance(CMH_CORE_SM4);
+       target_mbx = d.mbx_idx;
+       core_id = d.core_id;
+
+       /*
+        * iolen in INIT: passed for all modes.  The EIP-40 eSW ignores
+        * it for CTR (stream cipher), but uses it for XTS/CBC/ECB to
+        * know the total data length.  Pass cryptlen unconditionally.
+        */
+       vcq_add_sm4_init(&cmds[idx++], core_id, key_ref, (u64)rctx->iv_dma,
+                        keylen, info->ivsize, info->sm4_mode, sm4_op,
+                        req->cryptlen);
+
+       if (info->sm4_mode == SM4_MODE_XTS &&
+           req->cryptlen > 2 * CMH_SM4_BLOCK_SIZE) {
+               u32 final_len, update_len;
+
+               if (req->cryptlen & (CMH_SM4_BLOCK_SIZE - 1))
+                       final_len = CMH_SM4_BLOCK_SIZE +
+                                   (req->cryptlen & (CMH_SM4_BLOCK_SIZE - 1));
+               else
+                       final_len = 2 * CMH_SM4_BLOCK_SIZE;
+
+               update_len = req->cryptlen - final_len;
+
+               vcq_add_sm4_update(&cmds[idx++], core_id,
+                                  (u64)rctx->in_dma,
+                                  (u64)rctx->out_dma, update_len);
+               vcq_add_sm4_final(&cmds[idx++], core_id,
+                                 (u64)(rctx->in_dma + update_len),
+                                 (u64)(rctx->out_dma + update_len),
+                                 final_len);
+       } else if (info->sm4_mode == SM4_MODE_CTR) {
+               /*
+                * CTR counter-wrap: split where the low 64 bits of the
+                * counter wrap, consistent with the AES-SCA driver.  The
+                * completion callback submits chunk2 with IV = {upper64+1, 0}.
+                */
+               u64 lower64 = get_unaligned_be64(rctx->iv_buf + 8);
+               u32 nblocks = DIV_ROUND_UP(req->cryptlen,
+                                         CMH_SM4_BLOCK_SIZE);
+               u64 bwrap = lower64 ? (~lower64 + 1ULL) : U64_MAX;
+
+               if (nblocks > bwrap) {
+                       u32 chunk1 = (u32)bwrap * CMH_SM4_BLOCK_SIZE;
+                       u64 upper64;
+
+                       /* Prepare second IV for chained submission */
+                       rctx->iv2_buf = kmalloc(info->ivsize, gfp);
+                       if (!rctx->iv2_buf) {
+                               ret = -ENOMEM;
+                               goto out_unmap_iv;
+                       }
+                       upper64 = get_unaligned_be64(rctx->iv_buf);
+                       put_unaligned_be64(upper64 + 1, rctx->iv2_buf);
+                       put_unaligned_be64(0, rctx->iv2_buf + 8);
+
+                       rctx->iv2_dma =
+                               cmh_dma_map_single(rctx->iv2_buf,
+                                                  info->ivsize,
+                                                  DMA_TO_DEVICE);
+                       if (cmh_dma_map_error(rctx->iv2_dma)) {
+                               ret = -ENOMEM;
+                               goto out_free_iv2;
+                       }
+
+                       /* Store state for the chained second submission */
+                       rctx->ctr_chunk1_len = chunk1;
+                       rctx->core_id = core_id;
+                       rctx->target_mbx = target_mbx;
+                       rctx->key_ref = key_ref;
+
+                       /* First transaction: only chunk1 */
+                       vcq_add_sm4_final(&cmds[idx++], core_id,
+                                         (u64)rctx->in_dma,
+                                         (u64)rctx->out_dma, chunk1);
+               } else {
+                       /* No wrap: single FINAL with all data */
+                       vcq_add_sm4_final(&cmds[idx++], core_id,
+                                         (u64)rctx->in_dma,
+                                         (u64)rctx->out_dma,
+                                         req->cryptlen);
+               }
+       } else {
+               vcq_add_sm4_final(&cmds[idx++], core_id,
+                                 (u64)rctx->in_dma,
+                                 (u64)rctx->out_dma, req->cryptlen);
+       }
+
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_SM4_MAX_PACKED, target_mbx,
+                                           cmh_sm4_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_cleanup_all;
+
+       return -EINPROGRESS;
+
+out_cleanup_all:
+       if (rctx->iv2_buf) {
+               cmh_dma_unmap_single(rctx->iv2_dma, info->ivsize,
+                                    DMA_TO_DEVICE);
+       }
+out_free_iv2:
+       kfree(rctx->iv2_buf);
+out_unmap_iv:
+       if (info->ivsize > 0)
+               cmh_dma_unmap_single(rctx->iv_dma, info->ivsize,
+                                    DMA_TO_DEVICE);
+out_free_iv:
+       kfree(rctx->iv_buf);
+out_unmap_out:
+       cmh_dma_unmap_single(rctx->out_dma, req->cryptlen, DMA_FROM_DEVICE);
+out_free_out:
+       kfree_sensitive(rctx->out_buf);
+out_unmap_in:
+       cmh_dma_unmap_single(rctx->in_dma, req->cryptlen, DMA_TO_DEVICE);
+out_free_in:
+       kfree_sensitive(rctx->in_buf);
+       return ret;
+}
+
+static int cmh_sm4_encrypt(struct skcipher_request *req)
+{
+       return cmh_sm4_crypt(req, SM4_OP_ENCRYPT);
+}
+
+static int cmh_sm4_decrypt(struct skcipher_request *req)
+{
+       return cmh_sm4_crypt(req, SM4_OP_DECRYPT);
+}
+
+/* Registration */
+
+static struct cmh_sm4_alg_drv sm4_drv_algs[ARRAY_SIZE(sm4_algs)];
+
+/**
+ * cmh_sm4_register() - Register SM4 ECB/CBC/CTR/CFB/XTS skcipher algorithms
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_sm4_register(void)
+{
+       unsigned int i;
+       int ret;
+
+       for (i = 0; i < ARRAY_SIZE(sm4_algs); i++) {
+               const struct cmh_sm4_alg_info *info = &sm4_algs[i];
+               struct cmh_sm4_alg_drv *drv = &sm4_drv_algs[i];
+               struct skcipher_alg *alg = &drv->alg;
+
+               drv->info = info;
+
+               memset(alg, 0, sizeof(*alg));
+
+               alg->setkey      = cmh_sm4_setkey;
+               alg->encrypt     = cmh_sm4_encrypt;
+               alg->decrypt     = cmh_sm4_decrypt;
+               alg->init        = cmh_sm4_init_tfm;
+               alg->exit        = cmh_sm4_exit_tfm;
+               alg->min_keysize = info->min_keysize;
+               alg->max_keysize = info->max_keysize;
+               alg->ivsize      = info->ivsize;
+
+               strscpy(alg->base.cra_name, info->alg_name,
+                       CRYPTO_MAX_ALG_NAME);
+               strscpy(alg->base.cra_driver_name, info->drv_name,
+                       CRYPTO_MAX_ALG_NAME);
+               alg->base.cra_priority  = 300;
+               alg->base.cra_flags     = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                         CRYPTO_ALG_ASYNC;
+               alg->base.cra_blocksize = sm4_is_stream_mode(info->sm4_mode)
+                                         ? 1 : CMH_SM4_BLOCK_SIZE;
+               alg->base.cra_ctxsize  = sizeof(struct cmh_sm4_tfm_ctx);
+               alg->base.cra_module   = THIS_MODULE;
+
+               ret = crypto_register_skcipher(alg);
+               if (ret) {
+                       dev_err(cmh_dev(), "cmh_sm4: failed to register %s (rc=%d)\n",
+                               info->alg_name, ret);
+                       goto err_unregister;
+               }
+
+               dev_dbg(cmh_dev(), "cmh_sm4: registered %s\n", info->alg_name);
+       }
+
+       return 0;
+
+err_unregister:
+       while (i--)
+               crypto_unregister_skcipher(&sm4_drv_algs[i].alg);
+       return ret;
+}
+
+/**
+ * cmh_sm4_unregister() - Unregister SM4 skcipher algorithms from the crypto framework
+ */
+void cmh_sm4_unregister(void)
+{
+       unsigned int i;
+
+       for (i = 0; i < ARRAY_SIZE(sm4_algs); i++) {
+               crypto_unregister_skcipher(&sm4_drv_algs[i].alg);
+               dev_dbg(cmh_dev(), "cmh_sm4: unregistered %s\n", sm4_algs[i].alg_name);
+       }
+}
diff --git a/drivers/crypto/cmh/include/cmh_sm4.h b/drivers/crypto/cmh/include/cmh_sm4.h
new file mode 100644
index 000000000000..9f4b0fb918db
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_sm4.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- SM4 Crypto API Drivers
+ *
+ * Registers SM4 algorithms with the Linux crypto subsystem:
+ *   skcipher: ecb/cbc/ctr/cfb/xts(sm4)
+ *   aead:     gcm/ccm(sm4)
+ *   shash:    cmac/xcbc(sm4)
+ */
+
+#ifndef CMH_SM4_H
+#define CMH_SM4_H
+
+int  cmh_sm4_register(void);
+void cmh_sm4_unregister(void);
+
+int  cmh_sm4_aead_register(void);
+void cmh_sm4_aead_unregister(void);
+
+int  cmh_sm4_cmac_register(void);
+void cmh_sm4_cmac_unregister(void);
+
+#endif /* CMH_SM4_H */
--
2.43.7




* [PATCH 06/19] crypto: cmh - add CSHAKE/KMAC ahash
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register ahash algorithms for cSHAKE128, cSHAKE256, KMAC128, and
KMAC256 using the CMH hash core.  cSHAKE supports incremental
update and export/import.  KMAC has a 64KB data cap imposed by the
hardware.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
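Reader's note (below the cut line, so not part of the commit): the
cSHAKE .setkey() blob described in cmh_cshake.c is cshake_cfg
{ __be32 n_len; __be32 s_len; } followed by N and then S. A minimal
userspace sketch of a caller assembling that blob might look like the
following. The helper names put_be32() and cshake_build_blob() are
hypothetical and do not appear in the patch; only the layout is taken
from it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Store a 32-bit value big-endian, matching the __be32 fields of cshake_cfg. */
static void put_be32(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v >> 24);
	dst[1] = (uint8_t)(v >> 16);
	dst[2] = (uint8_t)(v >> 8);
	dst[3] = (uint8_t)v;
}

/*
 * Hypothetical helper (not part of the patch): build
 * cshake_cfg { n_len, s_len } || N || S. Returns a malloc()ed buffer
 * that the caller must free(), or NULL on allocation failure.
 */
static uint8_t *cshake_build_blob(const void *n, uint32_t n_len,
				  const void *s, uint32_t s_len,
				  size_t *blob_len)
{
	const size_t hdr = 8;	/* two __be32 length fields */
	size_t len = hdr + (size_t)n_len + (size_t)s_len;
	uint8_t *blob;

	assert(blob_len != NULL);

	blob = malloc(len);
	if (!blob)
		return NULL;

	put_be32(blob, n_len);
	put_be32(blob + 4, s_len);
	if (n_len)
		memcpy(blob + hdr, n, n_len);
	if (s_len)
		memcpy(blob + hdr + n_len, s, s_len);

	*blob_len = len;
	return blob;
}
```

An in-kernel user would presumably hand the resulting buffer to
crypto_ahash_setkey(). Per cmh_cshake_setkey(), the driver returns
-EINVAL unless keylen equals the header size plus n_len plus s_len, and
also when either length exceeds HC_CSHAKE_MAX_NAMELEN or
HC_CSHAKE_MAX_CUSTOMLEN (both defined in headers not shown here).
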
 drivers/crypto/cmh/Makefile             |   4 +-
 drivers/crypto/cmh/cmh_cshake.c         | 808 ++++++++++++++++++++++++
 drivers/crypto/cmh/cmh_kmac.c           | 630 ++++++++++++++++++
 drivers/crypto/cmh/cmh_main.c           |  18 +
 drivers/crypto/cmh/include/cmh_cshake.h |  16 +
 drivers/crypto/cmh/include/cmh_kmac.h   |  16 +
 6 files changed, 1491 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_cshake.c
 create mode 100644 drivers/crypto/cmh/cmh_kmac.c
 create mode 100644 drivers/crypto/cmh/include/cmh_cshake.h
 create mode 100644 drivers/crypto/cmh/include/cmh_kmac.h
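
Reader's note: the CMH_CSHAKE_MAX_PAYLOAD comment in cmh_cshake.c budgets
one slot per 64 bytes of inline S. The arithmetic can be checked with a
small sketch like the one below. The 64-byte figure is taken from that
comment, and the helper names are hypothetical, not driver API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the patch comment: one inline slot carries 64 bytes of S. */
#define INLINE_SLOT_BYTES 64u

/* Hypothetical helper: ceil(s_len / 64) without risking u32 overflow. */
static uint32_t cshake_inline_slots(uint32_t s_len)
{
	return s_len / INLINE_SLOT_BYTES + (s_len % INLINE_SLOT_BYTES != 0);
}

/* CSHAKE + inline S + GATHER + FINAL + FLUSH, as listed in the comment. */
static uint32_t cshake_final_slots(uint32_t s_len)
{
	return 1 + cshake_inline_slots(s_len) + 1 + 1 + 1;
}
```

For the 168-byte S mentioned in the comment this gives 7 slots, which
fits the 8-entry cmds[] array. By the same arithmetic the sequence stays
within 8 slots only while S is at most 256 bytes, so this holds only if
HC_CSHAKE_MAX_CUSTOMLEN (not shown in this patch) is no larger than that.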

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index 1f760c0214ef..2bb240b97f31 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -16,7 +16,9 @@ cmh-y := \
        cmh_key.o \
        cmh_sys.o \
        cmh_hash.o \
-       cmh_hmac.o
+       cmh_hmac.o \
+       cmh_cshake.o \
+       cmh_kmac.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_cshake.c b/drivers/crypto/cmh/cmh_cshake.c
new file mode 100644
index 000000000000..02f9b853dd33
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_cshake.c
@@ -0,0 +1,808 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API CSHAKE Driver
+ *
+ * Registers cSHAKE-128 and cSHAKE-256 as ahash algorithms using the
+ * CMH Hash Core (HC) via HC_CMD_CSHAKE.
+ *
+ * cSHAKE (NIST SP 800-185) extends SHAKE with two domain separation
+ * parameters: function name N and customization string S.  When both
+ * are empty, cSHAKE reduces to plain SHAKE -- the driver falls back to
+ * HC_CMD_INIT in that case (per SP 800-185 S6.2).
+ *
+ * N and S are set via .setkey() using a self-describing binary header
+ * (matching the upstream authenc precedent):
+ *
+ *   struct cshake_cfg { __be32 n_len; __be32 s_len; };
+ *   setkey blob: cshake_cfg || N[n_len] || S[s_len]
+ *
+ * If .setkey() is never called, the driver defaults to plain SHAKE
+ * (N="" S="").  .setkey() is per-tfm, not per-request.
+ *
+ * N is embedded inline in the HC_CMD_CSHAKE struct (max 36 bytes).
+ * S is passed as VCQ inline data following the command slot (multi-span).
+ *
+ * Uses the same self-contained transaction model as cmh_hash.c:
+ *   .init()   -> software-only
+ *   .update() -> software-only (accumulate chunks)
+ *   .final()  -> {CSHAKE [+ inline S] | INIT [+ RESTORE]} [+ GATHER] + FINAL + FLUSH
+ *   .export() -> {CSHAKE [+ inline S] | INIT [+ RESTORE]} [+ GATHER] + SAVE + FLUSH
+ *   .import() -> restore HC context checkpoint (software-only)
+ *
+ * The HC core supports HC_CMD_SAVE / HC_CMD_RESTORE for cSHAKE mode.
+ * The cSHAKE domain-separation prefix (function name N, customization
+ * string S) is absorbed into the Keccak sponge state by HC_CMD_CSHAKE
+ * on the first submission, and preserved through save/restore.
+ * Export/import enables crypto API transform cloning.
+ *
+ * .setkey() here configures public domain-separation parameters (N, S),
+ * not a secret key.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/hash.h>
+#include <linux/scatterlist.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <asm/byteorder.h>
+
+#include "cmh_cshake.h"
+#include "cmh_vcq.h"
+#include "cmh_hc_abi.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+
+/* Algorithm Table */
+
+struct cmh_cshake_alg_info {
+       u32         hc_algo;
+       u32         digest_size;
+       const char *alg_name;
+       const char *drv_name;
+};
+
+static const struct cmh_cshake_alg_info cmh_cshake_algs_info[] = {
+       {
+               .hc_algo     = HC_ALGO_SHAKE128,
+               .digest_size = CMH_SHAKE128_DIGEST_SIZE,
+               .alg_name    = "cshake128",
+               .drv_name    = "cri-cmh-cshake128",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHAKE256,
+               .digest_size = CMH_SHAKE256_DIGEST_SIZE,
+               .alg_name    = "cshake256",
+               .drv_name    = "cri-cmh-cshake256",
+       },
+};
+
+#define CMH_CSHAKE_ALG_COUNT  ARRAY_SIZE(cmh_cshake_algs_info)
+
+/* Per-Request State */
+
+struct cmh_cshake_chunk {
+       struct list_head  list;
+       struct list_head  tfm_node; /* per-tfm orphan tracking */
+       u32               len;
+       u8                data[];
+};
+
+/*
+ * Max payload slots for CSHAKE:
+ *   CSHAKE (1) + inline S (ceil(S_len/64)) + GATHER (1) + FINAL (1) + FLUSH (1)
+ * S can be up to SHAKE-128 block (168 bytes) = 3 inline slots.
+ * Conservative: 1 + 3 + 1 + 1 + 1 = 7, plus headers.
+ *
+ * Or INIT + GATHER + FINAL + FLUSH = 4 (plain SHAKE fallback).
+ */
+#define CMH_CSHAKE_MAX_PAYLOAD   8
+#define CMH_CSHAKE_MAX_PACKED    (CMH_CSHAKE_MAX_PAYLOAD * 2)
+
+/*
+ * Checkpoint embedded inline: the kernel ahash API has no per-request
+ * destructor, so a heap-allocated checkpoint leaks if a request is
+ * abandoned without .final().
+ */
+struct cmh_cshake_reqctx {
+       const struct cmh_cshake_alg_info *info;
+       int                               error;
+       struct list_head                  chunks;
+       u32                               num_chunks;
+       u32                               total_len;
+       u32                               has_checkpoint;
+       u8                                checkpoint[HC_CONTEXT_SIZE];
+       /* DMA state for async final */
+       dma_addr_t                        digest_dma;
+       dma_addr_t                        ckpt_dma;
+       u8                               *digest_buf;
+       struct cmh_sg_map                *sgm;
+       struct vcq_cmd packed[CMH_CSHAKE_MAX_PACKED];
+};
+
+/* Per-Transform State (carries N and S across requests) */
+
+struct cmh_cshake_tfm_ctx {
+       u8  *func_name;     /* N (function name), NULL if empty */
+       u32  func_name_len;
+       u8  *custom;        /* S (customization string), NULL if empty */
+       u32  custom_len;
+       spinlock_t         chunk_lock;  /* protects all_chunks */
+       struct list_head   all_chunks;  /* orphan-safe chunk tracking */
+};
+
+/* VCQ Builders */
+
+/* VCQ Builders (cSHAKE-specific; shared builders in cmh_hc_abi.h / cmh_vcq.h) */
+
+static void vcq_add_hc_save(struct vcq_cmd *slot, u32 core_id,
+                           u64 output_phys, u32 outlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HC_CMD_SAVE);
+       slot->hwc.hc.cmd_save.output = output_phys;
+       slot->hwc.hc.cmd_save.outlen = outlen;
+}
+
+static void vcq_add_hc_restore(struct vcq_cmd *slot, u32 core_id,
+                              u64 input_phys, u32 inlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HC_CMD_RESTORE);
+       slot->hwc.hc.cmd_restore.input = input_phys;
+       slot->hwc.hc.cmd_restore.inlen = inlen;
+}
+
+static void vcq_add_hc_cshake(struct vcq_cmd *slot, u32 core_id, u32 algo,
+                             const u8 *name, u32 namelen,
+                             u32 customlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HC_CMD_CSHAKE);
+       slot->hwc.hc.cmd_cshake.custom = 0;  /* inline -- CMH eSW reads from next slot(s) */
+       slot->hwc.hc.cmd_cshake.customlen = customlen;
+       slot->hwc.hc.cmd_cshake.algo = algo;
+       slot->hwc.hc.cmd_cshake.namelen = namelen;
+       if (namelen > 0 && name)
+               memcpy(slot->hwc.hc.cmd_cshake.name, name,
+                      min_t(u32, namelen, HC_CSHAKE_MAX_NAMELEN));
+}
+
+/* Request Context Cleanup */
+
+static void cmh_cshake_free_chunks(struct cmh_cshake_reqctx *rctx,
+                                  struct cmh_cshake_tfm_ctx *tctx)
+{
+       struct cmh_cshake_chunk *chunk, *tmp;
+
+       spin_lock_bh(&tctx->chunk_lock);
+       list_for_each_entry_safe(chunk, tmp, &rctx->chunks, list) {
+               list_del(&chunk->list);
+               list_del(&chunk->tfm_node);
+               kfree(chunk);
+       }
+       spin_unlock_bh(&tctx->chunk_lock);
+       rctx->num_chunks = 0;
+       rctx->total_len = 0;
+}
+
+static void cmh_cshake_free_reqctx(struct cmh_cshake_reqctx *rctx,
+                                  struct cmh_cshake_tfm_ctx *tctx)
+{
+       cmh_cshake_free_chunks(rctx, tctx);
+       rctx->has_checkpoint = 0;
+}
+
+static struct cmh_sg_map *
+cmh_cshake_build_sg(struct cmh_cshake_reqctx *rctx, gfp_t gfp)
+{
+       struct cmh_dma_buf *bufs;
+       struct cmh_cshake_chunk *chunk;
+       struct cmh_sg_map *sgm;
+       u32 i;
+
+       bufs = kcalloc(rctx->num_chunks, sizeof(*bufs), gfp);
+       if (!bufs)
+               return NULL;
+
+       i = 0;
+       list_for_each_entry(chunk, &rctx->chunks, list) {
+               bufs[i].data = chunk->data;
+               bufs[i].len = chunk->len;
+               i++;
+       }
+
+       sgm = cmh_dma_build_sg(bufs, rctx->num_chunks, gfp);
+       kfree(bufs);
+       return sgm;
+}
+
+/* VCQ Packing + Submit */
+
+/* ahash Operations */
+
+struct cmh_cshake_alg_drv {
+       struct ahash_alg                   alg;
+       const struct cmh_cshake_alg_info  *info;
+};
+
+static const struct cmh_cshake_alg_info *
+cmh_cshake_get_info(struct crypto_ahash *tfm)
+{
+       struct ahash_alg *alg = crypto_ahash_alg(tfm);
+
+       return container_of(alg, struct cmh_cshake_alg_drv, alg)->info;
+}
+
+/*
+ * .setkey() -- parse N and S from the self-describing cshake_cfg header.
+ *
+ * Blob format: cshake_cfg { __be32 n_len; __be32 s_len; } || N || S
+ * If never called, the driver defaults to plain SHAKE (N="" S="").
+ */
+struct cshake_cfg {
+       __be32 n_len;
+       __be32 s_len;
+};
+
+static int cmh_cshake_setkey(struct crypto_ahash *tfm, const u8 *key,
+                            unsigned int keylen)
+{
+       struct cmh_cshake_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cshake_cfg cfg;
+       u32 n_len, s_len;
+       const u8 *ptr;
+
+       if (keylen < sizeof(cfg))
+               return -EINVAL;
+
+       memcpy(&cfg, key, sizeof(cfg));
+       n_len = be32_to_cpu(cfg.n_len);
+       s_len = be32_to_cpu(cfg.s_len);
+
+       if (keylen != sizeof(cfg) + n_len + s_len)
+               return -EINVAL;
+
+       if (n_len > HC_CSHAKE_MAX_NAMELEN)
+               return -EINVAL;
+
+       if (s_len > HC_CSHAKE_MAX_CUSTOMLEN)
+               return -EINVAL;
+
+       /* Free previous N and S */
+       kfree(tctx->func_name);
+       kfree(tctx->custom);
+       tctx->func_name = NULL;
+       tctx->func_name_len = 0;
+       tctx->custom = NULL;
+       tctx->custom_len = 0;
+
+       ptr = key + sizeof(cfg);
+
+       if (n_len > 0) {
+               tctx->func_name = kmemdup(ptr, n_len, GFP_KERNEL);
+               if (!tctx->func_name)
+                       return -ENOMEM;
+               tctx->func_name_len = n_len;
+               ptr += n_len;
+       }
+
+       if (s_len > 0) {
+               tctx->custom = kmemdup(ptr, s_len, GFP_KERNEL);
+               if (!tctx->custom) {
+                       kfree(tctx->func_name);
+                       tctx->func_name = NULL;
+                       tctx->func_name_len = 0;
+                       return -ENOMEM;
+               }
+               tctx->custom_len = s_len;
+       }
+
+       return 0;
+}
+
+static int cmh_cshake_init(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_cshake_reqctx *rctx = ahash_request_ctx(req);
+
+       rctx->info = cmh_cshake_get_info(tfm);
+       rctx->error = 0;
+       INIT_LIST_HEAD(&rctx->chunks);
+       rctx->num_chunks = 0;
+       rctx->total_len = 0;
+       rctx->has_checkpoint = 0;
+
+       return 0;
+}
+
+static int cmh_cshake_update(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_cshake_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_cshake_reqctx *rctx = ahash_request_ctx(req);
+       struct cmh_cshake_chunk *chunk;
+       int nents;
+
+       if (rctx->error)
+               return rctx->error;
+
+       if (!req->nbytes)
+               return 0;
+
+       chunk = kmalloc(sizeof(*chunk) + req->nbytes,
+                       req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+                       GFP_KERNEL : GFP_ATOMIC);
+       if (!chunk) {
+               rctx->error = -ENOMEM;
+               goto err_free_chunks;
+       }
+
+       chunk->len = req->nbytes;
+       if (req->base.flags & CRYPTO_AHASH_REQ_VIRT) {
+               memcpy(chunk->data, req->svirt, req->nbytes);
+       } else {
+               nents = sg_nents_for_len(req->src, req->nbytes);
+               if (nents < 0 ||
+                   sg_copy_to_buffer(req->src, nents,
+                                     chunk->data, req->nbytes) != req->nbytes) {
+                       kfree(chunk);
+                       rctx->error = -EINVAL;
+                       goto err_free_chunks;
+               }
+       }
+
+       list_add_tail(&chunk->list, &rctx->chunks);
+       spin_lock_bh(&tctx->chunk_lock);
+       list_add_tail(&chunk->tfm_node, &tctx->all_chunks);
+       spin_unlock_bh(&tctx->chunk_lock);
+       rctx->num_chunks++;
+       rctx->total_len += req->nbytes;
+
+       return 0;
+
+err_free_chunks:
+       /*
+        * Terminal error -- free all previously accumulated chunks.
+        * The crypto API hash path does not call .final() on error,
+        * so chunks would be orphaned otherwise.
+        */
+       cmh_cshake_free_chunks(rctx, tctx);
+       return rctx->error;
+}
+
+static void cmh_cshake_complete(void *data, int error)
+{
+       struct ahash_request *req = data;
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_cshake_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_cshake_reqctx *rctx = ahash_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       if (rctx->has_checkpoint)
+               cmh_dma_unmap_single(rctx->ckpt_dma, HC_CONTEXT_SIZE,
+                                    DMA_TO_DEVICE);
+       cmh_dma_unmap_single(rctx->digest_dma, rctx->info->digest_size,
+                            DMA_FROM_DEVICE);
+
+       if (!error)
+               memcpy(req->result, rctx->digest_buf,
+                      rctx->info->digest_size);
+
+       kfree(rctx->digest_buf);
+       rctx->digest_buf = NULL;
+       cmh_dma_free_sg(rctx->sgm);
+       rctx->sgm = NULL;
+       cmh_cshake_free_reqctx(rctx, tctx);
+       cmh_complete(&req->base, error);
+}
+
+static int cmh_cshake_final(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_cshake_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_cshake_reqctx *rctx = ahash_request_ctx(req);
+       const struct cmh_cshake_alg_info *info = rctx->info;
+       struct core_dispatch d;
+       struct vcq_cmd cmds[CMH_CSHAKE_MAX_PAYLOAD];
+       struct cmh_sg_map *sgm = NULL;
+       dma_addr_t digest_dma = DMA_MAPPING_ERROR;
+       dma_addr_t ckpt_dma = DMA_MAPPING_ERROR;
+       u8 *digest_buf;
+       u32 idx;
+       int ret;
+       gfp_t gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+                    GFP_KERNEL : GFP_ATOMIC;
+
+       if (rctx->error) {
+               ret = rctx->error;
+               goto out_free;
+       }
+
+       if (rctx->num_chunks > 0) {
+               sgm = cmh_cshake_build_sg(rctx, gfp);
+               if (!sgm) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+       }
+
+       digest_buf = kzalloc(info->digest_size, gfp);
+       if (!digest_buf) {
+               ret = -ENOMEM;
+               goto out_free_sg;
+       }
+       digest_dma = cmh_dma_map_single(digest_buf, info->digest_size,
+                                       DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(digest_dma)) {
+               ret = -ENOMEM;
+               goto out_free_digest;
+       }
+
+       /* Map checkpoint buffer if present (CMH eSW reads it) */
+       if (rctx->has_checkpoint) {
+               ckpt_dma = cmh_dma_map_single(rctx->checkpoint,
+                                             HC_CONTEXT_SIZE, DMA_TO_DEVICE);
+               if (cmh_dma_map_error(ckpt_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap_digest;
+               }
+       }
+
+       d = cmh_core_select_instance(CMH_CORE_HC);
+       idx = 0;
+
+       if (rctx->has_checkpoint) {
+               /*
+                * Resuming from a saved checkpoint (after export/import):
+                * INIT + RESTORE [+ GATHER] + FINAL + FLUSH
+                * The cSHAKE prefix (N,S) is already absorbed in the
+                * saved Keccak state -- no need to replay HC_CMD_CSHAKE.
+                */
+               vcq_add_hc_init(&cmds[idx++], d.core_id, info->hc_algo);
+               vcq_add_hc_restore(&cmds[idx++], d.core_id, (u64)ckpt_dma,
+                                  HC_CONTEXT_SIZE);
+       } else {
+               bool use_cshake = (tctx->func_name_len > 0 ||
+                                  tctx->custom_len > 0);
+
+               if (use_cshake) {
+                       u32 span;
+
+                       vcq_add_hc_cshake(&cmds[idx], d.core_id,
+                                         info->hc_algo,
+                                         tctx->func_name,
+                                         tctx->func_name_len,
+                                         tctx->custom_len);
+                       span = vcq_add_inline_data(&cmds[idx],
+                                                  tctx->custom,
+                                                  tctx->custom_len);
+                       idx += span;
+               } else {
+                       vcq_add_hc_init(&cmds[idx++], d.core_id,
+                                       info->hc_algo);
+               }
+       }
+
+       if (sgm)
+               vcq_add_hc_gather(&cmds[idx++], d.core_id, (u64)sgm->items_dma,
+                                 HC_CMD_UPDATE);
+
+       vcq_add_hc_final(&cmds[idx++], d.core_id, (u64)digest_dma, info->digest_size);
+       vcq_add_flush(&cmds[idx++], d.core_id);
+
+       rctx->digest_buf = digest_buf;
+       rctx->digest_dma = digest_dma;
+       rctx->ckpt_dma = ckpt_dma;
+       rctx->sgm = sgm;
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_CSHAKE_MAX_PACKED,
+                                           d.mbx_idx,
+                                           cmh_cshake_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_cleanup_all;
+
+       return -EINPROGRESS;
+
+out_cleanup_all:
+       if (rctx->has_checkpoint)
+               cmh_dma_unmap_single(ckpt_dma, HC_CONTEXT_SIZE,
+                                    DMA_TO_DEVICE);
+out_unmap_digest:
+       cmh_dma_unmap_single(digest_dma, info->digest_size,
+                            DMA_FROM_DEVICE);
+out_free_digest:
+       kfree(digest_buf);
+
+out_free_sg:
+       cmh_dma_free_sg(sgm);
+
+out_free:
+       cmh_cshake_free_reqctx(rctx, tctx);
+       return ret;
+}
+
+static int cmh_cshake_finup(struct ahash_request *req)
+{
+       int ret;
+
+       ret = cmh_cshake_update(req);
+       if (ret)
+               return ret;
+
+       return cmh_cshake_final(req);
+}
+
+static int cmh_cshake_digest(struct ahash_request *req)
+{
+       int ret;
+
+       ret = cmh_cshake_init(req);
+       if (ret)
+               return ret;
+
+       return cmh_cshake_finup(req);
+}
+
+static int cmh_cshake_export(struct ahash_request *req, void *out)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_cshake_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_cshake_reqctx *rctx = ahash_request_ctx(req);
+       const struct cmh_cshake_alg_info *info = rctx->info;
+       struct core_dispatch d;
+       struct vcq_cmd cmds[CMH_CSHAKE_MAX_PAYLOAD];
+       struct cmh_sg_map *sgm = NULL;
+       dma_addr_t save_dma = DMA_MAPPING_ERROR;
+       dma_addr_t ckpt_dma = DMA_MAPPING_ERROR;
+       u8 *save_buf;
+       u32 idx;
+       int ret;
+
+       if (rctx->num_chunks > 0) {
+               sgm = cmh_cshake_build_sg(rctx, GFP_KERNEL);
+               if (!sgm)
+                       return -ENOMEM;
+       }
+
+       save_buf = kzalloc(HC_CONTEXT_SIZE, GFP_KERNEL);
+       if (!save_buf) {
+               cmh_dma_free_sg(sgm);
+               return -ENOMEM;
+       }
+       save_dma = cmh_dma_map_single(save_buf, HC_CONTEXT_SIZE,
+                                     DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(save_dma)) {
+               kfree(save_buf);
+               cmh_dma_free_sg(sgm);
+               return -ENOMEM;
+       }
+
+       /* Map checkpoint buffer if present (CMH eSW reads it) */
+       if (rctx->has_checkpoint) {
+               ckpt_dma = cmh_dma_map_single(rctx->checkpoint,
+                                             HC_CONTEXT_SIZE, DMA_TO_DEVICE);
+               if (cmh_dma_map_error(ckpt_dma)) {
+                       cmh_dma_unmap_single(save_dma, HC_CONTEXT_SIZE,
+                                            DMA_FROM_DEVICE);
+                       kfree(save_buf);
+                       cmh_dma_free_sg(sgm);
+                       return -ENOMEM;
+               }
+       }
+
+       d = cmh_core_select_instance(CMH_CORE_HC);
+       idx = 0;
+
+       if (rctx->has_checkpoint) {
+               /*
+                * Resuming from a saved checkpoint:
+                * INIT + RESTORE [+ GATHER] + SAVE + FLUSH
+                */
+               vcq_add_hc_init(&cmds[idx++], d.core_id, info->hc_algo);
+               vcq_add_hc_restore(&cmds[idx++], d.core_id, (u64)ckpt_dma,
+                                  HC_CONTEXT_SIZE);
+       } else {
+               bool use_cshake = (tctx->func_name_len > 0 ||
+                                  tctx->custom_len > 0);
+
+               if (use_cshake) {
+                       u32 span;
+
+                       vcq_add_hc_cshake(&cmds[idx], d.core_id,
+                                         info->hc_algo,
+                                         tctx->func_name,
+                                         tctx->func_name_len,
+                                         tctx->custom_len);
+                       span = vcq_add_inline_data(&cmds[idx],
+                                                  tctx->custom,
+                                                  tctx->custom_len);
+                       idx += span;
+               } else {
+                       vcq_add_hc_init(&cmds[idx++], d.core_id,
+                                       info->hc_algo);
+               }
+       }
+
+       if (sgm)
+               vcq_add_hc_gather(&cmds[idx++], d.core_id, (u64)sgm->items_dma,
+                                 HC_CMD_UPDATE);
+
+       vcq_add_hc_save(&cmds[idx++], d.core_id, (u64)save_dma,
+                       HC_CONTEXT_SIZE);
+       vcq_add_flush(&cmds[idx++], d.core_id);
+
+       ret = cmh_vcq_pack_and_submit(cmds, idx, rctx->packed, CMH_CSHAKE_MAX_PACKED,
+                                     d.mbx_idx);
+
+       /* Unmap before CPU read */
+       if (rctx->has_checkpoint)
+               cmh_dma_unmap_single(ckpt_dma, HC_CONTEXT_SIZE, DMA_TO_DEVICE);
+       cmh_dma_unmap_single(save_dma, HC_CONTEXT_SIZE, DMA_FROM_DEVICE);
+
+       if (!ret) {
+               memcpy(out, save_buf, HC_CONTEXT_SIZE);
+               /* Checkpoint now represents all accumulated state */
+               memcpy(rctx->checkpoint, save_buf, HC_CONTEXT_SIZE);
+               rctx->has_checkpoint = 1;
+               /* Accumulated chunks are now captured in checkpoint */
+               cmh_cshake_free_chunks(rctx, tctx);
+       }
+
+       kfree(save_buf);
+       cmh_dma_free_sg(sgm);
+       return ret;
+}
+
+static int cmh_cshake_import(struct ahash_request *req, const void *in)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_cshake_reqctx *rctx = ahash_request_ctx(req);
+
+       rctx->info = cmh_cshake_get_info(tfm);
+       rctx->error = 0;
+       INIT_LIST_HEAD(&rctx->chunks);
+       rctx->num_chunks = 0;
+       rctx->total_len = 0;
+
+       memcpy(rctx->checkpoint, in, HC_CONTEXT_SIZE);
+       rctx->has_checkpoint = 1;
+
+       return 0;
+}
+
+/* Transform init/exit */
+
+static int cmh_cshake_cra_init(struct crypto_tfm *tfm)
+{
+       struct cmh_cshake_tfm_ctx *tctx = crypto_tfm_ctx(tfm);
+
+       tctx->func_name = NULL;
+       tctx->func_name_len = 0;
+       tctx->custom = NULL;
+       tctx->custom_len = 0;
+       spin_lock_init(&tctx->chunk_lock);
+       INIT_LIST_HEAD(&tctx->all_chunks);
+       crypto_ahash_set_reqsize(__crypto_ahash_cast(tfm),
+                                sizeof(struct cmh_cshake_reqctx));
+       return 0;
+}
+
+static void cmh_cshake_cra_exit(struct crypto_tfm *tfm)
+{
+       struct cmh_cshake_tfm_ctx *tctx = crypto_tfm_ctx(tfm);
+       struct cmh_cshake_chunk *chunk, *tmp;
+
+       /* Free any orphaned chunks (e.g. testmgr export/reimport poison) */
+       spin_lock_bh(&tctx->chunk_lock);
+       list_for_each_entry_safe(chunk, tmp, &tctx->all_chunks, tfm_node) {
+               list_del(&chunk->tfm_node);
+               kfree(chunk);
+       }
+       spin_unlock_bh(&tctx->chunk_lock);
+
+       kfree(tctx->func_name);
+       kfree(tctx->custom);
+       tctx->func_name = NULL;
+       tctx->custom = NULL;
+}
+
+/* Registration */
+
+static struct cmh_cshake_alg_drv cmh_cshake_drvs[CMH_CSHAKE_ALG_COUNT];
+
+/**
+ * cmh_cshake_register() - Register cSHAKE-128/256 hash algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_cshake_register(void)
+{
+       unsigned int i;
+       int ret;
+
+       for (i = 0; i < CMH_CSHAKE_ALG_COUNT; i++) {
+               const struct cmh_cshake_alg_info *info =
+                       &cmh_cshake_algs_info[i];
+               struct cmh_cshake_alg_drv *drv = &cmh_cshake_drvs[i];
+               struct ahash_alg *alg = &drv->alg;
+
+               drv->info = info;
+
+               alg->init   = cmh_cshake_init;
+               alg->update = cmh_cshake_update;
+               alg->final  = cmh_cshake_final;
+               alg->finup  = cmh_cshake_finup;
+               alg->digest = cmh_cshake_digest;
+               alg->export = cmh_cshake_export;
+               alg->import = cmh_cshake_import;
+               alg->setkey = cmh_cshake_setkey;
+
+               alg->halg.digestsize = info->digest_size;
+               alg->halg.statesize  = HC_CONTEXT_SIZE;
+
+               strscpy(alg->halg.base.cra_name, info->alg_name,
+                       CRYPTO_MAX_ALG_NAME);
+               strscpy(alg->halg.base.cra_driver_name, info->drv_name,
+                       CRYPTO_MAX_ALG_NAME);
+               alg->halg.base.cra_priority    = 300;
+               alg->halg.base.cra_flags       = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                                CRYPTO_ALG_NO_FALLBACK |
+                                                CRYPTO_ALG_ASYNC |
+                                                CRYPTO_ALG_OPTIONAL_KEY |
+                                                CRYPTO_ALG_REQ_VIRT;
+               alg->halg.base.cra_blocksize   = 1;  /* XOF */
+               alg->halg.base.cra_ctxsize     = sizeof(struct cmh_cshake_tfm_ctx);
+               alg->halg.base.cra_init        = cmh_cshake_cra_init;
+               alg->halg.base.cra_exit        = cmh_cshake_cra_exit;
+               alg->halg.base.cra_module      = THIS_MODULE;
+
+               ret = crypto_register_ahash(alg);
+               if (ret) {
+                       dev_err(cmh_dev(), "cshake: failed to register %s (rc=%d)\n",
+                               info->drv_name, ret);
+                       while (i--)
+                               crypto_unregister_ahash(&cmh_cshake_drvs[i].alg);
+                       return ret;
+               }
+
+               dev_dbg(cmh_dev(), "cshake: registered %s (priority 300)\n",
+                       info->drv_name);
+       }
+
+       dev_info(cmh_dev(), "cshake: %zu algorithm(s) registered\n",
+                CMH_CSHAKE_ALG_COUNT);
+       return 0;
+}
+
+/**
+ * cmh_cshake_unregister() - Unregister cSHAKE hash algorithms from the crypto framework
+ */
+void cmh_cshake_unregister(void)
+{
+       unsigned int i;
+
+       for (i = 0; i < CMH_CSHAKE_ALG_COUNT; i++) {
+               crypto_unregister_ahash(&cmh_cshake_drvs[i].alg);
+               dev_dbg(cmh_dev(), "cshake: unregistered %s\n",
+                       cmh_cshake_algs_info[i].drv_name);
+       }
+
+       dev_info(cmh_dev(), "cshake: cleaned up\n");
+}
diff --git a/drivers/crypto/cmh/cmh_kmac.c b/drivers/crypto/cmh/cmh_kmac.c
new file mode 100644
index 000000000000..7177a2558e97
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_kmac.c
@@ -0,0 +1,630 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API KMAC Driver
+ *
+ * Registers KMAC-128 and KMAC-256 as keyed ahash algorithms using the
+ * CMH Hash Core (HC) via HC_CMD_KMAC.
+ *
+ * KMAC (NIST SP 800-185) is a keyed variant of cSHAKE.  The function
+ * name N is always "KMAC" (hardcoded by the CMH eSW).  The user sets:
+ *   - A key via .setkey() (raw key bytes, must be non-empty)
+ *   - An optional customization string S, carried in the same setkey blob
+ *
+ * setkey blob format:
+ *   struct kmac_key_param { __be32 keylen; __be32 s_len; };
+ *   blob: kmac_key_param || key[keylen] || S[s_len]
+ *
+ * Uses the same self-contained transaction model as cmh_hmac.c:
+ *   .setkey() -> store raw key (+ S)
+ *   .init()   -> software-only
+ *   .update() -> software-only (accumulate chunks)
+ *   .final()  -> [SYS_CMD_WRITE] + HC_CMD_KMAC [+ inline S] +
+ *               [GATHER] + FINAL + FLUSH
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/hash.h>
+#include <linux/scatterlist.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <asm/byteorder.h>
+
+#include "cmh_kmac.h"
+#include "cmh_vcq.h"
+#include "cmh_hc_abi.h"
+#include "cmh_sys_abi.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+/*
+ * Maximum data that can be accumulated across .update() calls.
+ *
+ * KMAC cannot use HC_CMD_SAVE: eip59_hc_kmac() always sets ctx->outlen
+ * (needed for right_encode(outlen) at finalization), the CMH eSW does
+ * not serialize outlen into the external save context, and it therefore
+ * rejects HC_CMD_SAVE whenever ctx->outlen != 0.  All data must be
+ * buffered in kernel memory and submitted atomically in .final(); this
+ * cap prevents unbounded allocation.
+ */
+#define KMAC_MAX_DATA          (64 * 1024)
+
+/* Algorithm Table */
+
+struct cmh_kmac_alg_info {
+       u32         hc_algo;
+       u32         digest_size;
+       const char *alg_name;
+       const char *drv_name;
+};
+
+static const struct cmh_kmac_alg_info cmh_kmac_algs_info[] = {
+       {
+               .hc_algo     = HC_ALGO_SHAKE128,
+               .digest_size = CMH_SHAKE128_DIGEST_SIZE,
+               .alg_name    = "kmac128",
+               .drv_name    = "cri-cmh-kmac128",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHAKE256,
+               .digest_size = CMH_SHAKE256_DIGEST_SIZE,
+               .alg_name    = "kmac256",
+               .drv_name    = "cri-cmh-kmac256",
+       },
+};
+
+#define CMH_KMAC_ALG_COUNT  ARRAY_SIZE(cmh_kmac_algs_info)
+
+/* Per-Request State */
+
+struct cmh_kmac_chunk {
+       struct list_head  list;
+       struct list_head  tfm_node; /* per-tfm orphan tracking */
+       u32               len;
+       u8                data[];
+};
+
+/*
+ * Max payload slots for KMAC:
+ *   SYS_CMD_WRITE (1) + KMAC (1) + inline S (3 max) + GATHER (1) +
+ *   FINAL (1) + FLUSH (1) = 8; CMH_KMAC_MAX_PAYLOAD leaves one slot of headroom
+ */
+#define CMH_KMAC_MAX_PAYLOAD   9
+#define CMH_KMAC_MAX_PACKED    (CMH_KMAC_MAX_PAYLOAD * 2)
+
+struct cmh_kmac_reqctx {
+       const struct cmh_kmac_alg_info *info;
+       int                             error;
+       struct list_head                chunks;
+       u32                             num_chunks;
+       u32                             total_len;
+       /* DMA state for async final */
+       dma_addr_t                      digest_dma;
+       dma_addr_t                      key_dma;
+       u8                             *digest_buf;
+       struct cmh_sg_map              *sgm;
+       u32                             keylen;
+       struct vcq_cmd packed[CMH_KMAC_MAX_PACKED];
+};
+
+/* Per-Transform State (carries key + S across requests) */
+
+struct cmh_kmac_tfm_ctx {
+       struct cmh_key_ctx key;
+       u8  *custom;        /* S (customization string), NULL if empty */
+       u32  custom_len;
+       spinlock_t         chunk_lock;  /* protects all_chunks */
+       struct list_head   all_chunks;  /* orphan-safe chunk tracking */
+};
+
+/* VCQ Builders (KMAC-specific; shared builders in cmh_hc_abi.h / cmh_vcq.h) */
+
+static void vcq_add_hc_kmac(struct vcq_cmd *slot, u32 core_id, u64 key_ref, u32 keylen,
+                           u32 customlen, u32 algo, u32 outlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HC_CMD_KMAC);
+       slot->hwc.hc.cmd_kmac.key = key_ref;
+       slot->hwc.hc.cmd_kmac.custom = 0;  /* inline */
+       slot->hwc.hc.cmd_kmac.keylen = keylen;
+       slot->hwc.hc.cmd_kmac.customlen = customlen;
+       slot->hwc.hc.cmd_kmac.algo = algo;
+       slot->hwc.hc.cmd_kmac.outlen = outlen;
+}
+
+/* Request Context Cleanup */
+
+static void cmh_kmac_free_chunks(struct cmh_kmac_reqctx *rctx,
+                                struct cmh_kmac_tfm_ctx *tctx)
+{
+       struct cmh_kmac_chunk *chunk, *tmp;
+
+       spin_lock_bh(&tctx->chunk_lock);
+       list_for_each_entry_safe(chunk, tmp, &rctx->chunks, list) {
+               list_del(&chunk->list);
+               list_del(&chunk->tfm_node);
+               kfree(chunk);
+       }
+       spin_unlock_bh(&tctx->chunk_lock);
+       rctx->num_chunks = 0;
+       rctx->total_len = 0;
+}
+
+static struct cmh_sg_map *
+cmh_kmac_build_sg(struct cmh_kmac_reqctx *rctx, gfp_t gfp)
+{
+       struct cmh_dma_buf *bufs;
+       struct cmh_kmac_chunk *chunk;
+       struct cmh_sg_map *sgm;
+       u32 i;
+
+       bufs = kcalloc(rctx->num_chunks, sizeof(*bufs), gfp);
+       if (!bufs)
+               return NULL;
+
+       i = 0;
+       list_for_each_entry(chunk, &rctx->chunks, list) {
+               bufs[i].data = chunk->data;
+               bufs[i].len = chunk->len;
+               i++;
+       }
+
+       sgm = cmh_dma_build_sg(bufs, rctx->num_chunks, gfp);
+       kfree(bufs);
+       return sgm;
+}
+
+/* VCQ Packing + Submit */
+
+/* ahash Operations */
+
+struct cmh_kmac_alg_drv {
+       struct ahash_alg                 alg;
+       const struct cmh_kmac_alg_info  *info;
+};
+
+static const struct cmh_kmac_alg_info *
+cmh_kmac_get_info(struct crypto_ahash *tfm)
+{
+       struct ahash_alg *alg = crypto_ahash_alg(tfm);
+
+       return container_of(alg, struct cmh_kmac_alg_drv, alg)->info;
+}
+
+/*
+ * setkey blob for KMAC (raw key path):
+ *   struct kmac_key_param { __be32 keylen; __be32 s_len; };
+ *   blob: kmac_key_param || key[keylen] || S[s_len]
+ */
+struct kmac_key_param {
+       __be32 keylen;
+       __be32 s_len;
+};
+
+static int cmh_kmac_setkey(struct crypto_ahash *tfm, const u8 *key,
+                          unsigned int keylen)
+{
+       struct cmh_kmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       /* raw key bytes with optional S */
+       {
+               struct kmac_key_param hdr;
+               u32 raw_keylen, s_len;
+               const u8 *ptr;
+
+               if (keylen < sizeof(hdr))
+                       return -EINVAL;
+
+               memcpy(&hdr, key, sizeof(hdr));
+               raw_keylen = be32_to_cpu(hdr.keylen);
+               s_len = be32_to_cpu(hdr.s_len);
+
+               if (keylen != sizeof(hdr) + raw_keylen + s_len)
+                       return -EINVAL;
+
+               if (raw_keylen == 0)
+                       return -EINVAL;
+
+               if (s_len > HC_CSHAKE_MAX_CUSTOMLEN)
+                       return -EINVAL;
+
+               ptr = key + sizeof(hdr);
+
+               /* Store raw key */
+               {
+                       int ret = cmh_key_setkey_raw(&tctx->key, ptr,
+                                                    raw_keylen, CORE_ID_HC);
+                       if (ret)
+                               return ret;
+               }
+               ptr += raw_keylen;
+
+               /* Store S */
+               kfree(tctx->custom);
+               tctx->custom = NULL;
+               tctx->custom_len = 0;
+
+               if (s_len > 0) {
+                       tctx->custom = kmemdup(ptr, s_len, GFP_KERNEL);
+                       if (!tctx->custom) {
+                               cmh_key_destroy(&tctx->key);
+                               return -ENOMEM;
+                       }
+                       tctx->custom_len = s_len;
+               }
+
+               return 0;
+       }
+}
+
+static int cmh_kmac_init(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_kmac_reqctx *rctx = ahash_request_ctx(req);
+
+       rctx->info = cmh_kmac_get_info(tfm);
+       rctx->error = 0;
+       INIT_LIST_HEAD(&rctx->chunks);
+       rctx->num_chunks = 0;
+       rctx->total_len = 0;
+
+       return 0;
+}
+
+static int cmh_kmac_update(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_kmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_kmac_reqctx *rctx = ahash_request_ctx(req);
+       struct cmh_kmac_chunk *chunk;
+       int nents;
+
+       if (rctx->error)
+               return rctx->error;
+
+       if (!req->nbytes)
+               return 0;
+
+       if (req->nbytes > KMAC_MAX_DATA - rctx->total_len) {
+               rctx->error = -EINVAL;
+               goto err_free_chunks;
+       }
+
+       chunk = kmalloc(sizeof(*chunk) + req->nbytes,
+                       req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+                       GFP_KERNEL : GFP_ATOMIC);
+       if (!chunk) {
+               rctx->error = -ENOMEM;
+               goto err_free_chunks;
+       }
+
+       chunk->len = req->nbytes;
+       if (req->base.flags & CRYPTO_AHASH_REQ_VIRT) {
+               memcpy(chunk->data, req->svirt, req->nbytes);
+       } else {
+               nents = sg_nents_for_len(req->src, req->nbytes);
+               if (nents < 0 ||
+                   sg_copy_to_buffer(req->src, nents,
+                                     chunk->data, req->nbytes) != req->nbytes) {
+                       kfree(chunk);
+                       rctx->error = -EINVAL;
+                       goto err_free_chunks;
+               }
+       }
+
+       list_add_tail(&chunk->list, &rctx->chunks);
+       spin_lock_bh(&tctx->chunk_lock);
+       list_add_tail(&chunk->tfm_node, &tctx->all_chunks);
+       spin_unlock_bh(&tctx->chunk_lock);
+       rctx->num_chunks++;
+       rctx->total_len += req->nbytes;
+
+       return 0;
+
+err_free_chunks:
+       /*
+        * Terminal error -- free all previously accumulated chunks.
+        * The crypto API hash path does not call .final() on error,
+        * so chunks would be orphaned otherwise.
+        */
+       cmh_kmac_free_chunks(rctx, tctx);
+       return rctx->error;
+}
+
+static void cmh_kmac_complete(void *data, int error)
+{
+       struct ahash_request *req = data;
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_kmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_kmac_reqctx *rctx = ahash_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       cmh_dma_unmap_single(rctx->digest_dma, rctx->info->digest_size,
+                            DMA_FROM_DEVICE);
+
+       if (!error)
+               memcpy(req->result, rctx->digest_buf,
+                      rctx->info->digest_size);
+
+       kfree(rctx->digest_buf);
+       rctx->digest_buf = NULL;
+       cmh_dma_free_sg(rctx->sgm);
+       rctx->sgm = NULL;
+       cmh_kmac_free_chunks(rctx, tctx);
+       cmh_complete(&req->base, error);
+}
+
+static int cmh_kmac_final(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_kmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_kmac_reqctx *rctx = ahash_request_ctx(req);
+       const struct cmh_kmac_alg_info *info = rctx->info;
+       struct vcq_cmd cmds[CMH_KMAC_MAX_PAYLOAD];
+       struct cmh_sg_map *sgm = NULL;
+       dma_addr_t digest_dma = DMA_MAPPING_ERROR, key_dma = DMA_MAPPING_ERROR;
+       u8 *digest_buf;
+       u64 key_ref;
+       u32 key_len;
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+                  GFP_KERNEL : GFP_ATOMIC;
+
+       if (rctx->error) {
+               ret = rctx->error;
+               goto out_free;
+       }
+
+       if (tctx->key.mode == CMH_KEY_NONE) {
+               ret = -ENOKEY;
+               goto out_free;
+       }
+
+       if (rctx->num_chunks > 0) {
+               sgm = cmh_kmac_build_sg(rctx, gfp);
+               if (!sgm) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+       }
+
+       digest_buf = kzalloc(info->digest_size, gfp);
+       if (!digest_buf) {
+               ret = -ENOMEM;
+               goto out_free_sg;
+       }
+       digest_dma = cmh_dma_map_single(digest_buf, info->digest_size,
+                                       DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(digest_dma)) {
+               ret = -ENOMEM;
+               goto out_free_digest;
+       }
+
+       /* Resolve key reference */
+       idx = 0;
+
+       key_dma = tctx->key.raw.dma;
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP, (u64)key_dma,
+                         SYS_REF_NONE, tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       key_len = tctx->key.raw.len;
+       d = cmh_core_select_instance(CMH_CORE_HC);
+
+       target_mbx = d.mbx_idx;
+
+       core_id = d.core_id;
+
+       {
+               u32 span;
+
+               vcq_add_hc_kmac(&cmds[idx], core_id, key_ref, key_len,
+                               tctx->custom_len, info->hc_algo,
+                               info->digest_size);
+
+               /* Add inline S data after the KMAC slot */
+               span = vcq_add_inline_data(&cmds[idx], tctx->custom,
+                                          tctx->custom_len);
+               idx += span;
+       }
+
+       if (sgm)
+               vcq_add_hc_gather(&cmds[idx++], core_id, (u64)sgm->items_dma,
+                                 HC_CMD_UPDATE);
+
+       vcq_add_hc_final(&cmds[idx++], core_id, (u64)digest_dma, info->digest_size);
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       rctx->digest_buf = digest_buf;
+       rctx->digest_dma = digest_dma;
+       rctx->sgm = sgm;
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_KMAC_MAX_PACKED,
+                                           target_mbx,
+                                           cmh_kmac_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_cleanup_all;
+
+       return -EINPROGRESS;
+
+out_cleanup_all:
+       cmh_dma_unmap_single(digest_dma, info->digest_size,
+                            DMA_FROM_DEVICE);
+out_free_digest:
+       kfree(digest_buf);
+
+out_free_sg:
+       cmh_dma_free_sg(sgm);
+
+out_free:
+       cmh_kmac_free_chunks(rctx, tctx);
+       return ret;
+}
+
+static int cmh_kmac_finup(struct ahash_request *req)
+{
+       int ret;
+
+       ret = cmh_kmac_update(req);
+       if (ret)
+               return ret;
+
+       return cmh_kmac_final(req);
+}
+
+static int cmh_kmac_digest(struct ahash_request *req)
+{
+       int ret;
+
+       ret = cmh_kmac_init(req);
+       if (ret)
+               return ret;
+
+       return cmh_kmac_finup(req);
+}
+
+static int cmh_kmac_export(struct ahash_request *req, void *out)
+{
+       return -EOPNOTSUPP;
+}
+
+static int cmh_kmac_import(struct ahash_request *req, const void *in)
+{
+       return -EOPNOTSUPP;
+}
+
+/* Transform init/exit */
+
+static int cmh_kmac_cra_init(struct crypto_tfm *tfm)
+{
+       struct cmh_kmac_tfm_ctx *tctx = crypto_tfm_ctx(tfm);
+
+       tctx->key.mode = CMH_KEY_NONE;
+       tctx->custom = NULL;
+       tctx->custom_len = 0;
+       spin_lock_init(&tctx->chunk_lock);
+       INIT_LIST_HEAD(&tctx->all_chunks);
+       crypto_ahash_set_reqsize(__crypto_ahash_cast(tfm),
+                                sizeof(struct cmh_kmac_reqctx));
+       return 0;
+}
+
+static void cmh_kmac_cra_exit(struct crypto_tfm *tfm)
+{
+       struct cmh_kmac_tfm_ctx *tctx = crypto_tfm_ctx(tfm);
+       struct cmh_kmac_chunk *chunk, *tmp;
+
+       /* Free any orphaned chunks (e.g. testmgr export/reimport poison) */
+       spin_lock_bh(&tctx->chunk_lock);
+       list_for_each_entry_safe(chunk, tmp, &tctx->all_chunks, tfm_node) {
+               list_del(&chunk->tfm_node);
+               kfree(chunk);
+       }
+       spin_unlock_bh(&tctx->chunk_lock);
+
+       cmh_key_destroy(&tctx->key);
+       kfree(tctx->custom);
+       tctx->custom = NULL;
+}
+
+/* Registration */
+
+static struct cmh_kmac_alg_drv cmh_kmac_drvs[CMH_KMAC_ALG_COUNT];
+
+/**
+ * cmh_kmac_register() - Register KMAC-128/256 hash algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_kmac_register(void)
+{
+       unsigned int i;
+       int ret;
+
+       for (i = 0; i < CMH_KMAC_ALG_COUNT; i++) {
+               const struct cmh_kmac_alg_info *info =
+                       &cmh_kmac_algs_info[i];
+               struct cmh_kmac_alg_drv *drv = &cmh_kmac_drvs[i];
+               struct ahash_alg *alg = &drv->alg;
+
+               drv->info = info;
+
+               alg->init   = cmh_kmac_init;
+               alg->update = cmh_kmac_update;
+               alg->final  = cmh_kmac_final;
+               alg->finup  = cmh_kmac_finup;
+               alg->digest = cmh_kmac_digest;
+               alg->export = cmh_kmac_export;
+               alg->import = cmh_kmac_import;
+               alg->setkey = cmh_kmac_setkey;
+
+               alg->halg.digestsize = info->digest_size;
+               alg->halg.statesize  = sizeof(struct cmh_kmac_reqctx);
+
+               strscpy(alg->halg.base.cra_name, info->alg_name,
+                       CRYPTO_MAX_ALG_NAME);
+               strscpy(alg->halg.base.cra_driver_name, info->drv_name,
+                       CRYPTO_MAX_ALG_NAME);
+               alg->halg.base.cra_priority    = 300;
+               alg->halg.base.cra_flags       = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                                CRYPTO_ALG_NO_FALLBACK |
+                                                CRYPTO_ALG_ASYNC |
+                                                CRYPTO_ALG_REQ_VIRT;
+               alg->halg.base.cra_blocksize   = 1;  /* XOF/keyed XOF */
+               alg->halg.base.cra_ctxsize     = sizeof(struct cmh_kmac_tfm_ctx);
+               alg->halg.base.cra_init        = cmh_kmac_cra_init;
+               alg->halg.base.cra_exit        = cmh_kmac_cra_exit;
+               alg->halg.base.cra_module      = THIS_MODULE;
+
+               ret = crypto_register_ahash(alg);
+               if (ret) {
+                       dev_err(cmh_dev(), "kmac: failed to register %s (rc=%d)\n",
+                               info->drv_name, ret);
+                       while (i--)
+                               crypto_unregister_ahash(&cmh_kmac_drvs[i].alg);
+                       return ret;
+               }
+
+               dev_dbg(cmh_dev(), "kmac: registered %s (priority 300)\n",
+                       info->drv_name);
+       }
+
+       dev_info(cmh_dev(), "kmac: %zu algorithm(s) registered\n",
+                CMH_KMAC_ALG_COUNT);
+       return 0;
+}
+
+/**
+ * cmh_kmac_unregister() - Unregister KMAC hash algorithms from the crypto framework
+ */
+void cmh_kmac_unregister(void)
+{
+       unsigned int i;
+
+       for (i = 0; i < CMH_KMAC_ALG_COUNT; i++) {
+               crypto_unregister_ahash(&cmh_kmac_drvs[i].alg);
+               dev_dbg(cmh_dev(), "kmac: unregistered %s\n",
+                       cmh_kmac_algs_info[i].drv_name);
+       }
+
+       dev_info(cmh_dev(), "kmac: cleaned up\n");
+}
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index c18219197bd8..f04cc6855963 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -31,6 +31,8 @@
 #include "cmh_rh.h"
 #include "cmh_hash.h"
 #include "cmh_hmac.h"
+#include "cmh_cshake.h"
+#include "cmh_kmac.h"
 #include "cmh_mgmt.h"
 #include "cmh_registers.h"
 #include "cmh_debugfs.h"
@@ -203,6 +205,16 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_hmac_register;

+       /* Register cSHAKE hash algorithms */
+       ret = cmh_cshake_register();
+       if (ret)
+               goto err_cshake_register;
+
+       /* Register KMAC hash algorithms */
+       ret = cmh_kmac_register();
+       if (ret)
+               goto err_kmac_register;
+
        /* Register key management device (/dev/cmh_mgmt) */
        ret = cmh_mgmt_register();
        if (ret)
@@ -215,6 +227,10 @@ static int cmh_probe(struct platform_device *pdev)
        return 0;

 err_mgmt_register:
+       cmh_kmac_unregister();
+err_kmac_register:
+       cmh_cshake_unregister();
+err_cshake_register:
        cmh_hmac_unregister();
 err_hmac_register:
        cmh_hash_unregister();
@@ -245,6 +261,8 @@ static void cmh_remove(struct platform_device *pdev)
        cfg = &dev->config;

        cmh_mgmt_unregister();
+       cmh_kmac_unregister();
+       cmh_cshake_unregister();
        cmh_hmac_unregister();
        cmh_hash_unregister();
        cmh_rh_cleanup(cfg);
diff --git a/drivers/crypto/cmh/include/cmh_cshake.h b/drivers/crypto/cmh/include/cmh_cshake.h
new file mode 100644
index 000000000000..9bafe0baf52f
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_cshake.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API cSHAKE Driver
+ *
+ * Registers cSHAKE-128 and cSHAKE-256 ahash algorithms using
+ * HC_CMD_CSHAKE with inline customization string S.
+ */
+
+#ifndef CMH_CSHAKE_H
+#define CMH_CSHAKE_H
+
+int  cmh_cshake_register(void);
+void cmh_cshake_unregister(void);
+
+#endif /* CMH_CSHAKE_H */
diff --git a/drivers/crypto/cmh/include/cmh_kmac.h b/drivers/crypto/cmh/include/cmh_kmac.h
new file mode 100644
index 000000000000..b3c92d71a0b6
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_kmac.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API KMAC Driver
+ *
+ * Registers KMAC-128 and KMAC-256 ahash algorithms using
+ * HC_CMD_KMAC with inline customization string S.
+ */
+
+#ifndef CMH_KMAC_H
+#define CMH_KMAC_H
+
+int  cmh_kmac_register(void);
+void cmh_kmac_unregister(void);
+
+#endif /* CMH_KMAC_H */
--
2.43.7
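
For readers who want to exercise the KMAC .setkey() path, the blob layout documented in cmh_kmac.c (kmac_key_param || key[keylen] || S[s_len], both lengths big-endian) can be sketched in plain userspace C as below. This is only an illustration under stated assumptions: kmac_build_blob() is a hypothetical helper that is not part of the patch, the struct is re-declared here with uint32_t standing in for __be32, and the driver's additional s_len <= HC_CSHAKE_MAX_CUSTOMLEN check is not reproduced because that limit is not shown in this patch.

```c
#include <arpa/inet.h>	/* htonl() */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the driver's header: two big-endian 32-bit lengths. */
struct kmac_key_param {
	uint32_t keylen;	/* __be32 in the driver */
	uint32_t s_len;		/* __be32 in the driver */
};

/*
 * Hypothetical helper (not in the patch): returns a malloc()ed blob
 * laid out as kmac_key_param || key[keylen] || S[s_len] and stores
 * its size in *blob_len.  Returns NULL on bad arguments or allocation
 * failure.  An empty key is refused because the driver's .setkey()
 * rejects raw_keylen == 0 with -EINVAL.
 */
static uint8_t *kmac_build_blob(const uint8_t *key, uint32_t keylen,
				const uint8_t *s, uint32_t s_len,
				size_t *blob_len)
{
	struct kmac_key_param hdr;
	uint8_t *blob;
	size_t total;

	if (!key || keylen == 0 || (s_len && !s) || !blob_len)
		return NULL;

	total = sizeof(hdr) + (size_t)keylen + (size_t)s_len;
	blob = malloc(total);
	if (!blob)
		return NULL;

	/* Lengths go on the wire in big-endian byte order. */
	hdr.keylen = htonl(keylen);
	hdr.s_len = htonl(s_len);

	memcpy(blob, &hdr, sizeof(hdr));
	memcpy(blob + sizeof(hdr), key, keylen);
	if (s_len)
		memcpy(blob + sizeof(hdr) + keylen, s, s_len);

	*blob_len = total;
	return blob;
}
```

An in-kernel caller would presumably hand the resulting buffer and length to crypto_ahash_setkey(); the driver then checks that the total length equals sizeof(hdr) + keylen + s_len exactly, so trailing padding is not tolerated.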



* [PATCH 05/19] crypto: cmh - add HMAC ahash
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register ahash algorithms for HMAC-SHA-224, HMAC-SHA-256,
HMAC-SHA-384, HMAC-SHA-512, HMAC-SHA3-224, HMAC-SHA3-256,
HMAC-SHA3-384, and HMAC-SHA3-512 using the CMH hash core.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 drivers/crypto/cmh/Makefile           |   3 +-
 drivers/crypto/cmh/cmh_hmac.c         | 684 ++++++++++++++++++++++++++
 drivers/crypto/cmh/cmh_main.c         |   9 +
 drivers/crypto/cmh/include/cmh_hmac.h |  16 +
 4 files changed, 711 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_hmac.c
 create mode 100644 drivers/crypto/cmh/include/cmh_hmac.h
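
Reviewer-style note on the export window: cmh_hmac.c below caps exported state at CMH_HMAC_STATE_SIZE (4096) minus the export header, and returns -EINVAL beyond that. The userspace sketch that follows illustrates the arithmetic and one plausible way to flatten accumulated chunks into that buffer. It is an assumption-laden illustration, not the driver's code: struct chunk and export_chunks() are hypothetical stand-ins, and how the real export routine walks its list is not visible in this part of the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STATE_SIZE 4096	/* CMH_HMAC_STATE_SIZE in the patch */

/* Same shape as cmh_hmac_export_state: a length plus flat data. */
struct export_state {
	uint32_t total_len;
	uint8_t  data[];
};

/* Bytes of accumulated input that fit after the header. */
#define EXPORT_MAX (STATE_SIZE - sizeof(struct export_state))

/* Hypothetical stand-in for the driver's per-update chunk. */
struct chunk {
	const uint8_t *data;
	uint32_t       len;
};

/*
 * Flatten n chunks into out, which must point to STATE_SIZE bytes
 * aligned for uint32_t.  Returns 0 on success or -EINVAL when the
 * accumulated data would not fit in the export window.
 */
static int export_chunks(const struct chunk *chunks, size_t n, void *out)
{
	struct export_state *st = out;
	uint32_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Written as a subtraction so the check cannot overflow. */
		if (chunks[i].len > EXPORT_MAX - total)
			return -EINVAL;
		memcpy(st->data + total, chunks[i].data, chunks[i].len);
		total += chunks[i].len;
	}

	st->total_len = total;
	return 0;
}
```

With a 4-byte header the window works out to 4092 bytes of input, well below the 64 KB HMAC_MAX_DATA accumulation cap, so a request can be valid for .final() yet too large to export.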

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index c0531f416229..1f760c0214ef 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -15,7 +15,8 @@ cmh-y := \
        cmh_sysfs.o \
        cmh_key.o \
        cmh_sys.o \
-       cmh_hash.o
+       cmh_hash.o \
+       cmh_hmac.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_hmac.c b/drivers/crypto/cmh/cmh_hmac.c
new file mode 100644
index 000000000000..1f536088eabf
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_hmac.c
@@ -0,0 +1,684 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API HMAC Driver
+ *
+ * Registers HMAC ahash algorithms with the Linux crypto subsystem.
+ * Supports HMAC-SHA-2 (224/256/384/512) and HMAC-SHA-3 (224/256/384/512)
+ * using the CMH Hash Core (HC) via HC_CMD_HMAC.
+ *
+ * Uses the same self-contained transaction model as cmh_hash.c:
+ *   .setkey() -> store raw key bytes
+ *   .init()   -> software-only: initialize per-request context
+ *   .update() -> software-only: copy SG data into per-call chunk
+ *   .final()  -> [SYS_CMD_WRITE] + HC_CMD_HMAC + [GATHER] + FINAL + FLUSH
+ *
+ * Raw-key atomicity: SYS_CMD_WRITE to SYS_REF_TEMP is packed into
+ * the same VCQ as HC_CMD_HMAC (see cmh_key.h for details).
+ *
+ * ahash .export()/.import() (state cloning): supported at the
+ * software accumulation level only.  The HW hash core does NOT
+ * support save/restore of intermediate HMAC state (SHA3 sponge
+ * invertibility, SHA2 blocked for consistency).  Since this driver
+ * accumulates all input data in kernel memory before submitting
+ * atomically in .final(), export/import simply serializes the
+ * input queue -- no keying material or HW state is exposed.
+ *
+ * All HMAC data is accumulated in kernel memory and capped at
+ * HMAC_MAX_DATA (64 KB).
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/hash.h>
+#include <crypto/hash.h>
+#include <linux/scatterlist.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include "cmh_hmac.h"
+#include "cmh_vcq.h"
+#include "cmh_hc_abi.h"
+#include "cmh_sys_abi.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+/*
+ * Maximum data that can be accumulated across .update() calls.
+ * HMAC save/restore is intentionally unsupported (see file header),
+ * so all data must be buffered in kernel memory and submitted
+ * atomically in .final().  This cap prevents unbounded allocation.
+ */
+#define HMAC_MAX_DATA          (64 * 1024)
+
+/* Algorithm Table */
+
+struct cmh_hmac_alg_info {
+       u32         hc_algo;        /* HC_ALGO_* */
+       u32         digest_size;    /* bytes */
+       u32         block_size;     /* cra_blocksize */
+       const char *alg_name;       /* Linux crypto name: "hmac(sha256)" */
+       const char *drv_name;       /* driver name: "cri-cmh-hmac-sha256" */
+};
+
+static const struct cmh_hmac_alg_info cmh_hmac_algs_info[] = {
+       /* HMAC-SHA-2 family */
+       {
+               .hc_algo     = HC_ALGO_SHA2_224,
+               .digest_size = CMH_SHA224_DIGEST_SIZE,
+               .block_size  = 64,
+               .alg_name    = "hmac(sha224)",
+               .drv_name    = "cri-cmh-hmac-sha224",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHA2_256,
+               .digest_size = CMH_SHA256_DIGEST_SIZE,
+               .block_size  = 64,
+               .alg_name    = "hmac(sha256)",
+               .drv_name    = "cri-cmh-hmac-sha256",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHA2_384,
+               .digest_size = CMH_SHA384_DIGEST_SIZE,
+               .block_size  = 128,
+               .alg_name    = "hmac(sha384)",
+               .drv_name    = "cri-cmh-hmac-sha384",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHA2_512,
+               .digest_size = CMH_SHA512_DIGEST_SIZE,
+               .block_size  = 128,
+               .alg_name    = "hmac(sha512)",
+               .drv_name    = "cri-cmh-hmac-sha512",
+       },
+       /* HMAC-SHA-3 family */
+       {
+               .hc_algo     = HC_ALGO_SHA3_224,
+               .digest_size = CMH_SHA3_224_DIGEST_SIZE,
+               .block_size  = 144,
+               .alg_name    = "hmac(sha3-224)",
+               .drv_name    = "cri-cmh-hmac-sha3-224",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHA3_256,
+               .digest_size = CMH_SHA3_256_DIGEST_SIZE,
+               .block_size  = 136,
+               .alg_name    = "hmac(sha3-256)",
+               .drv_name    = "cri-cmh-hmac-sha3-256",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHA3_384,
+               .digest_size = CMH_SHA3_384_DIGEST_SIZE,
+               .block_size  = 104,
+               .alg_name    = "hmac(sha3-384)",
+               .drv_name    = "cri-cmh-hmac-sha3-384",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHA3_512,
+               .digest_size = CMH_SHA3_512_DIGEST_SIZE,
+               .block_size  = 72,
+               .alg_name    = "hmac(sha3-512)",
+               .drv_name    = "cri-cmh-hmac-sha3-512",
+       },
+};
+
+#define CMH_HMAC_ALG_COUNT  ARRAY_SIZE(cmh_hmac_algs_info)
+
+/* Per-Request State */
+
+struct cmh_hmac_chunk {
+       struct list_head  list;
+       struct list_head  tfm_node; /* per-tfm orphan tracking */
+       u32               len;
+       u8                data[];
+};
+
+/*
+ * Maximum payload commands any HMAC transaction can produce:
+ *   [SYS_CMD_WRITE] + HC_CMD_HMAC + [GATHER] + FINAL + FLUSH = 5
+ * Worst-case packed output (stride=7, 1 payload per VCQ):
+ *   5 VCQs x 2 entries = 10
+ */
+#define CMH_HMAC_MAX_PAYLOAD    5
+#define CMH_HMAC_MAX_PACKED     (CMH_HMAC_MAX_PAYLOAD * 2)
+
+struct cmh_hmac_reqctx {
+       const struct cmh_hmac_alg_info *info;
+       int                             error;
+       struct list_head                chunks;
+       u32                             num_chunks;
+       u32                             total_len;
+       /* DMA state for async final */
+       dma_addr_t                      digest_dma;
+       dma_addr_t                      key_dma;
+       u8                             *digest_buf;
+       struct cmh_sg_map              *sgm;
+       u32                             keylen;
+       struct vcq_cmd packed[CMH_HMAC_MAX_PACKED];
+};
+
+/* Flat state for export/import -- holds accumulated input data only */
+struct cmh_hmac_export_state {
+       u32 total_len;
+       u8  data[];
+};
+
+/*
+ * Flat state buffer for export/import.  The CMH hash core does not
+ * support save/restore of intermediate HMAC state, so this driver
+ * accumulates input in SW and serialises the buffer on export.
+ *
+ * CMH_HMAC_STATE_SIZE (4096 bytes) caps the exportable accumulated-data
+ * window.  Full-range export (up to HMAC_MAX_DATA = 64 KB) is not
+ * feasible because the crypto subsystem pre-allocates statesize bytes
+ * per request.  Export returns -ENOSPC if the caller has accumulated
+ * more than CMH_HMAC_EXPORT_MAX.
+ */
+#define CMH_HMAC_STATE_SIZE 4096
+#define CMH_HMAC_EXPORT_MAX (CMH_HMAC_STATE_SIZE - sizeof(struct cmh_hmac_export_state))
+
+/* Per-Transform State (carries key across requests) */
+
+struct cmh_hmac_tfm_ctx {
+       struct cmh_key_ctx key;
+       spinlock_t         chunk_lock;  /* protects all_chunks */
+       struct list_head   all_chunks;  /* orphan-safe chunk tracking */
+};
+
+/* VCQ Builders (HMAC-specific; shared builders in cmh_hc_abi.h / cmh_vcq.h) */
+
+/* Add an HC_CMD_HMAC entry */
+static void vcq_add_hc_hmac(struct vcq_cmd *slot, u32 core_id, u64 key_ref,
+                           u32 keylen, u32 algo)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HC_CMD_HMAC);
+       slot->hwc.hc.cmd_hmac.key = key_ref;
+       slot->hwc.hc.cmd_hmac.keylen = keylen;
+       slot->hwc.hc.cmd_hmac.algo = algo;
+}
+
+/* Request Context Cleanup */
+
+static void cmh_hmac_free_chunks(struct cmh_hmac_reqctx *rctx,
+                                struct cmh_hmac_tfm_ctx *tctx)
+{
+       struct cmh_hmac_chunk *chunk, *tmp;
+
+       spin_lock_bh(&tctx->chunk_lock);
+       list_for_each_entry_safe(chunk, tmp, &rctx->chunks, list) {
+               list_del(&chunk->list);
+               list_del(&chunk->tfm_node);
+               kfree_sensitive(chunk);
+       }
+       spin_unlock_bh(&tctx->chunk_lock);
+       rctx->num_chunks = 0;
+       rctx->total_len = 0;
+}
+
+/*
+ * Build a DMA-mapped CMH eSW scatter-gather chain from accumulated chunks.
+ */
+static struct cmh_sg_map *
+cmh_hmac_build_sg(struct cmh_hmac_reqctx *rctx, gfp_t gfp)
+{
+       struct cmh_dma_buf *bufs;
+       struct cmh_hmac_chunk *chunk;
+       struct cmh_sg_map *sgm;
+       u32 i;
+
+       bufs = kcalloc(rctx->num_chunks, sizeof(*bufs), gfp);
+       if (!bufs)
+               return NULL;
+
+       i = 0;
+       list_for_each_entry(chunk, &rctx->chunks, list) {
+               bufs[i].data = chunk->data;
+               bufs[i].len = chunk->len;
+               i++;
+       }
+
+       sgm = cmh_dma_build_sg(bufs, rctx->num_chunks, gfp);
+       kfree(bufs);
+       return sgm;
+}
+
+/* VCQ Packing + Submit */
+
+/* ahash Operations */
+
+struct cmh_hmac_alg_drv {
+       struct ahash_alg                  alg;
+       const struct cmh_hmac_alg_info   *info;
+};
+
+static const struct cmh_hmac_alg_info *
+cmh_hmac_get_info(struct crypto_ahash *tfm)
+{
+       struct ahash_alg *alg = crypto_ahash_alg(tfm);
+
+       return container_of(alg, struct cmh_hmac_alg_drv, alg)->info;
+}
+
+static int cmh_hmac_setkey(struct crypto_ahash *tfm, const u8 *key,
+                          unsigned int keylen)
+{
+       struct cmh_hmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+
+       return cmh_key_setkey_raw(&tctx->key, key, keylen, CORE_ID_HC);
+}
+
+static int cmh_hmac_init(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_hmac_reqctx *rctx = ahash_request_ctx(req);
+
+       rctx->info = cmh_hmac_get_info(tfm);
+       rctx->error = 0;
+       INIT_LIST_HEAD(&rctx->chunks);
+       rctx->num_chunks = 0;
+       rctx->total_len = 0;
+
+       return 0;
+}
+
+static int cmh_hmac_update(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_hmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_hmac_reqctx *rctx = ahash_request_ctx(req);
+       struct cmh_hmac_chunk *chunk;
+       int nents;
+
+       if (rctx->error)
+               return rctx->error;
+
+       if (!req->nbytes)
+               return 0;
+
+       if (req->nbytes > HMAC_MAX_DATA - rctx->total_len) {
+               rctx->error = -EINVAL;
+               goto err_free_chunks;
+       }
+
+       chunk = kmalloc(sizeof(*chunk) + req->nbytes,
+                       req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+                       GFP_KERNEL : GFP_ATOMIC);
+       if (!chunk) {
+               rctx->error = -ENOMEM;
+               goto err_free_chunks;
+       }
+
+       chunk->len = req->nbytes;
+       if (req->base.flags & CRYPTO_AHASH_REQ_VIRT) {
+               memcpy(chunk->data, req->svirt, req->nbytes);
+       } else {
+               nents = sg_nents_for_len(req->src, req->nbytes);
+               if (nents < 0 ||
+                   sg_copy_to_buffer(req->src, nents,
+                                     chunk->data, req->nbytes) != req->nbytes) {
+                       kfree_sensitive(chunk);
+                       rctx->error = -EINVAL;
+                       goto err_free_chunks;
+               }
+       }
+
+       list_add_tail(&chunk->list, &rctx->chunks);
+       spin_lock_bh(&tctx->chunk_lock);
+       list_add_tail(&chunk->tfm_node, &tctx->all_chunks);
+       spin_unlock_bh(&tctx->chunk_lock);
+       rctx->num_chunks++;
+       rctx->total_len += req->nbytes;
+
+       return 0;
+
+err_free_chunks:
+       /*
+        * Terminal error -- free all previously accumulated chunks.
+        * The crypto API hash path does not call .final()
+        * on error, and hash_sock_destruct has no per-request
+        * destructor, so chunks would be orphaned otherwise.
+        */
+       cmh_hmac_free_chunks(rctx, tctx);
+       return rctx->error;
+}
+
+static void cmh_hmac_complete(void *data, int error)
+{
+       struct ahash_request *req = data;
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_hmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_hmac_reqctx *rctx = ahash_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       cmh_dma_unmap_single(rctx->digest_dma, rctx->info->digest_size,
+                            DMA_FROM_DEVICE);
+
+       if (!error)
+               memcpy(req->result, rctx->digest_buf,
+                      rctx->info->digest_size);
+
+       kfree(rctx->digest_buf);
+       rctx->digest_buf = NULL;
+       cmh_dma_free_sg(rctx->sgm);
+       rctx->sgm = NULL;
+       cmh_hmac_free_chunks(rctx, tctx);
+       cmh_complete(&req->base, error);
+}
+
+static int cmh_hmac_final(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_hmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_hmac_reqctx *rctx = ahash_request_ctx(req);
+       const struct cmh_hmac_alg_info *info = rctx->info;
+       struct vcq_cmd cmds[CMH_HMAC_MAX_PAYLOAD];
+       struct cmh_sg_map *sgm = NULL;
+       dma_addr_t digest_dma = DMA_MAPPING_ERROR, key_dma = DMA_MAPPING_ERROR;
+       u8 *digest_buf;
+       u64 key_ref;
+       u32 keylen;
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+                  GFP_KERNEL : GFP_ATOMIC;
+
+       if (rctx->error) {
+               ret = rctx->error;
+               goto out_free;
+       }
+
+       if (tctx->key.mode == CMH_KEY_NONE) {
+               ret = -ENOKEY;
+               goto out_free;
+       }
+
+       if (rctx->num_chunks > 0) {
+               sgm = cmh_hmac_build_sg(rctx, gfp);
+               if (!sgm) {
+                       ret = -ENOMEM;
+                       goto out_free;
+               }
+       }
+
+       digest_buf = kzalloc(info->digest_size, gfp);
+       if (!digest_buf) {
+               ret = -ENOMEM;
+               goto out_free_sg;
+       }
+       digest_dma = cmh_dma_map_single(digest_buf, info->digest_size,
+                                       DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(digest_dma)) {
+               ret = -ENOMEM;
+               goto out_free_digest;
+       }
+
+       /* Resolve key reference */
+       idx = 0;
+
+       /*
+        * Raw key: pack SYS_CMD_WRITE(SYS_REF_TEMP) into the
+        * same VCQ so the key write + HMAC are atomic.
+        */
+       key_dma = tctx->key.raw.dma;
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP, (u64)key_dma,
+                         SYS_REF_NONE, tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+       d = cmh_core_select_instance(CMH_CORE_HC);
+
+       target_mbx = d.mbx_idx;
+
+       core_id = d.core_id;
+
+       vcq_add_hc_hmac(&cmds[idx++], core_id, key_ref, keylen, info->hc_algo);
+
+       if (sgm)
+               vcq_add_hc_gather(&cmds[idx++], core_id, (u64)sgm->items_dma,
+                                 HC_CMD_UPDATE);
+
+       vcq_add_hc_final(&cmds[idx++], core_id, (u64)digest_dma, info->digest_size);
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       rctx->digest_buf = digest_buf;
+       rctx->digest_dma = digest_dma;
+       rctx->sgm = sgm;
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_HMAC_MAX_PACKED,
+                                           target_mbx,
+                                           cmh_hmac_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_cleanup_all;
+
+       return -EINPROGRESS;
+
+out_cleanup_all:
+       cmh_dma_unmap_single(digest_dma, info->digest_size,
+                            DMA_FROM_DEVICE);
+out_free_digest:
+       kfree(digest_buf);
+
+out_free_sg:
+       cmh_dma_free_sg(sgm);
+
+out_free:
+       cmh_hmac_free_chunks(rctx, tctx);
+       return ret;
+}
+
+static int cmh_hmac_finup(struct ahash_request *req)
+{
+       int ret;
+
+       ret = cmh_hmac_update(req);
+       if (ret)
+               return ret;
+
+       return cmh_hmac_final(req);
+}
+
+static int cmh_hmac_digest(struct ahash_request *req)
+{
+       int ret;
+
+       ret = cmh_hmac_init(req);
+       if (ret)
+               return ret;
+
+       return cmh_hmac_finup(req);
+}
+
+/*
+ * ahash .export()/.import(): serialize/deserialize the software
+ * accumulation buffer.  No HW state is involved.
+ */
+
+static int cmh_hmac_export(struct ahash_request *req, void *out)
+{
+       struct cmh_hmac_reqctx *rctx = ahash_request_ctx(req);
+       struct cmh_hmac_export_state *state = out;
+       struct cmh_hmac_chunk *chunk;
+       u32 offset = 0;
+
+       if (rctx->total_len > CMH_HMAC_EXPORT_MAX)
+               return -ENOSPC;
+
+       state->total_len = rctx->total_len;
+       list_for_each_entry(chunk, &rctx->chunks, list) {
+               memcpy(state->data + offset, chunk->data, chunk->len);
+               offset += chunk->len;
+       }
+       return 0;
+}
+
+static int cmh_hmac_import(struct ahash_request *req, const void *in)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_hmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_hmac_reqctx *rctx = ahash_request_ctx(req);
+       const struct cmh_hmac_export_state *state = in;
+       struct cmh_hmac_chunk *chunk;
+
+       /*
+        * Do NOT call free_chunks() here: the crypto API does not
+        * guarantee the request context is in a valid state before
+        * import(), so the list pointers may be stale or invalid.
+        * Re-initialize from scratch instead.  Any pre-existing chunks
+        * are tracked on tctx->all_chunks and freed in cra_exit.
+        */
+       rctx->info = cmh_hmac_get_info(tfm);
+       rctx->error = 0;
+       INIT_LIST_HEAD(&rctx->chunks);
+       rctx->num_chunks = 0;
+       rctx->total_len = 0;
+
+       if (state->total_len > CMH_HMAC_EXPORT_MAX)
+               return -EINVAL;
+
+       if (state->total_len) {
+               chunk = kmalloc(sizeof(*chunk) + state->total_len, GFP_KERNEL);
+               if (!chunk)
+                       return -ENOMEM;
+               chunk->len = state->total_len;
+               memcpy(chunk->data, state->data, state->total_len);
+               list_add_tail(&chunk->list, &rctx->chunks);
+               spin_lock_bh(&tctx->chunk_lock);
+               list_add_tail(&chunk->tfm_node, &tctx->all_chunks);
+               spin_unlock_bh(&tctx->chunk_lock);
+               rctx->num_chunks = 1;
+               rctx->total_len = state->total_len;
+       }
+       return 0;
+}
+
+/* Transform init/exit (cra_init/cra_exit) */
+
+static int cmh_hmac_cra_init(struct crypto_tfm *tfm)
+{
+       struct cmh_hmac_tfm_ctx *tctx = crypto_tfm_ctx(tfm);
+
+       memset(tctx, 0, sizeof(*tctx));
+       tctx->key.mode = CMH_KEY_NONE;
+       spin_lock_init(&tctx->chunk_lock);
+       INIT_LIST_HEAD(&tctx->all_chunks);
+       crypto_ahash_set_reqsize(__crypto_ahash_cast(tfm),
+                                sizeof(struct cmh_hmac_reqctx));
+       return 0;
+}
+
+static void cmh_hmac_cra_exit(struct crypto_tfm *tfm)
+{
+       struct cmh_hmac_tfm_ctx *tctx = crypto_tfm_ctx(tfm);
+       struct cmh_hmac_chunk *chunk, *tmp;
+
+       /* Free any orphaned chunks (e.g. testmgr export/reimport poison) */
+       spin_lock_bh(&tctx->chunk_lock);
+       list_for_each_entry_safe(chunk, tmp, &tctx->all_chunks, tfm_node) {
+               list_del(&chunk->tfm_node);
+               kfree_sensitive(chunk);
+       }
+       spin_unlock_bh(&tctx->chunk_lock);
+
+       cmh_key_destroy(&tctx->key);
+}
+
+/* Registration */
+
+static struct cmh_hmac_alg_drv cmh_hmac_drvs[CMH_HMAC_ALG_COUNT];
+
+/**
+ * cmh_hmac_register() - Register HMAC-SHA hash algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_hmac_register(void)
+{
+       unsigned int i;
+       int ret;
+
+       for (i = 0; i < CMH_HMAC_ALG_COUNT; i++) {
+               const struct cmh_hmac_alg_info *info = &cmh_hmac_algs_info[i];
+               struct cmh_hmac_alg_drv *drv = &cmh_hmac_drvs[i];
+               struct ahash_alg *alg = &drv->alg;
+
+               drv->info = info;
+
+               alg->init   = cmh_hmac_init;
+               alg->update = cmh_hmac_update;
+               alg->final  = cmh_hmac_final;
+               alg->finup  = cmh_hmac_finup;
+               alg->digest = cmh_hmac_digest;
+               alg->export = cmh_hmac_export;
+               alg->import = cmh_hmac_import;
+               alg->setkey = cmh_hmac_setkey;
+
+               alg->halg.digestsize = info->digest_size;
+               alg->halg.statesize  = CMH_HMAC_STATE_SIZE;
+
+               strscpy(alg->halg.base.cra_name, info->alg_name,
+                       CRYPTO_MAX_ALG_NAME);
+               strscpy(alg->halg.base.cra_driver_name, info->drv_name,
+                       CRYPTO_MAX_ALG_NAME);
+               alg->halg.base.cra_priority    = 300;
+               alg->halg.base.cra_flags       = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                                CRYPTO_ALG_NO_FALLBACK |
+                                                CRYPTO_ALG_ASYNC |
+                                                CRYPTO_ALG_REQ_VIRT;
+               alg->halg.base.cra_blocksize   = info->block_size;
+               alg->halg.base.cra_ctxsize     = sizeof(struct cmh_hmac_tfm_ctx);
+               alg->halg.base.cra_init        = cmh_hmac_cra_init;
+               alg->halg.base.cra_exit        = cmh_hmac_cra_exit;
+               alg->halg.base.cra_module      = THIS_MODULE;
+
+               ret = crypto_register_ahash(alg);
+               if (ret) {
+                       dev_err(cmh_dev(), "hmac: failed to register %s (rc=%d)\n",
+                               info->drv_name, ret);
+                       while (i--)
+                               crypto_unregister_ahash(&cmh_hmac_drvs[i].alg);
+                       return ret;
+               }
+
+               dev_dbg(cmh_dev(), "hmac: registered %s (priority 300)\n",
+                       info->drv_name);
+       }
+
+       dev_info(cmh_dev(), "hmac: %zu algorithm(s) registered\n",
+                CMH_HMAC_ALG_COUNT);
+       return 0;
+}
+
+/**
+ * cmh_hmac_unregister() - Unregister HMAC-SHA hash algorithms from the crypto framework
+ */
+void cmh_hmac_unregister(void)
+{
+       unsigned int i;
+
+       for (i = 0; i < CMH_HMAC_ALG_COUNT; i++) {
+               crypto_unregister_ahash(&cmh_hmac_drvs[i].alg);
+               dev_dbg(cmh_dev(), "hmac: unregistered %s\n",
+                       cmh_hmac_algs_info[i].drv_name);
+       }
+
+       dev_info(cmh_dev(), "hmac: cleaned up\n");
+}
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index e8e30b893932..c18219197bd8 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -30,6 +30,7 @@
 #include "cmh_txn.h"
 #include "cmh_rh.h"
 #include "cmh_hash.h"
+#include "cmh_hmac.h"
 #include "cmh_mgmt.h"
 #include "cmh_registers.h"
 #include "cmh_debugfs.h"
@@ -197,6 +198,11 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_hash_register;

+       /* Register HMAC hash algorithms */
+       ret = cmh_hmac_register();
+       if (ret)
+               goto err_hmac_register;
+
        /* Register key management device (/dev/cmh_mgmt) */
        ret = cmh_mgmt_register();
        if (ret)
@@ -209,6 +215,8 @@ static int cmh_probe(struct platform_device *pdev)
        return 0;

 err_mgmt_register:
+       cmh_hmac_unregister();
+err_hmac_register:
        cmh_hash_unregister();
 err_hash_register:
        cmh_rh_cleanup(cfg);
@@ -237,6 +245,7 @@ static void cmh_remove(struct platform_device *pdev)
        cfg = &dev->config;

        cmh_mgmt_unregister();
+       cmh_hmac_unregister();
        cmh_hash_unregister();
        cmh_rh_cleanup(cfg);
        cmh_tm_cleanup();
diff --git a/drivers/crypto/cmh/include/cmh_hmac.h b/drivers/crypto/cmh/include/cmh_hmac.h
new file mode 100644
index 000000000000..fb1a11fb76eb
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_hmac.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API HMAC Driver
+ *
+ * Registers HMAC ahash algorithms (HMAC-SHA-2, HMAC-SHA-3) with the
+ * Linux crypto subsystem using HC_CMD_HMAC.
+ */
+
+#ifndef CMH_HMAC_H
+#define CMH_HMAC_H
+
+int  cmh_hmac_register(void);
+void cmh_hmac_unregister(void);
+
+#endif /* CMH_HMAC_H */
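
Note (not part of the patch): the SHA-3 block_size values in
cmh_hmac_algs_info (144, 136, 104, 72) match the Keccak rate, i.e. the
200-byte Keccak-f[1600] state minus a capacity of twice the digest size.
A minimal sketch of that arithmetic follows. KECCAK_STATE_BYTES and
sha3_rate_bytes() are hypothetical names used only for illustration; they
do not appear in the driver.

```c
#include <assert.h>

/* Keccak-f[1600] state width in bytes (1600 bits). */
#define KECCAK_STATE_BYTES 200u

/*
 * Rate (the "block size" that HMAC sees) of a fixed-output SHA-3
 * function: state width minus the capacity, where the capacity is
 * twice the digest size.  Hypothetical helper, not driver code.
 */
static unsigned int sha3_rate_bytes(unsigned int digest_bytes)
{
	/* SHA3-512 has the largest standard digest (64 bytes). */
	assert(digest_bytes > 0 && digest_bytes <= 64);
	return KECCAK_STATE_BYTES - 2u * digest_bytes;
}
```

A check like this could be used to sanity-test the table entries, for
example that a 48-byte digest (SHA3-384) yields a 104-byte block.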
--
2.43.7
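
Note (not part of the patch): a userspace sketch of the export flattening
that cmh_hmac_export() performs, assuming a singly linked chunk list in
place of the kernel's list_head. struct chunk, struct export_state,
STATE_SIZE, EXPORT_MAX and flatten_chunks() are hypothetical stand-ins;
only the 4096-byte state size and the -ENOSPC return mirror the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define STATE_SIZE 4096u	/* mirrors CMH_HMAC_STATE_SIZE */

struct chunk {			/* stand-in for struct cmh_hmac_chunk */
	struct chunk *next;
	uint32_t len;
	const uint8_t *data;
};

struct export_state {		/* stand-in for struct cmh_hmac_export_state */
	uint32_t total_len;
	uint8_t data[];
};

/* Bytes of accumulated input that fit after the length header. */
#define EXPORT_MAX (STATE_SIZE - sizeof(struct export_state))

/* Copy every chunk, in order, into one flat buffer of STATE_SIZE bytes. */
static int flatten_chunks(const struct chunk *head, uint32_t total_len,
			  void *out)
{
	struct export_state *st = out;
	uint32_t off = 0;

	if (total_len > EXPORT_MAX)
		return -ENOSPC;	/* window exceeded; nothing is written */

	st->total_len = total_len;
	for (; head; head = head->next) {
		/* Chunk lengths must add up to at most total_len. */
		assert(head->len <= total_len - off);
		memcpy(st->data + off, head->data, head->len);
		off += head->len;
	}
	return 0;
}
```

The cap is checked before any byte is written, so a caller that has
accumulated too much input gets an error and an untouched output buffer.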



* [PATCH 07/19] crypto: cmh - add SM3 ahash
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register the SM3 ahash algorithm using the CMH SM3 core (core ID
0x05).  Supports incremental update/finup/final and export/import.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
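Note (not part of the commit message): the holdback arithmetic used by
cmh_sm3_update() can be sketched in userspace as below. SM3 processes
64-byte blocks, and the sketch assumes CMH_SM3_BLOCK_SIZE is 64, since its
definition is not shown in this patch excerpt. SM3_BLOCK, struct sm3_split
and sm3_split_update() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define SM3_BLOCK 64u	/* SM3 block size in bytes (assumed CMH_SM3_BLOCK_SIZE) */

struct sm3_split {
	uint32_t full_len;	/* bytes sent to hardware, a multiple of 64 */
	uint32_t tail_len;	/* bytes kept in the holdback buffer */
	uint32_t from_src;	/* bytes of the new input that go to hardware */
};

/*
 * Split "held" buffered bytes plus "nbytes" of new input into the part
 * submitted to hardware and the part held back for the next call.
 */
static struct sm3_split sm3_split_update(uint32_t held, uint32_t nbytes)
{
	struct sm3_split s = { 0, 0, 0 };
	uint32_t total = held + nbytes;

	/* The holdback never contains a complete block. */
	assert(held < SM3_BLOCK);

	if (total < SM3_BLOCK) {
		s.tail_len = total;	/* everything stays buffered */
		return s;
	}

	s.full_len = total - total % SM3_BLOCK;
	s.tail_len = total - s.full_len;
	s.from_src = s.full_len - held;
	return s;
}
```

For example, 10 held bytes plus 150 new bytes give full_len = 128,
tail_len = 32 and from_src = 118.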
 drivers/crypto/cmh/Makefile          |   3 +-
 drivers/crypto/cmh/cmh_main.c        |   9 +
 drivers/crypto/cmh/cmh_sm3.c         | 651 +++++++++++++++++++++++++++
 drivers/crypto/cmh/include/cmh_sm3.h |  27 ++
 4 files changed, 689 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_sm3.c
 create mode 100644 drivers/crypto/cmh/include/cmh_sm3.h

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index 2bb240b97f31..b3018fbcf211 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -18,7 +18,8 @@ cmh-y := \
        cmh_hash.o \
        cmh_hmac.o \
        cmh_cshake.o \
-       cmh_kmac.o
+       cmh_kmac.o \
+       cmh_sm3.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index f04cc6855963..56541e0d4219 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -33,6 +33,7 @@
 #include "cmh_hmac.h"
 #include "cmh_cshake.h"
 #include "cmh_kmac.h"
+#include "cmh_sm3.h"
 #include "cmh_mgmt.h"
 #include "cmh_registers.h"
 #include "cmh_debugfs.h"
@@ -215,6 +216,11 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_kmac_register;

+       /* Register SM3 hash algorithm */
+       ret = cmh_sm3_register();
+       if (ret)
+               goto err_sm3_register;
+
        /* Register key management device (/dev/cmh_mgmt) */
        ret = cmh_mgmt_register();
        if (ret)
@@ -227,6 +233,8 @@ static int cmh_probe(struct platform_device *pdev)
        return 0;

 err_mgmt_register:
+       cmh_sm3_unregister();
+err_sm3_register:
        cmh_kmac_unregister();
 err_kmac_register:
        cmh_cshake_unregister();
@@ -261,6 +269,7 @@ static void cmh_remove(struct platform_device *pdev)
        cfg = &dev->config;

        cmh_mgmt_unregister();
+       cmh_sm3_unregister();
        cmh_kmac_unregister();
        cmh_cshake_unregister();
        cmh_hmac_unregister();
diff --git a/drivers/crypto/cmh/cmh_sm3.c b/drivers/crypto/cmh/cmh_sm3.c
new file mode 100644
index 000000000000..156f93da70af
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_sm3.c
@@ -0,0 +1,651 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- SM3 Hash Driver (CORE_ID_SM3)
+ *
+ * Registers an asynchronous hash (ahash) algorithm for SM3
+ * (GB/T 32905-2016) using the CMH SM3 core.  This is a standalone
+ * driver separate from cmh_hash.c (which handles HC-based SHA-2/3/SHAKE)
+ * because SM3 runs on a different hardware core with its own command
+ * IDs and context layout.
+ *
+ * Incremental HW update model (same pattern as cmh_hash.c):
+ *
+ *   .init()   -> software-only: zero per-request context
+ *   .update() -> buffer data in holdback; when >= block_size bytes:
+ *                SM3_CMD_INIT [+ RESTORE] + UPDATE + SAVE + FLUSH
+ *                -> return -EINPROGRESS  (else return 0)
+ *   .final()  -> SM3_CMD_INIT [+ RESTORE] [+ UPDATE] + FINAL + FLUSH
+ *   .finup()  -> linearise holdback + new data, then final path
+ *   .digest() -> INIT + UPDATE + FINAL + FLUSH (single-shot, zero-copy)
+ *   .export() -> software-only: copy checkpoint + holdback to out
+ *   .import() -> software-only: restore checkpoint + holdback from in
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/hash.h>
+#include <crypto/scatterwalk.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include "cmh_sm3.h"
+#include "cmh_vcq.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+
+/* Per-Request State */
+
+/*
+ * Exported SM3 state -- serialised by .export(), deserialised by
+ * .import().  This is what statesize advertises to the crypto subsystem.
+ */
+struct cmh_sm3_export_state {
+       u8  checkpoint[SM3_CONTEXT_SIZE]; /* SM3 context from last SAVE */
+       u8  buf[CMH_SM3_BLOCK_SIZE];     /* holdback buffer */
+       u32 buf_len;                      /* valid bytes in buf[] */
+       u32 hw_started;                   /* non-zero if checkpoint valid */
+};
+
+#define CMH_SM3_MAX_PAYLOAD    5   /* INIT + RESTORE + UPDATE + FINAL/SAVE + FLUSH */
+#define CMH_SM3_MAX_PACKED     (CMH_SM3_MAX_PAYLOAD * 2)
+
+/*
+ * Checkpoint embedded inline: the kernel ahash API has no per-request
+ * destructor, so a heap-allocated checkpoint leaks if a request is
+ * abandoned without .final().
+ */
+struct cmh_sm3_reqctx {
+       int    error;
+       u32    hw_started;
+       u32    buf_len;
+       u32    has_checkpoint;
+       u8     checkpoint[SM3_CONTEXT_SIZE]; /* SM3 context from last SAVE */
+       /* DMA state for current async operation */
+       dma_addr_t ckpt_dma;
+       dma_addr_t save_dma;
+       dma_addr_t data_dma;
+       dma_addr_t digest_dma;
+       u8    *save_buf;
+       u8    *data_buf;
+       u32    data_len;
+       u8    *digest_buf;
+       u8     buf[CMH_SM3_BLOCK_SIZE]; /* holdback for partial block */
+       struct vcq_cmd packed[CMH_SM3_MAX_PACKED];
+};
+
+/* VCQ Builders -- SM3 core (CORE_ID_SM3); generic flush from cmh_vcq.h */
+
+static void vcq_add_sm3_init(struct vcq_cmd *slot, u32 core_id)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, SM3_CMD_INIT);
+       /* SM3 has a single algorithm -- no algo selector field */
+}
+
+static void vcq_add_sm3_update(struct vcq_cmd *slot, u32 core_id, u64 input_phys, u32 len)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, SM3_CMD_UPDATE);
+       slot->hwc.sm3.cmd_update.input = input_phys;
+       slot->hwc.sm3.cmd_update.inlen = len;
+}
+
+static void vcq_add_sm3_final(struct vcq_cmd *slot, u32 core_id, u64 digest_phys, u32 outlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, SM3_CMD_FINAL);
+       slot->hwc.sm3.cmd_final.digest = digest_phys;
+       slot->hwc.sm3.cmd_final.outlen = outlen;
+}
+
+static void vcq_add_sm3_save(struct vcq_cmd *slot, u32 core_id, u64 output_phys, u32 outlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, SM3_CMD_SAVE);
+       slot->hwc.sm3.cmd_save.output = output_phys;
+       slot->hwc.sm3.cmd_save.outlen = outlen;
+}
+
+static void vcq_add_sm3_restore(struct vcq_cmd *slot, u32 core_id, u64 input_phys, u32 inlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, SM3_CMD_RESTORE);
+       slot->hwc.sm3.cmd_restore.input = input_phys;
+       slot->hwc.sm3.cmd_restore.inlen = inlen;
+}
+
+/* Request Context Cleanup */
+
+static void cmh_sm3_free_reqctx(struct cmh_sm3_reqctx *rctx)
+{
+       rctx->has_checkpoint = 0;
+}
+
+/* VCQ Packing + Submit */
+
+/* ahash Operations */
+
+static int cmh_sm3_init(struct ahash_request *req)
+{
+       struct cmh_sm3_reqctx *rctx = ahash_request_ctx(req);
+
+       memset(rctx, 0, sizeof(*rctx));
+       return 0;
+}
+
+/*
+ * Update completion -- copies save_buf into the inline checkpoint.
+ */
+static void cmh_sm3_update_complete(void *data, int error)
+{
+       struct ahash_request *req = data;
+       struct cmh_sm3_reqctx *rctx = ahash_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       if (rctx->has_checkpoint)
+               cmh_dma_unmap_single(rctx->ckpt_dma, SM3_CONTEXT_SIZE,
+                                    DMA_TO_DEVICE);
+       cmh_dma_unmap_single(rctx->save_dma, SM3_CONTEXT_SIZE,
+                            DMA_FROM_DEVICE);
+       cmh_dma_unmap_single(rctx->data_dma, rctx->data_len,
+                            DMA_TO_DEVICE);
+
+       if (!error) {
+               memcpy(rctx->checkpoint, rctx->save_buf, SM3_CONTEXT_SIZE);
+               rctx->has_checkpoint = 1;
+               kfree(rctx->save_buf);
+               rctx->save_buf = NULL;
+               rctx->hw_started = 1;
+       } else {
+               kfree(rctx->save_buf);
+               rctx->save_buf = NULL;
+               rctx->error = error;
+       }
+
+       kfree(rctx->data_buf);
+       rctx->data_buf = NULL;
+       rctx->data_len = 0;
+
+       cmh_complete(&req->base, error);
+}
+
+static int cmh_sm3_update(struct ahash_request *req)
+{
+       struct cmh_sm3_reqctx *rctx = ahash_request_ctx(req);
+       struct vcq_cmd cmds[CMH_SM3_MAX_PAYLOAD];
+       struct core_dispatch d;
+       u32 total_avail, full_len, tail_len, from_src;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       if (rctx->error)
+               return rctx->error;
+
+       if (!req->nbytes)
+               return 0;
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       total_avail = rctx->buf_len + req->nbytes;
+
+       if (total_avail < CMH_SM3_BLOCK_SIZE) {
+               if (req->base.flags & CRYPTO_AHASH_REQ_VIRT)
+                       memcpy(rctx->buf + rctx->buf_len,
+                              req->svirt, req->nbytes);
+               else
+                       scatterwalk_map_and_copy(rctx->buf + rctx->buf_len,
+                                                req->src, 0,
+                                                req->nbytes, 0);
+               rctx->buf_len = total_avail;
+               return 0;
+       }
+
+       full_len = total_avail - total_avail % CMH_SM3_BLOCK_SIZE;
+       tail_len = total_avail - full_len;
+       from_src = full_len - rctx->buf_len;
+
+       rctx->data_buf = kmalloc(full_len, gfp);
+       if (!rctx->data_buf)
+               return -ENOMEM;
+
+       if (rctx->buf_len > 0)
+               memcpy(rctx->data_buf, rctx->buf, rctx->buf_len);
+
+       if (from_src > 0) {
+               if (req->base.flags & CRYPTO_AHASH_REQ_VIRT)
+                       memcpy(rctx->data_buf + rctx->buf_len,
+                              req->svirt, from_src);
+               else
+                       scatterwalk_map_and_copy(rctx->data_buf + rctx->buf_len,
+                                                req->src, 0,
+                                                from_src, 0);
+       }
+
+       if (tail_len > 0) {
+               if (req->base.flags & CRYPTO_AHASH_REQ_VIRT)
+                       memcpy(rctx->buf, req->svirt + from_src,
+                              tail_len);
+               else
+                       scatterwalk_map_and_copy(rctx->buf, req->src,
+                                                from_src, tail_len,
+                                                0);
+       }
+       rctx->buf_len = tail_len;
+       rctx->data_len = full_len;
+
+       rctx->save_buf = kzalloc(SM3_CONTEXT_SIZE, gfp);
+       if (!rctx->save_buf) {
+               ret = -ENOMEM;
+               goto err_free;
+       }
+
+       rctx->data_dma = cmh_dma_map_single(rctx->data_buf, full_len,
+                                           DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->data_dma)) {
+               ret = -ENOMEM;
+               goto err_free;
+       }
+
+       rctx->save_dma = cmh_dma_map_single(rctx->save_buf, SM3_CONTEXT_SIZE,
+                                           DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->save_dma)) {
+               ret = -ENOMEM;
+               goto err_unmap_data;
+       }
+
+       rctx->ckpt_dma = DMA_MAPPING_ERROR;
+       if (rctx->has_checkpoint) {
+               rctx->ckpt_dma = cmh_dma_map_single(rctx->checkpoint,
+                                                   SM3_CONTEXT_SIZE,
+                                                    DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->ckpt_dma)) {
+                       ret = -ENOMEM;
+                       goto err_unmap_save;
+               }
+       }
+
+       d = cmh_core_select_instance(CMH_CORE_SM3);
+       idx = 0;
+
+       vcq_add_sm3_init(&cmds[idx++], d.core_id);
+
+       if (rctx->has_checkpoint)
+               vcq_add_sm3_restore(&cmds[idx++], d.core_id,
+                                   (u64)rctx->ckpt_dma, SM3_CONTEXT_SIZE);
+
+       vcq_add_sm3_update(&cmds[idx++], d.core_id,
+                          (u64)rctx->data_dma, full_len);
+
+       vcq_add_sm3_save(&cmds[idx++], d.core_id,
+                        (u64)rctx->save_dma, SM3_CONTEXT_SIZE);
+
+       vcq_add_flush(&cmds[idx++], d.core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_SM3_MAX_PACKED,
+                                           d.mbx_idx,
+                                           cmh_sm3_update_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto err_unmap_ckpt;
+
+       return -EINPROGRESS;
+
+err_unmap_ckpt:
+       if (rctx->has_checkpoint)
+               cmh_dma_unmap_single(rctx->ckpt_dma, SM3_CONTEXT_SIZE,
+                                    DMA_TO_DEVICE);
+err_unmap_save:
+       cmh_dma_unmap_single(rctx->save_dma, SM3_CONTEXT_SIZE,
+                            DMA_FROM_DEVICE);
+err_unmap_data:
+       cmh_dma_unmap_single(rctx->data_dma, full_len, DMA_TO_DEVICE);
+err_free:
+       kfree(rctx->save_buf);
+       rctx->save_buf = NULL;
+       kfree(rctx->data_buf);
+       rctx->data_buf = NULL;
+       rctx->data_len = 0;
+       return ret;
+}
+
+static void cmh_sm3_final_complete(void *data, int error)
+{
+       struct ahash_request *req = data;
+       struct cmh_sm3_reqctx *rctx = ahash_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       if (rctx->has_checkpoint)
+               cmh_dma_unmap_single(rctx->ckpt_dma, SM3_CONTEXT_SIZE,
+                                    DMA_TO_DEVICE);
+       if (rctx->data_buf)
+               cmh_dma_unmap_single(rctx->data_dma, rctx->data_len,
+                                    DMA_TO_DEVICE);
+       cmh_dma_unmap_single(rctx->digest_dma, CMH_SM3_DIGEST_SIZE,
+                            DMA_FROM_DEVICE);
+
+       if (!error)
+               memcpy(req->result, rctx->digest_buf, CMH_SM3_DIGEST_SIZE);
+
+       kfree(rctx->digest_buf);
+       rctx->digest_buf = NULL;
+       kfree(rctx->data_buf);
+       rctx->data_buf = NULL;
+       cmh_sm3_free_reqctx(rctx);
+       cmh_complete(&req->base, error);
+}
+
+static int cmh_sm3_submit_final(struct ahash_request *req,
+                               u8 *data_buf, u32 data_len)
+{
+       struct cmh_sm3_reqctx *rctx = ahash_request_ctx(req);
+       struct vcq_cmd cmds[CMH_SM3_MAX_PAYLOAD];
+       struct core_dispatch d;
+       u32 idx;
+       int ret;
+       gfp_t gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+                  GFP_KERNEL : GFP_ATOMIC;
+
+       rctx->data_buf = data_buf;
+       rctx->data_len = data_len;
+
+       rctx->digest_buf = kzalloc(CMH_SM3_DIGEST_SIZE, gfp);
+       if (!rctx->digest_buf) {
+               ret = -ENOMEM;
+               goto err_free_data;
+       }
+
+       rctx->digest_dma = cmh_dma_map_single(rctx->digest_buf,
+                                             CMH_SM3_DIGEST_SIZE,
+                                              DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->digest_dma)) {
+               ret = -ENOMEM;
+               goto err_free_digest;
+       }
+
+       rctx->data_dma = DMA_MAPPING_ERROR;
+       if (data_buf && data_len > 0) {
+               rctx->data_dma = cmh_dma_map_single(data_buf, data_len,
+                                                   DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->data_dma)) {
+                       ret = -ENOMEM;
+                       goto err_unmap_digest;
+               }
+       }
+
+       rctx->ckpt_dma = DMA_MAPPING_ERROR;
+       if (rctx->has_checkpoint) {
+               rctx->ckpt_dma = cmh_dma_map_single(rctx->checkpoint,
+                                                   SM3_CONTEXT_SIZE,
+                                                    DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->ckpt_dma)) {
+                       ret = -ENOMEM;
+                       goto err_unmap_data;
+               }
+       }
+
+       d = cmh_core_select_instance(CMH_CORE_SM3);
+       idx = 0;
+
+       vcq_add_sm3_init(&cmds[idx++], d.core_id);
+
+       if (rctx->has_checkpoint)
+               vcq_add_sm3_restore(&cmds[idx++], d.core_id,
+                                   (u64)rctx->ckpt_dma, SM3_CONTEXT_SIZE);
+
+       if (data_buf && data_len > 0)
+               vcq_add_sm3_update(&cmds[idx++], d.core_id,
+                                  (u64)rctx->data_dma, data_len);
+
+       vcq_add_sm3_final(&cmds[idx++], d.core_id,
+                         (u64)rctx->digest_dma, CMH_SM3_DIGEST_SIZE);
+
+       vcq_add_flush(&cmds[idx++], d.core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_SM3_MAX_PACKED,
+                                           d.mbx_idx,
+                                           cmh_sm3_final_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto err_unmap_ckpt;
+
+       return -EINPROGRESS;
+
+err_unmap_ckpt:
+       if (rctx->has_checkpoint)
+               cmh_dma_unmap_single(rctx->ckpt_dma, SM3_CONTEXT_SIZE,
+                                    DMA_TO_DEVICE);
+err_unmap_data:
+       if (data_buf && data_len > 0)
+               cmh_dma_unmap_single(rctx->data_dma, data_len,
+                                    DMA_TO_DEVICE);
+err_unmap_digest:
+       cmh_dma_unmap_single(rctx->digest_dma, CMH_SM3_DIGEST_SIZE,
+                            DMA_FROM_DEVICE);
+err_free_digest:
+       kfree(rctx->digest_buf);
+       rctx->digest_buf = NULL;
+err_free_data:
+       kfree(data_buf);
+       rctx->data_buf = NULL;
+       cmh_sm3_free_reqctx(rctx);
+       return ret;
+}
+
+static int cmh_sm3_final(struct ahash_request *req)
+{
+       struct cmh_sm3_reqctx *rctx = ahash_request_ctx(req);
+       u8 *data_buf = NULL;
+       u32 data_len = 0;
+       gfp_t gfp;
+
+       if (rctx->error)
+               return rctx->error;
+
+       if (rctx->buf_len > 0) {
+               gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+                     GFP_KERNEL : GFP_ATOMIC;
+               data_buf = kmalloc(rctx->buf_len, gfp);
+               if (!data_buf)
+                       return -ENOMEM;
+               memcpy(data_buf, rctx->buf, rctx->buf_len);
+               data_len = rctx->buf_len;
+               rctx->buf_len = 0;
+       }
+
+       return cmh_sm3_submit_final(req, data_buf, data_len);
+}
+
+static int cmh_sm3_finup(struct ahash_request *req);
+
+/*
+ * One-shot digest -- delegates to init + finup so that all data is
+ * linearised and mapped through cmh_dma_map_single(), which is the
+ * only DMA mapping path aware of all supported DMA backends.
+ */
+static int cmh_sm3_digest(struct ahash_request *req)
+{
+       int ret;
+
+       ret = cmh_sm3_init(req);
+       if (ret)
+               return ret;
+       return cmh_sm3_finup(req);
+}
+
+static int cmh_sm3_finup(struct ahash_request *req)
+{
+       struct cmh_sm3_reqctx *rctx = ahash_request_ctx(req);
+       u32 data_len;
+       u8 *data_buf;
+       gfp_t gfp;
+
+       if (rctx->error)
+               return rctx->error;
+
+       data_len = rctx->buf_len + req->nbytes;
+
+       if (data_len == 0)
+               return cmh_sm3_submit_final(req, NULL, 0);
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       data_buf = kmalloc(data_len, gfp);
+       if (!data_buf)
+               return -ENOMEM;
+
+       if (rctx->buf_len > 0)
+               memcpy(data_buf, rctx->buf, rctx->buf_len);
+
+       if (req->nbytes > 0) {
+               if (req->base.flags & CRYPTO_AHASH_REQ_VIRT)
+                       memcpy(data_buf + rctx->buf_len,
+                              req->svirt, req->nbytes);
+               else
+                       scatterwalk_map_and_copy(data_buf + rctx->buf_len,
+                                                req->src, 0,
+                                                req->nbytes, 0);
+       }
+
+       rctx->buf_len = 0;
+       return cmh_sm3_submit_final(req, data_buf, data_len);
+}
+
+static int cmh_sm3_export(struct ahash_request *req, void *out)
+{
+       struct cmh_sm3_reqctx *rctx = ahash_request_ctx(req);
+       struct cmh_sm3_export_state *state = out;
+
+       if (rctx->hw_started && rctx->has_checkpoint)
+               memcpy(state->checkpoint, rctx->checkpoint, SM3_CONTEXT_SIZE);
+       else
+               memset(state->checkpoint, 0, SM3_CONTEXT_SIZE);
+
+       if (rctx->buf_len > 0)
+               memcpy(state->buf, rctx->buf, rctx->buf_len);
+
+       state->buf_len = rctx->buf_len;
+       state->hw_started = rctx->hw_started;
+
+       return 0;
+}
+
+static int cmh_sm3_import(struct ahash_request *req, const void *in)
+{
+       struct cmh_sm3_reqctx *rctx = ahash_request_ctx(req);
+       const struct cmh_sm3_export_state *state = in;
+
+       memset(rctx, 0, sizeof(*rctx));
+
+       if (state->buf_len > CMH_SM3_BLOCK_SIZE)
+               return -EINVAL;
+
+       rctx->hw_started = state->hw_started;
+       rctx->buf_len = state->buf_len;
+       memcpy(rctx->buf, state->buf, state->buf_len);
+
+       if (state->hw_started) {
+               memcpy(rctx->checkpoint, state->checkpoint, SM3_CONTEXT_SIZE);
+               rctx->has_checkpoint = 1;
+       }
+
+       return 0;
+}
+
+/* Transform init (cra_init) */
+
+static int cmh_sm3_cra_init(struct crypto_tfm *tfm)
+{
+       crypto_ahash_set_reqsize(__crypto_ahash_cast(tfm),
+                                sizeof(struct cmh_sm3_reqctx));
+       return 0;
+}
+
+/* Registration */
+
+static struct ahash_alg cmh_sm3_ahash_alg = {
+       .init    = cmh_sm3_init,
+       .update  = cmh_sm3_update,
+       .final   = cmh_sm3_final,
+       .finup   = cmh_sm3_finup,
+       .digest  = cmh_sm3_digest,
+       .export  = cmh_sm3_export,
+       .import  = cmh_sm3_import,
+
+       .halg = {
+               .digestsize = CMH_SM3_DIGEST_SIZE,
+               .statesize  = sizeof(struct cmh_sm3_export_state),
+               .base = {
+                       .cra_name        = "sm3",
+                       .cra_driver_name = "cri-cmh-sm3",
+                       .cra_priority    = 300,
+                       .cra_flags       = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                          CRYPTO_ALG_NO_FALLBACK |
+                                          CRYPTO_ALG_ASYNC |
+                                          CRYPTO_ALG_REQ_VIRT,
+                       .cra_blocksize   = CMH_SM3_BLOCK_SIZE,
+                       .cra_ctxsize     = 0,
+                       .cra_init        = cmh_sm3_cra_init,
+                       .cra_module      = THIS_MODULE,
+               },
+       },
+};
+
+/**
+ * cmh_sm3_register() - Register SM3 hash algorithm with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_sm3_register(void)
+{
+       int ret;
+
+       ret = crypto_register_ahash(&cmh_sm3_ahash_alg);
+       if (ret) {
+               dev_err(cmh_dev(), "sm3: failed to register cri-cmh-sm3 (rc=%d)\n",
+                       ret);
+               return ret;
+       }
+
+       dev_info(cmh_dev(), "sm3: registered cri-cmh-sm3 (priority 300)\n");
+       dev_info(cmh_dev(), "sm3: 1 algorithm(s) registered\n");
+       return 0;
+}
+
+/**
+ * cmh_sm3_unregister() - Unregister SM3 hash algorithm from the crypto framework
+ */
+void cmh_sm3_unregister(void)
+{
+       crypto_unregister_ahash(&cmh_sm3_ahash_alg);
+       dev_info(cmh_dev(), "sm3: unregistered cri-cmh-sm3\n");
+       dev_info(cmh_dev(), "sm3: cleaned up\n");
+}
diff --git a/drivers/crypto/cmh/include/cmh_sm3.h b/drivers/crypto/cmh/include/cmh_sm3.h
new file mode 100644
index 000000000000..2f73537f9c87
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_sm3.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- SM3 Hash Driver
+ *
+ * Registers an ahash algorithm for SM3 (GB/T 32905-2016) with the
+ * Linux crypto subsystem using the CMH SM3 core (CORE_ID_SM3).
+ * Uses the same incremental HW update model as cmh_hash.c:
+ *
+ *   .init()   -> software-only: zero per-request context
+ *   .update() -> holdback partial blocks; submit full blocks via
+ *                SM3_CMD_INIT [+ RESTORE] + UPDATE + SAVE + FLUSH
+ *   .final()  -> SM3_CMD_INIT [+ RESTORE] [+ UPDATE] + FINAL + FLUSH
+ *   .digest() -> INIT + UPDATE + FINAL + FLUSH (single-shot)
+ *   .export() -> software-only: copy checkpoint + holdback
+ *   .import() -> software-only: restore checkpoint + holdback
+ */
+
+#ifndef CMH_SM3_H
+#define CMH_SM3_H
+
+#include "cmh_config.h"
+
+int  cmh_sm3_register(void);
+void cmh_sm3_unregister(void);
+
+#endif /* CMH_SM3_H */
--
2.43.7



^ permalink raw reply related

* [PATCH 13/19] crypto: cmh - add ECDSA/SM2 sig
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register ECDSA and SM2 sig algorithms using the CMH PKE core.
ECDSA over P-256, P-384, and P-521 supports both sign and verify.
SM2 is registered as verify-only via the crypto API; full SM2
operations (sign, encrypt, decrypt, key exchange) are available
through the /dev/cmh_mgmt ioctl interface.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 drivers/crypto/cmh/Makefile        |   3 +-
 drivers/crypto/cmh/cmh_main.c      |   8 +
 drivers/crypto/cmh/cmh_pke_ecdsa.c | 575 +++++++++++++++++++++++++++++
 3 files changed, 585 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_pke_ecdsa.c

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index 7afd9852c337..fdbf66b13628 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -31,7 +31,8 @@ cmh-y := \
        cmh_ccp_poly.o \
        cmh_rng.o \
        cmh_pke_common.o \
-       cmh_pke_rsa.o
+       cmh_pke_rsa.o \
+       cmh_pke_ecdsa.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index 8535453342d7..939ff5007755 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -281,6 +281,11 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_pke_rsa_register;

+       /* Register PKE ECDSA/SM2 sig */
+       ret = cmh_pke_ecdsa_register();
+       if (ret)
+               goto err_pke_ecdsa_register;
+
        /* Register key management device (/dev/cmh_mgmt) */
        ret = cmh_mgmt_register();
        if (ret)
@@ -293,6 +298,8 @@ static int cmh_probe(struct platform_device *pdev)
        return 0;

 err_mgmt_register:
+       cmh_pke_ecdsa_unregister();
+err_pke_ecdsa_register:
        cmh_pke_rsa_unregister();
 err_pke_rsa_register:
        cmh_ccp_poly_unregister();
@@ -351,6 +358,7 @@ static void cmh_remove(struct platform_device *pdev)
        cfg = &dev->config;

        cmh_mgmt_unregister();
+       cmh_pke_ecdsa_unregister();
        cmh_pke_rsa_unregister();
        cmh_ccp_poly_unregister();
        cmh_ccp_aead_unregister();
diff --git a/drivers/crypto/cmh/cmh_pke_ecdsa.c b/drivers/crypto/cmh/cmh_pke_ecdsa.c
new file mode 100644
index 000000000000..6b65f7fb72cc
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_pke_ecdsa.c
@@ -0,0 +1,575 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- ECDSA / SM2 Signature Driver (sig_alg, synchronous)
+ *
+ * Registers "ecdsa-nist-p256", "ecdsa-nist-p384", and "ecdsa-nist-p521"
+ * sig algorithms with sign, verify, set_pub_key, and set_priv_key callbacks.
+ * Registers "sm2" as verify-only (set_pub_key + verify); SM2 sign is
+ * provided via the cmh_mgmt ioctl path in cmh_pke_sm2.c.
+ *
+ * In-kernel consumers typically use verify-only (module signatures, IMA),
+ * but we provide sign as well for completeness -- matching the CMH eSW
+ * capability.
+ *
+ * Key format: Public key = raw 04 || X || Y (uncompressed).
+ * Signature format: struct ecdsa_raw_sig (two u64[ECC_MAX_DIGITS] arrays
+ * in VLI format -- native byte order, LE digit order) for both sign
+ * output and verify input.  This matches the kernel crypto sig API.
+ *
+ * Private key via cmh_key_ctx: raw keys written via SYS_REF_TEMP.
+ * Datastore-referenced keys are only reachable through the ioctl
+ * path (cmh_mgmt.c).
+ *
+ * SM2 note: The SM2 sig entry is verify-only (no sign/set_priv_key).
+ * SM2 signature verification requires the digest to be SM3(ZA || M)
+ * where ZA = SM3(ENTLA || IDA || a || b || xG || yG || xA || yA).
+ * The ZA identity pre-hash is the caller's responsibility; the driver
+ * passes the digest directly to the CMH eSW SM2 verify engine.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <crypto/sha2.h>
+#include <crypto/sig.h>
+#include <crypto/internal/sig.h>
+#include <crypto/internal/ecc.h>
+
+#include "cmh_pke.h"
+#include "cmh_sys.h"
+#include "cmh_sys_abi.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+/*
+ * Number of ECC digits needed for a given coordinate byte length.
+ * P-256/SM2: 4, P-384: 6, P-521 (clen=68): 9.
+ */
+static inline unsigned int clen_to_ndigits(u32 clen)
+{
+       return DIV_ROUND_UP(clen, sizeof(u64));
+}
+
+struct cmh_ecdsa_tfm_ctx {
+       struct cmh_key_ctx key;         /* private key (raw only) */
+       u8 *pub_key;                    /* uncompressed (x, y) without 04 prefix */
+       u32 pub_key_len;
+       u32 curve;                      /* PKE_CURVE_* */
+       u32 clen;                       /* coordinate length in bytes */
+};
+
+static inline struct cmh_ecdsa_tfm_ctx *cmh_ecdsa_ctx(struct crypto_sig *tfm)
+{
+       return crypto_sig_ctx(tfm);
+}
+
+/*
+ * Convert one VLI component (u64 array, LE digit order, native byte order)
+ * to big-endian byte array of @out_len bytes.  The VLI value is right-aligned
+ * in the output (leading zero bytes if ndigits*8 > out_len are discarded;
+ * leading zero padding added if ndigits*8 < out_len).
+ */
+static void ecdsa_vli_to_be(const u64 *vli, unsigned int ndigits,
+                           u8 *out, unsigned int out_len)
+{
+       unsigned int full_len = ndigits * sizeof(u64);
+       unsigned int i, skip;
+
+       memset(out, 0, out_len);
+
+       if (full_len <= out_len) {
+               /* VLI fits entirely -- write at right end of out */
+               u8 *dst = out + (out_len - full_len);
+
+               for (i = 0; i < ndigits; i++)
+                       put_unaligned_be64(vli[ndigits - 1 - i],
+                                          &dst[i * sizeof(u64)]);
+       } else {
+               /* VLI wider than out -- skip leading (zero) bytes */
+               u8 tmp[ECC_MAX_BYTES];
+
+               for (i = 0; i < ndigits; i++)
+                       put_unaligned_be64(vli[ndigits - 1 - i],
+                                          &tmp[i * sizeof(u64)]);
+               skip = full_len - out_len;
+               WARN_ON_ONCE(memchr_inv(tmp, 0, skip));
+               memcpy(out, tmp + skip, out_len);
+       }
+}
+
+/*
+ * Convert big-endian byte array to VLI (u64 array, LE digit order).
+ * Output is zero-filled to @max_digits entries.
+ */
+static void ecdsa_be_to_vli(const u8 *in, unsigned int in_len,
+                           u64 *vli, unsigned int max_digits)
+{
+       unsigned int full_len;
+       u8 tmp[ECC_MAX_BYTES];
+       unsigned int i;
+
+       if (WARN_ON_ONCE(max_digits > ECC_MAX_DIGITS))
+               max_digits = ECC_MAX_DIGITS;
+       full_len = max_digits * sizeof(u64); /* after clamp: fits tmp[] */
+       memset(tmp, 0, full_len);
+       if (in_len <= full_len)
+               memcpy(tmp + (full_len - in_len), in, in_len);
+       else
+               memcpy(tmp, in + (in_len - full_len), full_len);
+
+       for (i = 0; i < max_digits; i++) {
+               unsigned int off = (max_digits - 1 - i) * sizeof(u64);
+
+               vli[i] = get_unaligned_be64(&tmp[off]);
+       }
+}
+
+/*
+ * Extract raw (r || s) big-endian byte arrays from struct ecdsa_raw_sig.
+ * Each component is written as @clen bytes into @raw_rs.
+ */
+static int ecdsa_sig_to_raw(const void *src, unsigned int slen,
+                           u8 *raw_rs, u32 clen)
+{
+       const struct ecdsa_raw_sig *sig = src;
+       unsigned int ndigits = clen_to_ndigits(clen);
+
+       if (slen != sizeof(struct ecdsa_raw_sig))
+               return -EINVAL;
+
+       ecdsa_vli_to_be(sig->r, ndigits, raw_rs, clen);
+       ecdsa_vli_to_be(sig->s, ndigits, raw_rs + clen, clen);
+       return 0;
+}
+
+/*
+ * Encode raw (r || s) big-endian byte arrays into struct ecdsa_raw_sig.
+ * Returns sizeof(struct ecdsa_raw_sig) on success.
+ */
+static int ecdsa_raw_to_sig(const u8 *raw_rs, u32 clen,
+                           void *dst, unsigned int dlen)
+{
+       struct ecdsa_raw_sig *sig = dst;
+
+       if (dlen < sizeof(struct ecdsa_raw_sig))
+               return -ENOSPC;
+
+       memset(sig, 0, sizeof(*sig));
+       ecdsa_be_to_vli(raw_rs, clen, sig->r, ECC_MAX_DIGITS);
+       ecdsa_be_to_vli(raw_rs + clen, clen, sig->s, ECC_MAX_DIGITS);
+       return sizeof(struct ecdsa_raw_sig);
+}
+
+/*
+ * ECDSA verify (synchronous sig_alg)
+ *
+ * @src:    struct ecdsa_raw_sig (VLI format)
+ * @slen:   signature length (must be sizeof(struct ecdsa_raw_sig))
+ * @digest: hash digest
+ * @dlen:   digest length
+ *
+ * Returns 0 on successful verification, negative errno on failure.
+ */
+static int cmh_ecdsa_verify(struct crypto_sig *tfm,
+                           const void *src, unsigned int slen,
+                           const void *digest, unsigned int dlen)
+{
+       struct cmh_ecdsa_tfm_ctx *ctx = cmh_ecdsa_ctx(tfm);
+       u32 clen = ctx->clen;
+       u32 sig_raw_len = 2 * clen;
+       u32 copy_len = min_t(u32, dlen, clen);
+       struct core_dispatch d = cmh_core_select_instance(CMH_CORE_PKE);
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MIN];
+       u8 *sig_raw = NULL, *dig_buf = NULL, *pk_buf = NULL, *rp_buf = NULL;
+       dma_addr_t pk_dma, dig_dma, sig_dma, rp_dma;
+       int ret;
+
+       if (!ctx->pub_key)
+               return -EINVAL;
+
+       sig_raw = kzalloc(sig_raw_len, GFP_KERNEL);
+       dig_buf = kzalloc(clen, GFP_KERNEL);
+       pk_buf = kmemdup(ctx->pub_key, ctx->pub_key_len, GFP_KERNEL);
+       rp_buf = kzalloc(clen, GFP_KERNEL);
+       if (!sig_raw || !dig_buf || !pk_buf || !rp_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       /* Extract raw (r, s) big-endian from VLI signature */
+       ret = ecdsa_sig_to_raw(src, slen, sig_raw, clen);
+       if (ret)
+               goto out_free;
+
+       /*
+        * Truncate or zero-pad digest to clen bytes, right-aligned.
+        * Matches ECDSA bits2int: use leftmost min(dlen, clen) bytes,
+        * zero-pad on the left when dlen < clen.
+        */
+       memcpy(dig_buf + (clen - copy_len), digest, copy_len);
+
+       pk_dma = cmh_dma_map_single(pk_buf, ctx->pub_key_len, DMA_TO_DEVICE);
+       dig_dma = cmh_dma_map_single(dig_buf, clen, DMA_TO_DEVICE);
+       sig_dma = cmh_dma_map_single(sig_raw, sig_raw_len, DMA_TO_DEVICE);
+       rp_dma = cmh_dma_map_single(rp_buf, clen, DMA_FROM_DEVICE);
+
+       if (cmh_dma_map_error(pk_dma) || cmh_dma_map_error(dig_dma) ||
+           cmh_dma_map_error(sig_dma) || cmh_dma_map_error(rp_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MIN);
+       vcq_add_pke_ecdsa_verify(&vcq[1], d.core_id, ctx->curve, clen,
+                                pk_dma, dig_dma, sig_dma, rp_dma,
+                                pke_swap_flags(ctx->curve));
+       vcq_add_pke_flush(&vcq[2], d.core_id);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, PKE_VCQ_CMDS_MIN, 1, d.mbx_idx);
+
+out_unmap:
+       if (!cmh_dma_map_error(rp_dma))
+               cmh_dma_unmap_single(rp_dma, clen, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, sig_raw_len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(dig_dma))
+               cmh_dma_unmap_single(dig_dma, clen, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(pk_dma))
+               cmh_dma_unmap_single(pk_dma, ctx->pub_key_len, DMA_TO_DEVICE);
+
+out_free:
+       kfree(rp_buf);
+       kfree(pk_buf);
+       kfree(sig_raw);
+       kfree(dig_buf);
+       return ret;
+}
+
+/*
+ * ECDSA sign (synchronous sig_alg)
+ *
+ * @src:  hash digest
+ * @slen: digest length
+ * @dst:  output buffer for struct ecdsa_raw_sig (VLI format)
+ * @dlen: output buffer length
+ *
+ * Returns sizeof(struct ecdsa_raw_sig) on success, negative errno on failure.
+ */
+static int cmh_ecdsa_sign(struct crypto_sig *tfm,
+                         const void *src, unsigned int slen,
+                         void *dst, unsigned int dlen)
+{
+       struct cmh_ecdsa_tfm_ctx *ctx = cmh_ecdsa_ctx(tfm);
+       u32 clen = ctx->clen;
+       u32 sig_raw_len = 2 * clen;
+       u32 copy_len = min_t(u32, slen, clen);
+       struct core_dispatch dd;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MAX];
+       u8 *dig_buf = NULL, *sig_buf = NULL, *sk_buf = NULL;
+       dma_addr_t dig_dma, sig_dma, sk_dma;
+       int ret, idx;
+
+       if (ctx->key.mode != CMH_KEY_RAW)
+               return -EINVAL;
+       if (dlen < sizeof(struct ecdsa_raw_sig))
+               return -EINVAL;
+
+       dig_buf = kzalloc(clen, GFP_KERNEL);
+       sig_buf = kzalloc(sig_raw_len, GFP_KERNEL);
+       sk_buf = kmemdup(ctx->key.raw.data, ctx->key.raw.len, GFP_KERNEL);
+       if (!dig_buf || !sig_buf || !sk_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       /*
+        * Truncate or zero-pad digest to clen bytes, right-aligned.
+        * Matches ECDSA bits2int: use leftmost min(slen, clen) bytes,
+        * zero-pad on the left when slen < clen.
+        */
+       memcpy(dig_buf + (clen - copy_len), src, copy_len);
+
+       dig_dma = cmh_dma_map_single(dig_buf, clen, DMA_TO_DEVICE);
+       sig_dma = cmh_dma_map_single(sig_buf, sig_raw_len, DMA_FROM_DEVICE);
+       sk_dma = cmh_dma_map_single(sk_buf, ctx->key.raw.len, DMA_TO_DEVICE);
+
+       if (cmh_dma_map_error(dig_dma) || cmh_dma_map_error(sig_dma) ||
+           cmh_dma_map_error(sk_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       dd = cmh_core_select_instance(CMH_CORE_PKE);
+
+       idx = 1;
+       vcq_add_sys_write(&vcq[idx], SYS_REF_TEMP, sk_dma,
+                         SYS_REF_NONE, ctx->key.raw.len,
+                         ctx->key.raw.sys_type);
+       vcq[idx].id |= pke_swap_flags(ctx->curve);
+       idx++;
+       vcq_add_pke_ecdsa_sign(&vcq[idx++], dd.core_id, ctx->curve, clen,
+                              dig_dma, sig_dma, SYS_REF_TEMP,
+                              clen, pke_swap_flags(ctx->curve));
+       vcq_add_pke_flush(&vcq[idx++], dd.core_id);
+       vcq_set_header(&vcq[0], idx);
+
+       ret = cmh_tm_submit_sync_mbx(vcq, idx, 1, dd.mbx_idx);
+       if (!ret) {
+               /* Sync bounce buffer so CPU sees the DMA-written signature */
+               cmh_dma_sync_for_cpu(sig_dma, sig_raw_len, DMA_FROM_DEVICE);
+
+               /* Encode raw (r||s) into VLI ecdsa_raw_sig for kernel API */
+               ret = ecdsa_raw_to_sig(sig_buf, clen, dst, dlen);
+       }
+
+out_unmap:
+       if (!cmh_dma_map_error(sk_dma))
+               cmh_dma_unmap_single(sk_dma, ctx->key.raw.len, DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(sig_dma))
+               cmh_dma_unmap_single(sig_dma, sig_raw_len, DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(dig_dma))
+               cmh_dma_unmap_single(dig_dma, clen, DMA_TO_DEVICE);
+
+out_free:
+       kfree_sensitive(sk_buf);
+       kfree(sig_buf);
+       kfree(dig_buf);
+       return ret;
+}
+
+static int cmh_ecdsa_set_pub_key(struct crypto_sig *tfm,
+                                const void *key, unsigned int keylen)
+{
+       struct cmh_ecdsa_tfm_ctx *ctx = cmh_ecdsa_ctx(tfm);
+       const u8 *d = key;
+       u32 clen = ctx->clen;
+       u32 raw_clen;
+
+       /* Accept 04 || X || Y (uncompressed point) */
+       if (keylen < 1 || d[0] != 0x04)
+               return -EINVAL;
+       d++;
+       keylen--;
+
+       if (keylen & 1)
+               return -EINVAL;
+       raw_clen = keylen / 2;
+
+       /*
+        * Kernel passes ceil(bits/8) per coordinate (e.g. 66 for P-521),
+        * but our HW ABI uses clen (ALIGN(66,4)=68 for P-521).
+        * Accept raw_clen <= clen and zero-pad on the left.
+        */
+       if (raw_clen > clen || raw_clen == 0)
+               return -EINVAL;
+
+       kfree(ctx->pub_key);
+       ctx->pub_key = NULL;
+       ctx->pub_key_len = 0;
+
+       ctx->pub_key = kzalloc(2 * clen, GFP_KERNEL);
+       if (!ctx->pub_key)
+               return -ENOMEM;
+
+       /* Right-align each coordinate to clen bytes */
+       memcpy(ctx->pub_key + (clen - raw_clen), d, raw_clen);
+       memcpy(ctx->pub_key + clen + (clen - raw_clen), d + raw_clen,
+              raw_clen);
+       ctx->pub_key_len = 2 * clen;
+       return 0;
+}
+
+static int cmh_ecdsa_set_priv_key(struct crypto_sig *tfm,
+                                 const void *key, unsigned int keylen)
+{
+       struct cmh_ecdsa_tfm_ctx *ctx = cmh_ecdsa_ctx(tfm);
+
+       if (keylen != ctx->clen)
+               return -EINVAL;
+
+       return cmh_key_setkey_raw(&ctx->key, key, keylen, CORE_ID_PKE);
+}
+
+static unsigned int cmh_ecdsa_key_size(struct crypto_sig *tfm)
+{
+       struct cmh_ecdsa_tfm_ctx *ctx = cmh_ecdsa_ctx(tfm);
+
+       /* crypto_sig_keysize() returns bits, not bytes */
+       return pke_curve_bits(ctx->curve);
+}
+
+static unsigned int cmh_ecdsa_max_size(struct crypto_sig *tfm)
+{
+       return sizeof(struct ecdsa_raw_sig);
+}
+
+static unsigned int cmh_ecdsa_digest_size(struct crypto_sig *tfm)
+{
+       /*
+        * Accept digests up to SHA-512 (64 bytes).  Digests longer
+        * than the curve order are truncated per ECDSA bits2int.
+        * Matches kernel ecdsa_digest_size().
+        */
+       return SHA512_DIGEST_SIZE;
+}
+
+static int cmh_ecdsa_p256_init(struct crypto_sig *tfm)
+{
+       struct cmh_ecdsa_tfm_ctx *ctx = cmh_ecdsa_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       ctx->curve = PKE_CURVE_P256;
+       ctx->clen = pke_curve_clen(PKE_CURVE_P256);
+       return 0;
+}
+
+static int cmh_ecdsa_p384_init(struct crypto_sig *tfm)
+{
+       struct cmh_ecdsa_tfm_ctx *ctx = cmh_ecdsa_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       ctx->curve = PKE_CURVE_P384;
+       ctx->clen = pke_curve_clen(PKE_CURVE_P384);
+       return 0;
+}
+
+static int cmh_ecdsa_p521_init(struct crypto_sig *tfm)
+{
+       struct cmh_ecdsa_tfm_ctx *ctx = cmh_ecdsa_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       ctx->curve = PKE_CURVE_P521;
+       ctx->clen = pke_curve_clen(PKE_CURVE_P521);
+       return 0;
+}
+
+static int cmh_sm2_init(struct crypto_sig *tfm)
+{
+       struct cmh_ecdsa_tfm_ctx *ctx = cmh_ecdsa_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       ctx->curve = PKE_CURVE_SM2;
+       ctx->clen = pke_curve_clen(PKE_CURVE_SM2);
+       return 0;
+}
+
+static void cmh_ecdsa_exit(struct crypto_sig *tfm)
+{
+       struct cmh_ecdsa_tfm_ctx *ctx = cmh_ecdsa_ctx(tfm);
+
+       cmh_key_destroy(&ctx->key);
+       kfree(ctx->pub_key);
+       ctx->pub_key = NULL;
+}
+
+static struct sig_alg cmh_ecdsa_algs[] = {
+       {
+               .sign           = cmh_ecdsa_sign,
+               .verify         = cmh_ecdsa_verify,
+               .set_pub_key    = cmh_ecdsa_set_pub_key,
+               .set_priv_key   = cmh_ecdsa_set_priv_key,
+               .key_size       = cmh_ecdsa_key_size,
+               .max_size       = cmh_ecdsa_max_size,
+               .digest_size    = cmh_ecdsa_digest_size,
+               .init           = cmh_ecdsa_p256_init,
+               .exit           = cmh_ecdsa_exit,
+               .base = {
+                       .cra_name         = "ecdsa-nist-p256",
+                       .cra_driver_name  = "cri-cmh-ecdsa-nist-p256",
+                       .cra_priority     = 300,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_ecdsa_tfm_ctx),
+               },
+       },
+       {
+               .sign           = cmh_ecdsa_sign,
+               .verify         = cmh_ecdsa_verify,
+               .set_pub_key    = cmh_ecdsa_set_pub_key,
+               .set_priv_key   = cmh_ecdsa_set_priv_key,
+               .key_size       = cmh_ecdsa_key_size,
+               .max_size       = cmh_ecdsa_max_size,
+               .digest_size    = cmh_ecdsa_digest_size,
+               .init           = cmh_ecdsa_p384_init,
+               .exit           = cmh_ecdsa_exit,
+               .base = {
+                       .cra_name         = "ecdsa-nist-p384",
+                       .cra_driver_name  = "cri-cmh-ecdsa-nist-p384",
+                       .cra_priority     = 300,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_ecdsa_tfm_ctx),
+               },
+       },
+       {
+               .sign           = cmh_ecdsa_sign,
+               .verify         = cmh_ecdsa_verify,
+               .set_pub_key    = cmh_ecdsa_set_pub_key,
+               .set_priv_key   = cmh_ecdsa_set_priv_key,
+               .key_size       = cmh_ecdsa_key_size,
+               .max_size       = cmh_ecdsa_max_size,
+               .digest_size    = cmh_ecdsa_digest_size,
+               .init           = cmh_ecdsa_p521_init,
+               .exit           = cmh_ecdsa_exit,
+               .base = {
+                       .cra_name         = "ecdsa-nist-p521",
+                       .cra_driver_name  = "cri-cmh-ecdsa-nist-p521",
+                       .cra_priority     = 300,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_ecdsa_tfm_ctx),
+               },
+       },
+       {
+               .verify         = cmh_ecdsa_verify,
+               .set_pub_key    = cmh_ecdsa_set_pub_key,
+               .key_size       = cmh_ecdsa_key_size,
+               .max_size       = cmh_ecdsa_max_size,
+               .digest_size    = cmh_ecdsa_digest_size,
+               .init           = cmh_sm2_init,
+               .exit           = cmh_ecdsa_exit,
+               .base = {
+                       .cra_name         = "sm2",
+                       .cra_driver_name  = "cri-cmh-sm2",
+                       .cra_priority     = 300,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_ecdsa_tfm_ctx),
+               },
+       },
+};
+
+/**
+ * cmh_pke_ecdsa_register() - Register ECDSA/SM2 sig algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_pke_ecdsa_register(void)
+{
+       int ret, i;
+
+       for (i = 0; i < ARRAY_SIZE(cmh_ecdsa_algs); i++) {
+               ret = crypto_register_sig(&cmh_ecdsa_algs[i]);
+               if (ret) {
+                       dev_err(cmh_dev(), "cmh: failed to register %s (%d)\n",
+                               cmh_ecdsa_algs[i].base.cra_name, ret);
+                       goto err_unregister;
+               }
+       }
+
+       return 0;
+
+err_unregister:
+       while (i--)
+               crypto_unregister_sig(&cmh_ecdsa_algs[i]);
+       return ret;
+}
+
+/**
+ * cmh_pke_ecdsa_unregister() - Unregister ECDSA/SM2 sig algorithms from the crypto framework
+ */
+void cmh_pke_ecdsa_unregister(void)
+{
+       int i = ARRAY_SIZE(cmh_ecdsa_algs);
+
+       while (i--)
+               crypto_unregister_sig(&cmh_ecdsa_algs[i]);
+}
--
2.43.7
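
Note on cmh_ecdsa_set_pub_key() above: the coordinate handling can be tried out in user space. The following is a minimal sketch, not driver code. The name pad_pub_key() and the -1 error return are assumptions made for illustration; the layout (0x04 || X || Y in, two clen-byte right-aligned coordinates out) follows the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical user-space helper (not part of the patch): unpack an
 * uncompressed point (0x04 || X || Y) into a 2*clen byte buffer,
 * right-aligning each coordinate so that shorter inputs are zero-padded
 * on the left.  Returns 0 on success, -1 on malformed input.
 */
static int pad_pub_key(const uint8_t *key, size_t keylen,
                       uint8_t *out, size_t clen)
{
        size_t raw_clen;

        /* Only the uncompressed form is accepted */
        if (keylen < 1 || key[0] != 0x04)
                return -1;
        key++;
        keylen--;

        /* X and Y must have the same length */
        if (keylen & 1)
                return -1;
        raw_clen = keylen / 2;
        if (raw_clen == 0 || raw_clen > clen)
                return -1;

        /* Zero the buffer first, then right-align X and Y */
        memset(out, 0, 2 * clen);
        memcpy(out + (clen - raw_clen), key, raw_clen);
        memcpy(out + clen + (clen - raw_clen), key + raw_clen, raw_clen);
        return 0;
}
```

For P-521 this would take 66-byte coordinates and a clen of 68, leaving two zero bytes in front of each coordinate.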


** This message and any attachments are for the sole use of the intended recipient(s). It may contain information that is confidential and privileged. If you are not the intended recipient of this message, you are prohibited from printing, copying, forwarding or saving it. Please delete the message and attachments and notify the sender immediately. **

Rambus Inc.<http://www.rambus.com>

^ permalink raw reply related
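
Note on cmh_pke_ecdsa_register() in the patch above: the error path unwinds only the entries that were registered before the failure. A minimal user-space sketch of that pattern follows. struct demo_alg and the demo_* functions are hypothetical stand-ins for sig_alg, crypto_register_sig() and crypto_unregister_sig(), not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical stand-in for an algorithm descriptor */
struct demo_alg {
        const char *name;
        int fail;        /* non-zero: registration of this entry fails */
        int registered;
};

static int demo_register(struct demo_alg *alg)
{
        if (alg->fail)
                return -1;
        alg->registered = 1;
        return 0;
}

static void demo_unregister(struct demo_alg *alg)
{
        alg->registered = 0;
}

/* Register all entries, or none: roll back on the first failure */
static int demo_register_all(struct demo_alg *algs, size_t n)
{
        size_t i;
        int ret;

        for (i = 0; i < n; i++) {
                ret = demo_register(&algs[i]);
                if (ret)
                        goto err_unregister;
        }
        return 0;

err_unregister:
        /* i indexes the failed entry; unwind entries i-1 down to 0 */
        while (i--)
                demo_unregister(&algs[i]);
        return ret;
}
```

The post-decrement in `while (i--)` is what skips the failed entry itself, so it is never unregistered without having been registered.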

* [PATCH 11/19] crypto: cmh - add DRBG hwrng
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register the CMH DRBG core (core ID 0x0f) as an hwrng provider.
The hardware implements a NIST SP 800-90A compliant DRBG with
automatic self-seeding.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 drivers/crypto/cmh/Makefile   |   3 +-
 drivers/crypto/cmh/cmh_main.c |   9 +
 drivers/crypto/cmh/cmh_rng.c  | 316 ++++++++++++++++++++++++++++++++++
 3 files changed, 327 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_rng.c
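
Note: the .read() callback in this patch caps each request at 32 bytes and folds a fixed set of error codes into -EAGAIN. A minimal user-space sketch of those two decisions follows. The helper names rng_chunk_len() and rng_map_error() and the DEMO_MAX_GENERATE macro are assumptions made for illustration; the list of codes is taken from the switch in cmh_rng_read().

```c
#include <errno.h>
#include <stddef.h>

/* Assumed to mirror CMH_DRBG_MAX_GENERATE from the patch */
#define DEMO_MAX_GENERATE 32U

/* Bytes to request from the DRBG for a caller asking for up to max bytes */
static size_t rng_chunk_len(size_t max)
{
        return max < DEMO_MAX_GENERATE ? max : DEMO_MAX_GENERATE;
}

/*
 * Fold the error codes the patch treats as transient into -EAGAIN and
 * pass every other value through unchanged.
 */
static int rng_map_error(int ret)
{
        switch (ret) {
        case -EAGAIN:
        case -EBUSY:
        case -ETIMEDOUT:
        case -EIO:
        case -ENODEV:
                return -EAGAIN;
        default:
                return ret;
        }
}
```

Keeping the transient list explicit means an unexpected code such as -ENOMEM reaches the caller unchanged rather than being retried.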

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index 4ebd0e1d10bc..1c4cb817424c 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -28,7 +28,8 @@ cmh-y := \
        cmh_sm4_cmac.o \
        cmh_ccp.o \
        cmh_ccp_aead.o \
-       cmh_ccp_poly.o
+       cmh_ccp_poly.o \
+       cmh_rng.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index 79df27d43e7e..f31c50168e4a 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -34,6 +34,7 @@
 #include "cmh_cshake.h"
 #include "cmh_kmac.h"
 #include "cmh_sm3.h"
+#include "cmh_rng.h"
 #include "cmh_aes.h"
 #include "cmh_sm4.h"
 #include "cmh_ccp.h"
@@ -224,6 +225,11 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_sm3_register;

+       /* Register hwrng backed by DRBG core */
+       ret = cmh_rng_register(pdev);
+       if (ret)
+               goto err_rng_register;
+
        /* Register AES skcipher algorithms */
        ret = cmh_aes_register();
        if (ret)
@@ -299,6 +305,8 @@ static int cmh_probe(struct platform_device *pdev)
 err_aes_aead_register:
        cmh_aes_unregister();
 err_aes_register:
+       cmh_rng_unregister();
+err_rng_register:
        cmh_sm3_unregister();
 err_sm3_register:
        cmh_kmac_unregister();
@@ -344,6 +352,7 @@ static void cmh_remove(struct platform_device *pdev)
        cmh_aes_cmac_unregister();
        cmh_aes_aead_unregister();
        cmh_aes_unregister();
+       cmh_rng_unregister();
        cmh_sm3_unregister();
        cmh_kmac_unregister();
        cmh_cshake_unregister();
diff --git a/drivers/crypto/cmh/cmh_rng.c b/drivers/crypto/cmh/cmh_rng.c
new file mode 100644
index 000000000000..c9693f6cc360
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_rng.c
@@ -0,0 +1,316 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Hardware RNG (DRBG) Driver
+ *
+ * Implements a Linux hwrng backed by the CMH DRBG core.  Each .read()
+ * builds a 3-entry VCQ (header + GENERATE + FLUSH) and submits it
+ * synchronously through the Transaction Manager.
+ *
+ * DRBG configuration (CONFIG) is a management-host operation in the
+ * CMH security model.  The driver's behaviour is controlled by the
+ * drbg_config setting (debug-only module parameter):
+ *
+ *   "auto" (default) -- attempt CONFIG at probe with the hardcoded
+ *     ratio/strength defaults.  Succeeds in stateless mode (any host may
+ *     CONFIG) or when this host is the management host in stateful
+ *     mode.  On -EPERM the driver logs a notice and continues --
+ *     GENERATE will work once the management host configures the DRBG.
+ *
+ *   "skip" -- do not issue CONFIG; assume an external management host
+ *     will configure the DRBG.  hwrng is still registered; .read()
+ *     returns -EAGAIN until GENERATE succeeds.
+ *
+ * The management host (or any privileged user-space process) can also
+ * reconfigure the DRBG at runtime via CMH_IOCTL_DRBG_CONFIG.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/hw_random.h>
+#include <linux/slab.h>
+#include <linux/platform_device.h>
+
+#include "cmh_rng.h"
+#include "cmh_vcq.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_sys.h"
+#include "cmh_config.h"
+
+/* VCQ layout for .read(): header + GENERATE + FLUSH = 3 entries. */
+#define DRBG_READ_VCQ_CMDS     3
+
+/* VCQ layout for CONFIG: header + RESET + CONFIG + FLUSH = 4 entries. */
+#define DRBG_CONFIG_VCQ_CMDS   4
+
+/*
+ * Linux hwrng quality is expressed in bits of entropy per 1024 bits of
+ * input.  The kernel clamps to this maximum; mirror it here so our
+ * MODULE_PARM_DESC and clamp logic stay in sync.
+ */
+#define CMH_HWRNG_QUALITY_MAX  1024
+
+/* Module parameters */
+
+static int hwrng_quality;
+module_param(hwrng_quality, int, 0444);
+MODULE_PARM_DESC(hwrng_quality,
+                "hwrng quality (0=no CRNG seeding, 1-1024=enable; default: 0)");
+
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+static char *drbg_config = "auto";
+module_param(drbg_config, charp, 0444);
+MODULE_PARM_DESC(drbg_config,
+                "[debug] DRBG config at probe: \"auto\"=attempt CONFIG, \"skip\"=assume external (default: auto)");
+#else
+static const char *drbg_config = "auto";
+#endif
+
+/*
+ * DRBG parameters -- hardcoded to production defaults.
+ * Entropy ratio 0 = 1:1 (full entropy), security strength 0x10 = 256-bit.
+ */
+#define CMH_DRBG_ENTROPY_RATIO         0
+#define CMH_DRBG_SECURITY_STRENGTH     0x10
+
+static unsigned int drbg_timeout_ms = 500;
+
+/* VCQ Builders */
+
+static void vcq_add_drbg_generate(struct vcq_cmd *slot, u64 dst_phys, u32 len)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(CORE_ID_DRBG, 0, 1, DRBG_CMD_GENERATE);
+       slot->hwc.drbg.cmd_generate.dst = dst_phys;
+       slot->hwc.drbg.cmd_generate.len = len;
+}
+
+/*
+ * Maximum bytes per DRBG GENERATE request.  The kernel calls .read()
+ * repeatedly to fill larger requests, so capping here is safe.
+ * 32 bytes matches the 256-bit security strength natural output size.
+ */
+#define CMH_DRBG_MAX_GENERATE  32U
+
+/* hwrng .read() callback */
+
+static int cmh_rng_read(struct hwrng *rng, void *data, size_t max, bool wait)
+{
+       struct cmh_dma_orphan *orphan;
+       struct vcq_cmd vcq[DRBG_READ_VCQ_CMDS];
+       dma_addr_t dma_addr;
+       void *dmabuf;
+       size_t nbytes;
+       int ret;
+
+       if (max == 0)
+               return 0;
+
+       /*
+        * Our path uses GFP_KERNEL allocations and synchronous VCQ
+        * submission -- both may sleep.  When the caller indicates
+        * non-blocking context (!wait), return 0 ("no data yet") so
+        * the hwrng core retries later.
+        */
+       if (!wait)
+               return 0;
+
+       nbytes = min_t(size_t, max, CMH_DRBG_MAX_GENERATE);
+
+       orphan = kmalloc_obj(*orphan, GFP_KERNEL);
+       if (!orphan)
+               return -ENOMEM;
+
+       dmabuf = kmalloc(nbytes, GFP_KERNEL);
+       if (!dmabuf) {
+               kfree(orphan);
+               return -ENOMEM;
+       }
+
+       dma_addr = cmh_dma_map_single(dmabuf, nbytes, DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(dma_addr)) {
+               kfree(dmabuf);
+               kfree(orphan);
+               return -ENOMEM;
+       }
+
+       orphan->buf  = dmabuf;
+       orphan->addr = dma_addr;
+       orphan->len  = nbytes;
+       orphan->dir  = DMA_FROM_DEVICE;
+
+       vcq_set_header(&vcq[0], DRBG_READ_VCQ_CMDS);
+       vcq_add_drbg_generate(&vcq[1], dma_addr, nbytes);
+       vcq_add_flush(&vcq[2], CORE_ID_DRBG);
+
+       /*
+        * Use the noabort variant: if the MBX is occupied by a slow
+        * operation (e.g. SLH-DSA sign at 120 s), we must not issue
+        * MBX_COMMAND_ABORT -- that would kill the unrelated in-flight
+        * VCQ.  On timeout with an in-flight VCQ (-EINPROGRESS), the
+        * orphan callback defers DMA cleanup until the RH fires.
+        */
+       ret = cmh_tm_submit_sync_noabort(vcq, DRBG_READ_VCQ_CMDS, 1,
+                                        msecs_to_jiffies(drbg_timeout_ms),
+                                        cmh_dma_orphan_free, orphan);
+       if (ret == -EINPROGRESS) {
+               /* Orphan callback owns dmabuf -- will free on VCQ completion */
+               return -EAGAIN;
+       }
+
+       /* Normal path or cancelled-from-queue: caller owns DMA */
+       cmh_dma_unmap_single(dma_addr, nbytes, DMA_FROM_DEVICE);
+       kfree(orphan);
+
+       if (ret) {
+               /*
+                * Only translate known transient conditions to -EAGAIN
+                * so the hwrng subsystem retries later.  Propagate
+                * unexpected failures unchanged to avoid masking real
+                * faults and causing indefinite retry loops.
+                */
+               switch (ret) {
+               case -EAGAIN:
+               case -EBUSY:
+               case -ETIMEDOUT:
+               case -EIO:
+               /*
+                * -ENODEV: the TM is not running -- occurs when the
+                * hwrng kthread (PF_NOFREEZE, not frozen during
+                * suspend) calls .read() while the device is suspended.
+                * Treat as transient: the TM restarts on resume.
+                */
+               case -ENODEV:
+                       dev_dbg_ratelimited(cmh_dev(),
+                                           "rng: transient DRBG failure (rc=%d)\n",
+                                           ret);
+                       kfree(dmabuf);
+                       return -EAGAIN;
+               default:
+                       dev_err_ratelimited(cmh_dev(),
+                                           "rng: DRBG generate failed (rc=%d)\n",
+                                           ret);
+                       kfree(dmabuf);
+                       return ret;
+               }
+       }
+
+       memcpy(data, dmabuf, nbytes);
+       kfree(dmabuf);
+
+       return nbytes;
+}
+
+/* Registration */
+
+static bool cmh_rng_registered;
+
+static struct hwrng cmh_hwrng = {
+       .name = "cri-cmh-drbg",
+       .read = cmh_rng_read,
+};
+
+/**
+ * cmh_rng_register() - Register the CMH hardware RNG device
+ * @pdev: Platform device for the CMH accelerator
+ *
+ * Takes the hwrng quality from the hwrng_quality module parameter, checks
+ * drbg_config, optionally sends a DRBG CONFIG VCQ to firmware, and
+ * registers the hwrng device with the kernel hwrng framework.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_rng_register(struct platform_device *pdev)
+{
+       int ret;
+
+       cmh_hwrng.quality = hwrng_quality;
+
+       if (cmh_hwrng.quality > CMH_HWRNG_QUALITY_MAX)
+               cmh_hwrng.quality = CMH_HWRNG_QUALITY_MAX;
+
+       /*
+        * DRBG CONFIG is a management-host operation.  In "auto" mode,
+        * attempt it -- this succeeds in stateless mode (any host) or
+        * when we are the management host in stateful mode.  On -EPERM
+        * (not management host) we continue without error -- GENERATE
+        * will work once the management host configures the DRBG.
+        *
+        * In "skip" mode, do not issue CONFIG -- assume the management
+        * host has already configured (or will configure) the DRBG.
+        */
+       if (strcmp(drbg_config, "skip") != 0) {
+               struct vcq_cmd cfg_vcq[DRBG_CONFIG_VCQ_CMDS];
+
+               if (strcmp(drbg_config, "auto") != 0)
+                       dev_warn(&pdev->dev,
+                                "rng: unrecognized drbg_config=\"%s\", treating as \"auto\"\n",
+                                drbg_config);
+
+               vcq_set_header(&cfg_vcq[0], DRBG_CONFIG_VCQ_CMDS);
+               vcq_add_drbg_reset(&cfg_vcq[1]);
+               vcq_add_drbg_config(&cfg_vcq[2], CMH_DRBG_ENTROPY_RATIO,
+                                   CMH_DRBG_SECURITY_STRENGTH);
+               vcq_add_flush(&cfg_vcq[3], CORE_ID_DRBG);
+               ret = cmh_tm_submit_sync(cfg_vcq, DRBG_CONFIG_VCQ_CMDS, 1);
+               if (ret == -EPERM)
+                       dev_notice(&pdev->dev,
+                                  "rng: DRBG config not permitted (not management host); assuming external configuration\n");
+               else if (ret)
+                       dev_warn(&pdev->dev,
+                                "rng: DRBG config failed (rc=%d)\n", ret);
+               else
+                       dev_info(&pdev->dev,
+                                "rng: DRBG configured (ratio=%u strength=0x%02x)\n",
+                                CMH_DRBG_ENTROPY_RATIO,
+                                CMH_DRBG_SECURITY_STRENGTH);
+       } else {
+               dev_info(&pdev->dev,
+                        "rng: DRBG config skipped (drbg_config=skip); assuming external configuration\n");
+       }
+
+       ret = hwrng_register(&cmh_hwrng);
+       if (ret) {
+               dev_err(&pdev->dev, "rng: hwrng_register failed (rc=%d)\n",
+                       ret);
+               return ret;
+       }
+
+       dev_info(&pdev->dev,
+                "rng: registered cri-cmh-drbg (quality=%d timeout=%ums)\n",
+                cmh_hwrng.quality, drbg_timeout_ms);
+
+       cmh_rng_registered = true;
+       return 0;
+}
+
+/**
+ * cmh_rng_unregister() - Unregister the CMH hardware RNG device
+ *
+ * Unregisters the hwrng device from the kernel hwrng framework if it
+ * was previously registered.
+ */
+void cmh_rng_unregister(void)
+{
+       if (!cmh_rng_registered)
+               return;
+       hwrng_unregister(&cmh_hwrng);
+       cmh_rng_registered = false;
+       dev_info(cmh_dev(), "rng: unregistered cri-cmh-drbg\n");
+}
+
+/* -- debugfs timeout accessor ------------------------------------------ */
+
+#ifdef CONFIG_CRYPTO_DEV_CMH_DEBUG
+/**
+ * cmh_rng_timeout_drbg_ptr() - Return pointer to drbg_timeout_ms for debugfs
+ *
+ * Exposes the DRBG operation timeout for runtime tuning via debugfs
+ * config/ directory.
+ *
+ * Return: pointer to the static drbg_timeout_ms variable.
+ */
+unsigned int *cmh_rng_timeout_drbg_ptr(void) { return &drbg_timeout_ms; }
+#endif
--
2.43.7



^ permalink raw reply related

* [PATCH 04/19] crypto: cmh - add SHA-2/SHA-3/SHAKE ahash
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register ahash algorithms for SHA-224, SHA-256, SHA-384, SHA-512,
SHA3-224, SHA3-256, SHA3-384, SHA3-512, SHAKE128, and SHAKE256
using the CMH hash core (core ID 0x02).

Supports incremental update/finup/final, init/export/import for
request cloning, and the CRYPTO_AHASH_REQ_VIRT flag for zero-copy
from virtual buffers.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 drivers/crypto/cmh/Makefile           |   3 +-
 drivers/crypto/cmh/cmh_hash.c         | 860 ++++++++++++++++++++++++++
 drivers/crypto/cmh/cmh_main.c         |   9 +
 drivers/crypto/cmh/include/cmh_hash.h |  26 +
 4 files changed, 897 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_hash.c
 create mode 100644 drivers/crypto/cmh/include/cmh_hash.h
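
Note: the holdback model described in the cmh_hash.c header comment can be sketched in user space as below. This is an illustration under stated assumptions, not the driver's code. struct demo_hash_state and demo_update() are hypothetical, the hardware submission is replaced by a byte counter, and the sketch assumes the residual kept in holdback is exactly the total length modulo the block size.

```c
#include <stddef.h>
#include <string.h>

/* Assumed to mirror CMH_HASH_MAX_BLOCK from the patch */
#define DEMO_MAX_BLOCK 144

/* Hypothetical per-request state: holdback buffer plus a submit counter */
struct demo_hash_state {
        size_t block_size;                 /* e.g. 64 for SHA-256 */
        size_t buf_len;                    /* valid bytes in buf[] */
        unsigned char buf[DEMO_MAX_BLOCK]; /* holdback for a partial block */
        size_t submitted;                  /* bytes handed to "hardware" */
};

/*
 * Returns the number of bytes that would be submitted as full blocks by
 * this call; 0 means everything stayed in the holdback buffer.
 */
static size_t demo_update(struct demo_hash_state *st,
                          const unsigned char *data, size_t len)
{
        size_t total = st->buf_len + len;
        size_t residual, full;

        /* Less than one block in total: only buffer the new data */
        if (total < st->block_size) {
                memcpy(st->buf + st->buf_len, data, len);
                st->buf_len = total;
                return 0;
        }

        residual = total % st->block_size;
        full = total - residual;

        /* A real driver would linearise holdback + data and DMA it here */
        st->submitted += full;

        /* The residual always comes from the tail of the new data */
        memcpy(st->buf, data + (len - residual), residual);
        st->buf_len = residual;
        return full;
}
```

With a 64-byte block, a 10-byte update followed by a 100-byte update submits one block and leaves 46 bytes in holdback.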

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index 1492e575598c..c0531f416229 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -14,7 +14,8 @@ cmh-y := \
        cmh_dma.o \
        cmh_sysfs.o \
        cmh_key.o \
-       cmh_sys.o
+       cmh_sys.o \
+       cmh_hash.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_hash.c b/drivers/crypto/cmh/cmh_hash.c
new file mode 100644
index 000000000000..2256bf4314c3
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_hash.c
@@ -0,0 +1,860 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API Hash Driver
+ *
+ * Registers asynchronous hash (ahash) algorithms with the Linux crypto
+ * subsystem.  Implements SHA-2 (224/256/384/512), SHA-3
+ * (224/256/384/512), and SHAKE (128/256) families using the CMH Hash
+ * Core (HC).
+ *
+ * Incremental HW update model -- each .update() with enough data for
+ * at least one full block submits a self-contained VCQ transaction:
+ *
+ *   .init()   -> software-only: zero per-request context
+ *   .update() -> buffer data in holdback; when >= block_size bytes:
+ *                INIT [+ RESTORE] + UPDATE(full blocks) + SAVE + FLUSH
+ *                -> return -EINPROGRESS  (else return 0, data in holdback)
+ *   .final()  -> INIT [+ RESTORE] [+ UPDATE(residual)] + FINAL + FLUSH
+ *   .finup()  -> linearise holdback + new data, then final path
+ *   .digest() -> INIT + UPDATE + FINAL + FLUSH (single-shot, zero-copy)
+ *   .export() -> software-only: copy checkpoint + holdback to out
+ *   .import() -> software-only: restore checkpoint + holdback from in
+ *
+ * The FLUSH after each .update() releases the HC core, so one session
+ * cannot lock out another.  Two hash sessions interleave on the same MBX:
+ * each saves its state via SAVE and restores it via RESTORE on the next call.
+ *
+ * Export/import is purely software (no HW interaction), enabling
+ * crypto API transform clone for all plain-hash algorithms.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/hash.h>
+#include <crypto/scatterwalk.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include "cmh_hash.h"
+#include "cmh_vcq.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+
+/* Algorithm Table */
+
+struct cmh_hash_alg_info {
+       u32         hc_algo;        /* HC_ALGO_* (SHA2, SHA3, SHAKE) */
+       u32         digest_size;    /* bytes */
+       u32         block_size;     /* cra_blocksize for Linux crypto API */
+       const char *alg_name;       /* Linux crypto name: "sha256" */
+       const char *drv_name;       /* driver name: "cri-cmh-sha256" */
+};
+
+static const struct cmh_hash_alg_info cmh_hash_algs_info[] = {
+       /* SHA-2 family */
+       {
+               .hc_algo     = HC_ALGO_SHA2_224,
+               .digest_size = CMH_SHA224_DIGEST_SIZE,
+               .block_size  = 64,
+               .alg_name    = "sha224",
+               .drv_name    = "cri-cmh-sha224",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHA2_256,
+               .digest_size = CMH_SHA256_DIGEST_SIZE,
+               .block_size  = 64,
+               .alg_name    = "sha256",
+               .drv_name    = "cri-cmh-sha256",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHA2_384,
+               .digest_size = CMH_SHA384_DIGEST_SIZE,
+               .block_size  = 128,
+               .alg_name    = "sha384",
+               .drv_name    = "cri-cmh-sha384",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHA2_512,
+               .digest_size = CMH_SHA512_DIGEST_SIZE,
+               .block_size  = 128,
+               .alg_name    = "sha512",
+               .drv_name    = "cri-cmh-sha512",
+       },
+       /* SHA-3 family */
+       {
+               .hc_algo     = HC_ALGO_SHA3_224,
+               .digest_size = CMH_SHA3_224_DIGEST_SIZE,
+               .block_size  = 144,  /* rate = 1600/8 - 2*224/8 = 144 */
+               .alg_name    = "sha3-224",
+               .drv_name    = "cri-cmh-sha3-224",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHA3_256,
+               .digest_size = CMH_SHA3_256_DIGEST_SIZE,
+               .block_size  = 136,  /* rate = 1600/8 - 2*256/8 = 136 */
+               .alg_name    = "sha3-256",
+               .drv_name    = "cri-cmh-sha3-256",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHA3_384,
+               .digest_size = CMH_SHA3_384_DIGEST_SIZE,
+               .block_size  = 104,  /* rate = 1600/8 - 2*384/8 = 104 */
+               .alg_name    = "sha3-384",
+               .drv_name    = "cri-cmh-sha3-384",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHA3_512,
+               .digest_size = CMH_SHA3_512_DIGEST_SIZE,
+               .block_size  = 72,   /* rate = 1600/8 - 2*512/8 = 72 */
+               .alg_name    = "sha3-512",
+               .drv_name    = "cri-cmh-sha3-512",
+       },
+       /*
+        * SHAKE (XOF) family -- fixed-output ahash registration.
+        *
+        * cra_blocksize = 1: SHAKE is a sponge/XOF, not Merkle-Damgaard.
+        * The Keccak rate of SHAKE-128 (168) exceeds MAX_ALGAPI_BLOCKSIZE
+        * (160) on Linux <=6.7; that of SHAKE-256 (136) does not.  Using 1 for
+        * both signals "byte-oriented", which is correct for XOF consumers.
+        * The kernel raised the limit to 208 in 6.8 (commit 2f3a22704889).
+        */
+       {
+               .hc_algo     = HC_ALGO_SHAKE128,
+               .digest_size = CMH_SHAKE128_DIGEST_SIZE,
+               .block_size  = 1,    /* XOF: no meaningful block for crypto API */
+               .alg_name    = "shake128",
+               .drv_name    = "cri-cmh-shake128",
+       },
+       {
+               .hc_algo     = HC_ALGO_SHAKE256,
+               .digest_size = CMH_SHAKE256_DIGEST_SIZE,
+               .block_size  = 1,    /* XOF: no meaningful block for crypto API */
+               .alg_name    = "shake256",
+               .drv_name    = "cri-cmh-shake256",
+       },
+};
+
+#define CMH_HASH_ALG_COUNT  ARRAY_SIZE(cmh_hash_algs_info)
+
+/* Per-Request State */
+
+/* Maximum cra_blocksize across all registered algorithms (SHA3-224) */
+#define CMH_HASH_MAX_BLOCK     144
+
+/*
+ * Exported hash state -- serialised by .export(), deserialised by
+ * .import().  This is what statesize advertises to the crypto subsystem.
+ */
+struct cmh_hash_export_state {
+       u8  checkpoint[HC_CONTEXT_SIZE]; /* HC context from last SAVE */
+       u8  buf[CMH_HASH_MAX_BLOCK];    /* holdback buffer */
+       u32 buf_len;                     /* valid bytes in buf[] */
+       u32 hw_started;                  /* non-zero if checkpoint valid */
+};
+
+/*
+ * Maximum payload commands any hash transaction can produce:
+ *   INIT + RESTORE + UPDATE + SAVE/FINAL + FLUSH = 5
+ * Worst-case packed output (stride=7, 1 payload per VCQ):
+ *   5 VCQs x 2 entries = 10
+ */
+#define CMH_HASH_MAX_PAYLOAD    5
+#define CMH_HASH_MAX_PACKED     (CMH_HASH_MAX_PAYLOAD * 2)
+
+/*
+ * Stored in ahash_request_ctx().  Tracks the algorithm, a holdback
+ * buffer for partial blocks, an HC context checkpoint from the last
+ * SAVE, and DMA state for the current in-flight async operation.
+ *
+ * The checkpoint is embedded inline rather than heap-allocated because
+ * the kernel ahash API has no per-request destructor.  If a request is
+ * abandoned without .final() (e.g. transform freed early), a heap
+ * checkpoint would leak unconditionally.
+ */
+struct cmh_hash_reqctx {
+       const struct cmh_hash_alg_info *info;
+       int    error;
+       u32    hw_started;      /* non-zero after first HW submission */
+       u32    buf_len;         /* bytes in holdback buf[] */
+       u32    has_checkpoint;  /* non-zero if checkpoint[] valid */
+       /* DMA state for current async operation */
+       dma_addr_t ckpt_dma;   /* RESTORE input */
+       dma_addr_t save_dma;   /* SAVE output (update only) */
+       dma_addr_t data_dma;   /* UPDATE input */
+       dma_addr_t digest_dma; /* FINAL output (final/digest only) */
+       u8    *save_buf;       /* SAVE output buffer */
+       u8    *data_buf;       /* linearised data for DMA */
+       u32    data_len;       /* bytes in data_buf */
+       u8    *digest_buf;     /* digest output buffer */
+       u8     buf[CMH_HASH_MAX_BLOCK]; /* holdback for partial block */
+       u8     checkpoint[HC_CONTEXT_SIZE]; /* HC context from last SAVE */
+       struct vcq_cmd packed[CMH_HASH_MAX_PACKED];
+};
+
+/* VCQ Builders (HC-specific; shared builders in cmh_hc_abi.h / cmh_vcq.h) */
+
+/* Add an HC_CMD_UPDATE entry */
+static void vcq_add_hc_update(struct vcq_cmd *slot, u32 core_id, u64 input_phys, u32 len)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HC_CMD_UPDATE);
+       slot->hwc.hc.cmd_update.input = input_phys;
+       slot->hwc.hc.cmd_update.inlen = len;
+}
+
+/* Add an HC_CMD_SAVE entry */
+static void vcq_add_hc_save(struct vcq_cmd *slot, u32 core_id, u64 output_phys, u32 outlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HC_CMD_SAVE);
+       slot->hwc.hc.cmd_save.output = output_phys;
+       slot->hwc.hc.cmd_save.outlen = outlen;
+}
+
+/* Add an HC_CMD_RESTORE entry */
+static void vcq_add_hc_restore(struct vcq_cmd *slot, u32 core_id, u64 input_phys, u32 inlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, HC_CMD_RESTORE);
+       slot->hwc.hc.cmd_restore.input = input_phys;
+       slot->hwc.hc.cmd_restore.inlen = inlen;
+}
+
+/* Request Context Cleanup */
+
+static void cmh_hash_free_reqctx(struct cmh_hash_reqctx *rctx)
+{
+       rctx->has_checkpoint = 0;
+}
+
+/* VCQ Packing + Submit */
+
+/* ahash Operations */
+
+/*
+ * Wrapper struct: embeds ahash_alg + a pointer to our alg_info table
+ * entry so we can recover it in the tfm callbacks.
+ */
+struct cmh_hash_alg_drv {
+       struct ahash_alg                 alg;
+       const struct cmh_hash_alg_info  *info;
+};
+
+/*
+ * Find the cmh_hash_alg_info for a crypto_ahash.  The registered
+ * ahash_alg is embedded in struct cmh_hash_alg_drv, so container_of()
+ * recovers the info pointer set at registration (see cmh_hash_register).
+ */
+static const struct cmh_hash_alg_info *
+cmh_hash_get_info(struct crypto_ahash *tfm)
+{
+       struct ahash_alg *alg = crypto_ahash_alg(tfm);
+
+       return container_of(alg, struct cmh_hash_alg_drv, alg)->info;
+}
+
+static int cmh_hash_init(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_hash_reqctx *rctx = ahash_request_ctx(req);
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->info = cmh_hash_get_info(tfm);
+       return 0;
+}
+
+/*
+ * Update completion -- called from threaded IRQ after SAVE completes.
+ * Copies save_buf into the inline checkpoint, then frees save_buf.
+ */
+static void cmh_hash_update_complete(void *data, int error)
+{
+       struct ahash_request *req = data;
+       struct cmh_hash_reqctx *rctx = ahash_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       /* Unmap DMA buffers */
+       if (rctx->has_checkpoint)
+               cmh_dma_unmap_single(rctx->ckpt_dma, HC_CONTEXT_SIZE,
+                                    DMA_TO_DEVICE);
+       cmh_dma_unmap_single(rctx->save_dma, HC_CONTEXT_SIZE,
+                            DMA_FROM_DEVICE);
+       cmh_dma_unmap_single(rctx->data_dma, rctx->data_len,
+                            DMA_TO_DEVICE);
+
+       if (!error) {
+               memcpy(rctx->checkpoint, rctx->save_buf, HC_CONTEXT_SIZE);
+               rctx->has_checkpoint = 1;
+               kfree(rctx->save_buf);
+               rctx->save_buf = NULL;
+               rctx->hw_started = 1;
+       } else {
+               kfree(rctx->save_buf);
+               rctx->save_buf = NULL;
+               rctx->error = error;
+       }
+
+       kfree(rctx->data_buf);
+       rctx->data_buf = NULL;
+       rctx->data_len = 0;
+
+       cmh_complete(&req->base, error);
+}
+
+/*
+ * .update -- buffer incoming data, submit full blocks to HW.
+ *
+ * Maintains a partial-block holdback buffer in rctx->buf[].  When
+ * enough data is available for at least one full block, the full
+ * blocks are linearised and submitted as:
+ *   INIT [+ RESTORE] + UPDATE(full_blocks) + SAVE + FLUSH
+ *
+ * The tail (< block_size) stays in the holdback for the next call.
+ * Returns -EINPROGRESS on HW submission, 0 if only buffering.
+ */
+static int cmh_hash_update(struct ahash_request *req)
+{
+       struct cmh_hash_reqctx *rctx = ahash_request_ctx(req);
+       const struct cmh_hash_alg_info *info = rctx->info;
+       struct vcq_cmd cmds[CMH_HASH_MAX_PAYLOAD];
+       struct core_dispatch d;
+       u32 block_size = info->block_size;
+       u32 total_avail, full_len, tail_len, from_src;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       if (rctx->error)
+               return rctx->error;
+
+       if (!req->nbytes)
+               return 0;
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       total_avail = rctx->buf_len + req->nbytes;
+
+       /* Not enough for a full block -- just buffer */
+       if (total_avail < block_size) {
+               if (req->base.flags & CRYPTO_AHASH_REQ_VIRT)
+                       memcpy(rctx->buf + rctx->buf_len,
+                              req->svirt, req->nbytes);
+               else
+                       scatterwalk_map_and_copy(rctx->buf + rctx->buf_len,
+                                                req->src, 0,
+                                                req->nbytes, 0);
+               rctx->buf_len = total_avail;
+               return 0;
+       }
+
+       /* Have at least one full block -- submit to HW */
+       full_len = total_avail - total_avail % block_size;
+       tail_len = total_avail - full_len;
+       from_src = full_len - rctx->buf_len;
+
+       /* Linearise: holdback prefix + full blocks from scatterlist */
+       rctx->data_buf = kmalloc(full_len, gfp);
+       if (!rctx->data_buf)
+               return -ENOMEM;
+
+       if (rctx->buf_len > 0)
+               memcpy(rctx->data_buf, rctx->buf, rctx->buf_len);
+
+       if (from_src > 0) {
+               if (req->base.flags & CRYPTO_AHASH_REQ_VIRT)
+                       memcpy(rctx->data_buf + rctx->buf_len,
+                              req->svirt, from_src);
+               else
+                       scatterwalk_map_and_copy(rctx->data_buf + rctx->buf_len,
+                                                req->src, 0,
+                                                from_src, 0);
+       }
+
+       /* Move tail to holdback */
+       if (tail_len > 0) {
+               if (req->base.flags & CRYPTO_AHASH_REQ_VIRT)
+                       memcpy(rctx->buf, req->svirt + from_src,
+                              tail_len);
+               else
+                       scatterwalk_map_and_copy(rctx->buf, req->src,
+                                                from_src, tail_len,
+                                                0);
+       }
+       rctx->buf_len = tail_len;
+       rctx->data_len = full_len;
+
+       /* Allocate SAVE output buffer */
+       rctx->save_buf = kzalloc(HC_CONTEXT_SIZE, gfp);
+       if (!rctx->save_buf) {
+               ret = -ENOMEM;
+               goto err_free;
+       }
+
+       /* DMA map data, save output, and checkpoint */
+       rctx->data_dma = cmh_dma_map_single(rctx->data_buf, full_len,
+                                           DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->data_dma)) {
+               ret = -ENOMEM;
+               goto err_free;
+       }
+
+       rctx->save_dma = cmh_dma_map_single(rctx->save_buf, HC_CONTEXT_SIZE,
+                                           DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->save_dma)) {
+               ret = -ENOMEM;
+               goto err_unmap_data;
+       }
+
+       rctx->ckpt_dma = DMA_MAPPING_ERROR;
+       if (rctx->has_checkpoint) {
+               rctx->ckpt_dma = cmh_dma_map_single(rctx->checkpoint,
+                                                   HC_CONTEXT_SIZE,
+                                                    DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->ckpt_dma)) {
+                       ret = -ENOMEM;
+                       goto err_unmap_save;
+               }
+       }
+
+       /* Build VCQ: INIT [+ RESTORE] + UPDATE + SAVE + FLUSH */
+       d = cmh_core_select_instance(CMH_CORE_HC);
+       idx = 0;
+
+       vcq_add_hc_init(&cmds[idx++], d.core_id, info->hc_algo);
+
+       if (rctx->has_checkpoint)
+               vcq_add_hc_restore(&cmds[idx++], d.core_id,
+                                  (u64)rctx->ckpt_dma, HC_CONTEXT_SIZE);
+
+       vcq_add_hc_update(&cmds[idx++], d.core_id,
+                         (u64)rctx->data_dma, full_len);
+
+       vcq_add_hc_save(&cmds[idx++], d.core_id,
+                       (u64)rctx->save_dma, HC_CONTEXT_SIZE);
+
+       vcq_add_flush(&cmds[idx++], d.core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_HASH_MAX_PACKED,
+                                           d.mbx_idx,
+                                           cmh_hash_update_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto err_unmap_ckpt;
+
+       return -EINPROGRESS;
+
+err_unmap_ckpt:
+       if (rctx->has_checkpoint)
+               cmh_dma_unmap_single(rctx->ckpt_dma, HC_CONTEXT_SIZE,
+                                    DMA_TO_DEVICE);
+err_unmap_save:
+       cmh_dma_unmap_single(rctx->save_dma, HC_CONTEXT_SIZE,
+                            DMA_FROM_DEVICE);
+err_unmap_data:
+       cmh_dma_unmap_single(rctx->data_dma, full_len, DMA_TO_DEVICE);
+err_free:
+       kfree(rctx->save_buf);
+       rctx->save_buf = NULL;
+       kfree(rctx->data_buf);
+       rctx->data_buf = NULL;
+       rctx->data_len = 0;
+       return ret;
+}
+
+/*
+ * Final completion -- unmap all DMA, copy digest, signal done.
+ */
+static void cmh_hash_final_complete(void *data, int error)
+{
+       struct ahash_request *req = data;
+       struct cmh_hash_reqctx *rctx = ahash_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       if (rctx->has_checkpoint)
+               cmh_dma_unmap_single(rctx->ckpt_dma, HC_CONTEXT_SIZE,
+                                    DMA_TO_DEVICE);
+       if (rctx->data_buf)
+               cmh_dma_unmap_single(rctx->data_dma, rctx->data_len,
+                                    DMA_TO_DEVICE);
+       cmh_dma_unmap_single(rctx->digest_dma, rctx->info->digest_size,
+                            DMA_FROM_DEVICE);
+
+       if (!error)
+               memcpy(req->result, rctx->digest_buf,
+                      rctx->info->digest_size);
+
+       kfree(rctx->digest_buf);
+       rctx->digest_buf = NULL;
+       kfree(rctx->data_buf);
+       rctx->data_buf = NULL;
+       cmh_hash_free_reqctx(rctx);
+       cmh_complete(&req->base, error);
+}
+
+/*
+ * Submit the final VCQ transaction:
+ *   INIT [+ RESTORE] [+ UPDATE(residual)] + FINAL + FLUSH
+ *
+ * @data_buf: linearised residual data, or NULL for empty-hash.
+ *            Ownership transferred -- callback frees it.
+ * @data_len: bytes in data_buf.
+ */
+static int cmh_hash_submit_final(struct ahash_request *req,
+                                u8 *data_buf, u32 data_len)
+{
+       struct cmh_hash_reqctx *rctx = ahash_request_ctx(req);
+       const struct cmh_hash_alg_info *info = rctx->info;
+       struct vcq_cmd cmds[CMH_HASH_MAX_PAYLOAD];
+       struct core_dispatch d;
+       u32 idx;
+       int ret;
+       gfp_t gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+                  GFP_KERNEL : GFP_ATOMIC;
+
+       rctx->data_buf = data_buf;
+       rctx->data_len = data_len;
+
+       /* Allocate digest output buffer */
+       rctx->digest_buf = kzalloc(info->digest_size, gfp);
+       if (!rctx->digest_buf) {
+               ret = -ENOMEM;
+               goto err_free_data;
+       }
+
+       rctx->digest_dma = cmh_dma_map_single(rctx->digest_buf,
+                                             info->digest_size,
+                                              DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->digest_dma)) {
+               ret = -ENOMEM;
+               goto err_free_digest;
+       }
+
+       /* Map residual data for UPDATE */
+       rctx->data_dma = DMA_MAPPING_ERROR;
+       if (data_buf && data_len > 0) {
+               rctx->data_dma = cmh_dma_map_single(data_buf, data_len,
+                                                   DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->data_dma)) {
+                       ret = -ENOMEM;
+                       goto err_unmap_digest;
+               }
+       }
+
+       /* Map checkpoint for RESTORE */
+       rctx->ckpt_dma = DMA_MAPPING_ERROR;
+       if (rctx->has_checkpoint) {
+               rctx->ckpt_dma = cmh_dma_map_single(rctx->checkpoint,
+                                                   HC_CONTEXT_SIZE,
+                                                    DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->ckpt_dma)) {
+                       ret = -ENOMEM;
+                       goto err_unmap_data;
+               }
+       }
+
+       /* Build VCQ: INIT [+ RESTORE] [+ UPDATE] + FINAL + FLUSH */
+       d = cmh_core_select_instance(CMH_CORE_HC);
+       idx = 0;
+
+       vcq_add_hc_init(&cmds[idx++], d.core_id, info->hc_algo);
+
+       if (rctx->has_checkpoint)
+               vcq_add_hc_restore(&cmds[idx++], d.core_id,
+                                  (u64)rctx->ckpt_dma, HC_CONTEXT_SIZE);
+
+       if (data_buf && data_len > 0)
+               vcq_add_hc_update(&cmds[idx++], d.core_id,
+                                 (u64)rctx->data_dma, data_len);
+
+       vcq_add_hc_final(&cmds[idx++], d.core_id,
+                        (u64)rctx->digest_dma, info->digest_size);
+
+       vcq_add_flush(&cmds[idx++], d.core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_HASH_MAX_PACKED,
+                                           d.mbx_idx,
+                                           cmh_hash_final_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto err_unmap_ckpt;
+
+       return -EINPROGRESS;
+
+err_unmap_ckpt:
+       if (rctx->has_checkpoint)
+               cmh_dma_unmap_single(rctx->ckpt_dma, HC_CONTEXT_SIZE,
+                                    DMA_TO_DEVICE);
+err_unmap_data:
+       if (data_buf && data_len > 0)
+               cmh_dma_unmap_single(rctx->data_dma, data_len,
+                                    DMA_TO_DEVICE);
+err_unmap_digest:
+       cmh_dma_unmap_single(rctx->digest_dma, info->digest_size,
+                            DMA_FROM_DEVICE);
+err_free_digest:
+       kfree(rctx->digest_buf);
+       rctx->digest_buf = NULL;
+err_free_data:
+       kfree(data_buf);
+       rctx->data_buf = NULL;
+       cmh_hash_free_reqctx(rctx);
+       return ret;
+}
+
+static int cmh_hash_final(struct ahash_request *req)
+{
+       struct cmh_hash_reqctx *rctx = ahash_request_ctx(req);
+       u8 *data_buf = NULL;
+       u32 data_len = 0;
+       gfp_t gfp;
+
+       if (rctx->error)
+               return rctx->error;
+
+       if (rctx->buf_len > 0) {
+               gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+                     GFP_KERNEL : GFP_ATOMIC;
+               data_buf = kmalloc(rctx->buf_len, gfp);
+               if (!data_buf)
+                       return -ENOMEM;
+               memcpy(data_buf, rctx->buf, rctx->buf_len);
+               data_len = rctx->buf_len;
+               rctx->buf_len = 0;
+       }
+
+       return cmh_hash_submit_final(req, data_buf, data_len);
+}
+
+static int cmh_hash_finup(struct ahash_request *req);
+
+/*
+ * One-shot digest -- delegates to init + finup so that all data is
+ * linearised and mapped through cmh_dma_map_single(), which is the
+ * only DMA mapping path aware of all supported DMA backends.
+ */
+static int cmh_hash_digest(struct ahash_request *req)
+{
+       int ret;
+
+       ret = cmh_hash_init(req);
+       if (ret)
+               return ret;
+       return cmh_hash_finup(req);
+}
+
+/*
+ * .finup -- update + final combined into a single transaction.
+ *
+ * Linearises the holdback buffer + new data and submits everything
+ * through the final path.  Avoids the kernel's ahash_def_finup()
+ * which would allocate a subrequest and clone via export/import.
+ */
+static int cmh_hash_finup(struct ahash_request *req)
+{
+       struct cmh_hash_reqctx *rctx = ahash_request_ctx(req);
+       u32 data_len;
+       u8 *data_buf;
+       gfp_t gfp;
+
+       if (rctx->error)
+               return rctx->error;
+
+       data_len = rctx->buf_len + req->nbytes;
+
+       if (data_len == 0)
+               return cmh_hash_submit_final(req, NULL, 0);
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       data_buf = kmalloc(data_len, gfp);
+       if (!data_buf)
+               return -ENOMEM;
+
+       if (rctx->buf_len > 0)
+               memcpy(data_buf, rctx->buf, rctx->buf_len);
+
+       if (req->nbytes > 0) {
+               if (req->base.flags & CRYPTO_AHASH_REQ_VIRT)
+                       memcpy(data_buf + rctx->buf_len,
+                              req->svirt, req->nbytes);
+               else
+                       scatterwalk_map_and_copy(data_buf + rctx->buf_len,
+                                                req->src, 0,
+                                                req->nbytes, 0);
+       }
+
+       rctx->buf_len = 0;
+       return cmh_hash_submit_final(req, data_buf, data_len);
+}
+
+/*
+ * Export -- purely software.
+ *
+ * Serialise the HC checkpoint (if any) and holdback buffer into the
+ * export state structure.  No HW interaction needed because the
+ * incremental model keeps checkpoint up-to-date after each .update().
+ */
+static int cmh_hash_export(struct ahash_request *req, void *out)
+{
+       struct cmh_hash_reqctx *rctx = ahash_request_ctx(req);
+       struct cmh_hash_export_state *state = out;
+
+       if (rctx->hw_started && rctx->has_checkpoint)
+               memcpy(state->checkpoint, rctx->checkpoint, HC_CONTEXT_SIZE);
+       else
+               memset(state->checkpoint, 0, HC_CONTEXT_SIZE);
+
+       if (rctx->buf_len > 0)
+               memcpy(state->buf, rctx->buf, rctx->buf_len);
+
+       state->buf_len = rctx->buf_len;
+       state->hw_started = rctx->hw_started;
+
+       return 0;
+}
+
+/*
+ * Import -- purely software.
+ *
+ * Restore checkpoint and holdback from a previously exported state.
+ * The next .update() or .final() will RESTORE the checkpoint into HW.
+ */
+static int cmh_hash_import(struct ahash_request *req, const void *in)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_hash_reqctx *rctx = ahash_request_ctx(req);
+       const struct cmh_hash_export_state *state = in;
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->info = cmh_hash_get_info(tfm);
+
+       if (state->buf_len > CMH_HASH_MAX_BLOCK)
+               return -EINVAL;
+
+       rctx->hw_started = state->hw_started;
+       rctx->buf_len = state->buf_len;
+       memcpy(rctx->buf, state->buf, state->buf_len);
+
+       if (state->hw_started) {
+               memcpy(rctx->checkpoint, state->checkpoint, HC_CONTEXT_SIZE);
+               rctx->has_checkpoint = 1;
+       }
+
+       return 0;
+}
+
+/* Transform init (cra_init) -- set per-request context size */
+
+static int cmh_hash_cra_init(struct crypto_tfm *tfm)
+{
+       crypto_ahash_set_reqsize(__crypto_ahash_cast(tfm),
+                                sizeof(struct cmh_hash_reqctx));
+       return 0;
+}
+
+/* Registration */
+
+static struct cmh_hash_alg_drv cmh_hash_drvs[CMH_HASH_ALG_COUNT];
+
+/**
+ * cmh_hash_register() - Register SHA-2, SHA-3 and SHAKE ahash algorithms
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_hash_register(void)
+{
+       unsigned int i;
+       int ret;
+
+       for (i = 0; i < CMH_HASH_ALG_COUNT; i++) {
+               const struct cmh_hash_alg_info *info = &cmh_hash_algs_info[i];
+               struct cmh_hash_alg_drv *drv = &cmh_hash_drvs[i];
+               struct ahash_alg *alg = &drv->alg;
+
+               drv->info = info;
+
+               alg->init   = cmh_hash_init;
+               alg->update = cmh_hash_update;
+               alg->final  = cmh_hash_final;
+               alg->finup  = cmh_hash_finup;
+               alg->digest = cmh_hash_digest;
+               alg->export = cmh_hash_export;
+               alg->import = cmh_hash_import;
+
+               alg->halg.digestsize = info->digest_size;
+               alg->halg.statesize  = sizeof(struct cmh_hash_export_state);
+
+               strscpy(alg->halg.base.cra_name, info->alg_name,
+                       CRYPTO_MAX_ALG_NAME);
+               strscpy(alg->halg.base.cra_driver_name, info->drv_name,
+                       CRYPTO_MAX_ALG_NAME);
+               alg->halg.base.cra_priority    = 300;
+               alg->halg.base.cra_flags       = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                                CRYPTO_ALG_NO_FALLBACK |
+                                                CRYPTO_ALG_ASYNC |
+                                                CRYPTO_ALG_REQ_VIRT;
+               alg->halg.base.cra_blocksize   = info->block_size;
+               alg->halg.base.cra_ctxsize     = 0;
+               alg->halg.base.cra_init        = cmh_hash_cra_init;
+               alg->halg.base.cra_module      = THIS_MODULE;
+
+               ret = crypto_register_ahash(alg);
+               if (ret) {
+                       dev_err(cmh_dev(), "hash: failed to register %s (rc=%d)\n",
+                               info->drv_name, ret);
+                       /* Unregister any already-registered algorithms */
+                       while (i--)
+                               crypto_unregister_ahash(&cmh_hash_drvs[i].alg);
+                       return ret;
+               }
+
+               dev_dbg(cmh_dev(), "hash: registered %s (priority 300)\n",
+                       info->drv_name);
+       }
+
+       dev_info(cmh_dev(), "hash: %zu algorithm(s) registered\n",
+                CMH_HASH_ALG_COUNT);
+       return 0;
+}
+
+/**
+ * cmh_hash_unregister() - Unregister all CMH hash algorithms from the crypto framework
+ */
+void cmh_hash_unregister(void)
+{
+       unsigned int i;
+
+       for (i = 0; i < CMH_HASH_ALG_COUNT; i++) {
+               crypto_unregister_ahash(&cmh_hash_drvs[i].alg);
+               dev_dbg(cmh_dev(), "hash: unregistered %s\n",
+                       cmh_hash_algs_info[i].drv_name);
+       }
+
+       dev_info(cmh_dev(), "hash: cleaned up\n");
+}
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index 307bd7dd304b..e8e30b893932 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -29,6 +29,7 @@
 #include "cmh_mqi.h"
 #include "cmh_txn.h"
 #include "cmh_rh.h"
+#include "cmh_hash.h"
 #include "cmh_mgmt.h"
 #include "cmh_registers.h"
 #include "cmh_debugfs.h"
@@ -191,6 +192,11 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_rh_init;

+       /* Register hash algorithms with the kernel crypto API */
+       ret = cmh_hash_register();
+       if (ret)
+               goto err_hash_register;
+
        /* Register key management device (/dev/cmh_mgmt) */
        ret = cmh_mgmt_register();
        if (ret)
@@ -203,6 +209,8 @@ static int cmh_probe(struct platform_device *pdev)
        return 0;

 err_mgmt_register:
+       cmh_hash_unregister();
+err_hash_register:
        cmh_rh_cleanup(cfg);
 err_rh_init:
        cmh_tm_cleanup();
@@ -229,6 +237,7 @@ static void cmh_remove(struct platform_device *pdev)
        cfg = &dev->config;

        cmh_mgmt_unregister();
+       cmh_hash_unregister();
        cmh_rh_cleanup(cfg);
        cmh_tm_cleanup();
        cmh_mqi_cleanup(cfg);
diff --git a/drivers/crypto/cmh/include/cmh_hash.h b/drivers/crypto/cmh/include/cmh_hash.h
new file mode 100644
index 000000000000..bf17d3af7787
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_hash.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API Hash Driver
+ *
+ * Registers ahash algorithms (SHA-2, SHA-3, and SHAKE families) with the
+ * Linux crypto subsystem.  Uses an incremental HW update model:
+ *
+ *   .init()   -> software-only: zero per-request context
+ *   .update() -> holdback partial blocks; submit full blocks via
+ *                INIT [+ RESTORE] + UPDATE + SAVE + FLUSH
+ *   .final()  -> INIT [+ RESTORE] [+ UPDATE(residual)] + FINAL + FLUSH
+ *   .digest() -> INIT + UPDATE + FINAL + FLUSH (single-shot)
+ *   .export() -> software-only: copy checkpoint + holdback
+ *   .import() -> software-only: restore checkpoint + holdback
+ */
+
+#ifndef CMH_HASH_H
+#define CMH_HASH_H
+
+#include "cmh_config.h"
+
+int  cmh_hash_register(void);
+void cmh_hash_unregister(void);
+
+#endif /* CMH_HASH_H */
--
2.43.7
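
For readers following the holdback logic in cmh_hash_update(), the split
between hardware-bound bytes and the retained tail can be sketched in
isolation.  This is a minimal userspace sketch, not driver code:
struct hb_split and hb_split_compute() are hypothetical names introduced
here, and the sketch assumes block_size > 0 and buf_len < block_size,
which is what the driver's holdback buffer is meant to guarantee.

```c
#include <stdint.h>

/* Hypothetical result type, introduced only for this illustration. */
struct hb_split {
	uint32_t full_len;	/* bytes submitted to hardware (whole blocks) */
	uint32_t tail_len;	/* bytes left in the holdback buffer */
	uint32_t from_src;	/* bytes of the new input sent to hardware */
};

/*
 * Mirrors the arithmetic in cmh_hash_update().
 * Assumption: block_size > 0 and buf_len < block_size.
 */
static struct hb_split hb_split_compute(uint32_t buf_len, uint32_t nbytes,
					uint32_t block_size)
{
	struct hb_split s = { 0, 0, 0 };
	uint32_t total_avail = buf_len + nbytes;

	if (total_avail < block_size) {
		/* Buffer-only case: nothing is submitted to hardware. */
		s.tail_len = total_avail;
		return s;
	}

	s.full_len = total_avail - total_avail % block_size;
	s.tail_len = total_avail - s.full_len;
	s.from_src = s.full_len - buf_len;
	return s;
}
```

With 10 bytes held back, 200 new bytes and a 64-byte block, 192 bytes go
to hardware (182 of them from the new input) and 18 stay in the holdback.
With block_size 1, as used for the SHAKE entries, the tail is always empty.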



* [PATCH 18/19] selftests: crypto: cmh - add kselftest for management ioctl
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Add a minimal kselftest exercising the /dev/cmh_mgmt ioctl interface:

  - open/close the device node
  - invalid ioctl fails with ENOTTY
  - bad version field fails with EINVAL
  - KEY_NEW + KEY_DELETE lifecycle
  - KIC HKDF1 key derivation
  - ML-KEM-768 keygen via hardware RNG
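
The ML-KEM keygen test needs caller-provided buffers for the
encapsulation key (ek) and decapsulation key (dk).  As a rough sketch of
where the 1184 and 2400 byte sizes in the test come from, the FIPS 203
size formulas can be written as below.  The helper names are
illustrative assumptions and are not part of the CMH UAPI.

```c
#include <stddef.h>

/*
 * ML-KEM key sizes in bytes as a function of the module rank k
 * (k = 2, 3, 4 for ML-KEM-512, -768, -1024), following FIPS 203.
 * Hypothetical helpers for illustration only.
 */
static size_t ml_kem_ek_bytes(unsigned int k)
{
	/* k packed polynomials of 384 bytes each, plus a 32-byte seed */
	return 384u * k + 32u;
}

static size_t ml_kem_dk_bytes(unsigned int k)
{
	/* dk_PKE (384k) + ek (384k + 32) + H(ek) (32) + z (32) */
	return 768u * k + 96u;
}
```

For ML-KEM-768 (k = 3) this gives 1184 bytes for ek and 2400 bytes for
dk, matching the buffer sizes used by the test.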

Tests use the kselftest_harness.h fixture framework and output TAP.
Tests that require hardware features not present on the device under
test are gracefully skipped (SKIP).
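
The skip policy applied when opening the device node can be summarised
as a small decision function.  This is only a sketch of the policy in
FIXTURE_SETUP, assuming nothing beyond what the test does: the enum and
classify_open() are hypothetical names, and the real test expresses the
same decisions with SKIP() and ASSERT_GE() from kselftest_harness.h.

```c
#include <errno.h>

/* Hypothetical outcome type; the harness itself has no such enum. */
enum setup_outcome { SETUP_RUN, SETUP_SKIP, SETUP_FAIL };

/*
 * Classify the result of opening /dev/cmh_mgmt: a missing node
 * (ENOENT) or missing privilege (EACCES) is a skip, any other
 * failure is a test failure.
 */
static enum setup_outcome classify_open(int fd, int saved_errno)
{
	if (fd >= 0)
		return SETUP_RUN;
	if (saved_errno == ENOENT || saved_errno == EACCES)
		return SETUP_SKIP;
	return SETUP_FAIL;
}
```

Treating only ENOENT and EACCES as skips keeps unexpected failures
visible as real test failures instead of hiding them behind SKIP.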

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 .../selftests/drivers/crypto/cmh/Makefile     |   6 +
 .../drivers/crypto/cmh/cmh_mgmt_test.c        | 183 ++++++++++++++++++
 .../selftests/drivers/crypto/cmh/config       |   1 +
 3 files changed, 190 insertions(+)
 create mode 100644 tools/testing/selftests/drivers/crypto/cmh/Makefile
 create mode 100644 tools/testing/selftests/drivers/crypto/cmh/cmh_mgmt_test.c
 create mode 100644 tools/testing/selftests/drivers/crypto/cmh/config

diff --git a/tools/testing/selftests/drivers/crypto/cmh/Makefile b/tools/testing/selftests/drivers/crypto/cmh/Makefile
new file mode 100644
index 000000000000..86cb63839b27
--- /dev/null
+++ b/tools/testing/selftests/drivers/crypto/cmh/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+TEST_GEN_PROGS := cmh_mgmt_test
+
+CFLAGS += -Wall -Wno-misleading-indentation -O2 $(KHDR_INCLUDES)
+
+include ../../../lib.mk
diff --git a/tools/testing/selftests/drivers/crypto/cmh/cmh_mgmt_test.c b/tools/testing/selftests/drivers/crypto/cmh/cmh_mgmt_test.c
new file mode 100644
index 000000000000..4514b5a1349a
--- /dev/null
+++ b/tools/testing/selftests/drivers/crypto/cmh/cmh_mgmt_test.c
@@ -0,0 +1,183 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Kselftest for /dev/cmh_mgmt ioctl interface.
+ *
+ * Tests basic ioctl operations on the CRI CryptoManager Hub management
+ * device.  Requires the cmh module loaded on real or emulated hardware.
+ *
+ * Run:  ./cmh_mgmt_test
+ * Output: TAP format (compatible with kselftest harness)
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+
+#include "kselftest_harness.h"
+#include <linux/cmh_mgmt_ioctl.h>
+
+#define CMH_DEV "/dev/cmh_mgmt"
+
+FIXTURE(cmh_mgmt)
+{
+       int fd;
+};
+
+FIXTURE_SETUP(cmh_mgmt)
+{
+       self->fd = open(CMH_DEV, O_RDWR);
+       if (self->fd < 0 && errno == ENOENT)
+               SKIP(return, "Device " CMH_DEV " not present (module not loaded?)");
+       if (self->fd < 0 && errno == EACCES)
+               SKIP(return, "Permission denied -- run as root or with CAP_SYS_ADMIN");
+       ASSERT_GE(self->fd, 0);
+}
+
+FIXTURE_TEARDOWN(cmh_mgmt)
+{
+       if (self->fd >= 0)
+               close(self->fd);
+}
+
+/*
+ * Test 1: open and close succeed.
+ * If we get here, FIXTURE_SETUP already validated the open.
+ */
+TEST_F(cmh_mgmt, open_close)
+{
+       ASSERT_GE(self->fd, 0);
+}
+
+/*
+ * Test 2: invalid ioctl number fails with errno ENOTTY.
+ */
+TEST_F(cmh_mgmt, invalid_ioctl)
+{
+       int ret;
+       unsigned long bogus_cmd = _IOC(_IOC_READ, 'J', 0xFF, 4);
+
+       ret = ioctl(self->fd, bogus_cmd, NULL);
+       ASSERT_EQ(ret, -1);
+       ASSERT_EQ(errno, ENOTTY);
+}
+
+/*
+ * Test 3: KEY_NEW with bad version field fails with errno EINVAL.
+ */
+TEST_F(cmh_mgmt, bad_version)
+{
+       struct cmh_ioctl_key_new req;
+       int ret;
+
+       memset(&req, 0, sizeof(req));
+       req.version = 0; /* invalid */
+       req.ds_type = CMH_DS_AES_KEY;
+       req.len = 32;
+       req.flags = CMH_FLAG_PT;
+       req.cid = 0xDEAD;
+
+       ret = ioctl(self->fd, CMH_IOCTL_KEY_NEW, &req);
+       ASSERT_EQ(ret, -1);
+       ASSERT_EQ(errno, EINVAL);
+}
+
+/*
+ * Test 4: KEY_NEW creates a key, KEY_DELETE destroys it.
+ */
+TEST_F(cmh_mgmt, key_new_delete)
+{
+       struct cmh_ioctl_key_new new_req;
+       struct cmh_ioctl_key_grant del_req;
+       int ret;
+
+       memset(&new_req, 0, sizeof(new_req));
+       new_req.version = CMH_MGMT_V1;
+       new_req.ds_type = CMH_DS_AES_KEY;
+       new_req.len = 32;
+       new_req.flags = CMH_FLAG_PT;
+       new_req.cid = 0x5E1F7E57ULL; /* "SELFTEST" */
+
+       ret = ioctl(self->fd, CMH_IOCTL_KEY_NEW, &new_req);
+       ASSERT_EQ(ret, 0);
+       ASSERT_NE(new_req.ref, (uint64_t)0);
+
+       /* Delete the key */
+       memset(&del_req, 0, sizeof(del_req));
+       del_req.version = CMH_MGMT_V1;
+       del_req.ref = new_req.ref;
+
+       ret = ioctl(self->fd, CMH_IOCTL_KEY_DELETE, &del_req);
+       ASSERT_EQ(ret, 0);
+}
+
+/*
+ * Test 5: KIC HKDF1 key derivation from hardware base key.
+ * Requires at least one KIC base key provisioned (KIC_KEY1).
+ */
+TEST_F(cmh_mgmt, kic_hkdf1)
+{
+       struct cmh_ioctl_kic_hkdf1 req;
+       static const char label[] = "kselftest-label";
+       int ret;
+
+       memset(&req, 0, sizeof(req));
+       req.version = CMH_MGMT_V1;
+       req.key_len = 32;
+       req.base_key = CMH_KIC_KEY1;
+       req.cid = 0x4B534C46ULL; /* "KSLF" */
+       req.label = (uint64_t)(uintptr_t)label;
+       req.label_len = sizeof(label) - 1;
+       req.flags = CMH_KIC_FLAG_TEMP;
+
+       ret = ioctl(self->fd, CMH_IOCTL_KIC_HKDF1, &req);
+       if (ret < 0 && errno == EIO)
+               SKIP(return, "KIC base key 1 not provisioned on this device");
+       ASSERT_EQ(ret, 0);
+       ASSERT_NE(req.ref, (uint64_t)0);
+}
+
+/*
+ * Test 6: ML-KEM-768 keygen using hardware RNG.
+ * Verifies the PQC keygen path end-to-end.
+ */
+TEST_F(cmh_mgmt, ml_kem_keygen)
+{
+       struct cmh_ioctl_ml_kem_keygen req;
+       /* ML-KEM-768: ek = 384*3+32 = 1184, dk = 768*3+96 = 2400 */
+       uint8_t ek[1184];
+       uint8_t dk[2400];
+       int ret;
+
+       memset(&req, 0, sizeof(req));
+       req.version = CMH_MGMT_V1;
+       req.k = 3; /* ML-KEM-768 */
+       req.flags = CMH_QSE_FLAG_HW_RNG;
+       req.seed = 0; /* HW RNG */
+       req.z = 0;    /* HW RNG */
+       req.ek = (uint64_t)(uintptr_t)ek;
+       req.dk = (uint64_t)(uintptr_t)dk;
+       req.dk_cid = 0;
+       req.dk_ref = 0;
+
+       memset(ek, 0, sizeof(ek));
+       memset(dk, 0, sizeof(dk));
+
+       ret = ioctl(self->fd, CMH_IOCTL_ML_KEM_KEYGEN, &req);
+       if (ret < 0 && errno == ENODEV)
+               SKIP(return, "QSE core not available on this hardware");
+       ASSERT_EQ(ret, 0);
+
+       /* ek must not be all zero; sample the first 64 bytes */
+       {
+               int i, nonzero = 0;
+
+               for (i = 0; i < 64; i++)
+                       nonzero += (ek[i] != 0);
+               ASSERT_GT(nonzero, 0);
+       }
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/drivers/crypto/cmh/config b/tools/testing/selftests/drivers/crypto/cmh/config
new file mode 100644
index 000000000000..063c1dd0e23b
--- /dev/null
+++ b/tools/testing/selftests/drivers/crypto/cmh/config
@@ -0,0 +1 @@
+CONFIG_CRYPTO_DEV_CMH=m
--
2.43.7
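
The buffer sizes hard-coded in the ml_kem_keygen test follow from the
ML-KEM rank k. The sketch below restates the arithmetic from the test
comment for the three standard ranks. The helper names are hypothetical
and are not part of the cmh uapi header.

```c
#include <assert.h>
#include <stddef.h>

/*
 * ML-KEM key sizes in bytes as a function of the module rank k
 * (k = 2, 3, 4 for ML-KEM-512, -768, -1024). Hypothetical helpers
 * that mirror the arithmetic in the selftest comment.
 */
static size_t ml_kem_ek_len(unsigned int k)
{
	assert(k >= 2 && k <= 4);
	return 384u * k + 32u;	/* encapsulation key */
}

static size_t ml_kem_dk_len(unsigned int k)
{
	assert(k >= 2 && k <= 4);
	return 768u * k + 96u;	/* decapsulation key */
}
```

With k = 3 these return the 1184 and 2400 bytes used for ek[] and dk[]
in the test, so a test for another rank could size its buffers the
same way.
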


** This message and any attachments are for the sole use of the intended recipient(s). It may contain information that is confidential and privileged. If you are not the intended recipient of this message, you are prohibited from printing, copying, forwarding or saving it. Please delete the message and attachments and notify the sender immediately. **

Rambus Inc.<http://www.rambus.com>

^ permalink raw reply related

* [PATCH 14/19] crypto: cmh - add ECDH/X25519 kpp
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register ECDH and X25519 kpp algorithms using the CMH PKE core.
Supports P-256, P-384, and Curve25519 for key agreement.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 drivers/crypto/cmh/Makefile       |   3 +-
 drivers/crypto/cmh/cmh_main.c     |   8 +
 drivers/crypto/cmh/cmh_pke_ecdh.c | 698 ++++++++++++++++++++++++++++++
 3 files changed, 708 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_pke_ecdh.c
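
Note on the set_secret encoding for NIST curves: the buffer is a
kpp_secret header, a u16 key_size, then the raw scalar. A minimal
userspace sketch of the same validation order as
cmh_ecdh_set_secret_nist() follows. The struct and macro names are
stand-ins, and the numeric value used for the ECDH secret type is an
assumption about the enum in <crypto/kpp.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins; the real definitions live in <crypto/kpp.h>. */
struct kpp_secret_hdr {
	unsigned short type;
	unsigned short len;
};
#define SECRET_TYPE_ECDH 2	/* assumed value of CRYPTO_KPP_SECRET_TYPE_ECDH */
#define SECRET_MIN_SIZE \
	(sizeof(struct kpp_secret_hdr) + sizeof(unsigned short))

/*
 * Validate an encoded ECDH secret for a curve with clen-byte coordinates.
 * Returns the private key length (0 means "caller generates a key"),
 * or -EINVAL for a malformed buffer. On success with a non-zero length,
 * *key points at the scalar inside buf.
 */
static int ecdh_secret_decode(const void *buf, size_t len, size_t clen,
			      const unsigned char **key)
{
	const unsigned char *ptr = buf;
	struct kpp_secret_hdr hdr;
	unsigned short key_size;

	assert(key != NULL);
	if (!buf || len < SECRET_MIN_SIZE)
		return -EINVAL;

	memcpy(&hdr, ptr, sizeof(hdr));
	ptr += sizeof(hdr);
	if (hdr.type != SECRET_TYPE_ECDH || len < hdr.len)
		return -EINVAL;

	memcpy(&key_size, ptr, sizeof(key_size));
	ptr += sizeof(key_size);

	if (key_size == 0) {
		/* Header only: no scalar may follow. */
		if (hdr.len != SECRET_MIN_SIZE)
			return -EINVAL;
		*key = NULL;
		return 0;
	}

	/* The scalar must match the curve and the declared total length. */
	if (key_size != clen || hdr.len != SECRET_MIN_SIZE + key_size)
		return -EINVAL;

	*key = ptr;
	return key_size;
}
```

The header is copied out with memcpy() rather than cast in place, so
the caller's buffer does not need any particular alignment.
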

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index fdbf66b13628..a4cea0a56fc1 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -32,7 +32,8 @@ cmh-y := \
        cmh_rng.o \
        cmh_pke_common.o \
        cmh_pke_rsa.o \
-       cmh_pke_ecdsa.o
+       cmh_pke_ecdsa.o \
+       cmh_pke_ecdh.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index 939ff5007755..ea0f32b941f5 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -286,6 +286,11 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_pke_ecdsa_register;

+       /* Register PKE ECDH/X25519 kpp */
+       ret = cmh_pke_ecdh_register();
+       if (ret)
+               goto err_pke_ecdh_register;
+
        /* Register key management device (/dev/cmh_mgmt) */
        ret = cmh_mgmt_register();
        if (ret)
@@ -298,6 +303,8 @@ static int cmh_probe(struct platform_device *pdev)
        return 0;

 err_mgmt_register:
+       cmh_pke_ecdh_unregister();
+err_pke_ecdh_register:
        cmh_pke_ecdsa_unregister();
 err_pke_ecdsa_register:
        cmh_pke_rsa_unregister();
@@ -358,6 +365,7 @@ static void cmh_remove(struct platform_device *pdev)
        cfg = &dev->config;

        cmh_mgmt_unregister();
+       cmh_pke_ecdh_unregister();
        cmh_pke_ecdsa_unregister();
        cmh_pke_rsa_unregister();
        cmh_ccp_poly_unregister();
diff --git a/drivers/crypto/cmh/cmh_pke_ecdh.c b/drivers/crypto/cmh/cmh_pke_ecdh.c
new file mode 100644
index 000000000000..d8b821cc4217
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_pke_ecdh.c
@@ -0,0 +1,698 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- ECDH / X25519 kpp Driver
+ *
+ * Registers "ecdh-nist-p256", "ecdh-nist-p384", and "curve25519"
+ * kpp algorithms with priority 300.
+ *
+ * - set_secret: decodes private key from kpp_secret + ecdh struct
+ *   (NIST curves) or raw 32-byte scalar (Curve25519).
+ *   Stores in cmh_key_ctx: raw keys written via SYS_REF_TEMP.
+ *   Datastore-referenced keys are only reachable through the ioctl
+ *   path (cmh_mgmt.c).
+ *
+ * - generate_public_key: PKE_CMD_ECDH_KEYGEN -> outputs X coordinate
+ *   (NIST Weierstrass) or full public key (Edwards/Montgomery).
+ *   For NIST curves, we generate X||Y by calling ECDSA_PUBGEN instead,
+ *   matching the kernel ecdh.c pattern that outputs uncompressed X||Y.
+ *
+ * - compute_shared_secret: PKE_CMD_ECDH -> shared secret X coordinate.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/scatterlist.h>
+#include <crypto/kpp.h>
+#include <crypto/ecdh.h>
+#include <crypto/internal/kpp.h>
+#include <crypto/internal/ecc.h>
+
+#include "cmh_pke.h"
+#include "cmh_sys.h"
+#include "cmh_sys_abi.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+/*
+ * ECDH key format: kpp_secret header + key_size(u16) + key data.
+ * We decode this inline to avoid depending on CONFIG_CRYPTO_ECDH.
+ */
+#define ECDH_KPP_SECRET_MIN_SIZE (sizeof(struct kpp_secret) + sizeof(unsigned short))
+
+struct cmh_ecdh_tfm_ctx {
+       struct cmh_key_ctx key;
+       u32 curve;              /* PKE_CURVE_* */
+       u32 clen;               /* coordinate length in bytes */
+};
+
+static inline struct cmh_ecdh_tfm_ctx *cmh_ecdh_ctx(struct crypto_kpp *tfm)
+{
+       return kpp_tfm_ctx(tfm);
+}
+
+/*
+ * Per-request context for ECDH/X25519 operations.
+ *
+ * generate_public_key: single-phase async VCQ.
+ * compute_shared_secret: 2-phase async VCQ with callback chaining.
+ *   Phase 1: sys_write(sk) + sys_new(ref) + ecdh(peer) + pflush
+ *            -> phase1 callback reads ref, submits Phase 2.
+ *   Phase 2: sys_data(ref, ss_dma) + sys_flush
+ *            -> phase2 callback extracts shared secret, completes req.
+ *
+ * Both phases target the same mbx_idx so the DS reference remains
+ * valid, since DS objects are MBX-scoped.
+ */
+struct cmh_ecdh_reqctx {
+       /* Buffers */
+       u8 *pk_buf;             /* keygen: output public key */
+       u8 *sk_buf;             /* private key copy */
+       u8 *peer_buf;           /* compute: peer public key */
+       u8 *ss_buf;             /* compute: shared secret output */
+       u64 *ref_buf;           /* compute: DS ref from Phase 1 */
+       /* DMA handles */
+       dma_addr_t pk_dma;
+       dma_addr_t sk_dma;
+       dma_addr_t peer_dma;
+       dma_addr_t ss_dma;
+       dma_addr_t ref_dma;
+       /* Sizes and params for Phase 2 re-submit */
+       u32 out_len;            /* keygen: public key size */
+       u32 clen;
+       u32 peer_len;
+       u32 sk_len;
+       u32 dma_swap;
+       int mbx_idx;            /* pinned MBX for Phase 2 */
+};
+
+/*
+ * set_secret: NIST curves decode kpp_secret + u16 key_size + raw scalar.
+ * Curve25519 uses raw 32-byte scalar directly.
+ */
+static int cmh_ecdh_set_secret_nist(struct crypto_kpp *tfm,
+                                   const void *buf, unsigned int len)
+{
+       struct cmh_ecdh_tfm_ctx *ctx = cmh_ecdh_ctx(tfm);
+       const u8 *ptr = buf;
+       struct kpp_secret secret;
+       unsigned short key_size;
+       int ret;
+
+       if (!buf || len < ECDH_KPP_SECRET_MIN_SIZE)
+               return -EINVAL;
+
+       memcpy(&secret, ptr, sizeof(secret));
+       ptr += sizeof(secret);
+
+       if (secret.type != CRYPTO_KPP_SECRET_TYPE_ECDH)
+               return -EINVAL;
+       if (len < secret.len)
+               return -EINVAL;
+
+       memcpy(&key_size, ptr, sizeof(key_size));
+       ptr += sizeof(key_size);
+
+       if (key_size == 0) {
+               /*
+                * key_size == 0: generate a validated random private key.
+                * Uses the kernel ECC library (FIPS 186-5 A.2.2) to ensure
+                * the scalar is in the valid range [2, n-3] for the curve.
+                */
+               u64 priv[ECC_MAX_DIGITS];
+               unsigned int ndigits = ctx->clen / sizeof(u64);
+               unsigned int curve_id;
+               u8 *rnd;
+
+               if (secret.len != ECDH_KPP_SECRET_MIN_SIZE)
+                       return -EINVAL;
+               if (ndigits > ECC_MAX_DIGITS)
+                       return -EINVAL;
+               /* Reject non-limb-aligned clen to prevent ndigits truncation */
+               if (ctx->clen % sizeof(u64))
+                       return -EINVAL;
+
+               if (ctx->curve == PKE_CURVE_P256)
+                       curve_id = ECC_CURVE_NIST_P256;
+               else if (ctx->curve == PKE_CURVE_P384)
+                       curve_id = ECC_CURVE_NIST_P384;
+               else
+                       return -EINVAL;
+
+               ret = ecc_gen_privkey(curve_id, ndigits, priv);
+               if (ret) {
+                       memzero_explicit(priv, sizeof(priv));
+                       return ret;
+               }
+
+               rnd = kmalloc(ctx->clen, GFP_KERNEL);
+               if (!rnd) {
+                       memzero_explicit(priv, sizeof(priv));
+                       return -ENOMEM;
+               }
+
+               /* Convert VLI (native LE-digit-order) to big-endian bytes */
+               ecc_swap_digits(priv, (u64 *)rnd, ndigits);
+               memzero_explicit(priv, sizeof(priv));
+
+               ret = cmh_key_setkey_raw(&ctx->key, rnd, ctx->clen,
+                                        CORE_ID_PKE);
+               kfree_sensitive(rnd);
+               return ret;
+       }
+
+       if (key_size != ctx->clen)
+               return -EINVAL;
+
+       if (secret.len != ECDH_KPP_SECRET_MIN_SIZE + key_size)
+               return -EINVAL;
+
+       return cmh_key_setkey_raw(&ctx->key, ptr, key_size, CORE_ID_PKE);
+}
+
+static int cmh_ecdh_set_secret_x25519(struct crypto_kpp *tfm,
+                                     const void *buf, unsigned int len)
+{
+       struct cmh_ecdh_tfm_ctx *ctx = cmh_ecdh_ctx(tfm);
+
+       if (len != pke_curve_clen(PKE_CURVE_25519))
+               return -EINVAL;
+
+       return cmh_key_setkey_raw(&ctx->key, buf, len, CORE_ID_PKE);
+}
+
+static void cmh_ecdh_keygen_complete(void *data, int error)
+{
+       struct kpp_request *req = data;
+       struct cmh_ecdh_reqctx *rctx = kpp_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       if (!cmh_dma_map_error(rctx->sk_dma))
+               cmh_dma_unmap_single(rctx->sk_dma, rctx->sk_len,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->pk_dma))
+               cmh_dma_unmap_single(rctx->pk_dma, rctx->out_len,
+                                    DMA_FROM_DEVICE);
+
+       if (!error) {
+               int nents;
+
+               nents = sg_nents_for_len(req->dst, rctx->out_len);
+               if (nents < 0 ||
+                   sg_copy_from_buffer(req->dst, nents,
+                                       rctx->pk_buf,
+                                       rctx->out_len) != rctx->out_len)
+                       error = -EINVAL;
+               else
+                       req->dst_len = rctx->out_len;
+       }
+
+       kfree_sensitive(rctx->sk_buf);
+       rctx->sk_buf = NULL;
+       kfree(rctx->pk_buf);
+       rctx->pk_buf = NULL;
+       cmh_complete(&req->base, error);
+}
+
+/*
+ * generate_public_key: For NIST curves, ECDH_KEYGEN outputs only the
+ * public key X-coordinate, but the kernel kpp interface expects
+ * uncompressed X||Y, so we use ECDSA_PUBGEN, which returns (X,Y).
+ * For Curve25519, ECDH_KEYGEN returns the Montgomery u-coordinate,
+ * which is the full public key.
+ */
+static int cmh_ecdh_generate_public_key(struct kpp_request *req)
+{
+       struct crypto_kpp *tfm = crypto_kpp_reqtfm(req);
+       struct cmh_ecdh_tfm_ctx *ctx = cmh_ecdh_ctx(tfm);
+       struct cmh_ecdh_reqctx *rctx = kpp_request_ctx(req);
+       u32 clen = ctx->clen;
+       bool is_25519 = (ctx->curve == PKE_CURVE_25519);
+       u32 out_len = is_25519 ? clen : 2 * clen;
+       struct vcq_cmd vcq[PKE_VCQ_CMDS_MAX];
+       struct core_dispatch dd;
+       u32 swap, dma_swap;
+       int ret, idx;
+       gfp_t gfp;
+
+       if (ctx->key.mode != CMH_KEY_RAW)
+               return -EINVAL;
+       if (req->dst_len < out_len)
+               return -EINVAL;
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->out_len = out_len;
+       rctx->sk_len = ctx->key.raw.len;
+       rctx->pk_dma = DMA_MAPPING_ERROR;
+       rctx->sk_dma = DMA_MAPPING_ERROR;
+
+       rctx->pk_buf = kzalloc(out_len, gfp);
+       if (!rctx->pk_buf)
+               return -ENOMEM;
+
+       rctx->pk_dma = cmh_dma_map_single(rctx->pk_buf, out_len,
+                                         DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->pk_dma)) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       swap = PKE_SWAP_FLAGS;
+       dma_swap = pke_swap_flags(ctx->curve);
+
+       dd = cmh_core_select_instance(CMH_CORE_PKE);
+
+       rctx->sk_buf = kmemdup(ctx->key.raw.data, ctx->key.raw.len, gfp);
+       if (!rctx->sk_buf) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+       rctx->sk_dma = cmh_dma_map_single(rctx->sk_buf, ctx->key.raw.len,
+                                         DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->sk_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], PKE_VCQ_CMDS_MAX);
+       idx = 1;
+       vcq_add_sys_write(&vcq[idx], SYS_REF_TEMP, rctx->sk_dma,
+                         SYS_REF_NONE, ctx->key.raw.len,
+                         ctx->key.raw.sys_type);
+       vcq[idx].id |= dma_swap;
+       idx++;
+       if (is_25519)
+               vcq_add_pke_ecdh_keygen(&vcq[idx++], dd.core_id, ctx->curve,
+                                       clen, rctx->pk_dma, SYS_REF_TEMP,
+                                       swap);
+       else
+               vcq_add_pke_ecdsa_pubgen(&vcq[idx++], dd.core_id,
+                                        ctx->curve, clen, rctx->pk_dma,
+                                        SYS_REF_TEMP, swap);
+       vcq_add_pke_flush(&vcq[idx++], dd.core_id);
+
+       ret = cmh_tm_submit_async(vcq, PKE_VCQ_CMDS_MAX, 1, dd.mbx_idx,
+                                 cmh_ecdh_keygen_complete, req,
+                                 !!(req->base.flags &
+                                    CRYPTO_TFM_REQ_MAY_BACKLOG), 0);
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (!ret)
+               return -EINPROGRESS;
+
+out_unmap:
+       if (!cmh_dma_map_error(rctx->sk_dma))
+               cmh_dma_unmap_single(rctx->sk_dma, ctx->key.raw.len,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->pk_dma))
+               cmh_dma_unmap_single(rctx->pk_dma, out_len,
+                                    DMA_FROM_DEVICE);
+
+out_free:
+       kfree_sensitive(rctx->sk_buf);
+       kfree(rctx->pk_buf);
+       return ret;
+}
+
+static void cmh_ecdh_ss_phase2_complete(void *data, int error)
+{
+       struct kpp_request *req = data;
+       struct cmh_ecdh_reqctx *rctx = kpp_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       if (!cmh_dma_map_error(rctx->ss_dma))
+               cmh_dma_unmap_single(rctx->ss_dma, rctx->clen,
+                                    DMA_FROM_DEVICE);
+
+       if (!error) {
+               int nents;
+
+               nents = sg_nents_for_len(req->dst, rctx->clen);
+               if (nents < 0 ||
+                   sg_copy_from_buffer(req->dst, nents,
+                                       rctx->ss_buf,
+                                       rctx->clen) != rctx->clen)
+                       error = -EINVAL;
+               else
+                       req->dst_len = rctx->clen;
+       }
+
+       kfree(rctx->ref_buf);
+       rctx->ref_buf = NULL;
+       kfree_sensitive(rctx->ss_buf);
+       rctx->ss_buf = NULL;
+       cmh_complete(&req->base, error);
+}
+
+static void cmh_ecdh_ss_phase1_complete(void *data, int error)
+{
+       struct kpp_request *req = data;
+       struct cmh_ecdh_reqctx *rctx = kpp_request_ctx(req);
+       struct vcq_cmd vcq[3];
+       int ret;
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       /* Phase 1-only resources: sk, peer -- always clean up */
+       if (!cmh_dma_map_error(rctx->sk_dma))
+               cmh_dma_unmap_single(rctx->sk_dma, rctx->sk_len,
+                                    DMA_TO_DEVICE);
+       kfree_sensitive(rctx->sk_buf);
+       rctx->sk_buf = NULL;
+
+       if (!cmh_dma_map_error(rctx->peer_dma))
+               cmh_dma_unmap_single(rctx->peer_dma, rctx->peer_len,
+                                    DMA_TO_DEVICE);
+       kfree(rctx->peer_buf);
+       rctx->peer_buf = NULL;
+
+       if (error)
+               goto out_cleanup;
+
+       /* Read the DS reference written by Phase 1 */
+       cmh_dma_sync_for_cpu(rctx->ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+       cmh_dma_unmap_single(rctx->ref_dma, sizeof(u64), DMA_FROM_DEVICE);
+       rctx->ref_dma = DMA_MAPPING_ERROR;
+
+       /* Phase 2: extract shared secret from DS */
+       vcq_set_header(&vcq[0], 3);
+       vcq_add_sys_data(&vcq[1], *rctx->ref_buf, rctx->ss_dma,
+                        rctx->clen);
+       vcq[1].id |= rctx->dma_swap;
+       vcq_add_sys_flush(&vcq[2]);
+
+       ret = cmh_tm_submit_async(vcq, 3, 1, rctx->mbx_idx,
+                                 cmh_ecdh_ss_phase2_complete, req,
+                                 true, 0);
+       if (ret == -EBUSY || !ret)
+               return;
+
+       error = ret;
+
+out_cleanup:
+       if (!cmh_dma_map_error(rctx->ref_dma))
+               cmh_dma_unmap_single(rctx->ref_dma, sizeof(u64),
+                                    DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(rctx->ss_dma))
+               cmh_dma_unmap_single(rctx->ss_dma, rctx->clen,
+                                    DMA_FROM_DEVICE);
+       kfree(rctx->ref_buf);
+       rctx->ref_buf = NULL;
+       kfree_sensitive(rctx->ss_buf);
+       rctx->ss_buf = NULL;
+       cmh_complete(&req->base, error);
+}
+
+/*
+ * compute_shared_secret: PKE_CMD_ECDH.
+ *
+ * req->src = peer public key (X||Y for NIST, raw 32B for Curve25519).
+ * Output = shared secret X coordinate (clen bytes).
+ *
+ * The CMH ECDH command stores the shared secret in a DS object,
+ * not directly to DMA.  We create a DS slot with SYS_CMD_NEW,
+ * reference it via SYS_REF_LAST, then extract the result with a
+ * second VCQ submission using SYS_CMD_DATA with the actual ref.
+ */
+static int cmh_ecdh_compute_shared_secret(struct kpp_request *req)
+{
+       struct crypto_kpp *tfm = crypto_kpp_reqtfm(req);
+       struct cmh_ecdh_tfm_ctx *ctx = cmh_ecdh_ctx(tfm);
+       struct cmh_ecdh_reqctx *rctx = kpp_request_ctx(req);
+       u32 clen = ctx->clen;
+       bool is_25519 = (ctx->curve == PKE_CURVE_25519);
+       u32 peer_len = is_25519 ? clen : 2 * clen;
+       u32 ss_type = SYS_TYPE_SET(SYS_TYPE_FLAG_PT, CORE_ID_PKE);
+       struct vcq_cmd vcq[5];
+       struct core_dispatch dd;
+       u32 swap, dma_swap;
+       int ret, idx, nents;
+       gfp_t gfp;
+
+       if (ctx->key.mode != CMH_KEY_RAW)
+               return -EINVAL;
+       if (req->src_len < peer_len || req->dst_len < clen)
+               return -EINVAL;
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->clen = clen;
+       rctx->peer_len = peer_len;
+       rctx->sk_len = ctx->key.raw.len;
+       rctx->pk_dma = DMA_MAPPING_ERROR;
+       rctx->sk_dma = DMA_MAPPING_ERROR;
+       rctx->peer_dma = DMA_MAPPING_ERROR;
+       rctx->ss_dma = DMA_MAPPING_ERROR;
+       rctx->ref_dma = DMA_MAPPING_ERROR;
+
+       rctx->peer_buf = kmalloc(peer_len, gfp);
+       rctx->ss_buf = kzalloc(clen, gfp);
+       rctx->ref_buf = kzalloc_obj(u64, gfp);
+       if (!rctx->peer_buf || !rctx->ss_buf || !rctx->ref_buf) {
+               ret = -ENOMEM;
+               goto out_free;
+       }
+
+       nents = sg_nents_for_len(req->src, peer_len);
+       if (nents < 0 ||
+           sg_pcopy_to_buffer(req->src, nents, rctx->peer_buf,
+                              peer_len, 0) != peer_len) {
+               ret = -EINVAL;
+               goto out_free;
+       }
+
+       rctx->peer_dma = cmh_dma_map_single(rctx->peer_buf, peer_len,
+                                           DMA_TO_DEVICE);
+       rctx->ss_dma = cmh_dma_map_single(rctx->ss_buf, clen,
+                                         DMA_FROM_DEVICE);
+       rctx->ref_dma = cmh_dma_map_single(rctx->ref_buf, sizeof(u64),
+                                          DMA_FROM_DEVICE);
+
+       if (cmh_dma_map_error(rctx->peer_dma) ||
+           cmh_dma_map_error(rctx->ss_dma) ||
+           cmh_dma_map_error(rctx->ref_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       swap = PKE_SWAP_FLAGS;
+       dma_swap = pke_swap_flags(ctx->curve);
+       rctx->dma_swap = dma_swap;
+
+       dd = cmh_core_select_instance(CMH_CORE_PKE);
+       rctx->mbx_idx = dd.mbx_idx;
+
+       rctx->sk_buf = kmemdup(ctx->key.raw.data, ctx->key.raw.len, gfp);
+       if (!rctx->sk_buf) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+       rctx->sk_dma = cmh_dma_map_single(rctx->sk_buf, ctx->key.raw.len,
+                                         DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->sk_dma)) {
+               ret = -ENOMEM;
+               goto out_unmap;
+       }
+
+       vcq_set_header(&vcq[0], 5);
+       idx = 1;
+       vcq_add_sys_write(&vcq[idx], SYS_REF_TEMP, rctx->sk_dma,
+                         SYS_REF_NONE, ctx->key.raw.len,
+                         ctx->key.raw.sys_type);
+       vcq[idx].id |= dma_swap;
+       idx++;
+       vcq_add_sys_new(&vcq[idx++], 0, rctx->ref_dma, clen);
+       vcq_add_pke_ecdh(&vcq[idx++], dd.core_id, ctx->curve, clen,
+                        clen, ss_type, rctx->peer_dma,
+                        SYS_REF_TEMP, SYS_REF_LAST, swap);
+       vcq_add_pke_flush(&vcq[idx++], dd.core_id);
+
+       ret = cmh_tm_submit_async(vcq, 5, 1, dd.mbx_idx,
+                                 cmh_ecdh_ss_phase1_complete, req,
+                                 !!(req->base.flags &
+                                    CRYPTO_TFM_REQ_MAY_BACKLOG), 0);
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (!ret)
+               return -EINPROGRESS;
+
+out_unmap:
+       if (!cmh_dma_map_error(rctx->sk_dma))
+               cmh_dma_unmap_single(rctx->sk_dma, rctx->sk_len,
+                                    DMA_TO_DEVICE);
+       if (!cmh_dma_map_error(rctx->ss_dma))
+               cmh_dma_unmap_single(rctx->ss_dma, clen,
+                                    DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(rctx->ref_dma))
+               cmh_dma_unmap_single(rctx->ref_dma, sizeof(u64),
+                                    DMA_FROM_DEVICE);
+       if (!cmh_dma_map_error(rctx->peer_dma))
+               cmh_dma_unmap_single(rctx->peer_dma, peer_len,
+                                    DMA_TO_DEVICE);
+
+out_free:
+       kfree_sensitive(rctx->sk_buf);
+       kfree(rctx->ref_buf);
+       kfree_sensitive(rctx->ss_buf);
+       kfree(rctx->peer_buf);
+       return ret;
+}
+
+static unsigned int cmh_ecdh_max_size(struct crypto_kpp *tfm)
+{
+       struct cmh_ecdh_tfm_ctx *ctx = cmh_ecdh_ctx(tfm);
+
+       /* Max output = X||Y for generate_public_key (NIST) */
+       return 2 * ctx->clen;
+}
+
+static unsigned int cmh_x25519_max_size(struct crypto_kpp *tfm)
+{
+       return pke_curve_clen(PKE_CURVE_25519); /* single coordinate */
+}
+
+static int cmh_ecdh_p256_init(struct crypto_kpp *tfm)
+{
+       struct cmh_ecdh_tfm_ctx *ctx = cmh_ecdh_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       ctx->curve = PKE_CURVE_P256;
+       ctx->clen = pke_curve_clen(PKE_CURVE_P256);
+       tfm->reqsize = sizeof(struct cmh_ecdh_reqctx);
+       return 0;
+}
+
+static int cmh_ecdh_p384_init(struct crypto_kpp *tfm)
+{
+       struct cmh_ecdh_tfm_ctx *ctx = cmh_ecdh_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       ctx->curve = PKE_CURVE_P384;
+       ctx->clen = pke_curve_clen(PKE_CURVE_P384);
+       tfm->reqsize = sizeof(struct cmh_ecdh_reqctx);
+       return 0;
+}
+
+static int cmh_x25519_init(struct crypto_kpp *tfm)
+{
+       struct cmh_ecdh_tfm_ctx *ctx = cmh_ecdh_ctx(tfm);
+
+       memset(ctx, 0, sizeof(*ctx));
+       ctx->curve = PKE_CURVE_25519;
+       ctx->clen = pke_curve_clen(PKE_CURVE_25519);
+       tfm->reqsize = sizeof(struct cmh_ecdh_reqctx);
+       return 0;
+}
+
+static void cmh_ecdh_exit(struct crypto_kpp *tfm)
+{
+       struct cmh_ecdh_tfm_ctx *ctx = cmh_ecdh_ctx(tfm);
+
+       cmh_key_destroy(&ctx->key);
+}
+
+static struct kpp_alg cmh_ecdh_algs[] = {
+       {
+               .set_secret             = cmh_ecdh_set_secret_nist,
+               .generate_public_key    = cmh_ecdh_generate_public_key,
+               .compute_shared_secret  = cmh_ecdh_compute_shared_secret,
+               .max_size               = cmh_ecdh_max_size,
+               .init                   = cmh_ecdh_p256_init,
+               .exit                   = cmh_ecdh_exit,
+               .base = {
+                       .cra_name         = "ecdh-nist-p256",
+                       .cra_driver_name  = "cri-cmh-ecdh-nist-p256",
+                       .cra_priority     = 300,
+                       .cra_flags        = CRYPTO_ALG_ASYNC,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_ecdh_tfm_ctx),
+               },
+       },
+       {
+               .set_secret             = cmh_ecdh_set_secret_nist,
+               .generate_public_key    = cmh_ecdh_generate_public_key,
+               .compute_shared_secret  = cmh_ecdh_compute_shared_secret,
+               .max_size               = cmh_ecdh_max_size,
+               .init                   = cmh_ecdh_p384_init,
+               .exit                   = cmh_ecdh_exit,
+               .base = {
+                       .cra_name         = "ecdh-nist-p384",
+                       .cra_driver_name  = "cri-cmh-ecdh-nist-p384",
+                       .cra_priority     = 300,
+                       .cra_flags        = CRYPTO_ALG_ASYNC,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_ecdh_tfm_ctx),
+               },
+       },
+       {
+               .set_secret             = cmh_ecdh_set_secret_x25519,
+               .generate_public_key    = cmh_ecdh_generate_public_key,
+               .compute_shared_secret  = cmh_ecdh_compute_shared_secret,
+               .max_size               = cmh_x25519_max_size,
+               .init                   = cmh_x25519_init,
+               .exit                   = cmh_ecdh_exit,
+               .base = {
+                       .cra_name         = "curve25519",
+                       .cra_driver_name  = "cri-cmh-curve25519",
+                       .cra_priority     = 300,
+                       .cra_flags        = CRYPTO_ALG_ASYNC,
+                       .cra_module       = THIS_MODULE,
+                       .cra_ctxsize      = sizeof(struct cmh_ecdh_tfm_ctx),
+               },
+       },
+};
+
+/**
+ * cmh_pke_ecdh_register() - Register ECDH kpp algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_pke_ecdh_register(void)
+{
+       int ret, i;
+
+       for (i = 0; i < ARRAY_SIZE(cmh_ecdh_algs); i++) {
+               ret = crypto_register_kpp(&cmh_ecdh_algs[i]);
+               if (ret) {
+                       dev_err(cmh_dev(), "cmh: failed to register %s (%d)\n",
+                               cmh_ecdh_algs[i].base.cra_name, ret);
+                       goto err_unregister;
+               }
+       }
+
+       return 0;
+
+err_unregister:
+       while (i--)
+               crypto_unregister_kpp(&cmh_ecdh_algs[i]);
+       return ret;
+}
+
+/**
+ * cmh_pke_ecdh_unregister() - Unregister ECDH kpp algorithms from the crypto framework
+ */
+void cmh_pke_ecdh_unregister(void)
+{
+       int i = ARRAY_SIZE(cmh_ecdh_algs);
+
+       while (i--)
+               crypto_unregister_kpp(&cmh_ecdh_algs[i]);
+}
--
2.43.7
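
In the key_size == 0 path of cmh_ecdh_set_secret_nist(), the generated
scalar is converted from the ECC library's digit array (least
significant 64-bit digit first) to big-endian bytes before it is handed
to cmh_key_setkey_raw(). The following is a portable userspace sketch
of that byte layout. It is not the kernel's ecc_swap_digits(), and the
function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Serialize a multi-precision integer held as 64-bit digits, least
 * significant digit first, into big-endian bytes (most significant
 * byte first). out must have room for ndigits * 8 bytes.
 */
static void digits_to_be_bytes(const uint64_t *digits, size_t ndigits,
			       uint8_t *out)
{
	size_t i, j;

	assert(digits != NULL && out != NULL);
	for (i = 0; i < ndigits; i++) {
		/* Digit i lands (ndigits - 1 - i) digits from the front. */
		uint8_t *dst = out + (ndigits - 1 - i) * 8;

		for (j = 0; j < 8; j++)
			dst[j] = (uint8_t)(digits[i] >> (56 - 8 * j));
	}
}
```

Working byte by byte keeps the result independent of host endianness,
which is the property the driver relies on when it passes the buffer to
hardware as a big-endian scalar.
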



* [PATCH 19/19] MAINTAINERS: add Rambus CryptoManager Hub (CMH)
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Add a MAINTAINERS entry for the CRI CryptoManager Hub (CMH) hardware
crypto accelerator driver under drivers/crypto/cmh/, covering its
documentation, devicetree binding, UAPI header, and selftests.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 MAINTAINERS | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 90034eb7874e..ecb389795e3d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6797,6 +6797,25 @@ F:       kernel/cred.c
 F:     rust/kernel/cred.rs
 F:     Documentation/security/credentials.rst

+CRI CRYPTOMANAGER HUB (CMH) HARDWARE CRYPTO ACCELERATOR
+M:     Alex Ousherovitch <aousherovitch@rambus.com>
+M:     Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
+R:     Joel Wittenauer <Joel.Wittenauer@cryptography.com>
+R:     Thi Nguyen <thin@rambus.com>
+L:     linux-crypto@vger.kernel.org
+L:     sipsupport@rambus.com (moderated for non-subscribers)
+S:     Maintained
+T:     git https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
+F:     Documentation/ABI/testing/cmh-mgmt
+F:     Documentation/ABI/testing/debugfs-driver-cmh
+F:     Documentation/ABI/testing/sysfs-driver-cmh
+F:     Documentation/crypto/device_drivers/cmh.rst
+F:     Documentation/devicetree/bindings/crypto/cri,cmh.yaml
+F:     Documentation/userspace-api/ioctl/cmh_mgmt.rst
+F:     drivers/crypto/cmh/
+F:     include/uapi/linux/cmh_mgmt_ioctl.h
+F:     tools/testing/selftests/drivers/crypto/cmh/
+
 INTEL CRPS COMMON REDUNDANT PSU DRIVER
 M:     Ninad Palsule <ninad@linux.ibm.com>
 L:     linux-hwmon@vger.kernel.org
--
2.43.7



