Linux userland API discussions

Linux userland API discussions
 help / color / mirror / Atom feed

* Re: [RFC] Null Namespaces
From: Al Viro @ 2026-06-25  1:10 UTC (permalink / raw)
  To: John Ericson
  Cc: Andy Lutomirski, Li Chen, Cong Wang, Christian Brauner,
	linux-arch, LKML, linux-fsdevel, linux-api, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Kees Cook,
	Sergei Zimmerman, Farid Zakaria
In-Reply-To: <103524f8-1658-41df-88e9-cf49c628a721@app.fastmail.com>

On Wed, Jun 24, 2026 at 07:53:53PM -0400, John Ericson wrote:
> I wanted to discuss a bit about each type of namespace to indicate that
> this is a concept I think works across the board --- it wouldn't be such
> a good solution for the process spawning API if it was only applicable
> to some but not all namespace types. But the truth is that I have
> thought about the FS cases the most, as I think you have picked up on.
> 
> If there is interest in landing
> 
>   1. null CWD
>   2. null root fs
>   3. null mount namespace
> 
> in isolation, and then returning to the other namespaces to iron out
> their details, that would be fantastic. It would be much nicer for me to
> get some momentum that way, without having to design everything all at
> once first before getting to implement anything.

Please, start with explaining what, in your opinion, a mount namespace _is_,
and where does "mount X is attached at path P relative to mount Y" belong.

What's the fundamental difference between CWD and any open descriptor for
a directory?  Why does it make sense to ban the former, but allow the
equivalents done via the latter?

^ permalink raw reply

* Re: [RFC] Null Namespaces
From: John Ericson @ 2026-06-25  3:41 UTC (permalink / raw)
  To: Al Viro
  Cc: Andy Lutomirski, Li Chen, Cong Wang, Christian Brauner,
	linux-arch, LKML, linux-fsdevel, linux-api, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Kees Cook,
	Sergei Zimmerman, Farid Zakaria
In-Reply-To: <20260625011023.GM2636677@ZenIV>

Ah, I started replying to your first email, but this is better, this
gets to the heart of the matter. Please don't mind me responding to your
two questions in reverse.

On Wed, Jun 24, 2026, at 9:10 PM, Al Viro wrote:
> What's the fundamental difference between CWD and any open descriptor
> for a directory?  Why does it make sense to ban the former, but allow
> the equivalents done via the latter?

Yes! These two notions are very close --- but that's the *problem*, not
a reason to not care about the existence of the CWD and root FS. I want
to get rid of CWD in my processes not because it is fundamentally
different (it isn't), but because it is superfluous.

If one is capability-minded like me, it's a bad mistake that we ever had
this "working directory" notion to begin with, and yet another example
of the folks at Bell Labs sticking something in the kernel that was
really only needed by the shell, and that could have just been done in
userland.

The current working directory, roughly, is *just* some global state
holding a directory file descriptor. But I don't want that global state.
If I am writing my userland program (that is not a shell), I would not
create the global variable. I do not appreciate the fact that the kernel
foists that state upon me whether I like it or not.

Now obviously we cannot have a giant breaking change removing the notion
of a current working directory altogether. But we can allow individual
processes which don't want it to opt out, and that is what nulling out
these fields (and updating the path resolution code to cope with that)
allows.

There is no loss of expressive power doing this, because one can (and
should!) just use the `*at` and file descriptors. But there is, however,
the imposition of discipline. The programmer (or coding agent) is
encouraged to do everything with file descriptors rather than path
concatenations etc., because they need to use `*at` anyways, and then
voilà, without browbeating anyone in security seminars or code review, a
bunch of TOCTOU issues disappear simply because doing the right thing is
now the path of least resistance.

> Please, start with explaining what, in your opinion, a mount namespace
> _is_, and where does "mount X is attached at path P relative to mount
> Y" belong.

Let's take a pathological example:

- Process A has `/foo` bind-mounted at `/bar/foo`

- Process B has `/bar` without that bind mount, and `/foo` mounted at
  `/baz/foo`, as is possible because it is in a different mount
  namespace.

If A opens `/bar/foo`, and sends it over (via socket) to B, and then B
does `openat(recv_fd, "..")`, B will get `/bar`, not `/baz`. This is
because `..` is resolved according to the mount referenced in the open
file. (This is, by the way, very good! Directory file descriptors would
be perilous to use if this were not the case!)

The moral of the story is that "mount X is attached at path P relative
to mount Y" is information accessed in the mounts themselves (maybe via
their containing mount namespace, per the `mnt_ns` field, or maybe not,
I am not sure, but it is immaterial). In contrast, the mount namespace
of the *opening* task (`current->nsproxy->mnt_ns`, and current is B)
doesn't matter at all for this purpose.

I am not on a crusade against `struct mnt_namespace` in general; I am
just trying to null out `(struct nsproxy)::mnt_ns` in particular. (This
is just as I am not on a crusade against `struct path`, just `root` and
`pwd` of `struct fs_struct`.)

These days, `current->nsproxy->mnt_ns` is, to me, first and foremost,
there for the legacy mount API. Again, just like our CWD example above,
this is mostly just global state.

The new mount API drastically [^1] reduces the need for it, since it
allows referring to mounts explicitly via file descriptors. That's OK!
The argument is the same as the above --- I am *not* trying to limit
what can be done if one has all the right files open with the right
perms. I am just trying to limit what works out of the box --- to reduce
the default set of privileges, *especially* where the resources involved
are implicit and/or stateful.

[^1]: It doesn't *quite* eliminate the need for `nsproxy->mnt_ns`
    entirely, since (as I understand it, from reading the `move_mount`
    man page) it is still used for some authorization checks, since
    `O_PATH` file descriptors do not grant privileges other than mere
    discoverability. But that's a problem that could be solved later
    with an `O_MOUNT` option analogous to `O_RDONLY` or `O_WRONLY`. In
    the meantime, I am perfectly happy if my processes with null mount
    namespaces get `move_mount` permission errors.

^ permalink raw reply

* [ANNOUNCE/CFP] Linux Plumbers 2026 Containers and Checkpoint/Restore Microconference
From: Kamalesh Babulal @ 2026-06-25  3:55 UTC (permalink / raw)
  To: cgroups, containers, bpf, linux-fsdevel, linux-api,
	linux-integrity, criu, lxc-devel, fuse-devel
  Cc: Stéphane Graber, Mike Rapoport, Christian Brauner,
	Michal Koutný, Adrian Reber, Kamalesh Babulal

Hello,

We are pleased to announce the Call for Proposals for the Containers and
Checkpoint/Restore Microconference[0] at Linux Plumbers Conference 2026,
taking place in Prague, Czechia, from October 5 to 7, 2026.

This microconference will focus on current work and open problems in
containers, checkpoint/restore, kernel interfaces, and related userspace
tooling. We hope to bring together people working on container
runtimes, CRIU, init systems, distributions, orchestration systems, and
the kernel interfaces that make these pieces work together.

Topics of interest include, but are not limited to:

  - New VFS and syscall interfaces relevant to containers, including
    work around idmapped mounts

  - Closing remaining gaps between cgroup v1 and cgroup v2, and making
    migration easier

  - The growing role of eBPF in container runtimes, observability,
    policy enforcement, and checkpoint/restore

  - Mechanisms for mediating and intercepting increasingly complex
    system calls

  - Lowering the barriers to practical use of user namespaces

  - Attestation, measurement, and other approaches to establishing
    container integrity

  - Better resource-control interfaces and limits for containerized
    workloads

  - Keeping CRIU working smoothly on modern Linux distributions

  - Checkpoint/restore support for GPUs and similar accelerators

  - Restoring FUSE daemons and related userspace services

  - Handling restartable sequences correctly during checkpoint and
    restore

  - Support for newly added kernel features and interfaces

  - Shadow stack support on x86 and arm64

  - Support for madvise(MADV_GUARD_INSTALL) and mseal()

  - pidfd-based checkpoint/restore, including process-exit information

We are also interested in additional topics that may emerge as work
evolves over the coming months. Ongoing development work, operational
experience, unresolved kernel API questions, and cross-project
coordination topics are all welcome.

We encourage you to bring open questions, unresolved issues, or problems
that would benefit from input from others. In your proposal, please
include a short description of the topic, what you would like to
discuss, and what kind of feedback or collaboration would help move the
work forward.

Allocated time per session is expected to be between 15 and 30 minutes.

Please submit proposals through the LPC 2026 abstracts page by August 7:

        https://lpc.events/event/20/abstracts/

Linux Plumbers Conference 2026 will be a hybrid event. While in-person
presentation is preferred to help keep the sessions smooth and
interactive, remote presentation will also be available.

We are looking forward to your proposals and to seeing you in Prague.

[0] https://lpc.events/event/20/contributions/2332/

Thanks,
Containers & Checkpoint/Restart Microconference Team

^ permalink raw reply

* [PATCH v2 0/7] vmsplice: fix some problems in my previous vmsplice patchset
From: Askar Safin @ 2026-06-25  8:34 UTC (permalink / raw)
  To: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara
  Cc: linux-kernel, linux-mm, linux-api, netdev, fuse-devel,
	Linus Torvalds, Matthew Wilcox, Jens Axboe, Christoph Hellwig,
	David Howells, Andrew Morton, David Hildenbrand, Pedro Falcato,
	Miklos Szeredi, Andy Lutomirski, Collin Funk, David Laight,
	Stefan Metzmacher, The 8472, Willy Tarreau, Joanne Koong,
	Val Packett, Andrei Vagin, patches

This patchset is for VFS. Of course, it depends on my previous vmsplice
patchset ( https://lore.kernel.org/all/20260531010107.1953702-1-safinaskar@gmail.com/ ).

I fix some problems in my previous patchset.

1. Fix problem with CLASS(fd, f)(fd). See first patch in this patchset
for details. This is probably not so important, but I fix it anyway.

2. Change "unsigned long" back to "int". See second patch for details.
Again, this is probably not important, but I want to fix this anyway.

3. Fix that LTP vmsplice01 bug.

4. libfuse relies on sharing vmsplice behavior. So we detect particular
combination of flags to pipe2(2) and vmsplice(2) and return -EINVAL.
This forces libfuse to fail back to non-vmsplice code path.
I. e. we fix libfuse-related regression [1].
I did debian code search for regex "vmsplice.*SPLICE_F_NONBLOCK" and
I found no other packages with this particular combination of flags
except for fuse itself. (Okay, other packages are fio and stress-ng,
but these are merely testers.) So, I think this is okay to return
EINVAL here, breakage will be minimal.

5. Set FMODE_NOWAIT for named FIFOs. CRIU relies on ability to do
vmsplice(SPLICE_F_NONBLOCK) on named FIFOs. So, I fix this CRIU-related
regression [2]. But there is another CRIU-related regression, which I do not
fix [3]: CRIU behavior in splice mode becomes so slow that splice mode
becomes useless. I personally still believe that removing vmsplice is
right thing to do. Other option is doing nothing. Yet another option
is to implement some deprecation period [3]. Let other developers
decide.

See patches for details.

Please, run that LTP vmsplice01 test again.

Notes:

- I want to repeat: I change behavior around SPLICE_F_NONBLOCK.
Previously, vmsplice ignored whether pipe itself was opened as
non-blocking file. Now it is not ignored. And in my opinion
new behavior is better.
- vmsplice(2) now is in fs/read_write.c . It is very similar to
preadv2 and pwritev2 now, so I think it belongs to fs/read_write.c now.

Please, review this patchset carefully. I'm still new contributor.
In particular, please, review that do-while loop, I'm not sure I did
everything right.

Tested in Qemu.

[1] https://lore.kernel.org/all/CAJnrk1Y9egYizkx1H9K0cqxSYuB+7vLvQbV7Tf4C5eHFqnnC-A@mail.gmail.com/
[2] https://lore.kernel.org/all/CANaxB-zK5q=Xw6UZTmeFtXsDZjUsPkFk=p485m-wtNTBnf4hgg@mail.gmail.com/
[3] https://lore.kernel.org/all/CANaxB-xUrLQYGiRJZc4Boi+KX=0TJSWymErNovANVko20fMDVA@mail.gmail.com/

v1: https://lore.kernel.org/lkml/20260606061031.3744880-1-safinaskar@gmail.com/

Changes since v1: fix fuse-related and CRIU-related regressions (see above).

Askar Safin (7):
  vmsplice: open-code do_writev and do_readv
  vmsplice: change argument type back to "int"
  splice: turn wait_for_space flags argument into bool
  pipe: move wait_for_space to fs/pipe.c and rename it
  vmsplice: make sure we don't wait after writing some data
  vmsplice: return -EINVAL for particular combination of flags
  pipe: set FMODE_NOWAIT for named FIFOs

 fs/pipe.c                 | 23 +++++++++++++
 fs/read_write.c           | 71 +++++++++++++++++++++++++++++++++++----
 fs/splice.c               | 19 +----------
 include/linux/pipe_fs_i.h |  2 ++
 include/linux/syscalls.h  |  2 +-
 5 files changed, 91 insertions(+), 26 deletions(-)

base-commit: 8d86fcfc2857d64af85f5c87c193c25655c970af
-- 
2.47.3

^ permalink raw reply

* [PATCH v2 1/7] vmsplice: open-code do_writev and do_readv
From: Askar Safin @ 2026-06-25  8:34 UTC (permalink / raw)
  To: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara
  Cc: linux-kernel, linux-mm, linux-api, netdev, fuse-devel,
	Linus Torvalds, Matthew Wilcox, Jens Axboe, Christoph Hellwig,
	David Howells, Andrew Morton, David Hildenbrand, Pedro Falcato,
	Miklos Szeredi, Andy Lutomirski, Collin Funk, David Laight,
	Stefan Metzmacher, The 8472, Willy Tarreau, Joanne Koong,
	Val Packett, Andrei Vagin, patches
In-Reply-To: <20260625083409.3769242-1-safinaskar@gmail.com>

My previous vmsplice patch did the following mistake: I did
"CLASS(fd, f)(fd)", then did some checks on resulting "struct file",
then passed numeric (!) file descriptor to a function.

This is somewhat okay in this particular case, but I still think
this is code smell, so I fix this by open-coding do_writev and do_readv.

Also I insert a comment to warn other developers to keep
do_writev and do_readv in sync with vmsplice(2).

Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 fs/read_write.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 1e5444f4d..e224e7cb8 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1070,6 +1070,7 @@ static ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
 static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
 			unsigned long vlen, rwf_t flags)
 {
+	/* All future changes to this function should be kept in sync with vmsplice(2). */
 	CLASS(fd_pos, f)(fd);
 	ssize_t ret = -EBADF;
 
@@ -1093,6 +1094,7 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
 static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
 			 unsigned long vlen, rwf_t flags)
 {
+	/* All future changes to this function should be kept in sync with vmsplice(2). */
 	CLASS(fd_pos, f)(fd);
 	ssize_t ret = -EBADF;
 
@@ -1226,14 +1228,24 @@ SYSCALL_DEFINE4(vmsplice, unsigned long, fd, const struct iovec __user *, vec,
 	if (fd_empty(f))
 		return -EBADF;
 
-	/* We do do_writev/do_readv, so it is okay to pass "false" here */
+	/* We do vfs_writev/vfs_readv, so it is okay to pass "false" here */
 	if (!get_pipe_info(fd_file(f), /* for_splice = */ false))
 		return -EBADF;
 
-	if (fd_file(f)->f_mode & FMODE_WRITE)
-		return do_writev(fd, vec, vlen, (flags & SPLICE_F_NONBLOCK) ? RWF_NOWAIT : 0);
-	else
-		return do_readv(fd, vec, vlen, (flags & SPLICE_F_NONBLOCK) ? RWF_NOWAIT : 0);
+	if (fd_file(f)->f_mode & FMODE_WRITE) {
+		ssize_t ret = vfs_writev(fd_file(f), vec, vlen, NULL, (flags & SPLICE_F_NONBLOCK) ? RWF_NOWAIT : 0);
+		if (ret > 0)
+			add_wchar(current, ret);
+		inc_syscw(current);
+		return ret;
+	} else {
+		ssize_t ret = vfs_readv(fd_file(f), vec, vlen, NULL, (flags & SPLICE_F_NONBLOCK) ? RWF_NOWAIT : 0);
+
+		if (ret > 0)
+			add_rchar(current, ret);
+		inc_syscr(current);
+		return ret;
+	}
 }
 
 /*
-- 
2.47.3


^ permalink raw reply related

* [PATCH v2 2/7] vmsplice: change argument type back to "int"
From: Askar Safin @ 2026-06-25  8:34 UTC (permalink / raw)
  To: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara
  Cc: linux-kernel, linux-mm, linux-api, netdev, fuse-devel,
	Linus Torvalds, Matthew Wilcox, Jens Axboe, Christoph Hellwig,
	David Howells, Andrew Morton, David Hildenbrand, Pedro Falcato,
	Miklos Szeredi, Andy Lutomirski, Collin Funk, David Laight,
	Stefan Metzmacher, The 8472, Willy Tarreau, Joanne Koong,
	Val Packett, Andrei Vagin, patches
In-Reply-To: <20260625083409.3769242-1-safinaskar@gmail.com>

My previous vmsplice patchset changed vmsplice argument from
"int" to "unsigned long". This may cause problems, so let's
change it back.

Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 fs/read_write.c          | 2 +-
 include/linux/syscalls.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index e224e7cb8..77487b307 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1218,7 +1218,7 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
 /*
  * Legacy preadv2/pwritev2 wrapper.
  */
-SYSCALL_DEFINE4(vmsplice, unsigned long, fd, const struct iovec __user *, vec,
+SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, vec,
 		unsigned long, vlen, unsigned int, flags)
 {
 	if (unlikely(flags & ~SPLICE_F_ALL))
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a86a88207..46a3ec954 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -514,7 +514,7 @@ asmlinkage long sys_ppoll_time32(struct pollfd __user *, unsigned int,
 			  struct old_timespec32 __user *, const sigset_t __user *,
 			  size_t);
 asmlinkage long sys_signalfd4(int ufd, sigset_t __user *user_mask, size_t sizemask, int flags);
-asmlinkage long sys_vmsplice(unsigned long fd, const struct iovec __user *vec,
+asmlinkage long sys_vmsplice(int fd, const struct iovec __user *vec,
 			     unsigned long vlen, unsigned int flags);
 asmlinkage long sys_splice(int fd_in, loff_t __user *off_in,
 			   int fd_out, loff_t __user *off_out,
-- 
2.47.3


^ permalink raw reply related

* [PATCH v2 3/7] splice: turn wait_for_space flags argument into bool
From: Askar Safin @ 2026-06-25  8:34 UTC (permalink / raw)
  To: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara
  Cc: linux-kernel, linux-mm, linux-api, netdev, fuse-devel,
	Linus Torvalds, Matthew Wilcox, Jens Axboe, Christoph Hellwig,
	David Howells, Andrew Morton, David Hildenbrand, Pedro Falcato,
	Miklos Szeredi, Andy Lutomirski, Collin Funk, David Laight,
	Stefan Metzmacher, The 8472, Willy Tarreau, Joanne Koong,
	Val Packett, Andrei Vagin, patches
In-Reply-To: <20260625083409.3769242-1-safinaskar@gmail.com>

I want to do this, because I will move this function to fs/pipe.c.

Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 fs/splice.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 6ddf7dd72..707db2c2c 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1239,7 +1239,7 @@ ssize_t splice_file_range(struct file *in, loff_t *ppos, struct file *out,
 }
 EXPORT_SYMBOL(splice_file_range);
 
-static int wait_for_space(struct pipe_inode_info *pipe, unsigned flags)
+static int wait_for_space(struct pipe_inode_info *pipe, bool non_block)
 {
 	for (;;) {
 		if (unlikely(!pipe->readers)) {
@@ -1248,7 +1248,7 @@ static int wait_for_space(struct pipe_inode_info *pipe, unsigned flags)
 		}
 		if (!pipe_is_full(pipe))
 			return 0;
-		if (flags & SPLICE_F_NONBLOCK)
+		if (non_block)
 			return -EAGAIN;
 		if (signal_pending(current))
 			return -ERESTARTSYS;
@@ -1268,7 +1268,7 @@ ssize_t splice_file_to_pipe(struct file *in,
 	ssize_t ret;
 
 	pipe_lock(opipe);
-	ret = wait_for_space(opipe, flags);
+	ret = wait_for_space(opipe, flags & SPLICE_F_NONBLOCK);
 	if (!ret)
 		ret = do_splice_read(in, offset, opipe, len, flags);
 	pipe_unlock(opipe);
-- 
2.47.3


^ permalink raw reply related

* [PATCH v2 4/7] pipe: move wait_for_space to fs/pipe.c and rename it
From: Askar Safin @ 2026-06-25  8:34 UTC (permalink / raw)
  To: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara
  Cc: linux-kernel, linux-mm, linux-api, netdev, fuse-devel,
	Linus Torvalds, Matthew Wilcox, Jens Axboe, Christoph Hellwig,
	David Howells, Andrew Morton, David Hildenbrand, Pedro Falcato,
	Miklos Szeredi, Andy Lutomirski, Collin Funk, David Laight,
	Stefan Metzmacher, The 8472, Willy Tarreau, Joanne Koong,
	Val Packett, Andrei Vagin, patches
In-Reply-To: <20260625083409.3769242-1-safinaskar@gmail.com>

This is needed, because I plan to use it in fs/read_write.c.

Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 fs/pipe.c                 | 17 +++++++++++++++++
 fs/splice.c               | 19 +------------------
 include/linux/pipe_fs_i.h |  2 ++
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 9841648c9..c0ccf21b9 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1451,6 +1451,23 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned int arg)
 	return ret;
 }
 
+int pipe_wait_for_space(struct pipe_inode_info *pipe, bool non_block)
+{
+	for (;;) {
+		if (unlikely(!pipe->readers)) {
+			send_sig(SIGPIPE, current, 0);
+			return -EPIPE;
+		}
+		if (!pipe_is_full(pipe))
+			return 0;
+		if (non_block)
+			return -EAGAIN;
+		if (signal_pending(current))
+			return -ERESTARTSYS;
+		pipe_wait_writable(pipe);
+	}
+}
+
 static const struct super_operations pipefs_ops = {
 	.destroy_inode = free_inode_nonrcu,
 	.statfs = simple_statfs,
diff --git a/fs/splice.c b/fs/splice.c
index 707db2c2c..d12243d19 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1239,23 +1239,6 @@ ssize_t splice_file_range(struct file *in, loff_t *ppos, struct file *out,
 }
 EXPORT_SYMBOL(splice_file_range);
 
-static int wait_for_space(struct pipe_inode_info *pipe, bool non_block)
-{
-	for (;;) {
-		if (unlikely(!pipe->readers)) {
-			send_sig(SIGPIPE, current, 0);
-			return -EPIPE;
-		}
-		if (!pipe_is_full(pipe))
-			return 0;
-		if (non_block)
-			return -EAGAIN;
-		if (signal_pending(current))
-			return -ERESTARTSYS;
-		pipe_wait_writable(pipe);
-	}
-}
-
 static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
 			       struct pipe_inode_info *opipe,
 			       size_t len, unsigned int flags);
@@ -1268,7 +1251,7 @@ ssize_t splice_file_to_pipe(struct file *in,
 	ssize_t ret;
 
 	pipe_lock(opipe);
-	ret = wait_for_space(opipe, flags & SPLICE_F_NONBLOCK);
+	ret = pipe_wait_for_space(opipe, flags & SPLICE_F_NONBLOCK);
 	if (!ret)
 		ret = do_splice_read(in, offset, opipe, len, flags);
 	pipe_unlock(opipe);
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index a1eeed800..be653625d 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -335,4 +335,6 @@ struct pipe_inode_info *get_pipe_info(struct file *file, bool for_splice);
 int create_pipe_files(struct file **, int);
 unsigned int round_pipe_size(unsigned int size);
 
+int pipe_wait_for_space(struct pipe_inode_info *pipe, bool non_block);
+
 #endif
-- 
2.47.3


^ permalink raw reply related

* [PATCH v2 5/7] vmsplice: make sure we don't wait after writing some data
From: Askar Safin @ 2026-06-25  8:34 UTC (permalink / raw)
  To: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara
  Cc: linux-kernel, linux-mm, linux-api, netdev, fuse-devel,
	Linus Torvalds, Matthew Wilcox, Jens Axboe, Christoph Hellwig,
	David Howells, Andrew Morton, David Hildenbrand, Pedro Falcato,
	Miklos Szeredi, Andy Lutomirski, Collin Funk, David Laight,
	Stefan Metzmacher, The 8472, Willy Tarreau, Joanne Koong,
	Val Packett, Andrei Vagin, patches
In-Reply-To: <20260625083409.3769242-1-safinaskar@gmail.com>

Make sure we don't wait for space in pipe after writing some data.
This is needed for compatibility with previous version of vmsplice.
Found by LTP vmsplice01.
See comments in the code and links below for details.

Link: https://lore.kernel.org/all/20260603-raumfahrt-unmerklich-ertrugen-c4ecae70d5f9@brauner/
Link: https://lore.kernel.org/all/CAHk-=wgV-j-G3d+899Zm1pQ=NaJrddPz=GKcL5Yw5DTUM=GaUw@mail.gmail.com/
Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 fs/read_write.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 77487b307..dbd0debc2 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1221,6 +1221,8 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
 SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, vec,
 		unsigned long, vlen, unsigned int, flags)
 {
+	struct pipe_inode_info *pipe;
+
 	if (unlikely(flags & ~SPLICE_F_ALL))
 		return -EINVAL;
 
@@ -1229,11 +1231,44 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, vec,
 		return -EBADF;
 
 	/* We do vfs_writev/vfs_readv, so it is okay to pass "false" here */
-	if (!get_pipe_info(fd_file(f), /* for_splice = */ false))
+	pipe = get_pipe_info(fd_file(f), /* for_splice = */ false);
+
+	if (!pipe)
 		return -EBADF;
 
 	if (fd_file(f)->f_mode & FMODE_WRITE) {
-		ssize_t ret = vfs_writev(fd_file(f), vec, vlen, NULL, (flags & SPLICE_F_NONBLOCK) ? RWF_NOWAIT : 0);
+		/*
+		 * When writing to the pipe, previous implementation of vmsplice
+		 * first waited for space in the pipe to appear
+		 * (depending on whether SPLICE_F_NONBLOCK was passed),
+		 * then did unconditional non-blocking write to the pipe.
+		 *
+		 * This differs from what pwritev2 does.
+		 *
+		 * For compatibility we do the same thing previous
+		 * implementation did.
+		 *
+		 * We lock the pipe, do pipe_wait_for_space, then unlock
+		 * the pipe, and then do vfs_writev. vfs_writev internally
+		 * locks the pipe again. This may cause TOCTOU: when we
+		 * do vfs_writev, the pipe may become full again. So we
+		 * do a loop.
+		 */
+
+		bool non_block = (flags & SPLICE_F_NONBLOCK) || (fd_file(f)->f_flags & O_NONBLOCK);
+		ssize_t ret;
+
+		do {
+			pipe_lock(pipe);
+			ret = pipe_wait_for_space(pipe, non_block);
+			pipe_unlock(pipe);
+
+			if (ret < 0)
+				break;
+
+			ret = vfs_writev(fd_file(f), vec, vlen, NULL, RWF_NOWAIT);
+		} while (!non_block && ret == -EAGAIN);
+
 		if (ret > 0)
 			add_wchar(current, ret);
 		inc_syscw(current);
-- 
2.47.3


^ permalink raw reply related

* [PATCH v2 6/7] vmsplice: return -EINVAL for particular combination of flags
From: Askar Safin @ 2026-06-25  8:34 UTC (permalink / raw)
  To: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara
  Cc: linux-kernel, linux-mm, linux-api, netdev, fuse-devel,
	Linus Torvalds, Matthew Wilcox, Jens Axboe, Christoph Hellwig,
	David Howells, Andrew Morton, David Hildenbrand, Pedro Falcato,
	Miklos Szeredi, Andy Lutomirski, Collin Funk, David Laight,
	Stefan Metzmacher, The 8472, Willy Tarreau, Joanne Koong,
	Val Packett, Andrei Vagin, patches
In-Reply-To: <20260625083409.3769242-1-safinaskar@gmail.com>

See comment for details.

Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 fs/read_write.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/read_write.c b/fs/read_write.c
index dbd0debc2..b1f71b142 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1258,6 +1258,16 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, vec,
 		bool non_block = (flags & SPLICE_F_NONBLOCK) || (fd_file(f)->f_flags & O_NONBLOCK);
 		ssize_t ret;
 
+		/*
+		 * libfuse relies on sharing vmsplice behavior.
+		 * So we detect particular combination of flags to
+		 * pipe2(2) and vmsplice(2) and return -EINVAL.
+		 * This forces libfuse to fail back to non-vmsplice
+		 * code path.
+		 */
+		if ((flags == SPLICE_F_NONBLOCK) && (fd_file(f)->f_flags & O_NONBLOCK))
+			return -EINVAL;
+
 		do {
 			pipe_lock(pipe);
 			ret = pipe_wait_for_space(pipe, non_block);
-- 
2.47.3


^ permalink raw reply related

* [PATCH v2 7/7] pipe: set FMODE_NOWAIT for named FIFOs
From: Askar Safin @ 2026-06-25  8:34 UTC (permalink / raw)
  To: linux-fsdevel, Christian Brauner, Alexander Viro, Jan Kara
  Cc: linux-kernel, linux-mm, linux-api, netdev, fuse-devel,
	Linus Torvalds, Matthew Wilcox, Jens Axboe, Christoph Hellwig,
	David Howells, Andrew Morton, David Hildenbrand, Pedro Falcato,
	Miklos Szeredi, Andy Lutomirski, Collin Funk, David Laight,
	Stefan Metzmacher, The 8472, Willy Tarreau, Joanne Koong,
	Val Packett, Andrei Vagin, patches
In-Reply-To: <20260625083409.3769242-1-safinaskar@gmail.com>

CRIU relies on ability to do vmsplice(SPLICE_F_NONBLOCK) on named FIFOs.

Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 fs/pipe.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/pipe.c b/fs/pipe.c
index c0ccf21b9..a8e9b4459 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1156,6 +1156,12 @@ static int fifo_open(struct inode *inode, struct file *filp)
 	/* We can only do regular read/write on fifos */
 	stream_open(inode, filp);
 
+	/*
+	 * CRIU relies on ability to do vmsplice(SPLICE_F_NONBLOCK)
+	 * on named FIFOs.
+	 */
+	filp->f_mode |= FMODE_NOWAIT;
+
 	switch (filp->f_mode & (FMODE_READ | FMODE_WRITE)) {
 	case FMODE_READ:
 	/*
-- 
2.47.3


^ permalink raw reply related

* Re: [PATCH v2 0/7] vmsplice: fix some problems in my previous vmsplice patchset
From: David Hildenbrand (Arm) @ 2026-06-25  8:46 UTC (permalink / raw)
  To: Askar Safin, linux-fsdevel, Christian Brauner, Alexander Viro,
	Jan Kara
  Cc: linux-kernel, linux-mm, linux-api, netdev, fuse-devel,
	Linus Torvalds, Matthew Wilcox, Jens Axboe, Christoph Hellwig,
	David Howells, Andrew Morton, Pedro Falcato, Miklos Szeredi,
	Andy Lutomirski, Collin Funk, David Laight, Stefan Metzmacher,
	The 8472, Willy Tarreau, Joanne Koong, Val Packett, Andrei Vagin,
	patches
In-Reply-To: <20260625083409.3769242-1-safinaskar@gmail.com>

On 6/25/26 10:34, Askar Safin wrote:
> This patchset is for VFS. Of course, it depends on my previous vmsplice
> patchset ( https://lore.kernel.org/all/20260531010107.1953702-1-safinaskar@gmail.com/ ).
> 
> I fix some problems in my previous patchset.

I think we concluded that we cannot rip out vmsplice that way at this point, and
I suspect that Christian will drop that topic branch from -next after -rc1.

-- 
Cheers,

David

^ permalink raw reply

* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
From: Askar Safin @ 2026-06-25  8:53 UTC (permalink / raw)
  To: avagin
  Cc: akpm, alexander, axboe, bernd, brauner, criu, david, dhowells,
	fuse-devel, hch, jack, joannelkoong, linux-api, linux-fsdevel,
	linux-kernel, linux-mm, miklos, netdev, patches, pfalcato,
	rostedt, safinaskar, torvalds, val, viro, willy
In-Reply-To: <CANaxB-xUrLQYGiRJZc4Boi+KX=0TJSWymErNovANVko20fMDVA@mail.gmail.com>

Andrei Vagin <avagin@gmail.com>:
> On Wed, Jun 24, 2026 at 12:12 AM Askar Safin <safinaskar@gmail.com> wrote:
> > Does CRIU actually rely on ability to do SPLICE_F_NONBLOCK vmsplice into
> > named fifos? Or this is merely a test?
> 
> Yes, it does.

I. e. CRIU relies on that named fifo behavior? Okay, I just sent
v2 version of my fixes. The patchset contains fix for named fifos.

Please, test that this fixes that named fifo problem.

> I already explained that this isn't just a perfomance degradation, it
> actually breaks the pre-dump mechanism in CRIU. vmsplice is invoked from
> our parasite code within the context of a user process, where execution
> speed is critical. A heavy performance penalty completely invalidates
> the pre-dump logic, making the feature useless.

This is very unfortunate. But I still want to remove vmsplice.

> At a minimum, we may need to consider a deprecation plan where vmsplice
> with SPLICE_F_GIFT triggers a warning for a few releases before these
> changes are applied. Alternatively, we could introduce the proposed
> behavior alongside a sysctl to fall back to the old behavior and explicitly
> state that this fallback path will be completely deprecated in a future kernel
> version.

My patches change not only SPLICE_F_GIFT behavior, but also vmsplice
behavior in general.

Let other developers decide what to do (i. e. do nothing, remove
vmsplice now or implement some deprecation scheme).

-- 
Askar Safin

^ permalink raw reply

* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
From: Askar Safin @ 2026-06-25  9:03 UTC (permalink / raw)
  To: val
  Cc: akpm, axboe, brauner, david, dhowells, fuse-devel, hch, jack,
	joannelkoong, linux-api, linux-fsdevel, linux-kernel, linux-mm,
	miklos, netdev, patches, pfalcato, rostedt, safinaskar, torvalds,
	viro, willy
In-Reply-To: <83f05c55-efba-4bf5-abfe-d2ab0819e904@packett.cool>

Val Packett <val@packett.cool>:
> speaking of fuse_dev_splice……_write actually, this series has broken 
> xdg-document-portal!
> 
> https://github.com/flatpak/xdg-desktop-portal/issues/2026
> 
> Specifically what happens is that the EINVAL is returned due to oh.len 
> != nbytes:
> 
> fuse_dev_do_write: oh.len 16400 != nbytes 15526
> 
> (where 16400 == 16384 (read len) + 16, 15526 == 15510 (file len) + 16)
> 
> After reverting the series, there is no error because oh.len 
> becomes 15526 too.

Please, test v2 version of my fixes:
https://lore.kernel.org/lkml/20260625083409.3769242-1-safinaskar@gmail.com/ .

This should fix this bug.

-- 
Askar Safin

^ permalink raw reply

* Re: [PATCH v2 0/7] vmsplice: fix some problems in my previous vmsplice patchset
From: Askar Safin @ 2026-06-25 10:11 UTC (permalink / raw)
  To: david
  Cc: akpm, avagin, axboe, brauner, collin.funk1, david.laight.linux,
	dhowells, fuse-devel, hch, jack, joannelkoong, kernel, linux-api,
	linux-fsdevel, linux-kernel, linux-mm, luto, metze, miklos,
	netdev, patches, pfalcato, safinaskar, torvalds, val, viro, w,
	willy
In-Reply-To: <89ea76b3-e956-4232-8180-ee3929adf905@kernel.org>

"David Hildenbrand (Arm)" <david@kernel.org>:
> I think we concluded that we cannot rip out vmsplice that way at this point, and
> I suspect that Christian will drop that topic branch from -next after -rc1.

I think my patches still have a chance.

On fuse regression: I return EINVAL for particular combination of
flags used by fuse. This causes fuse to fail-back to non-vmsplice
code path. I did Debian code search, and I found none significant
packages, which use same combination of options.

So I think I was able to deal with fuse regression.

On CRIU named fifo "Not supported" regression: it is handled.

On CRIU major performance regression: it is NOT handled. But I still
think my approach is right. (See cover letter for details.)

(I wrote about all these in cover letter for this v2 patchset.)

So all regressions found so far (except for CRIU major performance
regression) are handled.

Other option is to introduce some deprecation period (as
suggested by Andrei Vagin). I can do this, if needed.

-- 
Askar Safin

^ permalink raw reply

* Re: [PATCH v2 0/7] vmsplice: fix some problems in my previous vmsplice patchset
From: David Hildenbrand (Arm) @ 2026-06-25 10:35 UTC (permalink / raw)
  To: Askar Safin
  Cc: akpm, avagin, axboe, brauner, collin.funk1, david.laight.linux,
	dhowells, fuse-devel, hch, jack, joannelkoong, kernel, linux-api,
	linux-fsdevel, linux-kernel, linux-mm, luto, metze, miklos,
	netdev, patches, pfalcato, torvalds, val, viro, w, willy
In-Reply-To: <20260625101132.3859505-1-safinaskar@gmail.com>

On 6/25/26 12:11, Askar Safin wrote:
> "David Hildenbrand (Arm)" <david@kernel.org>:
>> I think we concluded that we cannot rip out vmsplice that way at this point, and
>> I suspect that Christian will drop that topic branch from -next after -rc1.
> 
> I think my patches still have a chance.

I talked to Christian and it doesn't sound like it.

-- 
Cheers,

David

^ permalink raw reply

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
From: Zhang Yi @ 2026-06-25 11:16 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung), Christoph Hellwig
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, linux-fsdevel, linux-api
In-Reply-To: <557b2e5c-7c65-48de-87a9-6fba21eca99f@huaweicloud.com>

On 6/18/2026 11:22 AM, Zhang Yi wrote:
> On 6/17/2026 5:44 PM, Pankaj Raghav (Samsung) wrote:
>> On Tue, Jun 16, 2026 at 06:31:40AM -0700, Christoph Hellwig wrote:
>>> [API questions for Zhang and -fsdevel/ -api below)
>>>
>>>> +	unsigned int		blksize = i_blocksize(inode);
>>>> +	loff_t			offset_aligned = round_down(offset, blksize);
>>>
>>> I think this actually needs to found up instead of rounding down.
>>>
>>>> +	/*
>>>> +	 * Zero the tail of the old EOF block and any space up to the new
>>>> +	 * offset.
>>>> +	 * In the usual truncate path, xfs_falloc_setsize takes care of
>>>> +	 * zeroing those blocks.
>>>> +	 */
>>>> +	if (offset_aligned > old_size) {
>>>> +		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
>>>> +		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
>>>> +				NULL, &did_zero);
>>>> +		if (error)
>>>> +			return error;
>>>> +	}
>>>
>>> ... then this will properly zero from the old i_size to the first block
>>> boundary after the old size.
>>
>> Hmm, right now we do this:
>>
>> |----------|----------|----------|
>>     ^      ^     ^    ^
>>     |      |     |    |
>>  old_size  |   offset |
>>            |          |
>> 	off_rd       off_ru
>>
>> At the moment, we zero out old_size to off_rd and pass offset to
>> xfs_alloc_file_space. xfs_alloc_file_space rounds down the offset to off_rd.
>>
>> What you are proposing is to zero out old_size to off_ru, and pass
>> off_ru to xfs_alloc_file_space. I don't exactly understand the
>> difference.
> 
> IMO, FALLOC_FL_WRITE_ZEROES should handle the unaligned cases, if the
> 'offset' and 'end' are not block-size aligned, then:
> 
> 1) if the two blocks straddling the boundaries have not yet been allocated,
>    or allocated as unwritten, we should round outward the allocation range
>    and zero out all allocated blocks, including those two boundary blocks.
> 2) if the blocks at the boundaries are already in the written state — which
>    can occur when we call FALLOC_FL_WRITE_ZEROES within the file size. We
>    should be careful here: we should only zero the ranges [offset, offset_ru)
>    and [end_rd, end) for the boundary blocks, leaving the already-written
>    portions of the boundary blocks intact.
> 
I've thought about this more and feel that we need to add one more
scenario here:

3) If the blocks at the boundaries are in dirty unwritten or in delay
   allocated state, it should be handled the same way as scenario 2.
   Additionally, before returning, we need to flush these two boundary
   blocks that have been partially zeroed, to ensure that after
   FALLOC_FL_WRITE_ZEROES, all ranges are in the written state.

What do you think?

Best Regards,
Yi.
















^ permalink raw reply

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
From: Christoph Hellwig @ 2026-06-25 12:12 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Pankaj Raghav (Samsung), Christoph Hellwig, Pankaj Raghav,
	linux-xfs, bfoster, lukas, Darrick J . Wong, dgc, gost.dev,
	andres, kundan.kumar, cem, linux-fsdevel, linux-api
In-Reply-To: <0b7f1a4f-da1c-4297-8099-98d738070ab7@huaweicloud.com>

On Thu, Jun 25, 2026 at 07:16:47PM +0800, Zhang Yi wrote:
> I've thought about this more and feel that we need to add one more
> scenario here:
> 
> 3) If the blocks at the boundaries are in dirty unwritten or in delay
>    allocated state, it should be handled the same way as scenario 2.
>    Additionally, before returning, we need to flush these two boundary
>    blocks that have been partially zeroed, to ensure that after
>    FALLOC_FL_WRITE_ZEROES, all ranges are in the written state.
> 
> What do you think?

Yes.  Can you send an xfstest addition the exercises these corner
cases?


^ permalink raw reply

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
From: Pankaj Raghav @ 2026-06-25 12:16 UTC (permalink / raw)
  To: Christoph Hellwig, Zhang Yi
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, linux-fsdevel, linux-api
In-Reply-To: <aj0bNstaTvy3HOkh@infradead.org>



On 6/25/2026 2:12 PM, Christoph Hellwig wrote:
> On Thu, Jun 25, 2026 at 07:16:47PM +0800, Zhang Yi wrote:
>> I've thought about this more and feel that we need to add one more
>> scenario here:
>>
>> 3) If the blocks at the boundaries are in dirty unwritten or in delay
>>     allocated state, it should be handled the same way as scenario 2.
>>     Additionally, before returning, we need to flush these two boundary
>>     blocks that have been partially zeroed, to ensure that after
>>     FALLOC_FL_WRITE_ZEROES, all ranges are in the written state.
>>
>> What do you think?
> 
> Yes.  Can you send an xfstest addition the exercises these corner
> cases?
> 

I have already created one that does a subset of these cases. I will add more
and send it to the list soon!

--
Pankaj

^ permalink raw reply

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
From: Christoph Hellwig @ 2026-06-25 12:19 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: Christoph Hellwig, Zhang Yi, Pankaj Raghav, linux-xfs, bfoster,
	lukas, Darrick J . Wong, dgc, gost.dev, andres, kundan.kumar, cem,
	linux-fsdevel, linux-api
In-Reply-To: <b86337f2-5aa5-4905-9610-35fafed2018b@linux.dev>

On Thu, Jun 25, 2026 at 02:16:11PM +0200, Pankaj Raghav wrote:
> I have already created one that does a subset of these cases. I will add more
> and send it to the list soon!

Cool.  I'll wait with reviewing the patches until we have the tests
and agree on all these scenarios.  (and because I'm totally overloaded
and am looking for excuses to defer work :))

^ permalink raw reply

* Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
From: Zhang Yi @ 2026-06-25 12:29 UTC (permalink / raw)
  To: Pankaj Raghav, Christoph Hellwig
  Cc: Pankaj Raghav, linux-xfs, bfoster, lukas, Darrick J . Wong, dgc,
	gost.dev, andres, kundan.kumar, cem, linux-fsdevel, linux-api
In-Reply-To: <b86337f2-5aa5-4905-9610-35fafed2018b@linux.dev>

On 6/25/2026 8:16 PM, Pankaj Raghav wrote:
> 
> 
> On 6/25/2026 2:12 PM, Christoph Hellwig wrote:
>> On Thu, Jun 25, 2026 at 07:16:47PM +0800, Zhang Yi wrote:
>>> I've thought about this more and feel that we need to add one more
>>> scenario here:
>>>
>>> 3) If the blocks at the boundaries are in dirty unwritten or in delay
>>>     allocated state, it should be handled the same way as scenario 2.
>>>     Additionally, before returning, we need to flush these two boundary
>>>     blocks that have been partially zeroed, to ensure that after
>>>     FALLOC_FL_WRITE_ZEROES, all ranges are in the written state.
>>>
>>> What do you think?
>>
>> Yes.  Can you send an xfstest addition the exercises these corner
>> cases?
>>
> 
> I have already created one that does a subset of these cases. I will add more
> and send it to the list soon!
> 
> -- 
> Pankaj

Ah, thanks for working on this! I'm also fixing the corresponding
issues in ext4. After you post your test case, I'll incorporate it and
help with the review as well.

Thanks,
Yi.



^ permalink raw reply

* Re: [RFC] Null Namespaces
From: Andy Lutomirski @ 2026-06-25 15:51 UTC (permalink / raw)
  To: John Ericson
  Cc: Al Viro, Andy Lutomirski, Li Chen, Cong Wang, Christian Brauner,
	linux-arch, LKML, linux-fsdevel, linux-api, Arnd Bergmann,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Kees Cook,
	Sergei Zimmerman, Farid Zakaria
In-Reply-To: <a75a9b82-a15b-4893-8f92-62b62664ea83@app.fastmail.com>

On Wed, Jun 24, 2026 at 8:41 PM John Ericson <mail@johnericson.me> wrote:
>
> Ah, I started replying to your first email, but this is better, this
> gets to the heart of the matter. Please don't mind me responding to your
> two questions in reverse.
>
> On Wed, Jun 24, 2026, at 9:10 PM, Al Viro wrote:
> > What's the fundamental difference between CWD and any open descriptor
> > for a directory?  Why does it make sense to ban the former, but allow
> > the equivalents done via the latter?
>
> Yes! These two notions are very close --- but that's the *problem*, not
> a reason to not care about the existence of the CWD and root FS. I want
> to get rid of CWD in my processes not because it is fundamentally
> different (it isn't), but because it is superfluous.
>
> If one is capability-minded like me, it's a bad mistake that we ever had
> this "working directory" notion to begin with, and yet another example
> of the folks at Bell Labs sticking something in the kernel that was
> really only needed by the shell, and that could have just been done in
> userland.
>
> The current working directory, roughly, is *just* some global state
> holding a directory file descriptor. But I don't want that global state.
> If I am writing my userland program (that is not a shell), I would not
> create the global variable. I do not appreciate the fact that the kernel
> foists that state upon me whether I like it or not.
>
> Now obviously we cannot have a giant breaking change removing the notion
> of a current working directory altogether. But we can allow individual
> processes which don't want it to opt out, and that is what nulling out
> these fields (and updating the path resolution code to cope with that)
> allows.
>
> There is no loss of expressive power doing this, because one can (and
> should!) just use the `*at` and file descriptors. But there is, however,
> the imposition of discipline. The programmer (or coding agent) is
> encouraged to do everything with file descriptors rather than path
> concatenations etc., because they need to use `*at` anyways, and then
> voilà, without browbeating anyone in security seminars or code review, a
> bunch of TOCTOU issues disappear simply because doing the right thing is
> now the path of least resistance.
>
> > Please, start with explaining what, in your opinion, a mount namespace
> > _is_, and where does "mount X is attached at path P relative to mount
> > Y" belong.
>
> Let's take a pathological example:
>
> - Process A has `/foo` bind-mounted at `/bar/foo`
>
> - Process B has `/bar` without that bind mount, and `/foo` mounted at
>   `/baz/foo`, as is possible because it is in a different mount
>   namespace.
>
> If A opens `/bar/foo`, and sends it over (via socket) to B, and then B
> does `openat(recv_fd, "..")`, B will get `/bar`, not `/baz`. This is
> because `..` is resolved according to the mount referenced in the open
> file. (This is, by the way, very good! Directory file descriptors would
> be perilous to use if this were not the case!)
>
> The moral of the story is that "mount X is attached at path P relative
> to mount Y" is information accessed in the mounts themselves (maybe via
> their containing mount namespace, per the `mnt_ns` field, or maybe not,
> I am not sure, but it is immaterial). In contrast, the mount namespace
> of the *opening* task (`current->nsproxy->mnt_ns`, and current is B)
> doesn't matter at all for this purpose.

It's sort of a combination -- read the data structures :)  Other than
the propagation part, they're really not that bad.

In any event, I think this discussion is sort of immaterial to the
proposed API change.  No one is about to remove the concept of a mount
namespace.  But maybe it makes sense to have a way to have a task that
doesn't actually belong to a mount namespace.  A mount namespace is
certainly going to exist.

There will definitely be subtleties.  For example, what happens if a
task with "no mount namespace" tries to do OPEN_TREE_CLONE?  In some
logical sense it ought to work but it ought to be impossible to
actually mount the resulting tree anywhere, but this risks running
afoul of all kinds of checks.  Maybe you get a whole new mount
namespace (that does not become your current mnt_ns) if you
OPEN_TREE_CLONE?

This stuff is complex and it probably makes more sense to keep changes simple.

^ permalink raw reply

* [PATCH 00/19] crypto: cmh - add CRI CryptoManager Hub driver
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen

From: Alex Ousherovitch <aousherovitch@rambus.com>

crypto: cmh - add CRI CryptoManager Hub hardware crypto accelerator

This series adds a driver for the CRI CryptoManager Hub (CMH), a
hardware cryptographic accelerator IP from Cryptography Research at
Rambus Inc. (https://www.rambus.com/cryptographyresearch/).
CMH provides a broad set of symmetric, asymmetric, and post-quantum
cryptographic algorithms accelerated in hardware, accessed via a
mailbox-based Virtual Command Queue (VCQ) interface.

The hardware is a platform device matched via device tree
(compatible = "cri,cmh").  It exposes a single MMIO register region
(SIC) with per-mailbox doorbell, status, and command registers.
Each mailbox has DMA-coherent queue memory for VCQ command
submission and completion.

Driver architecture:

  In-kernel users                       /dev/cmh_mgmt (ioctl)
  (dm-crypt, IPsec, kTLS, fscrypt)      (key management)
       |                                        |
       v                                        v
  +----------------------------------------------------+
  |        Kernel Crypto API + hwrng (72 total)        |
  |   ahash | skcipher | aead | akcipher | sig | kpp   |
  +----------------------------------------------------+
       |                                           |
       v                                           v
  +------------------+    +------------------------+
  | Transaction Mgr  |--->| Key / Mgmt subsystem   |
  | (kthread, CMQ)   |    | (datastore, ioctl ops) |
  +------------------+    +------------------------+
       |
       v
  +------------------+     +-------------------+
  | MQI (VCQ pack,   |---->| Response Handler  |
  |  DMA map, submit)|     | (threaded IRQ,    |
  +------------------+     |  watchdog, unmap) |
       |                   +-------------------+
       v                          ^
  +-----------+              +-----------+
  | Hardware  |--- IRQ ----->| Hardware  |
  | (mailbox) |              | (mailbox) |
  +-----------+              +-----------+

The transaction manager runs as a dedicated kthread that pulls
requests from a central command queue, packs VCQ entries, maps DMA
buffers, and submits to the least-loaded mailbox.  Completion is
handled by per-mailbox threaded IRQs.  The driver returns
-EINPROGRESS for async crypto requests and supports the
CRYPTO_TFM_REQ_MAY_BACKLOG flag for queue-full backpressure.

Registered algorithms (72 total):

  Type       Count  Algorithms
  ---------  -----  --------------------------------------------------
  ahash         15  SHA-{224,256,384,512}, SHA3-{224,256,384,512},
                     SHAKE-{128,256}, cSHAKE-{128,256},
                     KMAC-{128,256}, SM3
  ahash(HMAC)    8  HMAC-SHA-{224,256,384,512},
                     HMAC-SHA3-{224,256,384,512}
  ahash(MAC)     4  CMAC(AES), CMAC(SM4), XCBC(SM4), Poly1305
  skcipher      11  AES-{ECB,CBC,CTR,CFB,XTS},
                     SM4-{ECB,CBC,CTR,CFB,XTS}, ChaCha20
  aead           6  AES-{GCM,CCM}, SM4-{GCM,CCM},
                     rfc7539(chacha20,poly1305),
                     rfc7539esp(chacha20,poly1305)
  akcipher       1  RSA (2048--4096 bit; 512/1024 legacy/test)
  sig           23  ECDSA P-{256,384,521}, SM2 (verify-only),
                     ML-DSA-{44,65,87},
                     SLH-DSA (12 parameter sets),
                     LMS, LMS-HSS, XMSS, XMSS-MT
  kpp            3  ECDH P-{256,384}, X25519
  hwrng          1  DRBG-backed /dev/hwrng

Ioctl-only algorithms (not registered with the crypto API at all):
  - EdDSA (Ed25519, Ed448): sign and verify
  - ML-KEM (ML-KEM-512/768/1024): no standard kernel KEM API exists

The driver also exposes /dev/cmh_mgmt, a misc device providing 44
ioctl commands.  Relative to the in-kernel crypto API these fall into
two groups; the distinction matters because some commands name the
same primitives the driver also registers, and that overlap is
deliberate and bounded:

(1) Operations with no crypto API representation - the large
    majority.  The crypto API has no transform type or verb for
    these, so a character device is the only available UAPI:
      - hardware key lifecycle: create, import, export, derive,
        destroy, enumerate (keystore CRUD) - no keystore verb
      - KIC key derivation (HKDF, AES-CMAC-KDF, DKEK)
      - asymmetric key generation (RSA, EC, EdDSA, ML-DSA, SLH-DSA)
        and public-key derivation - the crypto API has no keygen verb
      - ML-KEM encapsulate/decapsulate - no kernel KEM API exists
      - SM2 encrypt/decrypt and key exchange (multi-step GM/T 0003)
      - EdDSA sign/verify - not registered with the crypto API
      - EAC Chip Authentication and DRBG (re)configuration

(2) Hardware-held-key operations on algorithms that ARE also
    registered (RSA decrypt, ECDSA/ML-DSA/SLH-DSA sign, ECDH).  These
    name the same primitives as the registered akcipher/sig/kpp
    transforms, but the crypto API's set_priv_key()/set_secret()
    accept only raw key bytes supplied by the caller; they cannot
    reference a private key that is generated inside, and never
    leaves, the hardware datastore - the central security property of
    this device.  The ioctl path keeps the private key
    hardware-resident, while the registered transforms serve raw-key
    in-kernel users.  The two paths are complementary, not redundant.

The device requires CAP_SYS_ADMIN.

/dev/cmh_mgmt is built conditionally on CONFIG_CRYPTO_DEV_CMH_MGMT
(default n); when disabled the ioctl interface is absent while all
kernel crypto API algorithms remain registered.

The ML-DSA sig algorithms are registered at priority 5001.  The
kernel's crypto/mldsa.c registers at priority 5000 with verify-only
(sign returns -EOPNOTSUPP).  Our driver provides full HW-accelerated
sign + verify, so the higher priority ensures the hardware
implementation is preferred when the driver is loaded.

Power management uses DEFINE_SIMPLE_DEV_PM_OPS.  On suspend the
transaction manager drains in-flight requests (configurable 10s
timeout, returns -ECANCELED on timeout), stops the kthread, and
masks IRQs.  On resume it re-verifies SIC/boot status and restarts
the kthread.

Dependencies:
  - Kernel 7.1+ (based on Herbert Xu's cryptodev-2.6 tree, 7.1.0-rc2)
  - sig_alg backend (upstream since 6.13)
  - CRYPTO_AHASH_REQ_VIRT (native support, no fallback needed)
  - CMH eSW loaded independently by hardware before driver probe

The driver registers all algorithms through the standard in-kernel
crypto API; in-kernel users (dm-crypt, fscrypt, IPsec, etc.) consume
them directly.  Key provisioning and hardware-held-key operations are
exposed to user space via /dev/cmh_mgmt ioctls.

Public hardware documentation:
  Product brief: https://go.rambus.com/ch-7xx-and-cc-7xx-product-brief
  No public datasheets are currently available.  The driver was
  developed against the CRI CryptoManager Hub Hardware Reference
  Manual (Rambus Inc. confidential).  Detailed hardware reference is
  available under NDA from Rambus Inc.; contact the maintainers listed
  in MAINTAINERS for access during review.

Tested on RISC-V and ARM64 QEMU emulation with the CMH hardware
model (QEMU TCG, 512 MiB RAM).  Also exercised on Xilinx VMK180
FPGA board with real CMH IP.

  - testmgr: 41 CMH algorithm registrations matched by upstream
    test vectors, all pass; 30 names report "No test for" (PQC
    families, KMAC, cSHAKE - no upstream vectors yet).
  - kselftest tools/testing/selftests/drivers/crypto/cmh:
    6 pass, 0 fail.

checkpatch.pl --strict: 0 errors, 0 warnings, 0 checks on all
files (the only output is the expected per-file "does MAINTAINERS
need updating?" reminder, satisfied by the MAINTAINERS patch).
sparse (C=2): 0 warnings.
W=1 -Werror: clean.
make dt_binding_check: clean (dtschema validates the
cri,cmh.yaml binding).

Tested with the following debug options enabled simultaneously
(submit-checklist "Test your code" item 1):
  CONFIG_PROVE_LOCKING, CONFIG_PROVE_RCU, CONFIG_DEBUG_LOCK_ALLOC,
  CONFIG_DEBUG_OBJECTS_RCU_HEAD, CONFIG_SLUB_DEBUG,
  CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, CONFIG_DEBUG_SPINLOCK,
  CONFIG_DEBUG_PREEMPT, CONFIG_DEBUG_ATOMIC_SLEEP.
  Result: no lockdep warnings, no ODEBUG splats, no slab corruption.

Additionally tested (separate passes - mutually exclusive configs):
  - CONFIG_KASAN + CONFIG_UBSAN + CONFIG_DEBUG_KMEMLEAK + CONFIG_KFENCE:
    no sanitizer findings; KMEMLEAK scan reports 0 unreferenced objects.
  - CONFIG_KCSAN (arm64; riscv64 lacks HAVE_ARCH_KCSAN):
    0 data-race reports attributed to the driver.

Stack usage: worst-case under 1 KB on both riscv64 and arm64
(scripts/checkstack.pl).  Hardware command buffers live in
per-request context (heap-allocated by the crypto framework).

Alex Ousherovitch (19):
  dt-bindings: crypto: add Rambus CryptoManager Hub
  crypto: cmh - add core platform driver
  crypto: cmh - add key provisioning and management
  crypto: cmh - add SHA-2/SHA-3/SHAKE ahash
  crypto: cmh - add HMAC ahash
  crypto: cmh - add CSHAKE/KMAC ahash
  crypto: cmh - add SM3 ahash
  crypto: cmh - add AES skcipher/aead/cmac
  crypto: cmh - add SM4 skcipher/aead/cmac/xcbc
  crypto: cmh - add ChaCha20-Poly1305
  crypto: cmh - add DRBG hwrng
  crypto: cmh - add RSA akcipher
  crypto: cmh - add ECDSA/SM2 sig
  crypto: cmh - add ECDH/X25519 kpp
  crypto: cmh - add ML-KEM/ML-DSA (QSE)
  crypto: cmh - add SLH-DSA/LMS/XMSS (HCQ)
  Documentation: ioctl: add CMH ioctl documentation and register 'J'
  selftests: crypto: cmh - add kselftest for management ioctl
  MAINTAINERS: add Rambus CryptoManager Hub (CMH)

base-commit: 6ea0ce3a19f9c37a014099e2b0a46b27fa164564
--
2.43.7

** This message and any attachments are for the sole use of the intended recipient(s). It may contain information that is confidential and privileged. If you are not the intended recipient of this message, you are prohibited from printing, copying, forwarding or saving it. Please delete the message and attachments and notify the sender immediately. **

Rambus Inc.<http://www.rambus.com>

^ permalink raw reply

* [PATCH 01/19] dt-bindings: crypto: add Rambus CryptoManager Hub
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Add device tree binding schema for the CRI CryptoManager Hub (CMH)
hardware crypto accelerator.  The binding covers the parent SoC-level
node with register region, interrupt, DMA properties, and per-core
child nodes identified by compatible string and unit address.

Register the 'cri' vendor prefix for Cryptography Research, Inc.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 .../devicetree/bindings/crypto/cri,cmh.yaml   | 222 ++++++++++++++++++
 .../devicetree/bindings/vendor-prefixes.yaml  |   2 +
 2 files changed, 224 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/crypto/cri,cmh.yaml

diff --git a/Documentation/devicetree/bindings/crypto/cri,cmh.yaml b/Documentation/devicetree/bindings/crypto/cri,cmh.yaml
new file mode 100644
index 000000000000..db41132e0591
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/cri,cmh.yaml
@@ -0,0 +1,222 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/crypto/cri,cmh.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: CRI CryptoManager Hub (CMH) Hardware Crypto Accelerator
+
+maintainers:
+  - Alex Ousherovitch <aousherovitch@rambus.com>
+  - Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
+  - Joel Wittenauer <Joel.Wittenauer@cryptography.com>
+
+description: |
+  The CRI CryptoManager Hub (CMH) is a hardware cryptographic accelerator accessed
+  via a mailbox-based VCQ (Virtual Command Queue) interface.  The host
+  writes VCQ command sequences into per-mailbox DMA queue buffers and
+  rings a doorbell; the CMH eSW processes them and signals completion
+  via interrupt.
+
+  Supported algorithm families: SHA-2, SHA-3, SM3, AES, SM4,
+  ChaCha20-Poly1305, RSA, ECDSA, EdDSA, ECDH, SM2, ML-KEM, ML-DSA,
+  SLH-DSA, LMS, XMSS, DRBG.
+
+properties:
+  compatible:
+    const: cri,cmh
+
+  reg:
+    maxItems: 1
+    description:
+      SIC (System Interface Controller) MMIO region.  Mailbox instance
+      registers are at offsets N * 0x1000 within this region.
+
+  interrupts:
+    minItems: 1
+    maxItems: 64
+    description:
+      Per-mailbox completion/error interrupts from the CryptoManager Hub,
+      matching the real CMH ch_sys_interrupt_mbx[N-1:0] topology.
+      Entry i corresponds to MBX instance i.  The driver maps each
+      configured mailbox (cri,mbx-instances) to its DT interrupt
+      index and registers a separate threaded IRQ handler per MBX.
+
+  interrupt-names:
+    minItems: 1
+    maxItems: 64
+    items:
+      pattern: '^mbx[0-9]+$'
+    description:
+      Names for each mailbox interrupt, matching the interrupts array.
+      Format is "mbxN" where N is the mailbox instance index.
+
+  cri,mbx-instances:
+    $ref: /schemas/types.yaml#/definitions/uint32-array
+    minItems: 1
+    maxItems: 64
+    description:
+      Array of 0-based mailbox instance indices to configure.
+      Each index N maps to register offset N * 0x1000 within the
+      SIC region.  If absent, defaults to instances 0 and 1.
+
+  cri,mbx-slots-log2:
+    $ref: /schemas/types.yaml#/definitions/uint32-array
+    minItems: 1
+    maxItems: 64
+    description:
+      Per-mailbox slot count as log2.  Valid range 1..15.
+      Array length must match cri,mbx-instances.
+      Default is 5 (32 slots).
+
+  cri,mbx-strides-log2:
+    $ref: /schemas/types.yaml#/definitions/uint32-array
+    minItems: 1
+    maxItems: 64
+    description:
+      Per-mailbox stride (bytes per slot) as log2.  Valid range 7..10.
+      Array length must match cri,mbx-instances.
+      Default is 7 (128 bytes per slot).
+
+  "#address-cells":
+    const: 1
+
+  "#size-cells":
+    const: 0
+
+patternProperties:
+  "^(hc|aes|sm4|sm3|hcq|qse|pke|drbg|ccp)@[0-9a-f]+$":
+    type: object
+    description:
+      Per-core-type child nodes.  Each child represents one crypto core
+      instance available in the hardware.  The driver enumerates these at
+      probe to discover which algorithm families are present.
+
+    properties:
+      reg:
+        maxItems: 1
+        description:
+          Hardware core ID for this core type (e.g. 0x02 for HC, 0x03 for AES).
+          Must match the CORE_ID_* values defined by the CMH hardware.
+
+      cri,mbx:
+        $ref: /schemas/types.yaml#/definitions/uint32
+        description:
+          Pin this core instance to a specific mailbox instance index.
+          Multiple child nodes of the same core type may each specify a
+          different cri,mbx value to spread instances across mailboxes.
+          When absent, the driver auto-assigns a mailbox via round-robin
+          across the instances listed in cri,mbx-instances.
+
+    required:
+      - reg
+
+    additionalProperties: false
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - "#address-cells"
+  - "#size-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+    soc {
+        #address-cells = <2>;
+        #size-cells = <2>;
+
+        crypto@a4800000 {
+            compatible = "cri,cmh";
+            reg = <0x0 0xa4800000 0x0 0x41000>;
+            interrupts = <1 2>;
+            interrupt-names = "mbx0", "mbx1";
+            cri,mbx-instances = <0 1>;
+            cri,mbx-slots-log2 = <5 5>;
+            cri,mbx-strides-log2 = <7 7>;
+            #address-cells = <1>;
+            #size-cells = <0>;
+
+            hc@2 {
+                reg = <0x02>;
+            };
+
+            aes@3 {
+                reg = <0x03>;
+            };
+
+            sm4@4 {
+                reg = <0x04>;
+            };
+
+            sm3@5 {
+                reg = <0x05>;
+            };
+
+            hcq@8 {
+                reg = <0x08>;
+            };
+
+            qse@9 {
+                reg = <0x09>;
+            };
+
+            pke@a {
+                reg = <0x0a>;
+                cri,mbx = <1>;
+            };
+
+            drbg@f {
+                reg = <0x0f>;
+            };
+
+            ccp@18 {
+                reg = <0x18>;
+            };
+        };
+    };
+
+  - |
+    /* Multi-instance: two AES cores on separate MBXes (future eSW support) */
+    soc {
+        #address-cells = <2>;
+        #size-cells = <2>;
+
+        crypto@a4800000 {
+            compatible = "cri,cmh";
+            reg = <0x0 0xa4800000 0x0 0x41000>;
+            interrupts = <1 2>;
+            interrupt-names = "mbx0", "mbx1";
+            cri,mbx-instances = <0 1>;
+            cri,mbx-slots-log2 = <5 5>;
+            cri,mbx-strides-log2 = <7 7>;
+            #address-cells = <1>;
+            #size-cells = <0>;
+
+            hc@2 {
+                reg = <0x02>;
+            };
+
+            aes@3 {
+                reg = <0x03>;
+                cri,mbx = <0>;
+            };
+
+            /* Second AES instance at core ID 0x06, pinned to MBX 1 */
+            aes@6 {
+                reg = <0x06>;
+                cri,mbx = <1>;
+            };
+
+            pke@a {
+                reg = <0x0a>;
+                cri,mbx = <1>;
+            };
+
+            drbg@f {
+                reg = <0x0f>;
+            };
+        };
+    };
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml b/Documentation/devicetree/bindings/vendor-prefixes.yaml
index 28784d66ae7b..3402adba3e49 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.yaml
+++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml
@@ -375,6 +375,8 @@ patternProperties:
     description: Crane Connectivity Solutions
   "^creative,.*":
     description: Creative Technology Ltd
+  "^cri,.*":
+    description: Cryptography Research, Inc.
   "^crystalfontz,.*":
     description: Crystalfontz America, Inc.
   "^csky,.*":
--
2.43.7


** This message and any attachments are for the sole use of the intended recipient(s). It may contain information that is confidential and privileged. If you are not the intended recipient of this message, you are prohibited from printing, copying, forwarding or saving it. Please delete the message and attachments and notify the sender immediately. **

Rambus Inc.<http://www.rambus.com>

^ permalink raw reply related

* [PATCH 08/19] crypto: cmh - add AES skcipher/aead/cmac
From: Saravanakrishnan Krishnamoorthy @ 2026-06-25 17:33 UTC (permalink / raw)
  To: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
	Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
	Paul Walmsley, Rob Herring, Saravanakrishnan Krishnamoorthy,
	Shuah Khan
  Cc: Alexandre Ghiti, devicetree, Joel Wittenauer, linux-api,
	linux-crypto, linux-doc, linux-kernel, linux-kselftest,
	linux-riscv, Shuah Khan, sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-1-skrishnamoorthy@rambus.com>

From: Alex Ousherovitch <aousherovitch@rambus.com>

Register AES algorithms using the CMH AES core (core ID 0x03):
- skcipher: AES-ECB, AES-CBC, AES-CTR, AES-XTS, AES-CFB
- aead: AES-GCM, AES-CCM
- ahash: AES-CMAC

Supports 128, 192, and 256-bit keys.  AEAD algorithms handle
associated data, payload, and authentication tag with correct
encrypt/decrypt separation.

Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
Reviewed-by: Thi Nguyen <thin@rambus.com>
---
 drivers/crypto/cmh/Makefile          |   5 +-
 drivers/crypto/cmh/cmh_aes.c         | 736 ++++++++++++++++++++
 drivers/crypto/cmh/cmh_aes_aead.c    | 987 +++++++++++++++++++++++++++
 drivers/crypto/cmh/cmh_aes_cmac.c    | 537 +++++++++++++++
 drivers/crypto/cmh/cmh_main.c        |  25 +
 drivers/crypto/cmh/include/cmh_aes.h |  24 +
 6 files changed, 2313 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cmh/cmh_aes.c
 create mode 100644 drivers/crypto/cmh/cmh_aes_aead.c
 create mode 100644 drivers/crypto/cmh/cmh_aes_cmac.c
 create mode 100644 drivers/crypto/cmh/include/cmh_aes.h

diff --git a/drivers/crypto/cmh/Makefile b/drivers/crypto/cmh/Makefile
index b3018fbcf211..ced8d1748e6c 100644
--- a/drivers/crypto/cmh/Makefile
+++ b/drivers/crypto/cmh/Makefile
@@ -19,7 +19,10 @@ cmh-y := \
        cmh_hmac.o \
        cmh_cshake.o \
        cmh_kmac.o \
-       cmh_sm3.o
+       cmh_sm3.o \
+       cmh_aes.o \
+       cmh_aes_aead.o \
+       cmh_aes_cmac.o

 # Management ioctl device (/dev/cmh_mgmt): key lifecycle, PKE, PQC ioctls.
 cmh-$(CONFIG_CRYPTO_DEV_CMH_MGMT) += \
diff --git a/drivers/crypto/cmh/cmh_aes.c b/drivers/crypto/cmh/cmh_aes.c
new file mode 100644
index 000000000000..b36295763e33
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_aes.c
@@ -0,0 +1,736 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API AES (skcipher) Driver
+ *
+ * Registers skcipher algorithms with the Linux crypto subsystem:
+ *   ecb(aes), cbc(aes), ctr(aes), cfb(aes), xts(aes)
+ *
+ * Uses the CMH AES Core via VCQ commands:
+ *   [SYS_CMD_WRITE] + AES_CMD_INIT + [AES_CMD_UPDATE] + AES_CMD_FINAL
+ *   + VCQ_CMD_FLUSH
+ *
+ * The AES core requires bidirectional DMA -- both input and output
+ * buffers are mapped and passed in a single AES_CMD_FINAL command.
+ *
+ * Raw-key atomicity: SYS_CMD_WRITE to SYS_REF_TEMP is packed into
+ * the same VCQ as AES commands (see cmh_key.h for details).
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/aes.h>
+#include <crypto/algapi.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/unaligned.h>
+
+#include "cmh_aes.h"
+#include "cmh_vcq.h"
+#include "cmh_aes_abi.h"
+#include "cmh_sys_abi.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+/* Algorithm Table */
+
+struct cmh_aes_alg_info {
+       u32         aes_mode;   /* AES_MODE_* */
+       u32         ivsize;             /* bytes (0 for ECB) */
+       u32         min_keysize;        /* minimum key bytes */
+       u32         max_keysize;        /* maximum key bytes */
+       const char *alg_name;   /* Linux crypto name: "ecb(aes)" */
+       const char *drv_name;   /* driver name: "cri-cmh-ecb-aes" */
+};
+
+static const struct cmh_aes_alg_info aes_algs[] = {
+       { AES_MODE_ECB, 0,                AES_KEYSIZE_128, AES_KEYSIZE_256,
+         "ecb(aes)", "cri-cmh-ecb-aes" },
+       { AES_MODE_CBC, CMH_AES_IV_SIZE,  AES_KEYSIZE_128, AES_KEYSIZE_256,
+         "cbc(aes)", "cri-cmh-cbc-aes" },
+       { AES_MODE_CTR, CMH_AES_IV_SIZE,  AES_KEYSIZE_128, AES_KEYSIZE_256,
+         "ctr(aes)", "cri-cmh-ctr-aes" },
+       { AES_MODE_CFB, CMH_AES_IV_SIZE,  AES_KEYSIZE_128, AES_KEYSIZE_256,
+         "cfb(aes)", "cri-cmh-cfb-aes" },
+       { AES_MODE_XTS, CMH_AES_IV_SIZE,  2 * AES_KEYSIZE_128, 2 * AES_KEYSIZE_256,
+         "xts(aes)", "cri-cmh-xts-aes" },
+};
+
+/* Per-transform context (allocated by crypto framework) */
+
+struct cmh_aes_tfm_ctx {
+       struct cmh_key_ctx key;
+};
+
+/* Per-request context (lives in skcipher_request::__ctx) */
+
+/*
+ * Maximum payload commands:
+ *   [SYS_CMD_WRITE] + AES_CMD_INIT + [AES_CMD_UPDATE] + AES_CMD_FINAL
+ *   + VCQ_CMD_FLUSH = 5
+ * UPDATE is used for XTS data > 2 blocks (see cmh_aes_crypt).
+ */
+#define CMH_AES_MAX_PAYLOAD    5
+#define CMH_AES_MAX_PACKED     (CMH_AES_MAX_PAYLOAD * 2)
+
+struct cmh_aes_reqctx {
+       dma_addr_t in_dma;
+       dma_addr_t out_dma;
+       dma_addr_t iv_dma;
+       dma_addr_t iv2_dma;
+       dma_addr_t key_dma;
+       u8 *in_buf;
+       u8 *out_buf;
+       u8 *iv_buf;
+       u8 *iv2_buf;
+       u32 cryptlen;
+       u32 ivsize;
+       u32 keylen;
+       u32 aes_mode;
+       u32 aes_op;
+       /* CTR counter-wrap split state */
+       u32 ctr_chunk1_len;
+       u32 core_id;
+       s32 target_mbx;
+       u64 key_ref;
+       struct vcq_cmd packed[CMH_AES_MAX_PACKED];
+};
+
+/* VCQ Builders -- AES-specific */
+
+static void vcq_add_aes_init(struct vcq_cmd *slot, u32 core_id, u64 key_ref, u64 iv_dma,
+                            u32 keylen, u32 ivlen, u32 mode, u32 op,
+                            u32 iolen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, AES_CMD_INIT);
+       slot->hwc.aes.cmd_init.key = key_ref;
+       slot->hwc.aes.cmd_init.iv = iv_dma;
+       slot->hwc.aes.cmd_init.keylen = keylen;
+       slot->hwc.aes.cmd_init.ivlen = ivlen;
+       slot->hwc.aes.cmd_init.mode = mode;
+       slot->hwc.aes.cmd_init.op = op;
+       slot->hwc.aes.cmd_init.aadlen = 0;
+       slot->hwc.aes.cmd_init.iolen = iolen;
+       slot->hwc.aes.cmd_init.taglen = 0;
+}
+
+static void vcq_add_aes_update(struct vcq_cmd *slot, u32 core_id, u64 input_dma,
+                              u64 output_dma, u32 iolen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, AES_CMD_UPDATE);
+       slot->hwc.aes.cmd_update.input = input_dma;
+       slot->hwc.aes.cmd_update.output = output_dma;
+       slot->hwc.aes.cmd_update.iolen = iolen;
+}
+
+static void vcq_add_aes_final(struct vcq_cmd *slot, u32 core_id, u64 input_dma,
+                             u64 output_dma, u32 iolen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, AES_CMD_FINAL);
+       slot->hwc.aes.cmd_final.input = input_dma;
+       slot->hwc.aes.cmd_final.output = output_dma;
+       slot->hwc.aes.cmd_final.iolen = iolen;
+       slot->hwc.aes.cmd_final.tag = 0;
+       slot->hwc.aes.cmd_final.taglen = 0;
+}
+
+/*
+ * We wrap each skcipher_alg with its info pointer in a compound struct,
+ * then use container_of() in cmh_aes_get_info() to recover it.
+ * This is the same pattern used by hash, hmac, cshake, kmac.
+ */
+struct cmh_aes_alg_drv {
+       struct skcipher_alg             alg;
+       const struct cmh_aes_alg_info  *info;
+};
+
+static bool aes_is_stream_mode(u32 mode)
+{
+       return mode == AES_MODE_CTR || mode == AES_MODE_CFB;
+}
+
+/*
+ * Update req->iv after a successful encrypt/decrypt.
+ *
+ * The Linux skcipher API contract requires that req->iv is updated to
+ * reflect the state needed to continue processing in a chained call:
+ *   CBC encrypt: IV <- last ciphertext block
+ *   CBC decrypt: IV <- last ciphertext block of the *input*
+ *   CTR:         IV <- counter incremented by ceil(cryptlen / blocksize)
+ *   CFB:         IV <- last ciphertext block
+ */
+static void cmh_aes_update_iv(struct skcipher_request *req, u32 mode,
+                             u32 op, const u8 *in_buf, const u8 *out_buf)
+{
+       u32 bs = CMH_AES_BLOCK_SIZE;
+       u32 nblocks;
+
+       switch (mode) {
+       case AES_MODE_CBC:
+               if (op == AES_OP_ENCRYPT)
+                       memcpy(req->iv, out_buf + req->cryptlen - bs, bs);
+               else
+                       memcpy(req->iv, in_buf + req->cryptlen - bs, bs);
+               break;
+       case AES_MODE_CTR:
+               /*
+                * Arithmetic big-endian 128-bit counter increment.
+                * Process from the least-significant byte (index 15)
+                * upward, carrying as needed.
+                */
+               nblocks = DIV_ROUND_UP(req->cryptlen, bs);
+               {
+                       u8 *iv = req->iv;
+                       int i;
+
+                       for (i = bs - 1; i >= 0 && nblocks; i--) {
+                               u32 sum = (u32)iv[i] + (nblocks & 0xff);
+
+                               iv[i] = (u8)sum;
+                               nblocks = (nblocks >> 8) + (sum >> 8);
+                       }
+               }
+               break;
+       case AES_MODE_CFB:
+               /*
+                * CFB-128 chains on the last ciphertext block.  On encrypt,
+                * that is out_buf; on decrypt, it is in_buf.
+                *
+                * For sub-block requests (cryptlen < 16), there is no
+                * complete ciphertext block to chain, so the IV is left
+                * unchanged -- CFB-128 has no defined chaining semantic
+                * for partial blocks (shift-register CFB-n is a different
+                * mode).  Without this guard the pointer arithmetic
+                * underflows and reads before the buffer.
+                */
+               if (req->cryptlen >= bs) {
+                       if (op == AES_OP_ENCRYPT)
+                               memcpy(req->iv, out_buf + req->cryptlen - bs,
+                                      bs);
+                       else
+                               memcpy(req->iv, in_buf + req->cryptlen - bs,
+                                      bs);
+               }
+               break;
+       default:
+               break;
+       }
+}
+
+/* skcipher Operations */
+
+static const struct cmh_aes_alg_info *
+cmh_aes_get_info(struct crypto_skcipher *tfm)
+{
+       struct skcipher_alg *alg = crypto_skcipher_alg(tfm);
+
+       return container_of(alg, struct cmh_aes_alg_drv, alg)->info;
+}
+
+static int cmh_aes_setkey(struct crypto_skcipher *tfm, const u8 *key,
+                         unsigned int keylen)
+{
+       struct cmh_aes_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+       const struct cmh_aes_alg_info *info = cmh_aes_get_info(tfm);
+
+       if (info->aes_mode == AES_MODE_XTS) {
+               int err;
+
+               /* XTS: double key (32, 48, or 64 bytes) */
+               if (keylen != 2 * AES_KEYSIZE_128 &&
+                   keylen != 2 * AES_KEYSIZE_192 &&
+                   keylen != 2 * AES_KEYSIZE_256)
+                       return -EINVAL;
+               err = xts_verify_key(tfm, key, keylen);
+               if (err)
+                       return err;
+       } else {
+               /* Standard: 16, 24, or 32 bytes */
+               if (keylen != AES_KEYSIZE_128 &&
+                   keylen != AES_KEYSIZE_192 &&
+                   keylen != AES_KEYSIZE_256)
+                       return -EINVAL;
+       }
+
+       return cmh_key_setkey_raw(&tctx->key, key, keylen, CORE_ID_AES);
+}
+
+static int cmh_aes_init_tfm(struct crypto_skcipher *tfm)
+{
+       struct cmh_aes_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+
+       memset(tctx, 0, sizeof(*tctx));
+       crypto_skcipher_set_reqsize(tfm, sizeof(struct cmh_aes_reqctx));
+       return 0;
+}
+
+static void cmh_aes_exit_tfm(struct crypto_skcipher *tfm)
+{
+       struct cmh_aes_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+
+       cmh_key_destroy(&tctx->key);
+}
+
+#define CMH_AES_MAX_CRYPTLEN   SZ_32M
+
+/* DMA unmap helper */
+static void cmh_aes_unmap_dma(struct cmh_aes_reqctx *rctx)
+{
+       if (rctx->iv2_buf)
+               cmh_dma_unmap_single(rctx->iv2_dma, rctx->ivsize,
+                                    DMA_TO_DEVICE);
+       if (rctx->ivsize > 0)
+               cmh_dma_unmap_single(rctx->iv_dma, rctx->ivsize,
+                                    DMA_TO_DEVICE);
+       cmh_dma_unmap_single(rctx->out_dma, rctx->cryptlen, DMA_FROM_DEVICE);
+       cmh_dma_unmap_single(rctx->in_dma, rctx->cryptlen, DMA_TO_DEVICE);
+}
+
+static void cmh_aes_free_bufs(struct cmh_aes_reqctx *rctx)
+{
+       kfree(rctx->iv2_buf);
+       rctx->iv2_buf = NULL;
+       kfree(rctx->iv_buf);
+       rctx->iv_buf = NULL;
+       kfree_sensitive(rctx->out_buf);
+       rctx->out_buf = NULL;
+       kfree_sensitive(rctx->in_buf);
+       rctx->in_buf = NULL;
+}
+
+/*
+ * Submit the second CTR chunk after the first completes.
+ * Called from cmh_aes_complete when ctr_chunk1_len > 0.
+ */
+static void cmh_aes_complete(void *data, int error);
+
+static int cmh_aes_ctr_submit_chunk2(struct skcipher_request *req)
+{
+       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+       struct cmh_aes_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+       struct cmh_aes_reqctx *rctx = skcipher_request_ctx(req);
+       struct vcq_cmd cmds[CMH_AES_MAX_PAYLOAD];
+       u32 chunk1 = rctx->ctr_chunk1_len;
+       u32 chunk2 = rctx->cryptlen - chunk1;
+       u64 key_ref;
+       u32 keylen;
+       u32 idx = 0;
+
+       /* Clear split flag so next completion is final */
+       rctx->ctr_chunk1_len = 0;
+
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)rctx->key_dma, SYS_REF_NONE,
+                         tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+
+       vcq_add_aes_init(&cmds[idx++], rctx->core_id, key_ref,
+                        (u64)rctx->iv2_dma, keylen, rctx->ivsize,
+                        rctx->aes_mode, rctx->aes_op, 0);
+       vcq_add_aes_final(&cmds[idx++], rctx->core_id,
+                         (u64)(rctx->in_dma + chunk1),
+                         (u64)(rctx->out_dma + chunk1), chunk2);
+       vcq_add_flush(&cmds[idx++], rctx->core_id);
+
+       return cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                            CMH_AES_MAX_PACKED,
+                                            rctx->target_mbx,
+                                            cmh_aes_complete, req,
+                                            !!(req->base.flags &
+                                               CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                            cmh_tm_async_timeout_jiffies());
+}
+
+/*
+ * Async completion callback -- fires from RH threaded IRQ context.
+ *
+ * Unmaps DMA buffers, copies output to req->dst scatterlist,
+ * updates the IV state, frees temporaries, and completes the request.
+ *
+ * For CTR counter-wrap splits, the first chunk completion chains
+ * into a second VCQ submission rather than finalizing immediately.
+ */
+static void cmh_aes_complete(void *data, int error)
+{
+       struct skcipher_request *req = data;
+       struct cmh_aes_reqctx *rctx = skcipher_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       /*
+        * CTR counter-wrap: first chunk completed, submit second.
+        * DMA mappings remain valid (they cover the full buffer).
+        *
+        * Recursion depth bounded: chunk2 clears ctr_chunk1_len before
+        * submission, so the second cmh_aes_complete invocation sees 0
+        * and finalizes (max depth = 2).
+        */
+       if (rctx->ctr_chunk1_len && !error) {
+               int ret;
+
+               ret = cmh_aes_ctr_submit_chunk2(req);
+
+               if (!ret || ret == -EBUSY)
+                       return;
+               /* Submission failed; clean up below */
+               error = ret;
+       }
+
+       cmh_aes_unmap_dma(rctx);
+
+       if (!error) {
+               scatterwalk_map_and_copy(rctx->out_buf, req->dst,
+                                        0, rctx->cryptlen, 1);
+               cmh_aes_update_iv(req, rctx->aes_mode, rctx->aes_op,
+                                 rctx->in_buf, rctx->out_buf);
+       }
+
+       cmh_aes_free_bufs(rctx);
+       cmh_complete(&req->base, error);
+}
+
+/*
+ * Core encrypt/decrypt -- builds a VCQ transaction and submits async.
+ *
+ * Returns -EINPROGRESS on successful submission (completion callback
+ * will fire later).  Returns 0 for trivial cases (zero-length).
+ * Returns negative errno on pre-submission errors.
+ */
+static int cmh_aes_crypt(struct skcipher_request *req, u32 aes_op)
+{
+       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+       struct cmh_aes_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
+       const struct cmh_aes_alg_info *info = cmh_aes_get_info(tfm);
+       struct cmh_aes_reqctx *rctx = skcipher_request_ctx(req);
+       struct vcq_cmd cmds[CMH_AES_MAX_PAYLOAD];
+       u64 key_ref;
+       u32 keylen;
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       if (tctx->key.mode == CMH_KEY_NONE)
+               return -ENOKEY;
+
+       if (!req->cryptlen)
+               return 0;
+
+       if (req->cryptlen > CMH_AES_MAX_CRYPTLEN)
+               return -EINVAL;
+
+       switch (info->aes_mode) {
+       case AES_MODE_CTR:
+       case AES_MODE_CFB:
+               break;
+       case AES_MODE_XTS:
+               if (req->cryptlen < CMH_AES_BLOCK_SIZE)
+                       return -EINVAL;
+               break;
+       default:
+               if (req->cryptlen & (CMH_AES_BLOCK_SIZE - 1))
+                       return -EINVAL;
+               break;
+       }
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       /* Initialise reqctx */
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->cryptlen = req->cryptlen;
+       rctx->ivsize = info->ivsize;
+       rctx->aes_mode = info->aes_mode;
+       rctx->aes_op = aes_op;
+       rctx->iv2_buf = NULL;
+
+       /* Linearise input from scatterlist */
+       rctx->in_buf = kmalloc(req->cryptlen, gfp);
+       if (!rctx->in_buf)
+               return -ENOMEM;
+
+       scatterwalk_map_and_copy(rctx->in_buf, req->src, 0, req->cryptlen, 0);
+
+       rctx->in_dma = cmh_dma_map_single(rctx->in_buf, req->cryptlen,
+                                         DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->in_dma)) {
+               ret = -ENOMEM;
+               goto out_free_in;
+       }
+
+       /* Allocate and map output buffer */
+       rctx->out_buf = kmalloc(req->cryptlen, gfp);
+       if (!rctx->out_buf) {
+               ret = -ENOMEM;
+               goto out_unmap_in;
+       }
+
+       rctx->out_dma = cmh_dma_map_single(rctx->out_buf, req->cryptlen,
+                                          DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->out_dma)) {
+               ret = -ENOMEM;
+               goto out_free_out;
+       }
+
+       /* Map IV if required */
+       if (info->ivsize > 0) {
+               rctx->iv_buf = kmemdup(req->iv, info->ivsize, gfp);
+               if (!rctx->iv_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_out;
+               }
+               rctx->iv_dma = cmh_dma_map_single(rctx->iv_buf, info->ivsize,
+                                                 DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->iv_dma)) {
+                       ret = -ENOMEM;
+                       goto out_free_iv;
+               }
+       }
+
+       /* Resolve key reference */
+       idx = 0;
+
+       rctx->key_dma = tctx->key.raw.dma;
+       rctx->keylen = tctx->key.raw.len;
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)rctx->key_dma, SYS_REF_NONE,
+                         tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+       d = cmh_core_select_instance(CMH_CORE_AES);
+       target_mbx = d.mbx_idx;
+       core_id = d.core_id;
+
+       /*
+        * iolen in INIT: XTS needs total length upfront for tweak
+        * computation; all other modes use 0 (streaming).
+        */
+       vcq_add_aes_init(&cmds[idx++], core_id, key_ref, (u64)rctx->iv_dma,
+                        keylen, info->ivsize, info->aes_mode, aes_op,
+                        info->aes_mode == AES_MODE_XTS ?
+                        req->cryptlen : 0);
+
+       if (info->aes_mode == AES_MODE_XTS &&
+           req->cryptlen > 2 * CMH_AES_BLOCK_SIZE) {
+               u32 final_len, update_len;
+
+               if (req->cryptlen & (CMH_AES_BLOCK_SIZE - 1))
+                       final_len = CMH_AES_BLOCK_SIZE +
+                                   (req->cryptlen & (CMH_AES_BLOCK_SIZE - 1));
+               else
+                       final_len = 2 * CMH_AES_BLOCK_SIZE;
+
+               update_len = req->cryptlen - final_len;
+
+               vcq_add_aes_update(&cmds[idx++], core_id,
+                                  (u64)rctx->in_dma,
+                                  (u64)rctx->out_dma, update_len);
+               vcq_add_aes_final(&cmds[idx++], core_id,
+                                 (u64)(rctx->in_dma + update_len),
+                                 (u64)(rctx->out_dma + update_len),
+                                 final_len);
+       } else if (info->aes_mode == AES_MODE_CTR) {
+               /*
+                * CTR counter-wrap workaround:
+                * The AES-SCA hardware uses a 64-bit block counter.
+                * If the lower 64 bits of the IV would wrap during
+                * this operation, split into two separate VCQ
+                * transactions -- the completion callback for the
+                * first chunk submits the second.
+                */
+               u64 lower64 = get_unaligned_be64(rctx->iv_buf + 8);
+               u32 nblocks = DIV_ROUND_UP(req->cryptlen,
+                                         CMH_AES_BLOCK_SIZE);
+               u64 bwrap = lower64 ? (~lower64 + 1ULL) : U64_MAX;
+
+               if (nblocks > bwrap) {
+                       u32 chunk1 = (u32)bwrap * CMH_AES_BLOCK_SIZE;
+                       u64 upper64;
+
+                       /* Prepare second IV for chained submission */
+                       rctx->iv2_buf = kmalloc(info->ivsize, gfp);
+                       if (!rctx->iv2_buf) {
+                               ret = -ENOMEM;
+                               goto out_unmap_iv;
+                       }
+                       upper64 = get_unaligned_be64(rctx->iv_buf);
+                       put_unaligned_be64(upper64 + 1, rctx->iv2_buf);
+                       put_unaligned_be64(0, rctx->iv2_buf + 8);
+
+                       rctx->iv2_dma =
+                               cmh_dma_map_single(rctx->iv2_buf,
+                                                  info->ivsize,
+                                                  DMA_TO_DEVICE);
+                       if (cmh_dma_map_error(rctx->iv2_dma)) {
+                               ret = -ENOMEM;
+                               goto out_free_iv2;
+                       }
+
+                       /* Store state for the chained second submission */
+                       rctx->ctr_chunk1_len = chunk1;
+                       rctx->core_id = core_id;
+                       rctx->target_mbx = target_mbx;
+                       rctx->key_ref = key_ref;
+
+                       /* First transaction: only chunk1 */
+                       vcq_add_aes_final(&cmds[idx++], core_id,
+                                         (u64)rctx->in_dma,
+                                         (u64)rctx->out_dma, chunk1);
+               } else {
+                       /* No wrap: single FINAL with all data */
+                       vcq_add_aes_final(&cmds[idx++], core_id,
+                                         (u64)rctx->in_dma,
+                                         (u64)rctx->out_dma,
+                                         req->cryptlen);
+               }
+       } else {
+               vcq_add_aes_final(&cmds[idx++], core_id,
+                                 (u64)rctx->in_dma,
+                                 (u64)rctx->out_dma, req->cryptlen);
+       }
+
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_AES_MAX_PACKED, target_mbx,
+                                           cmh_aes_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_cleanup_all;
+
+       return -EINPROGRESS;
+
+out_cleanup_all:
+       if (rctx->iv2_buf) {
+               cmh_dma_unmap_single(rctx->iv2_dma, info->ivsize,
+                                    DMA_TO_DEVICE);
+       }
+out_free_iv2:
+       kfree(rctx->iv2_buf);
+out_unmap_iv:
+       if (info->ivsize > 0)
+               cmh_dma_unmap_single(rctx->iv_dma, info->ivsize,
+                                    DMA_TO_DEVICE);
+out_free_iv:
+       kfree(rctx->iv_buf);
+out_unmap_out:
+       cmh_dma_unmap_single(rctx->out_dma, req->cryptlen, DMA_FROM_DEVICE);
+out_free_out:
+       kfree_sensitive(rctx->out_buf);
+out_unmap_in:
+       cmh_dma_unmap_single(rctx->in_dma, req->cryptlen, DMA_TO_DEVICE);
+out_free_in:
+       kfree_sensitive(rctx->in_buf);
+       return ret;
+}
+
+static int cmh_aes_encrypt(struct skcipher_request *req)
+{
+       return cmh_aes_crypt(req, AES_OP_ENCRYPT);
+}
+
+static int cmh_aes_decrypt(struct skcipher_request *req)
+{
+       return cmh_aes_crypt(req, AES_OP_DECRYPT);
+}
+
+/* Registration */
+
+static struct cmh_aes_alg_drv aes_drv_algs[ARRAY_SIZE(aes_algs)];
+
+/**
+ * cmh_aes_register() - Register AES-CBC/CTR/ECB/XTS skcipher algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_aes_register(void)
+{
+       unsigned int i;
+       int ret;
+
+       for (i = 0; i < ARRAY_SIZE(aes_algs); i++) {
+               const struct cmh_aes_alg_info *info = &aes_algs[i];
+               struct cmh_aes_alg_drv *drv = &aes_drv_algs[i];
+               struct skcipher_alg *alg = &drv->alg;
+
+               drv->info = info;
+
+               memset(alg, 0, sizeof(*alg));
+
+               alg->setkey      = cmh_aes_setkey;
+               alg->encrypt     = cmh_aes_encrypt;
+               alg->decrypt     = cmh_aes_decrypt;
+               alg->init        = cmh_aes_init_tfm;
+               alg->exit        = cmh_aes_exit_tfm;
+               alg->min_keysize = info->min_keysize;
+               alg->max_keysize = info->max_keysize;
+               alg->ivsize      = info->ivsize;
+
+               strscpy(alg->base.cra_name, info->alg_name,
+                       CRYPTO_MAX_ALG_NAME);
+               strscpy(alg->base.cra_driver_name, info->drv_name,
+                       CRYPTO_MAX_ALG_NAME);
+               alg->base.cra_priority  = 300;
+               alg->base.cra_flags     = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                         CRYPTO_ALG_ASYNC;
+               alg->base.cra_blocksize = aes_is_stream_mode(info->aes_mode)
+                                         ? 1 : CMH_AES_BLOCK_SIZE;
+               alg->base.cra_ctxsize  = sizeof(struct cmh_aes_tfm_ctx);
+               alg->base.cra_module   = THIS_MODULE;
+
+               ret = crypto_register_skcipher(alg);
+               if (ret) {
+                       dev_err(cmh_dev(), "cmh_aes: failed to register %s (rc=%d)\n",
+                               info->alg_name, ret);
+                       goto err_unregister;
+               }
+
+               dev_dbg(cmh_dev(), "cmh_aes: registered %s\n", info->alg_name);
+       }
+
+       return 0;
+
+err_unregister:
+       while (i--)
+               crypto_unregister_skcipher(&aes_drv_algs[i].alg);
+       return ret;
+}
+
+/**
+ * cmh_aes_unregister() - Unregister AES skcipher algorithms from the crypto framework
+ */
+void cmh_aes_unregister(void)
+{
+       unsigned int i;
+
+       for (i = 0; i < ARRAY_SIZE(aes_algs); i++) {
+               crypto_unregister_skcipher(&aes_drv_algs[i].alg);
+               dev_dbg(cmh_dev(), "cmh_aes: unregistered %s\n", aes_algs[i].alg_name);
+       }
+}
diff --git a/drivers/crypto/cmh/cmh_aes_aead.c b/drivers/crypto/cmh/cmh_aes_aead.c
new file mode 100644
index 000000000000..0b59c5f7d474
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_aes_aead.c
@@ -0,0 +1,987 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API AES AEAD Driver (GCM/CCM)
+ *
+ * Registers AEAD algorithms with the Linux crypto subsystem:
+ *   gcm(aes), ccm(aes)
+ *
+ * GCM: AES_CMD_INIT(mode=GCM) + [AAD_FINAL] + AES_CMD_FINAL + FLUSH
+ *   - Standard 12-byte IV (nonce), 16-byte tag
+ *   - AES_CMD_INIT carries aadlen/iolen/taglen
+ *   - AES_CMD_FINAL carries tag DMA for encrypt (produce) / decrypt (verify)
+ *
+ * CCM: AES_CMD_CCM_INIT + [AAD_FINAL] + AES_CMD_FINAL + FLUSH
+ *   - Variable nonce (7--13 bytes), variable tag (4--16 bytes)
+ *   - Uses AES_CMD_CCM_INIT (0x0A) with aes_cmd_init struct
+ *   - Nonce passed via IV field, taglen in init
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/utils.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include "cmh_aes.h"
+#include "cmh_vcq.h"
+#include "cmh_aes_abi.h"
+#include "cmh_sys_abi.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+/*
+ * GCM IV contract:
+ *
+ * The AES core requires exactly 16 bytes loaded into its IV register.
+ * For standard 96-bit nonce GCM, the driver passes:
+ *
+ *   IV[0..11]  = user-supplied 12-byte nonce
+ *   IV[12..15] = 0x00000000
+ *
+ * The hardware internally sets the last 32 bits to the big-endian
+ * counter value 1 (forming J0 = nonce || 0x00000001) before
+ * processing AAD.  The driver must NOT pre-set the counter.
+ *
+ * If the IV format is incorrect, GCM authentication will fail
+ * (encrypt produces wrong ciphertext/tag, decrypt rejects).
+ */
+#define AES_GCM_IV_SIZE                12U     /* GCM nonce size (standard) */
+#define AES_GCM_HW_IV_SIZE     16U     /* HW requires 16-byte IV buffer */
+#define AES_GCM_TAG_SIZE       16U
+
+/* CCM: callers pass a 16-byte IV in RFC 3610 format:
+ * iv[0] = L-1, iv[1..14-iv[0]] = nonce, rest = counter (zeroed).
+ * Nonce length = 14 - iv[0], range 7..13.
+ */
+#define AES_CCM_IV_SIZE        16U
+
+enum cmh_aes_aead_type {
+       CMH_AES_AEAD_GCM,
+       CMH_AES_AEAD_CCM,
+};
+
+struct cmh_aes_aead_info {
+       enum cmh_aes_aead_type type;
+       u32         aes_mode;   /* AES_MODE_GCM or AES_MODE_CCM */
+       u32         ivsize;
+       u32         maxauthsize;
+       const char *alg_name;
+       const char *drv_name;
+};
+
+static const struct cmh_aes_aead_info aes_aead_algs[] = {
+       { CMH_AES_AEAD_GCM, AES_MODE_GCM, AES_GCM_IV_SIZE,
+         AES_GCM_TAG_SIZE, "gcm(aes)", "cri-cmh-gcm-aes" },
+       { CMH_AES_AEAD_CCM, AES_MODE_CCM, AES_CCM_IV_SIZE,
+         AES_GCM_TAG_SIZE, "ccm(aes)", "cri-cmh-ccm-aes" },
+};
+
+struct cmh_aes_aead_tfm_ctx {
+       struct cmh_key_ctx key;
+       u32 authsize;           /* tag length set by setauthsize */
+       struct crypto_cipher *sw_cipher;        /* CCM empty-input fallback */
+       struct crypto_aead *fallback;   /* CCM authsize=10 fallback */
+};
+
+/* Per-request context (lives in aead_request::__ctx) */
+
+/*
+ * Maximum payload commands:
+ *   [SYS_CMD_WRITE] + AES_CMD_INIT + AAD_FINAL + AES_CMD_FINAL + FLUSH = 5
+ */
+#define CMH_AES_AEAD_MAX_PAYLOAD       5
+#define CMH_AES_AEAD_MAX_PACKED                (CMH_AES_AEAD_MAX_PAYLOAD * 2)
+
+struct cmh_aes_aead_reqctx {
+       dma_addr_t in_dma;
+       dma_addr_t out_dma;
+       dma_addr_t iv_dma;
+       dma_addr_t key_dma;
+       dma_addr_t aad_dma;
+       dma_addr_t tag_dma;
+       u8 *in_buf;
+       u8 *out_buf;
+       u8 *iv_buf;
+       u8 *aad_buf;
+       u8 *tag_buf;
+       u32 cryptlen;
+       u32 assoclen;
+       u32 authsize;
+       u32 iv_map_len;
+       u32 keylen;
+       bool encrypting;
+       bool empty_gcm_fallback;
+       struct vcq_cmd packed[CMH_AES_AEAD_MAX_PACKED];
+};
+
+struct cmh_aes_aead_drv {
+       struct aead_alg                  alg;
+       const struct cmh_aes_aead_info  *info;
+};
+
+static const struct cmh_aes_aead_info *
+cmh_aes_aead_get_info(struct crypto_aead *tfm)
+{
+       struct aead_alg *alg = crypto_aead_alg(tfm);
+
+       return container_of(alg, struct cmh_aes_aead_drv, alg)->info;
+}
+
+/* VCQ Builders -- AEAD-specific */
+
+static void vcq_add_aes_aead_init(struct vcq_cmd *slot, u32 core_id, u64 key_ref,
+                                 u64 iv_dma, u32 keylen, u32 ivlen,
+                                 u32 mode, u32 op, u32 aadlen, u32 iolen,
+                                 u32 taglen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, AES_CMD_INIT);
+       slot->hwc.aes.cmd_init.key = key_ref;
+       slot->hwc.aes.cmd_init.iv = iv_dma;
+       slot->hwc.aes.cmd_init.keylen = keylen;
+       slot->hwc.aes.cmd_init.ivlen = ivlen;
+       slot->hwc.aes.cmd_init.mode = mode;
+       slot->hwc.aes.cmd_init.op = op;
+       slot->hwc.aes.cmd_init.aadlen = aadlen;
+       slot->hwc.aes.cmd_init.iolen = iolen;
+       slot->hwc.aes.cmd_init.taglen = taglen;
+}
+
+static void vcq_add_aes_ccm_init(struct vcq_cmd *slot, u32 core_id, u64 key_ref,
+                                u64 nonce_dma, u32 keylen, u32 noncelen,
+                                u32 op, u32 aadlen, u32 iolen, u32 taglen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, AES_CMD_CCM_INIT);
+       slot->hwc.aes.cmd_init.key = key_ref;
+       slot->hwc.aes.cmd_init.iv = nonce_dma;
+       slot->hwc.aes.cmd_init.keylen = keylen;
+       slot->hwc.aes.cmd_init.ivlen = noncelen;
+       slot->hwc.aes.cmd_init.mode = AES_MODE_CCM;
+       slot->hwc.aes.cmd_init.op = op;
+       slot->hwc.aes.cmd_init.aadlen = aadlen;
+       slot->hwc.aes.cmd_init.iolen = iolen;
+       slot->hwc.aes.cmd_init.taglen = taglen;
+}
+
+static void vcq_add_aes_aad_final(struct vcq_cmd *slot, u32 core_id, u64 aad_dma,
+                                 u32 aadlen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, AES_CMD_AAD_FINAL);
+       slot->hwc.aes.cmd_aad_final.data = aad_dma;
+       slot->hwc.aes.cmd_aad_final.datalen = aadlen;
+}
+
+static void vcq_add_aes_aead_final(struct vcq_cmd *slot, u32 core_id, u64 input_dma,
+                                  u64 output_dma, u64 tag_dma,
+                                  u32 iolen, u32 taglen)
+{
+       memset(slot, 0, sizeof(*slot));
+       slot->magic = VCQ_CMD_MAGIC;
+       slot->id = VCQ_CMD_ID(core_id, 0, 1, AES_CMD_FINAL);
+       slot->hwc.aes.cmd_final.input = input_dma;
+       slot->hwc.aes.cmd_final.output = output_dma;
+       slot->hwc.aes.cmd_final.tag = tag_dma;
+       slot->hwc.aes.cmd_final.iolen = iolen;
+       slot->hwc.aes.cmd_final.taglen = taglen;
+}
+
+/* setkey */
+static int cmh_aes_aead_setkey(struct crypto_aead *tfm, const u8 *key,
+                              unsigned int keylen)
+{
+       struct cmh_aes_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       int ret;
+
+       if (keylen != 16 && keylen != 24 && keylen != 32)
+               return -EINVAL;
+
+       /* Keep SW fallback ciphers in sync for CCM edge cases */
+       if (tctx->sw_cipher) {
+               ret = crypto_cipher_setkey(tctx->sw_cipher, key, keylen);
+               if (ret)
+                       return ret;
+       }
+       if (tctx->fallback) {
+               ret = crypto_aead_setkey(tctx->fallback, key, keylen);
+               if (ret)
+                       return ret;
+       }
+
+       ret = cmh_key_setkey_raw(&tctx->key, key, keylen, CORE_ID_AES);
+
+       return ret;
+}
+
+static int cmh_aes_aead_setauthsize(struct crypto_aead *tfm,
+                                   unsigned int authsize)
+{
+       struct cmh_aes_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       const struct cmh_aes_aead_info *info = cmh_aes_aead_get_info(tfm);
+       int ret;
+
+       if (info->type == CMH_AES_AEAD_GCM) {
+               /* GCM: accept 4, 8, 12, 13, 14, 15, 16 per NIST SP 800-38D */
+               if (authsize < 4 || authsize > 16 ||
+                   (authsize > 4 && authsize < 8) ||
+                   (authsize > 8 && authsize < 12))
+                       return -EINVAL;
+       } else {
+               /* CCM: accept all RFC 3610 values {4,6,8,10,12,14,16} */
+               if (authsize < 4 || authsize > 16 || (authsize & 1))
+                       return -EINVAL;
+               /* Forward to SW fallback for authsize=10 (HW unsupported) */
+               if (tctx->fallback) {
+                       ret = crypto_aead_setauthsize(tctx->fallback,
+                                                     authsize);
+                       if (ret)
+                               return ret;
+               }
+       }
+
+       tctx->authsize = authsize;
+       return 0;
+}
+
+static int cmh_aes_aead_init_tfm(struct crypto_aead *tfm)
+{
+       struct cmh_aes_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       const struct cmh_aes_aead_info *info = cmh_aes_aead_get_info(tfm);
+
+       memset(tctx, 0, sizeof(*tctx));
+       tctx->authsize = info->maxauthsize;
+
+       if (info->type == CMH_AES_AEAD_CCM) {
+               struct crypto_aead *fb;
+               struct crypto_cipher *ci;
+
+               ci = crypto_alloc_cipher("aes", 0, 0);
+               if (IS_ERR(ci))
+                       return PTR_ERR(ci);
+               tctx->sw_cipher = ci;
+
+               fb = crypto_alloc_aead("ccm(aes)", 0,
+                                      CRYPTO_ALG_NEED_FALLBACK);
+               if (IS_ERR(fb)) {
+                       crypto_free_cipher(ci);
+                       tctx->sw_cipher = NULL;
+                       return PTR_ERR(fb);
+               }
+               tctx->fallback = fb;
+
+               /*
+                * Subreq lives at (rctx + 1).  Alignment is guaranteed
+                * by the crypto framework's __ctx ALIGN mechanism.
+                */
+               crypto_aead_set_reqsize(tfm,
+                                       sizeof(struct cmh_aes_aead_reqctx) +
+                                       sizeof(struct aead_request) +
+                                       crypto_aead_reqsize(fb));
+       } else {
+               crypto_aead_set_reqsize(tfm,
+                                       sizeof(struct cmh_aes_aead_reqctx));
+       }
+
+       return 0;
+}
+
+static void cmh_aes_aead_exit_tfm(struct crypto_aead *tfm)
+{
+       struct cmh_aes_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+
+       if (tctx->fallback)
+               crypto_free_aead(tctx->fallback);
+       if (tctx->sw_cipher)
+               crypto_free_cipher(tctx->sw_cipher);
+       cmh_key_destroy(&tctx->key);
+}
+
+/* DMA unmap helper */
+static void cmh_aes_aead_unmap_dma(struct cmh_aes_aead_reqctx *rctx)
+{
+       u32 tag_map_len;
+
+       cmh_dma_unmap_single(rctx->iv_dma, rctx->iv_map_len, DMA_TO_DEVICE);
+       /*
+        * The empty-GCM fallback maps a full AES block (16 bytes) for the
+        * ECB output regardless of authsize, so unmap with the mapped size.
+        */
+       tag_map_len = rctx->empty_gcm_fallback ?
+                     AES_GCM_HW_IV_SIZE : rctx->authsize;
+       cmh_dma_unmap_single(rctx->tag_dma, tag_map_len,
+                            (rctx->encrypting || rctx->empty_gcm_fallback) ?
+                             DMA_FROM_DEVICE : DMA_TO_DEVICE);
+       if (rctx->cryptlen > 0) {
+               cmh_dma_unmap_single(rctx->out_dma, rctx->cryptlen,
+                                    DMA_FROM_DEVICE);
+               cmh_dma_unmap_single(rctx->in_dma, rctx->cryptlen,
+                                    DMA_TO_DEVICE);
+       }
+       if (rctx->assoclen > 0)
+               cmh_dma_unmap_single(rctx->aad_dma, rctx->assoclen,
+                                    DMA_TO_DEVICE);
+}
+
+static void cmh_aes_aead_free_bufs(struct cmh_aes_aead_reqctx *rctx)
+{
+       kfree(rctx->iv_buf);
+       rctx->iv_buf = NULL;
+       kfree(rctx->tag_buf);
+       rctx->tag_buf = NULL;
+       kfree_sensitive(rctx->out_buf);
+       rctx->out_buf = NULL;
+       kfree_sensitive(rctx->in_buf);
+       rctx->in_buf = NULL;
+       kfree(rctx->aad_buf);
+       rctx->aad_buf = NULL;
+}
+
+static void cmh_aes_aead_complete(void *data, int error)
+{
+       struct aead_request *req = data;
+       struct cmh_aes_aead_reqctx *rctx = aead_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       cmh_aes_aead_unmap_dma(rctx);
+
+       /*
+        * Map HW error on decrypt to -EBADMSG.  The eSW AES core uses a
+        * single error code (-EIO) for both authentication failures and
+        * other core errors (e.g. DMA timeout), so we cannot distinguish
+        * them from the MBX_STATUS alone.  In practice the only error
+        * during a well-formed AEAD decrypt is auth-tag mismatch; a DMA
+        * timeout would indicate a fatal HW problem where -EBADMSG vs
+        * -EIO is moot.  The kernel crypto API requires -EBADMSG for
+        * AEAD authentication failures.
+        */
+       if (error == -EIO && !rctx->encrypting)
+               error = -EBADMSG;
+
+       if (!error) {
+               /* GCM empty-input decrypt: compare computed tag with expected */
+               if (rctx->empty_gcm_fallback && !rctx->encrypting) {
+                       if (crypto_memneq(rctx->tag_buf, rctx->in_buf,
+                                         rctx->authsize))
+                               error = -EBADMSG;
+               }
+               if (!error && rctx->cryptlen > 0)
+                       scatterwalk_map_and_copy(rctx->out_buf, req->dst,
+                                                req->assoclen,
+                                               rctx->cryptlen, 1);
+               if (!error && rctx->encrypting)
+                       scatterwalk_map_and_copy(rctx->tag_buf, req->dst,
+                                                req->assoclen +
+                                               rctx->cryptlen,
+                                               rctx->authsize, 1);
+       }
+
+       cmh_aes_aead_free_bufs(rctx);
+       cmh_complete(&req->base, error);
+}
+
+/*
+ * GCM empty-input fallback.
+ *
+ * When both AAD and plaintext are empty, GCM reduces to:
+ *   tag = E(K, J0) where J0 = nonce || 0x00000001
+ *
+ * The eSW GCM engine rejects this degenerate case, so we compute it
+ * via a single ECB block encryption of J0.
+ *
+ * VCQ: [SYS_CMD_WRITE] + AES_CMD_INIT(ECB) + AES_CMD_FINAL + FLUSH
+ */
+static int cmh_aes_gcm_empty(struct aead_request *req, u32 aes_op)
+{
+       struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+       struct cmh_aes_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       struct cmh_aes_aead_reqctx *rctx = aead_request_ctx(req);
+       struct vcq_cmd cmds[CMH_AES_AEAD_MAX_PAYLOAD];
+       u64 key_ref;
+       u32 keylen, authsize;
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       authsize = tctx->authsize;
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->cryptlen = 0;
+       rctx->assoclen = 0;
+       rctx->authsize = authsize;
+       rctx->encrypting = (aes_op == AES_OP_ENCRYPT);
+       rctx->empty_gcm_fallback = true;
+
+       /* Build J0 = nonce || 0x00000001 in iv_buf */
+       rctx->iv_buf = kzalloc(AES_GCM_HW_IV_SIZE, gfp);
+       if (!rctx->iv_buf)
+               return -ENOMEM;
+       memcpy(rctx->iv_buf, req->iv, AES_GCM_IV_SIZE);
+       rctx->iv_buf[15] = 0x01; /* big-endian counter = 1 */
+       rctx->iv_map_len = AES_GCM_HW_IV_SIZE;
+
+       rctx->iv_dma = cmh_dma_map_single(rctx->iv_buf, AES_GCM_HW_IV_SIZE,
+                                         DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->iv_dma)) {
+               ret = -ENOMEM;
+               goto out_free_iv;
+       }
+
+       /* Tag buffer -- receives E(K, J0) output */
+       rctx->tag_buf = kzalloc(AES_GCM_HW_IV_SIZE, gfp);
+       if (!rctx->tag_buf) {
+               ret = -ENOMEM;
+               goto out_unmap_iv;
+       }
+       rctx->tag_dma = cmh_dma_map_single(rctx->tag_buf, AES_GCM_HW_IV_SIZE,
+                                          DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->tag_dma)) {
+               ret = -ENOMEM;
+               goto out_free_tag;
+       }
+
+       /* For decrypt: read expected tag from request for later comparison */
+       if (!rctx->encrypting) {
+               rctx->in_buf = kmalloc(authsize, gfp);
+               if (!rctx->in_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_tag;
+               }
+               scatterwalk_map_and_copy(rctx->in_buf, req->src, 0,
+                                        authsize, 0);
+       }
+
+       /* Resolve key */
+       idx = 0;
+       rctx->key_dma = tctx->key.raw.dma;
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)rctx->key_dma, SYS_REF_NONE,
+                         tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+       d = cmh_core_select_instance(CMH_CORE_AES);
+       target_mbx = d.mbx_idx;
+       core_id = d.core_id;
+
+       /* ECB INIT: single block encryption of J0 */
+       vcq_add_aes_aead_init(&cmds[idx++], core_id, key_ref,
+                             0, keylen, 0, AES_MODE_ECB, AES_OP_ENCRYPT,
+                             0, AES_GCM_HW_IV_SIZE, 0);
+
+       /* FINAL: J0 in, E(K,J0) out */
+       vcq_add_aes_aead_final(&cmds[idx++], core_id,
+                              (u64)rctx->iv_dma, (u64)rctx->tag_dma,
+                              0, AES_GCM_HW_IV_SIZE, 0);
+
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_AES_AEAD_MAX_PACKED,
+                                           target_mbx,
+                                           cmh_aes_aead_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_free_in;
+
+       return -EINPROGRESS;
+
+out_free_in:
+       kfree_sensitive(rctx->in_buf);
+out_unmap_tag:
+       cmh_dma_unmap_single(rctx->tag_dma, AES_GCM_HW_IV_SIZE,
+                            DMA_FROM_DEVICE);
+out_free_tag:
+       kfree(rctx->tag_buf);
+out_unmap_iv:
+       cmh_dma_unmap_single(rctx->iv_dma, AES_GCM_HW_IV_SIZE, DMA_TO_DEVICE);
+out_free_iv:
+       kfree(rctx->iv_buf);
+       return ret;
+}
+
+/*
+ * CCM empty-input fallback.
+ *
+ * When both AAD and plaintext are empty, CCM reduces to:
+ *   T  = E(K, B0)    -- CBC-MAC of the single formatting block
+ *   S0 = E(K, A0)    -- CTR block zero
+ *   tag = (T XOR S0)[0..authsize-1]
+ *
+ * The eSW rejects this degenerate case, so the driver computes it
+ * synchronously via two crypto_cipher single-block encryptions.
+ */
+static int cmh_aes_ccm_empty(struct aead_request *req, u32 aes_op)
+{
+       struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+       struct cmh_aes_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       u32 authsize = tctx->authsize;
+       u8 b0[CMH_AES_BLOCK_SIZE], a0[CMH_AES_BLOCK_SIZE];
+       u8 t[CMH_AES_BLOCK_SIZE], s0[CMH_AES_BLOCK_SIZE];
+       u8 tag[CMH_AES_BLOCK_SIZE];
+       u8 L;
+       u32 i;
+
+       /* Defense-in-depth: iv[0] = L-1, valid L is 2..8 per RFC 3610 S2.1 */
+       if (WARN_ON_ONCE(req->iv[0] < 1 || req->iv[0] > 7))
+               return -EINVAL;
+
+       L = req->iv[0] + 1;
+
+       if (tctx->key.mode != CMH_KEY_RAW)
+               return -EOPNOTSUPP;
+
+       /* B0: flags || nonce || Q(=0).  Adata=0, t=authsize, q=L. */
+       memset(b0, 0, CMH_AES_BLOCK_SIZE);
+       b0[0] = (u8)(8 * ((authsize - 2) / 2) + (L - 1));
+       memcpy(&b0[1], &req->iv[1], 15 - L);
+
+       /* A0: (L-1) || nonce || counter(=0) */
+       memset(a0, 0, CMH_AES_BLOCK_SIZE);
+       a0[0] = (u8)(L - 1);
+       memcpy(&a0[1], &req->iv[1], 15 - L);
+
+       crypto_cipher_encrypt_one(tctx->sw_cipher, t, b0);
+       crypto_cipher_encrypt_one(tctx->sw_cipher, s0, a0);
+
+       for (i = 0; i < authsize; i++)
+               tag[i] = t[i] ^ s0[i];
+
+       if (aes_op == AES_OP_ENCRYPT) {
+               scatterwalk_map_and_copy(tag, req->dst,
+                                        req->assoclen, authsize, 1);
+       } else {
+               u8 expected[CMH_AES_BLOCK_SIZE];
+
+               scatterwalk_map_and_copy(expected, req->src,
+                                        req->assoclen, authsize, 0);
+               if (crypto_memneq(tag, expected, authsize))
+                       return -EBADMSG;
+       }
+
+       return 0;
+}
+
+/*
+ * CCM authsize=10 fallback.
+ *
+ * The eSW AES CCM core does not support authsize=10 (valid per RFC 3610).
+ * Forward the entire request to the generic CCM implementation.
+ */
+static void cmh_aes_ccm_fb_done(void *data, int err)
+{
+       struct aead_request *req = data;
+
+       cmh_complete(&req->base, err);
+}
+
+static int cmh_aes_ccm_fallback(struct aead_request *req, u32 aes_op)
+{
+       struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+       struct cmh_aes_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       struct cmh_aes_aead_reqctx *rctx = aead_request_ctx(req);
+       struct aead_request *subreq = (void *)(rctx + 1);
+
+       aead_request_set_tfm(subreq, tctx->fallback);
+       aead_request_set_callback(subreq, req->base.flags,
+                                 cmh_aes_ccm_fb_done, req);
+       aead_request_set_crypt(subreq, req->src, req->dst,
+                              req->cryptlen, req->iv);
+       aead_request_set_ad(subreq, req->assoclen);
+
+       return (aes_op == AES_OP_ENCRYPT) ?
+               crypto_aead_encrypt(subreq) : crypto_aead_decrypt(subreq);
+}
+
+/*
+ * Core AEAD encrypt/decrypt -- async path.
+ *
+ * Encrypt: plaintext -> ciphertext + tag appended
+ * Decrypt: ciphertext + tag -> plaintext (tag verified by eSW)
+ *
+ * VCQ: [SYS_CMD_WRITE] + INIT/CCM_INIT + [AAD_FINAL] + FINAL + FLUSH
+ */
+static int cmh_aes_aead_crypt(struct aead_request *req, u32 aes_op)
+{
+       struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+       struct cmh_aes_aead_tfm_ctx *tctx = crypto_aead_ctx(tfm);
+       const struct cmh_aes_aead_info *info = cmh_aes_aead_get_info(tfm);
+       struct cmh_aes_aead_reqctx *rctx = aead_request_ctx(req);
+       struct vcq_cmd cmds[CMH_AES_AEAD_MAX_PAYLOAD];
+       u64 key_ref;
+       u32 keylen, authsize, cryptlen;
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       if (tctx->key.mode == CMH_KEY_NONE)
+               return -ENOKEY;
+
+       authsize = tctx->authsize;
+
+       if (aes_op == AES_OP_ENCRYPT) {
+               cryptlen = req->cryptlen;
+       } else {
+               if (req->cryptlen < authsize)
+                       return -EINVAL;
+               cryptlen = req->cryptlen - authsize;
+       }
+
+       /*
+        * Validate CCM IV format early -- the empty-input fallback and
+        * nonce extraction both depend on iv[0] being in range [1,7].
+        */
+       if (info->type == CMH_AES_AEAD_CCM) {
+               if (req->iv[0] < 1 || req->iv[0] > 7)
+                       return -EINVAL;
+       }
+
+       /*
+        * The CMH eSW rejects GCM/CCM when both aadlen and iolen are zero.
+        * For GCM, the tag is simply E(K, J0) -- handle with ECB fallback.
+        * For CCM, compute tag = E(K,B0) XOR E(K,A0) in software.
+        */
+       if (cryptlen == 0 && req->assoclen == 0) {
+               if (info->type == CMH_AES_AEAD_GCM)
+                       return cmh_aes_gcm_empty(req, aes_op);
+               return cmh_aes_ccm_empty(req, aes_op);
+       }
+
+       /*
+        * HW does not support authsize=10 for CCM.  Forward the entire
+        * request to the generic CCM implementation.
+        */
+       if (info->type == CMH_AES_AEAD_CCM && authsize == 10)
+               return cmh_aes_ccm_fallback(req, aes_op);
+
+       /*
+        * HW uses a proprietary LLI scatter-gather format that is
+        * incompatible with struct scatterlist, so the payload is
+        * linearised into contiguous buffers for DMA.  Cap total
+        * size to prevent excessive memory consumption.
+        */
+       if ((u64)cryptlen + req->assoclen > SZ_1M)
+               return -EINVAL;
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       memset(rctx, 0, sizeof(*rctx));
+       rctx->cryptlen = cryptlen;
+       rctx->assoclen = req->assoclen;
+       rctx->authsize = authsize;
+       rctx->encrypting = (aes_op == AES_OP_ENCRYPT);
+
+       /* Linearise AAD */
+       if (req->assoclen > 0) {
+               rctx->aad_buf = kmalloc(req->assoclen, gfp);
+               if (!rctx->aad_buf)
+                       return -ENOMEM;
+               scatterwalk_map_and_copy(rctx->aad_buf, req->src,
+                                        0, req->assoclen, 0);
+               rctx->aad_dma = cmh_dma_map_single(rctx->aad_buf,
+                                                  req->assoclen,
+                                                   DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->aad_dma)) {
+                       ret = -ENOMEM;
+                       goto out_free_aad;
+               }
+       }
+
+       /* Linearise input */
+       if (cryptlen > 0) {
+               rctx->in_buf = kmalloc(cryptlen, gfp);
+               if (!rctx->in_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_aad;
+               }
+               scatterwalk_map_and_copy(rctx->in_buf, req->src,
+                                        req->assoclen, cryptlen, 0);
+               rctx->in_dma = cmh_dma_map_single(rctx->in_buf, cryptlen,
+                                                 DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->in_dma)) {
+                       ret = -ENOMEM;
+                       goto out_free_in;
+               }
+       }
+
+       /* Allocate output buffer */
+       if (cryptlen > 0) {
+               rctx->out_buf = kmalloc(cryptlen, gfp);
+               if (!rctx->out_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_in;
+               }
+               rctx->out_dma = cmh_dma_map_single(rctx->out_buf, cryptlen,
+                                                  DMA_FROM_DEVICE);
+               if (cmh_dma_map_error(rctx->out_dma)) {
+                       ret = -ENOMEM;
+                       goto out_free_out;
+               }
+       }
+
+       /* Tag buffer */
+       rctx->tag_buf = kmalloc(authsize, gfp);
+       if (!rctx->tag_buf) {
+               ret = -ENOMEM;
+               goto out_unmap_out;
+       }
+
+       if (!rctx->encrypting) {
+               scatterwalk_map_and_copy(rctx->tag_buf, req->src,
+                                        req->assoclen + cryptlen,
+                                       authsize, 0);
+       } else {
+               memset(rctx->tag_buf, 0, authsize);
+       }
+
+       rctx->tag_dma = cmh_dma_map_single(rctx->tag_buf, authsize,
+                                          rctx->encrypting ?
+                                           DMA_FROM_DEVICE : DMA_TO_DEVICE);
+       if (cmh_dma_map_error(rctx->tag_dma)) {
+               ret = -ENOMEM;
+               goto out_free_tag;
+       }
+
+       /* Map IV/nonce */
+       if (info->type == CMH_AES_AEAD_GCM) {
+               rctx->iv_buf = kzalloc(AES_GCM_HW_IV_SIZE, gfp);
+               if (!rctx->iv_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_tag;
+               }
+               memcpy(rctx->iv_buf, req->iv, AES_GCM_IV_SIZE);
+               rctx->iv_map_len = AES_GCM_HW_IV_SIZE;
+               rctx->iv_dma = cmh_dma_map_single(rctx->iv_buf,
+                                                 rctx->iv_map_len,
+                                                  DMA_TO_DEVICE);
+       } else {
+               u32 noncelen;
+
+               if (req->iv[0] < 1 || req->iv[0] > 7) {
+                       ret = -EINVAL;
+                       goto out_unmap_tag;
+               }
+               noncelen = 14 - req->iv[0];
+
+               rctx->iv_buf = kmemdup(req->iv + 1, noncelen, gfp);
+               if (!rctx->iv_buf) {
+                       ret = -ENOMEM;
+                       goto out_unmap_tag;
+               }
+               rctx->iv_map_len = noncelen;
+               rctx->iv_dma = cmh_dma_map_single(rctx->iv_buf,
+                                                 rctx->iv_map_len,
+                                                  DMA_TO_DEVICE);
+       }
+       if (cmh_dma_map_error(rctx->iv_dma)) {
+               ret = -ENOMEM;
+               goto out_free_iv;
+       }
+
+       /* Resolve key reference */
+       idx = 0;
+
+       rctx->key_dma = tctx->key.raw.dma;
+       rctx->keylen = tctx->key.raw.len;
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)rctx->key_dma, SYS_REF_NONE,
+                         tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+       d = cmh_core_select_instance(CMH_CORE_AES);
+       target_mbx = d.mbx_idx;
+       core_id = d.core_id;
+
+       /* Build INIT command */
+       if (info->type == CMH_AES_AEAD_CCM) {
+               vcq_add_aes_ccm_init(&cmds[idx++], core_id, key_ref,
+                                    (u64)rctx->iv_dma, keylen,
+                                    rctx->iv_map_len, aes_op,
+                                    req->assoclen, cryptlen, authsize);
+       } else {
+               vcq_add_aes_aead_init(&cmds[idx++], core_id, key_ref,
+                                     (u64)rctx->iv_dma, keylen,
+                                     AES_GCM_HW_IV_SIZE, info->aes_mode,
+                                     aes_op, req->assoclen, cryptlen,
+                                     authsize);
+       }
+
+       if (req->assoclen > 0)
+               vcq_add_aes_aad_final(&cmds[idx++], core_id,
+                                     (u64)rctx->aad_dma, req->assoclen);
+
+       vcq_add_aes_aead_final(&cmds[idx++], core_id,
+                              cryptlen > 0 ? (u64)rctx->in_dma : 0,
+                              cryptlen > 0 ? (u64)rctx->out_dma : 0,
+                              (u64)rctx->tag_dma, cryptlen, authsize);
+
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_AES_AEAD_MAX_PACKED,
+                                           target_mbx,
+                                           cmh_aes_aead_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_cleanup_all;
+
+       return -EINPROGRESS;
+
+out_cleanup_all:
+       cmh_dma_unmap_single(rctx->iv_dma, rctx->iv_map_len, DMA_TO_DEVICE);
+out_free_iv:
+       kfree(rctx->iv_buf);
+out_unmap_tag:
+       cmh_dma_unmap_single(rctx->tag_dma, authsize,
+                            rctx->encrypting ? DMA_FROM_DEVICE :
+                                              DMA_TO_DEVICE);
+out_free_tag:
+       kfree(rctx->tag_buf);
+out_unmap_out:
+       if (cryptlen > 0)
+               cmh_dma_unmap_single(rctx->out_dma, cryptlen, DMA_FROM_DEVICE);
+out_free_out:
+       kfree_sensitive(rctx->out_buf);
+out_unmap_in:
+       if (cryptlen > 0)
+               cmh_dma_unmap_single(rctx->in_dma, cryptlen, DMA_TO_DEVICE);
+out_free_in:
+       kfree_sensitive(rctx->in_buf);
+out_unmap_aad:
+       if (req->assoclen > 0)
+               cmh_dma_unmap_single(rctx->aad_dma, req->assoclen,
+                                    DMA_TO_DEVICE);
+out_free_aad:
+       kfree(rctx->aad_buf);
+       return ret;
+}
+
+static int cmh_aes_aead_encrypt(struct aead_request *req)
+{
+       return cmh_aes_aead_crypt(req, AES_OP_ENCRYPT);
+}
+
+static int cmh_aes_aead_decrypt(struct aead_request *req)
+{
+       return cmh_aes_aead_crypt(req, AES_OP_DECRYPT);
+}
+
+/* Registration */
+
+static struct cmh_aes_aead_drv aes_aead_drv_algs[ARRAY_SIZE(aes_aead_algs)];
+
+/**
+ * cmh_aes_aead_register() - Register AES-GCM/CCM AEAD algorithms with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_aes_aead_register(void)
+{
+       unsigned int i;
+       int ret;
+
+       for (i = 0; i < ARRAY_SIZE(aes_aead_algs); i++) {
+               const struct cmh_aes_aead_info *info = &aes_aead_algs[i];
+               struct cmh_aes_aead_drv *drv = &aes_aead_drv_algs[i];
+               struct aead_alg *alg = &drv->alg;
+
+               drv->info = info;
+
+               memset(alg, 0, sizeof(*alg));
+
+               alg->setkey      = cmh_aes_aead_setkey;
+               alg->setauthsize = cmh_aes_aead_setauthsize;
+               alg->encrypt     = cmh_aes_aead_encrypt;
+               alg->decrypt     = cmh_aes_aead_decrypt;
+               alg->init        = cmh_aes_aead_init_tfm;
+               alg->exit        = cmh_aes_aead_exit_tfm;
+               alg->ivsize      = info->ivsize;
+               alg->maxauthsize = info->maxauthsize;
+
+               strscpy(alg->base.cra_name, info->alg_name,
+                       CRYPTO_MAX_ALG_NAME);
+               strscpy(alg->base.cra_driver_name, info->drv_name,
+                       CRYPTO_MAX_ALG_NAME);
+               alg->base.cra_priority  = 300;
+               alg->base.cra_flags     = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                         CRYPTO_ALG_ASYNC;
+               if (info->type == CMH_AES_AEAD_CCM) {
+                       alg->base.cra_flags |= CRYPTO_ALG_NEED_FALLBACK;
+                       /*
+                        * Bump priority above 300 so we beat the generic
+                        * ccm_base template instance.  That template inherits
+                        * priority (ctr + cbcmac) / 2 = 300 when both
+                        * constituents are at 300, and list ordering would
+                        * otherwise let it shadow our driver.
+                        */
+                       alg->base.cra_priority = 301;
+               }
+               alg->base.cra_blocksize = 1;
+               alg->base.cra_ctxsize  = sizeof(struct cmh_aes_aead_tfm_ctx);
+               alg->base.cra_module   = THIS_MODULE;
+
+               ret = crypto_register_aead(alg);
+               if (ret) {
+                       dev_err(cmh_dev(), "cmh_aes_aead: failed to register %s (rc=%d)\n",
+                               info->alg_name, ret);
+                       goto err_unregister;
+               }
+
+               dev_dbg(cmh_dev(), "cmh_aes_aead: registered %s\n", info->alg_name);
+       }
+
+       return 0;
+
+err_unregister:
+       while (i--)
+               crypto_unregister_aead(&aes_aead_drv_algs[i].alg);
+       return ret;
+}
+
+/**
+ * cmh_aes_aead_unregister() - Unregister AES AEAD algorithms from the crypto framework
+ */
+void cmh_aes_aead_unregister(void)
+{
+       unsigned int i;
+
+       for (i = 0; i < ARRAY_SIZE(aes_aead_algs); i++) {
+               crypto_unregister_aead(&aes_aead_drv_algs[i].alg);
+               dev_dbg(cmh_dev(), "cmh_aes_aead: unregistered %s\n",
+                       aes_aead_algs[i].alg_name);
+       }
+}
diff --git a/drivers/crypto/cmh/cmh_aes_cmac.c b/drivers/crypto/cmh/cmh_aes_cmac.c
new file mode 100644
index 000000000000..a711c575398d
--- /dev/null
+++ b/drivers/crypto/cmh/cmh_aes_cmac.c
@@ -0,0 +1,537 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- Kernel Crypto API AES-CMAC (ahash) Driver
+ *
+ * Registers cmac(aes) as an ahash algorithm.
+ *
+ * CMAC produces a 16-byte tag (MAC) from a key and message.
+ * VCQ sequence: [SYS_CMD_WRITE] + AES_CMD_INIT(CMAC) +
+ *               AES_CMD_AAD_FINAL_AUTH + FLUSH
+ *
+ * The ahash interface accumulates data in a kernel buffer via .update(),
+ * then .final() builds and submits the VCQ asynchronously.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/crypto.h>
+#include <crypto/internal/hash.h>
+#include <crypto/scatterwalk.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include "cmh_aes.h"
+#include "cmh_vcq.h"
+#include "cmh_aes_abi.h"
+#include "cmh_sys_abi.h"
+#include "cmh_sys.h"
+#include "cmh_txn.h"
+#include "cmh_dma.h"
+#include "cmh_key.h"
+
+#define AES_CMAC_DIGEST_SIZE   16U
+#define AES_CMAC_BLOCK_SIZE    16U
+
+/*
+ * Maximum accumulated data for CMAC -- driver-imposed, not HW.
+ *
+ * The AES core does not expose external save/restore VCQ commands,
+ * so the driver must accumulate all data in kernel memory via
+ * .update() and submit it atomically in .final().  This cap limits
+ * the per-request kernel allocation.
+ */
+#define AES_CMAC_MAX_DATA      (64 * 1024)
+
+/* Per-transform context */
+struct cmh_aes_cmac_tfm_ctx {
+       struct cmh_key_ctx key;
+       spinlock_t         chunk_lock;  /* protects all_chunks */
+       struct list_head   all_chunks;  /* orphan-safe chunk tracking */
+};
+
+/* One chunk per .update() call -- data is embedded via flexible array */
+struct cmh_aes_cmac_chunk {
+       struct list_head list;
+       struct list_head tfm_node; /* per-tfm orphan tracking */
+       u32 len;
+       u8 data[];
+};
+
+/* Per-request context (lives in ahash_request::__ctx) */
+
+/*
+ * Maximum payload commands:
+ *   [SYS_CMD_WRITE] + AES_CMD_INIT + AES_CMD_AAD_FINAL_AUTH + FLUSH = 4
+ */
+#define CMH_AES_CMAC_MAX_PAYLOAD       4
+#define CMH_AES_CMAC_MAX_PACKED                (CMH_AES_CMAC_MAX_PAYLOAD * 2)
+
+struct cmh_aes_cmac_reqctx {
+       struct list_head chunks;
+       u32  total_len;
+       u8  *buf;       /* linearised in final() for DMA */
+       /* DMA state for async final */
+       dma_addr_t key_dma;
+       dma_addr_t in_dma;
+       dma_addr_t tag_dma;
+       u8 *tag_buf;
+       u32 keylen;
+       struct vcq_cmd packed[CMH_AES_CMAC_MAX_PACKED];
+};
+
+/* Flat state for export/import -- holds accumulated input data only */
+struct cmh_aes_cmac_export_state {
+       u32 total_len;
+       u8  data[];
+};
+
+/*
+ * Flat state buffer for export/import.  The CMH AES core does not
+ * support save/restore of intermediate CMAC state, so this driver
+ * accumulates input in SW and serialises the buffer on export.
+ *
+ * PAGE_SIZE (4096) caps the exportable accumulated-data window.
+ * Full-range export is not feasible because the crypto subsystem
+ * pre-allocates statesize bytes per request.  Export returns -EINVAL
+ * if the caller has accumulated more than CMH_AES_CMAC_EXPORT_MAX.
+ */
+#define CMH_AES_CMAC_STATE_SIZE 4096
+#define CMH_AES_CMAC_EXPORT_MAX \
+       (CMH_AES_CMAC_STATE_SIZE - sizeof(struct cmh_aes_cmac_export_state))
+
+/*
+ * Export/import: not supported.
+ *
+ * The AES core lacks external save/restore VCQ commands, so there is
+ * no way to checkpoint intermediate CMAC state to host memory.
+ * Pending eSW ABI extension to add save/restore for the AES core.
+ */
+
+static int cmh_aes_cmac_setkey(struct crypto_ahash *tfm, const u8 *key,
+                              unsigned int keylen)
+{
+       struct cmh_aes_cmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+
+       if (keylen != 16 && keylen != 24 && keylen != 32)
+               return -EINVAL;
+
+       return cmh_key_setkey_raw(&tctx->key, key, keylen, CORE_ID_AES);
+}
+
+static void cmh_aes_cmac_free_chunks(struct cmh_aes_cmac_reqctx *rctx,
+                                    struct cmh_aes_cmac_tfm_ctx *tctx)
+{
+       struct cmh_aes_cmac_chunk *c, *tmp;
+
+       spin_lock_bh(&tctx->chunk_lock);
+       list_for_each_entry_safe(c, tmp, &rctx->chunks, list) {
+               list_del(&c->list);
+               list_del(&c->tfm_node);
+               kfree_sensitive(c);
+       }
+       spin_unlock_bh(&tctx->chunk_lock);
+       rctx->total_len = 0;
+}
+
+static int cmh_aes_cmac_init(struct ahash_request *req)
+{
+       struct cmh_aes_cmac_reqctx *rctx = ahash_request_ctx(req);
+
+       memset(rctx, 0, sizeof(*rctx));
+       INIT_LIST_HEAD(&rctx->chunks);
+       return 0;
+}
+
+static int cmh_aes_cmac_update(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_aes_cmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_aes_cmac_reqctx *rctx = ahash_request_ctx(req);
+       struct cmh_aes_cmac_chunk *chunk;
+       gfp_t gfp;
+       int ret;
+
+       if (!req->nbytes)
+               return 0;
+
+       if (req->nbytes > AES_CMAC_MAX_DATA - rctx->total_len) {
+               ret = -EINVAL;
+               goto err_free_chunks;
+       }
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       chunk = kmalloc(sizeof(*chunk) + req->nbytes, gfp);
+       if (!chunk) {
+               ret = -ENOMEM;
+               goto err_free_chunks;
+       }
+
+       chunk->len = req->nbytes;
+       if (req->base.flags & CRYPTO_AHASH_REQ_VIRT)
+               memcpy(chunk->data, req->svirt, req->nbytes);
+       else
+               scatterwalk_map_and_copy(chunk->data, req->src,
+                                        0, req->nbytes, 0);
+
+       list_add_tail(&chunk->list, &rctx->chunks);
+       spin_lock_bh(&tctx->chunk_lock);
+       list_add_tail(&chunk->tfm_node, &tctx->all_chunks);
+       spin_unlock_bh(&tctx->chunk_lock);
+       rctx->total_len += req->nbytes;
+       return 0;
+
+err_free_chunks:
+       /*
+        * Terminal error -- free all previously accumulated chunks.
+        * callers may not call .final() on error, so they would leak.
+        */
+       cmh_aes_cmac_free_chunks(rctx, tctx);
+       return ret;
+}
+
+static void cmh_aes_cmac_complete(void *data, int error)
+{
+       struct ahash_request *req = data;
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_aes_cmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_aes_cmac_reqctx *rctx = ahash_request_ctx(req);
+
+       if (error == -EINPROGRESS) {
+               cmh_complete(&req->base, error);
+               return;
+       }
+
+       /* Unmap DMA */
+       if (rctx->total_len > 0)
+               cmh_dma_unmap_single(rctx->in_dma, rctx->total_len,
+                                    DMA_TO_DEVICE);
+       cmh_dma_unmap_single(rctx->tag_dma, AES_CMAC_DIGEST_SIZE,
+                            DMA_FROM_DEVICE);
+
+       if (!error)
+               memcpy(req->result, rctx->tag_buf, AES_CMAC_DIGEST_SIZE);
+
+       kfree(rctx->tag_buf);
+       rctx->tag_buf = NULL;
+       kfree_sensitive(rctx->buf);
+       rctx->buf = NULL;
+       cmh_aes_cmac_free_chunks(rctx, tctx);
+       cmh_complete(&req->base, error);
+}
+
+static int cmh_aes_cmac_final(struct ahash_request *req)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_aes_cmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_aes_cmac_reqctx *rctx = ahash_request_ctx(req);
+       struct vcq_cmd cmds[CMH_AES_CMAC_MAX_PAYLOAD];
+       u64 key_ref;
+       u32 keylen;
+       struct core_dispatch d;
+       s32 target_mbx;
+       u32 core_id;
+       u32 idx;
+       int ret;
+       gfp_t gfp;
+
+       if (tctx->key.mode == CMH_KEY_NONE) {
+               ret = -ENOKEY;
+               goto out_free_buf;
+       }
+
+       gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+             GFP_KERNEL : GFP_ATOMIC;
+
+       /* Linearise accumulated chunks into a contiguous buffer for DMA */
+       if (rctx->total_len > 0) {
+               struct cmh_aes_cmac_chunk *c;
+               u32 off = 0;
+
+               rctx->buf = kmalloc(rctx->total_len, gfp);
+               if (!rctx->buf) {
+                       ret = -ENOMEM;
+                       goto out_free_chunks;
+               }
+               list_for_each_entry(c, &rctx->chunks, list) {
+                       memcpy(rctx->buf + off, c->data, c->len);
+                       off += c->len;
+               }
+       }
+
+       /* Tag output buffer */
+       rctx->tag_buf = kzalloc(AES_CMAC_DIGEST_SIZE, gfp);
+       if (!rctx->tag_buf) {
+               ret = -ENOMEM;
+               goto out_free_buf;
+       }
+
+       rctx->tag_dma = cmh_dma_map_single(rctx->tag_buf,
+                                          AES_CMAC_DIGEST_SIZE,
+                                           DMA_FROM_DEVICE);
+       if (cmh_dma_map_error(rctx->tag_dma)) {
+               ret = -ENOMEM;
+               goto out_free_tag;
+       }
+
+       /* Map input data (may be zero-length for empty CMAC) */
+       if (rctx->total_len > 0) {
+               rctx->in_dma = cmh_dma_map_single(rctx->buf, rctx->total_len,
+                                                 DMA_TO_DEVICE);
+               if (cmh_dma_map_error(rctx->in_dma)) {
+                       ret = -ENOMEM;
+                       goto out_unmap_tag;
+               }
+       }
+
+       /* Resolve key */
+       idx = 0;
+
+       rctx->key_dma = tctx->key.raw.dma;
+       rctx->keylen = tctx->key.raw.len;
+       vcq_add_sys_write(&cmds[idx++], SYS_REF_TEMP,
+                         (u64)rctx->key_dma, SYS_REF_NONE,
+                         tctx->key.raw.len,
+                         tctx->key.raw.sys_type);
+       key_ref = SYS_REF_TEMP;
+       keylen = tctx->key.raw.len;
+       d = cmh_core_select_instance(CMH_CORE_AES);
+       target_mbx = d.mbx_idx;
+       core_id = d.core_id;
+
+       /*
+        * INIT: mode=CMAC, op=ENCRYPT (CMAC always "encrypts")
+        * CMAC data goes through the AAD path:
+        *   aadlen = total data length, iolen = 0
+        */
+       {
+               struct vcq_cmd *slot = &cmds[idx++];
+
+               memset(slot, 0, sizeof(*slot));
+               slot->magic = VCQ_CMD_MAGIC;
+               slot->id = VCQ_CMD_ID(core_id, 0, 1, AES_CMD_INIT);
+               slot->hwc.aes.cmd_init.key = key_ref;
+               slot->hwc.aes.cmd_init.iv = 0;
+               slot->hwc.aes.cmd_init.keylen = keylen;
+               slot->hwc.aes.cmd_init.ivlen = 0;
+               slot->hwc.aes.cmd_init.mode = AES_MODE_CMAC;
+               slot->hwc.aes.cmd_init.op = AES_OP_ENCRYPT;
+               slot->hwc.aes.cmd_init.aadlen = rctx->total_len;
+               slot->hwc.aes.cmd_init.iolen = 0;
+               slot->hwc.aes.cmd_init.taglen = AES_CMAC_DIGEST_SIZE;
+       }
+
+       /* AAD_FINAL_AUTH: final AAD + tag extraction in one atomic step */
+       {
+               struct vcq_cmd *slot = &cmds[idx++];
+
+               memset(slot, 0, sizeof(*slot));
+               slot->magic = VCQ_CMD_MAGIC;
+               slot->id = VCQ_CMD_ID(core_id, 0, 1, AES_CMD_AAD_FINAL_AUTH);
+               slot->hwc.aes.cmd_aad_final_auth.data =
+                       rctx->total_len > 0 ? (u64)rctx->in_dma : 0;
+               slot->hwc.aes.cmd_aad_final_auth.datalen = rctx->total_len;
+               slot->hwc.aes.cmd_aad_final_auth.tag = (u64)rctx->tag_dma;
+               slot->hwc.aes.cmd_aad_final_auth.taglen = AES_CMAC_DIGEST_SIZE;
+       }
+
+       vcq_add_flush(&cmds[idx++], core_id);
+
+       ret = cmh_vcq_pack_and_submit_async(cmds, idx, rctx->packed,
+                                           CMH_AES_CMAC_MAX_PACKED,
+                                           target_mbx,
+                                           cmh_aes_cmac_complete, req,
+                                           !!(req->base.flags &
+                                              CRYPTO_TFM_REQ_MAY_BACKLOG),
+                                           cmh_tm_async_timeout_jiffies());
+       /* -EBUSY = backlogged; ownership transferred to callback. */
+       if (ret == -EBUSY)
+               return -EBUSY;
+       if (ret)
+               goto out_cleanup_all;
+
+       return -EINPROGRESS;
+
+out_cleanup_all:
+       if (rctx->total_len > 0 && !cmh_dma_map_error(rctx->in_dma))
+               cmh_dma_unmap_single(rctx->in_dma, rctx->total_len,
+                                    DMA_TO_DEVICE);
+out_unmap_tag:
+       cmh_dma_unmap_single(rctx->tag_dma, AES_CMAC_DIGEST_SIZE,
+                            DMA_FROM_DEVICE);
+out_free_tag:
+       kfree(rctx->tag_buf);
+out_free_buf:
+out_free_chunks:
+       cmh_aes_cmac_free_chunks(rctx, tctx);
+       kfree_sensitive(rctx->buf);
+       rctx->buf = NULL;
+       rctx->total_len = 0;
+       return ret;
+}
+
+/*
+ * ahash .export()/.import(): serialize/deserialize the software
+ * accumulation buffer.  No HW state is involved -- the AES core
+ * does not support save/restore, but we only export the input queue.
+ */
+
+static int cmh_aes_cmac_export(struct ahash_request *req, void *out)
+{
+       struct cmh_aes_cmac_reqctx *rctx = ahash_request_ctx(req);
+       struct cmh_aes_cmac_export_state *state = out;
+       struct cmh_aes_cmac_chunk *chunk;
+       u32 offset = 0;
+
+       if (rctx->total_len > CMH_AES_CMAC_EXPORT_MAX)
+               return -ENOSPC;
+
+       state->total_len = rctx->total_len;
+       list_for_each_entry(chunk, &rctx->chunks, list) {
+               memcpy(state->data + offset, chunk->data, chunk->len);
+               offset += chunk->len;
+       }
+       return 0;
+}
+
+static int cmh_aes_cmac_import(struct ahash_request *req, const void *in)
+{
+       struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+       struct cmh_aes_cmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_aes_cmac_reqctx *rctx = ahash_request_ctx(req);
+       const struct cmh_aes_cmac_export_state *state = in;
+       struct cmh_aes_cmac_chunk *chunk;
+
+       /*
+        * Do NOT call free_chunks() here: the crypto API does not
+        * guarantee the request context is in a valid state before
+        * import(), so the list pointers may be stale or invalid.
+        * Re-initialize from scratch instead.  Any pre-existing chunks
+        * are tracked on tctx->all_chunks and freed in exit_tfm.
+        */
+       memset(rctx, 0, sizeof(*rctx));
+       INIT_LIST_HEAD(&rctx->chunks);
+
+       if (state->total_len > CMH_AES_CMAC_EXPORT_MAX)
+               return -EINVAL;
+
+       if (state->total_len) {
+               chunk = kmalloc(sizeof(*chunk) + state->total_len, GFP_KERNEL);
+               if (!chunk)
+                       return -ENOMEM;
+               chunk->len = state->total_len;
+               memcpy(chunk->data, state->data, state->total_len);
+               list_add_tail(&chunk->list, &rctx->chunks);
+               spin_lock_bh(&tctx->chunk_lock);
+               list_add_tail(&chunk->tfm_node, &tctx->all_chunks);
+               spin_unlock_bh(&tctx->chunk_lock);
+               rctx->total_len = state->total_len;
+       }
+       return 0;
+}
+
+static int cmh_aes_cmac_finup(struct ahash_request *req)
+{
+       int err;
+
+       err = cmh_aes_cmac_update(req);
+       if (err)
+               return err;
+       return cmh_aes_cmac_final(req);
+}
+
+static int cmh_aes_cmac_digest(struct ahash_request *req)
+{
+       int err;
+
+       err = cmh_aes_cmac_init(req);
+       if (err)
+               return err;
+       return cmh_aes_cmac_finup(req);
+}
+
+static int cmh_aes_cmac_init_tfm(struct crypto_ahash *tfm)
+{
+       struct cmh_aes_cmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+
+       memset(tctx, 0, sizeof(*tctx));
+       spin_lock_init(&tctx->chunk_lock);
+       INIT_LIST_HEAD(&tctx->all_chunks);
+       crypto_ahash_set_reqsize(tfm, sizeof(struct cmh_aes_cmac_reqctx));
+       return 0;
+}
+
+static void cmh_aes_cmac_exit_tfm(struct crypto_ahash *tfm)
+{
+       struct cmh_aes_cmac_tfm_ctx *tctx = crypto_ahash_ctx(tfm);
+       struct cmh_aes_cmac_chunk *c, *tmp;
+
+       /* Free any orphaned chunks (e.g. testmgr export/reimport poison) */
+       spin_lock_bh(&tctx->chunk_lock);
+       list_for_each_entry_safe(c, tmp, &tctx->all_chunks, tfm_node) {
+               list_del(&c->tfm_node);
+               kfree_sensitive(c);
+       }
+       spin_unlock_bh(&tctx->chunk_lock);
+
+       cmh_key_destroy(&tctx->key);
+}
+
+static struct ahash_alg cmh_aes_cmac_alg = {
+       .init           = cmh_aes_cmac_init,
+       .update         = cmh_aes_cmac_update,
+       .final          = cmh_aes_cmac_final,
+       .finup          = cmh_aes_cmac_finup,
+       .digest         = cmh_aes_cmac_digest,
+       .export         = cmh_aes_cmac_export,
+       .import         = cmh_aes_cmac_import,
+       .setkey         = cmh_aes_cmac_setkey,
+       .init_tfm       = cmh_aes_cmac_init_tfm,
+       .exit_tfm       = cmh_aes_cmac_exit_tfm,
+       .halg           = {
+               .digestsize     = AES_CMAC_DIGEST_SIZE,
+               .statesize      = CMH_AES_CMAC_STATE_SIZE,
+               .base           = {
+                       .cra_name        = "cmac(aes)",
+                       .cra_driver_name = "cri-cmh-cmac-aes",
+                       .cra_priority    = 300,
+                       .cra_flags       = CRYPTO_ALG_KERN_DRIVER_ONLY |
+                                          CRYPTO_ALG_NO_FALLBACK |
+                                          CRYPTO_ALG_ASYNC |
+                                          CRYPTO_ALG_REQ_VIRT,
+                       .cra_blocksize   = AES_CMAC_BLOCK_SIZE,
+                       .cra_ctxsize     = sizeof(struct cmh_aes_cmac_tfm_ctx),
+                       .cra_module      = THIS_MODULE,
+               },
+       },
+};
+
+/**
+ * cmh_aes_cmac_register() - Register AES-CMAC hash algorithm with the crypto framework
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cmh_aes_cmac_register(void)
+{
+       int ret;
+
+       ret = crypto_register_ahash(&cmh_aes_cmac_alg);
+       if (ret)
+               dev_err(cmh_dev(), "cmh_aes_cmac: failed to register cmac(aes) (rc=%d)\n",
+                       ret);
+       else
+               dev_dbg(cmh_dev(), "cmh_aes_cmac: registered cmac(aes)\n");
+
+       return ret;
+}
+
+/**
+ * cmh_aes_cmac_unregister() - Unregister AES-CMAC hash algorithm from the crypto framework
+ */
+void cmh_aes_cmac_unregister(void)
+{
+       crypto_unregister_ahash(&cmh_aes_cmac_alg);
+       dev_dbg(cmh_dev(), "cmh_aes_cmac: unregistered cmac(aes)\n");
+}
diff --git a/drivers/crypto/cmh/cmh_main.c b/drivers/crypto/cmh/cmh_main.c
index 56541e0d4219..1edd8d14c666 100644
--- a/drivers/crypto/cmh/cmh_main.c
+++ b/drivers/crypto/cmh/cmh_main.c
@@ -34,6 +34,7 @@
 #include "cmh_cshake.h"
 #include "cmh_kmac.h"
 #include "cmh_sm3.h"
+#include "cmh_aes.h"
 #include "cmh_mgmt.h"
 #include "cmh_registers.h"
 #include "cmh_debugfs.h"
@@ -221,6 +222,21 @@ static int cmh_probe(struct platform_device *pdev)
        if (ret)
                goto err_sm3_register;

+       /* Register AES skcipher algorithms */
+       ret = cmh_aes_register();
+       if (ret)
+               goto err_aes_register;
+
+       /* Register AES AEAD algorithms (GCM, CCM) */
+       ret = cmh_aes_aead_register();
+       if (ret)
+               goto err_aes_aead_register;
+
+       /* Register AES CMAC algorithm */
+       ret = cmh_aes_cmac_register();
+       if (ret)
+               goto err_aes_cmac_register;
+
        /* Register key management device (/dev/cmh_mgmt) */
        ret = cmh_mgmt_register();
        if (ret)
@@ -233,6 +249,12 @@ static int cmh_probe(struct platform_device *pdev)
        return 0;

 err_mgmt_register:
+       cmh_aes_cmac_unregister();
+err_aes_cmac_register:
+       cmh_aes_aead_unregister();
+err_aes_aead_register:
+       cmh_aes_unregister();
+err_aes_register:
        cmh_sm3_unregister();
 err_sm3_register:
        cmh_kmac_unregister();
@@ -269,6 +291,9 @@ static void cmh_remove(struct platform_device *pdev)
        cfg = &dev->config;

        cmh_mgmt_unregister();
+       cmh_aes_cmac_unregister();
+       cmh_aes_aead_unregister();
+       cmh_aes_unregister();
        cmh_sm3_unregister();
        cmh_kmac_unregister();
        cmh_cshake_unregister();
diff --git a/drivers/crypto/cmh/include/cmh_aes.h b/drivers/crypto/cmh/include/cmh_aes.h
new file mode 100644
index 000000000000..591afaa36f85
--- /dev/null
+++ b/drivers/crypto/cmh/include/cmh_aes.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Cryptography Research, Inc. (CRI).
+ * CMH LKM -- AES Crypto API Drivers
+ *
+ * Registers AES algorithms with the Linux crypto subsystem:
+ *   skcipher: ecb/cbc/ctr/cfb/xts(aes)
+ *   aead:     gcm/ccm(aes)
+ *   shash:    cmac(aes)
+ */
+
+#ifndef CMH_AES_H
+#define CMH_AES_H
+
+int  cmh_aes_register(void);
+void cmh_aes_unregister(void);
+
+int  cmh_aes_aead_register(void);
+void cmh_aes_aead_unregister(void);
+
+int  cmh_aes_cmac_register(void);
+void cmh_aes_cmac_unregister(void);
+
+#endif /* CMH_AES_H */
--
2.43.7


** This message and any attachments are for the sole use of the intended recipient(s). It may contain information that is confidential and privileged. If you are not the intended recipient of this message, you are prohibited from printing, copying, forwarding or saving it. Please delete the message and attachments and notify the sender immediately. **

Rambus Inc.<http://www.rambus.com>

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox