Linux block layer
 help / color / mirror / Atom feed
* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Jani Nikula @ 2026-06-22  8:37 UTC (permalink / raw)
  To: Kaitao Cheng, Andrew Morton, David Hildenbrand, Jens Axboe,
	Tejun Heo, Alexander Viro, Christian Brauner, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
	Andy Shevchenko, Paul E. McKenney, Shakeel Butt,
	Christian König
  Cc: David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
	Philipp Stanner, linux-block, linux-kernel, cgroups,
	linux-ntfs-dev, linux-fsdevel, io-uring, audit, bpf, netdev,
	dri-devel, linux-perf-users, linux-trace-kernel, kexec,
	live-patching, linux-modules, linux-crypto, linux-pm, rcu,
	sched-ext, linux-mm, virtualization, damon, llvm, chengkaitao
In-Reply-To: <20260622040533.29824-1-kaitao.cheng@linux.dev>

On Mon, 22 Jun 2026, Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
> Add *_mutable() iterator variants for list, hlist and llist.  The new
> helpers are variadic and support both forms.  In the common case, the
> caller omits the temporary cursor and the macro creates a unique internal
> cursor with typeof(pos) and __UNIQUE_ID().  If a loop really needs an
> explicit temporary cursor, the caller can still pass it and the helper
> keeps the existing *_safe() behaviour.
>
> For example, a call site may use the shorter form:
>
>   list_for_each_entry_mutable(pos, head, member)
>
> or keep the explicit temporary cursor form:
>
>   list_for_each_entry_mutable(pos, tmp, head, member)

I'm unconvinced it's a good idea to allow two forms with macro trickery,
*especially* when it's not the last argument you can omit. I think it's
a footgun.

IMO stick with the first form only, and there'll always be the _safe
variant that can be used when the temp pointer is needed.


BR,
Jani.


-- 
Jani Nikula, Intel

^ permalink raw reply

* Re: [PATCH] nbd: don't warn when reclassifying a busy socket lock
From: Eric Dumazet @ 2026-06-22  8:31 UTC (permalink / raw)
  To: Deepanshu Kartikey
  Cc: josef, axboe, linux-block, nbd, linux-kernel,
	syzbot+6b85d1e39a5b8ed9a954
In-Reply-To: <20260621235255.66015-1-kartikey406@gmail.com>

On Sun, Jun 21, 2026 at 4:53 PM Deepanshu Kartikey
<kartikey406@gmail.com> wrote:
>
>
> nbd_reclassify_socket() warns via WARN_ON_ONCE() if the socket lock is
> held at the point of reclassification. That assertion was copied from
> nvme-tcp, where the socket is created internally by the kernel
> (sock_create_kern()) and is never visible to user space, so the lock
> is guaranteed to be free.
>
> NBD is different: the socket is looked up from a user-supplied fd in
> nbd_get_socket(), and user space retains that fd. A concurrent syscall
> on the same socket (or softirq processing taking bh_lock_sock() on a
> connected TCP socket) can legitimately hold the lock at the instant
> NBD reclassifies it. sock_allow_reclassification() then returns false
> and the WARN_ON_ONCE() fires, which turns into a crash under
> panic_on_warn. This is reachable by simply racing NBD_CMD_CONNECT
> against socket activity on the same fd, as reported by syzbot.
>
> Hitting a held lock here is expected for an externally owned socket and
> is not a kernel bug, so skip reclassification silently instead of
> warning. Reclassification is a lockdep-only annotation, so skipping it
> in the rare racing case is harmless.

Acked-by: Eric Dumazet <edumazet@google.com>

Thanks!

^ permalink raw reply

* Re: [PATCH] nbd: don't warn when reclassifying a busy socket lock
From: Eric Dumazet @ 2026-06-22  8:18 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Deepanshu Kartikey, linux-block, nbd, linux-kernel,
	syzbot+6b85d1e39a5b8ed9a954
In-Reply-To: <20260622014328.115-1-hdanton@sina.com>

On Sun, Jun 21, 2026 at 6:43 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Mon, 22 Jun 2026 05:22:55 +0530 Deepanshu Kartikey wrote:
> > nbd_reclassify_socket() warns via WARN_ON_ONCE() if the socket lock is
> > held at the point of reclassification. That assertion was copied from
> > nvme-tcp, where the socket is created internally by the kernel
> > (sock_create_kern()) and is never visible to user space, so the lock
> > is guaranteed to be free.
> >
> > NBD is different: the socket is looked up from a user-supplied fd in
> > nbd_get_socket(), and user space retains that fd. A concurrent syscall
> > on the same socket (or softirq processing taking bh_lock_sock() on a
> > connected TCP socket) can legitimately hold the lock at the instant
> > NBD reclassifies it. sock_allow_reclassification() then returns false
> > and the WARN_ON_ONCE() fires, which turns into a crash under
> > panic_on_warn. This is reachable by simply racing NBD_CMD_CONNECT
> > against socket activity on the same fd, as reported by syzbot.
> >
> Given the syzbot report, if you are right (I suspect) then Eric delivered
> another half-baked croissant, and feel free to cut it off instead to make
> room for correct fix.

Nobody (including you) caught this.difference between nbd and other
sock_allow_reclassification()
callers.

What was the "correct fix" you envisioned exactly?

^ permalink raw reply

* [PATCH 2/2] blk-cgroup: fix null-ptr-deref by freeing blkg pd on blkg_create error path
From: Zizhi Wo @ 2026-06-22  7:07 UTC (permalink / raw)
  To: axboe, tj, josef, linux-block
  Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai, wozizhi
In-Reply-To: <20260622070714.1158886-1-wozizhi@huaweicloud.com>

From: Zizhi Wo <wozizhi@huawei.com>

When blkg_create() fails before the blkg is linked onto blkcg->blkg_list
and q->blkg_list (e.g. radix_tree_insert() fails or the blkg_lookup()
returns NULL), the blkg is freed asynchronously via blkg_free_workfn().

Since such a blkg was never linked, it is invisible to
blkcg_deactivate_policy(), so its blkg->pd[] entries can not be cleared in
it. blkg_free_workfn() then calls blkcg_policy->pd_free_fn() on them, which
can race with bfq module exit (bfq_exit() -> blkcg_policy_unregister())
clearing the blkcg_policy[] slot, leading to a NULL pointer dereference:

[   72.597786] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[   72.598690] CPU: 35 UID: 0 PID: 458 Comm: kworker/35:1 Not tainted 7.1.0+ #33 PREEMPT(full)
[   72.599518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
[   72.600342] Workqueue: events blkg_free_workfn
[   72.600991] RIP: 0010:blkg_free_workfn+0x115/0x3d0
......
[   72.613278] Call Trace:
[   72.613988]  <TASK>
[   72.614357]  process_one_work+0x6b4/0xff0
[   72.615251]  ? __pfx_blkg_free_workfn+0x10/0x10
[   72.616041]  ? assign_work+0x131/0x3f0
[   72.616962]  worker_thread+0x4eb/0xd50
[   72.617599]  ? __kthread_parkme+0x8d/0x170
[   72.618565]  ? __pfx_worker_thread+0x10/0x10
[   72.619566]  ? __pfx_worker_thread+0x10/0x10
[   72.620213]  kthread+0x327/0x410
......

Fix this by introducing blkg_free_pd() to synchronously free the pd and
clear blkg->pd[] in the blkg_create() error path, while the blkcg_policy
is still valid.

Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 block/blk-cgroup.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 6386fe413994..673886d71c26 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -109,28 +109,37 @@ static struct cgroup_subsys_state *blkcg_css(void)
 	if (css)
 		return css;
 	return task_css(current, io_cgrp_id);
 }
 
+static void blkg_free_pd(struct blkcg_gq *blkg)
+{
+	int i;
+
+	for (i = 0; i < BLKCG_MAX_POLS; i++) {
+		if (blkg->pd[i]) {
+			blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
+			blkg->pd[i] = NULL;
+		}
+	}
+}
+
 static void blkg_free_workfn(struct work_struct *work)
 {
 	struct blkcg_gq *blkg = container_of(work, struct blkcg_gq,
 					     free_work);
 	struct request_queue *q = blkg->q;
-	int i;
 
 	/*
 	 * pd_free_fn() can also be called from blkcg_deactivate_policy(),
 	 * in order to make sure pd_free_fn() is called in order, the deletion
 	 * of the list blkg->q_node is delayed to here from blkg_destroy(), and
 	 * blkcg_mutex is used to synchronize blkg_free_workfn() and
 	 * blkcg_deactivate_policy().
 	 */
 	mutex_lock(&q->blkcg_mutex);
-	for (i = 0; i < BLKCG_MAX_POLS; i++)
-		if (blkg->pd[i])
-			blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
+	blkg_free_pd(blkg);
 	if (blkg->parent)
 		blkg_put(blkg->parent);
 	spin_lock_irq(&q->queue_lock);
 	list_del_init(&blkg->q_node);
 	spin_unlock_irq(&q->queue_lock);
@@ -436,19 +445,28 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk,
 	spin_unlock(&blkcg->lock);
 
 	if (!ret)
 		return blkg;
 
-	/* @blkg failed fully initialized, use the usual release path */
+	/*
+	 * @blkg failed fully initialized and never linked, so its pd[] is
+	 * invisible to blkcg_deactivate_policy(). Free pd[] synchronously
+	 * while blkcg_policy[] is still valid, otherwise the async free path
+	 * may call pd_free_fn() after the policy is unregistered (e.g. rmmod bfq).
+	 * The err_free_blkg path below frees pd[] for the same reason.
+	 */
+	blkg_free_pd(blkg);
 	percpu_ref_kill(&blkg->refcnt);
 	return ERR_PTR(ret);
 
 err_put_css:
 	css_put(&blkcg->css);
 err_free_blkg:
-	if (new_blkg)
+	if (new_blkg) {
+		blkg_free_pd(new_blkg);
 		blkg_free(new_blkg);
+	}
 	return ERR_PTR(ret);
 }
 
 /**
  * blkg_lookup_create - lookup blkg, try to create one if not there
-- 
2.52.0


^ permalink raw reply related

* [PATCH 0/2] fix two issues in blkg_create() error path
From: Zizhi Wo @ 2026-06-22  7:07 UTC (permalink / raw)
  To: axboe, tj, josef, linux-block
  Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai, wozizhi

From: Zizhi Wo <wozizhi@huawei.com>

This series fixes two issues on the blkg_create() error path.

Patch 1 fixes a blkg leak when blkg_create() fails.

Patch 2 fixes a null-ptr-deref by freeing blkg->pd synchronously on the
error path, otherwise the async free path may dereference an already
unregistered blkcg_policy.

Zizhi Wo (2):
  blk-cgroup: fix blkg leak in blkg_create() error path
  blk-cgroup: fix Null-ptr-deref by freeing blkg pd on blkg_create error
    path

 block/blk-cgroup.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

-- 
2.52.0


^ permalink raw reply

* [PATCH 1/2] blk-cgroup: fix blkg leak in blkg_create() error path
From: Zizhi Wo @ 2026-06-22  7:07 UTC (permalink / raw)
  To: axboe, tj, josef, linux-block
  Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai, wozizhi
In-Reply-To: <20260622070714.1158886-1-wozizhi@huaweicloud.com>

When radix_tree_insert() fails in blkg_create(), the error path calls
blkg_put() to release the blkg. This was correct when blkg->refcnt was an
atomic_t: blkg_put() dropped it to 0 and triggered the release path.

But commit 7fcf2b033b84 ("blkcg: change blkg reference counting to use
percpu_ref") switched refcnt to a percpu_ref. In percpu mode
percpu_ref_put() never checks for zero, so the release callback is never
invoked. This blkg is on neither blkcg->blkg_list nor queue->blkg_list, so
blkg_destroy_all() / blkcg_destroy_blkgs() can never reach it to call
blkg_destroy()->percpu_ref_kill() either, cause the leak.

Fix it by killing the percpu_ref instead, which switches it to atomic mode
and drops the initial ref.

Fixes: 7fcf2b033b84 ("blkcg: change blkg reference counting to use percpu_ref")
Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 block/blk-cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index bc63bd220865..6386fe413994 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -437,11 +437,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk,
 
 	if (!ret)
 		return blkg;
 
 	/* @blkg failed fully initialized, use the usual release path */
-	blkg_put(blkg);
+	percpu_ref_kill(&blkg->refcnt);
 	return ERR_PTR(ret);
 
 err_put_css:
 	css_put(&blkcg->css);
 err_free_blkg:
-- 
2.52.0


^ permalink raw reply related

* Re: [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching
From: Leonid Ravich @ 2026-06-22  7:10 UTC (permalink / raw)
  To: Herbert Xu, Eric Biggers
  Cc: Alasdair Kergon, Ard Biesheuvel, Jens Axboe, dm-devel,
	linux-block
In-Reply-To: <ajDWww8OgNcXs73c@gondor.apana.org.au>

On Mon, Jun 15, 2026 at 03:53:17PM -0700, Eric Biggers wrote:
> So in other words, this series slows down dm-crypt and crypto_skcipher
> for everyone to optimize for an out-of-tree driver.  And there's also no
> benchmark showing that your driver is even worth it over just using the
> CPU.

I measured on arm64 (Graviton3, dm-crypt + xts-aes-ce, RAM-backed,
fixed CPU freq):

  - 4 KiB random write, 512-byte sectors: v4 as posted regressed ~5%.
    Root cause (ftrace): a per-bio kmalloc_array() for the scatterlists,
    where the per-sector path uses dm-crypt's inline sg_in[]/sg_out[].

  - Reusing the inline arrays when the segment count fits (heap only for
    larger bios) removes the regression, back to parity. This will be in
    the dm-crypt patch for v5.

So the software path is neutral after the fix, not slower. No software throughput win
either: the auto-splitter still calls alg->encrypt per data unit. The win
is for a consumer that takes the whole request in one pass, a HW engine,
or any async offload engine that pays a fixed per-request cost,
it currently pays once per sector instead of once per bio.

I'd rather not over-complicate the patches until there's a general
ack on the direction: per-request data_unit_size + auto-split,
enabling one-pass consumers, neutral for everyone else. Is that direction
acceptable? If so I'll respin v5.

Thanks,
Leonid

^ permalink raw reply

* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Kaitao Cheng @ 2026-06-22  6:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
	Alexander Viro, Christian Brauner, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
	Andy Shevchenko, Paul E. McKenney, Shakeel Butt,
	Christian König, David Howells, Simona Vetter, Randy Dunlap,
	Luca Ceresoli, Philipp Stanner, linux-block, LKML,
	open list:CONTROL GROUP (CGROUP), linux-ntfs-dev, Linux-Fsdevel,
	io-uring, audit, bpf, Network Development, dri-devel,
	linux-perf-use., linux-trace-kernel, kexec, live-patching,
	linux-modules, Linux Crypto Mailing List, Linux Power Management,
	rcu, sched-ext, linux-mm, virtualization, damon,
	clang-built-linux, chengkaitao, Muchun Song
In-Reply-To: <CAADnVQJmPWFT01b7DuLdtafv=8FyB84GYHNZ8zSTck+9Aw0JpA@mail.gmail.com>



在 2026/6/22 13:28, Alexei Starovoitov 写道:
> On Sun, Jun 21, 2026 at 9:06 PM Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>
>> From: chengkaitao <chengkaitao@kylinos.cn>
>>
>> The list_for_each*_safe() helpers are used when the loop body may remove
>> the current entry.  Their current interface, however, forces every caller
>> to define a temporary cursor outside the macro and pass it in, even when
>> the caller never uses that cursor directly.  For most call sites this
>> extra cursor is just boilerplate required by the macro implementation.
>>
>> This is awkward because the saved next pointer is an internal detail of
>> the iteration.  Callers that only remove or move the current entry do not
>> need to spell it out.
>>
>> The _safe() suffix has also caused confusion.  Christian Koenig pointed
>> out that the name is easy to read as a thread-safe variant, especially
>> for beginners, even though it only means that the iterator keeps enough
>> state to tolerate removal of the current entry.  He suggested _mutable()
>> as a clearer description of what the loop permits.
>>
>> Add *_mutable() iterator variants for list, hlist and llist.  The new
>> helpers are variadic and support both forms.  In the common case, the
>> caller omits the temporary cursor and the macro creates a unique internal
>> cursor with typeof(pos) and __UNIQUE_ID().  If a loop really needs an
>> explicit temporary cursor, the caller can still pass it and the helper
>> keeps the existing *_safe() behaviour.
>>
>> For example, a call site may use the shorter form:
>>
>>   list_for_each_entry_mutable(pos, head, member)
>>
>> or keep the explicit temporary cursor form:
>>
>>   list_for_each_entry_mutable(pos, tmp, head, member)
>>
>> The existing *_safe() helpers remain available for compatibility.  This
>> series only converts users in mm, block, kernel, init and io_uring.  If
>> this approach looks acceptable, the remaining users can be converted in
>> follow-up series.
>>
>> Changes in v3 (Christian König, Andy Shevchenko):
>> - Convert safe list walks to mutable iterators
>>
>> Changes in v2 (Muchun Song, Andy Shevchenko):
>> - Drop the list_for_each_entry_mutable*() helpers from v1 and make the
>>   cursor change directly in the existing list_for_each_entry*() helpers.
>> - Open-code special list walks that rely on updating the loop cursor in
>>   the body, preserving their existing traversal semantics.
>>
>> Link to v2:
>> https://lore.kernel.org/all/20260609061347.93688-1-kaitao.cheng@linux.dev/
>>
>> Link to v1:
>> https://lore.kernel.org/all/20260529082149.76764-1-kaitao.cheng@linux.dev/
>>
>> Kaitao Cheng (7):
>>   list: Add mutable iterator variants
>>   llist: Add mutable iterator variants
>>   mm: Use mutable list iterators
>>   block: Use mutable list iterators
>>   kernel: Use mutable list iterators
>>   initramfs: Use mutable list iterator
>>   io_uring: Use mutable list iterators
>>
>>  block/bfq-iosched.c                 |  17 +-
>>  block/blk-cgroup.c                  |  12 +-
>>  block/blk-flush.c                   |   4 +-
>>  block/blk-iocost.c                  |  18 +-
>>  block/blk-mq.c                      |   8 +-
>>  block/blk-throttle.c                |   4 +-
>>  block/kyber-iosched.c               |   4 +-
>>  block/partitions/ldm.c              |   8 +-
>>  block/sed-opal.c                    |   4 +-
>>  include/linux/list.h                | 269 ++++++++++++++++++++++++----
>>  include/linux/llist.h               |  81 +++++++--
>>  init/initramfs.c                    |   5 +-
>>  io_uring/cancel.c                   |   6 +-
>>  io_uring/poll.c                     |   3 +-
>>  io_uring/rw.c                       |   4 +-
>>  io_uring/timeout.c                  |   8 +-
>>  io_uring/uring_cmd.c                |   3 +-
>>  kernel/audit_tree.c                 |   4 +-
>>  kernel/audit_watch.c                |  16 +-
>>  kernel/auditfilter.c                |   4 +-
>>  kernel/auditsc.c                    |   4 +-
>>  kernel/bpf/arena.c                  |  10 +-
>>  kernel/bpf/arraymap.c               |   8 +-
>>  kernel/bpf/bpf_local_storage.c      |   3 +-
>>  kernel/bpf/bpf_lru_list.c           |  25 ++-
>>  kernel/bpf/btf.c                    |  18 +-
>>  kernel/bpf/cgroup.c                 |   7 +-
>>  kernel/bpf/cpumap.c                 |   4 +-
>>  kernel/bpf/devmap.c                 |  10 +-
>>  kernel/bpf/helpers.c                |   8 +-
>>  kernel/bpf/local_storage.c          |   4 +-
>>  kernel/bpf/memalloc.c               |  16 +-
>>  kernel/bpf/offload.c                |   8 +-
>>  kernel/bpf/states.c                 |   4 +-
>>  kernel/bpf/stream.c                 |   4 +-
>>  kernel/bpf/verifier.c               |   6 +-
>>  kernel/cgroup/cgroup-v1.c           |   4 +-
>>  kernel/cgroup/cgroup.c              |  54 +++---
>>  kernel/cgroup/dmem.c                |  12 +-
>>  kernel/cgroup/rdma.c                |   8 +-
>>  kernel/events/core.c                |  44 +++--
>>  kernel/events/uprobes.c             |  12 +-
>>  kernel/exit.c                       |   8 +-
>>  kernel/fail_function.c              |   4 +-
>>  kernel/gcov/clang.c                 |   4 +-
>>  kernel/irq_work.c                   |   4 +-
>>  kernel/kexec_core.c                 |   4 +-
>>  kernel/kprobes.c                    |  16 +-
>>  kernel/livepatch/core.c             |   4 +-
>>  kernel/livepatch/core.h             |   4 +-
>>  kernel/liveupdate/kho_block.c       |   4 +-
>>  kernel/liveupdate/luo_flb.c         |   4 +-
>>  kernel/locking/rwsem.c              |   2 +-
>>  kernel/locking/test-ww_mutex.c      |   2 +-
>>  kernel/module/main.c                |  11 +-
>>  kernel/padata.c                     |   4 +-
>>  kernel/power/snapshot.c             |   8 +-
>>  kernel/power/wakelock.c             |   4 +-
>>  kernel/printk/printk.c              |  11 +-
>>  kernel/ptrace.c                     |   4 +-
>>  kernel/rcu/rcutorture.c             |   3 +-
>>  kernel/rcu/tasks.h                  |   9 +-
>>  kernel/rcu/tree.c                   |   6 +-
>>  kernel/resource.c                   |   4 +-
>>  kernel/sched/core.c                 |   4 +-
>>  kernel/sched/ext.c                  |  22 +--
>>  kernel/sched/fair.c                 |  28 +--
>>  kernel/sched/topology.c             |   4 +-
>>  kernel/sched/wait.c                 |   4 +-
>>  kernel/seccomp.c                    |   4 +-
>>  kernel/signal.c                     |  11 +-
>>  kernel/smp.c                        |   4 +-
>>  kernel/taskstats.c                  |   8 +-
>>  kernel/time/clockevents.c           |   6 +-
>>  kernel/time/clocksource.c           |   4 +-
>>  kernel/time/posix-cpu-timers.c      |   4 +-
>>  kernel/time/posix-timers.c          |   3 +-
>>  kernel/torture.c                    |   3 +-
>>  kernel/trace/bpf_trace.c            |   4 +-
>>  kernel/trace/ftrace.c               |  49 +++--
>>  kernel/trace/ring_buffer.c          |  25 ++-
>>  kernel/trace/trace.c                |  12 +-
>>  kernel/trace/trace_dynevent.c       |   6 +-
>>  kernel/trace/trace_dynevent.h       |   5 +-
>>  kernel/trace/trace_events.c         |  35 ++--
>>  kernel/trace/trace_events_filter.c  |   4 +-
>>  kernel/trace/trace_events_hist.c    |   8 +-
>>  kernel/trace/trace_events_trigger.c |  17 +-
>>  kernel/trace/trace_events_user.c    |  16 +-
>>  kernel/trace/trace_stat.c           |   4 +-
>>  kernel/user-return-notifier.c       |   3 +-
>>  kernel/workqueue.c                  |  16 +-
>>  mm/backing-dev.c                    |   8 +-
>>  mm/balloon.c                        |   8 +-
>>  mm/cma.c                            |   4 +-
>>  mm/compaction.c                     |   4 +-
>>  mm/damon/core.c                     |   4 +-
>>  mm/damon/sysfs-schemes.c            |   4 +-
>>  mm/dmapool.c                        |   4 +-
>>  mm/huge_memory.c                    |   8 +-
>>  mm/hugetlb.c                        |  56 +++---
>>  mm/hugetlb_vmemmap.c                |  16 +-
>>  mm/khugepaged.c                     |  14 +-
>>  mm/kmemleak.c                       |   7 +-
>>  mm/ksm.c                            |  25 +--
>>  mm/list_lru.c                       |   4 +-
>>  mm/memcontrol-v1.c                  |   8 +-
>>  mm/memory-failure.c                 |  12 +-
>>  mm/memory-tiers.c                   |   4 +-
>>  mm/migrate.c                        |  23 ++-
>>  mm/mmu_notifier.c                   |   9 +-
>>  mm/page_alloc.c                     |   8 +-
>>  mm/page_reporting.c                 |   2 +-
>>  mm/percpu.c                         |  11 +-
>>  mm/pgtable-generic.c                |   4 +-
>>  mm/rmap.c                           |  10 +-
>>  mm/shmem.c                          |   9 +-
>>  mm/slab_common.c                    |  14 +-
>>  mm/slub.c                           |  33 ++--
>>  mm/swapfile.c                       |   4 +-
>>  mm/userfaultfd.c                    |  12 +-
>>  mm/vmalloc.c                        |  24 +--
>>  mm/vmscan.c                         |   7 +-
>>  mm/zsmalloc.c                       |   4 +-
>>  124 files changed, 875 insertions(+), 681 deletions(-)
> 
> Not sure what you were thinking, but this diff stat
> is not landable.

[PATCH v3 1/7] and [PATCH v3 2/7] contain the main logic and can
be merged directly. They are also compatible with the old API.
[PATCH v3 3/7] through [PATCH v3 7/7] are just simple interface
replacements and do not change any functional logic. They can be
left unmerged for now; individual modules can pick them up later
if needed.

In v2, Andy Shevchenko mentioned: "If it's done by Linus himself
during the day when he prepares -rc1, it's fine." Even so, the
changes in this patch series are indeed quite large and touch
almost every subsystem. I have only converted part of them for
now, so I wanted to send this out first and see what people think.

-- 
Thanks
Kaitao Cheng


^ permalink raw reply

* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Alexei Starovoitov @ 2026-06-22  5:28 UTC (permalink / raw)
  To: Kaitao Cheng
  Cc: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
	Alexander Viro, Christian Brauner, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
	Andy Shevchenko, Paul E. McKenney, Shakeel Butt,
	Christian König, David Howells, Simona Vetter, Randy Dunlap,
	Luca Ceresoli, Philipp Stanner, linux-block, LKML,
	open list:CONTROL GROUP (CGROUP), linux-ntfs-dev, Linux-Fsdevel,
	io-uring, audit, bpf, Network Development, dri-devel,
	linux-perf-use., linux-trace-kernel, kexec, live-patching,
	linux-modules, Linux Crypto Mailing List, Linux Power Management,
	rcu, sched-ext, linux-mm, virtualization, damon,
	clang-built-linux, chengkaitao
In-Reply-To: <20260622040533.29824-1-kaitao.cheng@linux.dev>

On Sun, Jun 21, 2026 at 9:06 PM Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>
> From: chengkaitao <chengkaitao@kylinos.cn>
>
> The list_for_each*_safe() helpers are used when the loop body may remove
> the current entry.  Their current interface, however, forces every caller
> to define a temporary cursor outside the macro and pass it in, even when
> the caller never uses that cursor directly.  For most call sites this
> extra cursor is just boilerplate required by the macro implementation.
>
> This is awkward because the saved next pointer is an internal detail of
> the iteration.  Callers that only remove or move the current entry do not
> need to spell it out.
>
> The _safe() suffix has also caused confusion.  Christian Koenig pointed
> out that the name is easy to read as a thread-safe variant, especially
> for beginners, even though it only means that the iterator keeps enough
> state to tolerate removal of the current entry.  He suggested _mutable()
> as a clearer description of what the loop permits.
>
> Add *_mutable() iterator variants for list, hlist and llist.  The new
> helpers are variadic and support both forms.  In the common case, the
> caller omits the temporary cursor and the macro creates a unique internal
> cursor with typeof(pos) and __UNIQUE_ID().  If a loop really needs an
> explicit temporary cursor, the caller can still pass it and the helper
> keeps the existing *_safe() behaviour.
>
> For example, a call site may use the shorter form:
>
>   list_for_each_entry_mutable(pos, head, member)
>
> or keep the explicit temporary cursor form:
>
>   list_for_each_entry_mutable(pos, tmp, head, member)
>
> The existing *_safe() helpers remain available for compatibility.  This
> series only converts users in mm, block, kernel, init and io_uring.  If
> this approach looks acceptable, the remaining users can be converted in
> follow-up series.
>
> Changes in v3 (Christian König, Andy Shevchenko):
> - Convert safe list walks to mutable iterators
>
> Changes in v2 (Muchun Song, Andy Shevchenko):
> - Drop the list_for_each_entry_mutable*() helpers from v1 and make the
>   cursor change directly in the existing list_for_each_entry*() helpers.
> - Open-code special list walks that rely on updating the loop cursor in
>   the body, preserving their existing traversal semantics.
>
> Link to v2:
> https://lore.kernel.org/all/20260609061347.93688-1-kaitao.cheng@linux.dev/
>
> Link to v1:
> https://lore.kernel.org/all/20260529082149.76764-1-kaitao.cheng@linux.dev/
>
> Kaitao Cheng (7):
>   list: Add mutable iterator variants
>   llist: Add mutable iterator variants
>   mm: Use mutable list iterators
>   block: Use mutable list iterators
>   kernel: Use mutable list iterators
>   initramfs: Use mutable list iterator
>   io_uring: Use mutable list iterators
>
>  block/bfq-iosched.c                 |  17 +-
>  block/blk-cgroup.c                  |  12 +-
>  block/blk-flush.c                   |   4 +-
>  block/blk-iocost.c                  |  18 +-
>  block/blk-mq.c                      |   8 +-
>  block/blk-throttle.c                |   4 +-
>  block/kyber-iosched.c               |   4 +-
>  block/partitions/ldm.c              |   8 +-
>  block/sed-opal.c                    |   4 +-
>  include/linux/list.h                | 269 ++++++++++++++++++++++++----
>  include/linux/llist.h               |  81 +++++++--
>  init/initramfs.c                    |   5 +-
>  io_uring/cancel.c                   |   6 +-
>  io_uring/poll.c                     |   3 +-
>  io_uring/rw.c                       |   4 +-
>  io_uring/timeout.c                  |   8 +-
>  io_uring/uring_cmd.c                |   3 +-
>  kernel/audit_tree.c                 |   4 +-
>  kernel/audit_watch.c                |  16 +-
>  kernel/auditfilter.c                |   4 +-
>  kernel/auditsc.c                    |   4 +-
>  kernel/bpf/arena.c                  |  10 +-
>  kernel/bpf/arraymap.c               |   8 +-
>  kernel/bpf/bpf_local_storage.c      |   3 +-
>  kernel/bpf/bpf_lru_list.c           |  25 ++-
>  kernel/bpf/btf.c                    |  18 +-
>  kernel/bpf/cgroup.c                 |   7 +-
>  kernel/bpf/cpumap.c                 |   4 +-
>  kernel/bpf/devmap.c                 |  10 +-
>  kernel/bpf/helpers.c                |   8 +-
>  kernel/bpf/local_storage.c          |   4 +-
>  kernel/bpf/memalloc.c               |  16 +-
>  kernel/bpf/offload.c                |   8 +-
>  kernel/bpf/states.c                 |   4 +-
>  kernel/bpf/stream.c                 |   4 +-
>  kernel/bpf/verifier.c               |   6 +-
>  kernel/cgroup/cgroup-v1.c           |   4 +-
>  kernel/cgroup/cgroup.c              |  54 +++---
>  kernel/cgroup/dmem.c                |  12 +-
>  kernel/cgroup/rdma.c                |   8 +-
>  kernel/events/core.c                |  44 +++--
>  kernel/events/uprobes.c             |  12 +-
>  kernel/exit.c                       |   8 +-
>  kernel/fail_function.c              |   4 +-
>  kernel/gcov/clang.c                 |   4 +-
>  kernel/irq_work.c                   |   4 +-
>  kernel/kexec_core.c                 |   4 +-
>  kernel/kprobes.c                    |  16 +-
>  kernel/livepatch/core.c             |   4 +-
>  kernel/livepatch/core.h             |   4 +-
>  kernel/liveupdate/kho_block.c       |   4 +-
>  kernel/liveupdate/luo_flb.c         |   4 +-
>  kernel/locking/rwsem.c              |   2 +-
>  kernel/locking/test-ww_mutex.c      |   2 +-
>  kernel/module/main.c                |  11 +-
>  kernel/padata.c                     |   4 +-
>  kernel/power/snapshot.c             |   8 +-
>  kernel/power/wakelock.c             |   4 +-
>  kernel/printk/printk.c              |  11 +-
>  kernel/ptrace.c                     |   4 +-
>  kernel/rcu/rcutorture.c             |   3 +-
>  kernel/rcu/tasks.h                  |   9 +-
>  kernel/rcu/tree.c                   |   6 +-
>  kernel/resource.c                   |   4 +-
>  kernel/sched/core.c                 |   4 +-
>  kernel/sched/ext.c                  |  22 +--
>  kernel/sched/fair.c                 |  28 +--
>  kernel/sched/topology.c             |   4 +-
>  kernel/sched/wait.c                 |   4 +-
>  kernel/seccomp.c                    |   4 +-
>  kernel/signal.c                     |  11 +-
>  kernel/smp.c                        |   4 +-
>  kernel/taskstats.c                  |   8 +-
>  kernel/time/clockevents.c           |   6 +-
>  kernel/time/clocksource.c           |   4 +-
>  kernel/time/posix-cpu-timers.c      |   4 +-
>  kernel/time/posix-timers.c          |   3 +-
>  kernel/torture.c                    |   3 +-
>  kernel/trace/bpf_trace.c            |   4 +-
>  kernel/trace/ftrace.c               |  49 +++--
>  kernel/trace/ring_buffer.c          |  25 ++-
>  kernel/trace/trace.c                |  12 +-
>  kernel/trace/trace_dynevent.c       |   6 +-
>  kernel/trace/trace_dynevent.h       |   5 +-
>  kernel/trace/trace_events.c         |  35 ++--
>  kernel/trace/trace_events_filter.c  |   4 +-
>  kernel/trace/trace_events_hist.c    |   8 +-
>  kernel/trace/trace_events_trigger.c |  17 +-
>  kernel/trace/trace_events_user.c    |  16 +-
>  kernel/trace/trace_stat.c           |   4 +-
>  kernel/user-return-notifier.c       |   3 +-
>  kernel/workqueue.c                  |  16 +-
>  mm/backing-dev.c                    |   8 +-
>  mm/balloon.c                        |   8 +-
>  mm/cma.c                            |   4 +-
>  mm/compaction.c                     |   4 +-
>  mm/damon/core.c                     |   4 +-
>  mm/damon/sysfs-schemes.c            |   4 +-
>  mm/dmapool.c                        |   4 +-
>  mm/huge_memory.c                    |   8 +-
>  mm/hugetlb.c                        |  56 +++---
>  mm/hugetlb_vmemmap.c                |  16 +-
>  mm/khugepaged.c                     |  14 +-
>  mm/kmemleak.c                       |   7 +-
>  mm/ksm.c                            |  25 +--
>  mm/list_lru.c                       |   4 +-
>  mm/memcontrol-v1.c                  |   8 +-
>  mm/memory-failure.c                 |  12 +-
>  mm/memory-tiers.c                   |   4 +-
>  mm/migrate.c                        |  23 ++-
>  mm/mmu_notifier.c                   |   9 +-
>  mm/page_alloc.c                     |   8 +-
>  mm/page_reporting.c                 |   2 +-
>  mm/percpu.c                         |  11 +-
>  mm/pgtable-generic.c                |   4 +-
>  mm/rmap.c                           |  10 +-
>  mm/shmem.c                          |   9 +-
>  mm/slab_common.c                    |  14 +-
>  mm/slub.c                           |  33 ++--
>  mm/swapfile.c                       |   4 +-
>  mm/userfaultfd.c                    |  12 +-
>  mm/vmalloc.c                        |  24 +--
>  mm/vmscan.c                         |   7 +-
>  mm/zsmalloc.c                       |   4 +-
>  124 files changed, 875 insertions(+), 681 deletions(-)

Not sure what you were thinking, but this diff stat
is not landable.

pw-bot: cr

^ permalink raw reply

* [PATCH v3 4/7] block: Use mutable list iterators
From: Kaitao Cheng @ 2026-06-22  4:42 UTC (permalink / raw)
  To: Yu Kuai, Jens Axboe, Tejun Heo, Josef Bacik,
	Richard Russon (FlatCap), Jonathan Derrick
  Cc: linux-block, linux-kernel, cgroups, linux-ntfs-dev, Kaitao Cheng
In-Reply-To: <20260622040533.29824-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

The safe list iterators require callers to provide a temporary cursor
even when the cursor is only used by the iterator itself.  The mutable
iterator variants keep the same removal-safe traversal semantics while
allowing those internal cursors to be hidden from the call sites.

Convert block users of list, hlist and llist safe iterators to the new
mutable helpers.  Drop the now-unused temporary cursor variables where
the loop body does not inspect or reset them.

This is a mechanical cleanup with no intended change in traversal order
or list mutation behavior.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 block/bfq-iosched.c    | 17 +++++++----------
 block/blk-cgroup.c     | 12 ++++++------
 block/blk-flush.c      |  4 ++--
 block/blk-iocost.c     | 18 +++++++++---------
 block/blk-mq.c         |  8 ++++----
 block/blk-throttle.c   |  4 ++--
 block/kyber-iosched.c  |  4 ++--
 block/partitions/ldm.c |  8 ++++----
 block/sed-opal.c       |  4 ++--
 9 files changed, 38 insertions(+), 41 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 141c602d5e85..78e32ba0553d 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1209,9 +1209,8 @@ static int bfqq_process_refs(struct bfq_queue *bfqq)
 static void bfq_reset_burst_list(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 {
 	struct bfq_queue *item;
-	struct hlist_node *n;
 
-	hlist_for_each_entry_safe(item, n, &bfqd->burst_list, burst_list_node)
+	hlist_for_each_entry_mutable(item, &bfqd->burst_list, burst_list_node)
 		hlist_del_init(&item->burst_list_node);
 
 	/*
@@ -1236,7 +1235,6 @@ static void bfq_add_to_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 
 	if (bfqd->burst_size == bfqd->bfq_large_burst_thresh) {
 		struct bfq_queue *pos, *bfqq_item;
-		struct hlist_node *n;
 
 		/*
 		 * Enough queues have been activated shortly after each
@@ -1260,8 +1258,8 @@ static void bfq_add_to_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 		 * belonging to a large burst. So the burst list is not
 		 * needed any more. Remove it.
 		 */
-		hlist_for_each_entry_safe(pos, n, &bfqd->burst_list,
-					  burst_list_node)
+		hlist_for_each_entry_mutable(pos, &bfqd->burst_list,
+					     burst_list_node)
 			hlist_del_init(&pos->burst_list_node);
 	} else /*
 		* Burst not yet large: add bfqq to the burst list. Do
@@ -5330,7 +5328,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 void bfq_put_queue(struct bfq_queue *bfqq)
 {
 	struct bfq_queue *item;
-	struct hlist_node *n;
 	struct bfq_group *bfqg = bfqq_group(bfqq);
 
 	bfq_log_bfqq(bfqq->bfqd, bfqq, "put_queue: %p %d", bfqq, bfqq->ref);
@@ -5391,8 +5388,8 @@ void bfq_put_queue(struct bfq_queue *bfqq)
 		hlist_del_init(&bfqq->woken_list_node);
 
 	/* reset waker for all queues in woken list */
-	hlist_for_each_entry_safe(item, n, &bfqq->woken_list,
-				  woken_list_node) {
+	hlist_for_each_entry_mutable(item, &bfqq->woken_list,
+				     woken_list_node) {
 		item->waker_bfqq = NULL;
 		hlist_del_init(&item->woken_list_node);
 	}
@@ -7141,13 +7138,13 @@ static void bfq_depth_updated(struct request_queue *q)
 static void bfq_exit_queue(struct elevator_queue *e)
 {
 	struct bfq_data *bfqd = e->elevator_data;
-	struct bfq_queue *bfqq, *n;
+	struct bfq_queue *bfqq;
 	unsigned int actuator;
 
 	hrtimer_cancel(&bfqd->idle_slice_timer);
 
 	spin_lock_irq(&bfqd->lock);
-	list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
+	list_for_each_entry_mutable(bfqq, &bfqd->idle_list, bfqq_list)
 		bfq_deactivate_bfqq(bfqd, bfqq, false, false);
 	spin_unlock_irq(&bfqd->lock);
 
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 3093c1c03902..ced900019f2e 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -997,7 +997,7 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
 {
 	struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
 	struct llist_node *lnode;
-	struct blkg_iostat_set *bisc, *next_bisc;
+	struct blkg_iostat_set *bisc;
 	unsigned long flags;
 
 	rcu_read_lock();
@@ -1017,17 +1017,17 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
 	/*
 	 * Iterate only the iostat_cpu's queued in the lockless list.
 	 */
-	llist_for_each_entry_safe(bisc, next_bisc, lnode, lnode) {
+	llist_for_each_entry_mutable(bisc, lnode, lnode) {
 		struct blkcg_gq *blkg = bisc->blkg;
 		struct blkcg_gq *parent = blkg->parent;
 		struct blkg_iostat cur;
 		unsigned int seq;
 
 		/*
-		 * Order assignment of `next_bisc` from `bisc->lnode.next` in
-		 * llist_for_each_entry_safe and clearing `bisc->lqueued` for
-		 * avoiding to assign `next_bisc` with new next pointer added
-		 * in blk_cgroup_bio_start() in case of re-ordering.
+		 * Order the iterator's internal `bisc->lnode.next` load before
+		 * clearing `bisc->lqueued`, so the iterator can't pick up a new
+		 * next pointer added in blk_cgroup_bio_start() in case of
+		 * re-ordering.
 		 *
 		 * The pair barrier is implied in llist_add() in blk_cgroup_bio_start().
 		 */
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 403a46c86411..20654c2103f2 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -204,7 +204,7 @@ static enum rq_end_io_ret flush_end_io(struct request *flush_rq,
 {
 	struct request_queue *q = flush_rq->q;
 	struct list_head *running;
-	struct request *rq, *n;
+	struct request *rq;
 	unsigned long flags = 0;
 	struct blk_flush_queue *fq = blk_get_flush_queue(flush_rq->mq_ctx);
 
@@ -243,7 +243,7 @@ static enum rq_end_io_ret flush_end_io(struct request *flush_rq,
 	fq->flush_running_idx ^= 1;
 
 	/* and push the waiting requests to the next stage */
-	list_for_each_entry_safe(rq, n, running, queuelist) {
+	list_for_each_entry_mutable(rq, running, queuelist) {
 		unsigned int seq = blk_flush_cur_seq(rq);
 
 		BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH);
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 563cc7dcf348..2ca18e52bc13 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1718,7 +1718,7 @@ static void iocg_flush_stat_leaf(struct ioc_gq *iocg, struct ioc_now *now)
 static void iocg_flush_stat(struct list_head *target_iocgs, struct ioc_now *now)
 {
 	LIST_HEAD(inner_walk);
-	struct ioc_gq *iocg, *tiocg;
+	struct ioc_gq *iocg;
 
 	/* flush leaves and build inner node walk list */
 	list_for_each_entry(iocg, target_iocgs, active_list) {
@@ -1727,7 +1727,7 @@ static void iocg_flush_stat(struct list_head *target_iocgs, struct ioc_now *now)
 	}
 
 	/* keep flushing upwards by walking the inner list backwards */
-	list_for_each_entry_safe_reverse(iocg, tiocg, &inner_walk, walk_list) {
+	list_for_each_entry_mutable_reverse(iocg, &inner_walk, walk_list) {
 		iocg_flush_stat_upward(iocg);
 		list_del_init(&iocg->walk_list);
 	}
@@ -1848,7 +1848,7 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now)
 {
 	LIST_HEAD(over_hwa);
 	LIST_HEAD(inner_walk);
-	struct ioc_gq *iocg, *tiocg, *root_iocg;
+	struct ioc_gq *iocg, *root_iocg;
 	u32 after_sum, over_sum, over_target, gamma;
 
 	/*
@@ -1884,7 +1884,7 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now)
 		over_target = 0;
 	}
 
-	list_for_each_entry_safe(iocg, tiocg, &over_hwa, walk_list) {
+	list_for_each_entry_mutable(iocg, &over_hwa, walk_list) {
 		if (over_target)
 			iocg->hweight_after_donation =
 				div_u64((u64)iocg->hweight_after_donation *
@@ -2055,7 +2055,7 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now)
 	}
 
 	/* walk list should be dissolved after use */
-	list_for_each_entry_safe(iocg, tiocg, &inner_walk, walk_list)
+	list_for_each_entry_mutable(iocg, &inner_walk, walk_list)
 		list_del_init(&iocg->walk_list);
 }
 
@@ -2166,9 +2166,9 @@ static void ioc_forgive_debts(struct ioc *ioc, u64 usage_us_sum, int nr_debtors,
 static int ioc_check_iocgs(struct ioc *ioc, struct ioc_now *now)
 {
 	int nr_debtors = 0;
-	struct ioc_gq *iocg, *tiocg;
+	struct ioc_gq *iocg;
 
-	list_for_each_entry_safe(iocg, tiocg, &ioc->active_iocgs, active_list) {
+	list_for_each_entry_mutable(iocg, &ioc->active_iocgs, active_list) {
 		if (!waitqueue_active(&iocg->waitq) && !iocg->abs_vdebt &&
 		    !iocg->delay && !iocg_is_idle(iocg))
 			continue;
@@ -2234,7 +2234,7 @@ static int ioc_check_iocgs(struct ioc *ioc, struct ioc_now *now)
 static void ioc_timer_fn(struct timer_list *timer)
 {
 	struct ioc *ioc = container_of(timer, struct ioc, timer);
-	struct ioc_gq *iocg, *tiocg;
+	struct ioc_gq *iocg;
 	struct ioc_now now;
 	LIST_HEAD(surpluses);
 	int nr_debtors, nr_shortages = 0, nr_lagging = 0;
@@ -2378,7 +2378,7 @@ static void ioc_timer_fn(struct timer_list *timer)
 	commit_weights(ioc);
 
 	/* surplus list should be dissolved after use */
-	list_for_each_entry_safe(iocg, tiocg, &surpluses, surplus_list)
+	list_for_each_entry_mutable(iocg, &surpluses, surplus_list)
 		list_del_init(&iocg->surplus_list);
 
 	/*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 88cb5acc4f39..2daed45ad4e7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1216,9 +1216,9 @@ EXPORT_SYMBOL_GPL(blk_mq_end_request_batch);
 static void blk_complete_reqs(struct llist_head *list)
 {
 	struct llist_node *entry = llist_reverse_order(llist_del_all(list));
-	struct request *rq, *next;
+	struct request *rq;
 
-	llist_for_each_entry_safe(rq, next, entry, ipi_list)
+	llist_for_each_entry_mutable(rq, entry, ipi_list)
 		rq->q->mq_ops->complete(rq);
 }
 
@@ -4383,14 +4383,14 @@ static int blk_mq_alloc_ctxs(struct request_queue *q)
  */
 void blk_mq_release(struct request_queue *q)
 {
-	struct blk_mq_hw_ctx *hctx, *next;
+	struct blk_mq_hw_ctx *hctx;
 	unsigned long i;
 
 	queue_for_each_hw_ctx(q, hctx, i)
 		WARN_ON_ONCE(hctx && list_empty(&hctx->hctx_list));
 
 	/* all hctx are in .unused_hctx_list now */
-	list_for_each_entry_safe(hctx, next, &q->unused_hctx_list, hctx_list) {
+	list_for_each_entry_mutable(hctx, &q->unused_hctx_list, hctx_list) {
 		list_del_init(&hctx->hctx_list);
 		kobject_put(&hctx->kobj);
 	}
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 47052ba21d1b..1dd4901f00f3 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1686,10 +1686,10 @@ static void tg_cancel_writeback_bios(struct throtl_grp *tg,
 	tg->flags |= THROTL_TG_CANCELING;
 
 	for (rw = READ; rw <= WRITE; rw++) {
-		struct throtl_qnode *qn, *tmp;
+		struct throtl_qnode *qn;
 		unsigned int nr_bios = 0;
 
-		list_for_each_entry_safe(qn, tmp, &sq->queued[rw], node) {
+		list_for_each_entry_mutable(qn, &sq->queued[rw], node) {
 			struct bio *bio;
 
 			while ((bio = bio_list_pop(&qn->bios_iops))) {
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 971818bcdc9d..1a509666b861 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -578,9 +578,9 @@ static void kyber_insert_requests(struct blk_mq_hw_ctx *hctx,
 				  blk_insert_t flags)
 {
 	struct kyber_hctx_data *khd = hctx->sched_data;
-	struct request *rq, *next;
+	struct request *rq;
 
-	list_for_each_entry_safe(rq, next, rq_list, queuelist) {
+	list_for_each_entry_mutable(rq, rq_list, queuelist) {
 		unsigned int sched_domain = kyber_sched_domain(rq->cmd_flags);
 		struct kyber_ctx_queue *kcq = &khd->kcqs[rq->mq_ctx->index_hw[hctx->type]];
 		struct list_head *head = &kcq->rq_list[sched_domain];
diff --git a/block/partitions/ldm.c b/block/partitions/ldm.c
index c0bdcae58a3e..459f72f2148a 100644
--- a/block/partitions/ldm.c
+++ b/block/partitions/ldm.c
@@ -1285,11 +1285,11 @@ static bool ldm_frag_add (const u8 *data, int size, struct list_head *frags)
  */
 static void ldm_frag_free (struct list_head *list)
 {
-	struct list_head *item, *tmp;
+	struct list_head *item;
 
 	BUG_ON (!list);
 
-	list_for_each_safe (item, tmp, list)
+	list_for_each_mutable(item, list)
 		kfree (list_entry (item, struct frag, list));
 }
 
@@ -1400,11 +1400,11 @@ static bool ldm_get_vblks(struct parsed_partitions *state, unsigned long base,
  */
 static void ldm_free_vblks (struct list_head *lh)
 {
-	struct list_head *item, *tmp;
+	struct list_head *item;
 
 	BUG_ON (!lh);
 
-	list_for_each_safe (item, tmp, lh)
+	list_for_each_mutable(item, lh)
 		kfree (list_entry (item, struct vblk, list));
 }
 
diff --git a/block/sed-opal.c b/block/sed-opal.c
index 79b290d9458a..5bf9ebce8452 100644
--- a/block/sed-opal.c
+++ b/block/sed-opal.c
@@ -2573,10 +2573,10 @@ static int check_opal_support(struct opal_dev *dev)
 static void clean_opal_dev(struct opal_dev *dev)
 {
 
-	struct opal_suspend_data *suspend, *next;
+	struct opal_suspend_data *suspend;
 
 	mutex_lock(&dev->dev_lock);
-	list_for_each_entry_safe(suspend, next, &dev->unlk_lst, node) {
+	list_for_each_entry_mutable(suspend, &dev->unlk_lst, node) {
 		list_del(&suspend->node);
 		kfree(suspend);
 	}
-- 
2.43.0


^ permalink raw reply related

* [PATCH v3 4/7] block: Use mutable list iterators
From: Kaitao Cheng @ 2026-06-22  4:19 UTC (permalink / raw)
  To: Yu Kuai, Jens Axboe, Tejun Heo, Josef Bacik,
	Richard Russon (FlatCap), Jonathan Derrick
  Cc: linux-block, linux-kernel, cgroups, linux-ntfs-dev, Kaitao Cheng

From: Kaitao Cheng <chengkaitao@kylinos.cn>

The safe list iterators require callers to provide a temporary cursor
even when the cursor is only used by the iterator itself.  The mutable
iterator variants keep the same removal-safe traversal semantics while
allowing those internal cursors to be hidden from the call sites.

Convert block users of list, hlist and llist safe iterators to the new
mutable helpers.  Drop the now-unused temporary cursor variables where
the loop body does not inspect or reset them.

This is a mechanical cleanup with no intended change in traversal order
or list mutation behavior.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 block/bfq-iosched.c    | 17 +++++++----------
 block/blk-cgroup.c     | 12 ++++++------
 block/blk-flush.c      |  4 ++--
 block/blk-iocost.c     | 18 +++++++++---------
 block/blk-mq.c         |  8 ++++----
 block/blk-throttle.c   |  4 ++--
 block/kyber-iosched.c  |  4 ++--
 block/partitions/ldm.c |  8 ++++----
 block/sed-opal.c       |  4 ++--
 9 files changed, 38 insertions(+), 41 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 141c602d5e85..78e32ba0553d 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1209,9 +1209,8 @@ static int bfqq_process_refs(struct bfq_queue *bfqq)
 static void bfq_reset_burst_list(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 {
 	struct bfq_queue *item;
-	struct hlist_node *n;
 
-	hlist_for_each_entry_safe(item, n, &bfqd->burst_list, burst_list_node)
+	hlist_for_each_entry_mutable(item, &bfqd->burst_list, burst_list_node)
 		hlist_del_init(&item->burst_list_node);
 
 	/*
@@ -1236,7 +1235,6 @@ static void bfq_add_to_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 
 	if (bfqd->burst_size == bfqd->bfq_large_burst_thresh) {
 		struct bfq_queue *pos, *bfqq_item;
-		struct hlist_node *n;
 
 		/*
 		 * Enough queues have been activated shortly after each
@@ -1260,8 +1258,8 @@ static void bfq_add_to_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 		 * belonging to a large burst. So the burst list is not
 		 * needed any more. Remove it.
 		 */
-		hlist_for_each_entry_safe(pos, n, &bfqd->burst_list,
-					  burst_list_node)
+		hlist_for_each_entry_mutable(pos, &bfqd->burst_list,
+					     burst_list_node)
 			hlist_del_init(&pos->burst_list_node);
 	} else /*
 		* Burst not yet large: add bfqq to the burst list. Do
@@ -5330,7 +5328,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 void bfq_put_queue(struct bfq_queue *bfqq)
 {
 	struct bfq_queue *item;
-	struct hlist_node *n;
 	struct bfq_group *bfqg = bfqq_group(bfqq);
 
 	bfq_log_bfqq(bfqq->bfqd, bfqq, "put_queue: %p %d", bfqq, bfqq->ref);
@@ -5391,8 +5388,8 @@ void bfq_put_queue(struct bfq_queue *bfqq)
 		hlist_del_init(&bfqq->woken_list_node);
 
 	/* reset waker for all queues in woken list */
-	hlist_for_each_entry_safe(item, n, &bfqq->woken_list,
-				  woken_list_node) {
+	hlist_for_each_entry_mutable(item, &bfqq->woken_list,
+				     woken_list_node) {
 		item->waker_bfqq = NULL;
 		hlist_del_init(&item->woken_list_node);
 	}
@@ -7141,13 +7138,13 @@ static void bfq_depth_updated(struct request_queue *q)
 static void bfq_exit_queue(struct elevator_queue *e)
 {
 	struct bfq_data *bfqd = e->elevator_data;
-	struct bfq_queue *bfqq, *n;
+	struct bfq_queue *bfqq;
 	unsigned int actuator;
 
 	hrtimer_cancel(&bfqd->idle_slice_timer);
 
 	spin_lock_irq(&bfqd->lock);
-	list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
+	list_for_each_entry_mutable(bfqq, &bfqd->idle_list, bfqq_list)
 		bfq_deactivate_bfqq(bfqd, bfqq, false, false);
 	spin_unlock_irq(&bfqd->lock);
 
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 3093c1c03902..ced900019f2e 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -997,7 +997,7 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
 {
 	struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
 	struct llist_node *lnode;
-	struct blkg_iostat_set *bisc, *next_bisc;
+	struct blkg_iostat_set *bisc;
 	unsigned long flags;
 
 	rcu_read_lock();
@@ -1017,17 +1017,17 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
 	/*
 	 * Iterate only the iostat_cpu's queued in the lockless list.
 	 */
-	llist_for_each_entry_safe(bisc, next_bisc, lnode, lnode) {
+	llist_for_each_entry_mutable(bisc, lnode, lnode) {
 		struct blkcg_gq *blkg = bisc->blkg;
 		struct blkcg_gq *parent = blkg->parent;
 		struct blkg_iostat cur;
 		unsigned int seq;
 
 		/*
-		 * Order assignment of `next_bisc` from `bisc->lnode.next` in
-		 * llist_for_each_entry_safe and clearing `bisc->lqueued` for
-		 * avoiding to assign `next_bisc` with new next pointer added
-		 * in blk_cgroup_bio_start() in case of re-ordering.
+		 * Order the iterator's internal `bisc->lnode.next` load before
+		 * clearing `bisc->lqueued`, so the iterator can't pick up a new
+		 * next pointer added in blk_cgroup_bio_start() in case of
+		 * re-ordering.
 		 *
 		 * The pair barrier is implied in llist_add() in blk_cgroup_bio_start().
 		 */
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 403a46c86411..20654c2103f2 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -204,7 +204,7 @@ static enum rq_end_io_ret flush_end_io(struct request *flush_rq,
 {
 	struct request_queue *q = flush_rq->q;
 	struct list_head *running;
-	struct request *rq, *n;
+	struct request *rq;
 	unsigned long flags = 0;
 	struct blk_flush_queue *fq = blk_get_flush_queue(flush_rq->mq_ctx);
 
@@ -243,7 +243,7 @@ static enum rq_end_io_ret flush_end_io(struct request *flush_rq,
 	fq->flush_running_idx ^= 1;
 
 	/* and push the waiting requests to the next stage */
-	list_for_each_entry_safe(rq, n, running, queuelist) {
+	list_for_each_entry_mutable(rq, running, queuelist) {
 		unsigned int seq = blk_flush_cur_seq(rq);
 
 		BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH);
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 563cc7dcf348..2ca18e52bc13 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1718,7 +1718,7 @@ static void iocg_flush_stat_leaf(struct ioc_gq *iocg, struct ioc_now *now)
 static void iocg_flush_stat(struct list_head *target_iocgs, struct ioc_now *now)
 {
 	LIST_HEAD(inner_walk);
-	struct ioc_gq *iocg, *tiocg;
+	struct ioc_gq *iocg;
 
 	/* flush leaves and build inner node walk list */
 	list_for_each_entry(iocg, target_iocgs, active_list) {
@@ -1727,7 +1727,7 @@ static void iocg_flush_stat(struct list_head *target_iocgs, struct ioc_now *now)
 	}
 
 	/* keep flushing upwards by walking the inner list backwards */
-	list_for_each_entry_safe_reverse(iocg, tiocg, &inner_walk, walk_list) {
+	list_for_each_entry_mutable_reverse(iocg, &inner_walk, walk_list) {
 		iocg_flush_stat_upward(iocg);
 		list_del_init(&iocg->walk_list);
 	}
@@ -1848,7 +1848,7 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now)
 {
 	LIST_HEAD(over_hwa);
 	LIST_HEAD(inner_walk);
-	struct ioc_gq *iocg, *tiocg, *root_iocg;
+	struct ioc_gq *iocg, *root_iocg;
 	u32 after_sum, over_sum, over_target, gamma;
 
 	/*
@@ -1884,7 +1884,7 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now)
 		over_target = 0;
 	}
 
-	list_for_each_entry_safe(iocg, tiocg, &over_hwa, walk_list) {
+	list_for_each_entry_mutable(iocg, &over_hwa, walk_list) {
 		if (over_target)
 			iocg->hweight_after_donation =
 				div_u64((u64)iocg->hweight_after_donation *
@@ -2055,7 +2055,7 @@ static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now)
 	}
 
 	/* walk list should be dissolved after use */
-	list_for_each_entry_safe(iocg, tiocg, &inner_walk, walk_list)
+	list_for_each_entry_mutable(iocg, &inner_walk, walk_list)
 		list_del_init(&iocg->walk_list);
 }
 
@@ -2166,9 +2166,9 @@ static void ioc_forgive_debts(struct ioc *ioc, u64 usage_us_sum, int nr_debtors,
 static int ioc_check_iocgs(struct ioc *ioc, struct ioc_now *now)
 {
 	int nr_debtors = 0;
-	struct ioc_gq *iocg, *tiocg;
+	struct ioc_gq *iocg;
 
-	list_for_each_entry_safe(iocg, tiocg, &ioc->active_iocgs, active_list) {
+	list_for_each_entry_mutable(iocg, &ioc->active_iocgs, active_list) {
 		if (!waitqueue_active(&iocg->waitq) && !iocg->abs_vdebt &&
 		    !iocg->delay && !iocg_is_idle(iocg))
 			continue;
@@ -2234,7 +2234,7 @@ static int ioc_check_iocgs(struct ioc *ioc, struct ioc_now *now)
 static void ioc_timer_fn(struct timer_list *timer)
 {
 	struct ioc *ioc = container_of(timer, struct ioc, timer);
-	struct ioc_gq *iocg, *tiocg;
+	struct ioc_gq *iocg;
 	struct ioc_now now;
 	LIST_HEAD(surpluses);
 	int nr_debtors, nr_shortages = 0, nr_lagging = 0;
@@ -2378,7 +2378,7 @@ static void ioc_timer_fn(struct timer_list *timer)
 	commit_weights(ioc);
 
 	/* surplus list should be dissolved after use */
-	list_for_each_entry_safe(iocg, tiocg, &surpluses, surplus_list)
+	list_for_each_entry_mutable(iocg, &surpluses, surplus_list)
 		list_del_init(&iocg->surplus_list);
 
 	/*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 88cb5acc4f39..2daed45ad4e7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1216,9 +1216,9 @@ EXPORT_SYMBOL_GPL(blk_mq_end_request_batch);
 static void blk_complete_reqs(struct llist_head *list)
 {
 	struct llist_node *entry = llist_reverse_order(llist_del_all(list));
-	struct request *rq, *next;
+	struct request *rq;
 
-	llist_for_each_entry_safe(rq, next, entry, ipi_list)
+	llist_for_each_entry_mutable(rq, entry, ipi_list)
 		rq->q->mq_ops->complete(rq);
 }
 
@@ -4383,14 +4383,14 @@ static int blk_mq_alloc_ctxs(struct request_queue *q)
  */
 void blk_mq_release(struct request_queue *q)
 {
-	struct blk_mq_hw_ctx *hctx, *next;
+	struct blk_mq_hw_ctx *hctx;
 	unsigned long i;
 
 	queue_for_each_hw_ctx(q, hctx, i)
 		WARN_ON_ONCE(hctx && list_empty(&hctx->hctx_list));
 
 	/* all hctx are in .unused_hctx_list now */
-	list_for_each_entry_safe(hctx, next, &q->unused_hctx_list, hctx_list) {
+	list_for_each_entry_mutable(hctx, &q->unused_hctx_list, hctx_list) {
 		list_del_init(&hctx->hctx_list);
 		kobject_put(&hctx->kobj);
 	}
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 47052ba21d1b..1dd4901f00f3 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1686,10 +1686,10 @@ static void tg_cancel_writeback_bios(struct throtl_grp *tg,
 	tg->flags |= THROTL_TG_CANCELING;
 
 	for (rw = READ; rw <= WRITE; rw++) {
-		struct throtl_qnode *qn, *tmp;
+		struct throtl_qnode *qn;
 		unsigned int nr_bios = 0;
 
-		list_for_each_entry_safe(qn, tmp, &sq->queued[rw], node) {
+		list_for_each_entry_mutable(qn, &sq->queued[rw], node) {
 			struct bio *bio;
 
 			while ((bio = bio_list_pop(&qn->bios_iops))) {
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 971818bcdc9d..1a509666b861 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -578,9 +578,9 @@ static void kyber_insert_requests(struct blk_mq_hw_ctx *hctx,
 				  blk_insert_t flags)
 {
 	struct kyber_hctx_data *khd = hctx->sched_data;
-	struct request *rq, *next;
+	struct request *rq;
 
-	list_for_each_entry_safe(rq, next, rq_list, queuelist) {
+	list_for_each_entry_mutable(rq, rq_list, queuelist) {
 		unsigned int sched_domain = kyber_sched_domain(rq->cmd_flags);
 		struct kyber_ctx_queue *kcq = &khd->kcqs[rq->mq_ctx->index_hw[hctx->type]];
 		struct list_head *head = &kcq->rq_list[sched_domain];
diff --git a/block/partitions/ldm.c b/block/partitions/ldm.c
index c0bdcae58a3e..459f72f2148a 100644
--- a/block/partitions/ldm.c
+++ b/block/partitions/ldm.c
@@ -1285,11 +1285,11 @@ static bool ldm_frag_add (const u8 *data, int size, struct list_head *frags)
  */
 static void ldm_frag_free (struct list_head *list)
 {
-	struct list_head *item, *tmp;
+	struct list_head *item;
 
 	BUG_ON (!list);
 
-	list_for_each_safe (item, tmp, list)
+	list_for_each_mutable(item, list)
 		kfree (list_entry (item, struct frag, list));
 }
 
@@ -1400,11 +1400,11 @@ static bool ldm_get_vblks(struct parsed_partitions *state, unsigned long base,
  */
 static void ldm_free_vblks (struct list_head *lh)
 {
-	struct list_head *item, *tmp;
+	struct list_head *item;
 
 	BUG_ON (!lh);
 
-	list_for_each_safe (item, tmp, lh)
+	list_for_each_mutable(item, lh)
 		kfree (list_entry (item, struct vblk, list));
 }
 
diff --git a/block/sed-opal.c b/block/sed-opal.c
index 79b290d9458a..5bf9ebce8452 100644
--- a/block/sed-opal.c
+++ b/block/sed-opal.c
@@ -2573,10 +2573,10 @@ static int check_opal_support(struct opal_dev *dev)
 static void clean_opal_dev(struct opal_dev *dev)
 {
 
-	struct opal_suspend_data *suspend, *next;
+	struct opal_suspend_data *suspend;
 
 	mutex_lock(&dev->dev_lock);
-	list_for_each_entry_safe(suspend, next, &dev->unlk_lst, node) {
+	list_for_each_entry_mutable(suspend, &dev->unlk_lst, node) {
 		list_del(&suspend->node);
 		kfree(suspend);
 	}
-- 
2.43.0


^ permalink raw reply related

* [PATCH v3 2/7] llist: Add mutable iterator variants
From: Kaitao Cheng @ 2026-06-22  4:05 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
	Alexander Viro, Christian Brauner, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
	Andy Shevchenko, Paul E. McKenney, Shakeel Butt,
	Christian König
  Cc: David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
	Philipp Stanner, linux-block, linux-kernel, cgroups,
	linux-ntfs-dev, linux-fsdevel, io-uring, audit, bpf, netdev,
	dri-devel, linux-perf-users, linux-trace-kernel, kexec,
	live-patching, linux-modules, linux-crypto, linux-pm, rcu,
	sched-ext, linux-mm, virtualization, damon, llvm, Kaitao Cheng
In-Reply-To: <20260622040533.29824-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

llist_for_each_safe() and llist_for_each_entry_safe() require callers to
provide a temporary cursor even when the cursor is only needed by the
iterator itself.  This makes call sites noisier than necessary for the
common case where the loop body may remove the current entry but does
not otherwise inspect the saved next pointer.

Add llist_for_each_mutable() and llist_for_each_entry_mutable() variants
that support both forms.  Callers may omit the temporary cursor and let
the helper create an internal unique cursor, or keep passing an explicit
cursor when the loop needs to inspect or reset it.

Keep the existing safe helpers as compatibility wrappers so current users
continue to build unchanged while new code can use the shorter mutable
form.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 include/linux/llist.h | 81 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 65 insertions(+), 16 deletions(-)

diff --git a/include/linux/llist.h b/include/linux/llist.h
index 8846b7709669..1c6f12411d5e 100644
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -49,6 +49,7 @@
  */
 
 #include <linux/atomic.h>
+#include <linux/args.h>
 #include <linux/container_of.h>
 #include <linux/stddef.h>
 #include <linux/types.h>
@@ -143,12 +144,33 @@ static inline bool llist_on_list(const struct llist_node *node)
 #define llist_for_each(pos, node)			\
 	for ((pos) = (node); pos; (pos) = (pos)->next)
 
+/*
+ * llist_for_each_safe is an old interface, use llist_for_each_mutable instead.
+ */
+#define llist_for_each_safe(pos, n, node)			\
+	for ((pos) = (node); (pos) && ((n) = (pos)->next, true); (pos) = (n))
+
+#define __llist_for_each_mutable_internal(pos, tmp, node)		\
+	for (typeof(pos) tmp = ((pos) = (node)) ? (pos)->next : NULL;	\
+	     (pos);							\
+	     (pos) = tmp, tmp = (pos) ? (pos)->next : NULL)
+
+#define __llist_for_each_mutable1(pos, node)				\
+	__llist_for_each_mutable_internal(pos, __UNIQUE_ID(next), node)
+
+#define __llist_for_each_mutable2(pos, next, node)			\
+	llist_for_each_safe(pos, next, node)
+
 /**
- * llist_for_each_safe - iterate over some deleted entries of a lock-less list
- *			 safe against removal of list entry
+ * llist_for_each_mutable - iterate over some deleted entries of a lock-less list
+ *			    safe against removal of list entry
  * @pos:	the &struct llist_node to use as a loop cursor
- * @n:		another &struct llist_node to use as temporary storage
- * @node:	the first entry of deleted list entries
+ * @...:	either (node) or (next, node)
+ *
+ * next:	another &struct llist_node to use as optional temporary storage.
+ *		The temporary cursor is internal unless explicitly supplied by
+ *		the caller.
+ * node:	the first entry of deleted list entries
  *
  * In general, some entries of the lock-less list can be traversed
  * safely only after being deleted from list, so start with an entry
@@ -159,8 +181,9 @@ static inline bool llist_on_list(const struct llist_node *node)
  * you want to traverse from the oldest to the newest, you must
  * reverse the order by yourself before traversing.
  */
-#define llist_for_each_safe(pos, n, node)			\
-	for ((pos) = (node); (pos) && ((n) = (pos)->next, true); (pos) = (n))
+#define llist_for_each_mutable(pos, ...)				\
+	CONCATENATE(__llist_for_each_mutable, COUNT_ARGS(__VA_ARGS__))	\
+		(pos, __VA_ARGS__)
 
 /**
  * llist_for_each_entry - iterate over some deleted entries of lock-less list of given type
@@ -182,13 +205,41 @@ static inline bool llist_on_list(const struct llist_node *node)
 	     member_address_is_nonnull(pos, member);			\
 	     (pos) = llist_entry((pos)->member.next, typeof(*(pos)), member))
 
+/*
+ * llist_for_each_entry_safe is an old interface, use llist_for_each_entry_mutable instead.
+ */
+#define llist_for_each_entry_safe(pos, n, node, member)			       \
+	for (pos = llist_entry((node), typeof(*pos), member);		       \
+	     member_address_is_nonnull(pos, member) &&			       \
+	        (n = llist_entry(pos->member.next, typeof(*n), member), true); \
+	     pos = n)
+
+#define __llist_for_each_entry_mutable_internal(pos, tmp, node, member)	\
+	for (typeof(pos) tmp = ((pos) = llist_entry((node), typeof(*pos), member), \
+		member_address_is_nonnull(pos, member) ?			\
+		llist_entry((pos)->member.next, typeof(*pos), member) : NULL);	\
+	     member_address_is_nonnull(pos, member);				\
+	     (pos) = tmp, tmp = member_address_is_nonnull(pos, member) ?	\
+		llist_entry((pos)->member.next, typeof(*pos), member) : NULL)
+
+#define __llist_for_each_entry_mutable2(pos, node, member)			\
+	__llist_for_each_entry_mutable_internal(pos, __UNIQUE_ID(next), node, member)
+
+#define __llist_for_each_entry_mutable3(pos, next, node, member)		\
+	llist_for_each_entry_safe(pos, next, node, member)
+
 /**
- * llist_for_each_entry_safe - iterate over some deleted entries of lock-less list of given type
- *			       safe against removal of list entry
+ * llist_for_each_entry_mutable - iterate over some deleted entries of
+ *				  lock-less list of given type safe against
+ *				  removal of list entry
  * @pos:	the type * to use as a loop cursor.
- * @n:		another type * to use as temporary storage
- * @node:	the first entry of deleted list entries.
- * @member:	the name of the llist_node with the struct.
+ * @...:	either (node, member) or (next, node, member)
+ *
+ * next:	another type * to use as optional temporary storage. The
+ *		temporary cursor is internal unless explicitly supplied by the
+ *		caller.
+ * node:	the first entry of deleted list entries.
+ * member:	the name of the llist_node with the struct.
  *
  * In general, some entries of the lock-less list can be traversed
  * safely only after being removed from list, so start with an entry
@@ -199,11 +250,9 @@ static inline bool llist_on_list(const struct llist_node *node)
  * you want to traverse from the oldest to the newest, you must
  * reverse the order by yourself before traversing.
  */
-#define llist_for_each_entry_safe(pos, n, node, member)			       \
-	for (pos = llist_entry((node), typeof(*pos), member);		       \
-	     member_address_is_nonnull(pos, member) &&			       \
-	        (n = llist_entry(pos->member.next, typeof(*n), member), true); \
-	     pos = n)
+#define llist_for_each_entry_mutable(pos, ...)				\
+	CONCATENATE(__llist_for_each_entry_mutable,			\
+		COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
 
 /**
  * llist_empty - tests whether a lock-less list is empty
-- 
2.43.0


^ permalink raw reply related

* [PATCH v3 1/7] list: Add mutable iterator variants
From: Kaitao Cheng @ 2026-06-22  4:05 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
	Alexander Viro, Christian Brauner, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
	Andy Shevchenko, Paul E. McKenney, Shakeel Butt,
	Christian König
  Cc: David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
	Philipp Stanner, linux-block, linux-kernel, cgroups,
	linux-ntfs-dev, linux-fsdevel, io-uring, audit, bpf, netdev,
	dri-devel, linux-perf-users, linux-trace-kernel, kexec,
	live-patching, linux-modules, linux-crypto, linux-pm, rcu,
	sched-ext, linux-mm, virtualization, damon, llvm, Kaitao Cheng
In-Reply-To: <20260622040533.29824-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

The list_for_each*_safe() helpers are used when the loop body may
remove the current entry.  Their API exposes the temporary cursor at
every call site, even though most users only need it for the iterator
implementation and never reference it in the loop body.

Add *_mutable() variants for list and hlist iteration.  The new helpers
support both forms: callers may keep passing an explicit temporary cursor
when they need to inspect or reset it, or omit it and let the helper use
a unique internal cursor.

This makes call sites that only mutate the list through the current entry
less noisy, while keeping the existing *_safe() helpers available for
compatibility.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 231 insertions(+), 38 deletions(-)

diff --git a/include/linux/list.h b/include/linux/list.h
index 09d979976b3b..1081def7cea9 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -7,6 +7,7 @@
 #include <linux/stddef.h>
 #include <linux/poison.h>
 #include <linux/const.h>
+#include <linux/args.h>
 
 #include <asm/barrier.h>
 
@@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
 #define list_for_each_prev(pos, head) \
 	for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
 
-/**
- * list_for_each_safe - iterate over a list safe against removal of list entry
- * @pos:	the &struct list_head to use as a loop cursor.
- * @n:		another &struct list_head to use as temporary storage
- * @head:	the head for your list.
+/*
+ * list_for_each_safe is an old interface, use list_for_each_mutable instead.
  */
 #define list_for_each_safe(pos, n, head) \
 	for (pos = (head)->next, n = pos->next; \
 	     !list_is_head(pos, (head)); \
 	     pos = n, n = pos->next)
 
+#define __list_for_each_mutable_internal(pos, tmp, head)		\
+	for (typeof(pos) tmp = (pos = (head)->next)->next;		\
+	     !list_is_head(pos, (head));				\
+	     pos = tmp, tmp = pos->next)
+
+#define __list_for_each_mutable1(pos, head)				\
+	__list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
+
+#define __list_for_each_mutable2(pos, next, head)			\
+	list_for_each_safe(pos, next, head)
+
 /**
- * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
+ * list_for_each_mutable - iterate over a list safe against entry removal
  * @pos:	the &struct list_head to use as a loop cursor.
- * @n:		another &struct list_head to use as temporary storage
- * @head:	the head for your list.
+ * @...:	either (head) or (next, head)
+ *
+ * next:	another &struct list_head to use as optional temporary storage.
+ *		The temporary cursor is internal unless explicitly supplied by
+ *		the caller.
+ * head:	the head for your list.
+ */
+#define list_for_each_mutable(pos, ...)					\
+	CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__))	\
+		(pos, __VA_ARGS__)
+
+/*
+ * list_for_each_prev_safe is an old interface, use list_for_each_prev_mutable instead.
  */
 #define list_for_each_prev_safe(pos, n, head) \
 	for (pos = (head)->prev, n = pos->prev; \
 	     !list_is_head(pos, (head)); \
 	     pos = n, n = pos->prev)
 
+#define __list_for_each_prev_mutable_internal(pos, tmp, head)		\
+	for (typeof(pos) tmp = (pos = (head)->prev)->prev;		\
+	     !list_is_head(pos, (head));				\
+	     pos = tmp, tmp = pos->prev)
+
+#define __list_for_each_prev_mutable1(pos, head)			\
+	__list_for_each_prev_mutable_internal(pos, __UNIQUE_ID(prev), head)
+
+#define __list_for_each_prev_mutable2(pos, prev, head)			\
+	list_for_each_prev_safe(pos, prev, head)
+
+/**
+ * list_for_each_prev_mutable - iterate over a list backwards safe against entry removal
+ * @pos:	the &struct list_head to use as a loop cursor.
+ * @...:	either (head) or (prev, head)
+ *
+ * prev:	another &struct list_head to use as optional temporary storage.
+ *		The temporary cursor is internal unless explicitly supplied by
+ *		the caller.
+ * head:	the head for your list.
+ */
+#define list_for_each_prev_mutable(pos, ...)				\
+	CONCATENATE(__list_for_each_prev_mutable, COUNT_ARGS(__VA_ARGS__)) \
+		(pos, __VA_ARGS__)
+
 /**
  * list_count_nodes - count nodes in the list
  * @head:	the head for your list.
@@ -895,12 +940,8 @@ static inline size_t list_count_nodes(struct list_head *head)
 	for (; !list_entry_is_head(pos, head, member);			\
 	     pos = list_prev_entry(pos, member))
 
-/**
- * list_for_each_entry_safe - iterate over list of given type safe against removal of list entry
- * @pos:	the type * to use as a loop cursor.
- * @n:		another type * to use as temporary storage
- * @head:	the head for your list.
- * @member:	the name of the list_head within the struct.
+/*
+ * list_for_each_entry_safe is an old interface, use list_for_each_entry_mutable instead.
  */
 #define list_for_each_entry_safe(pos, n, head, member)			\
 	for (pos = list_first_entry(head, typeof(*pos), member),	\
@@ -908,15 +949,36 @@ static inline size_t list_count_nodes(struct list_head *head)
 	     !list_entry_is_head(pos, head, member); 			\
 	     pos = n, n = list_next_entry(n, member))
 
+#define __list_for_each_entry_mutable_internal(pos, tmp, head, member)	\
+	for (typeof(pos) tmp = list_next_entry(pos =			\
+		list_first_entry(head, typeof(*pos), member), member);	\
+	     !list_entry_is_head(pos, head, member);			\
+	     pos = tmp, tmp = list_next_entry(tmp, member))
+
+#define __list_for_each_entry_mutable2(pos, head, member)		\
+	__list_for_each_entry_mutable_internal(pos, __UNIQUE_ID(next), head, member)
+
+#define __list_for_each_entry_mutable3(pos, next, head, member)		\
+	list_for_each_entry_safe(pos, next, head, member)
+
 /**
- * list_for_each_entry_safe_continue - continue list iteration safe against removal
+ * list_for_each_entry_mutable - iterate over a list safe against entry removal
  * @pos:	the type * to use as a loop cursor.
- * @n:		another type * to use as temporary storage
- * @head:	the head for your list.
- * @member:	the name of the list_head within the struct.
+ * @...:	either (head, member) or (next, head, member)
  *
- * Iterate over list of given type, continuing after current point,
- * safe against removal of list entry.
+ * next:	another type * to use as optional temporary storage. The
+ *		temporary cursor is internal unless explicitly supplied by the
+ *		caller.
+ * head:	the head for your list.
+ * member:	the name of the list_head within the struct.
+ */
+#define list_for_each_entry_mutable(pos, ...)				\
+	CONCATENATE(__list_for_each_entry_mutable, COUNT_ARGS(__VA_ARGS__)) \
+		(pos, __VA_ARGS__)
+
+/*
+ * list_for_each_entry_safe_continue is an old interface,
+ * use list_for_each_entry_mutable_continue instead.
  */
 #define list_for_each_entry_safe_continue(pos, n, head, member) 		\
 	for (pos = list_next_entry(pos, member), 				\
@@ -924,30 +986,79 @@ static inline size_t list_count_nodes(struct list_head *head)
 	     !list_entry_is_head(pos, head, member);				\
 	     pos = n, n = list_next_entry(n, member))
 
+#define __list_for_each_entry_mutable_continue_internal(pos, tmp, head, member) \
+	for (typeof(pos) tmp = list_next_entry(pos =			\
+		list_next_entry(pos, member), member);			\
+	     !list_entry_is_head(pos, head, member);			\
+	     pos = tmp, tmp = list_next_entry(tmp, member))
+
+#define __list_for_each_entry_mutable_continue2(pos, head, member)	\
+	__list_for_each_entry_mutable_continue_internal(pos,		\
+		__UNIQUE_ID(next), head, member)
+
+#define __list_for_each_entry_mutable_continue3(pos, next, head, member) \
+	list_for_each_entry_safe_continue(pos, next, head, member)
+
 /**
- * list_for_each_entry_safe_from - iterate over list from current point safe against removal
+ * list_for_each_entry_mutable_continue - continue list iteration safe against removal
  * @pos:	the type * to use as a loop cursor.
- * @n:		another type * to use as temporary storage
- * @head:	the head for your list.
- * @member:	the name of the list_head within the struct.
+ * @...:	either (head, member) or (next, head, member)
  *
- * Iterate over list of given type from current point, safe against
- * removal of list entry.
+ * next:	another type * to use as optional temporary storage. The
+ *		temporary cursor is internal unless explicitly supplied by the
+ *		caller.
+ * head:	the head for your list.
+ * member:	the name of the list_head within the struct.
+ *
+ * Iterate over list of given type, continuing after current point,
+ * safe against removal of list entry.
+ */
+#define list_for_each_entry_mutable_continue(pos, ...)			\
+	CONCATENATE(__list_for_each_entry_mutable_continue,		\
+		COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
+
+/*
+ * list_for_each_entry_safe_from is an old interface,
+ * use list_for_each_entry_mutable_from instead.
  */
 #define list_for_each_entry_safe_from(pos, n, head, member) 			\
 	for (n = list_next_entry(pos, member);					\
 	     !list_entry_is_head(pos, head, member);				\
 	     pos = n, n = list_next_entry(n, member))
 
+#define __list_for_each_entry_mutable_from_internal(pos, tmp, head, member) \
+	for (typeof(pos) tmp = list_next_entry(pos, member);		\
+	     !list_entry_is_head(pos, head, member);			\
+	     pos = tmp, tmp = list_next_entry(tmp, member))
+
+#define __list_for_each_entry_mutable_from2(pos, head, member)		\
+	__list_for_each_entry_mutable_from_internal(pos,		\
+		__UNIQUE_ID(next), head, member)
+
+#define __list_for_each_entry_mutable_from3(pos, next, head, member)	\
+	list_for_each_entry_safe_from(pos, next, head, member)
+
 /**
- * list_for_each_entry_safe_reverse - iterate backwards over list safe against removal
+ * list_for_each_entry_mutable_from - iterate over list from current point safe against removal
  * @pos:	the type * to use as a loop cursor.
- * @n:		another type * to use as temporary storage
- * @head:	the head for your list.
- * @member:	the name of the list_head within the struct.
+ * @...:	either (head, member) or (next, head, member)
  *
- * Iterate backwards over list of given type, safe against removal
- * of list entry.
+ * next:	another type * to use as optional temporary storage. The
+ *		temporary cursor is internal unless explicitly supplied by the
+ *		caller.
+ * head:	the head for your list.
+ * member:	the name of the list_head within the struct.
+ *
+ * Iterate over list of given type from current point, safe against
+ * removal of list entry.
+ */
+#define list_for_each_entry_mutable_from(pos, ...)			\
+	CONCATENATE(__list_for_each_entry_mutable_from,			\
+		COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
+
+/*
+ * list_for_each_entry_safe_reverse is an old interface,
+ * use list_for_each_entry_mutable_reverse instead.
  */
 #define list_for_each_entry_safe_reverse(pos, n, head, member)		\
 	for (pos = list_last_entry(head, typeof(*pos), member),		\
@@ -955,6 +1066,37 @@ static inline size_t list_count_nodes(struct list_head *head)
 	     !list_entry_is_head(pos, head, member); 			\
 	     pos = n, n = list_prev_entry(n, member))
 
+#define __list_for_each_entry_mutable_reverse_internal(pos, tmp, head, member) \
+	for (typeof(pos) tmp = list_prev_entry(pos =			\
+		list_last_entry(head, typeof(*pos), member), member);	\
+	     !list_entry_is_head(pos, head, member);			\
+	     pos = tmp, tmp = list_prev_entry(tmp, member))
+
+#define __list_for_each_entry_mutable_reverse2(pos, head, member)	\
+	__list_for_each_entry_mutable_reverse_internal(pos,		\
+		__UNIQUE_ID(prev), head, member)
+
+#define __list_for_each_entry_mutable_reverse3(pos, prev, head, member)	\
+	list_for_each_entry_safe_reverse(pos, prev, head, member)
+
+/**
+ * list_for_each_entry_mutable_reverse - iterate backwards over list safe against removal
+ * @pos:	the type * to use as a loop cursor.
+ * @...:	either (head, member) or (prev, head, member)
+ *
+ * prev:	another type * to use as optional temporary storage. The
+ *		temporary cursor is internal unless explicitly supplied by the
+ *		caller.
+ * head:	the head for your list.
+ * member:	the name of the list_head within the struct.
+ *
+ * Iterate backwards over list of given type, safe against removal
+ * of list entry.
+ */
+#define list_for_each_entry_mutable_reverse(pos, ...)			\
+	CONCATENATE(__list_for_each_entry_mutable_reverse,		\
+		COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
+
 /**
  * list_safe_reset_next - reset a stale list_for_each_entry_safe loop
  * @pos:	the loop cursor used in the list_for_each_entry_safe loop
@@ -1189,6 +1331,31 @@ static inline void hlist_splice_init(struct hlist_head *from,
 	for (pos = (head)->first; pos && ({ n = pos->next; 1; }); \
 	     pos = n)
 
+#define __hlist_for_each_mutable_internal(pos, tmp, head)		\
+	for (typeof(pos) tmp = (pos = (head)->first) ? pos->next : NULL; \
+	     pos;							\
+	     pos = tmp, tmp = pos ? pos->next : NULL)
+
+#define __hlist_for_each_mutable1(pos, head)				\
+	__hlist_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
+
+#define __hlist_for_each_mutable2(pos, next, head)			\
+	hlist_for_each_safe(pos, next, head)
+
+/**
+ * hlist_for_each_mutable - iterate over a hlist safe against entry removal
+ * @pos:	the &struct hlist_node to use as a loop cursor.
+ * @...:	either (head) or (next, head)
+ *
+ * next:	another &struct hlist_node to use as optional temporary storage.
+ *		The temporary cursor is internal unless explicitly supplied by
+ *		the caller.
+ * head:	the head for your hlist.
+ */
+#define hlist_for_each_mutable(pos, ...)				\
+	CONCATENATE(__hlist_for_each_mutable, COUNT_ARGS(__VA_ARGS__))	\
+		(pos, __VA_ARGS__)
+
 #define hlist_entry_safe(ptr, type, member) \
 	({ typeof(ptr) ____ptr = (ptr); \
 	   ____ptr ? hlist_entry(____ptr, type, member) : NULL; \
@@ -1224,18 +1391,44 @@ static inline void hlist_splice_init(struct hlist_head *from,
 	for (; pos;							\
 	     pos = hlist_entry_safe((pos)->member.next, typeof(*(pos)), member))
 
-/**
- * hlist_for_each_entry_safe - iterate over list of given type safe against removal of list entry
- * @pos:	the type * to use as a loop cursor.
- * @n:		a &struct hlist_node to use as temporary storage
- * @head:	the head for your list.
- * @member:	the name of the hlist_node within the struct.
+/*
+ * hlist_for_each_entry_safe is an old interface, use hlist_for_each_entry_mutable instead.
  */
 #define hlist_for_each_entry_safe(pos, n, head, member) 		\
 	for (pos = hlist_entry_safe((head)->first, typeof(*pos), member);\
 	     pos && ({ n = pos->member.next; 1; });			\
 	     pos = hlist_entry_safe(n, typeof(*pos), member))
 
+#define __hlist_for_each_entry_mutable_internal(pos, tmp, head, member)	\
+	for (struct hlist_node *tmp = (pos =				\
+		hlist_entry_safe((head)->first, typeof(*pos), member)) ? \
+		pos->member.next : NULL;				\
+	     pos;							\
+	     pos = hlist_entry_safe((tmp), typeof(*pos), member),	\
+		tmp = pos ? pos->member.next : NULL)
+
+#define __hlist_for_each_entry_mutable2(pos, head, member)		\
+	__hlist_for_each_entry_mutable_internal(pos,			\
+		__UNIQUE_ID(next), head, member)
+
+#define __hlist_for_each_entry_mutable3(pos, next, head, member)	\
+	hlist_for_each_entry_safe(pos, next, head, member)
+
+/**
+ * hlist_for_each_entry_mutable - iterate over hlist safe against entry removal
+ * @pos:	the type * to use as a loop cursor.
+ * @...:	either (head, member) or (next, head, member)
+ *
+ * next:	a &struct hlist_node to use as optional temporary storage. The
+ *		temporary cursor is internal unless explicitly supplied by the
+ *		caller.
+ * head:	the head for your hlist.
+ * member:	the name of the hlist_node within the struct.
+ */
+#define hlist_for_each_entry_mutable(pos, ...)				\
+	CONCATENATE(__hlist_for_each_entry_mutable,			\
+		COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
+
 /**
  * hlist_count_nodes - count nodes in the hlist
  * @head:	the head for your hlist.
-- 
2.43.0


^ permalink raw reply related

* [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Kaitao Cheng @ 2026-06-22  4:05 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
	Alexander Viro, Christian Brauner, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
	Andy Shevchenko, Paul E. McKenney, Shakeel Butt,
	Christian König
  Cc: David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
	Philipp Stanner, linux-block, linux-kernel, cgroups,
	linux-ntfs-dev, linux-fsdevel, io-uring, audit, bpf, netdev,
	dri-devel, linux-perf-users, linux-trace-kernel, kexec,
	live-patching, linux-modules, linux-crypto, linux-pm, rcu,
	sched-ext, linux-mm, virtualization, damon, llvm, chengkaitao

From: chengkaitao <chengkaitao@kylinos.cn>

The list_for_each*_safe() helpers are used when the loop body may remove
the current entry.  Their current interface, however, forces every caller
to define a temporary cursor outside the macro and pass it in, even when
the caller never uses that cursor directly.  For most call sites this
extra cursor is just boilerplate required by the macro implementation.

This is awkward because the saved next pointer is an internal detail of
the iteration.  Callers that only remove or move the current entry do not
need to spell it out.

The _safe() suffix has also caused confusion.  Christian Koenig pointed
out that the name is easy to read as a thread-safe variant, especially
for beginners, even though it only means that the iterator keeps enough
state to tolerate removal of the current entry.  He suggested _mutable()
as a clearer description of what the loop permits.

Add *_mutable() iterator variants for list, hlist and llist.  The new
helpers are variadic and support both forms.  In the common case, the
caller omits the temporary cursor and the macro creates a unique internal
cursor with typeof(pos) and __UNIQUE_ID().  If a loop really needs an
explicit temporary cursor, the caller can still pass it and the helper
keeps the existing *_safe() behaviour.

For example, a call site may use the shorter form:

  list_for_each_entry_mutable(pos, head, member)

or keep the explicit temporary cursor form:

  list_for_each_entry_mutable(pos, tmp, head, member)

The existing *_safe() helpers remain available for compatibility.  This
series only converts users in mm, block, kernel, init and io_uring.  If
this approach looks acceptable, the remaining users can be converted in
follow-up series.

Changes in v3 (Christian König, Andy Shevchenko):
- Convert safe list walks to mutable iterators

Changes in v2 (Muchun Song, Andy Shevchenko):
- Drop the list_for_each_entry_mutable*() helpers from v1 and make the
  cursor change directly in the existing list_for_each_entry*() helpers.
- Open-code special list walks that rely on updating the loop cursor in
  the body, preserving their existing traversal semantics.

Link to v2:
https://lore.kernel.org/all/20260609061347.93688-1-kaitao.cheng@linux.dev/

Link to v1:
https://lore.kernel.org/all/20260529082149.76764-1-kaitao.cheng@linux.dev/

Kaitao Cheng (7):
  list: Add mutable iterator variants
  llist: Add mutable iterator variants
  mm: Use mutable list iterators
  block: Use mutable list iterators
  kernel: Use mutable list iterators
  initramfs: Use mutable list iterator
  io_uring: Use mutable list iterators

 block/bfq-iosched.c                 |  17 +-
 block/blk-cgroup.c                  |  12 +-
 block/blk-flush.c                   |   4 +-
 block/blk-iocost.c                  |  18 +-
 block/blk-mq.c                      |   8 +-
 block/blk-throttle.c                |   4 +-
 block/kyber-iosched.c               |   4 +-
 block/partitions/ldm.c              |   8 +-
 block/sed-opal.c                    |   4 +-
 include/linux/list.h                | 269 ++++++++++++++++++++++++----
 include/linux/llist.h               |  81 +++++++--
 init/initramfs.c                    |   5 +-
 io_uring/cancel.c                   |   6 +-
 io_uring/poll.c                     |   3 +-
 io_uring/rw.c                       |   4 +-
 io_uring/timeout.c                  |   8 +-
 io_uring/uring_cmd.c                |   3 +-
 kernel/audit_tree.c                 |   4 +-
 kernel/audit_watch.c                |  16 +-
 kernel/auditfilter.c                |   4 +-
 kernel/auditsc.c                    |   4 +-
 kernel/bpf/arena.c                  |  10 +-
 kernel/bpf/arraymap.c               |   8 +-
 kernel/bpf/bpf_local_storage.c      |   3 +-
 kernel/bpf/bpf_lru_list.c           |  25 ++-
 kernel/bpf/btf.c                    |  18 +-
 kernel/bpf/cgroup.c                 |   7 +-
 kernel/bpf/cpumap.c                 |   4 +-
 kernel/bpf/devmap.c                 |  10 +-
 kernel/bpf/helpers.c                |   8 +-
 kernel/bpf/local_storage.c          |   4 +-
 kernel/bpf/memalloc.c               |  16 +-
 kernel/bpf/offload.c                |   8 +-
 kernel/bpf/states.c                 |   4 +-
 kernel/bpf/stream.c                 |   4 +-
 kernel/bpf/verifier.c               |   6 +-
 kernel/cgroup/cgroup-v1.c           |   4 +-
 kernel/cgroup/cgroup.c              |  54 +++---
 kernel/cgroup/dmem.c                |  12 +-
 kernel/cgroup/rdma.c                |   8 +-
 kernel/events/core.c                |  44 +++--
 kernel/events/uprobes.c             |  12 +-
 kernel/exit.c                       |   8 +-
 kernel/fail_function.c              |   4 +-
 kernel/gcov/clang.c                 |   4 +-
 kernel/irq_work.c                   |   4 +-
 kernel/kexec_core.c                 |   4 +-
 kernel/kprobes.c                    |  16 +-
 kernel/livepatch/core.c             |   4 +-
 kernel/livepatch/core.h             |   4 +-
 kernel/liveupdate/kho_block.c       |   4 +-
 kernel/liveupdate/luo_flb.c         |   4 +-
 kernel/locking/rwsem.c              |   2 +-
 kernel/locking/test-ww_mutex.c      |   2 +-
 kernel/module/main.c                |  11 +-
 kernel/padata.c                     |   4 +-
 kernel/power/snapshot.c             |   8 +-
 kernel/power/wakelock.c             |   4 +-
 kernel/printk/printk.c              |  11 +-
 kernel/ptrace.c                     |   4 +-
 kernel/rcu/rcutorture.c             |   3 +-
 kernel/rcu/tasks.h                  |   9 +-
 kernel/rcu/tree.c                   |   6 +-
 kernel/resource.c                   |   4 +-
 kernel/sched/core.c                 |   4 +-
 kernel/sched/ext.c                  |  22 +--
 kernel/sched/fair.c                 |  28 +--
 kernel/sched/topology.c             |   4 +-
 kernel/sched/wait.c                 |   4 +-
 kernel/seccomp.c                    |   4 +-
 kernel/signal.c                     |  11 +-
 kernel/smp.c                        |   4 +-
 kernel/taskstats.c                  |   8 +-
 kernel/time/clockevents.c           |   6 +-
 kernel/time/clocksource.c           |   4 +-
 kernel/time/posix-cpu-timers.c      |   4 +-
 kernel/time/posix-timers.c          |   3 +-
 kernel/torture.c                    |   3 +-
 kernel/trace/bpf_trace.c            |   4 +-
 kernel/trace/ftrace.c               |  49 +++--
 kernel/trace/ring_buffer.c          |  25 ++-
 kernel/trace/trace.c                |  12 +-
 kernel/trace/trace_dynevent.c       |   6 +-
 kernel/trace/trace_dynevent.h       |   5 +-
 kernel/trace/trace_events.c         |  35 ++--
 kernel/trace/trace_events_filter.c  |   4 +-
 kernel/trace/trace_events_hist.c    |   8 +-
 kernel/trace/trace_events_trigger.c |  17 +-
 kernel/trace/trace_events_user.c    |  16 +-
 kernel/trace/trace_stat.c           |   4 +-
 kernel/user-return-notifier.c       |   3 +-
 kernel/workqueue.c                  |  16 +-
 mm/backing-dev.c                    |   8 +-
 mm/balloon.c                        |   8 +-
 mm/cma.c                            |   4 +-
 mm/compaction.c                     |   4 +-
 mm/damon/core.c                     |   4 +-
 mm/damon/sysfs-schemes.c            |   4 +-
 mm/dmapool.c                        |   4 +-
 mm/huge_memory.c                    |   8 +-
 mm/hugetlb.c                        |  56 +++---
 mm/hugetlb_vmemmap.c                |  16 +-
 mm/khugepaged.c                     |  14 +-
 mm/kmemleak.c                       |   7 +-
 mm/ksm.c                            |  25 +--
 mm/list_lru.c                       |   4 +-
 mm/memcontrol-v1.c                  |   8 +-
 mm/memory-failure.c                 |  12 +-
 mm/memory-tiers.c                   |   4 +-
 mm/migrate.c                        |  23 ++-
 mm/mmu_notifier.c                   |   9 +-
 mm/page_alloc.c                     |   8 +-
 mm/page_reporting.c                 |   2 +-
 mm/percpu.c                         |  11 +-
 mm/pgtable-generic.c                |   4 +-
 mm/rmap.c                           |  10 +-
 mm/shmem.c                          |   9 +-
 mm/slab_common.c                    |  14 +-
 mm/slub.c                           |  33 ++--
 mm/swapfile.c                       |   4 +-
 mm/userfaultfd.c                    |  12 +-
 mm/vmalloc.c                        |  24 +--
 mm/vmscan.c                         |   7 +-
 mm/zsmalloc.c                       |   4 +-
 124 files changed, 875 insertions(+), 681 deletions(-)

-- 
2.43.0


^ permalink raw reply

* Re: [PATCH v2 3/7] rust: doctest: add LocalModule fallback for #[vtable] ThisModule
From: Alvin Sun @ 2026-06-22  2:52 UTC (permalink / raw)
  To: Andreas Hindborg, Gary Guo, Miguel Ojeda, Boqun Feng,
	Björn Roy Baron, Benno Lossin, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block
In-Reply-To: <87fr2kf3m2.fsf@t14s.mail-host-address-is-not-set>


On 6/18/26 20:13, Andreas Hindborg wrote:
> Alvin Sun <alvin.sun@linux.dev> writes:
>
>> Add a `LocalModule` struct with a null-pointer `ModuleMetadata` impl
>> in the doctest harness, so that `crate::LocalModule` (auto-inserted
>> by `#[vtable]`) resolves correctly when there is no `module!` macro.
>>
>> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
>
> Does this need to be ordered before the vtable auto insert in the patch series?

Yes, you're right — this patch would be better placed before the vtable
auto-insert patch to avoid a temporary state where the doctest harness
doesn't provide the LocalModule fallback that #[vtable] expects.

Thanks for the suggestions from you and Gary. v3 has been sent to the
list:

https://lore.kernel.org/rust-for-linux/20260622-fix-fops-owner-v3-0-49d45cb37032@linux.dev

Best regards,
Alvin Sun

>
> Best regards,
> Andreas Hindborg
>
>

^ permalink raw reply

* [PATCH v3 4/6] rust: drm: set fops.owner from driver module pointer
From: Alvin Sun @ 2026-06-22  2:44 UTC (permalink / raw)
  To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, Alvin Sun
In-Reply-To: <20260622-fix-fops-owner-v3-0-49d45cb37032@linux.dev>

Change `create_fops()` to accept an owner module pointer instead of
hardcoding `null_mut()`, ensuring the kernel correctly tracks the
module owning the DRM device's file operations.

Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
 rust/kernel/drm/device.rs  | 3 ++-
 rust/kernel/drm/gem/mod.rs | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/rust/kernel/drm/device.rs b/rust/kernel/drm/device.rs
index 403fc35353c74..221667d843c41 100644
--- a/rust/kernel/drm/device.rs
+++ b/rust/kernel/drm/device.rs
@@ -111,7 +111,8 @@ impl<T: drm::Driver> Device<T> {
         fops: &Self::GEM_FOPS,
     };
 
-    const GEM_FOPS: bindings::file_operations = drm::gem::create_fops();
+    const GEM_FOPS: bindings::file_operations =
+        drm::gem::create_fops(crate::this_module::<T::OwnerModule>().as_ptr());
 
     /// Create a new `drm::Device` for a `drm::Driver`.
     pub fn new(dev: &device::Device, data: impl PinInit<T::Data, Error>) -> Result<ARef<Self>> {
diff --git a/rust/kernel/drm/gem/mod.rs b/rust/kernel/drm/gem/mod.rs
index 01b5bd47a3332..9a203efc59116 100644
--- a/rust/kernel/drm/gem/mod.rs
+++ b/rust/kernel/drm/gem/mod.rs
@@ -357,10 +357,10 @@ impl<T: DriverObject> AllocImpl for Object<T> {
     };
 }
 
-pub(super) const fn create_fops() -> bindings::file_operations {
+pub(super) const fn create_fops(owner: *mut bindings::module) -> bindings::file_operations {
     let mut fops: bindings::file_operations = pin_init::zeroed();
 
-    fops.owner = core::ptr::null_mut();
+    fops.owner = owner;
     fops.open = Some(bindings::drm_open);
     fops.release = Some(bindings::drm_release);
     fops.unlocked_ioctl = Some(bindings::drm_ioctl);

-- 
2.43.0



^ permalink raw reply related

* [PATCH v3 5/6] rust: miscdevice: set fops.owner from driver module pointer
From: Alvin Sun @ 2026-06-22  2:44 UTC (permalink / raw)
  To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, Alvin Sun
In-Reply-To: <20260622-fix-fops-owner-v3-0-49d45cb37032@linux.dev>

Set the miscdevice fops owner field from the driver module pointer
via the `this_module::<T::OwnerModule>()` helper, instead of
defaulting to null.

Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
 rust/kernel/miscdevice.rs | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs
index 83ce50def5ac9..04fe4697ff564 100644
--- a/rust/kernel/miscdevice.rs
+++ b/rust/kernel/miscdevice.rs
@@ -26,10 +26,11 @@
     mm::virt::VmaNew,
     prelude::*,
     seq_file::SeqFile,
+    this_module,
     types::{
         ForeignOwnable,
         Opaque, //
-    },
+    }, //
 };
 use core::marker::PhantomData;
 
@@ -430,6 +431,7 @@ impl<T: MiscDevice> MiscdeviceVTable<T> {
         } else {
             None
         },
+        owner: this_module::<T::OwnerModule>().as_ptr(),
         ..pin_init::zeroed()
     };
 

-- 
2.43.0



^ permalink raw reply related

* [PATCH v3 6/6] rust: configfs: use `LocalModule` for `THIS_MODULE`
From: Alvin Sun @ 2026-06-22  2:45 UTC (permalink / raw)
  To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, Alvin Sun
In-Reply-To: <20260622-fix-fops-owner-v3-0-49d45cb37032@linux.dev>

Replace the `THIS_MODULE` static reference in the `configfs_attrs!`
macro with `this_module::<LocalModule>()`, and update
rnull to import `LocalModule` instead of `THIS_MODULE`, consistent
with the move of `THIS_MODULE` into the `ModuleMetadata` trait.

Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
 drivers/block/rnull/configfs.rs | 6 ++----
 rust/kernel/configfs.rs         | 8 +++++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index c10a55fc58948..b2547ad1e5ddd 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -1,9 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use super::{
-    NullBlkDevice,
-    THIS_MODULE, //
-};
+use super::NullBlkDevice;
+use crate::LocalModule;
 use kernel::{
     block::mq::gen_disk::{
         GenDisk,
diff --git a/rust/kernel/configfs.rs b/rust/kernel/configfs.rs
index 2339c6467325d..8bd32627386e0 100644
--- a/rust/kernel/configfs.rs
+++ b/rust/kernel/configfs.rs
@@ -875,7 +875,7 @@ fn as_ptr(&self) -> *const bindings::config_item_type {
 ///                 configfs::Subsystem<Configuration>,
 ///                 Configuration
 ///                 >::new_with_child_ctor::<N,Child>(
-///             &THIS_MODULE,
+///             ::kernel::this_module::<LocalModule>(),
 ///             &CONFIGURATION_ATTRS
 ///         );
 ///
@@ -1021,7 +1021,8 @@ macro_rules! configfs_attrs {
 
                     static [< $data:upper _TPE >] : $crate::configfs::ItemType<$container, $data>  =
                         $crate::configfs::ItemType::<$container, $data>::new::<N>(
-                            &THIS_MODULE, &[<$ data:upper _ATTRS >]
+                            $crate::this_module::<LocalModule>(),
+                            &[<$ data:upper _ATTRS >]
                         );
                 )?
 
@@ -1030,7 +1031,8 @@ macro_rules! configfs_attrs {
                         $crate::configfs::ItemType<$container, $data>  =
                             $crate::configfs::ItemType::<$container, $data>::
                             new_with_child_ctor::<N, $child>(
-                                &THIS_MODULE, &[<$ data:upper _ATTRS >]
+                                $crate::this_module::<LocalModule>(),
+                                &[<$ data:upper _ATTRS >]
                             );
                 )?
 

-- 
2.43.0



^ permalink raw reply related

* [PATCH v3 2/6] rust: doctest: add LocalModule fallback for #[vtable] ThisModule
From: Alvin Sun @ 2026-06-22  2:44 UTC (permalink / raw)
  To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, Alvin Sun
In-Reply-To: <20260622-fix-fops-owner-v3-0-49d45cb37032@linux.dev>

Add a `LocalModule` struct with a null-pointer `ModuleMetadata` impl
in the doctest harness, so that `crate::LocalModule` (auto-inserted
by `#[vtable]`) resolves correctly when there is no `module!` macro.

Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
 scripts/rustdoc_test_gen.rs | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
index ee76e96b41eea..198af4e446c8c 100644
--- a/scripts/rustdoc_test_gen.rs
+++ b/scripts/rustdoc_test_gen.rs
@@ -239,6 +239,22 @@ macro_rules! assert_eq {{
 
 const __LOG_PREFIX: &[u8] = b"rust_doctests_kernel\0";
 
+/// Dummy module type for doctest context.
+struct LocalModule;
+
+use kernel::{{
+    str::CStr,
+    ModuleMetadata,
+    ThisModule, //
+}};
+use core::ptr::null_mut;
+
+impl ModuleMetadata for LocalModule {{
+    const NAME: &'static CStr = c"rust_doctests_kernel";
+    // SAFETY: `try_module_get`/`module_put` handle null module pointers gracefully.
+    const THIS_MODULE: ThisModule = unsafe {{ ThisModule::from_ptr(null_mut()) }};
+}}
+
 {rust_tests}
 "#
     )

-- 
2.43.0



^ permalink raw reply related

* [PATCH v3 3/6] rust: macros: auto-insert OwnerModule in #[vtable]
From: Alvin Sun @ 2026-06-22  2:44 UTC (permalink / raw)
  To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, Alvin Sun
In-Reply-To: <20260622-fix-fops-owner-v3-0-49d45cb37032@linux.dev>

Auto-add `type OwnerModule: ::kernel::ModuleMetadata;` as a required
associated type on the trait side if not already defined, and
auto-insert `type OwnerModule = crate::LocalModule;` on the impl side
if not explicitly provided, eliminating the need to manually declare
and implement `OwnerModule` in every vtable trait and impl.

Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Suggested-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/all/DIMMWHUOLPSH.13JFRHDKDQJGO@garyguo.net
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
 rust/macros/lib.rs    |  6 ++++++
 rust/macros/vtable.rs | 41 ++++++++++++++++++++++++++++++++++++-----
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/rust/macros/lib.rs b/rust/macros/lib.rs
index 2cfd59e0f9e7c..bc7ded353c5ca 100644
--- a/rust/macros/lib.rs
+++ b/rust/macros/lib.rs
@@ -176,6 +176,12 @@ pub fn module(input: TokenStream) -> TokenStream {
 ///
 /// This macro should not be used when all functions are required.
 ///
+/// Additionally, this macro automatically handles the `OwnerModule`
+/// associated type: on the trait side, `type OwnerModule: ModuleMetadata;`
+/// is added as a required associated type if not already defined; on the
+/// impl side, `type OwnerModule = LocalModule;` is automatically inserted
+/// if not explicitly defined.
+///
 /// # Examples
 ///
 /// ```
diff --git a/rust/macros/vtable.rs b/rust/macros/vtable.rs
index c6510b0c4ea1d..be9a5ed8abe5e 100644
--- a/rust/macros/vtable.rs
+++ b/rust/macros/vtable.rs
@@ -30,6 +30,22 @@ fn handle_trait(mut item: ItemTrait) -> Result<ItemTrait> {
          const USE_VTABLE_ATTR: ();
     });
 
+    // Add `type OwnerModule: ModuleMetadata` as a required associated type if
+    // the trait does not already define it.
+    if !item
+        .items
+        .iter()
+        .any(|i| matches!(i, TraitItem::Type(t) if t.ident == "OwnerModule"))
+    {
+        gen_items.push(parse_quote! {
+            /// The module implementing this vtable trait.
+            ///
+            /// Automatically set to `crate::LocalModule` by the `#[vtable]`
+            /// impl macro.
+            type OwnerModule: ::kernel::ModuleMetadata;
+        });
+    }
+
     for item in &item.items {
         if let TraitItem::Fn(fn_item) = item {
             let name = &fn_item.sig.ident;
@@ -57,12 +73,18 @@ fn handle_trait(mut item: ItemTrait) -> Result<ItemTrait> {
 
 fn handle_impl(mut item: ItemImpl) -> Result<ItemImpl> {
     let mut gen_items = Vec::new();
-    let mut defined_consts = HashSet::new();
+    let mut defined_items = HashSet::new();
 
-    // Iterate over all user-defined constants to gather any possible explicit overrides.
+    // Iterate over all user-defined items to gather any possible explicit overrides.
     for item in &item.items {
-        if let ImplItem::Const(const_item) = item {
-            defined_consts.insert(const_item.ident.clone());
+        match item {
+            ImplItem::Const(const_item) => {
+                defined_items.insert(const_item.ident.clone());
+            }
+            ImplItem::Type(type_item) => {
+                defined_items.insert(type_item.ident.clone());
+            }
+            _ => {}
         }
     }
 
@@ -70,6 +92,15 @@ fn handle_impl(mut item: ItemImpl) -> Result<ItemImpl> {
         const USE_VTABLE_ATTR: () = ();
     });
 
+    // Auto-insert `type OwnerModule = crate::LocalModule` if not explicitly defined.
+    // `crate::LocalModule` resolves to the real module type (via `module!`) or a
+    // dummy fallback in non-module contexts (e.g., doctests).
+    if !defined_items.contains(&parse_quote!(OwnerModule)) {
+        gen_items.push(parse_quote! {
+            type OwnerModule = crate::LocalModule;
+        });
+    }
+
     for item in &item.items {
         if let ImplItem::Fn(fn_item) = item {
             let name = &fn_item.sig.ident;
@@ -78,7 +109,7 @@ fn handle_impl(mut item: ItemImpl) -> Result<ItemImpl> {
                 name.span(),
             );
             // Skip if it's declared already -- this allows user override.
-            if defined_consts.contains(&gen_const_name) {
+            if defined_items.contains(&gen_const_name) {
                 continue;
             }
             let cfg_attrs = crate::helpers::gather_cfg_attrs(&fn_item.attrs);

-- 
2.43.0



^ permalink raw reply related

* [PATCH v3 1/6] rust: module: add `THIS_MODULE` const to `ModuleMetadata` trait
From: Alvin Sun @ 2026-06-22  2:44 UTC (permalink / raw)
  To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, Alvin Sun
In-Reply-To: <20260622-fix-fops-owner-v3-0-49d45cb37032@linux.dev>

Since `const_refs_to_static` has been stable as of the MSRV bump, a
`ThisModule` pointer can now be used in const contexts.

Add a `THIS_MODULE` const to the `ModuleMetadata` trait so that modules
can provide their `ThisModule` pointer in const contexts such as static
`file_operations`.

Move the `THIS_MODULE` static from the `module!` macro into the
`ModuleMetadata` impl, add a `this_module()` helper, and update `__init`
to use it.

Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
 rust/kernel/lib.rs    |  8 ++++++++
 rust/macros/module.rs | 34 +++++++++++++++++-----------------
 2 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index b72b2fbe046d6..50f5a7b5f028e 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -184,6 +184,14 @@ fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, error::Erro
 pub trait ModuleMetadata {
     /// The name of the module as specified in the `module!` macro.
     const NAME: &'static crate::str::CStr;
+
+    /// The module's `THIS_MODULE` pointer.
+    const THIS_MODULE: ThisModule;
+}
+
+/// Returns a reference to the `THIS_MODULE` of the given module type.
+pub const fn this_module<M: ModuleMetadata>() -> &'static ThisModule {
+    &M::THIS_MODULE
 }
 
 /// Equivalent to `THIS_MODULE` in the C API.
diff --git a/rust/macros/module.rs b/rust/macros/module.rs
index 06c18e2075083..b9fdee2f2af47 100644
--- a/rust/macros/module.rs
+++ b/rust/macros/module.rs
@@ -497,28 +497,28 @@ pub(crate) fn module(info: ModuleInfo) -> Result<TokenStream> {
         /// Used by the printing macros, e.g. [`info!`].
         const __LOG_PREFIX: &[u8] = #name_cstr.to_bytes_with_nul();
 
-        // SAFETY: `__this_module` is constructed by the kernel at load time and will not be
-        // freed until the module is unloaded.
-        #[cfg(MODULE)]
-        static THIS_MODULE: ::kernel::ThisModule = unsafe {
-            extern "C" {
-                static __this_module: ::kernel::types::Opaque<::kernel::bindings::module>;
-            };
-
-            ::kernel::ThisModule::from_ptr(__this_module.get())
-        };
-
-        #[cfg(not(MODULE))]
-        static THIS_MODULE: ::kernel::ThisModule = unsafe {
-            ::kernel::ThisModule::from_ptr(::core::ptr::null_mut())
-        };
-
         /// The `LocalModule` type is the type of the module created by `module!`,
         /// `module_pci_driver!`, `module_platform_driver!`, etc.
         type LocalModule = #type_;
 
         impl ::kernel::ModuleMetadata for #type_ {
             const NAME: &'static ::kernel::str::CStr = #name_cstr;
+
+            #[cfg(MODULE)]
+            const THIS_MODULE: ::kernel::ThisModule = {
+                extern "C" {
+                    static __this_module: ::kernel::types::Opaque<::kernel::bindings::module>;
+                }
+
+                // SAFETY: `__this_module` is constructed by the kernel at load time
+                // and lives until the module is unloaded.
+                unsafe { ::kernel::ThisModule::from_ptr(__this_module.get()) }
+            };
+
+            #[cfg(not(MODULE))]
+            const THIS_MODULE: ::kernel::ThisModule = unsafe {
+                ::kernel::ThisModule::from_ptr(::core::ptr::null_mut())
+            };
         }
 
         // Double nested modules, since then nobody can access the public items inside.
@@ -616,7 +616,7 @@ pub extern "C" fn #ident_exit() {
                 /// This function must only be called once.
                 unsafe fn __init() -> ::kernel::ffi::c_int {
                     let initer = <super::super::LocalModule as ::kernel::InPlaceModule>::init(
-                        &super::super::THIS_MODULE
+                        ::kernel::this_module::<super::super::LocalModule>()
                     );
                     // SAFETY: No data race, since `__MOD` can only be accessed by this module
                     // and there only `__init` and `__exit` access it. These functions are only

-- 
2.43.0



^ permalink raw reply related

* [PATCH v3 0/6] Fix missing fops.owner in Rust DRM/misc abstractions
From: Alvin Sun @ 2026-06-22  2:44 UTC (permalink / raw)
  To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, Alvin Sun

During tyr debugfs development, a kernel NULL pointer dereference was
encountered after `rmmod tyr` while gnome-shell still held /dev/card1 open:

```
  [158827.868132] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
  [158827.868918] Mem abort info:
  [158827.869177]   ESR = 0x0000000086000004
  [158827.869519]   EC = 0x21: IABT (current EL), IL = 32 bits
  [158827.870000]   SET = 0, FnV = 0
  [158827.870281]   EA = 0, S1PTW = 0
  [158827.870571]   FSC = 0x04: level 0 translation fault
  [158827.871043] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108dec000
  [158827.871623] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
  [158827.872242] Internal error: Oops: 0000000086000004 [#1]  SMP
  [158827.872246] Modules linked in: tyr sunrpc snd_soc_simple_card rk805_pwrkey snd_soc_simple_card_utils rtw88_8822bu display_connector rtw88_usb rtw88_8822b snd_soc_rockchip_i2s_tdm snd_soc_hdmi_codec
  rtw88_core]
  [158827.872337] CPU: 4 UID: 1000 PID: 11276 Comm: gnome-s:disk$0 Tainted: G                 N  7.1.0-rc1+ #331 PREEMPT
  [158827.880534] Tainted: [N]=TEST
  [158827.880535] Hardware name: FriendlyElec NanoPi R6C/NanoPi R6C, BIOS v1.1 04/09/2025
  [158827.880538] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [158827.880542] pc : 0x0
  [158827.880547] lr : _RNvNtCs257m05FHVbX_3tyr2vm8pt_unmap+0x8c/0x12c [tyr]
  [158827.880578] sp : ffff800083c236b0
  [158827.880579] x29: ffff800083c236d0 x28: ffff00013f8a0000 x27: 0000000000000000
  [158827.880585] x26: 000000000000007c x25: ffff000108e6ed80 x24: 0000000000401000
  [158827.880590] x23: 0000000000000000 x22: 0000000040000000 x21: 0000000000001000
  [158827.880595] x20: ffff00010f778138 x19: 0000000000400000 x18: 00000000ffffffff
  [158827.880600] x17: 000000040044ffff x16: 045000f2b5503510 x15: 0720072007200720
  [158827.880606] x14: 0720072007200720 x13: 0000000000401000 x12: 0000000000400000
  [158827.880611] x11: ffff800083c239d0 x10: ffff000141e4fd88 x9 : 0000000000000000
  [158827.880615] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000400000
  [158827.880620] x5 : ffff00013f8a0000 x4 : 0000000000000000 x3 : 0000000000000001
  [158827.880625] x2 : 0000000000001000 x1 : 0000000000400000 x0 : ffff00010f778138
  [158827.880630] Call trace:
  [158827.880632]  0x0 (P)
  [158827.880635]  _RNvXs6_NtCs257m05FHVbX_3tyr2vmNtB5_9GpuVmDataNtNtNtCsgmSOfgXi5CZ_6kernel3drm5gpuvm11DriverGpuVm13sm_step_unmap+0x3c/0x120 [tyr]
  [158827.891166]  _RNvMs4_NtNtNtCsgmSOfgXi5CZ_6kernel3drm5gpuvm6sm_opsINtB7_5GpuVmNtNtCs257m05FHVbX_3tyr2vm9GpuVmDataE13sm_step_unmapB13_+0x18/0x34 [tyr]
  [158827.891187]  op_unmap_cb+0x78/0xb0
  [158827.891196]  __drm_gpuvm_sm_unmap+0x18c/0x1b4
  [158827.891204]  drm_gpuvm_sm_unmap+0x38/0x4c
  [158827.891209]  _RNvMs5_NtCs257m05FHVbX_3tyr2vmNtB5_2Vm7exec_op+0x1cc/0x254 [tyr]
  [158827.894085]  _RNvMs5_NtCs257m05FHVbX_3tyr2vmNtB5_2Vm11unmap_range+0x124/0x188 [tyr]
  [158827.894105]  _RINvNtCs5hGKnPbRUFW_4core3ptr13drop_in_placeNtNtCs257m05FHVbX_3tyr3gem8KernelBoEBK_+0x44/0xd8 [tyr]
  [158827.894125]  _RINvNtCs5hGKnPbRUFW_4core3ptr13drop_in_placeINtNtNtCsgmSOfgXi5CZ_6kernel5alloc4kvec3VecNtNtCs257m05FHVbX_3tyr2fw7SectionNtNtBL_9allocator7KmallocEEB1r_+0x3c/0x100 [tyr]
  [158827.894147]  _RINvNtCs5hGKnPbRUFW_4core3ptr13drop_in_placeINtNtNtCsgmSOfgXi5CZ_6kernel4sync3arc3ArcNtNtCs257m05FHVbX_3tyr2fw8FirmwareEEB1p_+0x94/0x190 [tyr]
  [158827.894167]  _RNvMs4_NtNtCsgmSOfgXi5CZ_6kernel3drm6deviceINtB5_6DeviceNtNtCs257m05FHVbX_3tyr6driver12TyrDrmDriverE7releaseBW_+0x30/0x98 [tyr]
  [158827.899550]  drm_dev_put.part.0+0x88/0xc0
  [158827.899557]  drm_minor_release+0x18/0x28
  [158827.899562]  drm_release+0x144/0x170
  [158827.899567]  __fput+0xe4/0x30c
  [158827.899573]  ____fput+0x14/0x20
  [158827.899579]  task_work_run+0x7c/0xe8
  [158827.899586]  do_exit+0x2a8/0xac4
  [158827.899590]  do_group_exit+0x34/0x90
  [158827.899594]  get_signal+0xaac/0xabc
  [158827.899599]  arch_do_signal_or_restart+0x90/0x3e8
  [158827.899606]  exit_to_user_mode_loop+0x140/0x1d0
  [158827.899613]  el0_svc+0x2f4/0x2f8
  [158827.899620]  el0t_64_sync_handler+0xa0/0xe4
  [158827.899627]  el0t_64_sync+0x198/0x19c
  [158827.899632] ---[ end trace 0000000000000000 ]---
```

The root cause: `fops.owner` was `NULL` in Rust DRM drivers, so the kernel
never blocked module unloading while file descriptors were open. This leads to
use-after-free when drm_release (or other fops) is called on freed module code.

The series moves `THIS_MODULE` into the `ModuleMetadata` as a const, threads it
through `#[vtable]` to set `fops.owner` in DRM/miscdevice, and updates configfs
and rnull to use `this_module::<LocalModule>()`.

Assisted-by: opencode:glm-5.2
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
Changes in v3:
- Renamed vtable associated type `ThisModule` to `OwnerModule`
- Added `this_module()` helper for ergonomic `THIS_MODULE` access
- Refined vtable macro implementation: one-liner detection and single `defined_items` set
- Reordered commits to place doctest fallback before vtable auto-insert
- Link to v2: https://lore.kernel.org/r/20260521-fix-fops-owner-v2-0-fd99079c5a04@linux.dev

Changes in v2:
- Merged old `static THIS_MODULE` and v1's `MODULE_PTR` into a single
  `ModuleMetadata::THIS_MODULE` const
- `#[vtable]` macro now auto-inserts `type ThisModule`, removing all per-driver
  manual patches from v1
- Added configfs & rnull usage site updates and doctest `LocalModule` fallback
- Link to v1: https://lore.kernel.org/r/20260519-fix-fops-owner-v1-0-2ded9830da14@linux.dev

---
Alvin Sun (6):
      rust: module: add `THIS_MODULE` const to `ModuleMetadata` trait
      rust: doctest: add LocalModule fallback for #[vtable] ThisModule
      rust: macros: auto-insert OwnerModule in #[vtable]
      rust: drm: set fops.owner from driver module pointer
      rust: miscdevice: set fops.owner from driver module pointer
      rust: configfs: use `LocalModule` for `THIS_MODULE`

 drivers/block/rnull/configfs.rs |  6 ++----
 rust/kernel/configfs.rs         |  8 +++++---
 rust/kernel/drm/device.rs       |  3 ++-
 rust/kernel/drm/gem/mod.rs      |  4 ++--
 rust/kernel/lib.rs              |  8 ++++++++
 rust/kernel/miscdevice.rs       |  4 +++-
 rust/macros/lib.rs              |  6 ++++++
 rust/macros/module.rs           | 34 +++++++++++++++++-----------------
 rust/macros/vtable.rs           | 41 ++++++++++++++++++++++++++++++++++++-----
 scripts/rustdoc_test_gen.rs     | 16 ++++++++++++++++
 10 files changed, 97 insertions(+), 33 deletions(-)
---
base-commit: b7e5ac83cb16f7ffd11dc23736f84276602100ed
change-id: 20260519-fix-fops-owner-e3a77bb27c6c
prerequisite-change-id: 20260519-miscdev-use-format-9ab7e83b1c11:v3
prerequisite-patch-id: 405b334ff0d48ad350014f05a2321bdbaa025400
prerequisite-patch-id: 604b631c81d5423f4ebb2e12ba2d22e9ce371bfc
prerequisite-patch-id: cb550d94cefe01920e0d3ced2b2bcbecd76f3907
prerequisite-patch-id: 3bc830839742591460cb86d9472c04f4686dc600
prerequisite-patch-id: 571058244bc8c7088638d2e3225713011246c7e9
prerequisite-patch-id: 347c5a3c6dbef9832bfce8419fc23e6e08ba477f
prerequisite-patch-id: 3e202d988b56b88446f7535e90d3f00cf5f15701

Best regards,
-- 
Alvin Sun <alvin.sun@linux.dev>



^ permalink raw reply

* Re: [PATCH] nbd: don't warn when reclassifying a busy socket lock
From: Hillf Danton @ 2026-06-22  1:43 UTC (permalink / raw)
  To: Deepanshu Kartikey
  Cc: edumazet, linux-block, nbd, linux-kernel,
	syzbot+6b85d1e39a5b8ed9a954
In-Reply-To: <20260621235255.66015-1-kartikey406@gmail.com>

On Mon, 22 Jun 2026 05:22:55 +0530 Deepanshu Kartikey wrote:
> nbd_reclassify_socket() warns via WARN_ON_ONCE() if the socket lock is
> held at the point of reclassification. That assertion was copied from
> nvme-tcp, where the socket is created internally by the kernel
> (sock_create_kern()) and is never visible to user space, so the lock
> is guaranteed to be free.
> 
> NBD is different: the socket is looked up from a user-supplied fd in
> nbd_get_socket(), and user space retains that fd. A concurrent syscall
> on the same socket (or softirq processing taking bh_lock_sock() on a
> connected TCP socket) can legitimately hold the lock at the instant
> NBD reclassifies it. sock_allow_reclassification() then returns false
> and the WARN_ON_ONCE() fires, which turns into a crash under
> panic_on_warn. This is reachable by simply racing NBD_CMD_CONNECT
> against socket activity on the same fd, as reported by syzbot.
> 
Given the syzbot report, if you are right (I suspect) then Eric delivered
another half-baked croissant, and feel free to cut it off instead to make
room for correct fix.

> Hitting a held lock here is expected for an externally owned socket and
> is not a kernel bug, so skip reclassification silently instead of
> warning. Reclassification is a lockdep-only annotation, so skipping it
> in the rare racing case is harmless.
> 
> Reported-by: syzbot+6b85d1e39a5b8ed9a954@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=6b85d1e39a5b8ed9a954
> Fixes: d532cddb6c60 ("nbd: Reclassify sockets to avoid lockdep circular dependency")
> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
> ---
>  drivers/block/nbd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 3a585a0c882a..8f10762e90ef 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -1246,7 +1246,7 @@ static void nbd_reclassify_socket(struct socket *sock)
>  {
>  	struct sock *sk = sock->sk;
>  
> -	if (WARN_ON_ONCE(!sock_allow_reclassification(sk)))
> +	if (!sock_allow_reclassification(sk))
>  		return;
>  
>  	switch (sk->sk_family) {
> -- 
> 2.43.0

^ permalink raw reply

* [PATCH] nbd: don't warn when reclassifying a busy socket lock
From: Deepanshu Kartikey @ 2026-06-21 23:52 UTC (permalink / raw)
  To: josef, axboe, edumazet
  Cc: linux-block, nbd, linux-kernel, Deepanshu Kartikey,
	syzbot+6b85d1e39a5b8ed9a954


nbd_reclassify_socket() warns via WARN_ON_ONCE() if the socket lock is
held at the point of reclassification. That assertion was copied from
nvme-tcp, where the socket is created internally by the kernel
(sock_create_kern()) and is never visible to user space, so the lock
is guaranteed to be free.

NBD is different: the socket is looked up from a user-supplied fd in
nbd_get_socket(), and user space retains that fd. A concurrent syscall
on the same socket (or softirq processing taking bh_lock_sock() on a
connected TCP socket) can legitimately hold the lock at the instant
NBD reclassifies it. sock_allow_reclassification() then returns false
and the WARN_ON_ONCE() fires, which turns into a crash under
panic_on_warn. This is reachable by simply racing NBD_CMD_CONNECT
against socket activity on the same fd, as reported by syzbot.

Hitting a held lock here is expected for an externally owned socket and
is not a kernel bug, so skip reclassification silently instead of
warning. Reclassification is a lockdep-only annotation, so skipping it
in the rare racing case is harmless.

Reported-by: syzbot+6b85d1e39a5b8ed9a954@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6b85d1e39a5b8ed9a954
Fixes: d532cddb6c60 ("nbd: Reclassify sockets to avoid lockdep circular dependency")
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 drivers/block/nbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 3a585a0c882a..8f10762e90ef 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1246,7 +1246,7 @@ static void nbd_reclassify_socket(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
 
-	if (WARN_ON_ONCE(!sock_allow_reclassification(sk)))
+	if (!sock_allow_reclassification(sk))
 		return;
 
 	switch (sk->sk_family) {
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH] block, bfq: protect async queue reset with blkcg locks
From: Tao Cui @ 2026-06-21 18:33 UTC (permalink / raw)
  To: Cen Zhang, Yu Kuai, Tejun Heo, Josef Bacik, Jens Axboe,
	Arianna Avanzini, Paolo Valente
  Cc: cui.tao, linux-block, cgroups, linux-kernel, baijiaju1990
In-Reply-To: <20260621135930.2657810-1-zzzccc427@gmail.com>


Nice catch. The race is real, and the fix lines up with how the rest
of the blkcg code already protects blkg_list walks — the new nesting
(blkcg_mutex -> queue_lock -> bfqd->lock) is the same order
blkg_free_workfn() and bfq_pd_offline() use, so no inversion.

Reviewed-by: Tao Cui <cuitao@kylinos.cn>

在 2026/6/21 21:59, Cen Zhang 写道:
> Writing 0 to BFQ's low_latency attribute ends weight raising for active,
> idle and async queues. The async cgroup path walks q->blkg_list, converts
> each blkg to BFQ policy data and then reads bfqg->async_bfqq and
> bfqg->async_idle_bfqq.
> 
> That walk was protected only by bfqd->lock. blkcg release work is
> serialized by q->blkcg_mutex and q->queue_lock instead, and
> blkg_free_workfn() can call BFQ's pd_free_fn before it removes
> blkg->q_node from q->blkg_list. A low_latency reset can therefore still
> find the blkg on the queue list after the BFQ policy data has been freed.
> 
> The buggy scenario involves two paths, with each column showing the order
> within that path:
> 
> BFQ low_latency reset:              blkcg blkg release work:
> 1. bfq_low_latency_store()          1. blkg_free_workfn() takes
>    calls bfq_end_wr().                 q->blkcg_mutex.
> 2. bfq_end_wr_async() walks         2. BFQ pd_free_fn drops the
>    q->blkg_list.                       final bfq_group reference.
> 3. blkg_to_bfqg() returns           3. blkg->q_node remains on
>    the stale policy data.              q->blkg_list until list_del_init().
> 4. bfq_end_wr_async_queues()
>    reads async queue fields.
> 
> Fix this by taking q->blkcg_mutex and q->queue_lock around the
> q->blkg_list walk, then taking bfqd->lock before touching BFQ async
> queues. The mutex serializes against policy-data free and queue_lock
> stabilizes the list. Move the async reset out of bfq_end_wr()'s existing
> bfqd->lock critical section so the lock order matches blkcg policy
> callbacks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox