* Re: [PATCH blktests] Fix _get_page_size()
From: Shin'ichiro Kawasaki @ 2026-06-22 22:27 UTC (permalink / raw)
To: Omar Sandoval; +Cc: Bart Van Assche, Jeff Moyer, linux-block, kch
In-Reply-To: <ajlxXfgpMQJ4qlRR@telecaster>
On Jun 22, 2026 / 10:31, Omar Sandoval wrote:
> On Mon, Jun 22, 2026 at 08:38:48PM +0900, Shin'ichiro Kawasaki wrote:
> > On Jun 20, 2026 / 09:11, Bart Van Assche wrote:
> > > On 6/20/26 6:51 AM, Shin'ichiro Kawasaki wrote:
> > > > On Jun 20, 2026 / 05:55, Bart Van Assche wrote:
> > > > > On 6/20/26 3:26 AM, Shin'ichiro Kawasaki wrote:
> > > > > > This is a rather fundamental change, so I would like to ask opinions from
> > > > > > other blktests users, especially Omar and Chaitanya. What do you think about
> > > > > > the idea to add getconf to the requirement list?
> > > > >
> > > > > CONFIG_PAGE_SHIFT was introduced in the Linux kernel in February 2024
> > > > > (commit ba89f9c8ccba ("arch: consolidate existing CONFIG_PAGE_SIZE_*KB
> > > > > definitions")). Older kernels had CONFIG_PAGE_SIZE_4KB,
> > > > > CONFIG_PAGE_SIZE_16KB, etc. This means that it is possible to derive the
> > > > > kernel page size from the kernel configuration file for all upstream and
> > > > > distro kernels, isn't it?
> > > >
> > > > I checked the commit is in the tag v6.9. My Debian bookworm system has kernel
> > > > v6.1, then the config file at /boot does not have CONFIG_PAGE_SHIFT as expected.
> > > > But it does not have CONFIG_PAGE_SIZE_* either... I'm still afraid that kernel
> > > > config file approach is not reliable.
> > >
> > > Right, for older kernels CONFIG_PAGE_SIZE_*KB is only available for some
> > > but not for all supported architectures.
> > >
> > > It is not clear to me where the desire to avoid the dependency on
> > > getconf comes from? As far as I know it is available on all Linux
> > > distro's. Since it is typically included in the C library package it
> > > should not introduce a new dependency.
> >
> > I think less dependent is the better in general, and wanted to confirm that
> > it is fine for everybody. If there is no voice to object, I will create a
> > patch to add getconf to the requirement list.
>
> I agree with Bart, getconf is ubiquitous enough that it's not worth
> trying to hack around its absence. In my opinion, parsing kernel config
> options should be a last resort. If anyone complains about the getconf
> dependency in the future, I think it'd be better to add a simple
> src/pagesize.c file that uses sysconf(_SC_PAGESIZE), but I don't expect
> that to be necessary.
Omar, thank you for the comment. It's good to have the plan B idea of
"src/pagesize.c". I will prepare the patch to add getconf to the
requirement list as the plan A.
^ permalink raw reply
* [PATCH net v3 0/2] vsock/virtio: fix msg_iter desync on transmission failure
From: Octavian Purdila @ 2026-06-22 22:27 UTC (permalink / raw)
To: netdev
Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
virtualization, Xuan Zhuo, Jens Axboe, Octavian Purdila
This series fixes a msg_iter desync issue in the virtio vsock transport
that can lead to warnings and eventual -ENOMEM under specific failure
scenarios (e.g. partial GUP failure during MSG_ZEROCOPY transmission).
To fix this, we need to restore the msg_iter state on transmission failure.
However, since virtio vsock transport can be built as a module, we first
need to export iov_iter_restore.
Patch 1 exports iov_iter_restore.
Patch 2 implements the msg_iter restoration in virtio vsock.
Changes in v3:
- Use EXPORT_SYMBOL_GPL (Jens)
Changes in v2:
- Use iov_iter_savestate()/iov_iter_restore() (Stefano)
- Use a single restore point (Stefano)
- Reverse xmas tree (Stefano)
- Added comments in the code (Stefano)
v2: https://lore.kernel.org/all/20260613000953.467473-1-tavip@google.com/
v1: https://lore.kernel.org/all/20260609004809.1285028-1-tavip@google.com/
Octavian Purdila (2):
iov_iter: export iov_iter_restore
vsock/virtio: restore msg_iter on transmission failure
lib/iov_iter.c | 1 +
net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++++
2 files changed, 14 insertions(+)
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply
* [PATCH net v3 1/2] iov_iter: export iov_iter_restore
From: Octavian Purdila @ 2026-06-22 22:27 UTC (permalink / raw)
To: netdev
Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
virtualization, Xuan Zhuo, Jens Axboe, Octavian Purdila
In-Reply-To: <20260622222757.2130402-1-tavip@google.com>
Export iov_iter_restore so that it can be used by modules.
This is needed by the virtio vsock transport (which can be built as a
module) to restore the msg_iter state when transmission fails.
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Octavian Purdila <tavip@google.com>
---
lib/iov_iter.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 273919b161617..f5df63961fb24 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1491,6 +1491,7 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
i->__iov -= state->nr_segs - i->nr_segs;
i->nr_segs = state->nr_segs;
}
+EXPORT_SYMBOL_GPL(iov_iter_restore);
/*
* Extract a list of contiguous pages from an ITER_FOLIOQ iterator. This does
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply related
* [PATCH net v3 2/2] vsock/virtio: restore msg_iter on transmission failure
From: Octavian Purdila @ 2026-06-22 22:27 UTC (permalink / raw)
To: netdev
Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
virtualization, Xuan Zhuo, Jens Axboe, Octavian Purdila,
syzbot+28e5f3d207b14bae122a
In-Reply-To: <20260622222757.2130402-1-tavip@google.com>
When transmission fails in virtio_transport_send_pkt_info, the msg_iter
might have been partially advanced. If we don't restore it, the next
attempt to send data will use an incorrect iterator state, leading to
desync and warnings like "send_pkt() returns 0, but X expected".
Specifically, this can happen in the following scenario, triggered by
the syzkaller repro:
1. A write-only VMA (PROT_WRITE only) is partially populated by a
prior TUN write that failed with -EIO but still faulted in some
pages).
2. A vsock sendmmsg call with MSG_ZEROCOPY requests transmission of a
buffer from this VMA.
3. The first packet (64KB) is sent successfully because the pages are
populated.
4. The second packet allocation fails because GUP fast pins the first page
but GUP slow fails on the next unpopulated page due to PROT_WRITE-only
permissions.
5. The iterator is advanced by the partially successful GUP (68KB total
advanced: 64KB from first packet + 4KB from second), but the send loop
breaks and only reports 64KB sent. This creates a 4KB desync.
6. The next retry starts with a non-zero iov_offset, disabling zerocopy
and falling back to copy mode.
7. In copy mode, the transmission succeeds for the next packets but
exhausts the iterator early because of the desync.
8. The final retry sees an empty iterator but zerocopy is re-enabled
(offset resets). It attempts to send the remaining bytes with zerocopy
but pins 0 pages, creating an empty packet.
9. The transport sends the empty packet, triggering the warning because
the returned bytes (header only) do not match the expected payload size.
10. The loop continues to spin, allocating ubuf_info each time, eventually
exhausting sysctl_optmem_max and returning -ENOMEM to userspace.
Restore msg_iter to its original state before the packet allocation
and transmission attempt if they fail.
Fixes: e0718bd82e27 ("vsock: enable setting SO_ZEROCOPY")
Reported-by: syzbot+28e5f3d207b14bae122a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=28e5f3d207b14bae122a
Assisted-by: gemini:gemini-3.1-pro
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Octavian Purdila <tavip@google.com>
---
net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 09475007165b3..35fd4094d771d 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -295,6 +295,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
u32 src_cid, src_port, dst_cid, dst_port;
const struct virtio_transport *t_ops;
+ struct iov_iter_state msg_iter_state;
struct virtio_vsock_sock *vvs;
struct ubuf_info *uarg = NULL;
u32 pkt_len = info->pkt_len;
@@ -368,8 +369,17 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
struct sk_buff *skb;
size_t skb_len;
+ /* Save iterator state in case allocation or transmission fails
+ * so we can restore it and retry.
+ */
+ if (info->msg)
+ iov_iter_save_state(&info->msg->msg_iter, &msg_iter_state);
+
skb_len = min(max_skb_len, rest_len);
+ /* Note: virtio_transport_alloc_skb() can advance info->msg->msg_iter
+ * even if it fails (e.g. partial GUP success).
+ */
skb = virtio_transport_alloc_skb(info, skb_len, can_zcopy,
uarg,
src_cid, src_port,
@@ -399,6 +409,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
break;
} while (rest_len);
+ if (info->msg && ret < 0)
+ iov_iter_restore(&info->msg->msg_iter, &msg_iter_state);
+
virtio_transport_put_credit(vvs, rest_len);
/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply related
* Re: [PATCH] nbd: don't warn when reclassifying a busy socket lock
From: Hillf Danton @ 2026-06-23 0:07 UTC (permalink / raw)
To: Eric Dumazet
Cc: Deepanshu Kartikey, linux-block, nbd, linux-kernel,
syzbot+6b85d1e39a5b8ed9a954
In-Reply-To: <CANn89iJBomCNpwzOiYHmmPf0i3KQGaqoiKh6VFeM6NHOQaCn3Q@mail.gmail.com>
On Mon, 22 Jun 2026 01:18:10 -0700 Eric Dumazet wrote:
>On Sun, Jun 21, 2026 at 6:43 PM Hillf Danton <hdanton@sina.com> wrote:
>> On Mon, 22 Jun 2026 05:22:55 +0530 Deepanshu Kartikey wrote:
>> > nbd_reclassify_socket() warns via WARN_ON_ONCE() if the socket lock is
>> > held at the point of reclassification. That assertion was copied from
>> > nvme-tcp, where the socket is created internally by the kernel
>> > (sock_create_kern()) and is never visible to user space, so the lock
>> > is guaranteed to be free.
>> >
>> > NBD is different: the socket is looked up from a user-supplied fd in
>> > nbd_get_socket(), and user space retains that fd. A concurrent syscall
>> > on the same socket (or softirq processing taking bh_lock_sock() on a
>> > connected TCP socket) can legitimately hold the lock at the instant
>> > NBD reclassifies it. sock_allow_reclassification() then returns false
>> > and the WARN_ON_ONCE() fires, which turns into a crash under
>> > panic_on_warn. This is reachable by simply racing NBD_CMD_CONNECT
>> > against socket activity on the same fd, as reported by syzbot.
>> >
>> Given the syzbot report, if you are right (I suspect) then Eric delivered
>> another half-baked croissant, and feel free to cut it off instead to make
>> room for correct fix.
>
> Nobody (including you) caught this.difference between nbd and other
> sock_allow_reclassification() callers.
>
Nope, actually it raises the question -- does the deadlock still remain
after your fix without the lock key you added applied?
> What was the "correct fix" you envisioned exactly?
>
Frankly I had no evidence against your fix a couple days back, but now I
see your lock key approach fails to take off. And the correct fix is to
erase the incorrect locking order ffa1e7ada456 tries to catch, more
difficult than you thought so far.
^ permalink raw reply
* Re: [PATCH] nbd: don't warn when reclassifying a busy socket lock
From: Eric Dumazet @ 2026-06-23 0:21 UTC (permalink / raw)
To: Hillf Danton
Cc: Deepanshu Kartikey, linux-block, nbd, linux-kernel,
syzbot+6b85d1e39a5b8ed9a954
In-Reply-To: <20260623000723.135-1-hdanton@sina.com>
On Mon, Jun 22, 2026 at 5:07 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Mon, 22 Jun 2026 01:18:10 -0700 Eric Dumazet wrote:
> >On Sun, Jun 21, 2026 at 6:43 PM Hillf Danton <hdanton@sina.com> wrote:
> >> On Mon, 22 Jun 2026 05:22:55 +0530 Deepanshu Kartikey wrote:
> >> > nbd_reclassify_socket() warns via WARN_ON_ONCE() if the socket lock is
> >> > held at the point of reclassification. That assertion was copied from
> >> > nvme-tcp, where the socket is created internally by the kernel
> >> > (sock_create_kern()) and is never visible to user space, so the lock
> >> > is guaranteed to be free.
> >> >
> >> > NBD is different: the socket is looked up from a user-supplied fd in
> >> > nbd_get_socket(), and user space retains that fd. A concurrent syscall
> >> > on the same socket (or softirq processing taking bh_lock_sock() on a
> >> > connected TCP socket) can legitimately hold the lock at the instant
> >> > NBD reclassifies it. sock_allow_reclassification() then returns false
> >> > and the WARN_ON_ONCE() fires, which turns into a crash under
> >> > panic_on_warn. This is reachable by simply racing NBD_CMD_CONNECT
> >> > against socket activity on the same fd, as reported by syzbot.
> >> >
> >> Given the syzbot report, if you are right (I suspect) then Eric delivered
> >> another half-baked croissant, and feel free to cut it off instead to make
> >> room for correct fix.
> >
> > Nobody (including you) caught this.difference between nbd and other
> > sock_allow_reclassification() callers.
> >
> Nope, actually it raises the question -- does the deadlock still remain
> after your fix without the lock key you added applied?
LOCKDEP might have a false positive, but it will be much much harder to trigger.
I had about 50 syzbot duplicates (that I did not release) before d532cddb6c60
("nbd: Reclassify sockets to avoid lockdep circular dependency").
>
> > What was the "correct fix" you envisioned exactly?
> >
> Frankly I had no evidence against your fix a couple days back, but now I
> see your lock key approach fails to take off. And the correct fix is to
> erase the incorrect locking order ffa1e7ada456 tries to catch, more
> difficult than you thought so far.
Which incorrect locking order are you referring to? This is a LOCKDEP
false positive.
I suggest you send a patch so we can discuss it.
^ permalink raw reply
* Re: [PATCH] nbd: don't warn when reclassifying a busy socket lock
From: Hillf Danton @ 2026-06-23 0:44 UTC (permalink / raw)
To: Eric Dumazet
Cc: Deepanshu Kartikey, linux-block, nbd, linux-kernel,
syzbot+6b85d1e39a5b8ed9a954
In-Reply-To: <CANn89iLPWCo_u-8jCsDM6jjZYfESvtUt9n3xD7yuAyNNntSw6w@mail.gmail.com>
On Mon, 22 Jun 2026 17:21:53 -0700 Eric Dumazet wrote:
>On Mon, Jun 22, 2026 at 5:07 PM Hillf Danton <hdanton@sina.com> wrote:
>> On Mon, 22 Jun 2026 01:18:10 -0700 Eric Dumazet wrote:
>> >On Sun, Jun 21, 2026 at 6:43 PM Hillf Danton <hdanton@sina.com> wrote:
>> >> On Mon, 22 Jun 2026 05:22:55 +0530 Deepanshu Kartikey wrote:
>> >> > nbd_reclassify_socket() warns via WARN_ON_ONCE() if the socket lock is
>> >> > held at the point of reclassification. That assertion was copied from
>> >> > nvme-tcp, where the socket is created internally by the kernel
>> >> > (sock_create_kern()) and is never visible to user space, so the lock
>> >> > is guaranteed to be free.
>> >> >
>> >> > NBD is different: the socket is looked up from a user-supplied fd in
>> >> > nbd_get_socket(), and user space retains that fd. A concurrent syscall
>> >> > on the same socket (or softirq processing taking bh_lock_sock() on a
>> >> > connected TCP socket) can legitimately hold the lock at the instant
>> >> > NBD reclassifies it. sock_allow_reclassification() then returns false
>> >> > and the WARN_ON_ONCE() fires, which turns into a crash under
>> >> > panic_on_warn. This is reachable by simply racing NBD_CMD_CONNECT
>> >> > against socket activity on the same fd, as reported by syzbot.
>> >> >
>> >> Given the syzbot report, if you are right (I suspect) then Eric delivered
>> >> another half-baked croissant, and feel free to cut it off instead to make
>> >> room for correct fix.
>> >
>> > Nobody (including you) caught this.difference between nbd and other
>> > sock_allow_reclassification() callers.
>> >
>> Nope, actually it raises the question -- does the deadlock still remain
>> after your fix without the lock key you added applied?
>
>LOCKDEP might have a false positive, but it will be much much harder to trigger.
>
>I had about 50 syzbot duplicates (that I did not release) before d532cddb6c60
> ("nbd: Reclassify sockets to avoid lockdep circular dependency").
>
>>
>> > What was the "correct fix" you envisioned exactly?
>> >
>> Frankly I had no evidence against your fix a couple days back, but now I
>> see your lock key approach fails to take off. And the correct fix is to
>> erase the incorrect locking order ffa1e7ada456 tries to catch, more
>> difficult than you thought so far.
>
>Which incorrect locking order are you referring to? This is a LOCKDEP
>false positive.
>
In addition to 50 syzbot reports, your fix has a Fixes tag, no?
>I suggest you send a patch so we can discuss it.
The deadlock existed before ffa1e7ada456, why is a chance left for your fix?
^ permalink raw reply
* Re: [PATCH 1/2] blk-cgroup: fix blkg leak in blkg_create() error path
From: Tao Cui @ 2026-06-23 1:16 UTC (permalink / raw)
To: Zizhi Wo, axboe, tj, josef, linux-block
Cc: cui.tao, cgroups, yangerkun, chengzhihao1, houtao1, yukuai
In-Reply-To: <20260622070714.1158886-2-wozizhi@huaweicloud.com>
Hi Zizhi,
Thanks for the patch. I ran into the same issue and posted a fix for it
earlier:
https://lore.kernel.org/all/20260507061229.57466-1-cuitao@kylinos.cn/
The leak fix is identical to yours (blkg_put() -> percpu_ref_kill()),
plus one extra change: moving blkg->online = true into the success
block:
if (likely(!ret)) {
...
+ blkg->online = true;
}
- blkg->online = true;
On the failure path the blkg was never inserted into any list, and its
blkg->pd[i]->online flags were not set either (those are in the same
block). Leaving blkg->online = true marks a blkg as online that was
never created -- inconsistent with pd[]->online and with
blkg_destroy(), which clears blkg->online = false. Not observable
today, since the failed blkg is on no list and unreachable by the
online readers, but the flag should track the actual insertion.
(This was sent to the cgroups list rather than linux-block, hence the
overlap.)
Thanks,
Tao
在 2026/6/22 15:07, Zizhi Wo 写道:
> When radix_tree_insert() fails in blkg_create(), the error path calls
> blkg_put() to release the blkg. This was correct when blkg->refcnt was an
> atomic_t: blkg_put() dropped it to 0 and triggered the release path.
>
> But commit 7fcf2b033b84 ("blkcg: change blkg reference counting to use
> percpu_ref") switched refcnt to a percpu_ref. In percpu mode
> percpu_ref_put() never checks for zero, so the release callback is never
> invoked. This blkg is on neither blkcg->blkg_list nor queue->blkg_list, so
> blkg_destroy_all() / blkcg_destroy_blkgs() can never reach it to call
> blkg_destroy()->percpu_ref_kill() either, cause the leak.
>
> Fix it by killing the percpu_ref instead, which switches it to atomic mode
> and drops the initial ref.
>
> Fixes: 7fcf2b033b84 ("blkcg: change blkg reference counting to use percpu_ref")
> Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
> Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
> ---
> block/blk-cgroup.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index bc63bd220865..6386fe413994 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -437,11 +437,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk,
>
> if (!ret)
> return blkg;
>
> /* @blkg failed fully initialized, use the usual release path */
> - blkg_put(blkg);
> + percpu_ref_kill(&blkg->refcnt);
> return ERR_PTR(ret);
>
> err_put_css:
> css_put(&blkcg->css);
> err_free_blkg:
^ permalink raw reply
* [PATCH v2] block: serialize elevator changes for the same queue using a writer lock
From: Shin'ichiro Kawasaki @ 2026-06-23 1:32 UTC (permalink / raw)
To: linux-block, Jens Axboe; +Cc: Ming Lei, Nilay Shroff, Shin'ichiro Kawasaki
When elevator_change() is called concurrently for the same queue, the
elevator_change_done() function runs concurrently as well. This function
adds or deletes kobjects for the debugfs entry of the queue. Then the
concurrent calls cause memory corruption of the kobjects and result in a
process hang. The core part of the elevator switch is protected by queue
freeze and q->elevator_lock. However, since the commit 559dc11143eb
("block: move elv_register[unregister]_queue out of elevator_lock"), the
elevator_change_done() is not serialized. Hence the memory corruption
and the hang.
The failures are observed when udev-worker writes to a sysfs
queue/scheduler attribute file while the blktests test case block/005
writes to the same attribute file. The failure also can be recreated by
running two processes that write to the same queue/scheduler file
concurrently. The failure is observed since another commit 370ac285f23a
("block: avoid cpu_hotplug_lock depedency on freeze_lock"). This commit
changed the behavior of queue freeze and it unveiled the failure.
Fix the failure by changing elv_iosched_store() to acquire
update_nr_hwq_lock as the writer lock instead of the reader lock. This
serializes the whole elevator switch steps, including the
elevator_change_done() call.
Fixes: 559dc11143eb ("block: move elv_register[unregister]_queue out of elevator_lock")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
I observed that the blktests test case block/005 hung on a specific
server hardware using a specific HDD as a block device. During the test
case run, the kernel reported KASAN null-ptr-deref and slab-use-after-
free errors. The failure happened when a sysfs queue/scheduler attribute
file is written concurrently. I reported the failure and shared a
candidate fix patch as RFC [1]. Based on the comments and discussion on
the RFC patch, I propose this v2 patch that avoids introducing a new
lock. My thanks go to Ming and Nilay for the discussion.
Please refer to [1] for details of the failure. Also, I created a
blktests test case that recreates the hang [2], which I used to test the
fix.
* Changes from RFC v1
- Instead of adding a new mutex to struct request_queue, replace the
reader lock on update_nr_hwq_lock with the writer lock in
elv_iosched_store().
[1] https://lore.kernel.org/linux-block/20260611074200.474676-1-shinichiro.kawasaki@wdc.com/
[2] https://github.com/kawasaki/blktests/commit/8e80b3ccc0bbbe3f209d00eacd138d020de97fc6
block/elevator.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/elevator.c b/block/elevator.c
index 3bcd37c2aa34..b03185a217ff 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -813,7 +813,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
* update_nr_hwq_lock -> kn->active (via del_gendisk -> kobject_del)
* kn->active -> update_nr_hwq_lock (via this sysfs write path)
*/
- if (!down_read_trylock(&set->update_nr_hwq_lock)) {
+ if (!down_write_trylock(&set->update_nr_hwq_lock)) {
ret = -EBUSY;
goto out;
}
@@ -824,7 +824,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
} else {
ret = -ENOENT;
}
- up_read(&set->update_nr_hwq_lock);
+ up_write(&set->update_nr_hwq_lock);
out:
if (ctx.type)
--
2.54.0
^ permalink raw reply related
* Re: [PATCH] block/cgroup: Drop stale -EBUSY retry from blkg_conf_prep()
From: Tao Cui @ 2026-06-23 1:33 UTC (permalink / raw)
To: Yang Xiuwei, Tejun Heo, Josef Bacik, Jens Axboe
Cc: cui.tao, cgroups, linux-block
In-Reply-To: <20260622085623.520209-1-yangxiuwei@kylinos.cn>
在 2026/6/22 16:56, Yang Xiuwei 写道:
> Since commit 8f4236d9008b ("block: remove QUEUE_FLAG_BYPASS and
> ->bypass") nothing in the blkcg blkg lookup/creation path
> returns -EBUSY anymore...
Correct. I traced every error path in blkg_conf_prep() (and blkg_create()
underneath it): the only possible values are -EINVAL, -EOPNOTSUPP, -ENOMEM,
-ENODEV and -EEXIST (from radix_tree_insert). The -EBUSY source was indeed
the blk_queue_bypass() check removed by 8f4236d9008b, so the retry branch
has been dead since 2018. Clean removal with no behavioral change.
Reviewed-by: Tao Cui <cuitao@kylinos.cn>
^ permalink raw reply
* Re: [PATCH 1/2] blk-cgroup: fix blkg leak in blkg_create() error path
From: Zizhi Wo @ 2026-06-23 1:38 UTC (permalink / raw)
To: Tao Cui, axboe, tj, josef, linux-block
Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai
In-Reply-To: <38704548-786f-4ec7-afd4-228aa8d68ad7@linux.dev>
在 2026/6/23 9:16, Tao Cui 写道:
> Hi Zizhi,
>
> Thanks for the patch. I ran into the same issue and posted a fix for it
> earlier:
>
> https://lore.kernel.org/all/20260507061229.57466-1-cuitao@kylinos.cn/
>
> The leak fix is identical to yours (blkg_put() -> percpu_ref_kill()),
> plus one extra change: moving blkg->online = true into the success
> block:
>
> if (likely(!ret)) {
> ...
> + blkg->online = true;
> }
> - blkg->online = true;
>
> On the failure path the blkg was never inserted into any list, and its
> blkg->pd[i]->online flags were not set either (those are in the same
> block). Leaving blkg->online = true marks a blkg as online that was
> never created -- inconsistent with pd[]->online and with
> blkg_destroy(), which clears blkg->online = false. Not observable
> today, since the failed blkg is on no list and unreachable by the
> online readers, but the flag should track the actual insertion.
>
> (This was sent to the cgroups list rather than linux-block, hence the
> overlap.)
>
> Thanks,
> Tao
I'm not subscribed to the cgroup mailing list, so I didn't see that this
issue had already been fixed. :( And indeed, your patch nicely updates
blkg->online as well. — I hadn't realized that.
Thanks for the heads-up!
Thanks,
Zizhi Wo
>
> 在 2026/6/22 15:07, Zizhi Wo 写道:
>> When radix_tree_insert() fails in blkg_create(), the error path calls
>> blkg_put() to release the blkg. This was correct when blkg->refcnt was an
>> atomic_t: blkg_put() dropped it to 0 and triggered the release path.
>>
>> But commit 7fcf2b033b84 ("blkcg: change blkg reference counting to use
>> percpu_ref") switched refcnt to a percpu_ref. In percpu mode
>> percpu_ref_put() never checks for zero, so the release callback is never
>> invoked. This blkg is on neither blkcg->blkg_list nor queue->blkg_list, so
>> blkg_destroy_all() / blkcg_destroy_blkgs() can never reach it to call
>> blkg_destroy()->percpu_ref_kill() either, cause the leak.
>>
>> Fix it by killing the percpu_ref instead, which switches it to atomic mode
>> and drops the initial ref.
>>
>> Fixes: 7fcf2b033b84 ("blkcg: change blkg reference counting to use percpu_ref")
>> Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
>> Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
>> ---
>> block/blk-cgroup.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
>> index bc63bd220865..6386fe413994 100644
>> --- a/block/blk-cgroup.c
>> +++ b/block/blk-cgroup.c
>> @@ -437,11 +437,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk,
>>
>> if (!ret)
>> return blkg;
>>
>> /* @blkg failed fully initialized, use the usual release path */
>> - blkg_put(blkg);
>> + percpu_ref_kill(&blkg->refcnt);
>> return ERR_PTR(ret);
>>
>> err_put_css:
>> css_put(&blkcg->css);
>> err_free_blkg:
^ permalink raw reply
* [PATCH blktests v2 1/2] src/miniublk: switch to ioctl-encoded ublk commands
From: Sebastian Chlad @ 2026-06-23 3:27 UTC (permalink / raw)
To: linux-block; +Cc: shinichiro.kawasaki, Sebastian Chlad
Kernels built without CONFIG_BLKDEV_UBLK_LEGACY_OPCODES reject the
legacy raw UBLK_CMD_* and UBLK_IO_* opcodes. Switch miniublk to use
the ioctl-encoded UBLK_U_CMD_* and UBLK_U_IO_* variants defined in
linux/ublk_cmd.h instead.
For IO commands, the ioctl-encoded opcode is used for submission while
_IOC_NR() extracts the raw NR bits for build_user_data(), keeping the
user_data tag encoding intact.
Signed-off-by: Sebastian Chlad <sebastian.chlad@suse.com>
Co-authored-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
changes in v2:
1. Makefile as prepared by Shin'ichiro Kawasaki
2. restored else if to check if (io->flags & UBLKSRV_NEED_FETCH_RQ)
src/Makefile | 9 +++++++++
src/miniublk.c | 28 ++++++++++++++--------------
2 files changed, 23 insertions(+), 14 deletions(-)
diff --git a/src/Makefile b/src/Makefile
index d8833bf..adfe3ef 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -8,6 +8,10 @@ HAVE_C_MACRO = $(shell if echo "$(H)include <$(1)>" | \
$(CC) $(CFLAGS) -E - 2>&1 /dev/null | grep $(2) > /dev/null 2>&1; \
then echo 1;else echo 0; fi)
+HAVE_C_DEF = $(shell if echo -e "$(H)include <$(1)>\n#ifdef $(2)\nHAVE_$(2)\n#endif" | \
+ $(CC) $(CFLAGS) -E - 2>&1 /dev/null | grep HAVE_$(2) > /dev/null 2>&1; \
+ then echo 1;else echo 0; fi)
+
C_TARGETS := \
dio-offsets \
loblksize \
@@ -27,6 +31,7 @@ C_UBLK_TARGETS := miniublk
HAVE_LIBURING := $(call HAVE_C_MACRO,liburing.h,IORING_OP_URING_CMD)
HAVE_UBLK_HEADER := $(call HAVE_C_HEADER,linux/ublk_cmd.h,1)
+HAVE_NEW_UBLK_INTF := $(call HAVE_C_DEF,linux/ublk_cmd.h,UBLK_U_CMD_START_DEV)
CXX_TARGETS := \
discontiguous-io
@@ -37,8 +42,12 @@ SYZKALLER_TARGETS := \
TARGETS := $(C_TARGETS) $(CXX_TARGETS) $(SYZKALLER_TARGETS)
ifeq ($(HAVE_UBLK_HEADER), 1)
+ifeq ($(HAVE_NEW_UBLK_INTF), 1)
C_URING_TARGETS += $(C_UBLK_TARGETS)
else
+$(info Skip $(C_UBLK_TARGETS) build due to missing new ublk interface(v6.4+))
+endif
+else
$(info Skip $(C_UBLK_TARGETS) build due to missing kernel header(v6.0+))
endif
diff --git a/src/miniublk.c b/src/miniublk.c
index f98f850..628207a 100644
--- a/src/miniublk.c
+++ b/src/miniublk.c
@@ -294,7 +294,7 @@ static int __ublk_ctrl_cmd(struct ublk_dev *dev,
int ublk_ctrl_stop_dev(struct ublk_dev *dev)
{
struct ublk_ctrl_cmd_data data = {
- .cmd_op = UBLK_CMD_STOP_DEV,
+ .cmd_op = UBLK_U_CMD_STOP_DEV,
};
return __ublk_ctrl_cmd(dev, &data);
@@ -304,7 +304,7 @@ int ublk_ctrl_start_dev(struct ublk_dev *dev,
int daemon_pid)
{
struct ublk_ctrl_cmd_data data = {
- .cmd_op = UBLK_CMD_START_DEV,
+ .cmd_op = UBLK_U_CMD_START_DEV,
.flags = CTRL_CMD_HAS_DATA,
};
@@ -316,7 +316,7 @@ int ublk_ctrl_start_dev(struct ublk_dev *dev,
int ublk_ctrl_add_dev(struct ublk_dev *dev)
{
struct ublk_ctrl_cmd_data data = {
- .cmd_op = UBLK_CMD_ADD_DEV,
+ .cmd_op = UBLK_U_CMD_ADD_DEV,
.flags = CTRL_CMD_HAS_BUF,
.addr = (__u64)&dev->dev_info,
.len = sizeof(struct ublksrv_ctrl_dev_info),
@@ -328,7 +328,7 @@ int ublk_ctrl_add_dev(struct ublk_dev *dev)
int ublk_ctrl_del_dev(struct ublk_dev *dev)
{
struct ublk_ctrl_cmd_data data = {
- .cmd_op = UBLK_CMD_DEL_DEV,
+ .cmd_op = UBLK_U_CMD_DEL_DEV,
.flags = 0,
};
@@ -338,7 +338,7 @@ int ublk_ctrl_del_dev(struct ublk_dev *dev)
int ublk_ctrl_get_info(struct ublk_dev *dev)
{
struct ublk_ctrl_cmd_data data = {
- .cmd_op = UBLK_CMD_GET_DEV_INFO,
+ .cmd_op = UBLK_U_CMD_GET_DEV_INFO,
.flags = CTRL_CMD_HAS_BUF,
.addr = (__u64)&dev->dev_info,
.len = sizeof(struct ublksrv_ctrl_dev_info),
@@ -351,7 +351,7 @@ int ublk_ctrl_set_params(struct ublk_dev *dev,
struct ublk_params *params)
{
struct ublk_ctrl_cmd_data data = {
- .cmd_op = UBLK_CMD_SET_PARAMS,
+ .cmd_op = UBLK_U_CMD_SET_PARAMS,
.flags = CTRL_CMD_HAS_BUF,
.addr = (__u64)params,
.len = sizeof(*params),
@@ -364,7 +364,7 @@ static int ublk_ctrl_get_params(struct ublk_dev *dev,
struct ublk_params *params)
{
struct ublk_ctrl_cmd_data data = {
- .cmd_op = UBLK_CMD_GET_PARAMS,
+ .cmd_op = UBLK_U_CMD_GET_PARAMS,
.flags = CTRL_CMD_HAS_BUF,
.addr = (__u64)params,
.len = sizeof(*params),
@@ -378,7 +378,7 @@ static int ublk_ctrl_get_params(struct ublk_dev *dev,
static int ublk_ctrl_start_user_recover(struct ublk_dev *dev)
{
struct ublk_ctrl_cmd_data data = {
- .cmd_op = UBLK_CMD_START_USER_RECOVERY,
+ .cmd_op = UBLK_U_CMD_START_USER_RECOVERY,
.flags = 0,
};
@@ -389,7 +389,7 @@ static int ublk_ctrl_end_user_recover(struct ublk_dev *dev,
int daemon_pid)
{
struct ublk_ctrl_cmd_data data = {
- .cmd_op = UBLK_CMD_END_USER_RECOVERY,
+ .cmd_op = UBLK_U_CMD_END_USER_RECOVERY,
.flags = CTRL_CMD_HAS_DATA,
};
@@ -624,9 +624,9 @@ static int ublk_queue_io_cmd(struct ublk_queue *q,
return 0;
if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP)
- cmd_op = UBLK_IO_COMMIT_AND_FETCH_REQ;
+ cmd_op = UBLK_U_IO_COMMIT_AND_FETCH_REQ;
else if (io->flags & UBLKSRV_NEED_FETCH_RQ)
- cmd_op = UBLK_IO_FETCH_REQ;
+ cmd_op = UBLK_U_IO_FETCH_REQ;
sqe = io_uring_get_sqe(&q->ring);
if (!sqe) {
@@ -637,7 +637,7 @@ static int ublk_queue_io_cmd(struct ublk_queue *q,
cmd = (struct ublksrv_io_cmd *)ublk_get_sqe_cmd(sqe);
- if (cmd_op == UBLK_IO_COMMIT_AND_FETCH_REQ)
+ if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP)
cmd->result = io->result;
/* These fields should be written once, never change */
@@ -650,7 +650,7 @@ static int ublk_queue_io_cmd(struct ublk_queue *q,
cmd->addr = (__u64)io->buf_addr;
cmd->q_id = q->q_id;
- user_data = build_user_data(tag, cmd_op, 0, 0);
+ user_data = build_user_data(tag, _IOC_NR(cmd_op), 0, 0);
io_uring_sqe_set_data64(sqe, user_data);
io->flags = 0;
@@ -658,7 +658,7 @@ static int ublk_queue_io_cmd(struct ublk_queue *q,
q->cmd_inflight += 1;
ublk_dbg(UBLK_DBG_IO_CMD, "%s: (qid %d tag %u cmd_op %u) iof %x stopping %d\n",
- __func__, q->q_id, tag, cmd_op,
+ __func__, q->q_id, tag, _IOC_NR(cmd_op),
io->flags, !!(q->state & UBLKSRV_QUEUE_STOPPING));
return 1;
}
--
2.51.0
^ permalink raw reply related
* [PATCH blktests v2 2/2] src/miniublk: fall back to legacy opcodes on older kernels
From: Sebastian Chlad @ 2026-06-23 3:27 UTC (permalink / raw)
To: linux-block; +Cc: shinichiro.kawasaki, Sebastian Chlad
In-Reply-To: <20260623032707.14439-1-sebastian.chlad@suse.com>
Try ioctl-encoded ADD_DEV and GET_DEV_INFO first; if either fails,
retry with the legacy raw opcode. After a successful bootstrap
command, derive use_ioctl from UBLK_F_CMD_IOCTL_ENCODE in dev_info.flags
so all subsequent control and IO commands use the mode reported by the
kernel.
Signed-off-by: Sebastian Chlad <sebastian.chlad@suse.com>
---
src/miniublk.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 42 insertions(+), 5 deletions(-)
diff --git a/src/miniublk.c b/src/miniublk.c
index 628207a..b0c308b 100644
--- a/src/miniublk.c
+++ b/src/miniublk.c
@@ -112,6 +112,7 @@ struct ublk_dev {
int fds[2]; /* fds[0] points to /dev/ublkcN */
int nr_fds;
int ctrl_fd;
+ bool use_ioctl;
struct io_uring ring;
};
@@ -235,7 +236,7 @@ static inline int ublk_setup_ring(struct io_uring *r, int depth,
static inline void ublk_ctrl_init_cmd(struct ublk_dev *dev,
struct io_uring_sqe *sqe,
- struct ublk_ctrl_cmd_data *data)
+ struct ublk_ctrl_cmd_data *data, __u32 cmd_op)
{
struct ublksrv_ctrl_dev_info *info = &dev->dev_info;
struct ublksrv_ctrl_cmd *cmd = (struct ublksrv_ctrl_cmd *)ublk_get_sqe_cmd(sqe);
@@ -255,25 +256,34 @@ static inline void ublk_ctrl_init_cmd(struct ublk_dev *dev,
cmd->dev_id = info->dev_id;
cmd->queue_id = -1;
- ublk_set_sqe_cmd_op(sqe, data->cmd_op);
+ ublk_set_sqe_cmd_op(sqe, cmd_op);
io_uring_sqe_set_data(sqe, cmd);
}
+static void ublk_update_ioctl_encoding(struct ublk_dev *dev)
+{
+ dev->use_ioctl = !!(dev->dev_info.flags & UBLK_F_CMD_IOCTL_ENCODE);
+}
+
static int __ublk_ctrl_cmd(struct ublk_dev *dev,
struct ublk_ctrl_cmd_data *data)
{
struct io_uring_sqe *sqe;
struct io_uring_cqe *cqe;
+ __u32 cmd_op = data->cmd_op;
int ret = -EINVAL;
+ if (!dev->use_ioctl)
+ cmd_op = _IOC_NR(cmd_op);
+
sqe = io_uring_get_sqe(&dev->ring);
if (!sqe) {
ublk_err("%s: can't get sqe ret %d\n", __func__, ret);
return ret;
}
- ublk_ctrl_init_cmd(dev, sqe, data);
+ ublk_ctrl_init_cmd(dev, sqe, data, cmd_op);
ret = io_uring_submit(&dev->ring);
if (ret < 0) {
@@ -321,8 +331,19 @@ int ublk_ctrl_add_dev(struct ublk_dev *dev)
.addr = (__u64)&dev->dev_info,
.len = sizeof(struct ublksrv_ctrl_dev_info),
};
+ int ret;
- return __ublk_ctrl_cmd(dev, &data);
+ ret = __ublk_ctrl_cmd(dev, &data);
+ if (ret < 0) {
+ /* retry with legacy opcode on older kernels */
+ dev->use_ioctl = false;
+ ret = __ublk_ctrl_cmd(dev, &data);
+ }
+
+ if (ret >= 0)
+ ublk_update_ioctl_encoding(dev);
+
+ return ret;
}
int ublk_ctrl_del_dev(struct ublk_dev *dev)
@@ -343,8 +364,19 @@ int ublk_ctrl_get_info(struct ublk_dev *dev)
.addr = (__u64)&dev->dev_info,
.len = sizeof(struct ublksrv_ctrl_dev_info),
};
+ int ret;
- return __ublk_ctrl_cmd(dev, &data);
+ ret = __ublk_ctrl_cmd(dev, &data);
+ if (ret < 0 && dev->use_ioctl) {
+ /* retry with legacy opcode on older kernels */
+ dev->use_ioctl = false;
+ ret = __ublk_ctrl_cmd(dev, &data);
+ }
+
+ if (ret >= 0)
+ ublk_update_ioctl_encoding(dev);
+
+ return ret;
}
int ublk_ctrl_set_params(struct ublk_dev *dev,
@@ -453,6 +485,8 @@ static struct ublk_dev *ublk_ctrl_init()
struct ublksrv_ctrl_dev_info *info = &dev->dev_info;
int ret;
+ dev->use_ioctl = true; /* use ioctl opcodes by default */
+
dev->ctrl_fd = open(CTRL_DEV, O_RDWR);
if (dev->ctrl_fd < 0) {
ublk_err("control dev %s can't be opened: %m %d\n", CTRL_DEV, errno);
@@ -628,6 +662,9 @@ static int ublk_queue_io_cmd(struct ublk_queue *q,
else if (io->flags & UBLKSRV_NEED_FETCH_RQ)
cmd_op = UBLK_U_IO_FETCH_REQ;
+ if (!q->dev->use_ioctl)
+ cmd_op = _IOC_NR(cmd_op);
+
sqe = io_uring_get_sqe(&q->ring);
if (!sqe) {
ublk_err("%s: run out of sqe %d, tag %d\n",
--
2.51.0
^ permalink raw reply related
* Re: [PATCH v2] block: serialize elevator changes for the same queue using a writer lock
From: Nilay Shroff @ 2026-06-23 5:29 UTC (permalink / raw)
To: Shin'ichiro Kawasaki, linux-block, Jens Axboe; +Cc: Ming Lei
In-Reply-To: <20260623013238.642052-1-shinichiro.kawasaki@wdc.com>
On 6/23/26 7:02 AM, Shin'ichiro Kawasaki wrote:
> When elevator_change() is called concurrently for the same queue, the
> elevator_change_done() function runs concurrently as well. This function
> adds or deletes kobjects for the debugfs entry of the queue. Then the
> concurrent calls cause memory corruption of the kobjects and result in a
> process hang. The core part of the elevator switch is protected by queue
> freeze and q->elevator_lock. However, since the commit 559dc11143eb
> ("block: move elv_register[unregister]_queue out of elevator_lock"), the
> elevator_change_done() is not serialized. Hence the memory corruption
> and the hang.
>
> The failures are observed when udev-worker writes to a sysfs
> queue/scheduler attribute file while the blktests test case block/005
> writes to the same attribute file. The failure also can be recreated by
> running two processes that write to the same queue/scheduler file
> concurrently. The failure is observed since another commit 370ac285f23a
> ("block: avoid cpu_hotplug_lock depedency on freeze_lock"). This commit
> changed the behavior of queue freeze and it unveiled the failure.
>
> Fix the failure by changing elv_iosched_store() to acquire
> update_nr_hwq_lock as the writer lock instead of the reader lock. This
> serializes the whole elevator switch steps, including the
> elevator_change_done() call.
>
> Fixes: 559dc11143eb ("block: move elv_register[unregister]_queue out of elevator_lock")
> Signed-off-by: Shin'ichiro Kawasaki<shinichiro.kawasaki@wdc.com>
Looks good to me.
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
^ permalink raw reply
* [BUG] nbd: backend string memory leaked when device_create_file() fails in nbd_genl_connect()
From: Peiyang He @ 2026-06-23 5:51 UTC (permalink / raw)
To: josef, linux-block; +Cc: axboe, nbd, linux-kernel, syzkaller
Hello Linux kernel developers and maintainers,
I found a memory leak in drivers/block/nbd.c when fuzzing with Syzkaller.
Kernel version: commit 8cd9520d35a6c38db6567e97dd93b1f11f185dc6 (tag v7.1).
And the leak is also possible in the current mainline.
Relevant kernel config:
CONFIG_BLK_DEV_NBD=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_FAULT_INJECTION=y
CONFIG_FAILSLAB=y
CONFIG_FAULT_INJECTION_DEBUG_FS=y
=============================
The original Syzkaller report
=============================
BUG: memory leak
unreferenced object 0xffff888011d17420 (size 16):
comm "syz.3.80", pid 9959, jiffies 4294977483
hex dump (first 16 bytes):
2f 64 65 76 2f 6e 62 64 23 00 00 00 00 00 00 00 /dev/nbd#.......
backtrace (crc 889be63d):
kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline]
slab_post_alloc_hook mm/slub.c:4575 [inline]
slab_alloc_node mm/slub.c:4899 [inline]
__do_kmalloc_node mm/slub.c:5295 [inline]
__kmalloc_noprof+0x552/0x850 mm/slub.c:5308
kmalloc_noprof include/linux/slab.h:954 [inline]
nla_strdup+0xc6/0x150 lib/nlattr.c:816
nbd_genl_connect+0x1231/0x1c10 drivers/block/nbd.c:2224
genl_family_rcv_msg_doit+0x209/0x2f0 net/netlink/genetlink.c:1114
genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
genl_rcv_msg+0x55c/0x800 net/netlink/genetlink.c:1209
netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2555
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x584/0x850 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x8ba/0xd60 net/netlink/af_netlink.c:1899
sock_sendmsg_nosec net/socket.c:787 [inline]
__sock_sendmsg net/socket.c:802 [inline]
____sys_sendmsg+0x9eb/0xb80 net/socket.c:2699
___sys_sendmsg+0x134/0x1d0 net/socket.c:2753
__sys_sendmsg+0x16d/0x220 net/socket.c:2785
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x116/0x800 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
<<<<<<<<<<<<<<< tail report >>>>>>>>>>>>>>>
==========
Root cause
==========
nbd_genl_connect() allocates the backend string unconditionally BEFORE
creating the sysfs file:
/* drivers/block/nbd.c (v7.1, around line 2223) */
if (info->attrs[NBD_ATTR_BACKEND_IDENTIFIER]) {
nbd->backend = nla_strdup(...); /* (1) heap alloc */
if (!nbd->backend) {
ret = -ENOMEM;
goto out; /* (2) backend is NULL here, no leak */
}
}
ret = device_create_file(disk_to_dev(nbd->disk), &backend_attr);
if (ret) {
dev_err(...);
goto out; /* (3) LEAK: flag not yet set */
}
set_bit(NBD_RT_HAS_BACKEND_FILE, &config->runtime_flags); /* (4) skipped */
ret = nbd_start_device(nbd);
out:
...
mutex_unlock(&nbd->config_lock);
nbd_config_put(nbd);
...
The cleanup path in nbd_config_put() frees nbd->backend ONLY when
NBD_RT_HAS_BACKEND_FILE is set:
/* drivers/block/nbd.c (v7.1, around line 1445) */
if (test_and_clear_bit(NBD_RT_HAS_BACKEND_FILE,
&config->runtime_flags)) {
device_remove_file(disk_to_dev(nbd->disk), &backend_attr);
kfree(nbd->backend); /* never reached here if flag is not set */
nbd->backend = NULL;
}
When device_create_file() at step (3) fails (e.g. due to OOM or fault injection),
execution jumps to out before step (4) sets the flag.
nbd_config_put() is then called with config_refs dropping to zero,
but since NBD_RT_HAS_BACKEND_FILE is not set, it skips the kfree().
The string allocated at step (1) is therefore leaked.
The size of a single leak equals the length of the NBD_ATTR_BACKEND_IDENTIFIER
netlink string rounded up to the next slab boundary.
The nla_policy entry for this attribute has no .len constraint, so in theory a single connect call
could leak up to ~65507 bytes (NLA_STRING maximum).
During Syzkaller fuzzing, the fuzzer produced a 16-byte string.
============
Proposed Fix
============
Decouple device_remove_file() (which correctly requires the flag to avoid
removing a file that was never created) from kfree(nbd->backend) (which only
needs the pointer to be non-NULL):
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1445,9 +1445,9 @@ static void nbd_config_put(struct nbd_device *nbd)
if (test_and_clear_bit(NBD_RT_HAS_BACKEND_FILE,
&config->runtime_flags)) {
device_remove_file(disk_to_dev(nbd->disk), &backend_attr);
- kfree(nbd->backend);
- nbd->backend = NULL;
}
+ kfree(nbd->backend);
+ nbd->backend = NULL;
The flag still guards device_remove_file() so we never try to remove a sysfs
file that was never created. nbd->backend is always freed if non-NULL,
regardless of whether the file creation succeeded.
Note: the line number in the above diff is valid in the v7.1 kernel.
I will adjust it according to the mainline kernel if this patch is considered resonable.
Best,
Peiyang
^ permalink raw reply
* [PATCH v4 0/9] Fix missing fops.owner in Rust DRM/misc abstractions
From: Alvin Sun @ 2026-06-23 6:29 UTC (permalink / raw)
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
During tyr debugfs development, a kernel NULL pointer dereference was
encountered after `rmmod tyr` while gnome-shell still held /dev/card1 open:
```
[158827.868132] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[158827.868918] Mem abort info:
[158827.869177] ESR = 0x0000000086000004
[158827.869519] EC = 0x21: IABT (current EL), IL = 32 bits
[158827.870000] SET = 0, FnV = 0
[158827.870281] EA = 0, S1PTW = 0
[158827.870571] FSC = 0x04: level 0 translation fault
[158827.871043] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000108dec000
[158827.871623] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[158827.872242] Internal error: Oops: 0000000086000004 [#1] SMP
[158827.872246] Modules linked in: tyr sunrpc snd_soc_simple_card rk805_pwrkey snd_soc_simple_card_utils rtw88_8822bu display_connector rtw88_usb rtw88_8822b snd_soc_rockchip_i2s_tdm snd_soc_hdmi_codec
rtw88_core]
[158827.872337] CPU: 4 UID: 1000 PID: 11276 Comm: gnome-s:disk$0 Tainted: G N 7.1.0-rc1+ #331 PREEMPT
[158827.880534] Tainted: [N]=TEST
[158827.880535] Hardware name: FriendlyElec NanoPi R6C/NanoPi R6C, BIOS v1.1 04/09/2025
[158827.880538] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[158827.880542] pc : 0x0
[158827.880547] lr : _RNvNtCs257m05FHVbX_3tyr2vm8pt_unmap+0x8c/0x12c [tyr]
[158827.880578] sp : ffff800083c236b0
[158827.880579] x29: ffff800083c236d0 x28: ffff00013f8a0000 x27: 0000000000000000
[158827.880585] x26: 000000000000007c x25: ffff000108e6ed80 x24: 0000000000401000
[158827.880590] x23: 0000000000000000 x22: 0000000040000000 x21: 0000000000001000
[158827.880595] x20: ffff00010f778138 x19: 0000000000400000 x18: 00000000ffffffff
[158827.880600] x17: 000000040044ffff x16: 045000f2b5503510 x15: 0720072007200720
[158827.880606] x14: 0720072007200720 x13: 0000000000401000 x12: 0000000000400000
[158827.880611] x11: ffff800083c239d0 x10: ffff000141e4fd88 x9 : 0000000000000000
[158827.880615] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000400000
[158827.880620] x5 : ffff00013f8a0000 x4 : 0000000000000000 x3 : 0000000000000001
[158827.880625] x2 : 0000000000001000 x1 : 0000000000400000 x0 : ffff00010f778138
[158827.880630] Call trace:
[158827.880632] 0x0 (P)
[158827.880635] _RNvXs6_NtCs257m05FHVbX_3tyr2vmNtB5_9GpuVmDataNtNtNtCsgmSOfgXi5CZ_6kernel3drm5gpuvm11DriverGpuVm13sm_step_unmap+0x3c/0x120 [tyr]
[158827.891166] _RNvMs4_NtNtNtCsgmSOfgXi5CZ_6kernel3drm5gpuvm6sm_opsINtB7_5GpuVmNtNtCs257m05FHVbX_3tyr2vm9GpuVmDataE13sm_step_unmapB13_+0x18/0x34 [tyr]
[158827.891187] op_unmap_cb+0x78/0xb0
[158827.891196] __drm_gpuvm_sm_unmap+0x18c/0x1b4
[158827.891204] drm_gpuvm_sm_unmap+0x38/0x4c
[158827.891209] _RNvMs5_NtCs257m05FHVbX_3tyr2vmNtB5_2Vm7exec_op+0x1cc/0x254 [tyr]
[158827.894085] _RNvMs5_NtCs257m05FHVbX_3tyr2vmNtB5_2Vm11unmap_range+0x124/0x188 [tyr]
[158827.894105] _RINvNtCs5hGKnPbRUFW_4core3ptr13drop_in_placeNtNtCs257m05FHVbX_3tyr3gem8KernelBoEBK_+0x44/0xd8 [tyr]
[158827.894125] _RINvNtCs5hGKnPbRUFW_4core3ptr13drop_in_placeINtNtNtCsgmSOfgXi5CZ_6kernel5alloc4kvec3VecNtNtCs257m05FHVbX_3tyr2fw7SectionNtNtBL_9allocator7KmallocEEB1r_+0x3c/0x100 [tyr]
[158827.894147] _RINvNtCs5hGKnPbRUFW_4core3ptr13drop_in_placeINtNtNtCsgmSOfgXi5CZ_6kernel4sync3arc3ArcNtNtCs257m05FHVbX_3tyr2fw8FirmwareEEB1p_+0x94/0x190 [tyr]
[158827.894167] _RNvMs4_NtNtCsgmSOfgXi5CZ_6kernel3drm6deviceINtB5_6DeviceNtNtCs257m05FHVbX_3tyr6driver12TyrDrmDriverE7releaseBW_+0x30/0x98 [tyr]
[158827.899550] drm_dev_put.part.0+0x88/0xc0
[158827.899557] drm_minor_release+0x18/0x28
[158827.899562] drm_release+0x144/0x170
[158827.899567] __fput+0xe4/0x30c
[158827.899573] ____fput+0x14/0x20
[158827.899579] task_work_run+0x7c/0xe8
[158827.899586] do_exit+0x2a8/0xac4
[158827.899590] do_group_exit+0x34/0x90
[158827.899594] get_signal+0xaac/0xabc
[158827.899599] arch_do_signal_or_restart+0x90/0x3e8
[158827.899606] exit_to_user_mode_loop+0x140/0x1d0
[158827.899613] el0_svc+0x2f4/0x2f8
[158827.899620] el0t_64_sync_handler+0xa0/0xe4
[158827.899627] el0t_64_sync+0x198/0x19c
[158827.899632] ---[ end trace 0000000000000000 ]---
```
The root cause: `fops.owner` was `NULL` in Rust DRM drivers, so the kernel
never blocked module unloading while file descriptors were open. This leads to
use-after-free when drm_release (or other fops) is called on freed module code.
The series moves `THIS_MODULE` into the `ModuleMetadata` as a const, threads it
through `#[vtable]` to set `fops.owner` in DRM/miscdevice, and updates configfs
and rnull to use `this_module::<LocalModule>()`.
Assisted-by: opencode:glm-5.2
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
Changes in v4:
- Move module-related types into a new `rust/kernel/module.rs`.
- Migrate binder from the `module!`-generated `THIS_MODULE` static to
`this_module::<LocalModule>()`.
- Reorganise the series so that every commit builds independently, and
drop the legacy `THIS_MODULE` static once all users are migrated.
- Link to v3: https://lore.kernel.org/r/20260622-fix-fops-owner-v3-0-49d45cb37032@linux.dev
Changes in v3:
- Renamed vtable associated type `ThisModule` to `OwnerModule`
- Added `this_module()` helper for ergonomic `THIS_MODULE` access
- Refined vtable macro implementation: one-liner detection and single `defined_items` set
- Reordered commits to place doctest fallback before vtable auto-insert
- Link to v2: https://lore.kernel.org/r/20260521-fix-fops-owner-v2-0-fd99079c5a04@linux.dev
Changes in v2:
- Merged old `static THIS_MODULE` and v1's `MODULE_PTR` into a single
`ModuleMetadata::THIS_MODULE` const
- `#[vtable]` macro now auto-inserts `type ThisModule`, removing all per-driver
manual patches from v1
- Added configfs & rnull usage site updates and doctest `LocalModule` fallback
- Link to v1: https://lore.kernel.org/r/20260519-fix-fops-owner-v1-0-2ded9830da14@linux.dev
---
Alvin Sun (9):
rust: module: move module types into `module.rs`
rust: module: add `THIS_MODULE` const to `ModuleMetadata` trait
rust: doctest: add LocalModule fallback for #[vtable] ThisModule
rust: macros: auto-insert OwnerModule in #[vtable]
rust: drm: set fops.owner from driver module pointer
rust: miscdevice: set fops.owner from driver module pointer
rust: configfs: use `LocalModule` for `THIS_MODULE`
rust: binder: use `LocalModule` for `THIS_MODULE`
rust: macros: remove `THIS_MODULE` static from `module!`
drivers/android/binder/rust_binder_main.rs | 3 +-
drivers/block/rnull/configfs.rs | 6 +--
rust/kernel/auxiliary.rs | 2 +-
rust/kernel/configfs.rs | 8 +--
rust/kernel/drm/device.rs | 3 +-
rust/kernel/drm/gem/mod.rs | 4 +-
rust/kernel/i2c.rs | 2 +-
rust/kernel/lib.rs | 75 +++-------------------------
rust/kernel/miscdevice.rs | 4 +-
rust/kernel/module.rs | 79 ++++++++++++++++++++++++++++++
rust/kernel/net/phy.rs | 6 ++-
rust/kernel/pci.rs | 2 +-
rust/kernel/platform.rs | 2 +-
rust/kernel/usb.rs | 2 +-
rust/macros/lib.rs | 6 +++
rust/macros/module.rs | 34 ++++++-------
rust/macros/vtable.rs | 41 ++++++++++++++--
scripts/rustdoc_test_gen.rs | 16 ++++++
18 files changed, 187 insertions(+), 108 deletions(-)
---
base-commit: b7e5ac83cb16f7ffd11dc23736f84276602100ed
change-id: 20260519-fix-fops-owner-e3a77bb27c6c
prerequisite-change-id: 20260519-miscdev-use-format-9ab7e83b1c11:v3
prerequisite-patch-id: 405b334ff0d48ad350014f05a2321bdbaa025400
prerequisite-patch-id: 604b631c81d5423f4ebb2e12ba2d22e9ce371bfc
prerequisite-patch-id: cb550d94cefe01920e0d3ced2b2bcbecd76f3907
prerequisite-patch-id: 3bc830839742591460cb86d9472c04f4686dc600
prerequisite-patch-id: 571058244bc8c7088638d2e3225713011246c7e9
prerequisite-patch-id: 347c5a3c6dbef9832bfce8419fc23e6e08ba477f
prerequisite-patch-id: 3e202d988b56b88446f7535e90d3f00cf5f15701
Best regards,
--
Alvin Sun <alvin.sun@linux.dev>
^ permalink raw reply
* [PATCH v4 2/9] rust: module: add `THIS_MODULE` const to `ModuleMetadata` trait
From: Alvin Sun @ 2026-06-23 6:29 UTC (permalink / raw)
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260623-fix-fops-owner-v4-0-0daf5f077d5c@linux.dev>
Since `const_refs_to_static` has been stable as of the MSRV bump, a
`ThisModule` pointer can now be used in const contexts.
Add a `THIS_MODULE` const to the `ModuleMetadata` trait so that modules
can provide their `ThisModule` pointer in const contexts such as static
`file_operations`.
Add a `this_module()` helper to retrieve the `THIS_MODULE` pointer of a
given module type, and update `__init` to use it instead of the
`THIS_MODULE` static generated by the `module!` macro.
The `static THIS_MODULE` generated by the `module!` macro is retained
for backwards compatibility with existing users and removed in a later
patch once all references have been migrated.
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
rust/kernel/module.rs | 8 ++++++++
rust/macros/module.rs | 18 +++++++++++++++++-
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/module.rs b/rust/kernel/module.rs
index be242a82e86d2..5aca42f7a33fc 100644
--- a/rust/kernel/module.rs
+++ b/rust/kernel/module.rs
@@ -42,6 +42,14 @@ fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, crate::erro
pub trait ModuleMetadata {
/// The name of the module as specified in the `module!` macro.
const NAME: &'static crate::str::CStr;
+
+ /// The module's `THIS_MODULE` pointer.
+ const THIS_MODULE: ThisModule;
+}
+
+/// Returns a reference to the `THIS_MODULE` of the given module type.
+pub const fn this_module<M: ModuleMetadata>() -> &'static ThisModule {
+ &M::THIS_MODULE
}
/// Equivalent to `THIS_MODULE` in the C API.
diff --git a/rust/macros/module.rs b/rust/macros/module.rs
index 06c18e2075083..aa9a618d5d19e 100644
--- a/rust/macros/module.rs
+++ b/rust/macros/module.rs
@@ -519,6 +519,22 @@ pub(crate) fn module(info: ModuleInfo) -> Result<TokenStream> {
impl ::kernel::ModuleMetadata for #type_ {
const NAME: &'static ::kernel::str::CStr = #name_cstr;
+
+ #[cfg(MODULE)]
+ const THIS_MODULE: ::kernel::ThisModule = {
+ extern "C" {
+ static __this_module: ::kernel::types::Opaque<::kernel::bindings::module>;
+ }
+
+ // SAFETY: `__this_module` is constructed by the kernel at load time
+ // and lives until the module is unloaded.
+ unsafe { ::kernel::ThisModule::from_ptr(__this_module.get()) }
+ };
+
+ #[cfg(not(MODULE))]
+ const THIS_MODULE: ::kernel::ThisModule = unsafe {
+ ::kernel::ThisModule::from_ptr(::core::ptr::null_mut())
+ };
}
// Double nested modules, since then nobody can access the public items inside.
@@ -616,7 +632,7 @@ pub extern "C" fn #ident_exit() {
/// This function must only be called once.
unsafe fn __init() -> ::kernel::ffi::c_int {
let initer = <super::super::LocalModule as ::kernel::InPlaceModule>::init(
- &super::super::THIS_MODULE
+ ::kernel::module::this_module::<super::super::LocalModule>()
);
// SAFETY: No data race, since `__MOD` can only be accessed by this module
// and there only `__init` and `__exit` access it. These functions are only
--
2.43.0
^ permalink raw reply related
* [PATCH v4 1/9] rust: module: move module types into `module.rs`
From: Alvin Sun @ 2026-06-23 6:29 UTC (permalink / raw)
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260623-fix-fops-owner-v4-0-0daf5f077d5c@linux.dev>
Move `Module`, `InPlaceModule`, `ModuleMetadata` and `ThisModule` from
`lib.rs` into a new `rust/kernel/module.rs`. Re-export them from `lib.rs`
to avoid tree-wide changes.
Switch six bus driver registrations from `module.0` to the public
`ThisModule::as_ptr()` accessor, since the field is no longer visible
outside the new `module` submodule.
No functional change.
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
rust/kernel/auxiliary.rs | 2 +-
rust/kernel/i2c.rs | 2 +-
rust/kernel/lib.rs | 75 +++++-------------------------------------------
rust/kernel/module.rs | 71 +++++++++++++++++++++++++++++++++++++++++++++
rust/kernel/net/phy.rs | 6 +++-
rust/kernel/pci.rs | 2 +-
rust/kernel/platform.rs | 2 +-
rust/kernel/usb.rs | 2 +-
8 files changed, 88 insertions(+), 74 deletions(-)
diff --git a/rust/kernel/auxiliary.rs b/rust/kernel/auxiliary.rs
index 93c0db1f66555..4a02f83240be3 100644
--- a/rust/kernel/auxiliary.rs
+++ b/rust/kernel/auxiliary.rs
@@ -63,7 +63,7 @@ unsafe fn register(
// SAFETY: `adrv` is guaranteed to be a valid `DriverType`.
to_result(unsafe {
- bindings::__auxiliary_driver_register(adrv.get(), module.0, name.as_char_ptr())
+ bindings::__auxiliary_driver_register(adrv.get(), module.as_ptr(), name.as_char_ptr())
})
}
diff --git a/rust/kernel/i2c.rs b/rust/kernel/i2c.rs
index 7b908f0c5a58d..24eff08f47123 100644
--- a/rust/kernel/i2c.rs
+++ b/rust/kernel/i2c.rs
@@ -142,7 +142,7 @@ unsafe fn register(
}
// SAFETY: `idrv` is guaranteed to be a valid `DriverType`.
- to_result(unsafe { bindings::i2c_register_driver(module.0, idrv.get()) })
+ to_result(unsafe { bindings::i2c_register_driver(module.as_ptr(), idrv.get()) })
}
unsafe fn unregister(idrv: &Opaque<Self::DriverType>) {
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index b72b2fbe046d6..040ae85056509 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -93,6 +93,7 @@
pub mod maple_tree;
pub mod miscdevice;
pub mod mm;
+pub mod module;
pub mod module_param;
#[cfg(CONFIG_NET)]
pub mod net;
@@ -139,79 +140,17 @@
#[doc(hidden)]
pub use bindings;
pub use macros;
+pub use module::{
+ InPlaceModule,
+ Module,
+ ModuleMetadata,
+ ThisModule, //
+};
pub use uapi;
/// Prefix to appear before log messages printed from within the `kernel` crate.
const __LOG_PREFIX: &[u8] = b"rust_kernel\0";
-/// The top level entrypoint to implementing a kernel module.
-///
-/// For any teardown or cleanup operations, your type may implement [`Drop`].
-pub trait Module: Sized + Sync + Send {
- /// Called at module initialization time.
- ///
- /// Use this method to perform whatever setup or registration your module
- /// should do.
- ///
- /// Equivalent to the `module_init` macro in the C API.
- fn init(module: &'static ThisModule) -> error::Result<Self>;
-}
-
-/// A module that is pinned and initialised in-place.
-pub trait InPlaceModule: Sync + Send {
- /// Creates an initialiser for the module.
- ///
- /// It is called when the module is loaded.
- fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, error::Error>;
-}
-
-impl<T: Module> InPlaceModule for T {
- fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, error::Error> {
- let initer = move |slot: *mut Self| {
- let m = <Self as Module>::init(module)?;
-
- // SAFETY: `slot` is valid for write per the contract with `pin_init_from_closure`.
- unsafe { slot.write(m) };
- Ok(())
- };
-
- // SAFETY: On success, `initer` always fully initialises an instance of `Self`.
- unsafe { pin_init::pin_init_from_closure(initer) }
- }
-}
-
-/// Metadata attached to a [`Module`] or [`InPlaceModule`].
-pub trait ModuleMetadata {
- /// The name of the module as specified in the `module!` macro.
- const NAME: &'static crate::str::CStr;
-}
-
-/// Equivalent to `THIS_MODULE` in the C API.
-///
-/// C header: [`include/linux/init.h`](srctree/include/linux/init.h)
-pub struct ThisModule(*mut bindings::module);
-
-// SAFETY: `THIS_MODULE` may be used from all threads within a module.
-unsafe impl Sync for ThisModule {}
-
-impl ThisModule {
- /// Creates a [`ThisModule`] given the `THIS_MODULE` pointer.
- ///
- /// # Safety
- ///
- /// The pointer must be equal to the right `THIS_MODULE`.
- pub const unsafe fn from_ptr(ptr: *mut bindings::module) -> ThisModule {
- ThisModule(ptr)
- }
-
- /// Access the raw pointer for this module.
- ///
- /// It is up to the user to use it correctly.
- pub const fn as_ptr(&self) -> *mut bindings::module {
- self.0
- }
-}
-
#[cfg(not(testlib))]
#[panic_handler]
fn panic(info: &core::panic::PanicInfo<'_>) -> ! {
diff --git a/rust/kernel/module.rs b/rust/kernel/module.rs
new file mode 100644
index 0000000000000..be242a82e86d2
--- /dev/null
+++ b/rust/kernel/module.rs
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Module-related types and helpers.
+
+/// The entrypoint to implementing a kernel module.
+///
+/// For any teardown or cleanup operations, your type may implement [`Drop`].
+pub trait Module: Sized + Sync + Send {
+ /// Called at module initialization time.
+ ///
+ /// Use this method to perform whatever setup or registration your module
+ /// should do.
+ ///
+ /// Equivalent to the `module_init` macro in the C API.
+ fn init(module: &'static ThisModule) -> crate::error::Result<Self>;
+}
+
+/// A module that is pinned and initialised in-place.
+pub trait InPlaceModule: Sync + Send {
+ /// Creates an initialiser for the module.
+ ///
+ /// It is called when the module is loaded.
+ fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, crate::error::Error>;
+}
+
+impl<T: Module> InPlaceModule for T {
+ fn init(module: &'static ThisModule) -> impl pin_init::PinInit<Self, crate::error::Error> {
+ let initer = move |slot: *mut Self| {
+ let m = <Self as Module>::init(module)?;
+
+ // SAFETY: `slot` is valid for write per the contract with `pin_init_from_closure`.
+ unsafe { slot.write(m) };
+ Ok(())
+ };
+
+ // SAFETY: On success, `initer` always fully initialises an instance of `Self`.
+ unsafe { pin_init::pin_init_from_closure(initer) }
+ }
+}
+
+/// Metadata attached to a [`Module`] or [`InPlaceModule`].
+pub trait ModuleMetadata {
+ /// The name of the module as specified in the `module!` macro.
+ const NAME: &'static crate::str::CStr;
+}
+
+/// Equivalent to `THIS_MODULE` in the C API.
+///
+/// C header: [`include/linux/init.h`](srctree/include/linux/init.h)
+pub struct ThisModule(*mut crate::bindings::module);
+
+// SAFETY: `THIS_MODULE` may be used from all threads within a module.
+unsafe impl Sync for ThisModule {}
+
+impl ThisModule {
+ /// Creates a [`ThisModule`] given the `THIS_MODULE` pointer.
+ ///
+ /// # Safety
+ ///
+ /// The pointer must be equal to the right `THIS_MODULE`.
+ pub const unsafe fn from_ptr(ptr: *mut crate::bindings::module) -> ThisModule {
+ ThisModule(ptr)
+ }
+
+ /// Access the raw pointer for this module.
+ ///
+ /// It is up to the user to use it correctly.
+ pub const fn as_ptr(&self) -> *mut crate::bindings::module {
+ self.0
+ }
+}
diff --git a/rust/kernel/net/phy.rs b/rust/kernel/net/phy.rs
index 3ca99db5cccf2..8b7036b8fe480 100644
--- a/rust/kernel/net/phy.rs
+++ b/rust/kernel/net/phy.rs
@@ -659,7 +659,11 @@ pub fn register(
// the `drivers` slice are initialized properly. `drivers` will not be moved.
// So it's just an FFI call.
to_result(unsafe {
- bindings::phy_drivers_register(drivers[0].0.get(), drivers.len().try_into()?, module.0)
+ bindings::phy_drivers_register(
+ drivers[0].0.get(),
+ drivers.len().try_into()?,
+ module.as_ptr(),
+ )
})?;
// INVARIANT: The `drivers` slice is successfully registered to the kernel via `phy_drivers_register`.
Ok(Registration { drivers })
diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs
index af74ddff6114d..916ed2cb6b70b 100644
--- a/rust/kernel/pci.rs
+++ b/rust/kernel/pci.rs
@@ -86,7 +86,7 @@ unsafe fn register(
// SAFETY: `pdrv` is guaranteed to be a valid `DriverType`.
to_result(unsafe {
- bindings::__pci_register_driver(pdrv.get(), module.0, name.as_char_ptr())
+ bindings::__pci_register_driver(pdrv.get(), module.as_ptr(), name.as_char_ptr())
})
}
diff --git a/rust/kernel/platform.rs b/rust/kernel/platform.rs
index 8917d4ee499fb..9fdbafd53bc21 100644
--- a/rust/kernel/platform.rs
+++ b/rust/kernel/platform.rs
@@ -82,7 +82,7 @@ unsafe fn register(
}
// SAFETY: `pdrv` is guaranteed to be a valid `DriverType`.
- to_result(unsafe { bindings::__platform_driver_register(pdrv.get(), module.0) })
+ to_result(unsafe { bindings::__platform_driver_register(pdrv.get(), module.as_ptr()) })
}
unsafe fn unregister(pdrv: &Opaque<Self::DriverType>) {
diff --git a/rust/kernel/usb.rs b/rust/kernel/usb.rs
index 9c17a672cd275..213db32727c17 100644
--- a/rust/kernel/usb.rs
+++ b/rust/kernel/usb.rs
@@ -63,7 +63,7 @@ unsafe fn register(
// SAFETY: `udrv` is guaranteed to be a valid `DriverType`.
to_result(unsafe {
- bindings::usb_register_driver(udrv.get(), module.0, name.as_char_ptr())
+ bindings::usb_register_driver(udrv.get(), module.as_ptr(), name.as_char_ptr())
})
}
--
2.43.0
^ permalink raw reply related
* [PATCH v4 3/9] rust: doctest: add LocalModule fallback for #[vtable] ThisModule
From: Alvin Sun @ 2026-06-23 6:29 UTC (permalink / raw)
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260623-fix-fops-owner-v4-0-0daf5f077d5c@linux.dev>
Add a `LocalModule` struct with a null-pointer `ModuleMetadata` impl
in the doctest harness, so that `crate::LocalModule` (auto-inserted
by `#[vtable]`) resolves correctly when there is no `module!` macro.
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
scripts/rustdoc_test_gen.rs | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
index ee76e96b41eea..198af4e446c8c 100644
--- a/scripts/rustdoc_test_gen.rs
+++ b/scripts/rustdoc_test_gen.rs
@@ -239,6 +239,22 @@ macro_rules! assert_eq {{
const __LOG_PREFIX: &[u8] = b"rust_doctests_kernel\0";
+/// Dummy module type for doctest context.
+struct LocalModule;
+
+use kernel::{{
+ str::CStr,
+ ModuleMetadata,
+ ThisModule, //
+}};
+use core::ptr::null_mut;
+
+impl ModuleMetadata for LocalModule {{
+ const NAME: &'static CStr = c"rust_doctests_kernel";
+ // SAFETY: `try_module_get`/`module_put` handle null module pointers gracefully.
+ const THIS_MODULE: ThisModule = unsafe {{ ThisModule::from_ptr(null_mut()) }};
+}}
+
{rust_tests}
"#
)
--
2.43.0
^ permalink raw reply related
* [PATCH v4 4/9] rust: macros: auto-insert OwnerModule in #[vtable]
From: Alvin Sun @ 2026-06-23 6:29 UTC (permalink / raw)
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260623-fix-fops-owner-v4-0-0daf5f077d5c@linux.dev>
Auto-add `type OwnerModule: ::kernel::ModuleMetadata;` as a required
associated type on the trait side if not already defined, and
auto-insert `type OwnerModule = crate::LocalModule;` on the impl side
if not explicitly provided, eliminating the need to manually declare
and implement `OwnerModule` in every vtable trait and impl.
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Suggested-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/all/DIMMWHUOLPSH.13JFRHDKDQJGO@garyguo.net
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
rust/macros/lib.rs | 6 ++++++
rust/macros/vtable.rs | 41 ++++++++++++++++++++++++++++++++++++-----
2 files changed, 42 insertions(+), 5 deletions(-)
diff --git a/rust/macros/lib.rs b/rust/macros/lib.rs
index 2cfd59e0f9e7c..bc7ded353c5ca 100644
--- a/rust/macros/lib.rs
+++ b/rust/macros/lib.rs
@@ -176,6 +176,12 @@ pub fn module(input: TokenStream) -> TokenStream {
///
/// This macro should not be used when all functions are required.
///
+/// Additionally, this macro automatically handles the `OwnerModule`
+/// associated type: on the trait side, `type OwnerModule: ModuleMetadata;`
+/// is added as a required associated type if not already defined; on the
+/// impl side, `type OwnerModule = LocalModule;` is automatically inserted
+/// if not explicitly defined.
+///
/// # Examples
///
/// ```
diff --git a/rust/macros/vtable.rs b/rust/macros/vtable.rs
index c6510b0c4ea1d..be9a5ed8abe5e 100644
--- a/rust/macros/vtable.rs
+++ b/rust/macros/vtable.rs
@@ -30,6 +30,22 @@ fn handle_trait(mut item: ItemTrait) -> Result<ItemTrait> {
const USE_VTABLE_ATTR: ();
});
+ // Add `type OwnerModule: ModuleMetadata` as a required associated type if
+ // the trait does not already define it.
+ if !item
+ .items
+ .iter()
+ .any(|i| matches!(i, TraitItem::Type(t) if t.ident == "OwnerModule"))
+ {
+ gen_items.push(parse_quote! {
+ /// The module implementing this vtable trait.
+ ///
+ /// Automatically set to `crate::LocalModule` by the `#[vtable]`
+ /// impl macro.
+ type OwnerModule: ::kernel::ModuleMetadata;
+ });
+ }
+
for item in &item.items {
if let TraitItem::Fn(fn_item) = item {
let name = &fn_item.sig.ident;
@@ -57,12 +73,18 @@ fn handle_trait(mut item: ItemTrait) -> Result<ItemTrait> {
fn handle_impl(mut item: ItemImpl) -> Result<ItemImpl> {
let mut gen_items = Vec::new();
- let mut defined_consts = HashSet::new();
+ let mut defined_items = HashSet::new();
- // Iterate over all user-defined constants to gather any possible explicit overrides.
+ // Iterate over all user-defined items to gather any possible explicit overrides.
for item in &item.items {
- if let ImplItem::Const(const_item) = item {
- defined_consts.insert(const_item.ident.clone());
+ match item {
+ ImplItem::Const(const_item) => {
+ defined_items.insert(const_item.ident.clone());
+ }
+ ImplItem::Type(type_item) => {
+ defined_items.insert(type_item.ident.clone());
+ }
+ _ => {}
}
}
@@ -70,6 +92,15 @@ fn handle_impl(mut item: ItemImpl) -> Result<ItemImpl> {
const USE_VTABLE_ATTR: () = ();
});
+ // Auto-insert `type OwnerModule = crate::LocalModule` if not explicitly defined.
+ // `crate::LocalModule` resolves to the real module type (via `module!`) or a
+ // dummy fallback in non-module contexts (e.g., doctests).
+ if !defined_items.contains(&parse_quote!(OwnerModule)) {
+ gen_items.push(parse_quote! {
+ type OwnerModule = crate::LocalModule;
+ });
+ }
+
for item in &item.items {
if let ImplItem::Fn(fn_item) = item {
let name = &fn_item.sig.ident;
@@ -78,7 +109,7 @@ fn handle_impl(mut item: ItemImpl) -> Result<ItemImpl> {
name.span(),
);
// Skip if it's declared already -- this allows user override.
- if defined_consts.contains(&gen_const_name) {
+ if defined_items.contains(&gen_const_name) {
continue;
}
let cfg_attrs = crate::helpers::gather_cfg_attrs(&fn_item.attrs);
--
2.43.0
^ permalink raw reply related
* [PATCH v4 7/9] rust: configfs: use `LocalModule` for `THIS_MODULE`
From: Alvin Sun @ 2026-06-23 6:29 UTC (permalink / raw)
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260623-fix-fops-owner-v4-0-0daf5f077d5c@linux.dev>
Replace the `THIS_MODULE` static reference in the `configfs_attrs!`
macro with `this_module::<LocalModule>()`, and update
rnull to import `LocalModule` instead of `THIS_MODULE`, consistent
with the move of `THIS_MODULE` into the `ModuleMetadata` trait.
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
drivers/block/rnull/configfs.rs | 6 ++----
rust/kernel/configfs.rs | 8 +++++---
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index c10a55fc58948..b2547ad1e5ddd 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -1,9 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
-use super::{
- NullBlkDevice,
- THIS_MODULE, //
-};
+use super::NullBlkDevice;
+use crate::LocalModule;
use kernel::{
block::mq::gen_disk::{
GenDisk,
diff --git a/rust/kernel/configfs.rs b/rust/kernel/configfs.rs
index 2339c6467325d..b542422115461 100644
--- a/rust/kernel/configfs.rs
+++ b/rust/kernel/configfs.rs
@@ -875,7 +875,7 @@ fn as_ptr(&self) -> *const bindings::config_item_type {
/// configfs::Subsystem<Configuration>,
/// Configuration
/// >::new_with_child_ctor::<N,Child>(
-/// &THIS_MODULE,
+/// ::kernel::module::this_module::<LocalModule>(),
/// &CONFIGURATION_ATTRS
/// );
///
@@ -1021,7 +1021,8 @@ macro_rules! configfs_attrs {
static [< $data:upper _TPE >] : $crate::configfs::ItemType<$container, $data> =
$crate::configfs::ItemType::<$container, $data>::new::<N>(
- &THIS_MODULE, &[<$ data:upper _ATTRS >]
+ $crate::module::this_module::<LocalModule>(),
+ &[<$ data:upper _ATTRS >]
);
)?
@@ -1030,7 +1031,8 @@ macro_rules! configfs_attrs {
$crate::configfs::ItemType<$container, $data> =
$crate::configfs::ItemType::<$container, $data>::
new_with_child_ctor::<N, $child>(
- &THIS_MODULE, &[<$ data:upper _ATTRS >]
+ $crate::module::this_module::<LocalModule>(),
+ &[<$ data:upper _ATTRS >]
);
)?
--
2.43.0
^ permalink raw reply related
* [PATCH v4 5/9] rust: drm: set fops.owner from driver module pointer
From: Alvin Sun @ 2026-06-23 6:29 UTC (permalink / raw)
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260623-fix-fops-owner-v4-0-0daf5f077d5c@linux.dev>
Change `create_fops()` to accept an owner module pointer instead of
hardcoding `null_mut()`, ensuring the kernel correctly tracks the
module owning the DRM device's file operations.
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
rust/kernel/drm/device.rs | 3 ++-
rust/kernel/drm/gem/mod.rs | 4 ++--
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/rust/kernel/drm/device.rs b/rust/kernel/drm/device.rs
index 403fc35353c74..d92cacb665366 100644
--- a/rust/kernel/drm/device.rs
+++ b/rust/kernel/drm/device.rs
@@ -111,7 +111,8 @@ impl<T: drm::Driver> Device<T> {
fops: &Self::GEM_FOPS,
};
- const GEM_FOPS: bindings::file_operations = drm::gem::create_fops();
+ const GEM_FOPS: bindings::file_operations =
+ drm::gem::create_fops(crate::module::this_module::<T::OwnerModule>().as_ptr());
/// Create a new `drm::Device` for a `drm::Driver`.
pub fn new(dev: &device::Device, data: impl PinInit<T::Data, Error>) -> Result<ARef<Self>> {
diff --git a/rust/kernel/drm/gem/mod.rs b/rust/kernel/drm/gem/mod.rs
index 01b5bd47a3332..9a203efc59116 100644
--- a/rust/kernel/drm/gem/mod.rs
+++ b/rust/kernel/drm/gem/mod.rs
@@ -357,10 +357,10 @@ impl<T: DriverObject> AllocImpl for Object<T> {
};
}
-pub(super) const fn create_fops() -> bindings::file_operations {
+pub(super) const fn create_fops(owner: *mut bindings::module) -> bindings::file_operations {
let mut fops: bindings::file_operations = pin_init::zeroed();
- fops.owner = core::ptr::null_mut();
+ fops.owner = owner;
fops.open = Some(bindings::drm_open);
fops.release = Some(bindings::drm_release);
fops.unlocked_ioctl = Some(bindings::drm_ioctl);
--
2.43.0
^ permalink raw reply related
* [PATCH v4 6/9] rust: miscdevice: set fops.owner from driver module pointer
From: Alvin Sun @ 2026-06-23 6:29 UTC (permalink / raw)
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260623-fix-fops-owner-v4-0-0daf5f077d5c@linux.dev>
Set the miscdevice fops owner field from the driver module pointer
via the `this_module::<T::OwnerModule>()` helper, instead of
defaulting to null.
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
rust/kernel/miscdevice.rs | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs
index 83ce50def5ac9..2a4329f98614e 100644
--- a/rust/kernel/miscdevice.rs
+++ b/rust/kernel/miscdevice.rs
@@ -24,12 +24,13 @@
IovIterSource, //
},
mm::virt::VmaNew,
+ module::this_module,
prelude::*,
seq_file::SeqFile,
types::{
ForeignOwnable,
Opaque, //
- },
+ }, //
};
use core::marker::PhantomData;
@@ -430,6 +431,7 @@ impl<T: MiscDevice> MiscdeviceVTable<T> {
} else {
None
},
+ owner: this_module::<T::OwnerModule>().as_ptr(),
..pin_init::zeroed()
};
--
2.43.0
^ permalink raw reply related
* [PATCH v4 9/9] rust: macros: remove `THIS_MODULE` static from `module!`
From: Alvin Sun @ 2026-06-23 6:29 UTC (permalink / raw)
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260623-fix-fops-owner-v4-0-0daf5f077d5c@linux.dev>
All users have been migrated to `ModuleMetadata::THIS_MODULE` const or
`this_module::<LocalModule>()` helper. The `static THIS_MODULE`
generated by the `module!` macro is no longer referenced anywhere,
so remove it to avoid having two sources of the same `ThisModule`
pointer.
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
rust/macros/module.rs | 16 ----------------
1 file changed, 16 deletions(-)
diff --git a/rust/macros/module.rs b/rust/macros/module.rs
index aa9a618d5d19e..23b6a1b456b80 100644
--- a/rust/macros/module.rs
+++ b/rust/macros/module.rs
@@ -497,22 +497,6 @@ pub(crate) fn module(info: ModuleInfo) -> Result<TokenStream> {
/// Used by the printing macros, e.g. [`info!`].
const __LOG_PREFIX: &[u8] = #name_cstr.to_bytes_with_nul();
- // SAFETY: `__this_module` is constructed by the kernel at load time and will not be
- // freed until the module is unloaded.
- #[cfg(MODULE)]
- static THIS_MODULE: ::kernel::ThisModule = unsafe {
- extern "C" {
- static __this_module: ::kernel::types::Opaque<::kernel::bindings::module>;
- };
-
- ::kernel::ThisModule::from_ptr(__this_module.get())
- };
-
- #[cfg(not(MODULE))]
- static THIS_MODULE: ::kernel::ThisModule = unsafe {
- ::kernel::ThisModule::from_ptr(::core::ptr::null_mut())
- };
-
/// The `LocalModule` type is the type of the module created by `module!`,
/// `module_pci_driver!`, `module_platform_driver!`, etc.
type LocalModule = #type_;
--
2.43.0
^ permalink raw reply related
* [PATCH v4 8/9] rust: binder: use `LocalModule` for `THIS_MODULE`
From: Alvin Sun @ 2026-06-23 6:29 UTC (permalink / raw)
To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
Jens Axboe, Dave Ertman, Ira Weiny, Leon Romanovsky, Igor Korotin,
FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
Arve Hjønnevåg, Todd Kjos, Christian Brauner,
Carlos Llamas
Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
linux-pci, Alvin Sun
In-Reply-To: <20260623-fix-fops-owner-v4-0-0daf5f077d5c@linux.dev>
Replace the `THIS_MODULE` static reference in the binder fops with
`this_module::<LocalModule>()`, consistent with the move of
`THIS_MODULE` into the `ModuleMetadata` trait.
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
drivers/android/binder/rust_binder_main.rs | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/android/binder/rust_binder_main.rs b/drivers/android/binder/rust_binder_main.rs
index dc1941cd2407b..d6ceebbd5f94e 100644
--- a/drivers/android/binder/rust_binder_main.rs
+++ b/drivers/android/binder/rust_binder_main.rs
@@ -17,6 +17,7 @@
bindings::{self, seq_file},
fs::File,
list::{ListArc, ListArcSafe, ListLinksSelfPtr, TryNewListArc},
+ module::this_module,
prelude::*,
seq_file::SeqFile,
seq_print,
@@ -318,7 +319,7 @@ unsafe impl<T> Sync for AssertSync<T> {}
let zeroed_ops = unsafe { core::mem::MaybeUninit::zeroed().assume_init() };
let ops = kernel::bindings::file_operations {
- owner: THIS_MODULE.as_ptr(),
+ owner: this_module::<LocalModule>().as_ptr(),
poll: Some(rust_binder_poll),
unlocked_ioctl: Some(rust_binder_ioctl),
compat_ioctl: bindings::compat_ptr_ioctl,
--
2.43.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox