Linux block layer
* Re: [PATCH] block, bfq: protect async queue reset with blkcg locks
From: Tao Cui @ 2026-06-21 18:33 UTC (permalink / raw)
  To: Cen Zhang, Yu Kuai, Tejun Heo, Josef Bacik, Jens Axboe,
	Arianna Avanzini, Paolo Valente
  Cc: cui.tao, linux-block, cgroups, linux-kernel, baijiaju1990
In-Reply-To: <20260621135930.2657810-1-zzzccc427@gmail.com>


Nice catch. The race is real, and the fix lines up with how the rest
of the blkcg code already protects blkg_list walks — the new nesting
(blkcg_mutex -> queue_lock -> bfqd->lock) is the same order
blkg_free_workfn() and bfq_pd_offline() use, so no inversion.
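
The ordering argument can be checked with a small userspace model. This is only a sketch: `enum lock_rank` and `order_is_consistent()` are hypothetical names for illustration, not kernel APIs, and the ranks simply encode the nesting named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ranks: a lower rank must be taken before a higher one. */
enum lock_rank {
	RANK_BLKCG_MUTEX = 0,	/* q->blkcg_mutex */
	RANK_QUEUE_LOCK  = 1,	/* q->queue_lock */
	RANK_BFQD_LOCK   = 2,	/* bfqd->lock */
};

/*
 * Return true if the acquisition sequence never takes a lock of equal
 * or lower rank while holding a higher-ranked one. Locks are assumed
 * to stay held until the end of the sequence.
 */
bool order_is_consistent(const enum lock_rank *seq, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (seq[i] <= seq[i - 1])
			return false;
	return true;
}
```

In this model the patched bfq_end_wr_async() sequence (blkcg_mutex, queue_lock, bfqd->lock) passes, while taking the blkcg locks from inside bfq_end_wr()'s existing bfqd->lock section would not, which is presumably why the async reset had to move out of that section.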

Reviewed-by: Tao Cui <cuitao@kylinos.cn>

On 2026/6/21 21:59, Cen Zhang wrote:
> Writing 0 to BFQ's low_latency attribute ends weight raising for active,
> idle and async queues. The async cgroup path walks q->blkg_list, converts
> each blkg to BFQ policy data and then reads bfqg->async_bfqq and
> bfqg->async_idle_bfqq.
> 
> That walk was protected only by bfqd->lock. blkcg release work is
> serialized by q->blkcg_mutex and q->queue_lock instead, and
> blkg_free_workfn() can call BFQ's pd_free_fn before it removes
> blkg->q_node from q->blkg_list. A low_latency reset can therefore still
> find the blkg on the queue list after the BFQ policy data has been freed.
> 
> The buggy scenario involves two paths, with each column showing the order
> within that path:
> 
> BFQ low_latency reset:              blkcg blkg release work:
> 1. bfq_low_latency_store()          1. blkg_free_workfn() takes
>    calls bfq_end_wr().                 q->blkcg_mutex.
> 2. bfq_end_wr_async() walks         2. BFQ pd_free_fn drops the
>    q->blkg_list.                       final bfq_group reference.
> 3. blkg_to_bfqg() returns           3. blkg->q_node remains on
>    the stale policy data.              q->blkg_list until list_del_init().
> 4. bfq_end_wr_async_queues()
>    reads async queue fields.
> 
> Fix this by taking q->blkcg_mutex and q->queue_lock around the
> q->blkg_list walk, then taking bfqd->lock before touching BFQ async
> queues. The mutex serializes against policy-data free and queue_lock
> stabilizes the list. Move the async reset out of bfq_end_wr()'s existing
> bfqd->lock critical section so the lock order matches blkcg policy
> callbacks.


* [PATCH] block, bfq: protect async queue reset with blkcg locks
From: Cen Zhang @ 2026-06-21 13:59 UTC (permalink / raw)
  To: Yu Kuai, Tejun Heo, Josef Bacik, Jens Axboe, Arianna Avanzini,
	Paolo Valente
  Cc: linux-block, cgroups, linux-kernel, baijiaju1990, zzzccc427

Writing 0 to BFQ's low_latency attribute ends weight raising for active,
idle and async queues. The async cgroup path walks q->blkg_list, converts
each blkg to BFQ policy data and then reads bfqg->async_bfqq and
bfqg->async_idle_bfqq.

That walk was protected only by bfqd->lock. blkcg release work is
serialized by q->blkcg_mutex and q->queue_lock instead, and
blkg_free_workfn() can call BFQ's pd_free_fn before it removes
blkg->q_node from q->blkg_list. A low_latency reset can therefore still
find the blkg on the queue list after the BFQ policy data has been freed.

The buggy scenario involves two paths, with each column showing the order
within that path:

BFQ low_latency reset:              blkcg blkg release work:
1. bfq_low_latency_store()          1. blkg_free_workfn() takes
   calls bfq_end_wr().                 q->blkcg_mutex.
2. bfq_end_wr_async() walks         2. BFQ pd_free_fn drops the
   q->blkg_list.                       final bfq_group reference.
3. blkg_to_bfqg() returns           3. blkg->q_node remains on
   the stale policy data.              q->blkg_list until list_del_init().
4. bfq_end_wr_async_queues()
   reads async queue fields.

Fix this by taking q->blkcg_mutex and q->queue_lock around the
q->blkg_list walk, then taking bfqd->lock before touching BFQ async
queues. The mutex serializes against policy-data free and queue_lock
stabilizes the list. Move the async reset out of bfq_end_wr()'s existing
bfqd->lock critical section so the lock order matches blkcg policy
callbacks.

Validation reproduced this kernel report:
BUG: KASAN: slab-use-after-free in bfq_end_wr_async_queues+0x246/0x340

Call Trace:
 <TASK>
 dump_stack_lvl+0x66/0xa0
 print_report+0xce/0x630
 ? bfq_end_wr_async_queues+0x246/0x340
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __virt_addr_valid+0x20d/0x410
 ? bfq_end_wr_async_queues+0x246/0x340
 kasan_report+0xe0/0x110
 ? bfq_end_wr_async_queues+0x246/0x340
 bfq_end_wr_async_queues+0x246/0x340
 bfq_end_wr_async+0xba/0x180
 bfq_low_latency_store+0x4e5/0x690
 ? 0xffffffffc02150da
 ? __pfx_bfq_low_latency_store+0x10/0x10
 ? __pfx_bfq_low_latency_store+0x10/0x10
 elv_attr_store+0xc4/0x110
 kernfs_fop_write_iter+0x2f5/0x4a0
 vfs_write+0x604/0x11f0
 ? __pfx_locks_remove_posix+0x10/0x10
 ? __pfx_vfs_write+0x10/0x10
 ksys_write+0xf9/0x1d0
 ? __pfx_ksys_write+0x10/0x10
 do_syscall_64+0x115/0x6a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Allocated by task 544:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0xaa/0xb0
 bfq_pd_alloc+0xc0/0x1b0
 blkg_alloc+0x346/0x960
 blkg_create+0x8c2/0x10d0
 bio_associate_blkg_from_css+0x9f3/0xfa0
 bio_associate_blkg+0xd9/0x200
 bio_init+0x303/0x640
 __blkdev_direct_IO_simple+0x56b/0x8a0
 blkdev_direct_IO+0x8e7/0x2580
 blkdev_read_iter+0x205/0x400
 vfs_read+0x7b0/0xda0
 ksys_read+0xf9/0x1d0
 do_syscall_64+0x115/0x6a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 465:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3b/0x60
 __kasan_slab_free+0x5f/0x80
 kfree+0x307/0x580
 blkg_free_workfn+0xef/0x460
 process_one_work+0x8d0/0x1870
 worker_thread+0x575/0xf80
 kthread+0x2e7/0x3c0
 ret_from_fork+0x576/0x810
 ret_from_fork_asm+0x1a/0x30

Fixes: 44e44a1b329e ("block, bfq: improve responsiveness")
Assisted-by: Codex:gpt-5.5
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
---
 block/bfq-cgroup.c  | 13 ++++++++++++-
 block/bfq-iosched.c |  3 ++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 0bd0332b3d78..d8fdace464b4 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -936,14 +936,23 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
 
 void bfq_end_wr_async(struct bfq_data *bfqd)
 {
+	struct request_queue *q = bfqd->queue;
 	struct blkcg_gq *blkg;
 
-	list_for_each_entry(blkg, &bfqd->queue->blkg_list, q_node) {
+	mutex_lock(&q->blkcg_mutex);
+	spin_lock_irq(&q->queue_lock);
+	spin_lock(&bfqd->lock);
+
+	list_for_each_entry(blkg, &q->blkg_list, q_node) {
 		struct bfq_group *bfqg = blkg_to_bfqg(blkg);
 
 		bfq_end_wr_async_queues(bfqd, bfqg);
 	}
 	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
+
+	spin_unlock(&bfqd->lock);
+	spin_unlock_irq(&q->queue_lock);
+	mutex_unlock(&q->blkcg_mutex);
 }
 
 static int bfq_io_show_weight_legacy(struct seq_file *sf, void *v)
@@ -1416,7 +1425,9 @@ void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio) {}
 
 void bfq_end_wr_async(struct bfq_data *bfqd)
 {
+	spin_lock_irq(&bfqd->lock);
 	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
+	spin_unlock_irq(&bfqd->lock);
 }
 
 struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 141c602d5e85..eec9be62061b 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2653,9 +2653,10 @@ static void bfq_end_wr(struct bfq_data *bfqd)
 	}
 	list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
 		bfq_bfqq_end_wr(bfqq);
-	bfq_end_wr_async(bfqd);
 
 	spin_unlock_irq(&bfqd->lock);
+
+	bfq_end_wr_async(bfqd);
 }
 
 static sector_t bfq_io_struct_pos(void *io_struct, bool request)
-- 
2.43.0



* [PATCH] blk-iolatency: flush enable work after policy deactivation
From: Cen Zhang @ 2026-06-21 13:59 UTC (permalink / raw)
  To: Tejun Heo, Josef Bacik, Jens Axboe
  Cc: cgroups, linux-block, linux-kernel, baijiaju1990, zzzccc427

A blk-iolatency rq-qos teardown can free struct blk_iolatency while a
freshly queued enable_work callback still references it. The observed
failure is a KASAN slab-use-after-free in assign_work(); the full report
is included below.

blkcg_iolatency_exit() flushes enable_work before deactivating the
iolatency policy. However, blkcg_deactivate_policy() calls
iolatency_pd_offline() for online policy data, and iolatency_pd_offline()
clears min_lat_nsec through iolatency_set_min_lat_nsec(). If this clears
the last nonzero latency target, enable_cnt reaches zero and schedules
enable_work again after the flush has already returned.

The buggy scenario involves two paths, with each column showing the order
within that path:

blkcg_iolatency_exit() path:          system_wq worker path:
1. Flush old enable_work.             1. enable_work is idle.
2. Deactivate the policy.             2. no worker owns it.
3. Offline queues new enable_work.    3. work item becomes pending.
4. Free blkiolat.                     4. worker later runs the item.
5. Owner storage is gone.             5. worker dereferences blkiolat.

Flush enable_work again after blkcg_deactivate_policy() returns and before
freeing blkiolat. Policy offline callbacks have completed at that point,
so the second drain covers the late queueing path without changing the
normal enable/disable accounting rules.

Validation reproduced this kernel report:
BUG: KASAN: slab-use-after-free in assign_work+0x2a/0x150

Call Trace:
 <TASK>
 dump_stack_lvl+0x53/0x70
 print_report+0xd0/0x630
 ? __pfx__raw_spin_lock_irqsave+0x10/0x10
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __virt_addr_valid+0xea/0x1a0
 ? assign_work+0x2a/0x150
 kasan_report+0xce/0x100
 ? assign_work+0x2a/0x150
 assign_work+0x2a/0x150
 worker_thread+0x1b7/0x500
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x192/0x1d0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2ac/0x3c0
 ? __pfx_ret_from_fork+0x10/0x10
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __switch_to+0x2d5/0x6e0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Allocated by task 470:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x8f/0xa0
 iolatency_set_limit+0x301/0x450
 cgroup_file_write+0x178/0x2e0
 kernfs_fop_write_iter+0x1ef/0x290
 vfs_write+0x446/0x6f0
 ksys_write+0xc7/0x160
 do_syscall_64+0xf9/0x540
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 611:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3b/0x60
 __kasan_slab_free+0x43/0x70
 kfree+0x131/0x390
 rq_qos_exit+0x5d/0x90
 __del_gendisk+0x394/0x490
 del_gendisk+0xa1/0xe0
 virtblk_remove+0x41/0xd0
 virtio_dev_remove+0x63/0xe0
 device_release_driver_internal+0x246/0x2e0
 unbind_store+0xa9/0xb0
 kernfs_fop_write_iter+0x1ef/0x290
 vfs_write+0x446/0x6f0
 ksys_write+0xc7/0x160
 do_syscall_64+0xf9/0x540
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Last potentially related work creation:
 kasan_save_stack+0x33/0x60
 kasan_record_aux_stack+0x8c/0xa0
 __queue_work+0x42a/0x800
 queue_work_on+0x5d/0x70
 iolatency_set_min_lat_nsec+0x196/0x230
 iolatency_pd_offline+0x1f/0x40
 blkcg_deactivate_policy+0x194/0x270
 blkcg_iolatency_exit+0x33/0x40
 rq_qos_exit+0x5d/0x90
 __del_gendisk+0x394/0x490
 del_gendisk+0xa1/0xe0
 virtblk_remove+0x41/0xd0
 virtio_dev_remove+0x63/0xe0
 device_release_driver_internal+0x246/0x2e0
 unbind_store+0xa9/0xb0
 kernfs_fop_write_iter+0x1ef/0x290
 vfs_write+0x446/0x6f0
 ksys_write+0xc7/0x160
 do_syscall_64+0xf9/0x540
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Second to last potentially related work creation:
 kasan_save_stack+0x33/0x60
 kasan_record_aux_stack+0x8c/0xa0
 __queue_work+0x42a/0x800
 queue_work_on+0x5d/0x70
 iolatency_set_min_lat_nsec+0x196/0x230
 iolatency_set_limit+0x3f1/0x450
 cgroup_file_write+0x178/0x2e0
 kernfs_fop_write_iter+0x1ef/0x290
 vfs_write+0x446/0x6f0
 ksys_write+0xc7/0x160
 do_syscall_64+0xf9/0x540
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 8a177a36da6c ("blk-iolatency: Fix inflight count imbalances and IO hangs on offline")
Assisted-by: Codex:gpt-5.5
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
---
 block/blk-iolatency.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 1aaee6fb0f59..a0bdd8a5c94c 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -639,6 +639,11 @@ static void blkcg_iolatency_exit(struct rq_qos *rqos)
 	timer_shutdown_sync(&blkiolat->timer);
 	flush_work(&blkiolat->enable_work);
 	blkcg_deactivate_policy(rqos->disk, &blkcg_policy_iolatency);
+	/*
+	 * blkcg_deactivate_policy() invokes iolatency_pd_offline(), which may
+	 * queue enable_work again when it clears the last latency target.
+	 */
+	flush_work(&blkiolat->enable_work);
 	kfree(blkiolat);
 }
 
-- 
2.43.0
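
The late-queueing sequence from the commit message can be sketched as a deterministic userspace model. Everything here is an assumption for illustration: `struct model_iolat` and the `model_*` helpers are hypothetical stand-ins, and a single flag replaces the real workqueue state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct blk_iolatency and its enable_work. */
struct model_iolat {
	bool work_pending;	/* enable_work is queued but has not run */
	int enable_cnt;		/* groups with a nonzero latency target */
};

/* Models flush_work(): returns once any queued item has run. */
void model_flush(struct model_iolat *m)
{
	m->work_pending = false;
}

/*
 * Models one offline callback: clearing the last nonzero target drops
 * enable_cnt to zero and queues enable_work again.
 */
void model_deactivate(struct model_iolat *m)
{
	if (m->enable_cnt > 0 && --m->enable_cnt == 0)
		m->work_pending = true;
}

/*
 * Run the teardown sequence and report whether a work item would still
 * be pending at the point where the owner is freed.
 */
bool model_exit_leaves_pending_work(int enable_cnt, bool second_flush)
{
	struct model_iolat m = { .work_pending = true, .enable_cnt = enable_cnt };

	model_flush(&m);		/* existing flush before deactivation */
	model_deactivate(&m);		/* may requeue the work */
	if (second_flush)
		model_flush(&m);	/* flush added by the patch */
	return m.work_pending;		/* true means the free races with the work */
}
```

With one remaining latency target the model leaves work pending unless the second flush runs; with no targets set the first flush alone is enough, which matches the description that only the "last nonzero latency target" case is affected.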



* [syzbot] [nbd?] WARNING in nbd_add_socket
From: syzbot @ 2026-06-21  6:23 UTC (permalink / raw)
  To: axboe, josef, linux-block, linux-kernel, nbd, netdev,
	syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    b85966adbf5d Merge tag 'net-next-7.2' of git://git.kernel...
git tree:       net
console output: https://syzkaller.appspot.com/x/log.txt?x=101f6d56580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=9a9f723a32776544
dashboard link: https://syzkaller.appspot.com/bug?extid=6b85d1e39a5b8ed9a954
compiler:       Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13584aae580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11fd7b7a580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/780edcc3cc37/disk-b85966ad.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/967dd18c7ecd/vmlinux-b85966ad.xz
kernel image: https://storage.googleapis.com/syzbot-assets/cf9fa92c90ff/bzImage-b85966ad.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+6b85d1e39a5b8ed9a954@syzkaller.appspotmail.com

netlink: 3936 bytes leftover after parsing attributes in process `syz.0.25'.
------------[ cut here ]------------
!sock_allow_reclassification(sk)
WARNING: drivers/block/nbd.c:1249 at nbd_reclassify_socket drivers/block/nbd.c:1249 [inline], CPU#0: syz.0.25/5992
WARNING: drivers/block/nbd.c:1249 at nbd_add_socket+0xf35/0x12c0 drivers/block/nbd.c:1293, CPU#0: syz.0.25/5992
Modules linked in:

CPU: 0 UID: 0 PID: 5992 Comm: syz.0.25 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
RIP: 0010:nbd_reclassify_socket drivers/block/nbd.c:1249 [inline]
RIP: 0010:nbd_add_socket+0xf35/0x12c0 drivers/block/nbd.c:1293
Code: f7 e8 6f b5 20 fc bf e0 01 00 00 49 03 3e 48 c7 c6 40 02 55 8c e8 2b a8 1b fb b8 f0 ff ff ff e9 b2 fd ff ff e8 ac 60 b5 fb 90 <0f> 0b 90 e9 16 f8 ff ff e8 5e 2e 97 05 44 89 e9 80 e1 07 fe c1 38
RSP: 0018:ffffc90002ef7160 EFLAGS: 00010293

RAX: ffffffff86109574 RBX: 1ffff1100651ddb9 RCX: ffff888020b68000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffffc90002ef7250 R08: ffff888035af2bdf R09: 1ffff11006b5e57b
R10: dffffc0000000000 R11: ffffed1006b5e57c R12: ffff8880328eec00
R13: 1ffff920005dee38 R14: dffffc0000000000 R15: 0000000000000001
FS:  00007fcc9d5dd6c0(0000) GS:ffff88812527c000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f81a8ab50f0 CR3: 0000000078780000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 nbd_genl_connect+0x133d/0x1c10 drivers/block/nbd.c:2254
 genl_family_rcv_msg_doit+0x233/0x340 net/netlink/genetlink.c:1114
 genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
 genl_rcv_msg+0x614/0x7a0 net/netlink/genetlink.c:1209
 netlink_rcv_skb+0x226/0x4a0 net/netlink/af_netlink.c:2556
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x7bb/0x940 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1900
 sock_sendmsg_nosec net/socket.c:775 [inline]
 __sock_sendmsg net/socket.c:790 [inline]
 ____sys_sendmsg+0x9b9/0xa20 net/socket.c:2684
 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2738
 __sys_sendmsg net/socket.c:2770 [inline]
 __do_sys_sendmsg net/socket.c:2775 [inline]
 __se_sys_sendmsg net/socket.c:2773 [inline]
 __x64_sys_sendmsg+0x1b1/0x290 net/socket.c:2773
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fcc9df9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fcc9d5dd028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fcc9e216090 RCX: 00007fcc9df9ce59
RDX: 0000000000004040 RSI: 0000200000000140 RDI: 0000000000000004
RBP: 00007fcc9e032d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fcc9e216128 R14: 00007fcc9e216090 R15: 00007ffc8f827678
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* [PATCH] Documentation: ABI: fix "unexpected indentation" error in sysfs-block
From: Jay Winston @ 2026-06-21  6:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel, corbet, Jay Winston

`make htmldocs` reports:
Documentation/ABI/stable/sysfs-block:612: ERROR: Unexpected indentation

The leading dashes at lines 623, 636, and 641 were parsed as
continuation lines with unexpected indentation rather than as bullet
points, because the blank lines before them were missing.  Add the
blank lines.

Signed-off-by: Jay Winston <jaybenjaminwinston@gmail.com>
---
 Documentation/ABI/stable/sysfs-block | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index aa1e94169666..f4bce370a540 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -620,6 +620,7 @@ Description:
 		- async_depth is always equal to nr_requests.
 
 		For bfq scheduler:
+
 		- By default, async_depth is set to 75% of nr_requests.
 		  Internal limits are then derived from this value:
 		  * Sync writes: limited to async_depth (≈75% of nr_requests).
@@ -633,11 +634,13 @@ Description:
 		  these limits proportionally based on the new value.
 
 		For Kyber:
+
 		- By default async_depth is set to 75% of nr_requests.
 		- If the user writes a custom value to async_depth, then it override the
 		  default and directly control the limit for writes and async I/O.
 
 		For mq-deadline:
+
 		- By default async_depth is set to nr_requests.
 		- If the user writes a custom value to async_depth, then it override the
 		  default and directly control the limit for writes and async I/O.
-- 
2.46.4



* [PATCH] block: Make WBT latency writes honor enable state
From: guzebing @ 2026-06-21  1:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Guzebing, linux-block, linux-kernel

From: Guzebing <guzebing1612@gmail.com>

queue/wbt_lat_usec controls both the stored WBT latency target and the
effective WBT enable state.

The old no-op check skipped updates whenever the converted latency
matched the stored min_lat_nsec. That check ignored whether the current
WBT state already matched the state requested by the write. For a queue
disabled by default, attempting to enable WBT by writing the default
value through sysfs could return success while the enable state was left
unchanged.

Treat a write as a no-op only when both the stored latency and the
effective WBT enabled state already match the converted value.

Signed-off-by: Guzebing <guzebing1612@gmail.com>
---
Background:

The issue can be reproduced on an NVMe namespace when BFQ is available:

  echo bfq > /sys/block/nvme0n1/queue/scheduler
  cat /sys/block/nvme0n1/queue/wbt_lat_usec
  echo 2000 > /sys/block/nvme0n1/queue/wbt_lat_usec
  cat /sys/block/nvme0n1/queue/wbt_lat_usec

After BFQ is selected for the queue, WBT is disabled by default.  On a
non-rotational NVMe namespace the stored default latency remains
2000000 nsec, while the sysfs file reports 0 because the effective WBT
state is disabled:

  queue/wbt_lat_usec = 0
  debugfs enabled = 3
  debugfs min_lat_nsec = 2000000

Writing the default value succeeds, but the old no-op check skips the
state transition because min_lat_nsec already matches the converted
value:

  echo 2000 > /sys/block/nvme0n1/queue/wbt_lat_usec
  # echo returns success, but:
  queue/wbt_lat_usec = 0
  debugfs enabled = 3
  debugfs min_lat_nsec = 2000000

As a control, writing a non-default value first does work:

  echo 5000 > /sys/block/nvme0n1/queue/wbt_lat_usec
  queue/wbt_lat_usec = 5000
  debugfs enabled = 2
  debugfs min_lat_nsec = 5000000

Writing the default value after that also works, because the stored
latency changes from 5000000 nsec back to 2000000 nsec:

  echo 2000 > /sys/block/nvme0n1/queue/wbt_lat_usec
  queue/wbt_lat_usec = 2000
  debugfs enabled = 2
  debugfs min_lat_nsec = 2000000

With this patch, writing the default value after BFQ default-disables
WBT also re-enables WBT as expected:

  queue/wbt_lat_usec = 2000
  debugfs enabled = 2
  debugfs min_lat_nsec = 2000000
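
The difference between the old and new no-op checks can be sketched in userspace as follows. This is a simplified model, not the kernel code: `struct model_wbt` and both helpers are hypothetical names, and a plain bool stands in for the effective enable state reported by debugfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two pieces of WBT state discussed above. */
struct model_wbt {
	uint64_t min_lat_nsec;	/* stored latency target */
	bool enabled;		/* effective enable state */
};

/* Old check: only the stored latency is compared. */
bool old_write_is_noop(const struct model_wbt *w, uint64_t val_nsec)
{
	return w->min_lat_nsec == val_nsec;
}

/* New check: latency and enable state must both already match. */
bool new_write_is_noop(const struct model_wbt *w, uint64_t val_nsec)
{
	if (w->min_lat_nsec != val_nsec)
		return false;
	return w->enabled == (val_nsec != 0);
}
```

For the reproducer state (min_lat_nsec = 2000000, WBT disabled), the old check treats a write of 2000000 nsec as a no-op, while the new check lets it through so the enable state can change.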

 block/blk-wbt.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index dcc2438ca16dc..953d400fd0137 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -813,6 +813,21 @@ static void wbt_queue_depth_changed(struct rq_qos *rqos)
 	wbt_update_limits(RQWB(rqos));
 }
 
+static bool wbt_set_lat_changed(struct request_queue *q, u64 val)
+{
+	struct rq_qos *rqos = wbt_rq_qos(q);
+	struct rq_wb *rwb;
+
+	if (!rqos)
+		return true;
+
+	rwb = RQWB(rqos);
+	if (rwb->min_lat_nsec != val)
+		return true;
+
+	return rwb_enabled(rwb) != !!val;
+}
+
 static void wbt_exit(struct rq_qos *rqos)
 {
 	struct rq_wb *rwb = RQWB(rqos);
@@ -1005,8 +1020,12 @@ int wbt_set_lat(struct gendisk *disk, s64 val)
 	else if (val >= 0)
 		val *= 1000ULL;
 
-	if (wbt_get_min_lat(q) == val)
+	mutex_lock(&disk->rqos_state_mutex);
+	if (!wbt_set_lat_changed(q, val)) {
+		mutex_unlock(&disk->rqos_state_mutex);
 		goto out;
+	}
+	mutex_unlock(&disk->rqos_state_mutex);
 
 	blk_mq_quiesce_queue(q);
 
-- 
2.20.1


* Re: [PATCH V2] blk-cgroup: fix UAF in __blkcg_rstat_flush()
From: Jose Fernandez (Anthropic) @ 2026-06-20 23:59 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Michal Koutný, stable, Jay Shin,
	Tejun Heo, Waiman Long, coregee2000
In-Reply-To: <20260205155425.342084-1-ming.lei@redhat.com>

On Thu, 5 Feb 2026 23:54:23 +0800, Ming Lei wrote:
> Move the flush from __blkg_release() (RCU callback) to blkg_release()
> (before call_rcu). This ensures the RCU grace period waits for any
> concurrent flush's rcu_read_lock() section to complete before freeing.

We started seeing this in the wild on a 6.18.35-based kernel as a NULL
pointer dereference rather than a KASAN report.  The freed blkg /
percpu iostat slot gets reallocated and zeroed before the concurrent
flusher reaches it, so bisc->blkg reads back as NULL:

  BUG: kernel NULL pointer dereference, address: 0000000000000030
  #PF: supervisor read access in kernel mode
  RIP: 0010:__blkcg_rstat_flush.isra.0+0x8d/0x1c0
  Code: ... 48 8b 1a 4c 8d 78 f8 31 c0 f3 48 ab <4c> 8b 73 30 ...
  RBX: 0000000000000000
  Call Trace:
   <IRQ>
   __blkg_release+0x2d/0xf0
   rcu_do_batch+0x1b8/0x570
   rcu_core+0x167/0x350
   handle_softirqs+0xda/0x330

The workload is container-heavy with frequent block-device add/remove,
so multiple blkgs in the same blkcg routinely hit blkg_release()
concurrently on different CPUs.

I can reproduce reliably under KASAN by inserting a udelay(2000)
between llist_del_all() and raw_spin_lock_irqsave() in
__blkcg_rstat_flush(), then driving direct I/O to N loop devices from
one cgroup followed by parallel LOOP_CTL_REMOVE on each device.  KASAN
reports slab-use-after-free in __blkcg_rstat_flush() with the expected
alloc=blkg_alloc / free=blkg_free_workfn stacks.

With this patch applied on top of the same udelay-widened tree, the
same harness runs 150 rounds clean.

This doesn't appear to have been picked up after V1 was dropped; would
be good to get it queued.

Tested-by: Jose Fernandez (Anthropic) <jose.fernandez@linux.dev>


* Re: [PATCH V3] blk-cgroup: defer blkcg css_put until blkg is unlinked from queue
From: yu kuai @ 2026-06-20 18:29 UTC (permalink / raw)
  To: Zizhi Wo, axboe, tj, josef, linux-block
  Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai
In-Reply-To: <20260616011746.2451461-1-wozizhi@huaweicloud.com>

On 2026/6/16 9:17, Zizhi Wo wrote:

> From: Zizhi Wo<wozizhi@huawei.com>
>
> [BUG]
> Our fuzz testing triggered a blkcg use-after-free issue:
>
>    BUG: KASAN: slab-use-after-free in _raw_spin_lock+0x75/0xe0
>    Call Trace:
>    ...
>    blkcg_deactivate_policy+0x244/0x4d0
>    ioc_rqos_exit+0x44/0xe0
>    rq_qos_exit+0xba/0x120
>    __del_gendisk+0x50b/0x800
>    del_gendisk+0xff/0x190
>    ...
>
> [CAUSE]
> process1						process2
> cgroup_rmdir
> ...
>    css_killed_work_fn
>      offline_css
>      ...
>        blkcg_destroy_blkgs
>        ...
>          __blkg_release
> 	  css_put(&blkg->blkcg->css)
>            blkg_free
> 	    INIT_WORK(xxx, blkg_free_workfn)
> 	    schedule_work
>      css_put
>      ...
>        blkcg_css_free
>          kfree(blkcg)--------blkcg has been freed!!!
> ====================================schedule_work
>                blkg_free_workfn
> 							__del_gendisk
> 							  rq_qos_exit
> 							    ioc_rqos_exit
> 							      blkcg_deactivate_policy
> 							        mutex_lock(&q->blkcg_mutex)
> 								spin_lock_irq(&q->queue_lock)
> 							        list_for_each_entry(blkg, xxx)
> 								  blkcg = blkg->blkcg
> 								  spin_lock(&blkcg->lock)-------UAF!!!
> 	        mutex_lock(&q->blkcg_mutex)
> 	        spin_lock_irq(&q->queue_lock)
> 	        /* Only then is the blkg removed from the list */
> 	        list_del_init(&blkg->q_node)
>
> As a result, a blkg can still be reachable through q->blkg_list while
> its ->blkcg has already been freed.
>
> [Fix]
> Fix this by deferring the blkcg css_put() until after the blkg has been
> unlinked from q->blkg_list in blkg_free_workfn(). This ensures that the
> blkcg outlives every blkg still reachable through q->blkg_list, so any
> iterator holding q->queue_lock is guaranteed to observe a valid
> blkg->blkcg.
>
> While at it, move css_tryget_online() from blkg_create() into blkg_alloc()
> so that the css reference is owned by the alloc/free pair rather than
> straddling layers:
> blkg_alloc()  <-> blkg_free()
> blkg_create() <-> blkg_destroy()
>
> Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
> Suggested-by: Hou Tao<houtao1@huawei.com>
> Signed-off-by: Zizhi Wo<wozizhi@huawei.com>
> Reviewed-by: Yu Kuai<yukuai@fygo.io>
> ---
> v3:
>   - move css_put() after mutex_unlock() in blkg_free_workfn().
>
> v2:
>   - Move css_tryget_online() from blkg_create() into blkg_alloc() so the
>     css reference follows the blkg's own lifetime, making the put in
>     blkg_free_workfn() symmetric with the get in blkg_alloc().
>
> v1:https://lore.kernel.org/all/20260518010932.633707-1-wozizhi@huaweicloud.com/
>   block/blk-cgroup.c | 24 ++++++++++++------------
>   1 file changed, 12 insertions(+), 12 deletions(-)
Reviewed-by: Yu Kuai <yukuai@fygo.io>
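
The ordering change in the quoted fix can be illustrated with a small userspace model. It is a sketch under simplifying assumptions: `struct model_blkcg`, `struct model_blkg` and the helpers are hypothetical, and the blkg is assumed to hold the last reference on its blkcg.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct blkcg and struct blkcg_gq. */
struct model_blkcg {
	int refs;
	bool freed;
};

struct model_blkg {
	struct model_blkcg *blkcg;
	bool on_queue_list;	/* still linked on q->blkg_list */
};

/* Models css_put(): the last put frees the blkcg. */
void model_css_put(struct model_blkcg *c)
{
	if (--c->refs == 0)
		c->freed = true;
}

/*
 * Returns true if, at some step of the release sequence, the blkg is
 * still on the queue list while its blkcg has already been freed.
 * put_after_unlink selects the ordering described in the patch.
 */
bool model_release_exposes_freed_blkcg(bool put_after_unlink)
{
	struct model_blkcg c = { .refs = 1, .freed = false };
	struct model_blkg g = { .blkcg = &c, .on_queue_list = true };
	bool exposed = false;

	if (!put_after_unlink) {
		model_css_put(g.blkcg);		/* old: put before the unlink */
		exposed |= g.on_queue_list && c.freed;
	}
	g.on_queue_list = false;		/* list_del_init(&blkg->q_node) */
	if (put_after_unlink)
		model_css_put(g.blkcg);		/* new: put after the unlink */
	exposed |= g.on_queue_list && c.freed;
	return exposed;
}
```

In the old ordering the model reports a window in which a list walker could reach a freed blkcg; with the put deferred until after the unlink, no such window exists.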

-- 
Thanks,
Kuai


* Re: [PATCH v4] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Tetsuo Handa @ 2026-06-20  9:42 UTC (permalink / raw)
  To: Al Viro
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
	Ming Lei, linux-block, LKML, Andrew Morton, Linus Torvalds,
	linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner,
	Hillf Danton
In-Reply-To: <20260620073939.GF2636677@ZenIV>

On 2026/06/20 16:39, Al Viro wrote:
> On Fri, Jun 19, 2026 at 11:33:11PM +0900, Tetsuo Handa wrote:
>> Sending this commit to linux.git will be the fastest way to identify who is issuing
>> I/O requests too late. Therefore, I want to get a conclusion on xfs/259 breakage.
>> Al, can you get the same result?
> 
> Not a peep in the logs, breakage still there (with cherry-picked fb1d5846e99c8aa4ce
> and CONFIG_KCOV enabled, that is).

Please test with debug printk() patch shown below. What messages do you get?

--------------------------------------------------------------------------------
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index c3b607a3ddc4..7408f314a1fa 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1763,6 +1763,8 @@ static void lo_release(struct gendisk *disk)
 	mutex_unlock(&lo->lo_mutex);
 
 	if (need_clear) {
+		printk("Flush: task=%s[%d] dev=loop%d state=%d\n",
+		       current->comm, current->pid, lo->lo_number, lo->lo_state);
 		/*
 		 * Temporarily release disk->open_mutex in order to flush pending I/O
 		 * requests before clearing the backing device.
@@ -1813,6 +1815,8 @@ static void lo_release(struct gendisk *disk)
 		mutex_lock(&lo->lo_disk->open_mutex);
 		if (WARN_ON(data_race(READ_ONCE(lo->lo_state)) != Lo_rundown))
 			return;
+		printk("Teardown: task=%s[%d] dev=loop%d state=%d\n",
+		       current->comm, current->pid, lo->lo_number, lo->lo_state);
 		__loop_clr_fd(lo);
 	}
 }
diff --git a/fs/namespace.c b/fs/namespace.c
index 09ab7fc72f86..9710460fb449 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1893,6 +1893,8 @@ static int do_umount(struct mount *mnt, int flags)
 		 */
 		lock_mount_hash();
 		if (!list_empty(&mnt->mnt_mounts) || mnt_get_count(mnt) != 2) {
+			printk("%s: task=%s[%d] !list_empty(&mnt->mnt_mounts)=%d mnt_get_count(mnt)=%d\n", __func__,
+			       current->comm, current->pid, !list_empty(&mnt->mnt_mounts), mnt_get_count(mnt));
 			unlock_mount_hash();
 			return -EBUSY;
 		}
@@ -1960,6 +1962,9 @@ static int do_umount(struct mount *mnt, int flags)
 		if (!propagate_mount_busy(mnt, 2)) {
 			umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC);
 			retval = 0;
+		} else {
+			printk("%s: task=%s[%d] propagate_mount_busy()!=0\n", __func__,
+			       current->comm, current->pid);
 		}
 	}
 out:
--------------------------------------------------------------------------------




* Re: [PATCH v4] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Al Viro @ 2026-06-20  7:39 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
	Ming Lei, linux-block, LKML, Andrew Morton, Linus Torvalds,
	linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner,
	Hillf Danton
In-Reply-To: <a9254bf0-fc7a-4b2e-a62f-064e71016fb6@I-love.SAKURA.ne.jp>

On Fri, Jun 19, 2026 at 11:33:11PM +0900, Tetsuo Handa wrote:
> Sending this commit to linux.git will be the fastest way to identify who is issuing
> I/O requests too late. Therefore, I want to get a conclusion on xfs/259 breakage.
> Al, can you get the same result?

Not a peep in the logs, breakage still there (with cherry-picked fb1d5846e99c8aa4ce
and CONFIG_KCOV enabled, that is).

^ permalink raw reply

* Re: [PATCH blktests] Fix _get_page_size()
From: Bart Van Assche @ 2026-06-20  7:11 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki; +Cc: Jeff Moyer, linux-block, osandov, kch
In-Reply-To: <ajYabLMbEo6zyOWh@shinmob>

On 6/20/26 6:51 AM, Shin'ichiro Kawasaki wrote:
> On Jun 20, 2026 / 05:55, Bart Van Assche wrote:
>> On 6/20/26 3:26 AM, Shin'ichiro Kawasaki wrote:
>>> This is a rather fundamental change, so I would like to ask opinions from
>>> other blktests users, especially Omar and Chaitanya. What do you think about
>>> the idea to add getconf to the requirement list?
>>
>> CONFIG_PAGE_SHIFT was introduced in the Linux kernel in February 2024
>> (commit ba89f9c8ccba ("arch: consolidate existing CONFIG_PAGE_SIZE_*KB
>> definitions")). Older kernels had CONFIG_PAGE_SIZE_4KB,
>> CONFIG_PAGE_SIZE_16KB, etc. This means that it is possible to derive the
>> kernel page size from the kernel configuration file for all upstream and
>> distro kernels, isn't it?
> 
> I checked the commit is in the tag v6.9. My Debian bookworm system has kernel
> v6.1, then the config file at /boot does not have CONFIG_PAGE_SHIFT as expected.
> But it does not have CONFIG_PAGE_SIZE_* either... I'm still afraid that kernel
> config file approach is not reliable.

Right, for older kernels CONFIG_PAGE_SIZE_*KB is only available for some
but not for all supported architectures.

It is not clear to me where the desire to avoid the dependency on
getconf comes from. As far as I know it is available on all Linux
distros. Since it is typically included in the C library package, it
should not introduce a new dependency.

Thanks,

Bart.

^ permalink raw reply

* Re: [PATCH blktests] Fix _get_page_size()
From: Shin'ichiro Kawasaki @ 2026-06-20  4:51 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jeff Moyer, linux-block, osandov, kch
In-Reply-To: <d0432702-ac0b-410e-9586-2cb9be079033@acm.org>

On Jun 20, 2026 / 05:55, Bart Van Assche wrote:
> On 6/20/26 3:26 AM, Shin'ichiro Kawasaki wrote:
> > This is a rather fundamental change, so I would like to ask opinions from
> > other blktests users, especially Omar and Chaitanya. What do you think about
> > the idea to add getconf to the requirement list?
> 
> CONFIG_PAGE_SHIFT was introduced in the Linux kernel in February 2024
> (commit ba89f9c8ccba ("arch: consolidate existing CONFIG_PAGE_SIZE_*KB
> definitions")). Older kernels had CONFIG_PAGE_SIZE_4KB,
> CONFIG_PAGE_SIZE_16KB, etc. This means that it is possible to derive the
> kernel page size from the kernel configuration file for all upstream and
> distro kernels, isn't it?

I checked, and that commit is in tag v6.9. My Debian bookworm system runs
kernel v6.1, so the config file in /boot does not have CONFIG_PAGE_SHIFT, as
expected. But it does not have CONFIG_PAGE_SIZE_* either... I'm still afraid
that the kernel config file approach is not reliable.

$ uname -a
Linux testnode3 6.1.0-49-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.174-1 (2026-05-26) x86_64 GNU/Linux
$ grep PAGE_S /boot/config-6.1.0-49-amd64
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
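
To illustrate the point, here is a minimal sketch of a config-file based
lookup. page_size_from_config() is a hypothetical name, not a blktests
helper, and the sketch assumes the config file is uncompressed. It tries
CONFIG_PAGE_SHIFT first, then CONFIG_PAGE_SIZE_<n>KB=y, and fails when
neither is present, which is what happens with the two lines above.

```shell
#!/bin/bash
# Hypothetical helper for illustration only; not part of blktests.
# Print the page size in bytes derived from a kernel config file.
# Return 1 if the file defines neither CONFIG_PAGE_SHIFT nor
# CONFIG_PAGE_SIZE_<n>KB=y.
page_size_from_config() {
	local config=$1 shift_val kb

	# Newer kernels: CONFIG_PAGE_SHIFT=<n>
	shift_val=$(sed -n 's/^CONFIG_PAGE_SHIFT=\([0-9][0-9]*\)$/\1/p' \
		"$config" | head -n 1)
	if [ -n "$shift_val" ]; then
		echo $((1 << shift_val))
		return 0
	fi

	# Some older kernels and architectures: CONFIG_PAGE_SIZE_<n>KB=y.
	# CONFIG_PAGE_SIZE_LESS_THAN_64KB=y does not match this pattern.
	kb=$(sed -n 's/^CONFIG_PAGE_SIZE_\([0-9][0-9]*\)KB=y$/\1/p' \
		"$config" | head -n 1)
	if [ -n "$kb" ]; then
		echo $((kb * 1024))
		return 0
	fi

	return 1
}
```

A caller would still need a fallback for the failure case, so the config
file alone cannot be the only source of the page size.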

^ permalink raw reply

* Re: [PATCH blktests] Fix _get_page_size()
From: Bart Van Assche @ 2026-06-20  3:55 UTC (permalink / raw)
  To: Shin'ichiro Kawasaki, Jeff Moyer; +Cc: linux-block, osandov, kch
In-Reply-To: <ajXmBu9lDZwgMG7_@shinmob>

On 6/20/26 3:26 AM, Shin'ichiro Kawasaki wrote:
> This is a rather fundamental change, so I would like to ask opinions from
> other blktests users, especially Omar and Chaitanya. What do you think about
> the idea to add getconf to the requirement list?

CONFIG_PAGE_SHIFT was introduced in the Linux kernel in February 2024
(commit ba89f9c8ccba ("arch: consolidate existing CONFIG_PAGE_SIZE_*KB
definitions")). Older kernels had CONFIG_PAGE_SIZE_4KB,
CONFIG_PAGE_SIZE_16KB, etc. This means that it is possible to derive the
kernel page size from the kernel configuration file for all upstream and
distro kernels, isn't it?

Thanks,

Bart.

^ permalink raw reply

* Re: [PATCH blktests] Fix _get_page_size()
From: Shin'ichiro Kawasaki @ 2026-06-20  1:26 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-block, osandov, kch
In-Reply-To: <x497bnvlxlc.fsf@segfault.usersys.redhat.com>

CC+: Omar, Chaitanya,

On Jun 18, 2026 / 10:41, Jeff Moyer wrote:
> There is no CONFIG_PAGE_SHIFT stored in /boot/config-`uname -r` on RHEL
> systems (maybe all systems?).  As a result, tests that make use of
> _get_page_size() were doing the wrong thing.  For example, throtl/002
> used it to calculate I/O sizes for direct IO.  Those sizes ended up not
> being a multiple of the logical block size, and hence throtl/002 was
> failing.
> 
> Fixes: 8eca9fa ("common/rc, scsi/011, zbd/010: introduce _page_size_equals() helper")
> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>

Thanks for finding this. When I applied the commit in the Fixes tag, I
checked my Fedora system's /boot/config-* files and found CONFIG_PAGE_SHIFT
defined. I wanted to avoid a dependency on getconf, so I chose to rely on
CONFIG_PAGE_SHIFT. But that is not an option for other distros. Today I
checked my Debian system, and CONFIG_PAGE_SHIFT was not defined there either.
Now I see that we should not use CONFIG_PAGE_SHIFT.

> 
> diff --git a/common/rc b/common/rc
> index 20f7c7a..d60a125 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -562,13 +562,8 @@ _have_systemctl_unit() {
>  	return 0
>  }
>  
> -# Get system page size from kernel conguration
>  _get_page_size() {
> -	local page_shift
> -
> -	page_shift=$(_get_kernel_option PAGE_SHIFT)
> -
> -	echo $((1<< page_shift))
> +	getconf PAGE_SIZE
>  }
>  
>  # Check if the system page size matches the required size (in bytes).
> 

The patch above should work, but it creates a new dependency on the getconf
tool. There are 6 test cases that require the page size, and therefore getconf.
So we need to check that the getconf command is available for those test cases.

  $ git grep _page_size
  common/rc:_get_page_size() {
  common/rc:# Example: _page_size_equals 4096
  common/rc:_page_size_equals() {
  common/rc:      current_size=$(_get_page_size)
  tests/scsi/011: _page_size_equals 4096
  tests/throtl/002:       page_size=$(_get_page_size)
  tests/throtl/003:       page_size=$(_get_page_size)
  tests/throtl/007:       page_size=$(_get_page_size)
  tests/zbd/010:  _page_size_equals 4096
  tests/zbd/014:  page_size=$(_get_page_size)
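
As a rough idea of what such a check could look like, here is a minimal
sketch. The names have_getconf() and get_page_size_checked() are
hypothetical and are not existing blktests helpers. How the check would be
hooked into the requires() of each test case is left open.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not part of blktests.

# Return 0 when the getconf command is found in PATH, 1 otherwise.
have_getconf() {
	command -v getconf >/dev/null 2>&1
}

# Print the system page size in bytes. Fail with a message on stderr
# when getconf is not installed.
get_page_size_checked() {
	if ! have_getconf; then
		echo "getconf is not available" >&2
		return 1
	fi
	getconf PAGE_SIZE
}
```

A test case would call have_getconf before relying on the page size, and
skip instead of computing I/O sizes from an empty value.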

Having said that, I think the Linux ecosystem is now in a phase of adding a
variety of page sizes, and I expect more test cases that depend on the page
size to be added in the near future. So this could be a good time to add
getconf to the blktests minimal requirement list described in README.md. This
means that blktests users will need to install the glibc-common package on
Fedora, or the libc-bin package on Debian.

This is a rather fundamental change, so I would like to ask opinions from
other blktests users, especially Omar and Chaitanya. What do you think about
the idea to add getconf to the requirement list?

^ permalink raw reply

* Re: [PATCH v4] loop: Fix NULL pointer dereference in lo_rw_aio()
From: Tetsuo Handa @ 2026-06-19 14:33 UTC (permalink / raw)
  To: Al Viro
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, Damien Le Moal,
	Ming Lei, linux-block, LKML, Andrew Morton, Linus Torvalds,
	linux-btrfs, David Sterba, linux-fsdevel, Christian Brauner,
	Hillf Danton
In-Reply-To: <9f8b5ab0-efbc-4cf3-a1f8-b43377416946@I-love.SAKURA.ne.jp>

Sending this commit to linux.git will be the fastest way to identify who is issuing
I/O requests too late. Therefore, I want to reach a conclusion on the xfs/259 breakage.
Al, can you get the same result?

On 2026/06/13 20:00, Tetsuo Handa wrote:
> On 2026/06/10 2:50, Al Viro wrote:
>> Still breaks xfs/259, same as the version in next-20260605...
> 
> I installed xfstests-dev and reproduced a "umount: /home/test: target is busy." problem which Al Viro is
> experiencing with https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20260608&id=fb1d5846e99c8aa4ce8da7e6ee7643b01da25b8c .
> 
(...snipped...)
> 
> I initially suspected that the cause of "target is busy" error is that fput() from
> __loop_clr_fd() does not wait for completion before "losetup -d" completes. But a
> debug printk() patch shown below indicated a tendency:
> 
>   (a) __loop_clr_fd() is called by "udev-worker" rather than "losetup" when this problem happens
> 
>   (b) propagate_mount_busy()!=0 when do_umount() fails with -EBUSY
> 
> --------------------------------------------------------------------------------
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index c3b607a3ddc4..7408f314a1fa 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1763,6 +1763,8 @@ static void lo_release(struct gendisk *disk)
>  	mutex_unlock(&lo->lo_mutex);
>  
>  	if (need_clear) {
> +		printk("Flush: task=%s[%d] dev=loop%d state=%d\n",
> +		       current->comm, current->pid, lo->lo_number, lo->lo_state);
>  		/*
>  		 * Temporarily release disk->open_mutex in order to flush pending I/O
>  		 * requests before clearing the backing device.
> @@ -1813,6 +1815,8 @@ static void lo_release(struct gendisk *disk)
>  		mutex_lock(&lo->lo_disk->open_mutex);
>  		if (WARN_ON(data_race(READ_ONCE(lo->lo_state)) != Lo_rundown))
>  			return;
> +		printk("Teardown: task=%s[%d] dev=loop%d state=%d\n",
> +		       current->comm, current->pid, lo->lo_number, lo->lo_state);
>  		__loop_clr_fd(lo);
>  	}
>  }
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 09ab7fc72f86..9710460fb449 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1893,6 +1893,8 @@ static int do_umount(struct mount *mnt, int flags)
>  		 */
>  		lock_mount_hash();
>  		if (!list_empty(&mnt->mnt_mounts) || mnt_get_count(mnt) != 2) {
> +			printk("%s: task=%s[%d] !list_empty(&mnt->mnt_mounts)=%d mnt_get_count(mnt)=%d\n", __func__,
> +			       current->comm, current->pid, !list_empty(&mnt->mnt_mounts), mnt_get_count(mnt));
>  			unlock_mount_hash();
>  			return -EBUSY;
>  		}
> @@ -1960,6 +1962,9 @@ static int do_umount(struct mount *mnt, int flags)
>  		if (!propagate_mount_busy(mnt, 2)) {
>  			umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC);
>  			retval = 0;
> +		} else {
> +			printk("%s: task=%s[%d] propagate_mount_busy()!=0\n", __func__,
> +			       current->comm, current->pid);
>  		}
>  	}
>  out:
> --------------------------------------------------------------------------------


^ permalink raw reply

* Re: [PATCH v3] block: assign caller-specific lockdep class to disk->open_mutex
From: Andreas Hindborg @ 2026-06-19 13:16 UTC (permalink / raw)
  To: Tetsuo Handa, Christoph Hellwig, Bart Van Assche, Jens Axboe
  Cc: linux-block, LKML, Andrew Morton, Ming Lei, Damien Le Moal,
	Qu Wenruo, Hillf Danton, Miguel Ojeda
In-Reply-To: <87y0gaekyb.fsf@t14s.mail-host-address-is-not-set>

Andreas Hindborg <a.hindborg@kernel.org> writes:

> "Tetsuo Handa" <penguin-kernel@I-love.SAKURA.ne.jp> writes:
>
>> On 2026/06/19 18:49, Andreas Hindborg wrote:
>>>> My understanding is that we don't have infrastructure for lock class keys
>>>> that can be applied to
>>>>
>>>>   +struct gendisk_lkclass {
>>>>   +	struct lock_class_key bio_lkclass;
>>>>   +	struct lock_class_key open_mutex_lkclass;
>>>>   +};
>>>>
>>>>   -	static struct lock_class_key __key;
>>>>   +	static struct gendisk_lkclass __key;
>>>>
>>>> change. Alternative approach is welcomed if you have one.
>>>
>>> Sorry, I did not pay enough attention. I would suggest this approach:
>>
>> I'm OK with your module-specific lockdep class approach
>>
>>> @@ -164,14 +127,21 @@ pub fn build<T: Operations>(
>>>              lim.features = bindings::BLK_FEAT_ROTATIONAL;
>>>          }
>>>
>>> -        // SAFETY: `tagset.raw_tag_set()` points to a valid and initialized tag set
>>> +        let keys = KBox::pin_init(
>>> +            Opaque::ffi_init(|ptr: *mut bindings::gendisk_lkclass| {
>>> +                // SAFETY: `ptr` is valid for writes
>>> +                unsafe { bindings::lockdep_register_key(&raw mut (*ptr).bio_lkclass) };
>>> +                // SAFETY: `ptr` is valid for writes
>>> +                unsafe { bindings::lockdep_register_key(&raw mut (*ptr).open_mutex_lkclass) };
>>> +            }),
>>> +            GFP_KERNEL,
>>> +        )?;
>>> +
>>> +        // SAFETY:
>>> +        // - `tagset.raw_tag_set()` points to a valid and initialized tag set.
>>> +        // - We keep `keys` alive for the lifetime of the returned gendisk.
>>>          let gendisk = from_err_ptr(unsafe {
>>> -            bindings::__blk_mq_alloc_disk(
>>> -                tagset.raw_tag_set(),
>>> -                &mut lim,
>>> -                data,
>>> -                lkclass.as_ptr(),
>>> -            )
>>> +            bindings::__blk_mq_alloc_disk(tagset.raw_tag_set(), &mut lim, data, keys.get())
>>>          })?;
>>>
>>>          const TABLE: bindings::block_device_operations = bindings::block_device_operations {
>>
>> if we can assume that there is (up to) only one
>>
>>   let mut disk = gen_disk::GenDiskBuilder::new().capacity_sectors(4096).build(fmt!("myblk"), tagset, ())?;
>>
>> call for each rust module.
>
> With the approach I suggest here, we get a new set of keys for each
> invocation of `GenDiskBuilder::build`. No statics involved, keys are
> allocated dynamically.
>
> The current code makes one key for to be shared by _all_ callers, which
> is probably a bad idea.
>
>>
>> By the way, are you aware that rust-enabled linux-next build is recently failing
>> ( https://syzkaller.appspot.com/bug?extid=1f14a35d0c73d31555e4 ) ?
>
> I was not aware, thanks for pointing that out. I actually did build
> linux-next for both June 16 and June 18 and I did not see any issue. It
> looks like a tooling problem to me. I'll see if I can figure out what it
> is.

I think this is [1]. We saw it with lkp when they used bindgen 0.72.0
with clang 22, I think. syzkaller probably needs to bump its bindgen
version to 0.72.2.

Cc: Miguel Ojeda <ojeda@kernel.org>

Not sure who to Cc on the syzkaller side.

Best regards,
Andreas Hindborg


[1] https://github.com/rust-lang/rust-bindgen/issues/3264


^ permalink raw reply

* Re: [PATCH v3] block: assign caller-specific lockdep class to disk->open_mutex
From: Andreas Hindborg @ 2026-06-19 13:09 UTC (permalink / raw)
  To: Tetsuo Handa, Christoph Hellwig, Bart Van Assche, Jens Axboe
  Cc: linux-block, LKML, Andrew Morton, Ming Lei, Damien Le Moal,
	Qu Wenruo, Hillf Danton, Miguel Ojeda
In-Reply-To: <4d405942-9932-48b0-bdc0-9744d48fe699@I-love.SAKURA.ne.jp>

"Tetsuo Handa" <penguin-kernel@I-love.SAKURA.ne.jp> writes:

> On 2026/06/19 18:49, Andreas Hindborg wrote:
>>> My understanding is that we don't have infrastructure for lock class keys
>>> that can be applied to
>>>
>>>   +struct gendisk_lkclass {
>>>   +	struct lock_class_key bio_lkclass;
>>>   +	struct lock_class_key open_mutex_lkclass;
>>>   +};
>>>
>>>   -	static struct lock_class_key __key;
>>>   +	static struct gendisk_lkclass __key;
>>>
>>> change. Alternative approach is welcomed if you have one.
>>
>> Sorry, I did not pay enough attention. I would suggest this approach:
>
> I'm OK with your module-specific lockdep class approach
>
>> @@ -164,14 +127,21 @@ pub fn build<T: Operations>(
>>              lim.features = bindings::BLK_FEAT_ROTATIONAL;
>>          }
>>
>> -        // SAFETY: `tagset.raw_tag_set()` points to a valid and initialized tag set
>> +        let keys = KBox::pin_init(
>> +            Opaque::ffi_init(|ptr: *mut bindings::gendisk_lkclass| {
>> +                // SAFETY: `ptr` is valid for writes
>> +                unsafe { bindings::lockdep_register_key(&raw mut (*ptr).bio_lkclass) };
>> +                // SAFETY: `ptr` is valid for writes
>> +                unsafe { bindings::lockdep_register_key(&raw mut (*ptr).open_mutex_lkclass) };
>> +            }),
>> +            GFP_KERNEL,
>> +        )?;
>> +
>> +        // SAFETY:
>> +        // - `tagset.raw_tag_set()` points to a valid and initialized tag set.
>> +        // - We keep `keys` alive for the lifetime of the returned gendisk.
>>          let gendisk = from_err_ptr(unsafe {
>> -            bindings::__blk_mq_alloc_disk(
>> -                tagset.raw_tag_set(),
>> -                &mut lim,
>> -                data,
>> -                lkclass.as_ptr(),
>> -            )
>> +            bindings::__blk_mq_alloc_disk(tagset.raw_tag_set(), &mut lim, data, keys.get())
>>          })?;
>>
>>          const TABLE: bindings::block_device_operations = bindings::block_device_operations {
>
> if we can assume that there is (up to) only one
>
>   let mut disk = gen_disk::GenDiskBuilder::new().capacity_sectors(4096).build(fmt!("myblk"), tagset, ())?;
>
> call for each rust module.

With the approach I suggest here, we get a new set of keys for each
invocation of `GenDiskBuilder::build`. No statics involved, keys are
allocated dynamically.

The current code creates one key that is shared by _all_ callers, which
is probably a bad idea.

>
> By the way, are you aware that rust-enabled linux-next build is recently failing
> ( https://syzkaller.appspot.com/bug?extid=1f14a35d0c73d31555e4 ) ?

I was not aware, thanks for pointing that out. I actually did build
linux-next for both June 16 and June 18 and I did not see any issue. It
looks like a tooling problem to me. I'll see if I can figure out what it
is.


Best regards,
Andreas Hindborg



^ permalink raw reply

* Re: [PATCH v3] block: assign caller-specific lockdep class to disk->open_mutex
From: Tetsuo Handa @ 2026-06-19 12:19 UTC (permalink / raw)
  To: Andreas Hindborg, Christoph Hellwig, Bart Van Assche, Jens Axboe
  Cc: linux-block, LKML, Andrew Morton, Ming Lei, Damien Le Moal,
	Qu Wenruo, Hillf Danton, Miguel Ojeda
In-Reply-To: <87a4sqg8qh.fsf@t14s.mail-host-address-is-not-set>

On 2026/06/19 18:49, Andreas Hindborg wrote:
>> My understanding is that we don't have infrastructure for lock class keys
>> that can be applied to
>>
>>   +struct gendisk_lkclass {
>>   +	struct lock_class_key bio_lkclass;
>>   +	struct lock_class_key open_mutex_lkclass;
>>   +};
>>
>>   -	static struct lock_class_key __key;
>>   +	static struct gendisk_lkclass __key;
>>
>> change. Alternative approach is welcomed if you have one.
> 
> Sorry, I did not pay enough attention. I would suggest this approach:

I'm OK with your module-specific lockdep class approach

> @@ -164,14 +127,21 @@ pub fn build<T: Operations>(
>              lim.features = bindings::BLK_FEAT_ROTATIONAL;
>          }
>  
> -        // SAFETY: `tagset.raw_tag_set()` points to a valid and initialized tag set
> +        let keys = KBox::pin_init(
> +            Opaque::ffi_init(|ptr: *mut bindings::gendisk_lkclass| {
> +                // SAFETY: `ptr` is valid for writes
> +                unsafe { bindings::lockdep_register_key(&raw mut (*ptr).bio_lkclass) };
> +                // SAFETY: `ptr` is valid for writes
> +                unsafe { bindings::lockdep_register_key(&raw mut (*ptr).open_mutex_lkclass) };
> +            }),
> +            GFP_KERNEL,
> +        )?;
> +
> +        // SAFETY:
> +        // - `tagset.raw_tag_set()` points to a valid and initialized tag set.
> +        // - We keep `keys` alive for the lifetime of the returned gendisk.
>          let gendisk = from_err_ptr(unsafe {
> -            bindings::__blk_mq_alloc_disk(
> -                tagset.raw_tag_set(),
> -                &mut lim,
> -                data,
> -                lkclass.as_ptr(),
> -            )
> +            bindings::__blk_mq_alloc_disk(tagset.raw_tag_set(), &mut lim, data, keys.get())
>          })?;
>  
>          const TABLE: bindings::block_device_operations = bindings::block_device_operations {

if we can assume that there is (up to) only one

  let mut disk = gen_disk::GenDiskBuilder::new().capacity_sectors(4096).build(fmt!("myblk"), tagset, ())?;

call for each rust module.

By the way, are you aware that the Rust-enabled linux-next build has recently
been failing ( https://syzkaller.appspot.com/bug?extid=1f14a35d0c73d31555e4 ) ?


^ permalink raw reply

* Re: [PATCH v3] block: assign caller-specific lockdep class to disk->open_mutex
From: Andreas Hindborg @ 2026-06-19  9:49 UTC (permalink / raw)
  To: Tetsuo Handa, Christoph Hellwig, Bart Van Assche, Jens Axboe
  Cc: linux-block, LKML, Andrew Morton, Ming Lei, Damien Le Moal,
	Qu Wenruo, Hillf Danton, Miguel Ojeda
In-Reply-To: <9ebc39a8-6025-40e3-a378-ba26bf8d73ae@I-love.SAKURA.ne.jp>

Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:

> On 2026/06/05 16:54, Andreas Hindborg wrote:
>> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:
>> 
>>> The block core currently allocates a single monolithic lockdep key for
>>> disk->open_mutex across all callers. This single key conflates locking
>>> hierarchies between independent block streams. For example, if a stacked
>>> driver like loop flushes its internal workqueues inside lo_release() while
>>> holding its own open_mutex, lockdep views this as a potential ABBA deadlock
>>> against the underlying storage stack, leading to numerous circular
>>> dependency splats.
>>>
>>> To structurally reduce false positives, this patch splits the global
>>> monolithic lock class into distinct, per-caller instances during disk
>>> allocation. This is done by replacing "struct lock_class_key" with
>>> "struct gendisk_lkclass", which contains two instances of
>>> "struct lock_class_key" for the legacy "(bio completion)" map and
>>> disk->open_mutex respectively.
>>>
>>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> 
>> For the Rust part, we have existing infrastructure for lock class keys
>> [1]. Please take a look how we generate lock class keys elsewhere [2].
>
> My understanding is that we don't have infrastructure for lock class keys
> that can be applied to
>
>   +struct gendisk_lkclass {
>   +	struct lock_class_key bio_lkclass;
>   +	struct lock_class_key open_mutex_lkclass;
>   +};
>
>   -	static struct lock_class_key __key;
>   +	static struct gendisk_lkclass __key;
>
> change. Alternative approach is welcomed if you have one.

Sorry, I did not pay enough attention. I would suggest this approach:

diff --git a/drivers/block/rnull/rnull.rs b/drivers/block/rnull/rnull.rs
index 21557a1ce866..13048cea8bb0 100644
--- a/drivers/block/rnull/rnull.rs
+++ b/drivers/block/rnull/rnull.rs
@@ -68,7 +68,7 @@ fn new(
             .logical_block_size(block_size)?
             .physical_block_size(block_size)?
             .rotational(rotational)
-            .build(fmt!("{}", name.to_str()?), tagset, queue_data, kernel::my_gendisk_lkclass!())
+            .build(fmt!("{}", name.to_str()?), tagset, queue_data)
     }
 }
 
diff --git a/rust/kernel/block/mq.rs b/rust/kernel/block/mq.rs
index 10f22b200567..1fd0d54dd549 100644
--- a/rust/kernel/block/mq.rs
+++ b/rust/kernel/block/mq.rs
@@ -88,7 +88,7 @@
 //!     Arc::pin_init(TagSet::new(1, 256, 1), flags::GFP_KERNEL)?;
 //! let mut disk = gen_disk::GenDiskBuilder::new()
 //!     .capacity_sectors(4096)
-//!     .build(fmt!("myblk"), tagset, (), kernel::my_gendisk_lkclass!())?;
+//!     .build(fmt!("myblk"), tagset, ())?;
 //!
 //! # Ok::<(), kernel::error::Error>(())
 //! ```
diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs
index 8c452e90fe9d..58f87a184407 100644
--- a/rust/kernel/block/mq/gen_disk.rs
+++ b/rust/kernel/block/mq/gen_disk.rs
@@ -24,6 +24,7 @@
     sync::Arc,
     types::{
         ForeignOwnable,
+        Opaque,
         ScopeGuard, //
     }, //
 };
@@ -49,43 +50,6 @@ fn default() -> Self {
     }
 }
 
-/// A wrapper type for safely passing "struct gendisk_lkclass" argument.
-///
-/// This type can only be instantiated via the [`my_gendisk_lkclass!`] macro.
-pub struct GenDiskLockClass(pub(crate) *mut bindings::gendisk_lkclass);
-
-impl GenDiskLockClass {
-    /// Retrieve the underlying raw pointer.
-    pub(crate) fn as_ptr(&self) -> *mut bindings::gendisk_lkclass {
-        self.0
-    }
-}
-
-#[doc(hidden)]
-pub mod __internal {
-    use super::*;
-
-    /// Internal constructor used ONLY by the `my_gendisk_lkclass!` macro.
-    ///
-    /// SAFETY: `ptr` must point to a valid static `gendisk_lkclass` instance.
-    pub const unsafe fn new_lock_class(ptr: *mut bindings::gendisk_lkclass) -> GenDiskLockClass {
-        GenDiskLockClass(ptr)
-    }
-}
-
-/// Helper macro to generate a unique caller-local static lock class struct
-#[macro_export]
-macro_rules! my_gendisk_lkclass {
-    () => {{
-        static mut LKCLASS: $crate::bindings::gendisk_lkclass = $crate::bindings::gendisk_lkclass {
-            bio_lkclass: const { unsafe { ::core::mem::zeroed() } },
-            open_mutex_lkclass: const { unsafe { ::core::mem::zeroed() } },
-        };
-
-        unsafe { $crate::block::mq::gen_disk::__internal::new_lock_class(&raw mut LKCLASS) }
-    }};
-}
-
 impl GenDiskBuilder {
     /// Create a new instance.
     pub fn new() -> Self {
@@ -148,7 +112,6 @@ pub fn build<T: Operations>(
         name: fmt::Arguments<'_>,
         tagset: Arc<TagSet<T>>,
         queue_data: T::QueueData,
-        lkclass: GenDiskLockClass,
     ) -> Result<GenDisk<T>> {
         let data = queue_data.into_foreign();
         let recover_data = ScopeGuard::new(|| {
@@ -164,14 +127,21 @@ pub fn build<T: Operations>(
             lim.features = bindings::BLK_FEAT_ROTATIONAL;
         }
 
-        // SAFETY: `tagset.raw_tag_set()` points to a valid and initialized tag set
+        let keys = KBox::pin_init(
+            Opaque::ffi_init(|ptr: *mut bindings::gendisk_lkclass| {
+                // SAFETY: `ptr` is valid for writes
+                unsafe { bindings::lockdep_register_key(&raw mut (*ptr).bio_lkclass) };
+                // SAFETY: `ptr` is valid for writes
+                unsafe { bindings::lockdep_register_key(&raw mut (*ptr).open_mutex_lkclass) };
+            }),
+            GFP_KERNEL,
+        )?;
+
+        // SAFETY:
+        // - `tagset.raw_tag_set()` points to a valid and initialized tag set.
+        // - We keep `keys` alive for the lifetime of the returned gendisk.
         let gendisk = from_err_ptr(unsafe {
-            bindings::__blk_mq_alloc_disk(
-                tagset.raw_tag_set(),
-                &mut lim,
-                data,
-                lkclass.as_ptr(),
-            )
+            bindings::__blk_mq_alloc_disk(tagset.raw_tag_set(), &mut lim, data, keys.get())
         })?;
 
         const TABLE: bindings::block_device_operations = bindings::block_device_operations {
@@ -230,6 +200,7 @@ pub fn build<T: Operations>(
         Ok(GenDisk {
             _tagset: tagset,
             gendisk,
+            _lock_class_keys: keys,
         })
     }
 }
@@ -245,6 +216,7 @@ pub fn build<T: Operations>(
 pub struct GenDisk<T: Operations> {
     _tagset: Arc<TagSet<T>>,
     gendisk: *mut bindings::gendisk,
+    _lock_class_keys: Pin<KBox<Opaque<bindings::gendisk_lkclass>>>,
 }
 
 // SAFETY: `GenDisk` is an owned pointer to a `struct gendisk` and an `Arc` to a
---


Best regards,
Andreas Hindborg




^ permalink raw reply related

* [PATCH v4] rust: configfs: add procedural macro for declaring configfs attributes
From: Malte Wechter @ 2026-06-19  9:10 UTC (permalink / raw)
  To: Andreas Hindborg, Breno Leitao, Miguel Ojeda, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Jens Axboe, Luis Chamberlain,
	Petr Pavlu, Daniel Gomez, Sami Tolvanen, Aaron Tomlin
  Cc: linux-kernel, rust-for-linux, linux-block, linux-modules,
	Malte Wechter

Implement `configfs_attrs!` as a procedural macro using `syn`, which
improves readability and maintainability. Remove the old macro and
replace all uses with the new macro. Add the new macro implementation
file to MAINTAINERS.

Signed-off-by: Malte Wechter <maltewechter@gmail.com>
---
Changes in v4:
- Update link path to configfs_attr macro in configfs.rs
- Fix doc strings for configfs_attr in macros/lib.rs
- Fix doc strings for parse_ordered_fields in macros/helpers.rs
- Update title prefix to `rust: configfs:`
- Link to v3: https://lore.kernel.org/r/20260612-configfs-syn-v3-1-3292fbc5cc32@gmail.com

Changes in v3:
- Remove 'make_static_ident' function, make names for static variables simpler
- Move 'parse_ordered_fields' macro from module.rs into helpers
- Use 'parse_ordered_fields' macro for parsing instead of doing it ad-hoc
- Link to v2: https://lore.kernel.org/r/20260603-configfs-syn-v2-1-cb58489c2647@gmail.com

Changes in v2:
- Add a try_parse helper function to macros/helpers.rs
- Fix bug where 'child' configuration gets dropped if trailing comma is missing (sashiko)
- Link to v1: https://lore.kernel.org/r/20260520-configfs-syn-v1-1-6c5b80a9cef2@gmail.com
---
 MAINTAINERS                     |   1 +
 drivers/block/rnull/configfs.rs |   2 +-
 rust/kernel/configfs.rs         | 263 +---------------------------------------
 rust/macros/configfs_attrs.rs   | 135 +++++++++++++++++++++
 rust/macros/helpers.rs          | 139 +++++++++++++++++++++
 rust/macros/lib.rs              |  87 +++++++++++++
 rust/macros/module.rs           | 137 ---------------------
 samples/rust/rust_configfs.rs   |   2 +-
 8 files changed, 370 insertions(+), 396 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2fb1c75afd16..45f7a1ec93b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6464,6 +6464,7 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux.git config
 F:	fs/configfs/
 F:	include/linux/configfs.h
 F:	rust/kernel/configfs.rs
+F:	rust/macros/configfs_attrs.rs
 F:	samples/configfs/
 F:	samples/rust/rust_configfs.rs
 
diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
index 7c2eb5c0b722..f28ec69d7984 100644
--- a/drivers/block/rnull/configfs.rs
+++ b/drivers/block/rnull/configfs.rs
@@ -4,8 +4,8 @@
 use kernel::{
     block::mq::gen_disk::{GenDisk, GenDiskBuilder},
     configfs::{self, AttributeOperations},
-    configfs_attrs,
     fmt::{self, Write as _},
+    macros::configfs_attrs,
     new_mutex,
     page::PAGE_SIZE,
     prelude::*,
diff --git a/rust/kernel/configfs.rs b/rust/kernel/configfs.rs
index 2339c6467325..a8995a418136 100644
--- a/rust/kernel/configfs.rs
+++ b/rust/kernel/configfs.rs
@@ -21,7 +21,7 @@
 //!
 //! ```ignore
 //! use kernel::alloc::flags;
-//! use kernel::configfs_attrs;
+//! use macros::configfs_attrs;
 //! use kernel::configfs;
 //! use kernel::new_mutex;
 //! use kernel::page::PAGE_SIZE;
@@ -240,7 +240,7 @@ unsafe fn container_of(group: *const bindings::config_group) -> *const Self {
 /// A configfs group.
 ///
 /// To add a subgroup to configfs, pass this type as `ctype` to
-/// [`crate::configfs_attrs`] when creating a group in [`GroupOperations::make_group`].
+/// [`macros::configfs_attrs`] when creating a group in [`GroupOperations::make_group`].
 #[pin_data]
 pub struct Group<Data> {
     #[pin]
@@ -637,7 +637,7 @@ pub const fn new(name: &'static CStr) -> Self {
 /// implement `HasGroup<Data>`. The trait must be implemented once for each
 /// attribute of the group. The constant type parameter `ID` maps the
 /// implementation to a specific `Attribute`. `ID` must be passed when declaring
-/// attributes via the [`kernel::configfs_attrs`] macro, to tie
+/// attributes via the [`macros::configfs_attrs`] macro, to tie
 /// `AttributeOperations` implementations to concrete named attributes.
 #[vtable]
 pub trait AttributeOperations<const ID: u64 = 0> {
@@ -669,13 +669,13 @@ fn store(_data: &Self::Data, _page: &[u8]) -> Result {
 /// This type is used to construct a new [`ItemType`]. It represents a list of
 /// [`Attribute`] that will appear in the directory representing a [`Group`].
 /// Users should not directly instantiate this type, rather they should use the
-/// [`kernel::configfs_attrs`] macro to declare a static set of attributes for a
+/// [`macros::configfs_attrs`] macro to declare a static set of attributes for a
 /// group.
 ///
 /// # Note
 ///
 /// Instances of this type are constructed statically at compile by the
-/// [`kernel::configfs_attrs`] macro.
+/// [`macros::configfs_attrs`] macro.
 #[repr(transparent)]
 pub struct AttributeList<const N: usize, Data>(
     /// Null terminated Array of pointers to [`Attribute`]. The type is [`c_void`]
@@ -724,7 +724,7 @@ impl<const N: usize, Data> AttributeList<N, Data> {
 /// [`Subsystem`].
 ///
 /// Users should not directly instantiate objects of this type. Rather, they
-/// should use the [`kernel::configfs_attrs`] macro to statically declare the
+/// should use the [`macros::configfs_attrs`] macro to statically declare the
 /// shape of a [`Group`] or [`Subsystem`].
 #[pin_data]
 pub struct ItemType<Container, Data> {
@@ -791,254 +791,3 @@ fn as_ptr(&self) -> *const bindings::config_item_type {
         self.item_type.get()
     }
 }
-
-/// Define a list of configfs attributes statically.
-///
-/// Invoking the macro in the following manner:
-///
-/// ```ignore
-/// let item_type = configfs_attrs! {
-///     container: configfs::Subsystem<Configuration>,
-///     data: Configuration,
-///     child: Child,
-///     attributes: [
-///         message: 0,
-///         bar: 1,
-///     ],
-/// };
-/// ```
-///
-/// Expands the following output:
-///
-/// ```ignore
-/// let item_type = {
-///     static CONFIGURATION_MESSAGE_ATTR: kernel::configfs::Attribute<
-///         0,
-///         Configuration,
-///         Configuration,
-///     > = unsafe {
-///         kernel::configfs::Attribute::new({
-///             const S: &str = "message\u{0}";
-///             const C: &kernel::str::CStr = match kernel::str::CStr::from_bytes_with_nul(
-///                 S.as_bytes()
-///             ) {
-///                 Ok(v) => v,
-///                 Err(_) => {
-///                     core::panicking::panic_fmt(core::const_format_args!(
-///                         "string contains interior NUL"
-///                     ));
-///                 }
-///             };
-///             C
-///         })
-///     };
-///
-///     static CONFIGURATION_BAR_ATTR: kernel::configfs::Attribute<
-///             1,
-///             Configuration,
-///             Configuration
-///     > = unsafe {
-///         kernel::configfs::Attribute::new({
-///             const S: &str = "bar\u{0}";
-///             const C: &kernel::str::CStr = match kernel::str::CStr::from_bytes_with_nul(
-///                 S.as_bytes()
-///             ) {
-///                 Ok(v) => v,
-///                 Err(_) => {
-///                     core::panicking::panic_fmt(core::const_format_args!(
-///                         "string contains interior NUL"
-///                     ));
-///                 }
-///             };
-///             C
-///         })
-///     };
-///
-///     const N: usize = (1usize + (1usize + 0usize)) + 1usize;
-///
-///     static CONFIGURATION_ATTRS: kernel::configfs::AttributeList<N, Configuration> =
-///         unsafe { kernel::configfs::AttributeList::new() };
-///
-///     {
-///         const N: usize = 0usize;
-///         unsafe { CONFIGURATION_ATTRS.add::<N, 0, _>(&CONFIGURATION_MESSAGE_ATTR) };
-///     }
-///
-///     {
-///         const N: usize = (1usize + 0usize);
-///         unsafe { CONFIGURATION_ATTRS.add::<N, 1, _>(&CONFIGURATION_BAR_ATTR) };
-///     }
-///
-///     static CONFIGURATION_TPE:
-///       kernel::configfs::ItemType<configfs::Subsystem<Configuration> ,Configuration>
-///         = kernel::configfs::ItemType::<
-///                 configfs::Subsystem<Configuration>,
-///                 Configuration
-///                 >::new_with_child_ctor::<N,Child>(
-///             &THIS_MODULE,
-///             &CONFIGURATION_ATTRS
-///         );
-///
-///     &CONFIGURATION_TPE
-/// }
-/// ```
-#[macro_export]
-macro_rules! configfs_attrs {
-    (
-        container: $container:ty,
-        data: $data:ty,
-        attributes: [
-            $($name:ident: $attr:literal),* $(,)?
-        ] $(,)?
-    ) => {
-        $crate::configfs_attrs!(
-            count:
-            @container($container),
-            @data($data),
-            @child(),
-            @no_child(x),
-            @attrs($($name $attr)*),
-            @eat($($name $attr,)*),
-            @assign(),
-            @cnt(0usize),
-        )
-    };
-    (
-        container: $container:ty,
-        data: $data:ty,
-        child: $child:ty,
-        attributes: [
-            $($name:ident: $attr:literal),* $(,)?
-        ] $(,)?
-    ) => {
-        $crate::configfs_attrs!(
-            count:
-            @container($container),
-            @data($data),
-            @child($child),
-            @no_child(),
-            @attrs($($name $attr)*),
-            @eat($($name $attr,)*),
-            @assign(),
-            @cnt(0usize),
-        )
-    };
-    (count:
-     @container($container:ty),
-     @data($data:ty),
-     @child($($child:ty)?),
-     @no_child($($no_child:ident)?),
-     @attrs($($aname:ident $aattr:literal)*),
-     @eat($name:ident $attr:literal, $($rname:ident $rattr:literal,)*),
-     @assign($($assign:block)*),
-     @cnt($cnt:expr),
-    ) => {
-        $crate::configfs_attrs!(
-            count:
-            @container($container),
-            @data($data),
-            @child($($child)?),
-            @no_child($($no_child)?),
-            @attrs($($aname $aattr)*),
-            @eat($($rname $rattr,)*),
-            @assign($($assign)* {
-                const N: usize = $cnt;
-                // The following macro text expands to a call to `Attribute::add`.
-
-                // SAFETY: By design of this macro, the name of the variable we
-                // invoke the `add` method on below, is not visible outside of
-                // the macro expansion. The macro does not operate concurrently
-                // on this variable, and thus we have exclusive access to the
-                // variable.
-                unsafe {
-                    $crate::macros::paste!(
-                        [< $data:upper _ATTRS >]
-                            .add::<N, $attr, _>(&[< $data:upper _ $name:upper _ATTR >])
-                    )
-                };
-            }),
-            @cnt(1usize + $cnt),
-        )
-    };
-    (count:
-     @container($container:ty),
-     @data($data:ty),
-     @child($($child:ty)?),
-     @no_child($($no_child:ident)?),
-     @attrs($($aname:ident $aattr:literal)*),
-     @eat(),
-     @assign($($assign:block)*),
-     @cnt($cnt:expr),
-    ) =>
-    {
-        $crate::configfs_attrs!(
-            final:
-            @container($container),
-            @data($data),
-            @child($($child)?),
-            @no_child($($no_child)?),
-            @attrs($($aname $aattr)*),
-            @assign($($assign)*),
-            @cnt($cnt),
-        )
-    };
-    (final:
-     @container($container:ty),
-     @data($data:ty),
-     @child($($child:ty)?),
-     @no_child($($no_child:ident)?),
-     @attrs($($name:ident $attr:literal)*),
-     @assign($($assign:block)*),
-     @cnt($cnt:expr),
-    ) =>
-    {
-        $crate::macros::paste!{
-            {
-                $(
-                    // SAFETY: We are expanding `configfs_attrs`.
-                    static [< $data:upper _ $name:upper _ATTR >]:
-                        $crate::configfs::Attribute<$attr, $data, $data> =
-                            unsafe {
-                                $crate::configfs::Attribute::new(
-                                    $crate::c_str!(::core::stringify!($name)),
-                                )
-                            };
-                )*
-
-
-                // We need space for a null terminator.
-                const N: usize = $cnt + 1usize;
-
-                // SAFETY: We are expanding `configfs_attrs`.
-                static [< $data:upper _ATTRS >]:
-                $crate::configfs::AttributeList<N, $data> =
-                    unsafe { $crate::configfs::AttributeList::new() };
-
-                $($assign)*
-
-                $(
-                    const [<$no_child:upper>]: bool = true;
-
-                    static [< $data:upper _TPE >] : $crate::configfs::ItemType<$container, $data>  =
-                        $crate::configfs::ItemType::<$container, $data>::new::<N>(
-                            &THIS_MODULE, &[<$ data:upper _ATTRS >]
-                        );
-                )?
-
-                $(
-                    static [< $data:upper _TPE >]:
-                        $crate::configfs::ItemType<$container, $data>  =
-                            $crate::configfs::ItemType::<$container, $data>::
-                            new_with_child_ctor::<N, $child>(
-                                &THIS_MODULE, &[<$ data:upper _ATTRS >]
-                            );
-                )?
-
-                & [< $data:upper _TPE >]
-            }
-        }
-    };
-
-}
-
-pub use crate::configfs_attrs;
diff --git a/rust/macros/configfs_attrs.rs b/rust/macros/configfs_attrs.rs
new file mode 100644
index 000000000000..81037bc38188
--- /dev/null
+++ b/rust/macros/configfs_attrs.rs
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use quote::{
+    format_ident,
+    quote, //
+};
+
+use syn::{
+    bracketed,
+    ext::IdentExt,
+    parse::{
+        Parse,
+        ParseStream, //
+    },
+    punctuated::Punctuated,
+    spanned::Spanned,
+    Error,
+    Ident,
+    LitInt,
+    Token,
+    Type, //
+};
+
+use crate::helpers::parse_ordered_fields;
+
+pub(crate) struct ConfigfsAttrs {
+    container: Type,
+    data: Type,
+    child: Option<Type>,
+    attributes: Vec<(Ident, LitInt)>,
+}
+
+fn parse_attribute_field(stream: ParseStream<'_>) -> syn::Result<(Ident, LitInt)> {
+    let id = stream.parse::<syn::Ident>()?;
+    let _colon = stream.parse::<Token![:]>()?;
+    let v = stream.parse::<LitInt>()?;
+    Ok((id, v))
+}
+
+fn parse_attributes(stream: ParseStream<'_>) -> syn::Result<Vec<(Ident, LitInt)>> {
+    let attr_stream;
+    let _bracket = bracketed!(attr_stream in stream);
+    let attributes = Punctuated::<(Ident, LitInt), Token![,]>::parse_terminated_with(
+        &attr_stream,
+        parse_attribute_field,
+    )?;
+    Ok(attributes.into_iter().collect::<Vec<_>>())
+}
+
+impl Parse for ConfigfsAttrs {
+    fn parse(input: ParseStream<'_>) -> syn::Result<Self> {
+        parse_ordered_fields!(
+            from input;
+            container [required] => (input.parse::<Type>())?,
+            data [required] => (input.parse::<Type>())?,
+            child => (input.parse::<Type>())?,
+            attributes [required] => parse_attributes(input)?,
+        );
+
+        Ok(ConfigfsAttrs {
+            container,
+            data,
+            child,
+            attributes,
+        })
+    }
+}
+
+pub(crate) fn configfs_attrs(cfs_attrs: ConfigfsAttrs) -> proc_macro2::TokenStream {
+    let (container_ty, data_ty) = (&cfs_attrs.container, &cfs_attrs.data);
+
+    let data_tp_ident = Ident::new("DATA_TPE", cfs_attrs.data.span());
+    let data_attr_ident = Ident::new("DATA_ATTR_LIST", cfs_attrs.data.span());
+
+    let n = cfs_attrs.attributes.len() + 1;
+
+    let attr_list = quote! {
+        static #data_attr_ident: kernel::configfs::AttributeList<#n, #data_ty> =
+            // SAFETY: We are expanding `configfs_attrs`.
+            unsafe { kernel::configfs::AttributeList::new() };
+    };
+
+    let mut attrs = Vec::new();
+    for (attr_idx, (name, id)) in cfs_attrs.attributes.iter().enumerate() {
+        let name_with_attr = format_ident!("{}_ATTR_{}", name.to_string().to_uppercase(), attr_idx);
+
+        let id: u64 = match id.base10_parse::<u64>() {
+            Ok(v) => v,
+            Err(_) => {
+                return syn::Error::new(id.span(), "Could not parse attribute ID as a u64")
+                    .to_compile_error();
+            }
+        };
+
+        attrs.push(quote! {
+            static #name_with_attr: kernel::configfs::Attribute<#id, #data_ty, #data_ty> =
+                // SAFETY: We are expanding `configfs_attrs`.
+                unsafe {
+                    kernel::configfs::Attribute::new(kernel::c_str!(::core::stringify!(#name)))
+                };
+
+            // SAFETY: By design of this macro, the name of the variable we
+            // invoke the `add` method on below, is not visible outside of
+            // the macro expansion. The macro does not operate concurrently
+            // on this variable, and thus we have exclusive access to the
+            // variable.
+            unsafe { #data_attr_ident.add::<#attr_idx, #id, _>(&#name_with_attr) }
+        });
+    }
+
+    let has_child_code = if let Some(child) = cfs_attrs.child {
+        quote! { new_with_child_ctor::<#n, #child>}
+    } else {
+        quote! { new::<#n> }
+    };
+
+    let data_type = quote! {
+        {
+            static #data_tp_ident:
+            kernel::configfs::ItemType<#container_ty, #data_ty> =
+                kernel::configfs::ItemType::<#container_ty, #data_ty>::#has_child_code(
+                    &THIS_MODULE, &#data_attr_ident
+                );
+            &#data_tp_ident
+        }
+    };
+
+    quote! {
+        {
+            #attr_list
+            #(#attrs)*
+            #data_type
+        }
+    }
+}
diff --git a/rust/macros/helpers.rs b/rust/macros/helpers.rs
index d18fbf4daa0a..305dcbddf797 100644
--- a/rust/macros/helpers.rs
+++ b/rust/macros/helpers.rs
@@ -58,3 +58,142 @@ pub(crate) fn file() -> String {
 pub(crate) fn gather_cfg_attrs(attr: &[Attribute]) -> impl Iterator<Item = &Attribute> + '_ {
     attr.iter().filter(|a| a.path().is_ident("cfg"))
 }
+
+/// Parse fields that are required to use a specific order.
+///
+/// As fields must follow a specific order, we *could* just parse fields one by one by peeking.
+/// However the error message generated when implementing that way is not very friendly.
+///
+/// So instead we parse fields in an arbitrary order, but only enforce the ordering after parsing,
+/// and if the wrong order is used, the proper order is communicated to the user with error message.
+///
+/// Usage looks like this:
+/// ```ignore
+/// parse_ordered_fields! {
+///     from input;
+///
+///     // This will extract `foo: <field>` into a variable named `foo`.
+///     // The variable will have type `Option<_>`.
+///     foo => <expression that parses the field>,
+///
+///     // If you need the variable name to be different than the key name.
+///     // This extracts `baz: <field>` into a variable named `bar`.
+///     // You might want this if `baz` is a keyword.
+///     baz as bar => <expression that parses the field>,
+///
+///     // You can mark a key as required, and the variable will no longer be `Option`.
+///     // foobar will be of type `Expr` instead of `Option<Expr>`.
+///     foobar [required] => input.parse::<Expr>()?,
+/// }
+/// ```
+macro_rules! parse_ordered_fields {
+    (@gen
+        [$input:expr]
+        [$([$name:ident; $key:ident; $parser:expr])*]
+        [$([$req_name:ident; $req_key:ident])*]
+    ) => {
+        $(let mut $name = None;)*
+
+        const EXPECTED_KEYS: &[&str] = &[$(stringify!($key),)*];
+        const REQUIRED_KEYS: &[&str] = &[$(stringify!($req_key),)*];
+
+        let span = $input.span();
+        let mut seen_keys = Vec::new();
+
+        while !$input.is_empty() {
+            let key = $input.call(Ident::parse_any)?;
+
+            if seen_keys.contains(&key) {
+                Err(Error::new_spanned(
+                    &key,
+                    format!(r#"duplicated key "{key}". Keys can only be specified once."#),
+                ))?
+            }
+
+            $input.parse::<Token![:]>()?;
+
+            match &*key.to_string() {
+                $(
+                    stringify!($key) => $name = Some($parser),
+                )*
+                _ => {
+                    Err(Error::new_spanned(
+                        &key,
+                        format!(r#"unknown key "{key}". Valid keys are: {EXPECTED_KEYS:?}."#),
+                    ))?
+                }
+            }
+
+            $input.parse::<Token![,]>()?;
+            seen_keys.push(key);
+        }
+
+        for key in REQUIRED_KEYS {
+            if !seen_keys.iter().any(|e| e == key) {
+                Err(Error::new(span, format!(r#"missing required key "{key}""#)))?
+            }
+        }
+
+        let mut ordered_keys: Vec<&str> = Vec::new();
+        for key in EXPECTED_KEYS {
+            if seen_keys.iter().any(|e| e == key) {
+                ordered_keys.push(key);
+            }
+        }
+
+        if seen_keys != ordered_keys {
+            Err(Error::new(
+                span,
+                format!(r#"keys are not ordered as expected. Order them like: {ordered_keys:?}."#),
+            ))?
+        }
+
+        $(let $req_name = $req_name.expect("required field");)*
+    };
+
+    // Handle required fields.
+    (@gen
+        [$input:expr] [$($tok:tt)*] [$($req:tt)*]
+        $key:ident as $name:ident [required] => $parser:expr,
+        $($rest:tt)*
+    ) => {
+        parse_ordered_fields!(
+            @gen [$input] [$($tok)* [$name; $key; $parser]] [$($req)* [$name; $key]] $($rest)*
+        )
+    };
+    (@gen
+        [$input:expr] [$($tok:tt)*] [$($req:tt)*]
+        $name:ident [required] => $parser:expr,
+        $($rest:tt)*
+    ) => {
+        parse_ordered_fields!(
+            @gen [$input] [$($tok)* [$name; $name; $parser]] [$($req)* [$name; $name]] $($rest)*
+        )
+    };
+
+    // Handle optional fields.
+    (@gen
+        [$input:expr] [$($tok:tt)*] [$($req:tt)*]
+        $key:ident as $name:ident => $parser:expr,
+        $($rest:tt)*
+    ) => {
+        parse_ordered_fields!(
+            @gen [$input] [$($tok)* [$name; $key; $parser]] [$($req)*] $($rest)*
+        )
+    };
+    (@gen
+        [$input:expr] [$($tok:tt)*] [$($req:tt)*]
+        $name:ident => $parser:expr,
+        $($rest:tt)*
+    ) => {
+        parse_ordered_fields!(
+            @gen [$input] [$($tok)* [$name; $name; $parser]] [$($req)*] $($rest)*
+        )
+    };
+
+    (from $input:expr; $($tok:tt)*) => {
+        parse_ordered_fields!(@gen [$input] [] [] $($tok)*)
+    }
+}
+
+pub(crate) use parse_ordered_fields;
diff --git a/rust/macros/lib.rs b/rust/macros/lib.rs
index 2cfd59e0f9e7..ebb41e80ecc7 100644
--- a/rust/macros/lib.rs
+++ b/rust/macros/lib.rs
@@ -15,6 +15,8 @@
 #![cfg_attr(not(CONFIG_RUSTC_HAS_SPAN_FILE), feature(proc_macro_span))]
 
 mod concat_idents;
+#[cfg(CONFIG_CONFIGFS_FS)]
+mod configfs_attrs;
 mod export;
 mod fmt;
 mod helpers;
@@ -489,3 +491,88 @@ pub fn kunit_tests(attr: TokenStream, input: TokenStream) -> TokenStream {
         .unwrap_or_else(|e| e.into_compile_error())
         .into()
 }
+
+/// Define a list of configfs attributes statically.
+///
+/// # Examples
+///
+/// ```ignore
+/// let item_type = configfs_attrs! {
+///     container: configfs::Subsystem<Configuration>,
+///     data: Configuration,
+///     child: Child,
+///     attributes: [
+///         message: 0,
+///         bar: 1,
+///     ],
+/// };
+/// ```
+///
+/// Expands to the following output:
+///
+/// ```ignore
+/// let item_type = {
+///         static DATA_ATTR_LIST: kernel::configfs::AttributeList<
+///             3usize,
+///             Configuration,
+///         > = unsafe { kernel::configfs::AttributeList::new() };
+///         static MESSAGE_ATTR_0: kernel::configfs::Attribute<
+///             0u64,
+///             Configuration,
+///             Configuration,
+///         > = unsafe {
+///             kernel::configfs::Attribute::new({
+///                 const S: &str = "message\u{0}";
+///                 const C: &kernel::str::CStr = match kernel::str::CStr::from_bytes_with_nul(
+///                     S.as_bytes(),
+///                 ) {
+///                     Ok(v) => v,
+///                     Err(_) => {
+///                         ::core::panicking::panic_fmt(
+///                             format_args!("string contains interior NUL"),
+///                         );
+///                     }
+///                 };
+///                 C
+///             })
+///         };
+///         unsafe { DATA_ATTR_LIST.add::<0usize, 0u64, _>(&MESSAGE_ATTR_0) }
+///         static BAR_ATTR_1: kernel::configfs::Attribute<
+///             1u64,
+///             Configuration,
+///             Configuration,
+///         > = unsafe {
+///             kernel::configfs::Attribute::new({
+///                 const S: &str = "bar\u{0}";
+///                 const C: &kernel::str::CStr = match kernel::str::CStr::from_bytes_with_nul(
+///                     S.as_bytes(),
+///                 ) {
+///                     Ok(v) => v,
+///                     Err(_) => {
+///                         ::core::panicking::panic_fmt(
+///                             format_args!("string contains interior NUL"),
+///                         );
+///                     }
+///                 };
+///                 C
+///             })
+///         };
+///         unsafe { DATA_ATTR_LIST.add::<1usize, 1u64, _>(&BAR_ATTR_1) }
+///         {
+///             static DATA_TPE: kernel::configfs::ItemType<
+///                 configfs::Subsystem<Configuration>,
+///                 Configuration,
+///             > = kernel::configfs::ItemType::<
+///                 configfs::Subsystem<Configuration>,
+///                 Configuration,
+///             >::new_with_child_ctor::<3usize, Child>(&THIS_MODULE, &DATA_ATTR_LIST);
+///             &DATA_TPE
+///         }
+///     };
+/// ```
+#[cfg(CONFIG_CONFIGFS_FS)]
+#[proc_macro]
+pub fn configfs_attrs(input: TokenStream) -> TokenStream {
+    configfs_attrs::configfs_attrs(parse_macro_input!(input as configfs_attrs::ConfigfsAttrs))
+        .into()
+}
diff --git a/rust/macros/module.rs b/rust/macros/module.rs
index 06c18e207508..7ff6ad09b1a2 100644
--- a/rust/macros/module.rs
+++ b/rust/macros/module.rs
@@ -196,143 +196,6 @@ fn param_ops_path(param_type: &str) -> Path {
     }
 }
 
-/// Parse fields that are required to use a specific order.
-///
-/// As fields must follow a specific order, we *could* just parse fields one by one by peeking.
-/// However the error message generated when implementing that way is not very friendly.
-///
-/// So instead we parse fields in an arbitrary order, but only enforce the ordering after parsing,
-/// and if the wrong order is used, the proper order is communicated to the user with error message.
-///
-/// Usage looks like this:
-/// ```ignore
-/// parse_ordered_fields! {
-///     from input;
-///
-///     // This will extract "foo: <field>" into a variable named "foo".
-///     // The variable will have type `Option<_>`.
-///     foo => <expression that parses the field>,
-///
-///     // If you need the variable name to be different than the key name.
-///     // This extracts "baz: <field>" into a variable named "bar".
-///     // You might want this if "baz" is a keyword.
-///     baz as bar => <expression that parse the field>,
-///
-///     // You can mark a key as required, and the variable will no longer be `Option`.
-///     // foobar will be of type `Expr` instead of `Option<Expr>`.
-///     foobar [required] => input.parse::<Expr>()?,
-/// }
-/// ```
-macro_rules! parse_ordered_fields {
-    (@gen
-        [$input:expr]
-        [$([$name:ident; $key:ident; $parser:expr])*]
-        [$([$req_name:ident; $req_key:ident])*]
-    ) => {
-        $(let mut $name = None;)*
-
-        const EXPECTED_KEYS: &[&str] = &[$(stringify!($key),)*];
-        const REQUIRED_KEYS: &[&str] = &[$(stringify!($req_key),)*];
-
-        let span = $input.span();
-        let mut seen_keys = Vec::new();
-
-        while !$input.is_empty() {
-            let key = $input.call(Ident::parse_any)?;
-
-            if seen_keys.contains(&key) {
-                Err(Error::new_spanned(
-                    &key,
-                    format!(r#"duplicated key "{key}". Keys can only be specified once."#),
-                ))?
-            }
-
-            $input.parse::<Token![:]>()?;
-
-            match &*key.to_string() {
-                $(
-                    stringify!($key) => $name = Some($parser),
-                )*
-                _ => {
-                    Err(Error::new_spanned(
-                        &key,
-                        format!(r#"unknown key "{key}". Valid keys are: {EXPECTED_KEYS:?}."#),
-                    ))?
-                }
-            }
-
-            $input.parse::<Token![,]>()?;
-            seen_keys.push(key);
-        }
-
-        for key in REQUIRED_KEYS {
-            if !seen_keys.iter().any(|e| e == key) {
-                Err(Error::new(span, format!(r#"missing required key "{key}""#)))?
-            }
-        }
-
-        let mut ordered_keys: Vec<&str> = Vec::new();
-        for key in EXPECTED_KEYS {
-            if seen_keys.iter().any(|e| e == key) {
-                ordered_keys.push(key);
-            }
-        }
-
-        if seen_keys != ordered_keys {
-            Err(Error::new(
-                span,
-                format!(r#"keys are not ordered as expected. Order them like: {ordered_keys:?}."#),
-            ))?
-        }
-
-        $(let $req_name = $req_name.expect("required field");)*
-    };
-
-    // Handle required fields.
-    (@gen
-        [$input:expr] [$($tok:tt)*] [$($req:tt)*]
-        $key:ident as $name:ident [required] => $parser:expr,
-        $($rest:tt)*
-    ) => {
-        parse_ordered_fields!(
-            @gen [$input] [$($tok)* [$name; $key; $parser]] [$($req)* [$name; $key]] $($rest)*
-        )
-    };
-    (@gen
-        [$input:expr] [$($tok:tt)*] [$($req:tt)*]
-        $name:ident [required] => $parser:expr,
-        $($rest:tt)*
-    ) => {
-        parse_ordered_fields!(
-            @gen [$input] [$($tok)* [$name; $name; $parser]] [$($req)* [$name; $name]] $($rest)*
-        )
-    };
-
-    // Handle optional fields.
-    (@gen
-        [$input:expr] [$($tok:tt)*] [$($req:tt)*]
-        $key:ident as $name:ident => $parser:expr,
-        $($rest:tt)*
-    ) => {
-        parse_ordered_fields!(
-            @gen [$input] [$($tok)* [$name; $key; $parser]] [$($req)*] $($rest)*
-        )
-    };
-    (@gen
-        [$input:expr] [$($tok:tt)*] [$($req:tt)*]
-        $name:ident => $parser:expr,
-        $($rest:tt)*
-    ) => {
-        parse_ordered_fields!(
-            @gen [$input] [$($tok)* [$name; $name; $parser]] [$($req)*] $($rest)*
-        )
-    };
-
-    (from $input:expr; $($tok:tt)*) => {
-        parse_ordered_fields!(@gen [$input] [] [] $($tok)*)
-    }
-}
-
 struct Parameter {
     name: Ident,
     ptype: Ident,
diff --git a/samples/rust/rust_configfs.rs b/samples/rust/rust_configfs.rs
index a1bd9db6010d..876462f7789d 100644
--- a/samples/rust/rust_configfs.rs
+++ b/samples/rust/rust_configfs.rs
@@ -4,7 +4,7 @@
 
 use kernel::alloc::flags;
 use kernel::configfs;
-use kernel::configfs::configfs_attrs;
+use kernel::macros::configfs_attrs;
 use kernel::new_mutex;
 use kernel::page::PAGE_SIZE;
 use kernel::prelude::*;

---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260417-configfs-syn-191e07130027

Best regards,
-- 
Malte Wechter <maltewechter@gmail.com>
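
For readers following the helpers.rs move: the post-parse ordering check that
parse_ordered_fields! performs can be sketched in plain Rust, outside of syn.
This is only an illustration, not code from the patch; check_key_order and its
String error type are hypothetical stand-ins for the macro body and syn::Error.

```rust
/// Hypothetical stand-in for the checks in `parse_ordered_fields!`.
///
/// `seen` holds the keys in the order the user wrote them, `expected` holds
/// every valid key in canonical order, `required` holds the mandatory keys.
fn check_key_order(seen: &[&str], expected: &[&str], required: &[&str]) -> Result<(), String> {
    // Keys are accepted in any order while parsing; only duplicates and
    // unknown keys are rejected at this stage.
    for (i, key) in seen.iter().enumerate() {
        if seen[..i].contains(key) {
            return Err(format!(
                "duplicated key \"{key}\". Keys can only be specified once."
            ));
        }
        if !expected.contains(key) {
            return Err(format!(
                "unknown key \"{key}\". Valid keys are: {expected:?}."
            ));
        }
    }

    // Every required key must have been seen.
    for key in required {
        if !seen.contains(key) {
            return Err(format!("missing required key \"{key}\""));
        }
    }

    // Project the canonical order onto the keys that were actually given,
    // then compare it with the order the user wrote.
    let ordered: Vec<&str> = expected
        .iter()
        .copied()
        .filter(|k| seen.contains(k))
        .collect();
    if seen != ordered.as_slice() {
        return Err(format!(
            "keys are not ordered as expected. Order them like: {ordered:?}."
        ));
    }

    Ok(())
}
```

With the key lists used by ConfigfsAttrs (container, data, child, attributes,
where child is optional), writing `data` before `container` gives the
"not ordered as expected" error, while omitting `child` is accepted.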

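The naming change in the expansion (CONFIGURATION_MESSAGE_ATTR becoming
MESSAGE_ATTR_0) and the attribute list length can be sketched the same way.
attr_static_names is a hypothetical helper that uses format! in place of
quote's format_ident!; it mirrors the `{}_ATTR_{}` pattern and the `len() + 1`
null-terminator slot from configfs_attrs.rs, nothing more.

```rust
/// Hypothetical helper: returns the names of the per-attribute statics and
/// the length `N` of the attribute list for the given attribute names.
fn attr_static_names(attrs: &[&str]) -> (Vec<String>, usize) {
    // Each static is named after the upper-cased attribute name plus its
    // position in the list, so two groups may reuse an attribute name.
    let names: Vec<String> = attrs
        .iter()
        .enumerate()
        .map(|(idx, name)| format!("{}_ATTR_{}", name.to_uppercase(), idx))
        .collect();

    // One extra slot is reserved for the null terminator of the list.
    (names, attrs.len() + 1)
}
```

For the attributes `message` and `bar` from the doc example this yields
MESSAGE_ATTR_0, BAR_ATTR_1 and a length of 3, matching the documented
expansion.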


* Re: WARNING: at floppy_interrupt, CPU: swapper/NUM/NUM
From: Denis Efremov (Oracle) @ 2026-06-19  6:43 UTC (permalink / raw)
  To: sanan.hasanou, axboe, linux-block, linux-kernel; +Cc: syzkaller, contact
In-Reply-To: <6a34707b.25ac79d9.2b1a46.0a67@mx.google.com>

Hello,

Thank you for the report. This is a known warning that only occurs in virtualized
environments. You may want to add this config fragment to your modified syzkaller:
dashboard/config/linux/bits/unmaintained.yml

Thanks,
Denis

On 19/06/2026 02:26, sanan.hasanou@gmail.com wrote:
> Good day, dear maintainers,
> 
> We found a bug using a modified version of syzkaller.
> 
> Kernel Branch: 7.0-rc1
> Kernel Config: <https://drive.google.com/open?id=173DLEAEPKPhhR1TcqofdnkLpdoK7PMFl>
> Unfortunately, we don't have any reproducer for this bug yet.
> Thank you!
> 
> Best regards,
> Sanan Hasanov
> 
> ------------[ cut here ]------------
> WARNING: at schedule_bh drivers/block/floppy.c:1000 [inline], CPU#0: swapper/0/1
> WARNING: at floppy_interrupt+0x51b/0x560 drivers/block/floppy.c:1766, CPU#0: swapper/0/1
> Modules linked in:
> CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc1 #1 PREEMPT(full) 
> Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> RIP: 0010:schedule_bh drivers/block/floppy.c:1000 [inline]
> RIP: 0010:floppy_interrupt+0x51b/0x560 drivers/block/floppy.c:1766
> Code: 35 3a c8 54 0c 48 c7 c7 80 fa 4b 8c 48 c7 c2 c0 f7 4b 8c 48 c7 c1 40 f9 4b 8c e8 a0 4a 3b fb e9 af fe ff ff e8 66 d9 d5 fb 90 <0f> 0b 90 e9 e8 fc ff ff 44 89 f9 80 e1 07 38 c1 0f 8c 27 fc ff ff
> RSP: 0018:ffffc90000007af8 EFLAGS: 00010006
> RAX: ffffffff85ec786a RBX: ffffffff85ecf380 RCX: ffff888016aeba80
> RDX: 0000000000010100 RSI: 0000000000000001 RDI: 0000000000000000
> RBP: 0000000000000000 R08: ffffffff8f3e2467 R09: 1ffffffff1e7c48c
> R10: dffffc0000000000 R11: fffffbfff1e7c48d R12: dffffc0000000000
> R13: 0000000000000000 R14: 0000000002000011 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff8880d98df000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff888012801000 CR3: 000000000e6ff000 CR4: 00000000000006f0
> Call Trace:
>  <IRQ>
>  __handle_irq_event_percpu+0x1d9/0x5d0 kernel/irq/handle.c:209
>  handle_irq_event_percpu kernel/irq/handle.c:246 [inline]
>  handle_irq_event+0x90/0x1e0 kernel/irq/handle.c:263
>  handle_edge_irq+0x239/0x9e0 kernel/irq/chip.c:855
>  generic_handle_irq_desc include/linux/irqdesc.h:186 [inline]
>  handle_irq arch/x86/kernel/irq.c:262 [inline]
>  call_irq_handler arch/x86/kernel/irq.c:286 [inline]
>  __common_interrupt+0xc5/0x170 arch/x86/kernel/irq.c:333
>  common_interrupt+0x4a/0xc0 arch/x86/kernel/irq.c:326
>  asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:688
> RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:188 [inline]
> RIP: 0010:_raw_spin_unlock_irq+0x19/0x30 kernel/locking/spinlock.c:202
> Code: 00 02 00 00 75 db eb da e8 74 c0 a8 f5 5b c3 66 90 f3 0f 1e fa 0f 1f 44 00 00 e8 f2 b4 12 f6 e8 4d 86 41 f6 fb bf 01 00 00 00 <e8> d2 2a 07 f6 65 8b 05 8b 59 88 06 85 c0 74 01 c3 e8 41 c0 a8 f5
> RSP: 0018:ffffc90000007d58 EFLAGS: 00000246
> RAX: 0000000000000001 RBX: ffffffff85358ab0 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
> RBP: ffffc90000007ef8 R08: ffff88806ba2f683 R09: 1ffff1100d745ed0
> R10: dffffc0000000000 R11: ffffed100d745ed1 R12: ffff88801d085478
> R13: dffffc0000000000 R14: ffff88806ba2f680 R15: ffff88806ba2f698
>  expire_timers kernel/time/timer.c:1798 [inline]
>  __run_timers kernel/time/timer.c:2373 [inline]
>  __run_timer_base+0x700/0xa30 kernel/time/timer.c:2385
>  run_timer_base kernel/time/timer.c:2394 [inline]
>  run_timer_softirq+0xbc/0x190 kernel/time/timer.c:2404
>  handle_softirqs+0x1ed/0x700 kernel/softirq.c:622
>  __do_softirq kernel/softirq.c:656 [inline]
>  invoke_softirq kernel/softirq.c:496 [inline]
>  __irq_exit_rcu+0x8e/0x270 kernel/softirq.c:723
>  irq_exit_rcu+0xe/0x30 kernel/softirq.c:739
>  instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1056 [inline]
>  sysvec_apic_timer_interrupt+0x92/0xb0 arch/x86/kernel/apic/apic.c:1056
>  </IRQ>
>  <TASK>
>  asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
> RIP: 0010:clear_pages arch/x86/include/asm/page_64.h:103 [inline]
> RIP: 0010:clear_page arch/x86/include/asm/page_64.h:114 [inline]
> RIP: 0010:clear_highpage_kasan_tagged include/linux/highmem.h:344 [inline]
> RIP: 0010:kernel_init_pages mm/page_alloc.c:1265 [inline]
> RIP: 0010:post_alloc_hook+0x3ff/0x480 mm/page_alloc.c:1887
> Code: 03 49 c7 c7 20 2e 43 8e 49 c1 ef 03 eb 2f 48 8b 3d c6 74 21 0c 49 c1 e5 06 4c 29 ef 4c 01 e7 b9 00 10 00 00 31 c0 48 c1 e9 03 <f3> 48 ab 49 81 c4 00 10 00 00 49 ff ce 0f 84 31 fd ff ff 48 b8 00
> RSP: 0018:ffffc9000001eed8 EFLAGS: 00000216
> RAX: 0000000000000000 RBX: 1ffffffff1c865c6 RCX: 0000000000000200
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88801dc20000
> RBP: 0000000000000003 R08: ffffffff9049fd6f R09: 0000000000000000
> R10: ffffed1003b84000 R11: fffffbfff2093fae R12: fffa80001dc20000
> R13: fffa800000000000 R14: 0000000000000008 R15: 1ffffffff1c865c4
>  prep_new_page mm/page_alloc.c:1897 [inline]
>  get_page_from_freelist+0x2240/0x2330 mm/page_alloc.c:3962
>  __alloc_frozen_pages_noprof+0x20e/0x3d0 mm/page_alloc.c:5250
>  __alloc_pages_noprof+0xf/0x30 mm/page_alloc.c:5284
>  vm_area_alloc_pages mm/vmalloc.c:-1 [inline]
>  __vmalloc_area_node mm/vmalloc.c:3876 [inline]
>  __vmalloc_node_range_noprof+0x79f/0x1580 mm/vmalloc.c:4064
>  __vmalloc_node_noprof mm/vmalloc.c:4124 [inline]
>  vzalloc_noprof+0xdf/0x120 mm/vmalloc.c:4202
>  allocate_partitions block/partitions/core.c:101 [inline]
>  check_partition block/partitions/core.c:123 [inline]
>  blk_add_partitions block/partitions/core.c:590 [inline]
>  bdev_disk_changed+0x628/0x1810 block/partitions/core.c:694
>  blkdev_get_whole+0x37e/0x500 block/bdev.c:764
>  bdev_open+0x35b/0xdc0 block/bdev.c:973
>  bdev_file_open_by_dev+0x1c3/0x240 block/bdev.c:1075
>  disk_scan_partitions+0x1be/0x2c0 block/genhd.c:387
>  add_disk_final block/genhd.c:416 [inline]
>  add_disk_fwnode+0x31e/0x470 block/genhd.c:610
>  add_disk include/linux/blkdev.h:785 [inline]
>  brd_alloc+0x5de/0x810 drivers/block/brd.c:340
>  brd_init+0xc6/0x120 drivers/block/brd.c:420
>  do_one_initcall+0x1a1/0x530 init/main.c:1382
>  do_initcall_level+0x117/0x1a0 init/main.c:1444
>  do_initcalls+0xe1/0x150 init/main.c:1460
>  kernel_init_freeable+0x207/0x310 init/main.c:1692
>  kernel_init+0x22/0x1d0 init/main.c:1582
>  ret_from_fork+0x608/0xc40 arch/x86/kernel/process.c:158
>  ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:245
>  </TASK>
> ----------------
> Code disassembly (best guess):
>    0:	00 02                	add    %al,(%rdx)
>    2:	00 00                	add    %al,(%rax)
>    4:	75 db                	jne    0xffffffe1
>    6:	eb da                	jmp    0xffffffe2
>    8:	e8 74 c0 a8 f5       	call   0xf5a8c081
>    d:	5b                   	pop    %rbx
>    e:	c3                   	ret
>    f:	66 90                	xchg   %ax,%ax
>   11:	f3 0f 1e fa          	endbr64
>   15:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>   1a:	e8 f2 b4 12 f6       	call   0xf612b511
>   1f:	e8 4d 86 41 f6       	call   0xf6418671
>   24:	fb                   	sti
>   25:	bf 01 00 00 00       	mov    $0x1,%edi
> * 2a:	e8 d2 2a 07 f6       	call   0xf6072b01 <-- trapping instruction
>   2f:	65 8b 05 8b 59 88 06 	mov    %gs:0x688598b(%rip),%eax        # 0x68859c1
>   36:	85 c0                	test   %eax,%eax
>   38:	74 01                	je     0x3b
>   3a:	c3                   	ret
>   3b:	e8 41 c0 a8 f5       	call   0xf5a8c081
> 
> <<<<<<<<<<<<<<< tail report >>>>>>>>>>>>>>>


^ permalink raw reply

* Re: [PATCH v2 4/5] rust: block: mq: use vertical import style
From: Miguel Ojeda @ 2026-06-19  6:29 UTC (permalink / raw)
  To: Andreas Hindborg
  Cc: Jens Axboe, Alvin Sun, Arnd Bergmann, Greg Kroah-Hartman,
	Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Alice Ryhl, Trevor Gross, Danilo Krummrich,
	rust-for-linux, linux-block
In-Reply-To: <87v7bgf8f3.fsf@t14s.mail-host-address-is-not-set>

On Thu, Jun 18, 2026 at 12:32 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> Acked-by: Andreas Hindborg <a.hindborg@kernel.org>
>
> Cc: Jens Axboe <axboe@kernel.dk>

There is a v3 (may be exactly the same):

  https://lore.kernel.org/rust-for-linux/20260521-miscdev-use-format-v3-4-56240ca70d0c@linux.dev/

Cheers,
Miguel

^ permalink raw reply

* Re: [PATCH 1/2 blktests] src/miniublk: switch to ioctl-encoded ublk commands
From: Shin'ichiro Kawasaki @ 2026-06-19  3:26 UTC (permalink / raw)
  To: Sebastian Chlad; +Cc: linux-block, Sebastian Chlad
In-Reply-To: <20260617072516.6238-2-sebastian.chlad@suse.com>

Hi Sebastian,

Thanks for the patches. I agree that this direction is good: it is better to
shift away from the legacy interface.

One point I noticed is that src/miniublk.c can no longer be built with the
kernel headers of the LTS kernel v6.1.y. That is probably the only affected
LTS series (v5.15.y does not have ublk, and v6.6.y supports the new
interface). This is a rather small window and may be acceptable, but I wonder
what you think about it.

If we drop the miniublk build with v6.1.y kernel headers, it might be better
to check for the new interface before building miniublk. I quickly created a
Makefile change [1] for that purpose.

Also, please find an inline comment below.

On Jun 17, 2026 / 09:25, Sebastian Chlad wrote:
> Kernels built without CONFIG_BLKDEV_UBLK_LEGACY_OPCODES reject the
> legacy raw UBLK_CMD_* and UBLK_IO_* opcodes. Switch miniublk to use
> the ioctl-encoded UBLK_U_CMD_* and UBLK_U_IO_* variants defined in
> linux/ublk_cmd.h instead.
> 
> For IO commands, the ioctl-encoded opcode is used for submission while
> _IOC_NR() extracts the raw NR bits for build_user_data(), keeping the
> user_data tag encoding intact.
> 
> Signed-off-by: Sebastian Chlad <sebastian.chlad@suse.com>
> ---
>  src/miniublk.c | 30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/src/miniublk.c b/src/miniublk.c
> index f98f850..5a35ca7 100644
> --- a/src/miniublk.c
> +++ b/src/miniublk.c
[...]
> @@ -624,9 +624,9 @@ static int ublk_queue_io_cmd(struct ublk_queue *q,
>  		return 0;
>  
>  	if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP)
> -		cmd_op = UBLK_IO_COMMIT_AND_FETCH_REQ;
> -	else if (io->flags & UBLKSRV_NEED_FETCH_RQ)
> -		cmd_op = UBLK_IO_FETCH_REQ;
> +		cmd_op = UBLK_U_IO_COMMIT_AND_FETCH_REQ;
> +	else
> +		cmd_op = UBLK_U_IO_FETCH_REQ;

The hunk above also turns the "else if" branch into a plain "else". Is this
intentional?
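
To make the question concrete, here is a minimal sketch of the two shapes. It
is not the miniublk code: the DEMO_* names and values are hypothetical
stand-ins for the UBLKSRV_NEED_* flags and the UBLK_*_REQ opcodes. The two
variants differ only when neither flag is set.

```c
/* Hypothetical stand-ins; the real flags and opcodes live in src/miniublk.c
 * and linux/ublk_cmd.h and may have different values. */
#define DEMO_NEED_FETCH_RQ        (1U << 0)
#define DEMO_NEED_COMMIT_RQ_COMP  (1U << 1)

enum demo_op {
	DEMO_OP_NONE = 0,
	DEMO_OP_FETCH,
	DEMO_OP_COMMIT_AND_FETCH,
};

/* Pre-patch shape: FETCH is selected only when its flag is set. */
static enum demo_op pick_op_else_if(unsigned int flags)
{
	enum demo_op op = DEMO_OP_NONE;

	if (flags & DEMO_NEED_COMMIT_RQ_COMP)
		op = DEMO_OP_COMMIT_AND_FETCH;
	else if (flags & DEMO_NEED_FETCH_RQ)
		op = DEMO_OP_FETCH;
	return op;
}

/* Post-patch shape: everything that is not COMMIT falls through to FETCH. */
static enum demo_op pick_op_else(unsigned int flags)
{
	if (flags & DEMO_NEED_COMMIT_RQ_COMP)
		return DEMO_OP_COMMIT_AND_FETCH;
	return DEMO_OP_FETCH;
}
```

With flags == 0, pick_op_else_if() leaves the op unset while pick_op_else()
selects FETCH. If an earlier check in the function already filters out the
"no flag set" case, the change is behavior-neutral, but then it would be good
to say so in the commit message.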

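On the _IOC_NR() point in the quoted commit message, a small sketch of the
idea, under stated assumptions: struct demo_io_cmd, the DEMO_* opcodes and
demo_user_data() are hypothetical and do not claim to match the layout used by
linux/ublk_cmd.h or by build_user_data() in miniublk. Only _IOWR() and
_IOC_NR() are the standard ioctl macros.

```c
#include <sys/ioctl.h>	/* _IOWR(), _IOC_NR() */

/* Hypothetical payload type, only used to give _IOWR() a size to encode. */
struct demo_io_cmd {
	unsigned short q_id;
	unsigned short tag;
	int result;
	unsigned long long addr;
};

/* Hypothetical raw opcode and its ioctl-encoded counterpart. */
#define DEMO_IO_FETCH_REQ	0x20
#define DEMO_U_IO_FETCH_REQ	_IOWR('u', DEMO_IO_FETCH_REQ, struct demo_io_cmd)

/* Recover the raw NR bits from an ioctl-encoded opcode. */
static unsigned int demo_raw_nr(unsigned int ioctl_op)
{
	return _IOC_NR(ioctl_op);
}

/* Hypothetical user_data packing: tag in the low bits, 8-bit op above it.
 * A full 32-bit ioctl-encoded opcode would not fit into such a field, which
 * is why the raw NR is the value to pack. */
static unsigned long long demo_user_data(unsigned int tag, unsigned int op)
{
	return (unsigned long long)tag |
	       ((unsigned long long)(op & 0xff) << 16);
}
```

The ioctl-encoded value goes to the submission side, while
demo_raw_nr(DEMO_U_IO_FETCH_REQ) yields 0x20 again, so whatever is packed into
user_data stays the same as with the legacy opcode.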

[1]

diff --git a/src/Makefile b/src/Makefile
index d8833bf..adfe3ef 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -8,6 +8,10 @@ HAVE_C_MACRO = $(shell if echo "$(H)include <$(1)>" |	\
 		$(CC) $(CFLAGS) -E - 2>&1 /dev/null | grep $(2) > /dev/null 2>&1; \
 		then echo 1;else echo 0; fi)
 
+HAVE_C_DEF = $(shell if echo -e "$(H)include <$(1)>\n#ifdef $(2)\nHAVE_$(2)\n#endif" | \
+		$(CC) $(CFLAGS) -E - 2>&1 /dev/null | grep HAVE_$(2) > /dev/null 2>&1; \
+		then echo 1;else echo 0; fi)
+
 C_TARGETS := \
 	dio-offsets \
 	loblksize \
@@ -27,6 +31,7 @@ C_UBLK_TARGETS := miniublk
 
 HAVE_LIBURING := $(call HAVE_C_MACRO,liburing.h,IORING_OP_URING_CMD)
 HAVE_UBLK_HEADER := $(call HAVE_C_HEADER,linux/ublk_cmd.h,1)
+HAVE_NEW_UBLK_INTF := $(call HAVE_C_DEF,linux/ublk_cmd.h,UBLK_U_CMD_START_DEV)
 
 CXX_TARGETS := \
 	discontiguous-io
@@ -37,8 +42,12 @@ SYZKALLER_TARGETS := \
 TARGETS := $(C_TARGETS) $(CXX_TARGETS) $(SYZKALLER_TARGETS)
 
 ifeq ($(HAVE_UBLK_HEADER), 1)
+ifeq ($(HAVE_NEW_UBLK_INTF), 1)
 C_URING_TARGETS += $(C_UBLK_TARGETS)
 else
+$(info Skip $(C_UBLK_TARGETS) build due to missing new ublk interface(v6.4+))
+endif
+else
 $(info Skip $(C_UBLK_TARGETS) build due to missing kernel header(v6.0+))
 endif
 


^ permalink raw reply related

* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
From: Vjaceslavs Klimovs @ 2026-06-19  2:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Keith Busch, regressions, Keith Busch, dm-devel, linux-block,
	mpatocka
In-Reply-To: <ajLSaiKXWIekcA97@gallifrey>

On Tue, Jun 16, 2026 at 08:05:53AM -0700, Keith Busch wrote:
> For DM_IO_BIO requests, do_region() built each destination bio by walking
> the source bio's biovec and re-adding the pages one at a time, tracking
> the remaining transfer in sectors.

Thanks Keith, this clears the dm-mirror hang for me.

I reproduced the original regression in a nested-QEMU harness: an LVM
legacy mirror LV (lvcreate --type mirror -m1, core log) on virtio-blk
PVs, used as the backing disk of a nested guest that issues misaligned
O_DIRECT reads via virtio-blk with cache=none,aio=native.

Without this series, on v7.1-rc7 (424280953322), the mirror read path
splats and then wedges:

  device-mapper: raid1: Mirror read failed from 252:0. Trying alternative device.
  WARNING: block/bio.c:1044 at bio_add_page+0x108/0x200, CPU#1: kworker/1:1
  Workqueue: kmirrord do_mirror
  RIP: 0010:bio_add_page+0x108/0x200

kmirrord then stays stuck in do_mirror and the I/O never completes (the
guest hangs indefinitely).

With 1/2 and 2/2 applied on the same base, the WARN is gone, kmirrord no
longer wedges, the array is not degraded, and the misaligned O_DIRECT
read now completes normally instead of hanging. The --type raid1 (dm-raid
on md/raid1) variant of the same test is also clean.


On Wed, Jun 17, 2026 at 9:59 AM Dr. David Alan Gilbert
<linux@treblig.org> wrote:
>
> * Keith Busch (kbusch@kernel.org) wrote:
> > On Wed, Jun 17, 2026 at 04:44:35PM +0000, Dr. David Alan Gilbert wrote:
> > > (It's a bit scary you're having to go around quite
> > > a few places and make similar fixes; I assume there
> > > are others that do similar things).
> >
> > Yes, I understand that. I'm looking into a common way to validate this.
> > The md raid doesn't have this problem because they always call
> > bio_split_to_limits() first, but that's not an optimal thing to do for
> > dm raid in the normal read/write path, so perhaps a common checker needs
> > to happen generically in the block layer. Yeah, I know I removed the
> > previous higher level validation ... I'll try find something less costly
> > than what we had before.
>
> OK, thanks again
> (and to Thomas for gluing my query to those other two which got this
> moving!)
>
> Dave.
> --
>  -----Open up your eyes, open up your mind, open up your code -------
> / Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \
> \        dave @ treblig.org |                               | In Hex /
>  \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply

* WARNING: at floppy_interrupt, CPU: swapper/NUM/NUM
From: sanan.hasanou @ 2026-06-18 22:26 UTC (permalink / raw)
  To: efremov, axboe, linux-block, linux-kernel; +Cc: syzkaller, contact

Good day, dear maintainers,

We found a bug using a modified version of syzkaller.

Kernel Branch: 7.0-rc1
Kernel Config: <https://drive.google.com/open?id=173DLEAEPKPhhR1TcqofdnkLpdoK7PMFl>
Unfortunately, we don't have any reproducer for this bug yet.
Thank you!

Best regards,
Sanan Hasanov

------------[ cut here ]------------
WARNING: at schedule_bh drivers/block/floppy.c:1000 [inline], CPU#0: swapper/0/1
WARNING: at floppy_interrupt+0x51b/0x560 drivers/block/floppy.c:1766, CPU#0: swapper/0/1
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc1 #1 PREEMPT(full) 
Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:schedule_bh drivers/block/floppy.c:1000 [inline]
RIP: 0010:floppy_interrupt+0x51b/0x560 drivers/block/floppy.c:1766
Code: 35 3a c8 54 0c 48 c7 c7 80 fa 4b 8c 48 c7 c2 c0 f7 4b 8c 48 c7 c1 40 f9 4b 8c e8 a0 4a 3b fb e9 af fe ff ff e8 66 d9 d5 fb 90 <0f> 0b 90 e9 e8 fc ff ff 44 89 f9 80 e1 07 38 c1 0f 8c 27 fc ff ff
RSP: 0018:ffffc90000007af8 EFLAGS: 00010006
RAX: ffffffff85ec786a RBX: ffffffff85ecf380 RCX: ffff888016aeba80
RDX: 0000000000010100 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff8f3e2467 R09: 1ffffffff1e7c48c
R10: dffffc0000000000 R11: fffffbfff1e7c48d R12: dffffc0000000000
R13: 0000000000000000 R14: 0000000002000011 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880d98df000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888012801000 CR3: 000000000e6ff000 CR4: 00000000000006f0
Call Trace:
 <IRQ>
 __handle_irq_event_percpu+0x1d9/0x5d0 kernel/irq/handle.c:209
 handle_irq_event_percpu kernel/irq/handle.c:246 [inline]
 handle_irq_event+0x90/0x1e0 kernel/irq/handle.c:263
 handle_edge_irq+0x239/0x9e0 kernel/irq/chip.c:855
 generic_handle_irq_desc include/linux/irqdesc.h:186 [inline]
 handle_irq arch/x86/kernel/irq.c:262 [inline]
 call_irq_handler arch/x86/kernel/irq.c:286 [inline]
 __common_interrupt+0xc5/0x170 arch/x86/kernel/irq.c:333
 common_interrupt+0x4a/0xc0 arch/x86/kernel/irq.c:326
 asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:688
RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:188 [inline]
RIP: 0010:_raw_spin_unlock_irq+0x19/0x30 kernel/locking/spinlock.c:202
Code: 00 02 00 00 75 db eb da e8 74 c0 a8 f5 5b c3 66 90 f3 0f 1e fa 0f 1f 44 00 00 e8 f2 b4 12 f6 e8 4d 86 41 f6 fb bf 01 00 00 00 <e8> d2 2a 07 f6 65 8b 05 8b 59 88 06 85 c0 74 01 c3 e8 41 c0 a8 f5
RSP: 0018:ffffc90000007d58 EFLAGS: 00000246
RAX: 0000000000000001 RBX: ffffffff85358ab0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
RBP: ffffc90000007ef8 R08: ffff88806ba2f683 R09: 1ffff1100d745ed0
R10: dffffc0000000000 R11: ffffed100d745ed1 R12: ffff88801d085478
R13: dffffc0000000000 R14: ffff88806ba2f680 R15: ffff88806ba2f698
 expire_timers kernel/time/timer.c:1798 [inline]
 __run_timers kernel/time/timer.c:2373 [inline]
 __run_timer_base+0x700/0xa30 kernel/time/timer.c:2385
 run_timer_base kernel/time/timer.c:2394 [inline]
 run_timer_softirq+0xbc/0x190 kernel/time/timer.c:2404
 handle_softirqs+0x1ed/0x700 kernel/softirq.c:622
 __do_softirq kernel/softirq.c:656 [inline]
 invoke_softirq kernel/softirq.c:496 [inline]
 __irq_exit_rcu+0x8e/0x270 kernel/softirq.c:723
 irq_exit_rcu+0xe/0x30 kernel/softirq.c:739
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1056 [inline]
 sysvec_apic_timer_interrupt+0x92/0xb0 arch/x86/kernel/apic/apic.c:1056
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
RIP: 0010:clear_pages arch/x86/include/asm/page_64.h:103 [inline]
RIP: 0010:clear_page arch/x86/include/asm/page_64.h:114 [inline]
RIP: 0010:clear_highpage_kasan_tagged include/linux/highmem.h:344 [inline]
RIP: 0010:kernel_init_pages mm/page_alloc.c:1265 [inline]
RIP: 0010:post_alloc_hook+0x3ff/0x480 mm/page_alloc.c:1887
Code: 03 49 c7 c7 20 2e 43 8e 49 c1 ef 03 eb 2f 48 8b 3d c6 74 21 0c 49 c1 e5 06 4c 29 ef 4c 01 e7 b9 00 10 00 00 31 c0 48 c1 e9 03 <f3> 48 ab 49 81 c4 00 10 00 00 49 ff ce 0f 84 31 fd ff ff 48 b8 00
RSP: 0018:ffffc9000001eed8 EFLAGS: 00000216
RAX: 0000000000000000 RBX: 1ffffffff1c865c6 RCX: 0000000000000200
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88801dc20000
RBP: 0000000000000003 R08: ffffffff9049fd6f R09: 0000000000000000
R10: ffffed1003b84000 R11: fffffbfff2093fae R12: fffa80001dc20000
R13: fffa800000000000 R14: 0000000000000008 R15: 1ffffffff1c865c4
 prep_new_page mm/page_alloc.c:1897 [inline]
 get_page_from_freelist+0x2240/0x2330 mm/page_alloc.c:3962
 __alloc_frozen_pages_noprof+0x20e/0x3d0 mm/page_alloc.c:5250
 __alloc_pages_noprof+0xf/0x30 mm/page_alloc.c:5284
 vm_area_alloc_pages mm/vmalloc.c:-1 [inline]
 __vmalloc_area_node mm/vmalloc.c:3876 [inline]
 __vmalloc_node_range_noprof+0x79f/0x1580 mm/vmalloc.c:4064
 __vmalloc_node_noprof mm/vmalloc.c:4124 [inline]
 vzalloc_noprof+0xdf/0x120 mm/vmalloc.c:4202
 allocate_partitions block/partitions/core.c:101 [inline]
 check_partition block/partitions/core.c:123 [inline]
 blk_add_partitions block/partitions/core.c:590 [inline]
 bdev_disk_changed+0x628/0x1810 block/partitions/core.c:694
 blkdev_get_whole+0x37e/0x500 block/bdev.c:764
 bdev_open+0x35b/0xdc0 block/bdev.c:973
 bdev_file_open_by_dev+0x1c3/0x240 block/bdev.c:1075
 disk_scan_partitions+0x1be/0x2c0 block/genhd.c:387
 add_disk_final block/genhd.c:416 [inline]
 add_disk_fwnode+0x31e/0x470 block/genhd.c:610
 add_disk include/linux/blkdev.h:785 [inline]
 brd_alloc+0x5de/0x810 drivers/block/brd.c:340
 brd_init+0xc6/0x120 drivers/block/brd.c:420
 do_one_initcall+0x1a1/0x530 init/main.c:1382
 do_initcall_level+0x117/0x1a0 init/main.c:1444
 do_initcalls+0xe1/0x150 init/main.c:1460
 kernel_init_freeable+0x207/0x310 init/main.c:1692
 kernel_init+0x22/0x1d0 init/main.c:1582
 ret_from_fork+0x608/0xc40 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:245
 </TASK>
----------------
Code disassembly (best guess):
   0:	00 02                	add    %al,(%rdx)
   2:	00 00                	add    %al,(%rax)
   4:	75 db                	jne    0xffffffe1
   6:	eb da                	jmp    0xffffffe2
   8:	e8 74 c0 a8 f5       	call   0xf5a8c081
   d:	5b                   	pop    %rbx
   e:	c3                   	ret
   f:	66 90                	xchg   %ax,%ax
  11:	f3 0f 1e fa          	endbr64
  15:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  1a:	e8 f2 b4 12 f6       	call   0xf612b511
  1f:	e8 4d 86 41 f6       	call   0xf6418671
  24:	fb                   	sti
  25:	bf 01 00 00 00       	mov    $0x1,%edi
* 2a:	e8 d2 2a 07 f6       	call   0xf6072b01 <-- trapping instruction
  2f:	65 8b 05 8b 59 88 06 	mov    %gs:0x688598b(%rip),%eax        # 0x68859c1
  36:	85 c0                	test   %eax,%eax
  38:	74 01                	je     0x3b
  3a:	c3                   	ret
  3b:	e8 41 c0 a8 f5       	call   0xf5a8c081

<<<<<<<<<<<<<<< tail report >>>>>>>>>>>>>>>

^ permalink raw reply

