From: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
To: Ming Lei <tom.leiming@gmail.com>
Cc: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
Nilay Shroff <nilay@linux.ibm.com>
Subject: Re: [PATCH RFC 0/1] block: fix concurrent elevator change failure
Date: Fri, 12 Jun 2026 18:47:50 +0900
Message-ID: <aivMxPCd305WbBsk@shinmob>
In-Reply-To: <aiqaXfTqCLMu2DwF@fedora>
On Jun 11, 2026 / 06:22, Ming Lei wrote:
> Hi Shin'ichiro,
Hi Ming, thanks for the comments.
>
> On Thu, Jun 11, 2026 at 04:41:59PM +0900, Shin'ichiro Kawasaki wrote:
> > I observed that the blktests test case block/005 hangs on a specific
> > server hardware using a specific HDD as a block device. During the test
> > case run, the kernel reported a KASAN null-ptr-deref (and other memory
> > corruption symptoms) [2]. This failure looked sporadic and hardware-
> > dependent.
> >
> > From the kernel message, I noticed that udev-worker wrote to the
> > queue/scheduler sysfs attribute to change the IO scheduler, or elevator.
> > The test case block/005 also wrote to the same sysfs attribute, which
>
> sysfs write is supposed to be serialized...
I checked the sysfs write handler elv_iosched_store() in block/elevator.c and
found that the elevator_change() call is guarded by the rw_semaphore
"set->update_nr_hwq_lock", but the handler takes the reader lock, not the writer
lock. Multiple readers can hold the semaphore at the same time, so this does not
serialize the sysfs writes.
I tried the patch below, which replaces the reader lock with the writer lock. In
a quick trial it appears to work: the kernel message is no longer observed and
the new test case does not hang. I will do further testing to confirm
that this change does not trigger new lockdep WARNs. Assuming it has no such
side effects, I hope this approach is acceptable. It does not add a new lock,
so I think it is the better fix.
diff --git a/block/elevator.c b/block/elevator.c
index 3bcd37c2aa34..b03185a217ff 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -813,7 +813,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	 * update_nr_hwq_lock -> kn->active (via del_gendisk -> kobject_del)
 	 * kn->active -> update_nr_hwq_lock (via this sysfs write path)
 	 */
-	if (!down_read_trylock(&set->update_nr_hwq_lock)) {
+	if (!down_write_trylock(&set->update_nr_hwq_lock)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -824,7 +824,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	} else {
 		ret = -ENOENT;
 	}
-	up_read(&set->update_nr_hwq_lock);
+	up_write(&set->update_nr_hwq_lock);
 out:
 	if (ctx.type)
[...]
> blk_mq_sched_reg_debugfs already includes debugfs lock, so I feel the proper
> fix could be check & avoid the null-ptr-deref.
Actually, the null-ptr-deref is only one of the failure symptoms. A KASAN
slab-use-after-free is also observed [3], so I guess that adding null checks
may not be enough.
> Adding new lock should be the last straw usually, especially this one is
> depended by queue freeze.
Got it, thanks.
[3] KASAN slab-use-after-free
[ 802.836569][ T3919] run blktests block/005 at 2026-05-11 10:42:39
[ 804.256901][ T3866] debugfs: 'sched' already exists in 'sdd'
[ 804.874743][ T3919] debugfs: 'sched' already exists in 'sdd'
[ 804.882124][ T3919] ==================================================================
[ 804.882154][ T3866] debugfs: 'sched' already exists in 'sdd'
[ 804.890039][ T3919] BUG: KASAN: slab-use-after-free in elevator_change_done+0x304/0x610
[ 804.890053][ T3919] Write of size 8 at addr ffff8881273e08e0 by task check/3919
[ 804.890061][ T3919]
[ 804.890069][ T3919] CPU: 4 UID: 0 PID: 3919 Comm: check Not tainted 7.1.0-rc2-kts+ #1 PREEMPT(lazy)
[ 804.890080][ T3919] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
[ 804.890086][ T3919] Call Trace:
[ 804.890092][ T3919] <TASK>
[ 804.890098][ T3919] dump_stack_lvl+0x6e/0xa0
[ 804.890118][ T3919] print_address_description.constprop.0+0x70/0x300
[ 804.890135][ T3919] ? elevator_change_done+0x304/0x610
[ 804.890145][ T3919] print_report+0xfc/0x1ff
[ 804.890154][ T3919] ? __virt_addr_valid+0x1d1/0x3f0
[ 804.890163][ T3919] ? elevator_change_done+0x304/0x610
[ 804.890168][ T3919] kasan_report+0xf6/0x1c0
[ 804.890176][ T3919] ? elevator_change_done+0x304/0x610
[ 804.890185][ T3919] kasan_check_range+0x125/0x200
[ 804.890192][ T3919] elevator_change_done+0x304/0x610
[ 804.890198][ T3919] ? sysfs_file_ops+0x70/0x140
[ 804.890206][ T3919] ? __pfx_elevator_change_done+0x10/0x10
[ 804.890213][ T3919] ? __pfx_sysfs_kf_write+0x10/0x10
[ 804.890220][ T3919] ? __pfx_sysfs_kf_write+0x10/0x10
[ 804.890225][ T3919] elevator_change+0x283/0x4f0
[ 804.890233][ T3919] ? __pfx_sysfs_kf_write+0x10/0x10
[ 804.890239][ T3919] elv_iosched_store+0x30c/0x3a0
[ 804.890246][ T3919] ? __pfx_elv_iosched_store+0x10/0x10
[ 804.890255][ T3919] ? lock_acquire.part.0+0xb8/0x230
[ 804.890262][ T3919] ? kernfs_fop_write_iter+0x25b/0x5e0
[ 804.890268][ T3919] ? lock_acquire.part.0+0xb8/0x230
[ 804.890274][ T3919] ? lock_acquire+0x126/0x140
[ 804.890281][ T3919] ? __pfx_sysfs_kf_write+0x10/0x10
[ 804.890286][ T3919] queue_attr_store+0x23f/0x360
[ 804.890295][ T3919] ? __pfx_queue_attr_store+0x10/0x10
[ 804.890300][ T3919] ? __lock_acquire+0x55d/0xbd0
[ 804.890308][ T3919] ? lock_acquire.part.0+0xb8/0x230
[ 804.890314][ T3919] ? sysfs_file_kobj+0x1d/0x1b0
[ 804.890319][ T3919] ? find_held_lock+0x2b/0x80
[ 804.890326][ T3919] ? __lock_release.isra.0+0x59/0x170
[ 804.890334][ T3919] ? lock_release.part.0+0x1c/0x50
[ 804.890340][ T3919] ? sysfs_file_kobj+0xb9/0x1b0
[ 804.890345][ T3919] ? sysfs_kf_write+0x65/0x170
[ 804.890352][ T3919] ? __pfx_sysfs_kf_write+0x10/0x10
[ 804.890357][ T3919] kernfs_fop_write_iter+0x3da/0x5e0
[ 804.890363][ T3919] ? __pfx_kernfs_fop_write_iter+0x10/0x10
[ 804.890368][ T3919] vfs_write+0x524/0x1010
[ 804.890378][ T3919] ? __pfx_vfs_write+0x10/0x10
[ 804.890393][ T3919] ksys_write+0xff/0x200
[ 804.890401][ T3919] ? __pfx_ksys_write+0x10/0x10
[ 804.890408][ T3919] ? __pfx_pte_val+0x10/0x10
[ 804.890414][ T3919] ? folio_xchg_last_cpupid+0xc6/0x130
[ 804.890421][ T3919] do_syscall_64+0xf4/0x1550
[ 804.890429][ T3919] ? __lock_release.isra.0+0x59/0x170
[ 804.890437][ T3919] ? lock_release.part.0+0x1c/0x50
[ 804.890444][ T3919] ? rcu_read_unlock+0x1c/0x60
[ 804.890449][ T3919] ? wp_page_reuse+0x160/0x1e0
[ 804.890455][ T3919] ? do_wp_page+0x5db/0x10a0
[ 804.890465][ T3919] ? handle_pte_fault+0x54e/0x760
[ 804.890472][ T3919] ? __pfx_handle_pte_fault+0x10/0x10
[ 804.890479][ T3919] ? __pfx_pmd_val+0x10/0x10
[ 804.890485][ T3919] ? __handle_mm_fault+0xa02/0xef0
[ 804.890493][ T3919] ? __lock_acquire+0x55d/0xbd0
[ 804.890499][ T3919] ? __pfx_css_rstat_updated+0x10/0x10
[ 804.890509][ T3919] ? lock_acquire.part.0+0xb8/0x230
[ 804.890515][ T3919] ? count_memcg_events_mm.constprop.0+0x22/0x130
[ 804.890522][ T3919] ? find_held_lock+0x2b/0x80
[ 804.890528][ T3919] ? __lock_release.isra.0+0x59/0x170
[ 804.890536][ T3919] ? find_held_lock+0x2b/0x80
[ 804.890542][ T3919] ? __lock_release.isra.0+0x59/0x170
[ 804.890550][ T3919] ? do_user_addr_fault+0x811/0xed0
[ 804.890559][ T3919] ? do_syscall_64+0x34/0x1550
[ 804.890564][ T3919] ? lockdep_hardirqs_on_prepare.part.0+0x9b/0x140
[ 804.890570][ T3919] ? do_syscall_64+0x34/0x1550
[ 804.890575][ T3919] ? trace_hardirqs_on+0x19/0x1a0
[ 804.890584][ T3919] ? do_syscall_64+0xab/0x1550
[ 804.890590][ T3919] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 804.890596][ T3919] RIP: 0033:0x7ff08cbe3bbe
[ 804.890603][ T3919] Code: 4d 89 d8 e8 34 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
[ 804.890609][ T3919] RSP: 002b:00007ffc95718820 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 804.890616][ T3919] RAX: ffffffffffffffda RBX: 00007ff08cd5f5c0 RCX: 00007ff08cbe3bbe
[ 804.890621][ T3919] RDX: 0000000000000006 RSI: 0000563340f2c390 RDI: 0000000000000001
[ 804.890624][ T3919] RBP: 00007ffc95718830 R08: 0000000000000000 R09: 0000000000000000
[ 804.890627][ T3919] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000006
[ 804.890630][ T3919] R13: 0000000000000006 R14: 0000563340f2c390 R15: 0000563340f96890
[ 804.890641][ T3919] </TASK>
[ 804.890643][ T3919]
[ 805.368835][ T3919] Allocated by task 3919:
[ 805.373543][ T3919] kasan_save_stack+0x30/0x50
[ 805.378559][ T3919] kasan_save_track+0x14/0x30
[ 805.383559][ T3919] __kasan_kmalloc+0x9a/0xb0
[ 805.388465][ T3919] elevator_alloc+0xc5/0x2b0
[ 805.393366][ T3919] blk_mq_init_sched+0xa6/0x5e0
[ 805.398554][ T3919] elevator_switch+0x18e/0x680
[ 805.403702][ T3919] elevator_change+0x2d8/0x4f0
[ 805.408802][ T3919] elv_iosched_store+0x30c/0x3a0
[ 805.414116][ T3919] queue_attr_store+0x23f/0x360
[ 805.419289][ T3919] kernfs_fop_write_iter+0x3da/0x5e0
[ 805.424938][ T3919] vfs_write+0x524/0x1010
[ 805.429600][ T3919] ksys_write+0xff/0x200
[ 805.434159][ T3919] do_syscall_64+0xf4/0x1550
[ 805.439064][ T3919] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 805.445273][ T3919]
[ 805.447927][ T3919] Freed by task 3866:
[ 805.452231][ T3919] kasan_save_stack+0x30/0x50
[ 805.457287][ T3919] kasan_save_track+0x14/0x30
[ 805.462282][ T3919] kasan_save_free_info+0x3b/0x70
[ 805.467645][ T3919] __kasan_slab_free+0x6b/0x90
[ 805.472736][ T3919] kfree+0x21c/0x620
[ 805.476953][ T3919] kobject_cleanup+0x105/0x3a0
[ 805.482039][ T3919] elevator_change_done+0x196/0x610
[ 805.487633][ T3919] elevator_change+0x283/0x4f0
[ 805.492730][ T3919] elv_iosched_store+0x30c/0x3a0
[ 805.497989][ T3919] queue_attr_store+0x23f/0x360
[ 805.503144][ T3919] kernfs_fop_write_iter+0x3da/0x5e0
[ 805.508747][ T3919] vfs_write+0x524/0x1010
[ 805.513381][ T3919] ksys_write+0xff/0x200
[ 805.517944][ T3919] do_syscall_64+0xf4/0x1550
[ 805.522862][ T3919] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 805.529118][ T3919]
[ 805.531858][ T3919] The buggy address belongs to the object at ffff8881273e0800
[ 805.531858][ T3919] which belongs to the cache kmalloc-rnd-13-1k of size 1024
[ 805.547392][ T3919] The buggy address is located 224 bytes inside of
[ 805.547392][ T3919] freed 1024-byte region [ffff8881273e0800, ffff8881273e0c00)
[ 805.562078][ T3919]
[ 805.564734][ T3919] The buggy address belongs to the physical page:
[ 805.571446][ T3919] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1273e0
[ 805.580609][ T3919] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 805.589411][ T3919] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[ 805.597524][ T3919] page_type: f5(slab)
[ 805.601916][ T3919] raw: 0017ffffc0000040 ffff88810005c640 dead000000000100 dead000000000122
[ 805.610881][ T3919] raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
[ 805.619808][ T3919] head: 0017ffffc0000040 ffff88810005c640 dead000000000100 dead000000000122
[ 805.628815][ T3919] head: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
[ 805.637838][ T3919] head: 0017ffffc0000003 fffffffffffffe01 00000000ffffffff 00000000ffffffff
[ 805.646901][ T3919] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
[ 805.655983][ T3919] page dumped because: kasan: bad access detected
[ 805.662913][ T3919]
[ 805.665657][ T3919] Memory state around the buggy address:
[ 805.671717][ T3919] ffff8881273e0780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 805.680194][ T3919] ffff8881273e0800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 805.688697][ T3919] >ffff8881273e0880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 805.697130][ T3919] ^
[ 805.704717][ T3919] ffff8881273e0900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 805.713179][ T3919] ffff8881273e0980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 805.721720][ T3919] ==================================================================
[ 805.730526][ T3919] Disabling lock debugging due to kernel taint
...