* [PATCH v2] block: serialize elevator changes for the same queue using a writer lock
@ 2026-06-23 1:32 Shin'ichiro Kawasaki
From: Shin'ichiro Kawasaki @ 2026-06-23 1:32 UTC
To: linux-block, Jens Axboe; +Cc: Ming Lei, Nilay Shroff, Shin'ichiro Kawasaki
When elevator_change() is called concurrently for the same queue,
elevator_change_done() runs concurrently as well. This function adds or
deletes kobjects for the debugfs entry of the queue, so the concurrent
calls corrupt the memory of the kobjects and result in a process hang.
The core part of the elevator switch is protected by queue freeze and
q->elevator_lock. However, since commit 559dc11143eb ("block: move
elv_register[unregister]_queue out of elevator_lock"),
elevator_change_done() is not serialized. Hence the memory corruption
and the hang.
The failure is observed when udev-worker writes to a sysfs
queue/scheduler attribute file while the blktests test case block/005
writes to the same attribute file. The failure can also be reproduced by
running two processes that write to the same queue/scheduler file
concurrently. The failure has been observed since another commit,
370ac285f23a ("block: avoid cpu_hotplug_lock depedency on freeze_lock").
That commit changed the behavior of queue freeze and exposed the failure.
Fix the failure by changing elv_iosched_store() to acquire
update_nr_hwq_lock as a writer instead of as a reader. This serializes
all of the elevator switch steps, including the elevator_change_done()
call.
Fixes: 559dc11143eb ("block: move elv_register[unregister]_queue out of elevator_lock")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
I observed that the blktests test case block/005 hung on specific
server hardware using a specific HDD as the block device. During the
test case run, the kernel reported KASAN null-ptr-deref and
slab-use-after-free errors. The failure happened when a sysfs
queue/scheduler attribute file was written concurrently. I reported the
failure and shared a candidate fix patch as an RFC [1]. Based on the
comments and discussion on the RFC patch, I propose this v2 patch, which
avoids introducing a new lock. My thanks go to Ming and Nilay for the
discussion.
Please refer to [1] for details of the failure. Also, I created a
blktests test case that recreates the hang [2], which I used to test the
fix.
* Changes from RFC v1
- Instead of adding a new mutex to struct request_queue, replace the
reader lock on update_nr_hwq_lock with the writer lock in
elv_iosched_store().
[1] https://lore.kernel.org/linux-block/20260611074200.474676-1-shinichiro.kawasaki@wdc.com/
[2] https://github.com/kawasaki/blktests/commit/8e80b3ccc0bbbe3f209d00eacd138d020de97fc6
block/elevator.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/elevator.c b/block/elevator.c
index 3bcd37c2aa34..b03185a217ff 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -813,7 +813,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	 * update_nr_hwq_lock -> kn->active (via del_gendisk -> kobject_del)
 	 * kn->active -> update_nr_hwq_lock (via this sysfs write path)
 	 */
-	if (!down_read_trylock(&set->update_nr_hwq_lock)) {
+	if (!down_write_trylock(&set->update_nr_hwq_lock)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -824,7 +824,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	} else {
 		ret = -ENOENT;
 	}
-	up_read(&set->update_nr_hwq_lock);
+	up_write(&set->update_nr_hwq_lock);
 out:
 	if (ctx.type)
--
2.54.0
* Re: [PATCH v2] block: serialize elevator changes for the same queue using a writer lock
@ 2026-06-23 5:29 ` Nilay Shroff
From: Nilay Shroff @ 2026-06-23 5:29 UTC
To: Shin'ichiro Kawasaki, linux-block, Jens Axboe; +Cc: Ming Lei
On 6/23/26 7:02 AM, Shin'ichiro Kawasaki wrote:
> When elevator_change() is called concurrently for the same queue,
> elevator_change_done() runs concurrently as well. This function adds or
> deletes kobjects for the debugfs entry of the queue, so the concurrent
> calls corrupt the memory of the kobjects and result in a process hang.
> The core part of the elevator switch is protected by queue freeze and
> q->elevator_lock. However, since commit 559dc11143eb ("block: move
> elv_register[unregister]_queue out of elevator_lock"),
> elevator_change_done() is not serialized. Hence the memory corruption
> and the hang.
>
> The failure is observed when udev-worker writes to a sysfs
> queue/scheduler attribute file while the blktests test case block/005
> writes to the same attribute file. The failure can also be reproduced by
> running two processes that write to the same queue/scheduler file
> concurrently. The failure has been observed since another commit,
> 370ac285f23a ("block: avoid cpu_hotplug_lock depedency on freeze_lock").
> That commit changed the behavior of queue freeze and exposed the failure.
>
> Fix the failure by changing elv_iosched_store() to acquire
> update_nr_hwq_lock as a writer instead of as a reader. This serializes
> all of the elevator switch steps, including the elevator_change_done()
> call.
>
> Fixes: 559dc11143eb ("block: move elv_register[unregister]_queue out of elevator_lock")
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Looks good to me.
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>