Linux block layer
From: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
To: Ming Lei <tom.leiming@gmail.com>
Cc: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	 Nilay Shroff <nilay@linux.ibm.com>
Subject: Re: [PATCH RFC 0/1] block: fix concurrent elevator change failure
Date: Fri, 12 Jun 2026 18:47:50 +0900
Message-ID: <aivMxPCd305WbBsk@shinmob>
In-Reply-To: <aiqaXfTqCLMu2DwF@fedora>

On Jun 11, 2026 / 06:22, Ming Lei wrote:
> Hi Shin'ichiro,

Hi Ming, thanks for the comments.

> 
> On Thu, Jun 11, 2026 at 04:41:59PM +0900, Shin'ichiro Kawasaki wrote:
> > I observed that the blktests test case block/005 hangs on a specific
> > server hardware using a specific HDD as a block device. During the test
> > case run, the kernel reported a KASAN null-ptr-deref (and other memory
> > corruption symptoms) [2]. This failure looked sporadic and hardware-
> > dependent.
> > 
> > From the kernel message, I noticed that udev-worker wrote to the
> > queue/scheduler sysfs attribute to change the IO scheduler, or elevator.
> > The test case block/005 also wrote to the same sysfs attribute, which
> 
> sysfs write is supposed to be serialized...

I checked the sysfs write handler elv_iosched_store() in block/elevator.c and
found that the elevator_change() call is guarded by the rw_semaphore
"set->update_nr_hwq_lock", but it takes the reader side of the lock, not the
writer side. Readers can hold the semaphore concurrently, so this does not
serialize the sysfs writes.

I tried the patch below, which replaces the reader lock with the writer lock. In
a quick trial it appears to work: the kernel message is no longer observed and
the new test case does not hang. I will do further testing to confirm that this
change does not trigger new lockdep WARNs. Assuming it has no such side effects,
I hope this approach is acceptable. It does not add a new lock, so I think it is
the better fix.

diff --git a/block/elevator.c b/block/elevator.c
index 3bcd37c2aa34..b03185a217ff 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -813,7 +813,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	 *   update_nr_hwq_lock -> kn->active (via del_gendisk -> kobject_del)
 	 *   kn->active -> update_nr_hwq_lock (via this sysfs write path)
 	 */
-	if (!down_read_trylock(&set->update_nr_hwq_lock)) {
+	if (!down_write_trylock(&set->update_nr_hwq_lock)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -824,7 +824,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	} else {
 		ret = -ENOENT;
 	}
-	up_read(&set->update_nr_hwq_lock);
+	up_write(&set->update_nr_hwq_lock);
 
 out:
 	if (ctx.type)

[...]

> blk_mq_sched_reg_debugfs already includes debugfs lock, so I feel the proper
> fix could be check & avoid the null-ptr-deref.

Actually, the null-ptr-deref is only one of the failure symptoms. A KASAN
slab-use-after-free is also observed [3]. So I suspect that adding null checks
alone would not be enough.

> Adding new lock should be the last straw usually, especially this one is
> depended by queue freeze.

Got it, thanks.


[3] KASAN slab-use-after-free

[  802.836569][ T3919] run blktests block/005 at 2026-05-11 10:42:39
[  804.256901][ T3866] debugfs: 'sched' already exists in 'sdd'
[  804.874743][ T3919] debugfs: 'sched' already exists in 'sdd'
[  804.882124][ T3919] ==================================================================
[  804.882154][ T3866] debugfs: 'sched' already exists in 'sdd'
[  804.890039][ T3919] BUG: KASAN: slab-use-after-free in elevator_change_done+0x304/0x610
[  804.890053][ T3919] Write of size 8 at addr ffff8881273e08e0 by task check/3919
[  804.890061][ T3919]
[  804.890069][ T3919] CPU: 4 UID: 0 PID: 3919 Comm: check Not tainted 7.1.0-rc2-kts+ #1 PREEMPT(lazy)
[  804.890080][ T3919] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
[  804.890086][ T3919] Call Trace:
[  804.890092][ T3919]  <TASK>
[  804.890098][ T3919]  dump_stack_lvl+0x6e/0xa0
[  804.890118][ T3919]  print_address_description.constprop.0+0x70/0x300
[  804.890135][ T3919]  ? elevator_change_done+0x304/0x610
[  804.890145][ T3919]  print_report+0xfc/0x1ff
[  804.890154][ T3919]  ? __virt_addr_valid+0x1d1/0x3f0
[  804.890163][ T3919]  ? elevator_change_done+0x304/0x610
[  804.890168][ T3919]  kasan_report+0xf6/0x1c0
[  804.890176][ T3919]  ? elevator_change_done+0x304/0x610
[  804.890185][ T3919]  kasan_check_range+0x125/0x200
[  804.890192][ T3919]  elevator_change_done+0x304/0x610
[  804.890198][ T3919]  ? sysfs_file_ops+0x70/0x140
[  804.890206][ T3919]  ? __pfx_elevator_change_done+0x10/0x10
[  804.890213][ T3919]  ? __pfx_sysfs_kf_write+0x10/0x10
[  804.890220][ T3919]  ? __pfx_sysfs_kf_write+0x10/0x10
[  804.890225][ T3919]  elevator_change+0x283/0x4f0
[  804.890233][ T3919]  ? __pfx_sysfs_kf_write+0x10/0x10
[  804.890239][ T3919]  elv_iosched_store+0x30c/0x3a0
[  804.890246][ T3919]  ? __pfx_elv_iosched_store+0x10/0x10
[  804.890255][ T3919]  ? lock_acquire.part.0+0xb8/0x230
[  804.890262][ T3919]  ? kernfs_fop_write_iter+0x25b/0x5e0
[  804.890268][ T3919]  ? lock_acquire.part.0+0xb8/0x230
[  804.890274][ T3919]  ? lock_acquire+0x126/0x140
[  804.890281][ T3919]  ? __pfx_sysfs_kf_write+0x10/0x10
[  804.890286][ T3919]  queue_attr_store+0x23f/0x360
[  804.890295][ T3919]  ? __pfx_queue_attr_store+0x10/0x10
[  804.890300][ T3919]  ? __lock_acquire+0x55d/0xbd0
[  804.890308][ T3919]  ? lock_acquire.part.0+0xb8/0x230
[  804.890314][ T3919]  ? sysfs_file_kobj+0x1d/0x1b0
[  804.890319][ T3919]  ? find_held_lock+0x2b/0x80
[  804.890326][ T3919]  ? __lock_release.isra.0+0x59/0x170
[  804.890334][ T3919]  ? lock_release.part.0+0x1c/0x50
[  804.890340][ T3919]  ? sysfs_file_kobj+0xb9/0x1b0
[  804.890345][ T3919]  ? sysfs_kf_write+0x65/0x170
[  804.890352][ T3919]  ? __pfx_sysfs_kf_write+0x10/0x10
[  804.890357][ T3919]  kernfs_fop_write_iter+0x3da/0x5e0
[  804.890363][ T3919]  ? __pfx_kernfs_fop_write_iter+0x10/0x10
[  804.890368][ T3919]  vfs_write+0x524/0x1010
[  804.890378][ T3919]  ? __pfx_vfs_write+0x10/0x10
[  804.890393][ T3919]  ksys_write+0xff/0x200
[  804.890401][ T3919]  ? __pfx_ksys_write+0x10/0x10
[  804.890408][ T3919]  ? __pfx_pte_val+0x10/0x10
[  804.890414][ T3919]  ? folio_xchg_last_cpupid+0xc6/0x130
[  804.890421][ T3919]  do_syscall_64+0xf4/0x1550
[  804.890429][ T3919]  ? __lock_release.isra.0+0x59/0x170
[  804.890437][ T3919]  ? lock_release.part.0+0x1c/0x50
[  804.890444][ T3919]  ? rcu_read_unlock+0x1c/0x60
[  804.890449][ T3919]  ? wp_page_reuse+0x160/0x1e0
[  804.890455][ T3919]  ? do_wp_page+0x5db/0x10a0
[  804.890465][ T3919]  ? handle_pte_fault+0x54e/0x760
[  804.890472][ T3919]  ? __pfx_handle_pte_fault+0x10/0x10
[  804.890479][ T3919]  ? __pfx_pmd_val+0x10/0x10
[  804.890485][ T3919]  ? __handle_mm_fault+0xa02/0xef0
[  804.890493][ T3919]  ? __lock_acquire+0x55d/0xbd0
[  804.890499][ T3919]  ? __pfx_css_rstat_updated+0x10/0x10
[  804.890509][ T3919]  ? lock_acquire.part.0+0xb8/0x230
[  804.890515][ T3919]  ? count_memcg_events_mm.constprop.0+0x22/0x130
[  804.890522][ T3919]  ? find_held_lock+0x2b/0x80
[  804.890528][ T3919]  ? __lock_release.isra.0+0x59/0x170
[  804.890536][ T3919]  ? find_held_lock+0x2b/0x80
[  804.890542][ T3919]  ? __lock_release.isra.0+0x59/0x170
[  804.890550][ T3919]  ? do_user_addr_fault+0x811/0xed0
[  804.890559][ T3919]  ? do_syscall_64+0x34/0x1550
[  804.890564][ T3919]  ? lockdep_hardirqs_on_prepare.part.0+0x9b/0x140
[  804.890570][ T3919]  ? do_syscall_64+0x34/0x1550
[  804.890575][ T3919]  ? trace_hardirqs_on+0x19/0x1a0
[  804.890584][ T3919]  ? do_syscall_64+0xab/0x1550
[  804.890590][ T3919]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  804.890596][ T3919] RIP: 0033:0x7ff08cbe3bbe
[  804.890603][ T3919] Code: 4d 89 d8 e8 34 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
[  804.890609][ T3919] RSP: 002b:00007ffc95718820 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  804.890616][ T3919] RAX: ffffffffffffffda RBX: 00007ff08cd5f5c0 RCX: 00007ff08cbe3bbe
[  804.890621][ T3919] RDX: 0000000000000006 RSI: 0000563340f2c390 RDI: 0000000000000001
[  804.890624][ T3919] RBP: 00007ffc95718830 R08: 0000000000000000 R09: 0000000000000000
[  804.890627][ T3919] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000006
[  804.890630][ T3919] R13: 0000000000000006 R14: 0000563340f2c390 R15: 0000563340f96890
[  804.890641][ T3919]  </TASK>
[  804.890643][ T3919]
[  805.368835][ T3919] Allocated by task 3919:
[  805.373543][ T3919]  kasan_save_stack+0x30/0x50
[  805.378559][ T3919]  kasan_save_track+0x14/0x30
[  805.383559][ T3919]  __kasan_kmalloc+0x9a/0xb0
[  805.388465][ T3919]  elevator_alloc+0xc5/0x2b0
[  805.393366][ T3919]  blk_mq_init_sched+0xa6/0x5e0
[  805.398554][ T3919]  elevator_switch+0x18e/0x680
[  805.403702][ T3919]  elevator_change+0x2d8/0x4f0
[  805.408802][ T3919]  elv_iosched_store+0x30c/0x3a0
[  805.414116][ T3919]  queue_attr_store+0x23f/0x360
[  805.419289][ T3919]  kernfs_fop_write_iter+0x3da/0x5e0
[  805.424938][ T3919]  vfs_write+0x524/0x1010
[  805.429600][ T3919]  ksys_write+0xff/0x200
[  805.434159][ T3919]  do_syscall_64+0xf4/0x1550
[  805.439064][ T3919]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  805.445273][ T3919]
[  805.447927][ T3919] Freed by task 3866:
[  805.452231][ T3919]  kasan_save_stack+0x30/0x50
[  805.457287][ T3919]  kasan_save_track+0x14/0x30
[  805.462282][ T3919]  kasan_save_free_info+0x3b/0x70
[  805.467645][ T3919]  __kasan_slab_free+0x6b/0x90
[  805.472736][ T3919]  kfree+0x21c/0x620
[  805.476953][ T3919]  kobject_cleanup+0x105/0x3a0
[  805.482039][ T3919]  elevator_change_done+0x196/0x610
[  805.487633][ T3919]  elevator_change+0x283/0x4f0
[  805.492730][ T3919]  elv_iosched_store+0x30c/0x3a0
[  805.497989][ T3919]  queue_attr_store+0x23f/0x360
[  805.503144][ T3919]  kernfs_fop_write_iter+0x3da/0x5e0
[  805.508747][ T3919]  vfs_write+0x524/0x1010
[  805.513381][ T3919]  ksys_write+0xff/0x200
[  805.517944][ T3919]  do_syscall_64+0xf4/0x1550
[  805.522862][ T3919]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  805.529118][ T3919]
[  805.531858][ T3919] The buggy address belongs to the object at ffff8881273e0800
[  805.531858][ T3919]  which belongs to the cache kmalloc-rnd-13-1k of size 1024
[  805.547392][ T3919] The buggy address is located 224 bytes inside of
[  805.547392][ T3919]  freed 1024-byte region [ffff8881273e0800, ffff8881273e0c00)
[  805.562078][ T3919]
[  805.564734][ T3919] The buggy address belongs to the physical page:
[  805.571446][ T3919] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1273e0
[  805.580609][ T3919] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  805.589411][ T3919] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[  805.597524][ T3919] page_type: f5(slab)
[  805.601916][ T3919] raw: 0017ffffc0000040 ffff88810005c640 dead000000000100 dead000000000122
[  805.610881][ T3919] raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
[  805.619808][ T3919] head: 0017ffffc0000040 ffff88810005c640 dead000000000100 dead000000000122
[  805.628815][ T3919] head: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
[  805.637838][ T3919] head: 0017ffffc0000003 fffffffffffffe01 00000000ffffffff 00000000ffffffff
[  805.646901][ T3919] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
[  805.655983][ T3919] page dumped because: kasan: bad access detected
[  805.662913][ T3919]
[  805.665657][ T3919] Memory state around the buggy address:
[  805.671717][ T3919]  ffff8881273e0780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  805.680194][ T3919]  ffff8881273e0800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  805.688697][ T3919] >ffff8881273e0880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  805.697130][ T3919]                                                        ^
[  805.704717][ T3919]  ffff8881273e0900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  805.713179][ T3919]  ffff8881273e0980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  805.721720][ T3919] ==================================================================
[  805.730526][ T3919] Disabling lock debugging due to kernel taint
...

Thread overview: 6+ messages
2026-06-11  7:41 [PATCH RFC 0/1] block: fix concurrent elevator change failure Shin'ichiro Kawasaki
2026-06-11  7:42 ` [PATCH RFC 1/1] block: serialize whole elevator change steps for the same queue Shin'ichiro Kawasaki
2026-06-11 11:22 ` [PATCH RFC 0/1] block: fix concurrent elevator change failure Ming Lei
2026-06-12  9:47   ` Shin'ichiro Kawasaki [this message]
2026-06-12 11:06     ` Ming Lei
2026-06-12 11:45       ` Nilay Shroff
