Re: [PATCH RFC 0/1] block: fix concurrent elevator change failure

From: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
To: Ming Lei <tom.leiming@gmail.com>
Cc: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	 Nilay Shroff <nilay@linux.ibm.com>
Subject: Re: [PATCH RFC 0/1] block: fix concurrent elevator change failure
Date: Fri, 12 Jun 2026 18:47:50 +0900
Message-ID: <aivMxPCd305WbBsk@shinmob>
In-Reply-To: <aiqaXfTqCLMu2DwF@fedora>

On Jun 11, 2026 / 06:22, Ming Lei wrote:
> Hi Shin'ichiro,

Hi Ming, thanks for the comments.

> 
> On Thu, Jun 11, 2026 at 04:41:59PM +0900, Shin'ichiro Kawasaki wrote:
> > I observed that the blktests test case block/005 hangs on a specific
> > server hardware using a specific HDD as a block device. During the test
> > case run, the kernel reported a KASAN null-ptr-deref (and other memory
> > corruption symptoms) [2]. This failure looked sporadic and hardware-
> > dependent.
> > 
> > From the kernel message, I noticed that udev-worker wrote to the
> > queue/scheduler sysfs attribute to change the IO scheduler, or elevator.
> > The test case block/005 also wrote to the same sysfs attribute, which
> 
> sysfs write is supposed to be serialized...

I checked the sysfs write handler elv_iosched_store() in block/elevator.c.
I found that the elevator_change() call is guarded by the rw_semaphore
"set->update_nr_hwq_lock", but it is taken as a reader lock, not as a writer
lock. Readers can hold the semaphore concurrently, so this does not serialize
the sysfs writes.

I tried the patch below, which replaces the reader lock with the writer lock.
In a quick trial it appears to work: the kernel message is no longer observed
and the new test case does not hang. I will do further testing to confirm
that this change does not trigger new lockdep WARNs. Assuming it has no such
side effects, I hope this fix approach is acceptable. It does not add a new
lock, so I think it is the better approach.

diff --git a/block/elevator.c b/block/elevator.c
index 3bcd37c2aa34..b03185a217ff 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -813,7 +813,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	 *   update_nr_hwq_lock -> kn->active (via del_gendisk -> kobject_del)
 	 *   kn->active -> update_nr_hwq_lock (via this sysfs write path)
 	 */
-	if (!down_read_trylock(&set->update_nr_hwq_lock)) {
+	if (!down_write_trylock(&set->update_nr_hwq_lock)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -824,7 +824,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	} else {
 		ret = -ENOENT;
 	}
-	up_read(&set->update_nr_hwq_lock);
+	up_write(&set->update_nr_hwq_lock);
 
 out:
 	if (ctx.type)

[...]

> blk_mq_sched_reg_debugfs already includes debugfs lock, so I feel the proper
> fix could be check & avoid the null-ptr-deref.

Actually, the null-ptr-deref is only one of the failure symptoms. A KASAN
slab-use-after-free is also observed [3], so I suspect that adding null checks
may not be enough.

> Adding new lock should be the last straw usually, especially this one is
> depended by queue freeze.

Got it, thanks.


[3] KASAN slab-use-after-free

[  802.836569][ T3919] run blktests block/005 at 2026-05-11 10:42:39
[  804.256901][ T3866] debugfs: 'sched' already exists in 'sdd'
[  804.874743][ T3919] debugfs: 'sched' already exists in 'sdd'
[  804.882124][ T3919] ==================================================================
[  804.882154][ T3866] debugfs: 'sched' already exists in 'sdd'
[  804.890039][ T3919] BUG: KASAN: slab-use-after-free in elevator_change_done+0x304/0x610
[  804.890053][ T3919] Write of size 8 at addr ffff8881273e08e0 by task check/3919
[  804.890061][ T3919]
[  804.890069][ T3919] CPU: 4 UID: 0 PID: 3919 Comm: check Not tainted 7.1.0-rc2-kts+ #1 PREEMPT(lazy)
[  804.890080][ T3919] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
[  804.890086][ T3919] Call Trace:
[  804.890092][ T3919]  <TASK>
[  804.890098][ T3919]  dump_stack_lvl+0x6e/0xa0
[  804.890118][ T3919]  print_address_description.constprop.0+0x70/0x300
[  804.890135][ T3919]  ? elevator_change_done+0x304/0x610
[  804.890145][ T3919]  print_report+0xfc/0x1ff
[  804.890154][ T3919]  ? __virt_addr_valid+0x1d1/0x3f0
[  804.890163][ T3919]  ? elevator_change_done+0x304/0x610
[  804.890168][ T3919]  kasan_report+0xf6/0x1c0
[  804.890176][ T3919]  ? elevator_change_done+0x304/0x610
[  804.890185][ T3919]  kasan_check_range+0x125/0x200
[  804.890192][ T3919]  elevator_change_done+0x304/0x610
[  804.890198][ T3919]  ? sysfs_file_ops+0x70/0x140
[  804.890206][ T3919]  ? __pfx_elevator_change_done+0x10/0x10
[  804.890213][ T3919]  ? __pfx_sysfs_kf_write+0x10/0x10
[  804.890220][ T3919]  ? __pfx_sysfs_kf_write+0x10/0x10
[  804.890225][ T3919]  elevator_change+0x283/0x4f0
[  804.890233][ T3919]  ? __pfx_sysfs_kf_write+0x10/0x10
[  804.890239][ T3919]  elv_iosched_store+0x30c/0x3a0
[  804.890246][ T3919]  ? __pfx_elv_iosched_store+0x10/0x10
[  804.890255][ T3919]  ? lock_acquire.part.0+0xb8/0x230
[  804.890262][ T3919]  ? kernfs_fop_write_iter+0x25b/0x5e0
[  804.890268][ T3919]  ? lock_acquire.part.0+0xb8/0x230
[  804.890274][ T3919]  ? lock_acquire+0x126/0x140
[  804.890281][ T3919]  ? __pfx_sysfs_kf_write+0x10/0x10
[  804.890286][ T3919]  queue_attr_store+0x23f/0x360
[  804.890295][ T3919]  ? __pfx_queue_attr_store+0x10/0x10
[  804.890300][ T3919]  ? __lock_acquire+0x55d/0xbd0
[  804.890308][ T3919]  ? lock_acquire.part.0+0xb8/0x230
[  804.890314][ T3919]  ? sysfs_file_kobj+0x1d/0x1b0
[  804.890319][ T3919]  ? find_held_lock+0x2b/0x80
[  804.890326][ T3919]  ? __lock_release.isra.0+0x59/0x170
[  804.890334][ T3919]  ? lock_release.part.0+0x1c/0x50
[  804.890340][ T3919]  ? sysfs_file_kobj+0xb9/0x1b0
[  804.890345][ T3919]  ? sysfs_kf_write+0x65/0x170
[  804.890352][ T3919]  ? __pfx_sysfs_kf_write+0x10/0x10
[  804.890357][ T3919]  kernfs_fop_write_iter+0x3da/0x5e0
[  804.890363][ T3919]  ? __pfx_kernfs_fop_write_iter+0x10/0x10
[  804.890368][ T3919]  vfs_write+0x524/0x1010
[  804.890378][ T3919]  ? __pfx_vfs_write+0x10/0x10
[  804.890393][ T3919]  ksys_write+0xff/0x200
[  804.890401][ T3919]  ? __pfx_ksys_write+0x10/0x10
[  804.890408][ T3919]  ? __pfx_pte_val+0x10/0x10
[  804.890414][ T3919]  ? folio_xchg_last_cpupid+0xc6/0x130
[  804.890421][ T3919]  do_syscall_64+0xf4/0x1550
[  804.890429][ T3919]  ? __lock_release.isra.0+0x59/0x170
[  804.890437][ T3919]  ? lock_release.part.0+0x1c/0x50
[  804.890444][ T3919]  ? rcu_read_unlock+0x1c/0x60
[  804.890449][ T3919]  ? wp_page_reuse+0x160/0x1e0
[  804.890455][ T3919]  ? do_wp_page+0x5db/0x10a0
[  804.890465][ T3919]  ? handle_pte_fault+0x54e/0x760
[  804.890472][ T3919]  ? __pfx_handle_pte_fault+0x10/0x10
[  804.890479][ T3919]  ? __pfx_pmd_val+0x10/0x10
[  804.890485][ T3919]  ? __handle_mm_fault+0xa02/0xef0
[  804.890493][ T3919]  ? __lock_acquire+0x55d/0xbd0
[  804.890499][ T3919]  ? __pfx_css_rstat_updated+0x10/0x10
[  804.890509][ T3919]  ? lock_acquire.part.0+0xb8/0x230
[  804.890515][ T3919]  ? count_memcg_events_mm.constprop.0+0x22/0x130
[  804.890522][ T3919]  ? find_held_lock+0x2b/0x80
[  804.890528][ T3919]  ? __lock_release.isra.0+0x59/0x170
[  804.890536][ T3919]  ? find_held_lock+0x2b/0x80
[  804.890542][ T3919]  ? __lock_release.isra.0+0x59/0x170
[  804.890550][ T3919]  ? do_user_addr_fault+0x811/0xed0
[  804.890559][ T3919]  ? do_syscall_64+0x34/0x1550
[  804.890564][ T3919]  ? lockdep_hardirqs_on_prepare.part.0+0x9b/0x140
[  804.890570][ T3919]  ? do_syscall_64+0x34/0x1550
[  804.890575][ T3919]  ? trace_hardirqs_on+0x19/0x1a0
[  804.890584][ T3919]  ? do_syscall_64+0xab/0x1550
[  804.890590][ T3919]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  804.890596][ T3919] RIP: 0033:0x7ff08cbe3bbe
[  804.890603][ T3919] Code: 4d 89 d8 e8 34 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
[  804.890609][ T3919] RSP: 002b:00007ffc95718820 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  804.890616][ T3919] RAX: ffffffffffffffda RBX: 00007ff08cd5f5c0 RCX: 00007ff08cbe3bbe
[  804.890621][ T3919] RDX: 0000000000000006 RSI: 0000563340f2c390 RDI: 0000000000000001
[  804.890624][ T3919] RBP: 00007ffc95718830 R08: 0000000000000000 R09: 0000000000000000
[  804.890627][ T3919] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000006
[  804.890630][ T3919] R13: 0000000000000006 R14: 0000563340f2c390 R15: 0000563340f96890
[  804.890641][ T3919]  </TASK>
[  804.890643][ T3919]
[  805.368835][ T3919] Allocated by task 3919:
[  805.373543][ T3919]  kasan_save_stack+0x30/0x50
[  805.378559][ T3919]  kasan_save_track+0x14/0x30
[  805.383559][ T3919]  __kasan_kmalloc+0x9a/0xb0
[  805.388465][ T3919]  elevator_alloc+0xc5/0x2b0
[  805.393366][ T3919]  blk_mq_init_sched+0xa6/0x5e0
[  805.398554][ T3919]  elevator_switch+0x18e/0x680
[  805.403702][ T3919]  elevator_change+0x2d8/0x4f0
[  805.408802][ T3919]  elv_iosched_store+0x30c/0x3a0
[  805.414116][ T3919]  queue_attr_store+0x23f/0x360
[  805.419289][ T3919]  kernfs_fop_write_iter+0x3da/0x5e0
[  805.424938][ T3919]  vfs_write+0x524/0x1010
[  805.429600][ T3919]  ksys_write+0xff/0x200
[  805.434159][ T3919]  do_syscall_64+0xf4/0x1550
[  805.439064][ T3919]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  805.445273][ T3919]
[  805.447927][ T3919] Freed by task 3866:
[  805.452231][ T3919]  kasan_save_stack+0x30/0x50
[  805.457287][ T3919]  kasan_save_track+0x14/0x30
[  805.462282][ T3919]  kasan_save_free_info+0x3b/0x70
[  805.467645][ T3919]  __kasan_slab_free+0x6b/0x90
[  805.472736][ T3919]  kfree+0x21c/0x620
[  805.476953][ T3919]  kobject_cleanup+0x105/0x3a0
[  805.482039][ T3919]  elevator_change_done+0x196/0x610
[  805.487633][ T3919]  elevator_change+0x283/0x4f0
[  805.492730][ T3919]  elv_iosched_store+0x30c/0x3a0
[  805.497989][ T3919]  queue_attr_store+0x23f/0x360
[  805.503144][ T3919]  kernfs_fop_write_iter+0x3da/0x5e0
[  805.508747][ T3919]  vfs_write+0x524/0x1010
[  805.513381][ T3919]  ksys_write+0xff/0x200
[  805.517944][ T3919]  do_syscall_64+0xf4/0x1550
[  805.522862][ T3919]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  805.529118][ T3919]
[  805.531858][ T3919] The buggy address belongs to the object at ffff8881273e0800
[  805.531858][ T3919]  which belongs to the cache kmalloc-rnd-13-1k of size 1024
[  805.547392][ T3919] The buggy address is located 224 bytes inside of
[  805.547392][ T3919]  freed 1024-byte region [ffff8881273e0800, ffff8881273e0c00)
[  805.562078][ T3919]
[  805.564734][ T3919] The buggy address belongs to the physical page:
[  805.571446][ T3919] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1273e0
[  805.580609][ T3919] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  805.589411][ T3919] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[  805.597524][ T3919] page_type: f5(slab)
[  805.601916][ T3919] raw: 0017ffffc0000040 ffff88810005c640 dead000000000100 dead000000000122
[  805.610881][ T3919] raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
[  805.619808][ T3919] head: 0017ffffc0000040 ffff88810005c640 dead000000000100 dead000000000122
[  805.628815][ T3919] head: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
[  805.637838][ T3919] head: 0017ffffc0000003 fffffffffffffe01 00000000ffffffff 00000000ffffffff
[  805.646901][ T3919] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
[  805.655983][ T3919] page dumped because: kasan: bad access detected
[  805.662913][ T3919]
[  805.665657][ T3919] Memory state around the buggy address:
[  805.671717][ T3919]  ffff8881273e0780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  805.680194][ T3919]  ffff8881273e0800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  805.688697][ T3919] >ffff8881273e0880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  805.697130][ T3919]                                                        ^
[  805.704717][ T3919]  ffff8881273e0900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  805.713179][ T3919]  ffff8881273e0980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  805.721720][ T3919] ==================================================================
[  805.730526][ T3919] Disabling lock debugging due to kernel taint
...
