From: Ming Lei <tom.leiming@gmail.com>
To: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	Nilay Shroff <nilay@linux.ibm.com>
Subject: Re: [PATCH RFC 0/1] block: fix concurrent elevator change failure
Date: Thu, 11 Jun 2026 06:22:05 -0500
Message-ID: <aiqaXfTqCLMu2DwF@fedora>
In-Reply-To: <20260611074200.474676-1-shinichiro.kawasaki@wdc.com>

Hi Shin'ichiro,

On Thu, Jun 11, 2026 at 04:41:59PM +0900, Shin'ichiro Kawasaki wrote:
> I observed that the blktests test case block/005 hangs on a specific
> server hardware using a specific HDD as a block device. During the test
> case run, the kernel reported a KASAN null-ptr-deref (and other memory
> corruption symptoms) [2]. This failure looked sporadic and hardware-
> dependent.
> 
> From the kernel message, I noticed that udev-worker wrote to the
> queue/scheduler sysfs attribute to change the IO scheduler, or elevator.
> The test case block/005 also wrote to the same sysfs attribute, which

sysfs writes are supposed to be serialized...

> indicated that a concurrent elevator change caused the failure. I
> created a new blktests test case that simply does the concurrent
> elevator change with a null_blk device [1]. It recreates the failure in
> a stable manner on various server hardware.
> 
> Using the new test case, I bisected and found that the failure first
> appears at the commit 370ac285f23a ("block: avoid cpu_hotplug_lock
> depedency on freeze_lock") in the kernel tag v6.17-rc3. However, that
> commit does not appear to explain the failure by itself: it changed the
> queue freeze behavior and only unveiled a race, probably. Looking back
> at the changes to elevator_change(), I think the actual cause is the
> commit 559dc11143eb ("block: move elv_register[unregister]_queue out of
> elevator_lock") in the kernel tag v6.16-rc1. This commit moved
> elevator_change_done() out of the guard of ->elevator_lock and the queue
> freeze. As a result, when two threads write to the same queue/scheduler
> attribute concurrently, elevator_change_done() runs in parallel causing
> the memory corruption and the hang.
> 
> As the fix attempt, I created the patch in this series. It adds a new
> mutex that serializes the whole elevator switch sequence, including the
> elevator_change_done() call. I ran the reproducer with lockdep enabled
> and confirmed that the patch avoids the failure and new WARN was not
> observed.
> 
> However, the fix patch adds a new lock, and I'm not sure if it is the best
> solution. Comments on the patch, or suggestions for a better solution,
> would be appreciated.
> 
> [1] https://github.com/kawasaki/blktests/commit/4f8c63ed7d049f5e9c935c3fe00142b2a3629826
> 
> [2]
> 
> [30102.760660] [ T186170] run blktests block/005 at 2026-05-11 05:53:53
> [30104.969837] [ T186111] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
> [30104.983590] [ T186111] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> [30104.992929] [ T186111] CPU: 2 UID: 0 PID: 186111 Comm: (udev-worker) Not tainted 7.1.0-rc2-kts+ #1 PREEMPT(lazy)
> [30105.004019] [ T186111] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
> [30105.013216] [ T186111] RIP: 0010:blk_mq_debugfs_register_sched+0x46/0x210
> [30105.020667] [ T186111] Code: 48 89 fa 48 c1 ea 03 48 83 ec 10 80 3c 02 00 0f 85 83 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6b 08 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 57 01 00 00 48 c7 c0 24 a3 b3 97 48 8b 6d 00 48
> [30105.041036] [ T186111] RSP: 0018:ffff88816b9c7708 EFLAGS: 00010246
> [30105.048111] [ T186111] RAX: dffffc0000000000 RBX: ffff888117f18000 RCX: 0000000000000000
> [30105.057097] [ T186111] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888117f18008
> [30105.066086] [ T186111] RBP: 0000000000000000 R08: ffffffff957c47ac R09: fffffbfff2f6633c
> [30105.075083] [ T186111] R10: ffff88816b9c7730 R11: 0000000000000001 R12: ffff88814c1f2000
> [30105.084088] [ T186111] R13: ffff88814c1f2018 R14: ffff8881b8a336ac R15: ffffffff95bfae30
> [30105.093111] [ T186111] FS:  00007fc1c7970c40(0000) GS:ffff8887c534e000(0000) knlGS:0000000000000000
> [30105.103093] [ T186111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [30105.110751] [ T186111] CR2: 000055fa37e182c0 CR3: 0000000108350003 CR4: 00000000001726f0
> [30105.119796] [ T186111] Call Trace:
> [30105.124154] [ T186111]  <TASK>
> [30105.128301] [ T186111]  blk_mq_sched_reg_debugfs+0x8d/0x1a0
> [30105.134193] [ T186111]  elevator_change_done+0x2f2/0x610

blk_mq_sched_reg_debugfs() already takes the debugfs lock, so I feel the
proper fix could be to check for and avoid the null-ptr-deref.
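
To illustrate the "check and avoid" idea, here is a rough userspace sketch
using pthreads. It is not kernel code: struct queue, struct sched_queue,
struct sched_type and register_sched_debugfs() are hypothetical stand-ins,
and which pointer is actually NULL in the reported oops is not established
here. The sketch only shows the pattern of re-checking state under the lock
that is already held, instead of adding a new lock around the caller.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct sched_type {
	const char *name;
};

struct sched_queue {
	struct sched_type *type;
};

struct queue {
	pthread_mutex_t debugfs_lock;	/* plays the role of the existing debugfs lock */
	struct sched_queue *sched;	/* may be cleared by a concurrent switch */
	int registered;			/* counts successful registrations */
};

/*
 * Register scheduler debugfs entries only if the scheduler is still
 * attached. The NULL checks happen under debugfs_lock, so a concurrent
 * switch that detaches the scheduler under the same lock is seen here.
 * Returns 0 on success, -1 if there was nothing to register.
 */
static int register_sched_debugfs(struct queue *q)
{
	struct sched_queue *e;
	int ret = -1;

	pthread_mutex_lock(&q->debugfs_lock);
	e = q->sched;
	if (e && e->type) {
		q->registered++;
		ret = 0;
	}
	pthread_mutex_unlock(&q->debugfs_lock);

	return ret;
}
```

A caller that loses the race gets -1 and skips registration rather than
dereferencing a stale or NULL pointer. Whether skipping is the right
behavior in the real code path would need to be checked against the actual
elevator switch sequence.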

Adding a new lock should usually be the last resort, especially since queue
freeze depends on this one.

Thanks,
Ming

