public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2] blktrace: fix __this_cpu_read/write in preemptible context
@ 2026-03-02  0:22 Chaitanya Kulkarni
  2026-03-02  6:02 ` Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Chaitanya Kulkarni @ 2026-03-02  0:22 UTC (permalink / raw)
  To: axboe, rostedt, mhiramat, mathieu.desnoyers
  Cc: shinichiro.kawasaki, linux-block, linux-trace-kernel,
	Chaitanya Kulkarni

tracing_record_cmdline() internally uses __this_cpu_read() and
__this_cpu_write() on the per-CPU variable trace_cmdline_save, and
trace_save_cmdline() explicitly asserts preemption is disabled via
lockdep_assert_preemption_disabled(). These operations are only safe
when preemption is off, as they were designed to be called from the
scheduler context (probe_wakeup_sched_switch() / probe_wakeup()).

__blk_add_trace() was calling tracing_record_cmdline(current) early in
the blk_tracer path, before ring buffer reservation, from process
context where preemption is fully enabled. This triggers the following
using blktests/blktrace/002:

blktrace/002 (blktrace ftrace corruption with sysfs trace)   [failed]
    runtime  0.367s  ...  0.437s
    something found in dmesg:
    [   81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
    [   81.239580] null_blk: disk nullb1 created
    [   81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
    [   81.362842] caller is tracing_record_cmdline+0x10/0x40
    [   81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G                 N  7.0.0-rc1lblk+ #84 PREEMPT(full)
    [   81.362877] Tainted: [N]=TEST
    [   81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
    [   81.362881] Call Trace:
    [   81.362884]  <TASK>
    [   81.362886]  dump_stack_lvl+0x8d/0xb0
    ...
    (See '/mnt/sda/blktests/results/nodev/blktrace/002.dmesg' for the entire message)

[   81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
[   81.239580] null_blk: disk nullb1 created
[   81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
[   81.362842] caller is tracing_record_cmdline+0x10/0x40
[   81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G                 N  7.0.0-rc1lblk+ #84 PREEMPT(full)
[   81.362877] Tainted: [N]=TEST
[   81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[   81.362881] Call Trace:
[   81.362884]  <TASK>
[   81.362886]  dump_stack_lvl+0x8d/0xb0
[   81.362895]  check_preemption_disabled+0xce/0xe0
[   81.362902]  tracing_record_cmdline+0x10/0x40
[   81.362923]  __blk_add_trace+0x307/0x5d0
[   81.362934]  ? lock_acquire+0xe0/0x300
[   81.362940]  ? iov_iter_extract_pages+0x101/0xa30
[   81.362959]  blk_add_trace_bio+0x106/0x1e0
[   81.362968]  submit_bio_noacct_nocheck+0x24b/0x3a0
[   81.362979]  ? lockdep_init_map_type+0x58/0x260
[   81.362988]  submit_bio_wait+0x56/0x90
[   81.363009]  __blkdev_direct_IO_simple+0x16c/0x250
[   81.363026]  ? __pfx_submit_bio_wait_endio+0x10/0x10
[   81.363038]  ? rcu_read_lock_any_held+0x73/0xa0
[   81.363051]  blkdev_read_iter+0xc1/0x140
[   81.363059]  vfs_read+0x20b/0x330
[   81.363083]  ksys_read+0x67/0xe0
[   81.363090]  do_syscall_64+0xbf/0xf00
[   81.363102]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   81.363106] RIP: 0033:0x7f281906029d
[   81.363111] Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d 66 63 0a 00 e8 59 ff 01 00 66 0f 1f 84 00 00 00 00 00 80 3d 41 33 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
[   81.363113] RSP: 002b:00007ffca127dd48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   81.363120] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f281906029d
[   81.363122] RDX: 0000000000001000 RSI: 0000559f8bfae000 RDI: 0000000000000000
[   81.363123] RBP: 0000000000001000 R08: 0000002863a10a81 R09: 00007f281915f000
[   81.363124] R10: 00007f2818f77b60 R11: 0000000000000246 R12: 0000559f8bfae000
[   81.363126] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000a
[   81.363142]  </TASK>

The same BUG fires from blk_add_trace_plug(), blk_add_trace_unplug(),
and blk_add_trace_rq() paths as well.

The purpose of tracing_record_cmdline() is to cache the task->comm for
a given PID so that the trace can later resolve it. It is only
meaningful when a trace event is actually being recorded. Ring buffer
reservation via ring_buffer_lock_reserve() disables preemption, and
preemption remains disabled until the event is committed :-

__blk_add_trace()
       	__trace_buffer_lock_reserve()
       		__trace_buffer_lock_reserve()
       			ring_buffer_lock_reserve()
       				preempt_disable_notrace();  <---

With this fix blktests for blktrace pass:

  blktests (master) # ./check blktrace
  blktrace/001 (blktrace zone management command tracing)      [passed]
      runtime  3.650s  ...  3.647s
  blktrace/002 (blktrace ftrace corruption with sysfs trace)   [passed]
      runtime  0.411s  ...  0.384s

Fixes: 7ffbd48d5cab ("tracing: Cache comms only after an event occurred")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
v2:-

1. Remove preempt_disable_notrace() and preempt_enable_notrace() calls from V1.
   Fix the issue by moving a call to tracing_record_cmdline() after ring
   buffer reservation which also disables the preemption.

---
 kernel/trace/blktrace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 3b7c102a6eb3..ead03e0e0fbe 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -383,8 +383,6 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
 	cpu = raw_smp_processor_id();
 
 	if (blk_tracer) {
-		tracing_record_cmdline(current);
-
 		buffer = blk_tr->array_buffer.buffer;
 		trace_ctx = tracing_gen_ctx_flags(0);
 		switch (bt->version) {
@@ -419,6 +417,7 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
 		if (!event)
 			return;
 
+		tracing_record_cmdline(current);
 		switch (bt->version) {
 		case 1:
 			record_blktrace_event(ring_buffer_event_data(event),
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH V2] blktrace: fix __this_cpu_read/write in preemptible context
  2026-03-02  0:22 [PATCH V2] blktrace: fix __this_cpu_read/write in preemptible context Chaitanya Kulkarni
@ 2026-03-02  6:02 ` Jens Axboe
  2026-03-02  8:41   ` Chaitanya Kulkarni
  2026-03-02 15:59 ` Steven Rostedt
  2026-03-02 16:16 ` Jens Axboe
  2 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2026-03-02  6:02 UTC (permalink / raw)
  To: Chaitanya Kulkarni, rostedt, mhiramat, mathieu.desnoyers
  Cc: shinichiro.kawasaki, linux-block, linux-trace-kernel

On 3/1/26 5:22 PM, Chaitanya Kulkarni wrote:
> Fixes: 7ffbd48d5cab ("tracing: Cache comms only after an event occurred")

I don't understand, this dates back to 2012?

--
Jens Axboe

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V2] blktrace: fix __this_cpu_read/write in preemptible context
  2026-03-02  6:02 ` Jens Axboe
@ 2026-03-02  8:41   ` Chaitanya Kulkarni
  2026-03-02 16:08     ` Steven Rostedt
  0 siblings, 1 reply; 6+ messages in thread
From: Chaitanya Kulkarni @ 2026-03-02  8:41 UTC (permalink / raw)
  To: Jens Axboe, rostedt@goodmis.org
  Cc: shinichiro.kawasaki@wdc.com, linux-block@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	mathieu.desnoyers@efficios.com, mhiramat@kernel.org,
	Chaitanya Kulkarni

On 3/1/26 22:02, Jens Axboe wrote:
> On 3/1/26 5:22 PM, Chaitanya Kulkarni wrote:
>> Fixes: 7ffbd48d5cab ("tracing: Cache comms only after an event occurred")
> I don't understand, this dates back to 2012?
>
> --
> Jens Axboe

The commit c71a896154119 ("blktrace: add ftrace plugin") added
tracing_record_cmdline() definition first.

Then commit 7ffbd48d5cab ("tracing: Cache comms only after an event occurred")
updated tracing_record_cmdline() function with  __trace_cpu_read() and
__trace_cpu_write().

Above added __trace_cpu_read() when used in process context in the call
chain from starting blk_add_trace() is resulting in the splat :-

run blktests blktrace/002 at 2026-02-25 22:24:33
null_blk: disk nullb1 created

_BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516_

caller is tracing_record_cmdline+0x10/0x40
CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G                 N  7.0.0-rc1lblk+ #84 PREEMPT(full)
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x8d/0xb0
check_preemption_disabled+0xce/0xe0
tracing_record_cmdline+0x10/0x40
  __blk_add_trace+0x307/0x5d0
? lock_acquire+0xe0/0x300
? iov_iter_extract_pages+0x101/0xa30
blk_add_trace_bio+0x106/0x

[...]

Hence when __trace_cpu_read() is added in the path from blk_add_trace()
this bug was introduced 7ffbd48d5cab ?

I totally failed to understand why this bug is appearing right now
than before.

-ck

Other reference commits :-

The commit 2cc621fd2e9b8
("tracing: Move saved_cmdline code into trace_sched_switch.c")
moved the lockdep to new file.

The commit c0a581d7126c0
("tracing: Disable interrupt or preemption before acquiring arch_spinlock_t")
Added lockdep_assert_preemption_disabled() in trace_save_cmdline().



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V2] blktrace: fix __this_cpu_read/write in preemptible context
  2026-03-02  0:22 [PATCH V2] blktrace: fix __this_cpu_read/write in preemptible context Chaitanya Kulkarni
  2026-03-02  6:02 ` Jens Axboe
@ 2026-03-02 15:59 ` Steven Rostedt
  2026-03-02 16:16 ` Jens Axboe
  2 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2026-03-02 15:59 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: axboe, mhiramat, mathieu.desnoyers, shinichiro.kawasaki,
	linux-block, linux-trace-kernel

On Sun, 1 Mar 2026 16:22:07 -0800
Chaitanya Kulkarni <kch@nvidia.com> wrote:

> With this fix blktests for blktrace pass:
> 
>   blktests (master) # ./check blktrace
>   blktrace/001 (blktrace zone management command tracing)      [passed]
>       runtime  3.650s  ...  3.647s
>   blktrace/002 (blktrace ftrace corruption with sysfs trace)   [passed]
>       runtime  0.411s  ...  0.384s
> 
> Fixes: 7ffbd48d5cab ("tracing: Cache comms only after an event occurred")
> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>

Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V2] blktrace: fix __this_cpu_read/write in preemptible context
  2026-03-02  8:41   ` Chaitanya Kulkarni
@ 2026-03-02 16:08     ` Steven Rostedt
  0 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2026-03-02 16:08 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Jens Axboe, shinichiro.kawasaki@wdc.com,
	linux-block@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	mathieu.desnoyers@efficios.com, mhiramat@kernel.org

On Mon, 2 Mar 2026 08:41:58 +0000
Chaitanya Kulkarni <chaitanyak@nvidia.com> wrote:

> I totally failed to understand why this bug is appearing right now
> than before.

I wonder if it is because it was never tested under PREEMPT_FULL, which
would be the only config that would trigger the warning, as
PREEMPT_VOLUNTARY has rcu_read_lock() not allow preemption.

Today, the default is now the new PREEMPT_LAZY and not PREEMPT_VOLUNTARY,
where it can preempt within rcu_read_lock(). It could be that this bug
existed since 2012 but was never triggered as most users use
PREEMPT_VOLUNTARY where it would not trigger. But if they used PREEMPT_FULL
back then, it may have.

Today we now use PREEMPT_LAZY where this does trigger.

-- Steve

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V2] blktrace: fix __this_cpu_read/write in preemptible context
  2026-03-02  0:22 [PATCH V2] blktrace: fix __this_cpu_read/write in preemptible context Chaitanya Kulkarni
  2026-03-02  6:02 ` Jens Axboe
  2026-03-02 15:59 ` Steven Rostedt
@ 2026-03-02 16:16 ` Jens Axboe
  2 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2026-03-02 16:16 UTC (permalink / raw)
  To: rostedt, mhiramat, mathieu.desnoyers, Chaitanya Kulkarni
  Cc: shinichiro.kawasaki, linux-block, linux-trace-kernel


On Sun, 01 Mar 2026 16:22:07 -0800, Chaitanya Kulkarni wrote:
> tracing_record_cmdline() internally uses __this_cpu_read() and
> __this_cpu_write() on the per-CPU variable trace_cmdline_save, and
> trace_save_cmdline() explicitly asserts preemption is disabled via
> lockdep_assert_preemption_disabled(). These operations are only safe
> when preemption is off, as they were designed to be called from the
> scheduler context (probe_wakeup_sched_switch() / probe_wakeup()).
> 
> [...]

Applied, thanks!

[1/1] blktrace: fix __this_cpu_read/write in preemptible context
      commit: da46b5dfef48658d03347cda21532bcdbb521e67

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-02 16:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-02  0:22 [PATCH V2] blktrace: fix __this_cpu_read/write in preemptible context Chaitanya Kulkarni
2026-03-02  6:02 ` Jens Axboe
2026-03-02  8:41   ` Chaitanya Kulkarni
2026-03-02 16:08     ` Steven Rostedt
2026-03-02 15:59 ` Steven Rostedt
2026-03-02 16:16 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox