* [PATCH] blktrace: fix __this_cpu_read/write in preemptible context
@ 2026-02-27 5:03 Chaitanya Kulkarni
2026-02-27 16:48 ` Jens Axboe
2026-02-27 19:19 ` Steven Rostedt
0 siblings, 2 replies; 4+ messages in thread
From: Chaitanya Kulkarni @ 2026-02-27 5:03 UTC (permalink / raw)
To: axboe, rostedt, mhiramat, mathieu.desnoyers
Cc: shinichiro.kawasaki, linux-block, linux-trace-kernel,
Chaitanya Kulkarni
tracing_record_cmdline() internally uses __this_cpu_read() and
__this_cpu_write() on the per-CPU variable trace_taskinfo_save, and
trace_save_cmdline() explicitly asserts preemption is disabled via
lockdep_assert_preemption_disabled(). These operations are safe only
when preemption is off, as they were designed to be called from the
scheduler's context probe_wakeup_sched_switch() / probe_wakeup().
__blk_add_trace() calls tracing_record_cmdline(current) from process
context where preemption is fully enabled, triggering the following
splat on using blktests/blktrace/002:
blktrace/002 (blktrace ftrace corruption with sysfs trace) [failed]
runtime 0.367s ... 0.437s
something found in dmesg:
[ 81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
[ 81.239580] null_blk: disk nullb1 created
[ 81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
[ 81.362842] caller is tracing_record_cmdline+0x10/0x40
[ 81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G N 7.0.0-rc1lblk+ #84 PREEMPT(full)
[ 81.362877] Tainted: [N]=TEST
[ 81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 81.362881] Call Trace:
[ 81.362884] <TASK>
[ 81.362886] dump_stack_lvl+0x8d/0xb0
...
(See '/mnt/sda/blktests/results/nodev/blktrace/002.dmesg' for the entire message)
[ 81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
[ 81.239580] null_blk: disk nullb1 created
[ 81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
[ 81.362842] caller is tracing_record_cmdline+0x10/0x40
[ 81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G N 7.0.0-rc1lblk+ #84 PREEMPT(full)
[ 81.362877] Tainted: [N]=TEST
[ 81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 81.362881] Call Trace:
[ 81.362884] <TASK>
[ 81.362886] dump_stack_lvl+0x8d/0xb0
[ 81.362895] check_preemption_disabled+0xce/0xe0
[ 81.362902] tracing_record_cmdline+0x10/0x40
[ 81.362923] __blk_add_trace+0x307/0x5d0
[ 81.362934] ? lock_acquire+0xe0/0x300
[ 81.362940] ? iov_iter_extract_pages+0x101/0xa30
[ 81.362959] blk_add_trace_bio+0x106/0x1e0
[ 81.362968] submit_bio_noacct_nocheck+0x24b/0x3a0
[ 81.362979] ? lockdep_init_map_type+0x58/0x260
[ 81.362988] submit_bio_wait+0x56/0x90
[ 81.363009] __blkdev_direct_IO_simple+0x16c/0x250
[ 81.363026] ? __pfx_submit_bio_wait_endio+0x10/0x10
[ 81.363038] ? rcu_read_lock_any_held+0x73/0xa0
[ 81.363051] blkdev_read_iter+0xc1/0x140
[ 81.363059] vfs_read+0x20b/0x330
[ 81.363083] ksys_read+0x67/0xe0
[ 81.363090] do_syscall_64+0xbf/0xf00
[ 81.363102] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 81.363106] RIP: 0033:0x7f281906029d
[ 81.363111] Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d 66 63 0a 00 e8 59 ff 01 00 66 0f 1f 84 00 00 00 00 00 80 3d 41 33 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
[ 81.363113] RSP: 002b:00007ffca127dd48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 81.363120] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f281906029d
[ 81.363122] RDX: 0000000000001000 RSI: 0000559f8bfae000 RDI: 0000000000000000
[ 81.363123] RBP: 0000000000001000 R08: 0000002863a10a81 R09: 00007f281915f000
[ 81.363124] R10: 00007f2818f77b60 R11: 0000000000000246 R12: 0000559f8bfae000
[ 81.363126] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000a
[ 81.363142] </TASK>
The same BUG fires from blk_add_trace_plug(), blk_add_trace_unplug(),
and blk_add_trace_rq() paths as well.
Fix by wrapping the tracing_record_cmdline() call with
preempt_disable()/preempt_enable(). This is a best-effort "record
the comm string for this PID" operation and briefly disabling
preemption around it is both should be safe.
With this patch now blktests for blktrace pass :-
blktests (master) # ./check blktrace
blktrace/001 (blktrace zone management command tracing) [passed]
runtime 3.652s ... 3.649s
blktrace/002 (blktrace ftrace corruption with sysfs trace) [passed]
runtime 0.437s ... 0.389s
blktests (master) #
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
kernel/trace/blktrace.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 3b7c102a6eb3..488552036583 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -383,7 +383,9 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
cpu = raw_smp_processor_id();
if (blk_tracer) {
+ preempt_disable_notrace();
tracing_record_cmdline(current);
+ preempt_enable_notrace();
buffer = blk_tr->array_buffer.buffer;
trace_ctx = tracing_gen_ctx_flags(0);
--
2.39.5
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH] blktrace: fix __this_cpu_read/write in preemptible context 2026-02-27 5:03 [PATCH] blktrace: fix __this_cpu_read/write in preemptible context Chaitanya Kulkarni @ 2026-02-27 16:48 ` Jens Axboe 2026-02-27 19:19 ` Steven Rostedt 1 sibling, 0 replies; 4+ messages in thread From: Jens Axboe @ 2026-02-27 16:48 UTC (permalink / raw) To: Chaitanya Kulkarni, rostedt, mhiramat, mathieu.desnoyers Cc: shinichiro.kawasaki, linux-block, linux-trace-kernel On 2/26/26 10:03 PM, Chaitanya Kulkarni wrote: > tracing_record_cmdline() internally uses __this_cpu_read() and > __this_cpu_write() on the per-CPU variable trace_taskinfo_save, and > trace_save_cmdline() explicitly asserts preemption is disabled via > lockdep_assert_preemption_disabled(). These operations are safe only > when preemption is off, as they were designed to be called from the > scheduler's context probe_wakeup_sched_switch() / probe_wakeup(). > > __blk_add_trace() calls tracing_record_cmdline(current) from process > context where preemption is fully enabled, triggering the following > splat on using blktests/blktrace/002: > > blktrace/002 (blktrace ftrace corruption with sysfs trace) [failed] > runtime 0.367s ... 0.437s > something found in dmesg: > [ 81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33 > [ 81.239580] null_blk: disk nullb1 created > [ 81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516 > [ 81.362842] caller is tracing_record_cmdline+0x10/0x40 > [ 81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G N 7.0.0-rc1lblk+ #84 PREEMPT(full) > [ 81.362877] Tainted: [N]=TEST > [ 81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 > [ 81.362881] Call Trace: > [ 81.362884] <TASK> > [ 81.362886] dump_stack_lvl+0x8d/0xb0 > ... > (See '/mnt/sda/blktests/results/nodev/blktrace/002.dmesg' for the entire message) > > [ 81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33 > [ 81.239580] null_blk: disk nullb1 created > [ 81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516 > [ 81.362842] caller is tracing_record_cmdline+0x10/0x40 > [ 81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G N 7.0.0-rc1lblk+ #84 PREEMPT(full) > [ 81.362877] Tainted: [N]=TEST > [ 81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 > [ 81.362881] Call Trace: > [ 81.362884] <TASK> > [ 81.362886] dump_stack_lvl+0x8d/0xb0 > [ 81.362895] check_preemption_disabled+0xce/0xe0 > [ 81.362902] tracing_record_cmdline+0x10/0x40 > [ 81.362923] __blk_add_trace+0x307/0x5d0 > [ 81.362934] ? lock_acquire+0xe0/0x300 > [ 81.362940] ? iov_iter_extract_pages+0x101/0xa30 > [ 81.362959] blk_add_trace_bio+0x106/0x1e0 > [ 81.362968] submit_bio_noacct_nocheck+0x24b/0x3a0 > [ 81.362979] ? lockdep_init_map_type+0x58/0x260 > [ 81.362988] submit_bio_wait+0x56/0x90 > [ 81.363009] __blkdev_direct_IO_simple+0x16c/0x250 > [ 81.363026] ? __pfx_submit_bio_wait_endio+0x10/0x10 > [ 81.363038] ? rcu_read_lock_any_held+0x73/0xa0 > [ 81.363051] blkdev_read_iter+0xc1/0x140 > [ 81.363059] vfs_read+0x20b/0x330 > [ 81.363083] ksys_read+0x67/0xe0 > [ 81.363090] do_syscall_64+0xbf/0xf00 > [ 81.363102] entry_SYSCALL_64_after_hwframe+0x76/0x7e > [ 81.363106] RIP: 0033:0x7f281906029d > [ 81.363111] Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d 66 63 0a 00 e8 59 ff 01 00 66 0f 1f 84 00 00 00 00 00 80 3d 41 33 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec > [ 81.363113] RSP: 002b:00007ffca127dd48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > [ 81.363120] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f281906029d > [ 81.363122] RDX: 0000000000001000 RSI: 0000559f8bfae000 RDI: 0000000000000000 > [ 81.363123] RBP: 0000000000001000 R08: 0000002863a10a81 R09: 00007f281915f000 > [ 81.363124] R10: 00007f2818f77b60 R11: 0000000000000246 R12: 0000559f8bfae000 > [ 81.363126] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000a > [ 81.363142] </TASK> Is this a new issue? Needs a Fixes tag. -- Jens Axboe ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] blktrace: fix __this_cpu_read/write in preemptible context 2026-02-27 5:03 [PATCH] blktrace: fix __this_cpu_read/write in preemptible context Chaitanya Kulkarni 2026-02-27 16:48 ` Jens Axboe @ 2026-02-27 19:19 ` Steven Rostedt 2026-02-27 23:08 ` Chaitanya Kulkarni 1 sibling, 1 reply; 4+ messages in thread From: Steven Rostedt @ 2026-02-27 19:19 UTC (permalink / raw) To: Chaitanya Kulkarni Cc: axboe, mhiramat, mathieu.desnoyers, shinichiro.kawasaki, linux-block, linux-trace-kernel On Thu, 26 Feb 2026 21:03:03 -0800 Chaitanya Kulkarni <kch@nvidia.com> wrote: > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c > index 3b7c102a6eb3..488552036583 100644 > --- a/kernel/trace/blktrace.c > +++ b/kernel/trace/blktrace.c > @@ -383,7 +383,9 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes, > cpu = raw_smp_processor_id(); > > if (blk_tracer) { > + preempt_disable_notrace(); > tracing_record_cmdline(current); > + preempt_enable_notrace(); > > buffer = blk_tr->array_buffer.buffer; > trace_ctx = tracing_gen_ctx_flags(0); Do you know when this started? rcu_read_lock() doesn't disable preemption in PREEMPT environments, and hasn't for a very long time. I'm surprised it took this long to detect this? Perhaps this was a bug from day one? Anyway, the tracing_record_cmdline() is to update the COMM cache so that the trace has way to show the task->comm based on the saved PID in the trace. It sets a flag to record the COMM from the sched_switch event if a trace event happened. It's not needed if no trace event occurred. That means, instead of adding preempt_disable() here, just move it after the ring buffer event is reserved, as that means preemption is disabled until the event is committed. i.e. diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index e6988929ead2..3735cbc1f99f 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -383,8 +383,6 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes, cpu = raw_smp_processor_id(); if (blk_tracer) { - tracing_record_cmdline(current); - buffer = blk_tr->array_buffer.buffer; trace_ctx = tracing_gen_ctx_flags(0); switch (bt->version) { @@ -419,6 +417,8 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes, if (!event) return; + tracing_record_cmdline(current); + switch (bt->version) { case 1: record_blktrace_event(ring_buffer_event_data(event), -- Steve ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] blktrace: fix __this_cpu_read/write in preemptible context 2026-02-27 19:19 ` Steven Rostedt @ 2026-02-27 23:08 ` Chaitanya Kulkarni 0 siblings, 0 replies; 4+ messages in thread From: Chaitanya Kulkarni @ 2026-02-27 23:08 UTC (permalink / raw) To: Steven Rostedt, axboe@kernel.dk Cc: mhiramat@kernel.org, mathieu.desnoyers@efficios.com, shinichiro.kawasaki@wdc.com, linux-block@vger.kernel.org, Chaitanya Kulkarni, linux-trace-kernel@vger.kernel.org On 2/27/26 11:19, Steven Rostedt wrote: > On Thu, 26 Feb 2026 21:03:03 -0800 > Chaitanya Kulkarni <kch@nvidia.com> wrote: > >> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c >> index 3b7c102a6eb3..488552036583 100644 >> --- a/kernel/trace/blktrace.c >> +++ b/kernel/trace/blktrace.c >> @@ -383,7 +383,9 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes, >> cpu = raw_smp_processor_id(); >> >> if (blk_tracer) { >> + preempt_disable_notrace(); >> tracing_record_cmdline(current); >> + preempt_enable_notrace(); >> >> buffer = blk_tr->array_buffer.buffer; >> trace_ctx = tracing_gen_ctx_flags(0); > Do you know when this started? rcu_read_lock() doesn't disable preemption > in PREEMPT environments, and hasn't for a very long time. I'm surprised it > took this long to detect this? Perhaps this was a bug from day one? This started with latest pull which I did on Wed. Last time same test passed on 2/11/26 since I ran blktests and posted following patch on 2/11/26 12:47 pacific :- [PATCH V4] blktrace: log dropped REQ_OP_ZONE_XXX events ver1 Shinichiro CC'd here also reported same bug. I think Fixes tag would be :- Fixes: 7ffbd48d5cab ("tracing: Cache comms only after an event occurred") Since above commit added __this_cpu_read(trace_cmdline_save) and __this_cpu_write(trace_cmdline_save) :- diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index b90a827a4641..88111b08b2c1 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -77,6 +77,13 @@ static int dummy_set_flag(u32 old_flags, u32 bit, int set) return 0; } +/* + * To prevent the comm cache from being overwritten when no + * tracing is active, only save the comm when a trace event + * occurred. + */ +static DEFINE_PER_CPU(bool, trace_cmdline_save); + /* * Kill all tracing for good (never come back). * It is initialized to 1 but will turn to zero if the initialization @@ -1135,6 +1142,11 @@ void tracing_record_cmdline(struct task_struct *tsk) !tracing_is_on()) return; + if (!__this_cpu_read(trace_cmdline_save)) + return; + + __this_cpu_write(trace_cmdline_save, false); + trace_save_cmdline(tsk); } Full disclosure I've no idea why it has started showing up this month. I've worked on blktrace extensively and tested my code a lot for form 2020-2021 after above commit but never has seen this even with my patches. > Anyway, the tracing_record_cmdline() is to update the COMM cache so that > the trace has way to show the task->comm based on the saved PID in the > trace. It sets a flag to record the COMM from the sched_switch event if a > trace event happened. It's not needed if no trace event occurred. That > means, instead of adding preempt_disable() here, just move it after the > ring buffer event is reserved, as that means preemption is disabled until > the event is committed. > > i.e. > > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c > index e6988929ead2..3735cbc1f99f 100644 > --- a/kernel/trace/blktrace.c > +++ b/kernel/trace/blktrace.c > @@ -383,8 +383,6 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes, > cpu = raw_smp_processor_id(); > > if (blk_tracer) { > - tracing_record_cmdline(current); > - > buffer = blk_tr->array_buffer.buffer; > trace_ctx = tracing_gen_ctx_flags(0); > switch (bt->version) { > @@ -419,6 +417,8 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes, > if (!event) > return; > > + tracing_record_cmdline(current); > + > switch (bt->version) { > case 1: > record_blktrace_event(ring_buffer_event_data(event), > > -- Steve Above does fix the problem and make testcase pass :- blktests (master) # ./check blktrace blktrace/001 (blktrace zone management command tracing) [passed] runtime 3.650s ... 3.647s blktrace/002 (blktrace ftrace corruption with sysfs trace) [passed] runtime 0.411s ... 0.384s blktests (master) # I'll send a V2. -ck ^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-02-27 23:08 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-27 5:03 [PATCH] blktrace: fix __this_cpu_read/write in preemptible context Chaitanya Kulkarni 2026-02-27 16:48 ` Jens Axboe 2026-02-27 19:19 ` Steven Rostedt 2026-02-27 23:08 ` Chaitanya Kulkarni
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox