From: Chaitanya Kulkarni <chaitanyak@nvidia.com>
To: Steven Rostedt <rostedt@goodmis.org>,
"axboe@kernel.dk" <axboe@kernel.dk>
Cc: "mhiramat@kernel.org" <mhiramat@kernel.org>,
"mathieu.desnoyers@efficios.com" <mathieu.desnoyers@efficios.com>,
"shinichiro.kawasaki@wdc.com" <shinichiro.kawasaki@wdc.com>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
Chaitanya Kulkarni <chaitanyak@nvidia.com>,
"linux-trace-kernel@vger.kernel.org"
<linux-trace-kernel@vger.kernel.org>
Subject: Re: [PATCH] blktrace: fix __this_cpu_read/write in preemptible context
Date: Fri, 27 Feb 2026 23:08:41 +0000 [thread overview]
Message-ID: <be02f48a-a4fa-4d4d-814a-61b01e9d62a2@nvidia.com> (raw)
In-Reply-To: <20260227141951.102e19ce@gandalf.local.home>
On 2/27/26 11:19, Steven Rostedt wrote:
> On Thu, 26 Feb 2026 21:03:03 -0800
> Chaitanya Kulkarni <kch@nvidia.com> wrote:
>
>> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
>> index 3b7c102a6eb3..488552036583 100644
>> --- a/kernel/trace/blktrace.c
>> +++ b/kernel/trace/blktrace.c
>> @@ -383,7 +383,9 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
>> cpu = raw_smp_processor_id();
>>
>> if (blk_tracer) {
>> + preempt_disable_notrace();
>> tracing_record_cmdline(current);
>> + preempt_enable_notrace();
>>
>> buffer = blk_tr->array_buffer.buffer;
>> trace_ctx = tracing_gen_ctx_flags(0);
> Do you know when this started? rcu_read_lock() doesn't disable preemption
> in PREEMPT environments, and hasn't for a very long time. I'm surprised it
> took this long to detect this? Perhaps this was a bug from day one?
This started with latest pull which I did on Wed.
Last time same test passed on 2/11/26 since I ran blktests and posted
following patch on 2/11/26 12:47 pacific :-
[PATCH V4] blktrace: log dropped REQ_OP_ZONE_XXX events ver1
Shinichiro CC'd here also reported same bug.
I think Fixes tag would be :-
Fixes: 7ffbd48d5cab ("tracing: Cache comms only after an event occurred")
Since above commit added __this_cpu_read(trace_cmdline_save) and
__this_cpu_write(trace_cmdline_save) :-
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b90a827a4641..88111b08b2c1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -77,6 +77,13 @@ static int dummy_set_flag(u32 old_flags, u32 bit, int set)
return 0;
}
+/*
+ * To prevent the comm cache from being overwritten when no
+ * tracing is active, only save the comm when a trace event
+ * occurred.
+ */
+static DEFINE_PER_CPU(bool, trace_cmdline_save);
+
/*
* Kill all tracing for good (never come back).
* It is initialized to 1 but will turn to zero if the initialization
@@ -1135,6 +1142,11 @@ void tracing_record_cmdline(struct task_struct *tsk)
!tracing_is_on())
return;
+ if (!__this_cpu_read(trace_cmdline_save))
+ return;
+
+ __this_cpu_write(trace_cmdline_save, false);
+
trace_save_cmdline(tsk);
}
Full disclosure I've no idea why it has started showing up this month.
I've worked on blktrace extensively and tested my code a lot for form
2020-2021 after above commit but never has seen this even with my patches.
> Anyway, the tracing_record_cmdline() is to update the COMM cache so that
> the trace has way to show the task->comm based on the saved PID in the
> trace. It sets a flag to record the COMM from the sched_switch event if a
> trace event happened. It's not needed if no trace event occurred. That
> means, instead of adding preempt_disable() here, just move it after the
> ring buffer event is reserved, as that means preemption is disabled until
> the event is committed.
>
> i.e.
>
> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> index e6988929ead2..3735cbc1f99f 100644
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -383,8 +383,6 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
> cpu = raw_smp_processor_id();
>
> if (blk_tracer) {
> - tracing_record_cmdline(current);
> -
> buffer = blk_tr->array_buffer.buffer;
> trace_ctx = tracing_gen_ctx_flags(0);
> switch (bt->version) {
> @@ -419,6 +417,8 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
> if (!event)
> return;
>
> + tracing_record_cmdline(current);
> +
> switch (bt->version) {
> case 1:
> record_blktrace_event(ring_buffer_event_data(event),
>
> -- Steve
Above does fix the problem and make testcase pass :-
blktests (master) # ./check blktrace
blktrace/001 (blktrace zone management command tracing) [passed]
runtime 3.650s ... 3.647s
blktrace/002 (blktrace ftrace corruption with sysfs trace) [passed]
runtime 0.411s ... 0.384s
blktests (master) #
I'll send a V2.
-ck
prev parent reply other threads:[~2026-02-27 23:08 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-27 5:03 [PATCH] blktrace: fix __this_cpu_read/write in preemptible context Chaitanya Kulkarni
2026-02-27 16:48 ` Jens Axboe
2026-02-27 19:19 ` Steven Rostedt
2026-02-27 23:08 ` Chaitanya Kulkarni [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=be02f48a-a4fa-4d4d-814a-61b01e9d62a2@nvidia.com \
--to=chaitanyak@nvidia.com \
--cc=axboe@kernel.dk \
--cc=linux-block@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=rostedt@goodmis.org \
--cc=shinichiro.kawasaki@wdc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox