* [PATCH v2 01/15] blktrace: only calculate trace length once
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 6:12 ` Damien Le Moal
2025-09-25 15:02 ` [PATCH v2 02/15] blktrace: factor out recording a blktrace event Johannes Thumshirn
` (13 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
De-duplicate the calculation of the trace length instead of doing the
calculation twice, once for calling trace_buffer_lock_reserve() and once
for calling relay_reserve().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
kernel/trace/blktrace.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 6941145b5058..bc4b885f2cec 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -76,13 +76,14 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
int cpu = smp_processor_id();
bool blk_tracer = blk_tracer_enabled;
ssize_t cgid_len = cgid ? sizeof(cgid) : 0;
+ size_t trace_len;
+ trace_len = sizeof(*t) + cgid_len + len;
if (blk_tracer) {
buffer = blk_tr->array_buffer.buffer;
trace_ctx = tracing_gen_ctx_flags(0);
event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
- sizeof(*t) + len + cgid_len,
- trace_ctx);
+ trace_len, trace_ctx);
if (!event)
return;
t = ring_buffer_event_data(event);
@@ -92,7 +93,7 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
if (!bt->rchan)
return;
- t = relay_reserve(bt->rchan, sizeof(*t) + len + cgid_len);
+ t = relay_reserve(bt->rchan, trace_len);
if (t) {
t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
t->time = ktime_to_ns(ktime_get());
@@ -228,6 +229,7 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
bool blk_tracer = blk_tracer_enabled;
ssize_t cgid_len = cgid ? sizeof(cgid) : 0;
const enum req_op op = opf & REQ_OP_MASK;
+ size_t trace_len;
if (unlikely(bt->trace_state != Blktrace_running && !blk_tracer))
return;
@@ -250,14 +252,14 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
return;
cpu = raw_smp_processor_id();
+ trace_len = sizeof(*t) + pdu_len + cgid_len;
if (blk_tracer) {
tracing_record_cmdline(current);
buffer = blk_tr->array_buffer.buffer;
trace_ctx = tracing_gen_ctx_flags(0);
event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
- sizeof(*t) + pdu_len + cgid_len,
- trace_ctx);
+ trace_len, trace_ctx);
if (!event)
return;
t = ring_buffer_event_data(event);
@@ -273,7 +275,7 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
* from coming in and stepping on our toes.
*/
local_irq_save(flags);
- t = relay_reserve(bt->rchan, sizeof(*t) + pdu_len + cgid_len);
+ t = relay_reserve(bt->rchan, trace_len);
if (t) {
sequence = per_cpu_ptr(bt->sequence, cpu);
--
2.51.0
* Re: [PATCH v2 01/15] blktrace: only calculate trace length once
2025-09-25 15:02 ` [PATCH v2 01/15] blktrace: only calculate trace length once Johannes Thumshirn
@ 2025-10-01 6:12 ` Damien Le Moal
0 siblings, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 6:12 UTC
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> De-duplicate the calculation of the trace length instead of doing the
> calculation twice, once for calling trace_buffer_lock_reserve() and once
> for calling relay_reserve().
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* [PATCH v2 02/15] blktrace: factor out recording a blktrace event
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
2025-09-25 15:02 ` [PATCH v2 01/15] blktrace: only calculate trace length once Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 6:14 ` Damien Le Moal
2025-09-25 15:02 ` [PATCH v2 03/15] blktrace: split out relaying " Johannes Thumshirn
` (12 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Factor out the recording of a blktrace event into its own function,
deduplicating the code.
This also enables recording different versions of the blktrace protocol
later on.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
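As a rough illustration of the record layout that record_blktrace_event()
fills in, the userspace sketch below writes a header followed by an optional
cgroup id and an optional PDU payload, with pdu_len covering both. The names
demo_trace_hdr, demo_trace_len() and demo_record() are hypothetical stand-ins
and not kernel API; the real struct blk_io_trace has more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct blk_io_trace; only one field is modelled. */
struct demo_trace_hdr {
	uint16_t pdu_len;	/* bytes that follow the header: cgid + pdu */
};

/* Total record size, computed once and reused by whoever reserves space. */
static inline size_t demo_trace_len(size_t cgid_len, size_t pdu_len)
{
	return sizeof(struct demo_trace_hdr) + cgid_len + pdu_len;
}

/*
 * Fill a record laid out as [header][cgid (optional)][pdu (optional)].
 * buf must provide at least demo_trace_len(cgid_len, pdu_len) bytes.
 */
static inline void demo_record(unsigned char *buf, uint64_t cgid,
			       size_t cgid_len, const void *pdu_data,
			       size_t pdu_len)
{
	struct demo_trace_hdr hdr;
	unsigned char *payload = buf + sizeof(hdr);

	/* The cgroup id is either absent or copied in full. */
	assert(cgid_len == 0 || cgid_len == sizeof(cgid));

	hdr.pdu_len = (uint16_t)(pdu_len + cgid_len);
	memcpy(buf, &hdr, sizeof(hdr));

	if (cgid_len)
		memcpy(payload, &cgid, cgid_len);
	if (pdu_len)
		memcpy(payload + cgid_len, pdu_data, pdu_len);
}
```

Because the helper only needs a destination pointer, the same routine can
serve both the ftrace ring buffer path and the relay path, which is the
deduplication this patch is after.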
kernel/trace/blktrace.c | 89 +++++++++++++++++++++++------------------
1 file changed, 49 insertions(+), 40 deletions(-)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index bc4b885f2cec..25a0a1b09747 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -63,6 +63,34 @@ static int blk_probes_ref;
static void blk_register_tracepoints(void);
static void blk_unregister_tracepoints(void);
+static void record_blktrace_event(struct blk_io_trace *t, pid_t pid, int cpu,
+ sector_t sector, int bytes, u32 what,
+ dev_t dev, int error, u64 cgid,
+ ssize_t cgid_len, void *pdu_data, int pdu_len)
+
+{
+ /*
+ * These two are not needed in ftrace as they are in the
+ * generic trace_entry, filled by tracing_generic_entry_update,
+ * but for the trace_event->bin() synthesizer benefit we do it
+ * here too.
+ */
+ t->cpu = cpu;
+ t->pid = pid;
+
+ t->sector = sector;
+ t->bytes = bytes;
+ t->action = what;
+ t->device = dev;
+ t->error = error;
+ t->pdu_len = pdu_len + cgid_len;
+
+ if (cgid_len)
+ memcpy((void *)t + sizeof(*t), &cgid, cgid_len);
+ if (pdu_len)
+ memcpy((void *)t + sizeof(*t) + cgid_len, pdu_data, pdu_len);
+}
+
/*
* Send out a notify message.
*/
@@ -87,7 +115,12 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
if (!event)
return;
t = ring_buffer_event_data(event);
- goto record_it;
+ record_blktrace_event(t, pid, cpu, 0, 0,
+ action | (cgid ? __BLK_TN_CGROUP : 0),
+ bt->dev, 0, cgid, cgid_len, (void *)data,
+ len);
+ trace_buffer_unlock_commit(blk_tr, buffer, event, trace_ctx);
+ return;
}
if (!bt->rchan)
@@ -97,18 +130,11 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
if (t) {
t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
t->time = ktime_to_ns(ktime_get());
-record_it:
- t->device = bt->dev;
- t->action = action | (cgid ? __BLK_TN_CGROUP : 0);
- t->pid = pid;
- t->cpu = cpu;
- t->pdu_len = len + cgid_len;
- if (cgid_len)
- memcpy((void *)t + sizeof(*t), &cgid, cgid_len);
- memcpy((void *) t + sizeof(*t) + cgid_len, data, len);
-
- if (blk_tracer)
- trace_buffer_unlock_commit(blk_tr, buffer, event, trace_ctx);
+
+ record_blktrace_event(t, pid, cpu, 0, 0,
+ action | (cgid ? __BLK_TN_CGROUP : 0),
+ bt->dev, 0, cgid, cgid_len, (void *)data,
+ len);
}
}
@@ -263,7 +289,12 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
if (!event)
return;
t = ring_buffer_event_data(event);
- goto record_it;
+
+ record_blktrace_event(t, pid, cpu, sector, bytes, what, bt->dev,
+ error, cgid, cgid_len, pdu_data, pdu_len);
+
+ trace_buffer_unlock_commit(blk_tr, buffer, event, trace_ctx);
+ return;
}
if (unlikely(tsk->btrace_seq != blktrace_seq))
@@ -282,32 +313,10 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
t->sequence = ++(*sequence);
t->time = ktime_to_ns(ktime_get());
-record_it:
- /*
- * These two are not needed in ftrace as they are in the
- * generic trace_entry, filled by tracing_generic_entry_update,
- * but for the trace_event->bin() synthesizer benefit we do it
- * here too.
- */
- t->cpu = cpu;
- t->pid = pid;
-
- t->sector = sector;
- t->bytes = bytes;
- t->action = what;
- t->device = bt->dev;
- t->error = error;
- t->pdu_len = pdu_len + cgid_len;
-
- if (cgid_len)
- memcpy((void *)t + sizeof(*t), &cgid, cgid_len);
- if (pdu_len)
- memcpy((void *)t + sizeof(*t) + cgid_len, pdu_data, pdu_len);
-
- if (blk_tracer) {
- trace_buffer_unlock_commit(blk_tr, buffer, event, trace_ctx);
- return;
- }
+
+ record_blktrace_event(t, pid, cpu, sector, bytes, what,
+ bt->dev, error, cgid, cgid_len,
+ pdu_data, pdu_len);
}
local_irq_restore(flags);
--
2.51.0
* Re: [PATCH v2 02/15] blktrace: factor out recording a blktrace event
2025-09-25 15:02 ` [PATCH v2 02/15] blktrace: factor out recording a blktrace event Johannes Thumshirn
@ 2025-10-01 6:14 ` Damien Le Moal
0 siblings, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 6:14 UTC
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Factor out the recording of a blktrace event into its own function,
> deduplicating the code.
>
> This also enables recording different versions of the blktrace protocol
> later on.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* [PATCH v2 03/15] blktrace: split out relaying a blktrace event
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
2025-09-25 15:02 ` [PATCH v2 01/15] blktrace: only calculate trace length once Johannes Thumshirn
2025-09-25 15:02 ` [PATCH v2 02/15] blktrace: factor out recording a blktrace event Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 6:19 ` Damien Le Moal
2025-09-25 15:02 ` [PATCH v2 04/15] blktrace: untangle if/else sequence in __blk_add_trace Johannes Thumshirn
` (11 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Split out the code relaying a blktrace event to user-space using relayfs.
This enables adding a second version supporting a new version of the
protocol.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
kernel/trace/blktrace.c | 60 ++++++++++++++++++++++-------------------
1 file changed, 32 insertions(+), 28 deletions(-)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 25a0a1b09747..51745832c713 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -91,6 +91,26 @@ static void record_blktrace_event(struct blk_io_trace *t, pid_t pid, int cpu,
memcpy((void *)t + sizeof(*t) + cgid_len, pdu_data, pdu_len);
}
+static void relay_blktrace_event(struct blk_trace *bt, unsigned long sequence,
+ pid_t pid, int cpu, sector_t sector, int bytes,
+ u32 what, int error, u64 cgid,
+ ssize_t cgid_len, void *pdu_data, int pdu_len)
+{
+ struct blk_io_trace *t;
+ size_t trace_len = sizeof(*t) + pdu_len + cgid_len;
+
+ t = relay_reserve(bt->rchan, trace_len);
+ if (!t)
+ return;
+
+ t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
+ t->sequence = sequence;
+ t->time = ktime_to_ns(ktime_get());
+
+ record_blktrace_event(t, pid, cpu, sector, bytes, what, bt->dev, error,
+ cgid, cgid_len, pdu_data, pdu_len);
+}
+
/*
* Send out a notify message.
*/
@@ -126,16 +146,9 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
if (!bt->rchan)
return;
- t = relay_reserve(bt->rchan, trace_len);
- if (t) {
- t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
- t->time = ktime_to_ns(ktime_get());
-
- record_blktrace_event(t, pid, cpu, 0, 0,
- action | (cgid ? __BLK_TN_CGROUP : 0),
- bt->dev, 0, cgid, cgid_len, (void *)data,
- len);
- }
+ relay_blktrace_event(bt, 0, pid, cpu, 0, 0,
+ action | (cgid ? __BLK_TN_CGROUP : 0), 0, cgid,
+ cgid_len, (void *)data, len);
}
/*
@@ -246,7 +259,6 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
struct task_struct *tsk = current;
struct ring_buffer_event *event = NULL;
struct trace_buffer *buffer = NULL;
- struct blk_io_trace *t;
unsigned long flags = 0;
unsigned long *sequence;
unsigned int trace_ctx = 0;
@@ -278,20 +290,21 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
return;
cpu = raw_smp_processor_id();
- trace_len = sizeof(*t) + pdu_len + cgid_len;
if (blk_tracer) {
tracing_record_cmdline(current);
buffer = blk_tr->array_buffer.buffer;
trace_ctx = tracing_gen_ctx_flags(0);
+ trace_len = sizeof(struct blk_io_trace) + pdu_len + cgid_len;
event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
trace_len, trace_ctx);
if (!event)
return;
- t = ring_buffer_event_data(event);
- record_blktrace_event(t, pid, cpu, sector, bytes, what, bt->dev,
- error, cgid, cgid_len, pdu_data, pdu_len);
+ record_blktrace_event(ring_buffer_event_data(event),
+ pid, cpu, sector, bytes, what, bt->dev,
+ error, cgid, cgid_len, pdu_data,
+ pdu_len);
trace_buffer_unlock_commit(blk_tr, buffer, event, trace_ctx);
return;
@@ -306,19 +319,10 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
* from coming in and stepping on our toes.
*/
local_irq_save(flags);
- t = relay_reserve(bt->rchan, trace_len);
- if (t) {
- sequence = per_cpu_ptr(bt->sequence, cpu);
-
- t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
- t->sequence = ++(*sequence);
- t->time = ktime_to_ns(ktime_get());
-
- record_blktrace_event(t, pid, cpu, sector, bytes, what,
- bt->dev, error, cgid, cgid_len,
- pdu_data, pdu_len);
- }
-
+ sequence = per_cpu_ptr(bt->sequence, cpu);
+ (*sequence)++;
+ relay_blktrace_event(bt, *sequence, pid, cpu, sector, bytes, what,
+ error, cgid, cgid_len, pdu_data, pdu_len);
local_irq_restore(flags);
}
--
2.51.0
* Re: [PATCH v2 03/15] blktrace: split out relaying a blktrace event
2025-09-25 15:02 ` [PATCH v2 03/15] blktrace: split out relaying " Johannes Thumshirn
@ 2025-10-01 6:19 ` Damien Le Moal
0 siblings, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 6:19 UTC
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Split out the code relaying a blktrace event to user-space using relayfs.
>
> This enables adding a second version supporting a new version of the
> protocol.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* [PATCH v2 04/15] blktrace: untangle if/else sequence in __blk_add_trace
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
` (2 preceding siblings ...)
2025-09-25 15:02 ` [PATCH v2 03/15] blktrace: split out relaying " Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 6:19 ` Damien Le Moal
2025-09-25 15:02 ` [PATCH v2 05/15] blktrace: change the internal action to 64bit Johannes Thumshirn
` (10 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Untangle the if/else sequence setting the trace action in
__blk_add_trace() and turn it into a switch statement for better
extensibility.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
kernel/trace/blktrace.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 51745832c713..11e264f67851 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -278,10 +278,19 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
what |= MASK_TC_BIT(opf, META);
what |= MASK_TC_BIT(opf, PREFLUSH);
what |= MASK_TC_BIT(opf, FUA);
- if (op == REQ_OP_DISCARD || op == REQ_OP_SECURE_ERASE)
+
+ switch (op) {
+ case REQ_OP_DISCARD:
+ case REQ_OP_SECURE_ERASE:
what |= BLK_TC_ACT(BLK_TC_DISCARD);
- if (op == REQ_OP_FLUSH)
+ break;
+ case REQ_OP_FLUSH:
what |= BLK_TC_ACT(BLK_TC_FLUSH);
+ break;
+ default:
+ break;
+ }
+
if (cgid)
what |= __BLK_TA_CGROUP;
--
2.51.0
* Re: [PATCH v2 04/15] blktrace: untangle if/else sequence in __blk_add_trace
2025-09-25 15:02 ` [PATCH v2 04/15] blktrace: untangle if/else sequence in __blk_add_trace Johannes Thumshirn
@ 2025-10-01 6:19 ` Damien Le Moal
0 siblings, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 6:19 UTC
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Untangle the if/else sequence setting the trace action in
> __blk_add_trace() and turn it into a switch statement for better
> extensibility.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* [PATCH v2 05/15] blktrace: change the internal action to 64bit
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
` (3 preceding siblings ...)
2025-09-25 15:02 ` [PATCH v2 04/15] blktrace: untangle if/else sequence in __blk_add_trace Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 6:21 ` Damien Le Moal
2025-09-25 15:02 ` [PATCH v2 06/15] blktrace: split do_blk_trace_setup into two functions Johannes Thumshirn
` (9 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Change the internal use of the action in blktrace to 64bit. Although for
now only the lower 32bits will be used.
With the upcoming version 2 of the blktrace user-space protocol the upper
32bit will also be utilized.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
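To make the truncation concrete, here is a small userspace sketch of what
narrowing a 64bit action to the 32 bits of a version 1 record amounts to.
demo_lower_32_bits() and demo_upper_32_bits() are hypothetical stand-ins for
the kernel's lower_32_bits()/upper_32_bits() helpers, and demo_v1_action() is
an illustrative name only; the flag placed at bit 32 is an assumption, not a
value defined by this series.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's lower_32_bits()/upper_32_bits(). */
static inline uint32_t demo_lower_32_bits(uint64_t v)
{
	return (uint32_t)(v & 0xffffffffu);
}

static inline uint32_t demo_upper_32_bits(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* A version 1 record only has room for 32 bits of action. */
static inline uint32_t demo_v1_action(uint64_t what)
{
	return demo_lower_32_bits(what);
}

/*
 * Usage: a (made-up) flag above bit 31 is carried internally in the
 * 64bit value but does not reach a version 1 record.
 */
static inline void demo_usage(void)
{
	uint64_t what = (1ULL << 32) | 0x10;

	assert(demo_v1_action(what) == 0x10);
	assert(demo_upper_32_bits(what) == 1);
}
```

As long as no caller sets any of the upper bits, the narrowing is lossless,
which matches the statement that only the lower 32bits are used for now.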
kernel/trace/blktrace.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 11e264f67851..51c001e4981c 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -127,6 +127,7 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
size_t trace_len;
trace_len = sizeof(*t) + cgid_len + len;
+ action = lower_32_bits(action | (cgid ? __BLK_TN_CGROUP : 0));
if (blk_tracer) {
buffer = blk_tr->array_buffer.buffer;
trace_ctx = tracing_gen_ctx_flags(0);
@@ -136,9 +137,8 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
return;
t = ring_buffer_event_data(event);
record_blktrace_event(t, pid, cpu, 0, 0,
- action | (cgid ? __BLK_TN_CGROUP : 0),
- bt->dev, 0, cgid, cgid_len, (void *)data,
- len);
+ action, bt->dev, 0, cgid, cgid_len,
+ (void *)data, len);
trace_buffer_unlock_commit(blk_tr, buffer, event, trace_ctx);
return;
}
@@ -146,8 +146,7 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
if (!bt->rchan)
return;
- relay_blktrace_event(bt, 0, pid, cpu, 0, 0,
- action | (cgid ? __BLK_TN_CGROUP : 0), 0, cgid,
+ relay_blktrace_event(bt, 0, pid, cpu, 0, 0, action, 0, cgid,
cgid_len, (void *)data, len);
}
@@ -222,7 +221,7 @@ void __blk_trace_note_message(struct blk_trace *bt,
}
EXPORT_SYMBOL_GPL(__blk_trace_note_message);
-static int act_log_check(struct blk_trace *bt, u32 what, sector_t sector,
+static int act_log_check(struct blk_trace *bt, u64 what, sector_t sector,
pid_t pid)
{
if (((bt->act_mask << BLK_TC_SHIFT) & what) == 0)
@@ -253,7 +252,7 @@ static const u32 ddir_act[2] = { BLK_TC_ACT(BLK_TC_READ),
* blk_io_trace structure and places it in a per-cpu subbuffer.
*/
static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
- const blk_opf_t opf, u32 what, int error,
+ const blk_opf_t opf, u64 what, int error,
int pdu_len, void *pdu_data, u64 cgid)
{
struct task_struct *tsk = current;
@@ -311,9 +310,9 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
return;
record_blktrace_event(ring_buffer_event_data(event),
- pid, cpu, sector, bytes, what, bt->dev,
- error, cgid, cgid_len, pdu_data,
- pdu_len);
+ pid, cpu, sector, bytes,
+ lower_32_bits(what), bt->dev, error,
+ cgid, cgid_len, pdu_data, pdu_len);
trace_buffer_unlock_commit(blk_tr, buffer, event, trace_ctx);
return;
@@ -330,8 +329,9 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
local_irq_save(flags);
sequence = per_cpu_ptr(bt->sequence, cpu);
(*sequence)++;
- relay_blktrace_event(bt, *sequence, pid, cpu, sector, bytes, what,
- error, cgid, cgid_len, pdu_data, pdu_len);
+ relay_blktrace_event(bt, *sequence, pid, cpu, sector, bytes,
+ lower_32_bits(what), error, cgid, cgid_len,
+ pdu_data, pdu_len);
local_irq_restore(flags);
}
@@ -818,7 +818,7 @@ blk_trace_request_get_cgid(struct request *rq)
*
**/
static void blk_add_trace_rq(struct request *rq, blk_status_t error,
- unsigned int nr_bytes, u32 what, u64 cgid)
+ unsigned int nr_bytes, u64 what, u64 cgid)
{
struct blk_trace *bt;
@@ -882,7 +882,7 @@ static void blk_add_trace_rq_complete(void *ignore, struct request *rq,
*
**/
static void blk_add_trace_bio(struct request_queue *q, struct bio *bio,
- u32 what, int error)
+ u64 what, int error)
{
struct blk_trace *bt;
@@ -948,7 +948,7 @@ static void blk_add_trace_unplug(void *ignore, struct request_queue *q,
bt = rcu_dereference(q->blk_trace);
if (bt) {
__be64 rpdu = cpu_to_be64(depth);
- u32 what;
+ u64 what;
if (explicit)
what = BLK_TA_UNPLUG_IO;
--
2.51.0
* Re: [PATCH v2 05/15] blktrace: change the internal action to 64bit
2025-09-25 15:02 ` [PATCH v2 05/15] blktrace: change the internal action to 64bit Johannes Thumshirn
@ 2025-10-01 6:21 ` Damien Le Moal
0 siblings, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 6:21 UTC
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Change the internal use of the action in blktrace to 64bit. Although for
> now only the lower 32bits will be used.
>
> With the upcoming version 2 of the blktrace user-space protocol the upper
> 32bit will also be utilized.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* [PATCH v2 06/15] blktrace: split do_blk_trace_setup into two functions
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
` (4 preceding siblings ...)
2025-09-25 15:02 ` [PATCH v2 05/15] blktrace: change the internal action to 64bit Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 6:25 ` Damien Le Moal
2025-10-03 7:27 ` Christoph Hellwig
2025-09-25 15:02 ` [PATCH v2 07/15] blktrace: add definitions for blk_user_trace_setup2 Johannes Thumshirn
` (8 subsequent siblings)
14 siblings, 2 replies; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Split do_blk_trace_setup into two functions, this is done to prepare for
an incoming new BLKTRACESETUP2 ioctl(2) which can receive extended
parameters form user-space.
Also move the size verification logic to the callers.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
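The shape of the split can be sketched in userspace as follows: a prepare
step that allocates and reports failure through an error-encoded pointer, a
finalize step that applies the user settings and cannot fail, and a caller
that validates the buffer sizes up front. All demo_* names are hypothetical;
demo_err_ptr(), demo_ptr_err() and demo_is_err() only mimic the kernel's
ERR_PTR(), PTR_ERR() and IS_ERR(), and the "busy" flag stands in for the
check on q->blk_trace.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_trace { unsigned int act_mask; };
struct demo_setup { unsigned int buf_size, buf_nr, act_mask; };

/* Step 1: allocation; needs the buffer geometry, not the setup struct. */
static inline struct demo_trace *demo_prepare(unsigned int buf_size,
					      unsigned int buf_nr, int busy)
{
	struct demo_trace *bt;

	(void)buf_size;		/* a real version would size the buffers */
	(void)buf_nr;

	if (busy)
		return demo_err_ptr(-EBUSY);

	bt = calloc(1, sizeof(*bt));
	if (!bt)
		return demo_err_ptr(-ENOMEM);
	return bt;
}

/* Step 2: apply user settings; nothing in here can fail. */
static inline void demo_finalize(struct demo_trace *bt,
				 const struct demo_setup *s)
{
	assert(bt && !demo_is_err(bt));
	bt->act_mask = s->act_mask ? s->act_mask : 0xffff;
}

/* Caller: validates the sizes first, then runs both steps. */
static inline int demo_trace_setup(const struct demo_setup *s, int busy,
				   struct demo_trace **out)
{
	struct demo_trace *bt;

	if (!s->buf_size || !s->buf_nr)
		return -EINVAL;

	bt = demo_prepare(s->buf_size, s->buf_nr, busy);
	if (demo_is_err(bt))
		return (int)demo_ptr_err(bt);

	demo_finalize(bt, s);
	*out = bt;
	return 0;
}
```

Keeping the setup structure out of the prepare step is what would let a
second caller with a different, extended structure reuse it unchanged.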
kernel/trace/blktrace.c | 95 ++++++++++++++++++++++++-----------------
1 file changed, 57 insertions(+), 38 deletions(-)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 51c001e4981c..f6a41e9510f6 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -518,9 +518,10 @@ static void blk_trace_setup_lba(struct blk_trace *bt,
/*
* Setup everything required to start tracing
*/
-static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
- struct block_device *bdev,
- struct blk_user_trace_setup *buts)
+static struct blk_trace *blk_trace_setup_prepare(struct request_queue *q,
+ char *name, dev_t dev,
+ u32 buf_size, u32 buf_nr,
+ struct block_device *bdev)
{
struct blk_trace *bt = NULL;
struct dentry *dir = NULL;
@@ -528,31 +529,19 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
lockdep_assert_held(&q->debugfs_mutex);
- if (!buts->buf_size || !buts->buf_nr)
- return -EINVAL;
-
- strscpy_pad(buts->name, name, BLKTRACE_BDEV_SIZE);
-
- /*
- * some device names have larger paths - convert the slashes
- * to underscores for this to work as expected
- */
- strreplace(buts->name, '/', '_');
-
/*
* bdev can be NULL, as with scsi-generic, this is a helpful as
* we can be.
*/
if (rcu_dereference_protected(q->blk_trace,
lockdep_is_held(&q->debugfs_mutex))) {
- pr_warn("Concurrent blktraces are not allowed on %s\n",
- buts->name);
- return -EBUSY;
+ pr_warn("Concurrent blktraces are not allowed on %s\n", name);
+ return ERR_PTR(-EBUSY);
}
bt = kzalloc(sizeof(*bt), GFP_KERNEL);
if (!bt)
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);
ret = -ENOMEM;
bt->sequence = alloc_percpu(unsigned long);
@@ -572,7 +561,7 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
if (bdev && !bdev_is_partition(bdev))
dir = q->debugfs_dir;
else
- bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root);
+ bt->dir = dir = debugfs_create_dir(name, blk_debugfs_root);
/*
* As blktrace relies on debugfs for its interface the debugfs directory
@@ -580,8 +569,7 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
* files or directories.
*/
if (IS_ERR_OR_NULL(dir)) {
- pr_warn("debugfs_dir not present for %s so skipping\n",
- buts->name);
+ pr_warn("debugfs_dir not present for %s so skipping\n", name);
ret = -ENOENT;
goto err;
}
@@ -593,17 +581,39 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
debugfs_create_file("dropped", 0444, dir, bt, &blk_dropped_fops);
debugfs_create_file("msg", 0222, dir, bt, &blk_msg_fops);
- bt->rchan = relay_open("trace", dir, buts->buf_size,
- buts->buf_nr, &blk_relay_callbacks, bt);
+ bt->rchan = relay_open("trace", dir, buf_size, buf_nr,
+ &blk_relay_callbacks, bt);
if (!bt->rchan)
goto err;
+ blk_trace_setup_lba(bt, bdev);
+
+ return bt;
+
+err:
+ if (ret)
+ blk_trace_free(q, bt);
+
+ return ERR_PTR(ret);
+}
+
+static void blk_trace_setup_finalize(struct request_queue *q,
+ char *name, struct blk_trace *bt,
+ struct blk_user_trace_setup *buts)
+
+{
+ strscpy_pad(buts->name, name, BLKTRACE_BDEV_SIZE);
+
+ /*
+ * some device names have larger paths - convert the slashes
+ * to underscores for this to work as expected
+ */
+ strreplace(buts->name, '/', '_');
+
bt->act_mask = buts->act_mask;
if (!bt->act_mask)
bt->act_mask = (u16) -1;
- blk_trace_setup_lba(bt, bdev);
-
/* overwrite with user settings */
if (buts->start_lba)
bt->start_lba = buts->start_lba;
@@ -615,12 +625,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
rcu_assign_pointer(q->blk_trace, bt);
get_probe_ref();
-
- ret = 0;
-err:
- if (ret)
- blk_trace_free(q, bt);
- return ret;
}
int blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
@@ -628,17 +632,25 @@ int blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
char __user *arg)
{
struct blk_user_trace_setup buts;
+ struct blk_trace *bt;
int ret;
ret = copy_from_user(&buts, arg, sizeof(buts));
if (ret)
return -EFAULT;
+ if (!buts.buf_size || !buts.buf_nr)
+ return -EINVAL;
+
mutex_lock(&q->debugfs_mutex);
- ret = do_blk_trace_setup(q, name, dev, bdev, &buts);
+ bt = blk_trace_setup_prepare(q, name, dev, buts.buf_size, buts.buf_nr,
+ bdev);
+ if (IS_ERR(bt)) {
+ mutex_unlock(&q->debugfs_mutex);
+ return PTR_ERR(bt);
+ }
+ blk_trace_setup_finalize(q, name, bt, &buts);
mutex_unlock(&q->debugfs_mutex);
- if (ret)
- return ret;
if (copy_to_user(arg, &buts, sizeof(buts))) {
blk_trace_remove(q);
@@ -655,11 +667,14 @@ static int compat_blk_trace_setup(struct request_queue *q, char *name,
{
struct blk_user_trace_setup buts;
struct compat_blk_user_trace_setup cbuts;
- int ret;
+ struct blk_trace *bt;
if (copy_from_user(&cbuts, arg, sizeof(cbuts)))
return -EFAULT;
+ if (!cbuts.buf_size || !cbuts.buf_nr)
+ return -EINVAL;
+
buts = (struct blk_user_trace_setup) {
.act_mask = cbuts.act_mask,
.buf_size = cbuts.buf_size,
@@ -670,10 +685,14 @@ static int compat_blk_trace_setup(struct request_queue *q, char *name,
};
mutex_lock(&q->debugfs_mutex);
- ret = do_blk_trace_setup(q, name, dev, bdev, &buts);
+ bt = blk_trace_setup_prepare(q, name, dev, buts.buf_size, buts.buf_nr,
+ bdev);
+ if (IS_ERR(bt)) {
+ mutex_unlock(&q->debugfs_mutex);
+ return PTR_ERR(bt);
+ }
+ blk_trace_setup_finalize(q, name, bt, &buts);
mutex_unlock(&q->debugfs_mutex);
- if (ret)
- return ret;
if (copy_to_user(arg, &buts.name, ARRAY_SIZE(buts.name))) {
blk_trace_remove(q);
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 06/15] blktrace: split do_blk_trace_setup into two functions
2025-09-25 15:02 ` [PATCH v2 06/15] blktrace: split do_blk_trace_setup into two functions Johannes Thumshirn
@ 2025-10-01 6:25 ` Damien Le Moal
2025-10-03 7:27 ` Christoph Hellwig
1 sibling, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 6:25 UTC (permalink / raw)
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Split do_blk_trace_setup into two functions, this is done to prepare for
> an incoming new BLKTRACESETUP2 ioctl(2) which can receive extended
> parameters form user-space.
s/form/from
>
> Also move the size verification logic to the callers.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> @@ -593,17 +581,39 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
> debugfs_create_file("dropped", 0444, dir, bt, &blk_dropped_fops);
> debugfs_create_file("msg", 0222, dir, bt, &blk_msg_fops);
>
> - bt->rchan = relay_open("trace", dir, buts->buf_size,
> - buts->buf_nr, &blk_relay_callbacks, bt);
> + bt->rchan = relay_open("trace", dir, buf_size, buf_nr,
> + &blk_relay_callbacks, bt);
> if (!bt->rchan)
> goto err;
>
> + blk_trace_setup_lba(bt, bdev);
> +
> + return bt;
> +
> +err:
> + if (ret)
> + blk_trace_free(q, bt);
I do not think that the "if (ret)" is needed here.
> +
> + return ERR_PTR(ret);
> +}
With that fixed,
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
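
A minimal user-space sketch of the error path discussed above, assuming
that every jump to the err label happens with a negative error code already
in ret, so the cleanup can run unconditionally. ERR_PTR(), PTR_ERR() and
IS_ERR() are re-implemented here as stand-ins for the kernel helpers;
prepare_sketch(), struct trace_sketch and the -ENOMEM code are hypothetical
and only illustrate the pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* User-space stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily reduced stand-in for struct blk_trace. */
struct trace_sketch {
	void *rchan;
};

/*
 * Hypothetical prepare step. fail_relay simulates relay_open() returning
 * NULL. ret is set before the only goto, so the err label does not need
 * an "if (ret)" guard.
 */
static struct trace_sketch *prepare_sketch(int fail_relay)
{
	struct trace_sketch *bt;
	int ret;

	bt = calloc(1, sizeof(*bt));
	if (!bt)
		return ERR_PTR(-ENOMEM);

	ret = -ENOMEM;	/* illustrative error code */
	bt->rchan = fail_relay ? NULL : bt;	/* placeholder channel */
	if (!bt->rchan)
		goto err;

	return bt;

err:
	free(bt);	/* unconditional: ret is always negative here */
	return ERR_PTR(ret);
}
```

A caller would test the result with IS_ERR() and turn it back into an
error code with PTR_ERR(), as the callers in the patch do.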
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 06/15] blktrace: split do_blk_trace_setup into two functions
2025-09-25 15:02 ` [PATCH v2 06/15] blktrace: split do_blk_trace_setup into two functions Johannes Thumshirn
2025-10-01 6:25 ` Damien Le Moal
@ 2025-10-03 7:27 ` Christoph Hellwig
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2025-10-03 7:27 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: Jens Axboe, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
linux-block, linux-kernel, linux-trace-kernel, linux-btrace,
John Garry, Hannes Reinecke, Damien Le Moal, Christoph Hellwig,
Naohiro Aota, Shinichiro Kawasaki, Chaitanya Kulkarni,
Martin K . Petersen
On Thu, Sep 25, 2025 at 05:02:22PM +0200, Johannes Thumshirn wrote:
> Split do_blk_trace_setup into two functions, this is done to prepare for
> an incoming new BLKTRACESETUP2 ioctl(2) which can receive extended
> parameters form user-space.
>
> Also move the size verification logic to the callers.
Can you add a why for that move?
Otherwise this looks sane.
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 07/15] blktrace: add definitions for blk_user_trace_setup2
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
` (5 preceding siblings ...)
2025-09-25 15:02 ` [PATCH v2 06/15] blktrace: split do_blk_trace_setup into two functions Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 6:27 ` Damien Le Moal
2025-10-03 7:29 ` Christoph Hellwig
2025-09-25 15:02 ` [PATCH v2 08/15] blktrace: pass blk_user_trace2 to setup functions Johannes Thumshirn
` (7 subsequent siblings)
14 siblings, 2 replies; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Add definitions for a version 2 of the blk_user_trace_setup ioctl. This
new will enable a different struct layout of the binary data passed to
user-space when using a new version of the blktrace utility requesting the
new struct layout.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
include/uapi/linux/blktrace_api.h | 15 +++++++++++++++
include/uapi/linux/fs.h | 1 +
2 files changed, 16 insertions(+)
diff --git a/include/uapi/linux/blktrace_api.h b/include/uapi/linux/blktrace_api.h
index 1bfb635e309b..a3b1f35ac026 100644
--- a/include/uapi/linux/blktrace_api.h
+++ b/include/uapi/linux/blktrace_api.h
@@ -143,4 +143,19 @@ struct blk_user_trace_setup {
__u32 pid;
};
+/*
+ * User setup structure passed with BLKTRACESETUP2
+ */
+struct blk_user_trace_setup2 {
+ char name[32]; /* output */
+ __u64 act_mask; /* input */
+ __u32 buf_size; /* input */
+ __u32 buf_nr; /* input */
+ __u64 start_lba;
+ __u64 end_lba;
+ __u32 pid;
+ __u32 flags; /* currently unused */
+ __u64 reserved[7];
+};
+
#endif /* _UAPIBLKTRACE_H */
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 0bd678a4a10e..a85d0b52a3f6 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -300,6 +300,7 @@ struct file_attr {
#define BLKGETDISKSEQ _IOR(0x12,128,__u64)
/* 130-136 are used by zoned block device ioctls (uapi/linux/blkzoned.h) */
/* 137-141 are used by blk-crypto ioctls (uapi/linux/blk-crypto.h) */
+#define BLKTRACESETUP2 _IOWR(0x12, 142, struct blk_user_trace_setup2)
#define BMAP_IOCTL 1 /* obsolete - kept for compatibility */
#define FIBMAP _IO(0x00,1) /* bmap access */
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 07/15] blktrace: add definitions for blk_user_trace_setup2
2025-09-25 15:02 ` [PATCH v2 07/15] blktrace: add definitions for blk_user_trace_setup2 Johannes Thumshirn
@ 2025-10-01 6:27 ` Damien Le Moal
2025-10-03 7:29 ` Christoph Hellwig
1 sibling, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 6:27 UTC (permalink / raw)
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Add definitions for a version 2 of the blk_user_trace_setup ioctl. This
> new will enable a different struct layout of the binary data passed to
/s/This new/This new ioctl ?
> user-space when using a new version of the blktrace utility requesting the
> new struct layout.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 07/15] blktrace: add definitions for blk_user_trace_setup2
2025-09-25 15:02 ` [PATCH v2 07/15] blktrace: add definitions for blk_user_trace_setup2 Johannes Thumshirn
2025-10-01 6:27 ` Damien Le Moal
@ 2025-10-03 7:29 ` Christoph Hellwig
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2025-10-03 7:29 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: Jens Axboe, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
linux-block, linux-kernel, linux-trace-kernel, linux-btrace,
John Garry, Hannes Reinecke, Damien Le Moal, Christoph Hellwig,
Naohiro Aota, Shinichiro Kawasaki, Chaitanya Kulkarni,
Martin K . Petersen
On Thu, Sep 25, 2025 at 05:02:23PM +0200, Johannes Thumshirn wrote:
> +struct blk_user_trace_setup2 {
> + char name[32]; /* output */
Is 32 still a good size limit? Or would this be a time to
allow for more?
Otherwise this looks fine.
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 08/15] blktrace: pass blk_user_trace2 to setup functions
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
` (6 preceding siblings ...)
2025-09-25 15:02 ` [PATCH v2 07/15] blktrace: add definitions for blk_user_trace_setup2 Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 6:34 ` Damien Le Moal
2025-09-25 15:02 ` [PATCH v2 09/15] blktrace: add definitions for struct blk_io_trace2 Johannes Thumshirn
` (6 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Pass struct blk_user_trace_setup2 to blk_trace_setup_finalize(). This
prepares for the incoming extension of the blktrace protocol with a 64bit
act_mask.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
include/linux/blktrace_api.h | 3 ++-
kernel/trace/blktrace.c | 27 ++++++++++++++++++++-------
2 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h
index 122c62e561fc..05c8754456aa 100644
--- a/include/linux/blktrace_api.h
+++ b/include/linux/blktrace_api.h
@@ -14,11 +14,12 @@
#include <linux/sysfs.h>
struct blk_trace {
+ int version;
int trace_state;
struct rchan *rchan;
unsigned long __percpu *sequence;
unsigned char __percpu *msg_data;
- u16 act_mask;
+ u64 act_mask;
u64 start_lba;
u64 end_lba;
u32 pid;
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index f6a41e9510f6..9cd8eb9e7b4b 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -599,7 +599,7 @@ static struct blk_trace *blk_trace_setup_prepare(struct request_queue *q,
static void blk_trace_setup_finalize(struct request_queue *q,
char *name, struct blk_trace *bt,
- struct blk_user_trace_setup *buts)
+ struct blk_user_trace_setup2 *buts)
{
strscpy_pad(buts->name, name, BLKTRACE_BDEV_SIZE);
@@ -631,6 +631,7 @@ int blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
struct block_device *bdev,
char __user *arg)
{
+ struct blk_user_trace_setup2 buts2;
struct blk_user_trace_setup buts;
struct blk_trace *bt;
int ret;
@@ -642,6 +643,15 @@ int blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
if (!buts.buf_size || !buts.buf_nr)
return -EINVAL;
+ buts2 = (struct blk_user_trace_setup2) {
+ .act_mask = buts.act_mask,
+ .buf_size = buts.buf_size,
+ .buf_nr = buts.buf_nr,
+ .start_lba = buts.start_lba,
+ .end_lba = buts.end_lba,
+ .pid = buts.pid,
+ };
+
mutex_lock(&q->debugfs_mutex);
bt = blk_trace_setup_prepare(q, name, dev, buts.buf_size, buts.buf_nr,
bdev);
@@ -649,7 +659,9 @@ int blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
mutex_unlock(&q->debugfs_mutex);
return PTR_ERR(bt);
}
- blk_trace_setup_finalize(q, name, bt, &buts);
+ bt->version = 1;
+ blk_trace_setup_finalize(q, name, bt, &buts2);
+ strcpy(buts.name, buts2.name);
mutex_unlock(&q->debugfs_mutex);
if (copy_to_user(arg, &buts, sizeof(buts))) {
@@ -665,7 +677,7 @@ static int compat_blk_trace_setup(struct request_queue *q, char *name,
dev_t dev, struct block_device *bdev,
char __user *arg)
{
- struct blk_user_trace_setup buts;
+ struct blk_user_trace_setup2 buts2;
struct compat_blk_user_trace_setup cbuts;
struct blk_trace *bt;
@@ -675,7 +687,7 @@ static int compat_blk_trace_setup(struct request_queue *q, char *name,
if (!cbuts.buf_size || !cbuts.buf_nr)
return -EINVAL;
- buts = (struct blk_user_trace_setup) {
+ buts2 = (struct blk_user_trace_setup2) {
.act_mask = cbuts.act_mask,
.buf_size = cbuts.buf_size,
.buf_nr = cbuts.buf_nr,
@@ -685,16 +697,17 @@ static int compat_blk_trace_setup(struct request_queue *q, char *name,
};
mutex_lock(&q->debugfs_mutex);
- bt = blk_trace_setup_prepare(q, name, dev, buts.buf_size, buts.buf_nr,
+ bt = blk_trace_setup_prepare(q, name, dev, buts2.buf_size, buts2.buf_nr,
bdev);
if (IS_ERR(bt)) {
mutex_unlock(&q->debugfs_mutex);
return PTR_ERR(bt);
}
- blk_trace_setup_finalize(q, name, bt, &buts);
+ bt->version = 1;
+ blk_trace_setup_finalize(q, name, bt, &buts2);
mutex_unlock(&q->debugfs_mutex);
- if (copy_to_user(arg, &buts.name, ARRAY_SIZE(buts.name))) {
+ if (copy_to_user(arg, &buts2.name, ARRAY_SIZE(buts2.name))) {
blk_trace_remove(q);
return -EFAULT;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 08/15] blktrace: pass blk_user_trace2 to setup functions
2025-09-25 15:02 ` [PATCH v2 08/15] blktrace: pass blk_user_trace2 to setup functions Johannes Thumshirn
@ 2025-10-01 6:34 ` Damien Le Moal
0 siblings, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 6:34 UTC (permalink / raw)
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Pass struct blk_user_trace_setup2 to blk_trace_setup_finalize(). This
> prepares for the incoming extension of the blktrace protocol with a 64bit
> act_mask.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
One nit below.
> @@ -649,7 +659,9 @@ int blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
> mutex_unlock(&q->debugfs_mutex);
> return PTR_ERR(bt);
> }
> - blk_trace_setup_finalize(q, name, bt, &buts);
> + bt->version = 1;
> + blk_trace_setup_finalize(q, name, bt, &buts2);
I wonder if it may not be cleaner to pass the version number to
blk_trace_setup_finalize() and have that function do "bt->version = version;" ?
> + strcpy(buts.name, buts2.name);
> mutex_unlock(&q->debugfs_mutex);
>
> if (copy_to_user(arg, &buts, sizeof(buts))) {
> @@ -665,7 +677,7 @@ static int compat_blk_trace_setup(struct request_queue *q, char *name,
> dev_t dev, struct block_device *bdev,
> char __user *arg)
> {
> - struct blk_user_trace_setup buts;
> + struct blk_user_trace_setup2 buts2;
> struct compat_blk_user_trace_setup cbuts;
> struct blk_trace *bt;
>
> @@ -675,7 +687,7 @@ static int compat_blk_trace_setup(struct request_queue *q, char *name,
> if (!cbuts.buf_size || !cbuts.buf_nr)
> return -EINVAL;
>
> - buts = (struct blk_user_trace_setup) {
> + buts2 = (struct blk_user_trace_setup2) {
> .act_mask = cbuts.act_mask,
> .buf_size = cbuts.buf_size,
> .buf_nr = cbuts.buf_nr,
> @@ -685,16 +697,17 @@ static int compat_blk_trace_setup(struct request_queue *q, char *name,
> };
>
> mutex_lock(&q->debugfs_mutex);
> - bt = blk_trace_setup_prepare(q, name, dev, buts.buf_size, buts.buf_nr,
> + bt = blk_trace_setup_prepare(q, name, dev, buts2.buf_size, buts2.buf_nr,
> bdev);
> if (IS_ERR(bt)) {
> mutex_unlock(&q->debugfs_mutex);
> return PTR_ERR(bt);
> }
> - blk_trace_setup_finalize(q, name, bt, &buts);
> + bt->version = 1;
> + blk_trace_setup_finalize(q, name, bt, &buts2);
> mutex_unlock(&q->debugfs_mutex);
>
> - if (copy_to_user(arg, &buts.name, ARRAY_SIZE(buts.name))) {
> + if (copy_to_user(arg, &buts2.name, ARRAY_SIZE(buts2.name))) {
> blk_trace_remove(q);
> return -EFAULT;
> }
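
A user-space sketch of the nit above, assuming the finalize step takes the
version as an extra argument and stores it itself. The struct and function
names (setup_v1, setup_v2, trace_sketch, widen_setup, finalize_sketch,
copy_name_back) are hypothetical; only the field lists follow the setup
structures shown in this series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NAME_SIZE 32	/* same size as the name[] arrays in the series */

/* Mirrors the fields of struct blk_user_trace_setup. */
struct setup_v1 {
	char name[NAME_SIZE];
	uint16_t act_mask;
	uint32_t buf_size;
	uint32_t buf_nr;
	uint64_t start_lba;
	uint64_t end_lba;
	uint32_t pid;
};

/* Mirrors the fields of struct blk_user_trace_setup2. */
struct setup_v2 {
	char name[NAME_SIZE];
	uint64_t act_mask;
	uint32_t buf_size;
	uint32_t buf_nr;
	uint64_t start_lba;
	uint64_t end_lba;
	uint32_t pid;
	uint32_t flags;
	uint64_t reserved[7];
};

/* Hypothetical, reduced stand-in for struct blk_trace. */
struct trace_sketch {
	int version;
	uint64_t act_mask;
};

/* Widen a v1 request; fields not named here are zero-initialized. */
static struct setup_v2 widen_setup(const struct setup_v1 *v1)
{
	return (struct setup_v2) {
		.act_mask = v1->act_mask,
		.buf_size = v1->buf_size,
		.buf_nr = v1->buf_nr,
		.start_lba = v1->start_lba,
		.end_lba = v1->end_lba,
		.pid = v1->pid,
	};
}

/* The version is recorded here instead of in every caller. */
static void finalize_sketch(struct trace_sketch *bt, const char *name,
			    struct setup_v2 *buts, int version)
{
	assert(version == 1 || version == 2);
	snprintf(buts->name, sizeof(buts->name), "%s", name);
	bt->version = version;
	bt->act_mask = buts->act_mask;
}

/* Copy the output name back into the v1 structure for copy_to_user(). */
static void copy_name_back(struct setup_v1 *v1, const struct setup_v2 *v2)
{
	memcpy(v1->name, v2->name, sizeof(v1->name));
}
```

With this shape, a v1 caller would run widen_setup(), then
finalize_sketch(..., 1), then copy_name_back(), and no caller would need
to touch the version field directly.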
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 09/15] blktrace: add definitions for struct blk_io_trace2
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
` (7 preceding siblings ...)
2025-09-25 15:02 ` [PATCH v2 08/15] blktrace: pass blk_user_trace2 to setup functions Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 6:37 ` Damien Le Moal
2025-10-03 7:31 ` Christoph Hellwig
2025-09-25 15:02 ` [PATCH v2 10/15] blktrace: differentiate between blk_io_trace versions Johannes Thumshirn
` (5 subsequent siblings)
14 siblings, 2 replies; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Add definitions for the extended version of the blktrace protocol using a
wider action type to be able to record new actions in the kernel.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
include/uapi/linux/blktrace_api.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/uapi/linux/blktrace_api.h b/include/uapi/linux/blktrace_api.h
index a3b1f35ac026..d58ef484de49 100644
--- a/include/uapi/linux/blktrace_api.h
+++ b/include/uapi/linux/blktrace_api.h
@@ -94,6 +94,7 @@ enum blktrace_notify {
#define BLK_IO_TRACE_MAGIC 0x65617400
#define BLK_IO_TRACE_VERSION 0x07
+#define BLK_IO_TRACE2_VERSION 0x08
/*
* The trace itself
@@ -113,6 +114,21 @@ struct blk_io_trace {
/* cgroup id will be stored here if exists */
};
+struct blk_io_trace2 {
+ __u32 magic; /* MAGIC << 8 | BLK_IO_TRACE2_VERSION */
+ __u32 sequence; /* event number */
+ __u64 time; /* in nanoseconds */
+ __u64 sector; /* disk offset */
+ __u32 bytes; /* transfer length */
+ __u32 pid; /* who did it */
+ __u64 action; /* what happened */
+ __u32 device; /* device number */
+ __u32 cpu; /* on what cpu did it happen */
+ __u16 error; /* completion error */
+ __u16 pdu_len; /* length of data after this trace */
+ __u8 pad[8];
+ /* cgroup id will be stored here if exists */
+};
/*
* The remap event
*/
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 09/15] blktrace: add definitions for struct blk_io_trace2
2025-09-25 15:02 ` [PATCH v2 09/15] blktrace: add definitions for struct blk_io_trace2 Johannes Thumshirn
@ 2025-10-01 6:37 ` Damien Le Moal
2025-10-03 7:31 ` Christoph Hellwig
1 sibling, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 6:37 UTC (permalink / raw)
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Add definitions for the extended version of the blktrace protocol using a
> wider action type to be able to record new actions in the kernel.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> include/uapi/linux/blktrace_api.h | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/include/uapi/linux/blktrace_api.h b/include/uapi/linux/blktrace_api.h
> index a3b1f35ac026..d58ef484de49 100644
> --- a/include/uapi/linux/blktrace_api.h
> +++ b/include/uapi/linux/blktrace_api.h
> @@ -94,6 +94,7 @@ enum blktrace_notify {
>
> #define BLK_IO_TRACE_MAGIC 0x65617400
> #define BLK_IO_TRACE_VERSION 0x07
> +#define BLK_IO_TRACE2_VERSION 0x08
>
> /*
> * The trace itself
> @@ -113,6 +114,21 @@ struct blk_io_trace {
> /* cgroup id will be stored here if exists */
> };
>
> +struct blk_io_trace2 {
> + __u32 magic; /* MAGIC << 8 | BLK_IO_TRACE2_VERSION */
> + __u32 sequence; /* event number */
> + __u64 time; /* in nanoseconds */
> + __u64 sector; /* disk offset */
> + __u32 bytes; /* transfer length */
> + __u32 pid; /* who did it */
> + __u64 action; /* what happened */
> + __u32 device; /* device number */
> + __u32 cpu; /* on what cpu did it happen */
> + __u16 error; /* completion error */
> + __u16 pdu_len; /* length of data after this trace */
> + __u8 pad[8];
Why 8? That makes the structure 60 B. Padding to 12 would make it nicely
aligned to 64B...
> + /* cgroup id will be stored here if exists */
> +};
> /*
> * The remap event
> */
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 09/15] blktrace: add definitions for struct blk_io_trace2
2025-09-25 15:02 ` [PATCH v2 09/15] blktrace: add definitions for struct blk_io_trace2 Johannes Thumshirn
2025-10-01 6:37 ` Damien Le Moal
@ 2025-10-03 7:31 ` Christoph Hellwig
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2025-10-03 7:31 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: Jens Axboe, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
linux-block, linux-kernel, linux-trace-kernel, linux-btrace,
John Garry, Hannes Reinecke, Damien Le Moal, Christoph Hellwig,
Naohiro Aota, Shinichiro Kawasaki, Chaitanya Kulkarni,
Martin K . Petersen
On Thu, Sep 25, 2025 at 05:02:25PM +0200, Johannes Thumshirn wrote:
> Add definitions for the extended version of the blktrace protocol using a
> wider action type to be able to record new actions in the kernel.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> include/uapi/linux/blktrace_api.h | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/include/uapi/linux/blktrace_api.h b/include/uapi/linux/blktrace_api.h
> index a3b1f35ac026..d58ef484de49 100644
> --- a/include/uapi/linux/blktrace_api.h
> +++ b/include/uapi/linux/blktrace_api.h
> @@ -94,6 +94,7 @@ enum blktrace_notify {
>
> #define BLK_IO_TRACE_MAGIC 0x65617400
> #define BLK_IO_TRACE_VERSION 0x07
> +#define BLK_IO_TRACE2_VERSION 0x08
>
> /*
> * The trace itself
> @@ -113,6 +114,21 @@ struct blk_io_trace {
> /* cgroup id will be stored here if exists */
> };
>
> +struct blk_io_trace2 {
> + __u32 magic; /* MAGIC << 8 | BLK_IO_TRACE2_VERSION */
> + __u32 sequence; /* event number */
> + __u64 time; /* in nanoseconds */
> + __u64 sector; /* disk offset */
> + __u32 bytes; /* transfer length */
> + __u32 pid; /* who did it */
> + __u64 action; /* what happened */
> + __u32 device; /* device number */
> + __u32 cpu; /* on what cpu did it happen */
> + __u16 error; /* completion error */
> + __u16 pdu_len; /* length of data after this trace */
> + __u8 pad[8];
This will cause mismatching sizes between x86_32 and other
architectures because the size is not 8-byte aligned. You'll need
to add or remove 4 bytes of padding to ensure that.
I really wish we could have good helpers to check for that.
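
One way such a check could look, as a user-space sketch with fixed-width
types standing in for __u32/__u64: the fields of the posted layout end at
byte 60, which is not a multiple of 8, so the size depends on whether the
ABI aligns 64-bit members to 8 bytes (tail padding to 64) or to 4 bytes
(60, as on x86_32). With pad[12] the fields end at 64 on both. The struct
names trace2_pad8 and trace2_pad12 and the END_OF() macro are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as posted: 8 bytes of trailing padding. */
struct trace2_pad8 {
	uint32_t magic;
	uint32_t sequence;
	uint64_t time;
	uint64_t sector;
	uint32_t bytes;
	uint32_t pid;
	uint64_t action;
	uint32_t device;
	uint32_t cpu;
	uint16_t error;
	uint16_t pdu_len;
	uint8_t pad[8];
};

/* Same fields with the padding grown to 12 bytes. */
struct trace2_pad12 {
	uint32_t magic;
	uint32_t sequence;
	uint64_t time;
	uint64_t sector;
	uint32_t bytes;
	uint32_t pid;
	uint64_t action;
	uint32_t device;
	uint32_t cpu;
	uint16_t error;
	uint16_t pdu_len;
	uint8_t pad[12];
};

/* Offset of the first byte after a member. */
#define END_OF(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* The posted fields stop at 60 bytes; sizeof() then varies by ABI. */
static_assert(END_OF(struct trace2_pad8, pad) == 60,
	      "posted layout ends at byte 60");

/* With pad[12] there is no implicit tail padding left to differ. */
static_assert(END_OF(struct trace2_pad12, pad) == 64,
	      "padded layout ends at byte 64");
static_assert(sizeof(struct trace2_pad12) == 64,
	      "padded layout has no hidden tail padding");
static_assert(sizeof(struct trace2_pad12) % 8 == 0,
	      "size is a multiple of 8");
```

The checks are compile-time only, so a layout that grows hidden tail
padding would fail the build rather than surface as an ABI mismatch later.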
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 10/15] blktrace: differentiate between blk_io_trace versions
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
` (8 preceding siblings ...)
2025-09-25 15:02 ` [PATCH v2 09/15] blktrace: add definitions for struct blk_io_trace2 Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 7:21 ` Damien Le Moal
2025-09-25 15:02 ` [PATCH v2 11/15] blktrace: add block trace commands for zone operations Johannes Thumshirn
` (4 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Differentiate between blk_io_trace and blk_io_trace2 when relaying to
user-space depending on which version has been requested by the blktrace
utility.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
kernel/trace/blktrace.c | 62 +++++++++++++++++++++++++++++++++++++----
1 file changed, 57 insertions(+), 5 deletions(-)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 9cd8eb9e7b4b..82ad626d6202 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -91,6 +91,29 @@ static void record_blktrace_event(struct blk_io_trace *t, pid_t pid, int cpu,
memcpy((void *)t + sizeof(*t) + cgid_len, pdu_data, pdu_len);
}
+static void record_blktrace_event2(struct blk_io_trace2 *t2, pid_t pid, int cpu,
+ sector_t sector, int bytes, u64 what,
+ dev_t dev, int error, u64 cgid,
+ ssize_t cgid_len, void *pdu_data,
+ int pdu_len)
+
+{
+ t2->pid = pid;
+ t2->cpu = cpu;
+
+ t2->sector = sector;
+ t2->bytes = bytes;
+ t2->action = what;
+ t2->device = dev;
+ t2->error = error;
+ t2->pdu_len = pdu_len + cgid_len;
+
+ if (cgid_len)
+ memcpy((void *)t2 + sizeof(*t2), &cgid, cgid_len);
+ if (pdu_len)
+ memcpy((void *)t2 + sizeof(*t2) + cgid_len, pdu_data, pdu_len);
+}
+
static void relay_blktrace_event(struct blk_trace *bt, unsigned long sequence,
pid_t pid, int cpu, sector_t sector, int bytes,
u32 what, int error, u64 cgid,
@@ -111,6 +134,26 @@ static void relay_blktrace_event(struct blk_trace *bt, unsigned long sequence,
cgid, cgid_len, pdu_data, pdu_len);
}
+static void relay_blktrace_event2(struct blk_trace *bt, unsigned long sequence,
+ pid_t pid, int cpu, sector_t sector,
+ int bytes, u64 what, int error, u64 cgid,
+ ssize_t cgid_len, void *pdu_data, int pdu_len)
+{
+ struct blk_io_trace2 *t;
+ size_t trace_len = sizeof(struct blk_io_trace2) + pdu_len + cgid_len;
+
+ t = relay_reserve(bt->rchan, trace_len);
+ if (!t)
+ return;
+
+ t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE2_VERSION;
+ t->sequence = sequence;
+ t->time = ktime_to_ns(ktime_get());
+
+ record_blktrace_event2(t, pid, cpu, sector, bytes, what, bt->dev, error,
+ cgid, cgid_len, pdu_data, pdu_len);
+}
+
/*
* Send out a notify message.
*/
@@ -146,8 +189,12 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
if (!bt->rchan)
return;
- relay_blktrace_event(bt, 0, pid, cpu, 0, 0, action, 0, cgid,
- cgid_len, (void *)data, len);
+ if (bt->version == 1)
+ relay_blktrace_event(bt, 0, pid, cpu, 0, 0, action, 0, cgid,
+ cgid_len, (void *)data, len);
+ else
+ relay_blktrace_event2(bt, 0, pid, cpu, 0, 0, action, 0, cgid,
+ cgid_len, (void *)data, len);
}
/*
@@ -329,9 +376,14 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
local_irq_save(flags);
sequence = per_cpu_ptr(bt->sequence, cpu);
(*sequence)++;
- relay_blktrace_event(bt, *sequence, pid, cpu, sector, bytes,
- lower_32_bits(what), error, cgid, cgid_len,
- pdu_data, pdu_len);
+ if (bt->version == 1)
+ relay_blktrace_event(bt, *sequence, pid, cpu, sector, bytes,
+ lower_32_bits(what), error, cgid,
+ cgid_len, pdu_data, pdu_len);
+ else
+ relay_blktrace_event2(bt, *sequence, pid, cpu, sector, bytes,
+ what, error, cgid, cgid_len, pdu_data,
+ pdu_len);
local_irq_restore(flags);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 10/15] blktrace: differentiate between blk_io_trace versions
2025-09-25 15:02 ` [PATCH v2 10/15] blktrace: differentiate between blk_io_trace versions Johannes Thumshirn
@ 2025-10-01 7:21 ` Damien Le Moal
0 siblings, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 7:21 UTC (permalink / raw)
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Differentiate between blk_io_trace and blk_io_trace2 when relaying to
> user-space depending on which version has been requested by the blktrace
> utility.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> kernel/trace/blktrace.c | 62 +++++++++++++++++++++++++++++++++++++----
> 1 file changed, 57 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> index 9cd8eb9e7b4b..82ad626d6202 100644
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -91,6 +91,29 @@ static void record_blktrace_event(struct blk_io_trace *t, pid_t pid, int cpu,
> memcpy((void *)t + sizeof(*t) + cgid_len, pdu_data, pdu_len);
> }
>
> +static void record_blktrace_event2(struct blk_io_trace2 *t2, pid_t pid, int cpu,
> + sector_t sector, int bytes, u64 what,
> + dev_t dev, int error, u64 cgid,
> + ssize_t cgid_len, void *pdu_data,
> + int pdu_len)
> +
Extra blank line not needed.
> +{
> + t2->pid = pid;
> + t2->cpu = cpu;
> +
> + t2->sector = sector;
> + t2->bytes = bytes;
> + t2->action = what;
> + t2->device = dev;
> + t2->error = error;
> + t2->pdu_len = pdu_len + cgid_len;
> +
> + if (cgid_len)
> + memcpy((void *)t2 + sizeof(*t2), &cgid, cgid_len);
> + if (pdu_len)
> + memcpy((void *)t2 + sizeof(*t2) + cgid_len, pdu_data, pdu_len);
> +}
> +
> static void relay_blktrace_event(struct blk_trace *bt, unsigned long sequence,
> pid_t pid, int cpu, sector_t sector, int bytes,
> u32 what, int error, u64 cgid,
> @@ -111,6 +134,26 @@ static void relay_blktrace_event(struct blk_trace *bt, unsigned long sequence,
> cgid, cgid_len, pdu_data, pdu_len);
> }
>
> +static void relay_blktrace_event2(struct blk_trace *bt, unsigned long sequence,
> + pid_t pid, int cpu, sector_t sector,
> + int bytes, u64 what, int error, u64 cgid,
> + ssize_t cgid_len, void *pdu_data, int pdu_len)
> +{
> + struct blk_io_trace2 *t;
> + size_t trace_len = sizeof(struct blk_io_trace2) + pdu_len + cgid_len;
> +
> + t = relay_reserve(bt->rchan, trace_len);
> + if (!t)
> + return;
> +
> + t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE2_VERSION;
> + t->sequence = sequence;
> + t->time = ktime_to_ns(ktime_get());
> +
> + record_blktrace_event2(t, pid, cpu, sector, bytes, what, bt->dev, error,
> + cgid, cgid_len, pdu_data, pdu_len);
> +}
See below.
> +
> /*
> * Send out a notify message.
> */
> @@ -146,8 +189,12 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
> if (!bt->rchan)
> return;
>
> - relay_blktrace_event(bt, 0, pid, cpu, 0, 0, action, 0, cgid,
> - cgid_len, (void *)data, len);
> + if (bt->version == 1)
> + relay_blktrace_event(bt, 0, pid, cpu, 0, 0, action, 0, cgid,
> + cgid_len, (void *)data, len);
> + else
> + relay_blktrace_event2(bt, 0, pid, cpu, 0, 0, action, 0, cgid,
> + cgid_len, (void *)data, len);
Since you pass bt pointer to the relay function, the version is known in that
function and this could be done inside it, no ?
That would avoid this if repetition.
> }
>
> /*
> @@ -329,9 +376,14 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
> local_irq_save(flags);
> sequence = per_cpu_ptr(bt->sequence, cpu);
> (*sequence)++;
> - relay_blktrace_event(bt, *sequence, pid, cpu, sector, bytes,
> - lower_32_bits(what), error, cgid, cgid_len,
> - pdu_data, pdu_len);
> + if (bt->version == 1)
> + relay_blktrace_event(bt, *sequence, pid, cpu, sector, bytes,
> + lower_32_bits(what), error, cgid,
> + cgid_len, pdu_data, pdu_len);
> + else
> + relay_blktrace_event2(bt, *sequence, pid, cpu, sector, bytes,
> + what, error, cgid, cgid_len, pdu_data,
> + pdu_len);
> local_irq_restore(flags);
> }
>
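
A user-space sketch of the suggestion above, assuming a single relay entry
point that looks at the version stored in the trace state and picks the
record format itself. The magic and version constants are the ones from
the uapi header in this series; struct bt_sketch, struct event_sketch,
relay_event_sketch() and magic_version() are hypothetical and model only
the magic and action fields.

```c
#include <assert.h>
#include <stdint.h>

#define BLK_IO_TRACE_MAGIC	0x65617400u
#define BLK_IO_TRACE_VERSION	0x07
#define BLK_IO_TRACE2_VERSION	0x08

/* Hypothetical, reduced stand-ins for struct blk_trace and a record. */
struct bt_sketch {
	int version;
};

struct event_sketch {
	uint32_t magic;
	uint64_t action;
};

/* One entry point; callers no longer repeat the version check. */
static void relay_event_sketch(const struct bt_sketch *bt, uint64_t what,
			       struct event_sketch *ev)
{
	assert(bt->version == 1 || bt->version == 2);

	if (bt->version == 1) {
		ev->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
		/* v1 records only carry the lower 32 bits of the action */
		ev->action = (uint32_t)what;
	} else {
		ev->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE2_VERSION;
		ev->action = what;
	}
}

/* Reader side: return the version byte, or -1 if the magic is wrong. */
static int magic_version(uint32_t magic)
{
	if ((magic & 0xffffff00u) != BLK_IO_TRACE_MAGIC)
		return -1;
	return (int)(magic & 0xffu);
}
```

An action bit above bit 31 is dropped for a version 1 trace and kept for a
version 2 trace, and a reader can tell the two record formats apart from
the low byte of the magic.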
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 11/15] blktrace: add block trace commands for zone operations
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
` (9 preceding siblings ...)
2025-09-25 15:02 ` [PATCH v2 10/15] blktrace: differentiate between blk_io_trace versions Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 7:23 ` Damien Le Moal
2025-10-03 7:32 ` Christoph Hellwig
2025-09-25 15:02 ` [PATCH v2 12/15] blktrace: expose ZONE APPEND completions to blktrace Johannes Thumshirn
` (3 subsequent siblings)
14 siblings, 2 replies; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Add block trace commands for zone operations. These are added as a
separate set of 'block trace commands' shifted by 32bit so that they do
not interfere with the old 16bit wide trace command field in 'struct
blk_io_trace' action.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
include/uapi/linux/blktrace_api.h | 13 ++++++++++++-
kernel/trace/blktrace.c | 18 ++++++++++++++++++
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/blktrace_api.h b/include/uapi/linux/blktrace_api.h
index d58ef484de49..0f336140ce4e 100644
--- a/include/uapi/linux/blktrace_api.h
+++ b/include/uapi/linux/blktrace_api.h
@@ -26,11 +26,22 @@ enum blktrace_cat {
BLK_TC_DRV_DATA = 1 << 14, /* binary per-driver data */
BLK_TC_FUA = 1 << 15, /* fua requests */
- BLK_TC_END = 1 << 15, /* we've run out of bits! */
+ BLK_TC_END_V1 = 1 << 15, /* we've run out of bits! */
+
+ BLK_TC_ZONE_APPEND = 1 << 16ull, /* zone append */
+ BLK_TC_ZONE_RESET = 1 << 17ull, /* zone reset */
+ BLK_TC_ZONE_RESET_ALL = 1 << 18ull, /* zone reset all */
+ BLK_TC_ZONE_FINISH = 1 << 19ull, /* zone finish */
+ BLK_TC_ZONE_OPEN = 1 << 20ull, /* zone open */
+ BLK_TC_ZONE_CLOSE = 1 << 21ull, /* zone close */
+
+ BLK_TC_END_V2 = 1 << 21ull,
};
#define BLK_TC_SHIFT (16)
#define BLK_TC_ACT(act) ((act) << BLK_TC_SHIFT)
+#define BLK_TC_SHIFT2 (32)
+#define BLK_TC_ACT2(act) ((u64)(act) << BLK_TC_SHIFT2)
/*
* Basic trace actions
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 82ad626d6202..62f6cfcee4f6 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -333,6 +333,24 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
case REQ_OP_FLUSH:
what |= BLK_TC_ACT(BLK_TC_FLUSH);
break;
+ case REQ_OP_ZONE_APPEND:
+ what |= BLK_TC_ACT2(BLK_TC_ZONE_APPEND);
+ break;
+ case REQ_OP_ZONE_RESET:
+ what |= BLK_TC_ACT2(BLK_TC_ZONE_RESET);
+ break;
+ case REQ_OP_ZONE_RESET_ALL:
+ what |= BLK_TC_ACT2(BLK_TC_ZONE_RESET_ALL);
+ break;
+ case REQ_OP_ZONE_FINISH:
+ what |= BLK_TC_ACT2(BLK_TC_ZONE_FINISH);
+ break;
+ case REQ_OP_ZONE_OPEN:
+ what |= BLK_TC_ACT2(BLK_TC_ZONE_OPEN);
+ break;
+ case REQ_OP_ZONE_CLOSE:
+ what |= BLK_TC_ACT2(BLK_TC_ZONE_CLOSE);
+ break;
default:
break;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 11/15] blktrace: add block trace commands for zone operations
2025-09-25 15:02 ` [PATCH v2 11/15] blktrace: add block trace commands for zone operations Johannes Thumshirn
@ 2025-10-01 7:23 ` Damien Le Moal
2025-10-03 7:32 ` Christoph Hellwig
1 sibling, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 7:23 UTC (permalink / raw)
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Add block trace commands for zone operations. These are added as a
> separate set of 'block trace commands' shifted by 32bit so that they do
> not interfere with the old 16bit wide trace command field in 'struct
> blk_io_trace' action.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> include/uapi/linux/blktrace_api.h | 13 ++++++++++++-
> kernel/trace/blktrace.c | 18 ++++++++++++++++++
> 2 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/blktrace_api.h b/include/uapi/linux/blktrace_api.h
> index d58ef484de49..0f336140ce4e 100644
> --- a/include/uapi/linux/blktrace_api.h
> +++ b/include/uapi/linux/blktrace_api.h
> @@ -26,11 +26,22 @@ enum blktrace_cat {
> BLK_TC_DRV_DATA = 1 << 14, /* binary per-driver data */
> BLK_TC_FUA = 1 << 15, /* fua requests */
>
> - BLK_TC_END = 1 << 15, /* we've run out of bits! */
> + BLK_TC_END_V1 = 1 << 15, /* we've run out of bits! */
> +
> + BLK_TC_ZONE_APPEND = 1 << 16ull, /* zone append */
> + BLK_TC_ZONE_RESET = 1 << 17ull, /* zone reset */
> + BLK_TC_ZONE_RESET_ALL = 1 << 18ull, /* zone reset all */
> + BLK_TC_ZONE_FINISH = 1 << 19ull, /* zone finish */
> + BLK_TC_ZONE_OPEN = 1 << 20ull, /* zone open */
> + BLK_TC_ZONE_CLOSE = 1 << 21ull, /* zone close */
Isn't it more common/correct to do "1ULL << 21" ?
--
Damien Le Moal
Western Digital Research
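For context on the question above: in C, the result of a shift has the type of the promoted left operand, so a `ull` suffix on the shift count does not widen the expression. The following is a minimal userspace sketch, not kernel code; the `IS_INT`/`IS_ULL` macros and the function name are made up for illustration. It uses C11 `_Generic` to check the types at compile time.

```c
#include <assert.h>

/* Hypothetical helper macros: evaluate to 1 if the expression has that type. */
#define IS_INT(x) _Generic((x), int: 1, default: 0)
#define IS_ULL(x) _Generic((x), unsigned long long: 1, default: 0)

/* Suffix on the shift count only: the result keeps the left operand's type. */
_Static_assert(IS_INT(1 << 21ull), "1 << 21ull has type int");

/* Suffix on the left operand: the whole expression is unsigned long long. */
_Static_assert(IS_ULL(1ULL << 21), "1ULL << 21 has type unsigned long long");

/* For the bits used in the patch (16 to 21) both spellings give the same value. */
_Static_assert((1 << 21ull) == (1ULL << 21), "same value below bit 31");

/* Runtime-visible form of the first check, e.g. for a unit test. */
int shift_count_suffix_widens_result(void)
{
	return IS_ULL(1 << 21ull);	/* evaluates to 0 */
}
```

So for the values in this patch the suffix is misleading rather than wrong; the difference would only start to matter for bit 31 and above, where an `int` left operand can no longer hold the result.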
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 11/15] blktrace: add block trace commands for zone operations
2025-09-25 15:02 ` [PATCH v2 11/15] blktrace: add block trace commands for zone operations Johannes Thumshirn
2025-10-01 7:23 ` Damien Le Moal
@ 2025-10-03 7:32 ` Christoph Hellwig
2025-10-07 13:08 ` Johannes Thumshirn
1 sibling, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2025-10-03 7:32 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: Jens Axboe, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
linux-block, linux-kernel, linux-trace-kernel, linux-btrace,
John Garry, Hannes Reinecke, Damien Le Moal, Christoph Hellwig,
Naohiro Aota, Shinichiro Kawasaki, Chaitanya Kulkarni,
Martin K . Petersen
On Thu, Sep 25, 2025 at 05:02:27PM +0200, Johannes Thumshirn wrote:
> Add block trace commands for zone operations. These are added as a
> separate set of 'block trace commands' shifted by 32bit so that they do
> not interfere with the old 16bit wide trace command field in 'struct
> blk_io_trace' action.
Can you explain how the commands are handled for old/new here?
Because I'd still much prefer to sort things out so that they make
sense for the new code if possible, i.e. have a 32-bit command
and 32-bit flags, use a sensible encoding for the new one, and
remap the supported ones to the old organically grown one.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 11/15] blktrace: add block trace commands for zone operations
2025-10-03 7:32 ` Christoph Hellwig
@ 2025-10-07 13:08 ` Johannes Thumshirn
2025-10-08 6:14 ` hch
0 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-10-07 13:08 UTC (permalink / raw)
To: hch
Cc: Jens Axboe, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-btrace@vger.kernel.org,
John Garry, Hannes Reinecke, Damien Le Moal, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 10/3/25 9:33 AM, Christoph Hellwig wrote:
> On Thu, Sep 25, 2025 at 05:02:27PM +0200, Johannes Thumshirn wrote:
>> Add block trace commands for zone operations. These are added as a
>> separate set of 'block trace commands' shifted by 32bit so that they do
>> not interfere with the old 16bit wide trace command field in 'struct
>> blk_io_trace' action.
> Can you explain how the commands are handled for old/new here?
>
> Because I'd still much prefer to sort things out so that they make
> sense for the new code if possible, i.e. have a 32-bit command
> and 32-bit flags, use a sensible encoding for the new one, and
> remap the supported ones to the old organically grown one.
Sure. For the old commands everything is still in the lower 32 bits, which
has the nice property that we don't need to duplicate all the code for
v1 and v2.
The commands added afterwards are intended to be in the upper 32 bits,
which are discarded if the user requests the v1 format.
At least this was the original plan. I think I badly messed up v2, as the
new commands should restart at 0 and be shifted up by 32 bits.
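To make the bit layout described above concrete, here is a small userspace sketch. The category values and the `BLK_TC_ACT`/`BLK_TC_ACT2` macros are copied from the patch as posted, with `uint64_t` standing in for the kernel's `u64`. `lower_32_bits()` is reimplemented locally as a stand-in for the kernel helper, and `v1_action()` and `ZONE_APPEND_RESTARTED` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values and macros as they appear in the posted patch. */
#define BLK_TC_DRV_DATA		(1 << 14)
#define BLK_TC_ZONE_APPEND	(1 << 16)
#define BLK_TC_SHIFT		16
#define BLK_TC_ACT(act)		((act) << BLK_TC_SHIFT)
#define BLK_TC_SHIFT2		32
#define BLK_TC_ACT2(act)	((uint64_t)(act) << BLK_TC_SHIFT2)

/* Hypothetical: a zone category that restarts at bit 0 before the shift. */
#define ZONE_APPEND_RESTARTED	(1 << 0)

/* Userspace stand-in for the kernel's lower_32_bits() helper. */
uint32_t lower_32_bits(uint64_t v)
{
	return (uint32_t)v;
}

/* Hypothetical helper: the part of a 64-bit action that fits a v1 record. */
uint32_t v1_action(uint64_t what)
{
	return lower_32_bits(what);
}
```

With the values as posted, `BLK_TC_ACT2(BLK_TC_ZONE_APPEND)` sets bit 48, while a category restarting at bit 0 would set bit 32. In both cases truncating to the lower 32 bits drops the zone category and leaves the v1 categories untouched.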
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 11/15] blktrace: add block trace commands for zone operations
2025-10-07 13:08 ` Johannes Thumshirn
@ 2025-10-08 6:14 ` hch
2025-10-08 6:16 ` Johannes Thumshirn
2025-10-09 11:17 ` Johannes Thumshirn
0 siblings, 2 replies; 43+ messages in thread
From: hch @ 2025-10-08 6:14 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: hch, Jens Axboe, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
linux-btrace@vger.kernel.org, John Garry, Hannes Reinecke,
Damien Le Moal, Naohiro Aota, Shinichiro Kawasaki,
Chaitanya Kulkarni, Martin K . Petersen
On Tue, Oct 07, 2025 at 01:08:00PM +0000, Johannes Thumshirn wrote:
> Sure for the old commands everything is still in the lower 32bits, this
> has the nice property that we don't need to duplicate all the code for
> v1 and v2.
I don't think you need to duplicate anything, just have a little
function that maps from the free-form v2 commands and flags to the
v1 field. Preferably including a mapping of all unsupported ones to
a catchall unsupported command and flag each to indicate that the
trace includes something only visible with v2.
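A rough sketch of what such a mapping helper could look like follows. Everything in it is hypothetical: the `v2_cmd` numbering, the `V1_TC_*` names, `struct v1_mapping` and `map_v2_to_v1()` are invented for illustration and are not taken from any posted patch. Since the v1 category field has no spare bits left (the header comment says "we've run out of bits!"), the sketch reports the "only visible with v2" condition as a separate boolean and leaves open where it would live in a v1 record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical free-form v2 command numbering. */
enum v2_cmd {
	V2_CMD_READ,
	V2_CMD_WRITE,
	V2_CMD_FLUSH,
	V2_CMD_ZONE_APPEND,
	V2_CMD_ZONE_RESET,
};

/* Hypothetical names for three legacy v1 category bits. */
#define V1_TC_READ	(1u << 0)
#define V1_TC_WRITE	(1u << 1)
#define V1_TC_FLUSH	(1u << 2)

/* Result of mapping one v2 command to the v1 view. */
struct v1_mapping {
	uint16_t category;	/* v1 category bits, 0 if none applies */
	bool v2_only;		/* true if the event is only fully visible with v2 */
};

/* Map a v2 command to v1; anything unknown to v1 hits the catch-all. */
struct v1_mapping map_v2_to_v1(enum v2_cmd cmd)
{
	switch (cmd) {
	case V2_CMD_READ:
		return (struct v1_mapping){ V1_TC_READ, false };
	case V2_CMD_WRITE:
		return (struct v1_mapping){ V1_TC_WRITE, false };
	case V2_CMD_FLUSH:
		return (struct v1_mapping){ V1_TC_FLUSH, false };
	default:
		/* Zone commands and any future additions end up here. */
		return (struct v1_mapping){ 0, true };
	}
}
```

The point of the single `default` branch is that new v2 commands need no v1-side changes: they are reported as unsupported until someone decides they deserve a legacy encoding.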
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 11/15] blktrace: add block trace commands for zone operations
2025-10-08 6:14 ` hch
@ 2025-10-08 6:16 ` Johannes Thumshirn
2025-10-09 11:17 ` Johannes Thumshirn
1 sibling, 0 replies; 43+ messages in thread
From: Johannes Thumshirn @ 2025-10-08 6:16 UTC (permalink / raw)
To: hch
Cc: Jens Axboe, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-btrace@vger.kernel.org,
John Garry, Hannes Reinecke, Damien Le Moal, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 10/8/25 8:14 AM, hch wrote:
> On Tue, Oct 07, 2025 at 01:08:00PM +0000, Johannes Thumshirn wrote:
>> Sure for the old commands everything is still in the lower 32bits, this
>> has the nice property that we don't need to duplicate all the code for
>> v1 and v2.
> I don't think you need to duplicate anything, just have a little
> function that maps from the free-form v2 commands and flags to the
> v1 field. Preferably including a mapping of all unsupported ones to
> a catchall unsupported command and flag each to indicate that the
> trace includes something only visible with v2.
I see your point, let me cook something.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 11/15] blktrace: add block trace commands for zone operations
2025-10-08 6:14 ` hch
2025-10-08 6:16 ` Johannes Thumshirn
@ 2025-10-09 11:17 ` Johannes Thumshirn
2025-10-10 7:32 ` hch
1 sibling, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-10-09 11:17 UTC (permalink / raw)
To: hch
Cc: Jens Axboe, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-btrace@vger.kernel.org,
John Garry, Hannes Reinecke, Damien Le Moal, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 10/8/25 8:14 AM, hch wrote:
> On Tue, Oct 07, 2025 at 01:08:00PM +0000, Johannes Thumshirn wrote:
>> Sure for the old commands everything is still in the lower 32bits, this
>> has the nice property that we don't need to duplicate all the code for
>> v1 and v2.
> I don't think you need to duplicate anything, just have a little
> function that maps from the free-form v2 commands and flags to the
> v1 field. Preferably including a mapping of all unsupported ones to
> a catchall unsupported command and flag each to indicate that the
> trace includes something only visible with v2.
So I've tried making a translation function (which is the trivial part)
but then it's a game of whack-a-mole to unbreak compilation, ftrace, etc.
I think it's not really worth the effort.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 11/15] blktrace: add block trace commands for zone operations
2025-10-09 11:17 ` Johannes Thumshirn
@ 2025-10-10 7:32 ` hch
0 siblings, 0 replies; 43+ messages in thread
From: hch @ 2025-10-10 7:32 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: hch, Jens Axboe, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
linux-btrace@vger.kernel.org, John Garry, Hannes Reinecke,
Damien Le Moal, Naohiro Aota, Shinichiro Kawasaki,
Chaitanya Kulkarni, Martin K . Petersen
On Thu, Oct 09, 2025 at 11:17:21AM +0000, Johannes Thumshirn wrote:
> On 10/8/25 8:14 AM, hch wrote:
> > On Tue, Oct 07, 2025 at 01:08:00PM +0000, Johannes Thumshirn wrote:
> >> Sure for the old commands everything is still in the lower 32bits, this
> >> has the nice property that we don't need to duplicate all the code for
> >> v1 and v2.
> > I don't think you need to duplicate anything, just have a little
> > function that maps from the free-form v2 commands and flags to the
> > v1 field. Preferably including a mapping of all unsupported ones to
> > a catchall unsupported command and flag each to indicate that the
> > trace includes something only visible with v2.
>
> So I've tried making a translation function (which is the trivial part)
> but then it's a game of whack-a-mole to unbreak compilation, ftrace, etc.
What's the problem?
> I think it's not really worth the effort.
Why? We really want a clean slate going forward. Creating a permanent
split into legacy vs new commands seems very unfortunate.
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 12/15] blktrace: expose ZONE APPEND completions to blktrace
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
` (10 preceding siblings ...)
2025-09-25 15:02 ` [PATCH v2 11/15] blktrace: add block trace commands for zone operations Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 7:28 ` Damien Le Moal
2025-09-25 15:02 ` [PATCH v2 13/15] blktrace: trace zone management operations Johannes Thumshirn
` (2 subsequent siblings)
14 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Expose ZONE APPEND completions as a block trace completion action to
blktrace.
As tracing of zoned block commands needs the upper 32bit of the widened
64bit action, only add traces to blktrace if user-space has requested
version 2 of the blktrace protocol.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
include/uapi/linux/blktrace_api.h | 3 +++
kernel/trace/blktrace.c | 21 +++++++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/include/uapi/linux/blktrace_api.h b/include/uapi/linux/blktrace_api.h
index 0f336140ce4e..ddc9fedf4955 100644
--- a/include/uapi/linux/blktrace_api.h
+++ b/include/uapi/linux/blktrace_api.h
@@ -99,6 +99,9 @@ enum blktrace_notify {
#define BLK_TA_ABORT (__BLK_TA_ABORT | BLK_TC_ACT(BLK_TC_QUEUE))
#define BLK_TA_DRV_DATA (__BLK_TA_DRV_DATA | BLK_TC_ACT(BLK_TC_DRV_DATA))
+#define BLK_TA_ZONE_APPEND (__BLK_TA_COMPLETE |\
+ BLK_TC_ACT2(BLK_TC_ZONE_APPEND))
+
#define BLK_TN_PROCESS (__BLK_TN_PROCESS | BLK_TC_ACT(BLK_TC_NOTIFY))
#define BLK_TN_TIMESTAMP (__BLK_TN_TIMESTAMP | BLK_TC_ACT(BLK_TC_NOTIFY))
#define BLK_TN_MESSAGE (__BLK_TN_MESSAGE | BLK_TC_ACT(BLK_TC_NOTIFY))
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 62f6cfcee4f6..fea6e63ee27c 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -972,6 +972,22 @@ static void blk_add_trace_rq_complete(void *ignore, struct request *rq,
blk_trace_request_get_cgid(rq));
}
+static void blk_add_trace_zone_update_request(void *ignore, struct request *rq)
+{
+ struct blk_trace *bt;
+
+ rcu_read_lock();
+ bt = rcu_dereference(rq->q->blk_trace);
+ if (likely(!bt) || bt->version < 2) {
+ rcu_read_unlock();
+ return;
+ }
+ rcu_read_unlock();
+
+ blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_ZONE_APPEND,
+ blk_trace_request_get_cgid(rq));
+}
+
/**
* blk_add_trace_bio - Add a trace for a bio oriented action
* @q: queue the io is for
@@ -1202,6 +1218,9 @@ static void blk_register_tracepoints(void)
WARN_ON(ret);
ret = register_trace_block_getrq(blk_add_trace_getrq, NULL);
WARN_ON(ret);
+ ret = register_trace_blk_zone_append_update_request_bio(
+ blk_add_trace_zone_update_request, NULL);
+ WARN_ON(ret);
ret = register_trace_block_plug(blk_add_trace_plug, NULL);
WARN_ON(ret);
ret = register_trace_block_unplug(blk_add_trace_unplug, NULL);
@@ -1221,6 +1240,8 @@ static void blk_unregister_tracepoints(void)
unregister_trace_block_split(blk_add_trace_split, NULL);
unregister_trace_block_unplug(blk_add_trace_unplug, NULL);
unregister_trace_block_plug(blk_add_trace_plug, NULL);
+ unregister_trace_blk_zone_append_update_request_bio(
+ blk_add_trace_zone_update_request, NULL);
unregister_trace_block_getrq(blk_add_trace_getrq, NULL);
unregister_trace_block_bio_queue(blk_add_trace_bio_queue, NULL);
unregister_trace_block_bio_frontmerge(blk_add_trace_bio_frontmerge, NULL);
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 12/15] blktrace: expose ZONE APPEND completions to blktrace
2025-09-25 15:02 ` [PATCH v2 12/15] blktrace: expose ZONE APPEND completions to blktrace Johannes Thumshirn
@ 2025-10-01 7:28 ` Damien Le Moal
0 siblings, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 7:28 UTC (permalink / raw)
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Expose ZONE APPEND completions as a block trace completion action to
> blktrace.
>
> As tracing of zoned block commands needs the upper 32bit of the widened
> 64bit action, only add traces to blktrace if user-space has requested
> version 2 of the blktrace protocol.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 13/15] blktrace: trace zone management operations
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
` (11 preceding siblings ...)
2025-09-25 15:02 ` [PATCH v2 12/15] blktrace: expose ZONE APPEND completions to blktrace Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 7:30 ` Damien Le Moal
2025-09-25 15:02 ` [PATCH v2 14/15] blktrace: trace zone write plugging operations Johannes Thumshirn
2025-09-25 15:02 ` [PATCH v2 15/15] blktrace: handle BLKTRACESETUP2 ioctl Johannes Thumshirn
14 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Trace zone management operations on block devices.
As tracing of zoned block commands needs the upper 32bit of the widened
64bit action, only add traces to blktrace if user-space has requested
version 2 of the blktrace protocol.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
include/uapi/linux/blktrace_api.h | 2 ++
kernel/trace/blktrace.c | 20 ++++++++++++++++++++
2 files changed, 22 insertions(+)
diff --git a/include/uapi/linux/blktrace_api.h b/include/uapi/linux/blktrace_api.h
index ddc9fedf4955..e4b6fbbc40ee 100644
--- a/include/uapi/linux/blktrace_api.h
+++ b/include/uapi/linux/blktrace_api.h
@@ -64,6 +64,7 @@ enum blktrace_act {
__BLK_TA_REMAP, /* bio was remapped */
__BLK_TA_ABORT, /* request aborted */
__BLK_TA_DRV_DATA, /* driver-specific binary data */
+ __BLK_TA_ZONE_MGMT, /* zone management command was issued */
__BLK_TA_CGROUP = 1 << 8, /* from a cgroup*/
};
@@ -101,6 +102,7 @@ enum blktrace_notify {
#define BLK_TA_ZONE_APPEND (__BLK_TA_COMPLETE |\
BLK_TC_ACT2(BLK_TC_ZONE_APPEND))
+#define BLK_TA_ZONE_MGMT __BLK_TA_ZONE_MGMT
#define BLK_TN_PROCESS (__BLK_TN_PROCESS | BLK_TC_ACT(BLK_TC_NOTIFY))
#define BLK_TN_TIMESTAMP (__BLK_TN_TIMESTAMP | BLK_TC_ACT(BLK_TC_NOTIFY))
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index fea6e63ee27c..13424efbb2f6 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -1046,6 +1046,22 @@ static void blk_add_trace_getrq(void *ignore, struct bio *bio)
blk_add_trace_bio(bio->bi_bdev->bd_disk->queue, bio, BLK_TA_GETRQ, 0);
}
+static void blk_add_trace_blkdev_zone_mgmt(void *ignore, struct bio *bio,
+ sector_t nr_sectors)
+{
+ struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+ struct blk_trace *bt;
+
+ rcu_read_lock();
+ bt = rcu_dereference(q->blk_trace);
+ if (unlikely(!bt) || bt->version < 2) {
+ rcu_read_unlock();
+ return;
+ }
+ rcu_read_unlock();
+ blk_add_trace_bio(q, bio, BLK_TA_ZONE_MGMT, 0);
+}
+
static void blk_add_trace_plug(void *ignore, struct request_queue *q)
{
struct blk_trace *bt;
@@ -1221,6 +1237,9 @@ static void blk_register_tracepoints(void)
ret = register_trace_blk_zone_append_update_request_bio(
blk_add_trace_zone_update_request, NULL);
WARN_ON(ret);
+ ret = register_trace_blkdev_zone_mgmt(blk_add_trace_blkdev_zone_mgmt,
+ NULL);
+ WARN_ON(ret);
ret = register_trace_block_plug(blk_add_trace_plug, NULL);
WARN_ON(ret);
ret = register_trace_block_unplug(blk_add_trace_unplug, NULL);
@@ -1240,6 +1259,7 @@ static void blk_unregister_tracepoints(void)
unregister_trace_block_split(blk_add_trace_split, NULL);
unregister_trace_block_unplug(blk_add_trace_unplug, NULL);
unregister_trace_block_plug(blk_add_trace_plug, NULL);
+ unregister_trace_blkdev_zone_mgmt(blk_add_trace_blkdev_zone_mgmt, NULL);
unregister_trace_blk_zone_append_update_request_bio(
blk_add_trace_zone_update_request, NULL);
unregister_trace_block_getrq(blk_add_trace_getrq, NULL);
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 13/15] blktrace: trace zone management operations
2025-09-25 15:02 ` [PATCH v2 13/15] blktrace: trace zone management operations Johannes Thumshirn
@ 2025-10-01 7:30 ` Damien Le Moal
2025-10-08 13:29 ` Johannes Thumshirn
0 siblings, 1 reply; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 7:30 UTC (permalink / raw)
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Trace zone management operations on block devices.
>
> As tracing of zoned block commands needs the upper 32bit of the widened
> 64bit action, only add traces to blktrace if user-space has requested
> version 2 of the blktrace protocol.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Note: Are zone management command completions traced? I do not see a patch
for that...
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 13/15] blktrace: trace zone management operations
2025-10-01 7:30 ` Damien Le Moal
@ 2025-10-08 13:29 ` Johannes Thumshirn
2025-10-08 22:41 ` Damien Le Moal
0 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-10-08 13:29 UTC (permalink / raw)
To: Damien Le Moal, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-btrace@vger.kernel.org,
John Garry, Hannes Reinecke, hch, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 10/1/25 9:30 AM, Damien Le Moal wrote:
> On 9/26/25 00:02, Johannes Thumshirn wrote:
>> Trace zone management operations on block devices.
>>
>> As tracing of zoned block commands needs the upper 32bit of the widened
>> 64bit action, only add traces to blktrace if user-space has requested
>> version 2 of the blktrace protocol.
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
>
> Note: Are zone management command completions traced? I do not see a patch
> for that...
>
>
I finally had a chance to look into zone management command tracing
again. The problem is that we have this pattern:
int blkdev_zone_mgmt(struct block_device *bdev, enum req_op op,
		     sector_t sector, sector_t nr_sectors)
{
	/* [...] */
	trace_blkdev_zone_mgmt(bio, nr_sectors);
	ret = submit_bio_wait(bio);
	bio_put(bio);
	return ret;
}
I'm not sure if it makes sense to do completion tracing here. At least
we cannot do it in the endio handler as usual.
One way to get the error and the duration would be the following:
int blkdev_zone_mgmt(struct block_device *bdev, enum req_op op,
		     sector_t sector, sector_t nr_sectors)
{
	/* [...] */
	trace_blkdev_zone_mgmt(bio, nr_sectors);
	ret = submit_bio_wait(bio);
+	trace_blkdev_zone_mgmt_completion(bio, nr_sectors, bio->bi_error);
	bio_put(bio);
	return ret;
}
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH v2 13/15] blktrace: trace zone management operations
2025-10-08 13:29 ` Johannes Thumshirn
@ 2025-10-08 22:41 ` Damien Le Moal
2025-10-09 9:57 ` Johannes Thumshirn
0 siblings, 1 reply; 43+ messages in thread
From: Damien Le Moal @ 2025-10-08 22:41 UTC (permalink / raw)
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-btrace@vger.kernel.org,
John Garry, Hannes Reinecke, hch, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 10/8/25 22:29, Johannes Thumshirn wrote:
> I'm not sure if it makes sense to do completion tracing here. At least
> we cannot do it in the endio handler as usual.
>
> One way to get the error and the duration would be the following:
>
> int blkdev_zone_mgmt(struct block_device *bdev, enum req_op op,
> sector_t sector, sector_t nr_sectors)
> {
>
> /* [...] */
>
> trace_blkdev_zone_mgmt(bio, nr_sectors);
> ret = submit_bio_wait(bio);
>
> + trace_blkdev_zone_mgmt_completion(bio, nr_sectors, bio->bi_error);
> bio_put(bio);
That does seem OK to me. Maybe try and see how it looks ?
Though the request alloc, insert, dispatch and completion for this BIO will
still be traced, right ? If these events show correctly that this is a zone
management command (and which one it is), then we should not need the above.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 13/15] blktrace: trace zone management operations
2025-10-08 22:41 ` Damien Le Moal
@ 2025-10-09 9:57 ` Johannes Thumshirn
0 siblings, 0 replies; 43+ messages in thread
From: Johannes Thumshirn @ 2025-10-09 9:57 UTC (permalink / raw)
To: Damien Le Moal, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-btrace@vger.kernel.org,
John Garry, Hannes Reinecke, hch, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 10/9/25 12:41 AM, Damien Le Moal wrote:
> That does seem OK to me. Maybe try and see how it looks ?
> Though the request alloc, insert, dispatch and completion for this BIO will
> still be traced, right ? If these events show correctly that this is a zone
> management command (and which one it is), then we should not need the above.
Yes I /think/ this is enough.
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 14/15] blktrace: trace zone write plugging operations
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
` (12 preceding siblings ...)
2025-09-25 15:02 ` [PATCH v2 13/15] blktrace: trace zone management operations Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 7:31 ` Damien Le Moal
2025-09-25 15:02 ` [PATCH v2 15/15] blktrace: handle BLKTRACESETUP2 ioctl Johannes Thumshirn
14 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Trace zone write plugging operations on block devices.
As tracing of zoned block commands needs the upper 32bit of the widened
64bit action, only add traces to blktrace if user-space has requested
version 2 of the blktrace protocol.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
include/uapi/linux/blktrace_api.h | 5 ++++
kernel/trace/blktrace.c | 39 +++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/include/uapi/linux/blktrace_api.h b/include/uapi/linux/blktrace_api.h
index e4b6fbbc40ee..ab5daa1c1161 100644
--- a/include/uapi/linux/blktrace_api.h
+++ b/include/uapi/linux/blktrace_api.h
@@ -64,6 +64,8 @@ enum blktrace_act {
__BLK_TA_REMAP, /* bio was remapped */
__BLK_TA_ABORT, /* request aborted */
__BLK_TA_DRV_DATA, /* driver-specific binary data */
+ __BLK_TA_ZONE_PLUG, /* zone write plug was plugged */
+ __BLK_TA_ZONE_UNPLUG, /* zone write plug was unplugged */
__BLK_TA_ZONE_MGMT, /* zone management command was issued */
__BLK_TA_CGROUP = 1 << 8, /* from a cgroup*/
};
@@ -103,6 +105,9 @@ enum blktrace_notify {
#define BLK_TA_ZONE_APPEND (__BLK_TA_COMPLETE |\
BLK_TC_ACT2(BLK_TC_ZONE_APPEND))
#define BLK_TA_ZONE_MGMT __BLK_TA_ZONE_MGMT
+#define BLK_TA_ZONE_PLUG (__BLK_TA_ZONE_PLUG | BLK_TC_ACT(BLK_TC_QUEUE))
+#define BLK_TA_ZONE_UNPLUG (__BLK_TA_ZONE_UNPLUG |\
+ BLK_TC_ACT(BLK_TC_QUEUE))
#define BLK_TN_PROCESS (__BLK_TN_PROCESS | BLK_TC_ACT(BLK_TC_NOTIFY))
#define BLK_TN_TIMESTAMP (__BLK_TN_TIMESTAMP | BLK_TC_ACT(BLK_TC_NOTIFY))
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 13424efbb2f6..3e7cd8f46c0c 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -1094,6 +1094,37 @@ static void blk_add_trace_unplug(void *ignore, struct request_queue *q,
rcu_read_unlock();
}
+static void blk_add_trace_zone_plug(void *ignore, struct request_queue *q,
+ unsigned int zno, sector_t sector,
+ unsigned int sectors)
+{
+ struct blk_trace *bt;
+
+ rcu_read_lock();
+ bt = rcu_dereference(q->blk_trace);
+ if (bt && bt->version >= 2)
+ __blk_add_trace(bt, sector, sectors << SECTOR_SHIFT, 0,
+ BLK_TA_ZONE_PLUG, 0, 0, NULL, 0);
+ rcu_read_unlock();
+
+ return;
+}
+
+static void blk_add_trace_zone_unplug(void *ignore, struct request_queue *q,
+ unsigned int zno, sector_t sector,
+ unsigned int sectors)
+{
+ struct blk_trace *bt;
+
+ rcu_read_lock();
+ bt = rcu_dereference(q->blk_trace);
+ if (bt && bt->version >= 2)
+ __blk_add_trace(bt, sector, sectors << SECTOR_SHIFT, 0,
+ BLK_TA_ZONE_UNPLUG, 0, 0, NULL, 0);
+ rcu_read_unlock();
+ return;
+}
+
static void blk_add_trace_split(void *ignore, struct bio *bio, unsigned int pdu)
{
struct request_queue *q = bio->bi_bdev->bd_disk->queue;
@@ -1240,6 +1271,12 @@ static void blk_register_tracepoints(void)
 	ret = register_trace_blkdev_zone_mgmt(blk_add_trace_blkdev_zone_mgmt,
 					      NULL);
 	WARN_ON(ret);
+	ret = register_trace_disk_zone_wplug_add_bio(blk_add_trace_zone_plug,
+						     NULL);
+	WARN_ON(ret);
+	ret = register_trace_blk_zone_wplug_bio(blk_add_trace_zone_unplug,
+						NULL);
+	WARN_ON(ret);
 	ret = register_trace_block_plug(blk_add_trace_plug, NULL);
 	WARN_ON(ret);
 	ret = register_trace_block_unplug(blk_add_trace_unplug, NULL);
@@ -1259,6 +1296,8 @@ static void blk_unregister_tracepoints(void)
 	unregister_trace_block_split(blk_add_trace_split, NULL);
 	unregister_trace_block_unplug(blk_add_trace_unplug, NULL);
 	unregister_trace_block_plug(blk_add_trace_plug, NULL);
+	unregister_trace_blk_zone_wplug_bio(blk_add_trace_zone_unplug, NULL);
+	unregister_trace_disk_zone_wplug_add_bio(blk_add_trace_zone_plug, NULL);
 	unregister_trace_blkdev_zone_mgmt(blk_add_trace_blkdev_zone_mgmt, NULL);
 	unregister_trace_blk_zone_append_update_request_bio(
 		blk_add_trace_zone_update_request, NULL);
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 14/15] blktrace: trace zone write plugging operations
2025-09-25 15:02 ` [PATCH v2 14/15] blktrace: trace zone write plugging operations Johannes Thumshirn
@ 2025-10-01 7:31 ` Damien Le Moal
0 siblings, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 7:31 UTC (permalink / raw)
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Trace zone write plugging operations on block devices.
>
> As tracing of zoned block commands needs the upper 32bit of the widened
> 64bit action, only add traces to blktrace if user-space has requested
> version 2 of the blktrace protocol.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
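For readers following the version gate quoted above, here is a minimal
user-space sketch of why a 64-bit action needs protocol version 2. It is
not kernel code. EXAMPLE_ZONE_ACTION, action_fits_32bit() and
should_emit() are hypothetical names, and the bit position is an
assumption picked only so that the value lies in the upper 32 bits. The
patch itself gates zone events unconditionally on "bt->version >= 2";
the helper below generalizes that idea for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical action bit, placed above bit 31 purely for illustration. */
#define EXAMPLE_ZONE_ACTION ((uint64_t)1 << 40)

/* True when the action value is unchanged after truncation to 32 bits. */
static bool action_fits_32bit(uint64_t action)
{
	return (uint64_t)(uint32_t)action == action;
}

/*
 * Sketch of the gate: an action that needs the upper 32 bits is only
 * emitted when the consumer asked for version 2 or later. Actions that
 * fit in 32 bits are emitted for every version.
 */
static bool should_emit(int version, uint64_t action)
{
	return action_fits_32bit(action) || version >= 2;
}

/* Small self-check exercising both versions. */
static void example_gate_selfcheck(void)
{
	assert(should_emit(1, 0x1));                  /* fits in 32 bits */
	assert(!should_emit(1, EXAMPLE_ZONE_ACTION)); /* would be truncated */
	assert(should_emit(2, EXAMPLE_ZONE_ACTION));  /* widened action */
}
```

Dropping the event for a version 1 consumer avoids handing it an action
whose upper bits were silently cut off.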
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 15/15] blktrace: handle BLKTRACESETUP2 ioctl
2025-09-25 15:02 [PATCH v2 00/15] block: add blktrace support for zoned block device commands Johannes Thumshirn
` (13 preceding siblings ...)
2025-09-25 15:02 ` [PATCH v2 14/15] blktrace: trace zone write plugging operations Johannes Thumshirn
@ 2025-09-25 15:02 ` Johannes Thumshirn
2025-10-01 7:35 ` Damien Le Moal
14 siblings, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2025-09-25 15:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Damien Le Moal, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen,
Johannes Thumshirn
Handle the BLKTRACESETUP2 ioctl, requesting an extended version of the
blktrace protocol from user-space.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
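Note for readers: the argument checks at the top of blk_trace_setup2()
in the diff below can be modelled in user space as follows. This is only
a sketch under stated assumptions. struct example_setup2 and
example_check_setup2() are hypothetical; the struct carries just the two
fields the patch validates, with assumed types, and does not reproduce
struct blk_user_trace_setup2. A NULL pointer stands in for a failing
copy_from_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the fields the patch checks. */
struct example_setup2 {
	uint32_t buf_size;	/* size of one trace buffer (assumed type) */
	uint32_t buf_nr;	/* number of trace buffers (assumed type) */
};

/*
 * Mirrors the order of checks in the patch: a failed copy yields
 * -EFAULT, then a zero buffer size or zero buffer count yields -EINVAL.
 * Nothing is allocated before both checks have passed.
 */
static int example_check_setup2(const struct example_setup2 *req)
{
	if (req == NULL)
		return -EFAULT;	/* stand-in for copy_from_user() failing */
	if (!req->buf_size || !req->buf_nr)
		return -EINVAL;
	return 0;
}

/* Small self-check covering the three outcomes. */
static void example_setup2_selfcheck(void)
{
	struct example_setup2 ok = { .buf_size = 4096, .buf_nr = 4 };
	struct example_setup2 bad = { .buf_size = 0, .buf_nr = 4 };

	assert(example_check_setup2(NULL) == -EFAULT);
	assert(example_check_setup2(&bad) == -EINVAL);
	assert(example_check_setup2(&ok) == 0);
}
```

In the patch, the debugfs mutex is taken only after these checks, so an
invalid request returns without touching queue state.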
block/ioctl.c | 1 +
kernel/trace/blktrace.c | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/block/ioctl.c b/block/ioctl.c
index f7b0006ca45d..e7f83a58c8ae 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -691,6 +691,7 @@ long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	/* Incompatible alignment on i386 */
 	case BLKTRACESETUP:
+	case BLKTRACESETUP2:
 		return blk_trace_ioctl(bdev, cmd, argp);
 	default:
 		break;
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 3e7cd8f46c0c..e16a3dbed527 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -742,6 +742,38 @@ int blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
}
EXPORT_SYMBOL_GPL(blk_trace_setup);
+static int blk_trace_setup2(struct request_queue *q, char *name, dev_t dev,
+			    struct block_device *bdev, char __user *arg)
+{
+	struct blk_user_trace_setup2 buts2;
+	struct blk_trace *bt;
+	int ret;
+
+	ret = copy_from_user(&buts2, arg, sizeof(buts2));
+	if (ret)
+		return -EFAULT;
+
+	if (!buts2.buf_size || !buts2.buf_nr)
+		return -EINVAL;
+
+	mutex_lock(&q->debugfs_mutex);
+	bt = blk_trace_setup_prepare(q, name, dev, buts2.buf_size, buts2.buf_nr,
+				     bdev);
+	if (IS_ERR(bt)) {
+		mutex_unlock(&q->debugfs_mutex);
+		return PTR_ERR(bt);
+	}
+	bt->version = 2;
+	blk_trace_setup_finalize(q, name, bt, &buts2);
+	mutex_unlock(&q->debugfs_mutex);
+
+	if (copy_to_user(arg, &buts2, sizeof(buts2))) {
+		blk_trace_remove(q);
+		return -EFAULT;
+	}
+	return 0;
+}
+
#if defined(CONFIG_COMPAT) && defined(CONFIG_X86_64)
static int compat_blk_trace_setup(struct request_queue *q, char *name,
dev_t dev, struct block_device *bdev,
@@ -833,6 +865,10 @@ int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
 	char b[BDEVNAME_SIZE];
 	switch (cmd) {
+	case BLKTRACESETUP2:
+		snprintf(b, sizeof(b), "%pg", bdev);
+		ret = blk_trace_setup2(q, b, bdev->bd_dev, bdev, arg);
+		break;
 	case BLKTRACESETUP:
 		snprintf(b, sizeof(b), "%pg", bdev);
 		ret = blk_trace_setup(q, b, bdev->bd_dev, bdev, arg);
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 15/15] blktrace: handle BLKTRACESETUP2 ioctl
2025-09-25 15:02 ` [PATCH v2 15/15] blktrace: handle BLKTRACESETUP2 ioctl Johannes Thumshirn
@ 2025-10-01 7:35 ` Damien Le Moal
0 siblings, 0 replies; 43+ messages in thread
From: Damien Le Moal @ 2025-10-01 7:35 UTC (permalink / raw)
To: Johannes Thumshirn, Jens Axboe
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-block,
linux-kernel, linux-trace-kernel, linux-btrace, John Garry,
Hannes Reinecke, Christoph Hellwig, Naohiro Aota,
Shinichiro Kawasaki, Chaitanya Kulkarni, Martin K . Petersen
On 9/26/25 00:02, Johannes Thumshirn wrote:
> Handle the BLKTRACESETUP2 ioctl, requesting an extended version of the
> blktrace protocol from user-space.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 43+ messages in thread