* [PATCH v2 15/19] btrfs: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Chris Mason, David Sterba, linux-btrfs, linux-kernel,
linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
fs/btrfs/extent_map.c | 4 ++--
fs/btrfs/raid56.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 095a561d733f0..9284c0a81befb 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -1318,7 +1318,7 @@ static void btrfs_extent_map_shrinker_worker(struct work_struct *work)
if (trace_btrfs_extent_map_shrinker_scan_enter_enabled()) {
s64 nr = percpu_counter_sum_positive(&fs_info->evictable_extent_maps);
- trace_btrfs_extent_map_shrinker_scan_enter(fs_info, nr);
+ trace_call__btrfs_extent_map_shrinker_scan_enter(fs_info, nr);
}
while (ctx.scanned < ctx.nr_to_scan && !btrfs_fs_closing(fs_info)) {
@@ -1358,7 +1358,7 @@ static void btrfs_extent_map_shrinker_worker(struct work_struct *work)
if (trace_btrfs_extent_map_shrinker_scan_exit_enabled()) {
s64 nr = percpu_counter_sum_positive(&fs_info->evictable_extent_maps);
- trace_btrfs_extent_map_shrinker_scan_exit(fs_info, nr_dropped, nr);
+ trace_call__btrfs_extent_map_shrinker_scan_exit(fs_info, nr_dropped, nr);
}
atomic64_set(&fs_info->em_shrinker_nr_to_scan, 0);
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index b4511f560e929..0b5b2432aedf6 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1743,7 +1743,7 @@ static void submit_read_wait_bio_list(struct btrfs_raid_bio *rbio,
struct raid56_bio_trace_info trace_info = { 0 };
bio_get_trace_info(rbio, bio, &trace_info);
- trace_raid56_read(rbio, bio, &trace_info);
+ trace_call__raid56_read(rbio, bio, &trace_info);
}
submit_bio(bio);
}
@@ -2420,7 +2420,7 @@ static void submit_write_bios(struct btrfs_raid_bio *rbio,
struct raid56_bio_trace_info trace_info = { 0 };
bio_get_trace_info(rbio, bio, &trace_info);
- trace_raid56_write(rbio, bio, &trace_info);
+ trace_call__raid56_write(rbio, bio, &trace_info);
}
submit_bio(bio);
}
--
2.53.0
^ permalink raw reply related
* [PATCH v2 16/19] net: devlink: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, netdev, linux-kernel,
linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_devlink_trap_report() with trace_call__devlink_trap_report()
at a site already guarded by tracepoint_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__devlink_trap_report() calls the tracepoint callbacks directly
without utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
---
net/devlink/trap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/devlink/trap.c b/net/devlink/trap.c
index 8edb31654a68a..d54276dcd62fb 100644
--- a/net/devlink/trap.c
+++ b/net/devlink/trap.c
@@ -1497,7 +1497,7 @@ void devlink_trap_report(struct devlink *devlink, struct sk_buff *skb,
devlink_trap_report_metadata_set(&metadata, trap_item,
in_devlink_port, fa_cookie);
- trace_devlink_trap_report(devlink, skb, &metadata);
+ trace_call__devlink_trap_report(devlink, skb, &metadata);
}
}
EXPORT_SYMBOL_GPL(devlink_trap_report);
--
2.53.0
^ permalink raw reply related
* [PATCH v2 14/19] scsi: ufs: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Alim Akhtar, Avri Altman, Bart Van Assche, James E.J. Bottomley,
Martin K. Petersen, Peter Wang, Bean Huo, Adrian Hunter,
Bao D. Nguyen, linux-scsi, linux-kernel, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/ufs/core/ufshcd.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 899e663fea6e8..b27bde8ea7555 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -422,7 +422,7 @@ static void ufshcd_add_cmd_upiu_trace(struct ufs_hba *hba,
else
header = &lrb->ucd_rsp_ptr->header;
- trace_ufshcd_upiu(hba, str_t, header, &rq->sc.cdb,
+ trace_call__ufshcd_upiu(hba, str_t, header, &rq->sc.cdb,
UFS_TSF_CDB);
}
@@ -433,7 +433,7 @@ static void ufshcd_add_query_upiu_trace(struct ufs_hba *hba,
if (!trace_ufshcd_upiu_enabled())
return;
- trace_ufshcd_upiu(hba, str_t, &rq_rsp->header,
+ trace_call__ufshcd_upiu(hba, str_t, &rq_rsp->header,
&rq_rsp->qr, UFS_TSF_OSF);
}
@@ -446,12 +446,12 @@ static void ufshcd_add_tm_upiu_trace(struct ufs_hba *hba, unsigned int tag,
return;
if (str_t == UFS_TM_SEND)
- trace_ufshcd_upiu(hba, str_t,
+ trace_call__ufshcd_upiu(hba, str_t,
&descp->upiu_req.req_header,
&descp->upiu_req.input_param1,
UFS_TSF_TM_INPUT);
else
- trace_ufshcd_upiu(hba, str_t,
+ trace_call__ufshcd_upiu(hba, str_t,
&descp->upiu_rsp.rsp_header,
&descp->upiu_rsp.output_param1,
UFS_TSF_TM_OUTPUT);
@@ -471,7 +471,7 @@ static void ufshcd_add_uic_command_trace(struct ufs_hba *hba,
else
cmd = ufshcd_readl(hba, REG_UIC_COMMAND);
- trace_ufshcd_uic_command(hba, str_t, cmd,
+ trace_call__ufshcd_uic_command(hba, str_t, cmd,
ufshcd_readl(hba, REG_UIC_COMMAND_ARG_1),
ufshcd_readl(hba, REG_UIC_COMMAND_ARG_2),
ufshcd_readl(hba, REG_UIC_COMMAND_ARG_3));
@@ -523,7 +523,7 @@ static void ufshcd_add_command_trace(struct ufs_hba *hba, struct scsi_cmnd *cmd,
} else {
doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
}
- trace_ufshcd_command(cmd->device, hba, str_t, tag, doorbell, hwq_id,
+ trace_call__ufshcd_command(cmd->device, hba, str_t, tag, doorbell, hwq_id,
transfer_len, intr, lba, opcode, group_id);
}
--
2.53.0
^ permalink raw reply related
* [PATCH v2 13/19] spi: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Michael Hennerich, Nuno Sá, David Lechner, Mark Brown,
linux-spi, linux-kernel, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/spi/spi-axi-spi-engine.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi-axi-spi-engine.c b/drivers/spi/spi-axi-spi-engine.c
index 48ee5bf481738..02bbc5d0cfc50 100644
--- a/drivers/spi/spi-axi-spi-engine.c
+++ b/drivers/spi/spi-axi-spi-engine.c
@@ -953,7 +953,7 @@ static int spi_engine_transfer_one_message(struct spi_controller *host,
struct spi_transfer *xfer;
list_for_each_entry(xfer, &msg->transfers, transfer_list)
- trace_spi_transfer_start(msg, xfer);
+ trace_call__spi_transfer_start(msg, xfer);
}
spin_lock_irqsave(&spi_engine->lock, flags);
@@ -987,7 +987,7 @@ static int spi_engine_transfer_one_message(struct spi_controller *host,
struct spi_transfer *xfer;
list_for_each_entry(xfer, &msg->transfers, transfer_list)
- trace_spi_transfer_stop(msg, xfer);
+ trace_call__spi_transfer_stop(msg, xfer);
}
out:
--
2.53.0
^ permalink raw reply related
* [PATCH v2 12/19] i2c: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Wolfram Sang, linux-i2c, linux-kernel, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/i2c/i2c-core-slave.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/i2c/i2c-core-slave.c b/drivers/i2c/i2c-core-slave.c
index 02ca55c2246bc..bebb7ba67e30d 100644
--- a/drivers/i2c/i2c-core-slave.c
+++ b/drivers/i2c/i2c-core-slave.c
@@ -89,7 +89,7 @@ int i2c_slave_event(struct i2c_client *client,
int ret = client->slave_cb(client, event, val);
if (trace_i2c_slave_enabled())
- trace_i2c_slave(client, event, val, ret);
+ trace_call__i2c_slave(client, event, val, ret);
return ret;
}
--
2.53.0
^ permalink raw reply related
* [PATCH v2 11/19] HID: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Srinivas Pandruvada, Jiri Kosina, Benjamin Tissoires, Zhang Lixu,
Andy Shevchenko, linux-input, linux-kernel, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/hid/intel-ish-hid/ipc/pci-ish.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hid/intel-ish-hid/ipc/pci-ish.c b/drivers/hid/intel-ish-hid/ipc/pci-ish.c
index ed3405c05e73c..8d36ae96a3eea 100644
--- a/drivers/hid/intel-ish-hid/ipc/pci-ish.c
+++ b/drivers/hid/intel-ish-hid/ipc/pci-ish.c
@@ -110,7 +110,7 @@ void ish_event_tracer(struct ishtp_device *dev, const char *format, ...)
vsnprintf(tmp_buf, sizeof(tmp_buf), format, args);
va_end(args);
- trace_ishtp_dump(tmp_buf);
+ trace_call__ishtp_dump(tmp_buf);
}
}
--
2.53.0
^ permalink raw reply related
* [PATCH v2 01/19] tracepoint: Add trace_call__##name() API
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
To: Steven Rostedt, Peter Zijlstra, Dmitry Ilvokhin
Cc: Vineeth Pillai (Google), Masami Hiramatsu, Mathieu Desnoyers,
Ingo Molnar, Jens Axboe, io-uring, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Marcelo Ricardo Leitner, Xin Long, Jon Maloy, Aaron Conole,
Eelco Chaudron, Ilya Maximets, netdev, bpf, linux-sctp,
tipc-discussion, dev, Jiri Pirko, Oded Gabbay, Koby Elbaz,
dri-devel, Rafael J. Wysocki, Viresh Kumar, Gautham R. Shenoy,
Huang Rui, Mario Limonciello, Len Brown, Srinivas Pandruvada,
linux-pm, MyungJoo Ham, Kyungmin Park, Chanwoo Choi,
Christian König, Sumit Semwal, linaro-mm-sig, Eddie James,
Andrew Jeffery, Joel Stanley, linux-fsi, David Airlie,
Simona Vetter, Alex Deucher, Danilo Krummrich, Matthew Brost,
Philipp Stanner, Harry Wentland, Leo Li, amd-gfx, Jiri Kosina,
Benjamin Tissoires, linux-input, Wolfram Sang, linux-i2c,
Mark Brown, Michael Hennerich, Nuno Sá, linux-spi,
James E.J. Bottomley, Martin K. Petersen, linux-scsi, Chris Mason,
David Sterba, linux-btrfs, Thomas Gleixner, Andrew Morton,
SeongJae Park, linux-mm, Borislav Petkov, Dave Hansen, x86,
linux-trace-kernel, linux-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Add trace_call__##name() as a companion to trace_##name(). When a
caller already guards a tracepoint with an explicit enabled check:
if (trace_foo_enabled() && cond)
trace_foo(args);
trace_foo() internally repeats the static_branch_unlikely() test, which
the compiler cannot fold since static branches are patched binary
instructions. This results in two static-branch evaluations for every
guarded call site.
trace_call__##name() calls __do_trace_##name() directly, skipping the
redundant static-branch re-check. This avoids leaking the internal
__do_trace_##name() symbol into call sites while still eliminating the
double evaluation:
if (trace_foo_enabled() && cond)
trace_invoke_foo(args); /* calls __do_trace_foo() directly */
Three locations are updated:
- __DECLARE_TRACE: invoke form omits static_branch_unlikely, retains
the LOCKDEP RCU-watching assertion.
- __DECLARE_TRACE_SYSCALL: same, plus retains might_fault().
- !TRACEPOINTS_ENABLED stub: empty no-op so callers compile cleanly
when tracepoints are compiled out.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
include/linux/tracepoint.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 22ca1c8b54f32..ed969705341f1 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -294,6 +294,10 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
WARN_ONCE(!rcu_is_watching(), \
"RCU not watching for tracepoint"); \
} \
+ } \
+ static inline void trace_call__##name(proto) \
+ { \
+ __do_trace_##name(args); \
}
#define __DECLARE_TRACE_SYSCALL(name, proto, args, data_proto) \
@@ -313,6 +317,11 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
WARN_ONCE(!rcu_is_watching(), \
"RCU not watching for tracepoint"); \
} \
+ } \
+ static inline void trace_call__##name(proto) \
+ { \
+ might_fault(); \
+ __do_trace_##name(args); \
}
/*
@@ -398,6 +407,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
#define __DECLARE_TRACE_COMMON(name, proto, args, data_proto) \
static inline void trace_##name(proto) \
{ } \
+ static inline void trace_call__##name(proto) \
+ { } \
static inline int \
register_trace_##name(void (*probe)(data_proto), \
void *data) \
--
2.53.0
^ permalink raw reply related
* [PATCH v2 09/19] fsi: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Eddie James, Ninad Palsule, Joel Stanley, Andrew Jeffery,
linux-fsi, linux-arm-kernel, linux-aspeed, linux-kernel,
linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/fsi/fsi-master-aspeed.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/fsi/fsi-master-aspeed.c b/drivers/fsi/fsi-master-aspeed.c
index aa1380cdff338..8313c7d530eb7 100644
--- a/drivers/fsi/fsi-master-aspeed.c
+++ b/drivers/fsi/fsi-master-aspeed.c
@@ -229,7 +229,7 @@ static int check_errors(struct fsi_master_aspeed *aspeed, int err)
opb_readl(aspeed, ctrl_base + FSI_MSTAP0, &mstap0);
opb_readl(aspeed, ctrl_base + FSI_MESRB0, &mesrb0);
- trace_fsi_master_aspeed_opb_error(
+ trace_call__fsi_master_aspeed_opb_error(
be32_to_cpu(mresp0),
be32_to_cpu(mstap0),
be32_to_cpu(mesrb0));
--
2.53.0
^ permalink raw reply related
* [PATCH v2 08/19] dma-buf: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Sumit Semwal, Christian König, linux-media, dri-devel,
linaro-mm-sig, linux-kernel, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/dma-buf/dma-fence.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 35afcfcac5910..232e92196da43 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -535,7 +535,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
if (trace_dma_fence_wait_start_enabled()) {
rcu_read_lock();
- trace_dma_fence_wait_start(fence);
+ trace_call__dma_fence_wait_start(fence);
rcu_read_unlock();
}
if (fence->ops->wait)
@@ -544,7 +544,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
ret = dma_fence_default_wait(fence, intr, timeout);
if (trace_dma_fence_wait_end_enabled()) {
rcu_read_lock();
- trace_dma_fence_wait_end(fence);
+ trace_call__dma_fence_wait_end(fence);
rcu_read_unlock();
}
return ret;
--
2.53.0
^ permalink raw reply related
* [PATCH v2 07/19] devfreq: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
MyungJoo Ham, Kyungmin Park, Chanwoo Choi, linux-pm, linux-kernel,
linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/devfreq/devfreq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index c0a74091b9041..d1b27d9b753df 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -370,7 +370,7 @@ static int devfreq_set_target(struct devfreq *devfreq, unsigned long new_freq,
* change order of between devfreq device and passive devfreq device.
*/
if (trace_devfreq_frequency_enabled() && new_freq != cur_freq)
- trace_devfreq_frequency(devfreq, new_freq, cur_freq);
+ trace_call__devfreq_frequency(devfreq, new_freq, cur_freq);
freqs.new = new_freq;
devfreq_notify_transition(devfreq, &freqs, DEVFREQ_POSTCHANGE);
--
2.53.0
^ permalink raw reply related
* [PATCH v2 06/19] cpufreq: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Huang Rui, Gautham R. Shenoy, Mario Limonciello, Perry Yuan,
Rafael J. Wysocki, Viresh Kumar, Srinivas Pandruvada, Len Brown,
linux-pm, linux-kernel, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/cpufreq/amd-pstate.c | 10 +++++-----
drivers/cpufreq/cpufreq.c | 2 +-
drivers/cpufreq/intel_pstate.c | 2 +-
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 5aa9fcd80cf51..4c47324aa2f73 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -247,7 +247,7 @@ static int msr_update_perf(struct cpufreq_policy *policy, u8 min_perf,
if (trace_amd_pstate_epp_perf_enabled()) {
union perf_cached perf = READ_ONCE(cpudata->perf);
- trace_amd_pstate_epp_perf(cpudata->cpu,
+ trace_call__amd_pstate_epp_perf(cpudata->cpu,
perf.highest_perf,
epp,
min_perf,
@@ -298,7 +298,7 @@ static int msr_set_epp(struct cpufreq_policy *policy, u8 epp)
if (trace_amd_pstate_epp_perf_enabled()) {
union perf_cached perf = cpudata->perf;
- trace_amd_pstate_epp_perf(cpudata->cpu, perf.highest_perf,
+ trace_call__amd_pstate_epp_perf(cpudata->cpu, perf.highest_perf,
epp,
FIELD_GET(AMD_CPPC_MIN_PERF_MASK,
cpudata->cppc_req_cached),
@@ -343,7 +343,7 @@ static int shmem_set_epp(struct cpufreq_policy *policy, u8 epp)
if (trace_amd_pstate_epp_perf_enabled()) {
union perf_cached perf = cpudata->perf;
- trace_amd_pstate_epp_perf(cpudata->cpu, perf.highest_perf,
+ trace_call__amd_pstate_epp_perf(cpudata->cpu, perf.highest_perf,
epp,
FIELD_GET(AMD_CPPC_MIN_PERF_MASK,
cpudata->cppc_req_cached),
@@ -507,7 +507,7 @@ static int shmem_update_perf(struct cpufreq_policy *policy, u8 min_perf,
if (trace_amd_pstate_epp_perf_enabled()) {
union perf_cached perf = READ_ONCE(cpudata->perf);
- trace_amd_pstate_epp_perf(cpudata->cpu,
+ trace_call__amd_pstate_epp_perf(cpudata->cpu,
perf.highest_perf,
epp,
min_perf,
@@ -588,7 +588,7 @@ static void amd_pstate_update(struct amd_cpudata *cpudata, u8 min_perf,
}
if (trace_amd_pstate_perf_enabled() && amd_pstate_sample(cpudata)) {
- trace_amd_pstate_perf(min_perf, des_perf, max_perf, cpudata->freq,
+ trace_call__amd_pstate_perf(min_perf, des_perf, max_perf, cpudata->freq,
cpudata->cur.mperf, cpudata->cur.aperf, cpudata->cur.tsc,
cpudata->cpu, fast_switch);
}
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 277884d91913c..58901047eae5a 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2222,7 +2222,7 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
if (trace_cpu_frequency_enabled()) {
for_each_cpu(cpu, policy->cpus)
- trace_cpu_frequency(freq, cpu);
+ trace_call__cpu_frequency(freq, cpu);
}
return freq;
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 11c58af419006..70be952209144 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -3132,7 +3132,7 @@ static void intel_cpufreq_trace(struct cpudata *cpu, unsigned int trace_type, in
return;
sample = &cpu->sample;
- trace_pstate_sample(trace_type,
+ trace_call__pstate_sample(trace_type,
0,
old_pstate,
cpu->pstate.current_pstate,
--
2.53.0
^ permalink raw reply related
* [PATCH v2 04/19] net: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
Aaron Conole, Eelco Chaudron, Ilya Maximets,
Marcelo Ricardo Leitner, Xin Long, Jon Maloy, Willem de Bruijn,
Samiullah Khawaja, Hangbin Liu, Kuniyuki Iwashima, netdev,
linux-kernel, bpf, dev, linux-sctp, tipc-discussion,
linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
net/core/dev.c | 2 +-
net/core/xdp.c | 2 +-
net/openvswitch/actions.c | 2 +-
net/openvswitch/datapath.c | 2 +-
net/sctp/outqueue.c | 2 +-
net/tipc/node.c | 2 +-
6 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 14a83f2035b93..f7602b1892fea 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6444,7 +6444,7 @@ void netif_receive_skb_list(struct list_head *head)
return;
if (trace_netif_receive_skb_list_entry_enabled()) {
list_for_each_entry(skb, head, list)
- trace_netif_receive_skb_list_entry(skb);
+ trace_call__netif_receive_skb_list_entry(skb);
}
netif_receive_skb_list_internal(head);
trace_netif_receive_skb_list_exit(0);
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 9890a30584ba7..3003e5c574191 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -362,7 +362,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
xsk_pool_set_rxq_info(allocator, xdp_rxq);
if (trace_mem_connect_enabled() && xdp_alloc)
- trace_mem_connect(xdp_alloc, xdp_rxq);
+ trace_call__mem_connect(xdp_alloc, xdp_rxq);
return 0;
}
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 792ca44a461da..60823de201417 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1259,7 +1259,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
int err = 0;
if (trace_ovs_do_execute_action_enabled())
- trace_ovs_do_execute_action(dp, skb, key, a, rem);
+ trace_call__ovs_do_execute_action(dp, skb, key, a, rem);
/* Actions that rightfully have to consume the skb should do it
* and return directly.
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index e209099218b41..2b9755e2e4731 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -335,7 +335,7 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
int err;
if (trace_ovs_dp_upcall_enabled())
- trace_ovs_dp_upcall(dp, skb, key, upcall_info);
+ trace_call__ovs_dp_upcall(dp, skb, key, upcall_info);
if (upcall_info->portid == 0) {
err = -ENOTCONN;
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index f6b8c13dafa4a..4025d863ffc84 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1267,7 +1267,7 @@ int sctp_outq_sack(struct sctp_outq *q, struct sctp_chunk *chunk)
/* SCTP path tracepoint for congestion control debugging. */
if (trace_sctp_probe_path_enabled()) {
list_for_each_entry(transport, transport_list, transports)
- trace_sctp_probe_path(transport, asoc);
+ trace_call__sctp_probe_path(transport, asoc);
}
sack_ctsn = ntohl(sack->cum_tsn_ack);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index af442a5ef8f3d..5745d6aa0a054 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1943,7 +1943,7 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
if (trace_tipc_node_check_state_enabled()) {
trace_tipc_skb_dump(skb, false, "skb for node state check");
- trace_tipc_node_check_state(n, true, " ");
+ trace_call__tipc_node_check_state(n, true, " ");
}
l = n->links[bearer_id].link;
if (!l)
--
2.53.0
^ permalink raw reply related
* [PATCH v2 05/19] accel/habanalabs: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Koby Elbaz, Konstantin Sinyuk, Oded Gabbay, Kees Cook,
Andy Shevchenko, dri-devel, linux-kernel, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/accel/habanalabs/common/device.c | 12 ++++++------
drivers/accel/habanalabs/common/mmu/mmu.c | 3 ++-
drivers/accel/habanalabs/common/pci/pci.c | 4 ++--
3 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index 09b27bac3a31d..a9ad1edd0bf41 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -132,8 +132,8 @@ static void *hl_dma_alloc_common(struct hl_device *hdev, size_t size, dma_addr_t
}
if (trace_habanalabs_dma_alloc_enabled() && !ZERO_OR_NULL_PTR(ptr))
- trace_habanalabs_dma_alloc(&(hdev)->pdev->dev, (u64) (uintptr_t) ptr, *dma_handle,
- size, caller);
+ trace_call__habanalabs_dma_alloc(&(hdev)->pdev->dev, (u64) (uintptr_t) ptr,
+ *dma_handle, size, caller);
return ptr;
}
@@ -206,7 +206,7 @@ int hl_dma_map_sgtable_caller(struct hl_device *hdev, struct sg_table *sgt,
return 0;
for_each_sgtable_dma_sg(sgt, sg, i)
- trace_habanalabs_dma_map_page(&(hdev)->pdev->dev,
+ trace_call__habanalabs_dma_map_page(&(hdev)->pdev->dev,
page_to_phys(sg_page(sg)),
sg->dma_address - prop->device_dma_offset_for_host_access,
#ifdef CONFIG_NEED_SG_DMA_LENGTH
@@ -249,7 +249,7 @@ void hl_dma_unmap_sgtable_caller(struct hl_device *hdev, struct sg_table *sgt,
if (trace_habanalabs_dma_unmap_page_enabled()) {
for_each_sgtable_dma_sg(sgt, sg, i)
- trace_habanalabs_dma_unmap_page(&(hdev)->pdev->dev,
+ trace_call__habanalabs_dma_unmap_page(&(hdev)->pdev->dev,
page_to_phys(sg_page(sg)),
sg->dma_address - prop->device_dma_offset_for_host_access,
#ifdef CONFIG_NEED_SG_DMA_LENGTH
@@ -2656,7 +2656,7 @@ inline u32 hl_rreg(struct hl_device *hdev, u32 reg)
u32 val = readl(hdev->rmmio + reg);
if (unlikely(trace_habanalabs_rreg32_enabled()))
- trace_habanalabs_rreg32(&(hdev)->pdev->dev, reg, val);
+ trace_call__habanalabs_rreg32(&(hdev)->pdev->dev, reg, val);
return val;
}
@@ -2674,7 +2674,7 @@ inline u32 hl_rreg(struct hl_device *hdev, u32 reg)
inline void hl_wreg(struct hl_device *hdev, u32 reg, u32 val)
{
if (unlikely(trace_habanalabs_wreg32_enabled()))
- trace_habanalabs_wreg32(&(hdev)->pdev->dev, reg, val);
+ trace_call__habanalabs_wreg32(&(hdev)->pdev->dev, reg, val);
writel(val, hdev->rmmio + reg);
}
diff --git a/drivers/accel/habanalabs/common/mmu/mmu.c b/drivers/accel/habanalabs/common/mmu/mmu.c
index 6c7c4ff8a8a95..fdeca1fe5fc78 100644
--- a/drivers/accel/habanalabs/common/mmu/mmu.c
+++ b/drivers/accel/habanalabs/common/mmu/mmu.c
@@ -263,7 +263,8 @@ int hl_mmu_unmap_page(struct hl_ctx *ctx, u64 virt_addr, u32 page_size, bool flu
mmu_funcs->flush(ctx);
if (trace_habanalabs_mmu_unmap_enabled() && !rc)
- trace_habanalabs_mmu_unmap(&hdev->pdev->dev, virt_addr, 0, page_size, flush_pte);
+ trace_call__habanalabs_mmu_unmap(&hdev->pdev->dev, virt_addr,
+ 0, page_size, flush_pte);
return rc;
}
diff --git a/drivers/accel/habanalabs/common/pci/pci.c b/drivers/accel/habanalabs/common/pci/pci.c
index 81cbd8697d4cd..bffbde3c399be 100644
--- a/drivers/accel/habanalabs/common/pci/pci.c
+++ b/drivers/accel/habanalabs/common/pci/pci.c
@@ -123,7 +123,7 @@ int hl_pci_elbi_read(struct hl_device *hdev, u64 addr, u32 *data)
pci_read_config_dword(pdev, mmPCI_CONFIG_ELBI_DATA, data);
if (unlikely(trace_habanalabs_elbi_read_enabled()))
- trace_habanalabs_elbi_read(&hdev->pdev->dev, (u32) addr, val);
+ trace_call__habanalabs_elbi_read(&hdev->pdev->dev, (u32) addr, val);
return 0;
}
@@ -186,7 +186,7 @@ static int hl_pci_elbi_write(struct hl_device *hdev, u64 addr, u32 data)
if ((val & PCI_CONFIG_ELBI_STS_MASK) == PCI_CONFIG_ELBI_STS_DONE) {
if (unlikely(trace_habanalabs_elbi_write_enabled()))
- trace_habanalabs_elbi_write(&hdev->pdev->dev, (u32) addr, val);
+ trace_call__habanalabs_elbi_write(&hdev->pdev->dev, (u32) addr, val);
return 0;
}
--
2.53.0
^ permalink raw reply related
* [PATCH v2 03/19] io_uring: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Jens Axboe, io-uring, linux-kernel, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
io_uring/io_uring.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 0fa844faf2871..e99975ffdda12 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -299,7 +299,7 @@ static __always_inline bool io_fill_cqe_req(struct io_ring_ctx *ctx,
}
if (trace_io_uring_complete_enabled())
- trace_io_uring_complete(req->ctx, req, cqe);
+ trace_call__io_uring_complete(req->ctx, req, cqe);
return true;
}
--
2.53.0
^ permalink raw reply related
* [PATCH v2 02/19] kernel: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Tejun Heo, David Vernet, Andrea Righi, Changwoo Min, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Ben Segall,
Mel Gorman, Valentin Schneider, Thomas Gleixner,
Yury Norov [NVIDIA], Paul E. McKenney, Rik van Riel, Roman Kisel,
Joel Fernandes, Rafael J. Wysocki, Ulf Hansson, linux-kernel,
sched-ext, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
kernel/irq_work.c | 2 +-
kernel/sched/ext.c | 2 +-
kernel/smp.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 73f7e1fd4ab4d..120fd7365fbe2 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -79,7 +79,7 @@ void __weak arch_irq_work_raise(void)
static __always_inline void irq_work_raise(struct irq_work *work)
{
if (trace_ipi_send_cpu_enabled() && arch_irq_work_has_interrupt())
- trace_ipi_send_cpu(smp_processor_id(), _RET_IP_, work->func);
+ trace_call__ipi_send_cpu(smp_processor_id(), _RET_IP_, work->func);
arch_irq_work_raise();
}
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 1594987d637b0..cfbac9cf62f84 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4494,7 +4494,7 @@ static __printf(2, 3) void dump_line(struct seq_buf *s, const char *fmt, ...)
vscnprintf(line_buf, sizeof(line_buf), fmt, args);
va_end(args);
- trace_sched_ext_dump(line_buf);
+ trace_call__sched_ext_dump(line_buf);
}
#endif
/* @s may be zero sized and seq_buf triggers WARN if so */
diff --git a/kernel/smp.c b/kernel/smp.c
index f349960f79cad..537cf1f461d75 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -394,7 +394,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
sched_ttwu_pending : csd->func;
- trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
+ trace_call__csd_queue_cpu(cpu, _RET_IP_, func, csd);
}
/*
--
2.53.0
^ permalink raw reply related
* [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
To: Steven Rostedt, Peter Zijlstra, Dmitry Ilvokhin
Cc: Vineeth Pillai (Google), Masami Hiramatsu, Mathieu Desnoyers,
Ingo Molnar, Jens Axboe, io-uring, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Marcelo Ricardo Leitner, Xin Long, Jon Maloy, Aaron Conole,
Eelco Chaudron, Ilya Maximets, netdev, bpf, linux-sctp,
tipc-discussion, dev, Jiri Pirko, Oded Gabbay, Koby Elbaz,
dri-devel, Rafael J. Wysocki, Viresh Kumar, Gautham R. Shenoy,
Huang Rui, Mario Limonciello, Len Brown, Srinivas Pandruvada,
linux-pm, MyungJoo Ham, Kyungmin Park, Chanwoo Choi,
Christian König, Sumit Semwal, linaro-mm-sig, Eddie James,
Andrew Jeffery, Joel Stanley, linux-fsi, David Airlie,
Simona Vetter, Alex Deucher, Danilo Krummrich, Matthew Brost,
Philipp Stanner, Harry Wentland, Leo Li, amd-gfx, Jiri Kosina,
Benjamin Tissoires, linux-input, Wolfram Sang, linux-i2c,
Mark Brown, Michael Hennerich, Nuno Sá, linux-spi,
James E.J. Bottomley, Martin K. Petersen, linux-scsi, Chris Mason,
David Sterba, linux-btrfs, Thomas Gleixner, Andrew Morton,
SeongJae Park, linux-mm, Borislav Petkov, Dave Hansen, x86,
linux-trace-kernel, linux-kernel
When a caller already guards a tracepoint with an explicit enabled check:
if (trace_foo_enabled() && cond)
trace_foo(args);
trace_foo() internally re-evaluates the static_branch_unlikely() key.
Since static branches are patched binary instructions the compiler cannot
fold the two evaluations, so every such site pays the cost twice.
This series introduces trace_call__##name() as a companion to
trace_##name(). It calls __do_trace_##name() directly, bypassing the
redundant static-branch re-check, while preserving all other correctness
properties of the normal path (RCU-watching assertion, might_fault() for
syscall tracepoints). The internal __do_trace_##name() symbol is not
leaked to call sites; trace_call__##name() is the only new public API.
if (trace_foo_enabled() && cond)
trace_call__foo(args); /* calls __do_trace_foo() directly */
The first patch adds the three-location change to
include/linux/tracepoint.h (__DECLARE_TRACE, __DECLARE_TRACE_SYSCALL,
and the !TRACEPOINTS_ENABLED stub). The remaining 18 patches
mechanically convert all guarded call sites found in the tree:
kernel/, io_uring/, net/, accel/habanalabs, cpufreq/, devfreq/,
dma-buf/, fsi/, drm/, HID, i2c/, spi/, scsi/ufs/, btrfs/,
net/devlink/, kernel/time/, kernel/trace/, mm/damon/, and arch/x86/.
This series is motivated by Peter Zijlstra's observation in the discussion
around Dmitry Ilvokhin's locking tracepoint instrumentation series, where
he noted that compilers cannot optimize static branches and that guarded
call sites end up evaluating the static branch twice for no reason, and
by Steven Rostedt's suggestion to add a proper API instead of exposing
internal implementation details like __do_trace_##name() directly to
call sites:
https://lore.kernel.org/linux-trace-kernel/8298e098d3418cb446ef396f119edac58a3414e9.1772642407.git.d@ilvokhin.com
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Changes in v2:
- Renamed trace_invoke_##name() to trace_call__##name() (double
underscore) per review comments.
- Added 4 new patches covering sites missed in v1, found using
coccinelle to scan the tree (Keith Busch):
* net/devlink: guarded tracepoint_enabled() block in trap.c
* kernel/time: early-return guard in tick-sched.c (tick_stop)
* kernel/trace: early-return guard in trace_benchmark.c
* mm/damon: early-return guard in core.c
* arch/x86: do_trace_*() wrapper functions in lib/msr.c, which
are called exclusively from tracepoint_enabled()-guarded sites
in asm/msr.h
v1: https://lore.kernel.org/linux-trace-kernel/abSqrJ1J59RQC47U@kbusch-mbp/
Vineeth Pillai (Google) (19):
tracepoint: Add trace_call__##name() API
kernel: Use trace_call__##name() at guarded tracepoint call sites
io_uring: Use trace_call__##name() at guarded tracepoint call sites
net: Use trace_call__##name() at guarded tracepoint call sites
accel/habanalabs: Use trace_call__##name() at guarded tracepoint call
sites
cpufreq: Use trace_call__##name() at guarded tracepoint call sites
devfreq: Use trace_call__##name() at guarded tracepoint call sites
dma-buf: Use trace_call__##name() at guarded tracepoint call sites
fsi: Use trace_call__##name() at guarded tracepoint call sites
drm: Use trace_call__##name() at guarded tracepoint call sites
HID: Use trace_call__##name() at guarded tracepoint call sites
i2c: Use trace_call__##name() at guarded tracepoint call sites
spi: Use trace_call__##name() at guarded tracepoint call sites
scsi: ufs: Use trace_call__##name() at guarded tracepoint call sites
btrfs: Use trace_call__##name() at guarded tracepoint call sites
net: devlink: Use trace_call__##name() at guarded tracepoint call
sites
kernel: time, trace: Use trace_call__##name() at guarded tracepoint
call sites
mm: damon: Use trace_call__##name() at guarded tracepoint call sites
x86: msr: Use trace_call__##name() at guarded tracepoint call sites
arch/x86/lib/msr.c | 6 +++---
drivers/accel/habanalabs/common/device.c | 12 ++++++------
drivers/accel/habanalabs/common/mmu/mmu.c | 3 ++-
drivers/accel/habanalabs/common/pci/pci.c | 4 ++--
drivers/cpufreq/amd-pstate.c | 10 +++++-----
drivers/cpufreq/cpufreq.c | 2 +-
drivers/cpufreq/intel_pstate.c | 2 +-
drivers/devfreq/devfreq.c | 2 +-
drivers/dma-buf/dma-fence.c | 4 ++--
drivers/fsi/fsi-master-aspeed.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
drivers/gpu/drm/scheduler/sched_entity.c | 4 ++--
drivers/hid/intel-ish-hid/ipc/pci-ish.c | 2 +-
drivers/i2c/i2c-core-slave.c | 2 +-
drivers/spi/spi-axi-spi-engine.c | 4 ++--
drivers/ufs/core/ufshcd.c | 12 ++++++------
fs/btrfs/extent_map.c | 4 ++--
fs/btrfs/raid56.c | 4 ++--
include/linux/tracepoint.h | 11 +++++++++++
io_uring/io_uring.h | 2 +-
kernel/irq_work.c | 2 +-
kernel/sched/ext.c | 2 +-
kernel/smp.c | 2 +-
kernel/time/tick-sched.c | 12 ++++++------
kernel/trace/trace_benchmark.c | 2 +-
mm/damon/core.c | 2 +-
net/core/dev.c | 2 +-
net/core/xdp.c | 2 +-
net/devlink/trap.c | 2 +-
net/openvswitch/actions.c | 2 +-
net/openvswitch/datapath.c | 2 +-
net/sctp/outqueue.c | 2 +-
net/tipc/node.c | 2 +-
35 files changed, 74 insertions(+), 62 deletions(-)
--
2.53.0
^ permalink raw reply
* Re: [PATCH] tracing: fprobe: fix the length of unused fgraph_data
From: Steven Rostedt @ 2026-03-23 14:48 UTC (permalink / raw)
To: Martin Kaiser
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-trace-kernel,
linux-kernel, stable
In-Reply-To: <20260323102020.239567-1-martin@kaiser.cx>
On Mon, 23 Mar 2026 11:19:36 +0100
Martin Kaiser <martin@kaiser.cx> wrote:
> If fprobe_entry does not fill the allocated fgraph_data completely, the
> unused part is zeroed with memset.
>
> Fix the length for this memset call. Both reserved_words and used are in
> units of return stack words, but memset needs the number of bytes.
>
> Cc: stable@vger.kernel.org
> Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
> Signed-off-by: Martin Kaiser <martin@kaiser.cx>
> ---
> kernel/trace/fprobe.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
> index dcadf1d23b8a..6a1192515afd 100644
> --- a/kernel/trace/fprobe.c
> +++ b/kernel/trace/fprobe.c
> @@ -451,7 +451,7 @@ static int fprobe_fgraph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops
> }
> }
> if (used < reserved_words)
> - memset(fgraph_data + used, 0, reserved_words - used);
> + memset(fgraph_data + used, 0, (reserved_words - used) * sizeof(long));
So fgraph_data is only used internally between the fprobe_fgraph_entry()
and fprobe_return() as it only exists on the fgraph shadow stack. I'm not
even sure if the unused portion needs to be zeroed out.
Thus, this may be correct, but it doesn't look like a true bug that needs a
stable tag.
-- Steve
>
> /* If any exit_handler is set, data must be used. */
> return used != 0;
^ permalink raw reply
* Re: [PATCH 3/3] rtla: Parse cmdline using libsubcmd
From: Tomas Glozar @ 2026-03-23 14:26 UTC (permalink / raw)
To: Costa Shulyupin
Cc: Steven Rostedt, John Kacur, Luis Goncalves, Crystal Wood,
Wander Lairson Costa, Ivan Pravdin, Namhyung Kim, Ian Rogers,
Arnaldo Carvalho de Melo, LKML, linux-trace-kernel,
linux-perf-users
In-Reply-To: <CADDUTFx7eSqQoA0Sv2hJCcpunosWcg3=VkhWQNoFv1iOC=Fc-w@mail.gmail.com>
so 21. 3. 2026 v 17:08 odesílatel Costa Shulyupin
<costa.shul@redhat.com> napsal:
>
> On Fri, 20 Mar 2026 at 17:07, Tomas Glozar <tglozar@redhat.com> wrote:
> >> +#define TIMERLAT_OPT_NANO OPT_CALLBACK('n', "nano", params, NULL, \
> > + "display data in nanoseconds", \
> > + opt_nano_cb)
>
> -n/--nano requires value incorrectly
>
Yes, this is a mistake.
> File: src/cli.c:463
> Cause: TIMERLAT_OPT_NANO used OPT_CALLBACK which expects an argument,
> but -n is a flag.
> Fix: Changed to OPT_CALLBACK_NOOPT:
>
> > + HIST_OPT_NO_IRQ,
> --no-irq clashes with auto-negation of --irq
>
> File: src/cli.c:1042
> Cause: libsubcmd auto-generates --no-X negations for every option.
> --no-irq (histogram boolean) collides with the auto-negation of --irq
> (stop threshold). The first match wins, so --irq was matched first and
> its negation intercepted the call.
> Fix: Moved HIST_OPT_NO_IRQ before RTLA_OPT_STOP('i', "irq", ...) in
> the options array so the explicit --no-irq boolean is found first.
>
Thank you for reporting that, I had no idea libsubcmd does this, I
haven't seen it documented anywhere, and I never used it in either
perf or bpftool.
It is a very strange thing to do for non-boolean options in my
opinion. If I understand the intention correctly, the option is
intended to unset another option specified before it, i.e. this:
$ ./rtla timerlat hist --event=timer --no-event
is supposed to behave the same as:
$ ./rtla timerlat hist
But it doesn't work:
$ ./rtla timerlat hist --event=timer --no-event
Usage: rtla timerlat hist [<options>]
perf fails even harder on this:
$ ./perf version
perf version 7.0.rc2.g61637af4f771
$ ./perf record --event=cycles --no-event
Segmentation fault (core dumped)
I'm thinking about modifying libsubcmd so this can be at least
disabled (not sure if the behavior is even intended in the non-boolean
case). Reshuffling the RTLA option array could work, too, but is
unintuitive from the developer's point of view, and will also force
histogram options to be before threshold options in the help message,
which is not what I wanted.
Tomas
^ permalink raw reply
* Re: [PATCH 3/3] rtla: Parse cmdline using libsubcmd
From: Tomas Glozar @ 2026-03-23 14:15 UTC (permalink / raw)
To: Wander Lairson Costa
Cc: Steven Rostedt, John Kacur, Luis Goncalves, Crystal Wood,
Costa Shulyupin, Ivan Pravdin, Namhyung Kim, Ian Rogers,
Arnaldo Carvalho de Melo, LKML, linux-trace-kernel,
linux-perf-users
In-Reply-To: <ab2D0a32yuwYTO1K@192.168.0.121>
pá 20. 3. 2026 v 18:32 odesílatel Wander Lairson Costa
<wander@redhat.com> napsal:
> [snip]
>
> > -int getopt_auto(int argc, char **argv, const struct option *long_opts);
> > int common_parse_options(int argc, char **argv, struct common_params *common);
>
>
> The function common_parse_options() body was removed, but the declaration remains.
>
> > int common_apply_config(struct osnoise_tool *tool, struct common_params *params);
>
> [snip]
>
Thanks for noticing that, I will fix that in v2 that will be needed either way.
Tomas
^ permalink raw reply
* Re: [PATCH] module/kallsyms: sort function symbols and use binary search
From: Petr Pavlu @ 2026-03-23 13:06 UTC (permalink / raw)
To: Stanislaw Gruszka
Cc: linux-modules, Sami Tolvanen, Luis Chamberlain, linux-kernel,
linux-trace-kernel, live-patching, Daniel Gomez, Aaron Tomlin,
Steven Rostedt, Masami Hiramatsu, Jordan Rome, Viktor Malik
In-Reply-To: <20260317110423.45481-1-stf_xl@wp.pl>
On 3/17/26 12:04 PM, Stanislaw Gruszka wrote:
> Module symbol lookup via find_kallsyms_symbol() performs a linear scan
> over the entire symtab when resolving an address. The number of symbols
> in module symtabs has grown over the years, largely due to additional
> metadata in non-standard sections, making this lookup very slow.
>
> Improve this by separating function symbols during module load, placing
> them at the beginning of the symtab, sorting them by address, and using
> binary search when resolving addresses in module text.
Doesn't considering only function symbols break the expected behavior
with CONFIG_KALLSYMS_ALL=y. For instance, when using kdb, is it still
able to see all symbols in a module? The module loader should be remain
consistent with the main kallsyms code regarding which symbols can be
looked up.
>
> This also should improve times for linear symbol name lookups, as valid
> function symbols are now located at the beginning of the symtab.
>
> The cost of sorting is small relative to module load time. In repeated
> module load tests [1], depending on .config options, this change
> increases load time between 2% and 4%. With cold caches, the difference
> is not measurable, as memory access latency dominates.
>
> The sorting theoretically could be done in compile time, but much more
> complicated as we would have to simulate kernel addresses resolution
> for symbols, and then correct relocation entries. That would be risky
> if get out of sync.
>
> The improvement can be observed when listing ftrace filter functions:
>
> root@nano:~# time cat /sys/kernel/tracing/available_filter_functions | wc -l
> 74908
>
> real 0m1.315s
> user 0m0.000s
> sys 0m1.312s
>
> After:
>
> root@nano:~# time cat /sys/kernel/tracing/available_filter_functions | wc -l
> 74911
>
> real 0m0.167s
> user 0m0.004s
> sys 0m0.175s
>
> (there are three more symbols introduced by the patch)
This looks as a reasonable improvement.
>
> For livepatch modules, the symtab layout is preserved and the existing
> linear search is used. For this case, it should be possible to keep
> the original ELF symtab instead of copying it 1:1, but that is outside
> the scope of this patch.
Livepatch modules are already handled specially by the kallsyms module
code so excluding them from this optimization is probably ok.
However, it might be worth revisiting this exception. I believe that
livepatch support requires the original symbol table for relocations to
remain usable. It might make sense to investigate whether updating the
relocation data with the adjusted symbol indexes would be sensible.
--
Thanks,
Petr
^ permalink raw reply
* [PATCH v11 4/4] ring-buffer: Add persistent ring buffer selftest
From: Masami Hiramatsu (Google) @ 2026-03-23 12:31 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
In-Reply-To: <177426904455.3236905.2079719303397330696.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add a self-destractive test for the persistent ring buffer. This
will invalidate some sub-buffer pages in the persistent ring buffer
when kernel gets panic, and check whether the number of detected
invalid pages and the total entry_bytes are the same as record
after reboot.
This can ensure the kernel correctly recover partially corrupted
persistent ring buffer when boot.
The test only runs on the persistent ring buffer whose name is
"ptracingtest". And user has to fill it up with events before
kernel panics.
To run the test, enable CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
and you have to setup the kernel cmdline;
reserve_mem=20M:2M:trace trace_instance=ptracingtest^traceoff@trace
panic=1
And run following commands after the 1st boot;
cd /sys/kernel/tracing/instances/ptracingtest
echo 1 > tracing_on
echo 1 > events/enable
sleep 3
echo c > /proc/sysrq-trigger
After panic message, the kernel will reboot and run the verification
on the persistent ring buffer, e.g.
Ring buffer meta [2] invalid buffer page detected
Ring buffer meta [2] is from previous boot! (318 pages discarded)
Ring buffer testing [2] invalid pages: PASSED (318/318)
Ring buffer testing [2] entry_bytes: PASSED (1300476/1300476)
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v10:
- Add entry_bytes test.
- Do not compile test code if CONFIG_RING_BUFFER_PERSISTENT_SELFTEST=n.
Changes in v9:
- Test also reader pages.
---
include/linux/ring_buffer.h | 1 +
kernel/trace/Kconfig | 15 +++++++++
kernel/trace/ring_buffer.c | 69 +++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace.c | 4 ++
4 files changed, 89 insertions(+)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index d862fa610270..a92a22c44387 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -238,6 +238,7 @@ int ring_buffer_subbuf_size_get(struct trace_buffer *buffer);
enum ring_buffer_flags {
RB_FL_OVERWRITE = 1 << 0,
+ RB_FL_TESTING = 1 << 1,
};
#ifdef CONFIG_RING_BUFFER
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 49de13cae428..2e6f3b7c6a31 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -1202,6 +1202,21 @@ config RING_BUFFER_VALIDATE_TIME_DELTAS
Only say Y if you understand what this does, and you
still want it enabled. Otherwise say N
+config RING_BUFFER_PERSISTENT_SELFTEST
+ bool "Enable persistent ring buffer selftest"
+ depends on RING_BUFFER
+ help
+ Run a selftest on the persistent ring buffer which names
+ "ptracingtest" (and its backup) when panic_on_reboot by
+ invalidating ring buffer pages.
+ Note that user has to enable events on the persistent ring
+ buffer manually to fill up ring buffers before rebooting.
+ Since this invalidates the data on test target ring buffer,
+ "ptracingtest" persistent ring buffer must not be used for
+ actual tracing, but only for testing.
+
+ If unsure, say N
+
config MMIOTRACE_TEST
tristate "Test module for mmiotrace"
depends on MMIOTRACE && m
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 99f59f1c63e0..31e479143ca0 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -63,6 +63,10 @@ struct ring_buffer_cpu_meta {
unsigned long commit_buffer;
__u32 subbuf_size;
__u32 nr_subbufs;
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
+ __u32 nr_invalid;
+ __u32 entry_bytes;
+#endif
int buffers[];
};
@@ -2106,6 +2110,19 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
pr_info("Ring buffer meta [%d] is from previous boot! (%d pages discarded)\n",
cpu_buffer->cpu, discarded);
+
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
+ if (meta->nr_invalid)
+ pr_info("Ring buffer testing [%d] invalid pages: %s (%d/%d)\n",
+ cpu_buffer->cpu,
+ (discarded == meta->nr_invalid) ? "PASSED" : "FAILED",
+ discarded, meta->nr_invalid);
+ if (meta->entry_bytes)
+ pr_info("Ring buffer testing [%d] entry_bytes: %s (%ld/%ld)\n",
+ cpu_buffer->cpu,
+ (entry_bytes == meta->entry_bytes) ? "PASSED" : "FAILED",
+ (long)entry_bytes, (long)meta->entry_bytes);
+#endif
return;
invalid:
@@ -2508,12 +2525,64 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
kfree(cpu_buffer);
}
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
+static void rb_test_inject_invalid_pages(struct trace_buffer *buffer)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct ring_buffer_cpu_meta *meta;
+ struct buffer_data_page *dpage;
+ u32 entry_bytes = 0;
+ unsigned long ptr;
+ int subbuf_size;
+ int invalid = 0;
+ int cpu;
+ int i;
+
+ if (!(buffer->flags & RB_FL_TESTING))
+ return;
+
+ guard(preempt)();
+ cpu = smp_processor_id();
+
+ cpu_buffer = buffer->buffers[cpu];
+ meta = cpu_buffer->ring_meta;
+ ptr = (unsigned long)rb_subbufs_from_meta(meta);
+ subbuf_size = meta->subbuf_size;
+
+ for (i = 0; i < meta->nr_subbufs; i++) {
+ int idx = meta->buffers[i];
+
+ dpage = (void *)(ptr + idx * subbuf_size);
+ /* Skip unused pages */
+ if (!local_read(&dpage->commit))
+ continue;
+
+ /* Invalidate even pages. */
+ if (!(i & 0x1)) {
+ local_add(subbuf_size + 1, &dpage->commit);
+ invalid++;
+ } else {
+ /* Count total commit bytes. */
+ entry_bytes += local_read(&dpage->commit);
+ }
+ }
+
+ pr_info("Inject invalidated %d pages on CPU%d, total size: %ld\n",
+ invalid, cpu, (long)entry_bytes);
+ meta->nr_invalid = invalid;
+ meta->entry_bytes = entry_bytes;
+}
+#else /* !CONFIG_RING_BUFFER_PERSISTENT_SELFTEST */
+#define rb_test_inject_invalid_pages(buffer) do { } while (0)
+#endif
+
/* Stop recording on a persistent buffer and flush cache if needed. */
static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
{
struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
ring_buffer_record_off(buffer);
+ rb_test_inject_invalid_pages(buffer);
arch_ring_buffer_flush_range(buffer->range_addr_start, buffer->range_addr_end);
return NOTIFY_DONE;
}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a626211ceb9a..561bcc1c9125 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9366,6 +9366,8 @@ static void setup_trace_scratch(struct trace_array *tr,
memset(tscratch, 0, size);
}
+#define TRACE_TEST_PTRACING_NAME "ptracingtest"
+
static int
allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, unsigned long size)
{
@@ -9378,6 +9380,8 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, unsigned
buf->tr = tr;
if (tr->range_addr_start && tr->range_addr_size) {
+ if (!strcmp(tr->name, TRACE_TEST_PTRACING_NAME))
+ rb_flags |= RB_FL_TESTING;
/* Add scratch buffer to handle 128 modules */
buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0,
tr->range_addr_start,
^ permalink raw reply related
* [PATCH v11 3/4] ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
From: Masami Hiramatsu (Google) @ 2026-03-23 12:31 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
In-Reply-To: <177426904455.3236905.2079719303397330696.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Skip invalid sub-buffers when rewinding the persistent ring buffer
instead of stopping the rewinding the ring buffer. The skipped
buffers are cleared.
To ensure the rewinding stops at the unused page, this also clears
buffer_data_page::time_stamp when tracing resets the buffer. This
allows us to identify unused pages and empty pages.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v12:
- Fix build error.
Changes in v11:
- Reset timestamp when the buffer is invalid.
- When rewinding, skip subbuf page if timestamp is wrong and
check timestamp after validating buffer data page.
Changes in v10:
- Newly added.
---
kernel/trace/ring_buffer.c | 76 +++++++++++++++++++++++++-------------------
1 file changed, 43 insertions(+), 33 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index dd4c8cf350df..99f59f1c63e0 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -389,6 +389,7 @@ struct buffer_page {
static void rb_init_page(struct buffer_data_page *bpage)
{
local_set(&bpage->commit, 0);
+ bpage->time_stamp = 0;
}
static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
@@ -1907,12 +1908,14 @@ static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu
return events;
}
-static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu,
+static int rb_validate_buffer(struct buffer_page *bpage, int cpu,
struct ring_buffer_cpu_meta *meta)
{
+ struct buffer_data_page *dpage = bpage->page;
unsigned long long ts;
unsigned long tail;
u64 delta;
+ int ret = -1;
/*
* When a sub-buffer is recovered from a read, the commit value may
@@ -1921,9 +1924,17 @@ static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu,
* subbuf_size is considered invalid.
*/
tail = local_read(&dpage->commit) & ~RB_MISSED_MASK;
- if (tail > meta->subbuf_size)
- return -1;
- return rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
+ if (tail <= meta->subbuf_size)
+ ret = rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
+
+ if (ret < 0) {
+ local_set(&bpage->entries, 0);
+ local_set(&bpage->page->commit, 0);
+ } else {
+ local_set(&bpage->entries, ret);
+ }
+
+ return ret;
}
/* If the meta data has been validated, now validate the events */
@@ -1944,18 +1955,14 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
orig_head = head_page = cpu_buffer->head_page;
/* Do the reader page first */
- ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu, meta);
+ ret = rb_validate_buffer(cpu_buffer->reader_page, cpu_buffer->cpu, meta);
if (ret < 0) {
pr_info("Ring buffer meta [%d] invalid reader page detected\n",
cpu_buffer->cpu);
discarded++;
- /* Instead of discard whole ring buffer, discard only this sub-buffer. */
- local_set(&cpu_buffer->reader_page->entries, 0);
- local_set(&cpu_buffer->reader_page->page->commit, 0);
} else {
entries += ret;
entry_bytes += rb_page_size(cpu_buffer->reader_page);
- local_set(&cpu_buffer->reader_page->entries, ret);
}
ts = head_page->page->time_stamp;
@@ -1974,26 +1981,33 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
if (head_page == cpu_buffer->tail_page)
break;
- /* Ensure the page has older data than head. */
- if (ts < head_page->page->time_stamp)
- break;
-
- ts = head_page->page->time_stamp;
- /* Ensure the page has correct timestamp and some data. */
- if (!ts || rb_page_commit(head_page) == 0)
- break;
-
- /* Stop rewind if the page is invalid. */
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
- if (ret < 0)
+ /* Rewind until unused page (no timestamp, no commit). */
+ if (!head_page->page->time_stamp && rb_page_commit(head_page) == 0)
break;
- /* Recover the number of entries and update stats. */
- local_set(&head_page->entries, ret);
- if (ret)
- local_inc(&cpu_buffer->pages_touched);
- entries += ret;
- entry_bytes += rb_page_commit(head_page);
+ /*
+ * Skip if the page is invalid, or its timestamp is newer than the
+ * previous valid page.
+ */
+ ret = rb_validate_buffer(head_page, cpu_buffer->cpu, meta);
+ if (ret >= 0 && ts < head_page->page->time_stamp) {
+ local_set(&head_page->entries, 0);
+ local_set(&head_page->page->commit, 0);
+ head_page->page->time_stamp = ts;
+ ret = -1;
+ }
+ if (ret < 0) {
+ if (!discarded)
+ pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
+ cpu_buffer->cpu);
+ discarded++;
+ } else {
+ entries += ret;
+ entry_bytes += rb_page_size(head_page);
+ if (ret > 0)
+ local_inc(&cpu_buffer->pages_touched);
+ ts = head_page->page->time_stamp;
+ }
}
if (i)
pr_info("Ring buffer [%d] rewound %d pages\n", cpu_buffer->cpu, i);
@@ -2063,15 +2077,12 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
if (head_page == cpu_buffer->reader_page)
continue;
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
+ ret = rb_validate_buffer(head_page, cpu_buffer->cpu, meta);
if (ret < 0) {
if (!discarded)
pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
cpu_buffer->cpu);
discarded++;
- /* Instead of discard whole ring buffer, discard only this sub-buffer. */
- local_set(&head_page->entries, 0);
- local_set(&head_page->page->commit, 0);
} else {
/* If the buffer has content, update pages_touched */
if (ret)
@@ -2079,7 +2090,6 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
entries += ret;
entry_bytes += rb_page_size(head_page);
- local_set(&head_page->entries, ret);
}
if (head_page == cpu_buffer->commit_page)
break;
@@ -2110,7 +2120,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
/* Reset all the subbuffers */
for (i = 0; i < meta->nr_subbufs - 1; i++, rb_inc_page(&head_page)) {
local_set(&head_page->entries, 0);
- local_set(&head_page->page->commit, 0);
+ rb_init_page(head_page->page);
}
}
^ permalink raw reply related
* [PATCH v11 2/4] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Masami Hiramatsu (Google) @ 2026-03-23 12:31 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
In-Reply-To: <177426904455.3236905.2079719303397330696.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Skip invalid sub-buffers when validating the persistent ring buffer
instead of discarding the entire ring buffer. Only skipped buffers
are invalidated (cleared).
If the cache data in memory fails to be synchronized during a reboot,
the persistent ring buffer may become partially corrupted, but other
sub-buffers may still contain readable event data. Only discard the
subbuffers that are found to be corrupted.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v11:
- Fix a typo.
Changes in v9:
- Add meta->subbuf_size check.
- Fix a typo.
- Handle invalid reader_page case.
Changes in v8:
- Add comment in rb_valudate_buffer()
- Clear the RB_MISSED_* flags in rb_valudate_buffer() instead of
skipping subbuf.
- Remove unused subbuf local variable from rb_cpu_meta_valid().
Changes in v7:
- Combined with Handling RB_MISSED_* flags patch, focus on validation at boot.
- Remove checking subbuffer data when validating metadata, because it should be done
later.
- Do not mark the discarded sub buffer page but just reset it.
Changes in v6:
- Show invalid page detection message once per CPU.
Changes in v5:
- Instead of showing errors for each page, just show the number
of discarded pages at last.
Changes in v3:
- Record missed data event on commit.
---
kernel/trace/ring_buffer.c | 98 ++++++++++++++++++++++++++------------------
1 file changed, 58 insertions(+), 40 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0131ce4e018c..dd4c8cf350df 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -396,6 +396,12 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
return local_read(&bpage->page->commit);
}
+/* Size is determined by what has been committed */
+static __always_inline unsigned int rb_page_size(struct buffer_page *bpage)
+{
+ return rb_page_commit(bpage) & ~RB_MISSED_MASK;
+}
+
static void free_buffer_page(struct buffer_page *bpage)
{
/* Range pages are not to be freed */
@@ -1791,7 +1797,6 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
unsigned long *subbuf_mask)
{
int subbuf_size = PAGE_SIZE;
- struct buffer_data_page *subbuf;
unsigned long buffers_start;
unsigned long buffers_end;
int i;
@@ -1799,6 +1804,11 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
if (!subbuf_mask)
return false;
+ if (meta->subbuf_size != PAGE_SIZE) {
+ pr_info("Ring buffer boot meta [%d] invalid subbuf_size\n", cpu);
+ return false;
+ }
+
buffers_start = meta->first_buffer;
buffers_end = meta->first_buffer + (subbuf_size * meta->nr_subbufs);
@@ -1815,11 +1825,12 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
return false;
}
- subbuf = rb_subbufs_from_meta(meta);
-
bitmap_clear(subbuf_mask, 0, meta->nr_subbufs);
- /* Is the meta buffers and the subbufs themselves have correct data? */
+ /*
+ * Ensure the meta::buffers array has correct data. The data in each subbufs
+ * are checked later in rb_meta_validate_events().
+ */
for (i = 0; i < meta->nr_subbufs; i++) {
if (meta->buffers[i] < 0 ||
meta->buffers[i] >= meta->nr_subbufs) {
@@ -1827,18 +1838,12 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
return false;
}
- if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
- pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu);
- return false;
- }
-
if (test_bit(meta->buffers[i], subbuf_mask)) {
pr_info("Ring buffer boot meta [%d] array has duplicates\n", cpu);
return false;
}
set_bit(meta->buffers[i], subbuf_mask);
- subbuf = (void *)subbuf + subbuf_size;
}
return true;
@@ -1902,13 +1907,22 @@ static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu
return events;
}
-static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu)
+static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu,
+ struct ring_buffer_cpu_meta *meta)
{
unsigned long long ts;
+ unsigned long tail;
u64 delta;
- int tail;
- tail = local_read(&dpage->commit);
+ /*
+ * When a sub-buffer is recovered from a read, the commit value may
+ * have RB_MISSED_* bits set, as these bits are reset on reuse.
+ * Even after clearing these bits, a commit value greater than the
+ * subbuf_size is considered invalid.
+ */
+ tail = local_read(&dpage->commit) & ~RB_MISSED_MASK;
+ if (tail > meta->subbuf_size)
+ return -1;
return rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
}
@@ -1919,6 +1933,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
struct buffer_page *head_page, *orig_head;
unsigned long entry_bytes = 0;
unsigned long entries = 0;
+ int discarded = 0;
int ret;
u64 ts;
int i;
@@ -1929,14 +1944,19 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
orig_head = head_page = cpu_buffer->head_page;
/* Do the reader page first */
- ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu);
+ ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu, meta);
if (ret < 0) {
- pr_info("Ring buffer reader page is invalid\n");
- goto invalid;
+ pr_info("Ring buffer meta [%d] invalid reader page detected\n",
+ cpu_buffer->cpu);
+ discarded++;
+ /* Instead of discard whole ring buffer, discard only this sub-buffer. */
+ local_set(&cpu_buffer->reader_page->entries, 0);
+ local_set(&cpu_buffer->reader_page->page->commit, 0);
+ } else {
+ entries += ret;
+ entry_bytes += rb_page_size(cpu_buffer->reader_page);
+ local_set(&cpu_buffer->reader_page->entries, ret);
}
- entries += ret;
- entry_bytes += local_read(&cpu_buffer->reader_page->page->commit);
- local_set(&cpu_buffer->reader_page->entries, ret);
ts = head_page->page->time_stamp;
@@ -1964,7 +1984,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
break;
/* Stop rewind if the page is invalid. */
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+ ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
if (ret < 0)
break;
@@ -2043,21 +2063,24 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
if (head_page == cpu_buffer->reader_page)
continue;
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+ ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
if (ret < 0) {
- pr_info("Ring buffer meta [%d] invalid buffer page\n",
- cpu_buffer->cpu);
- goto invalid;
- }
-
- /* If the buffer has content, update pages_touched */
- if (ret)
- local_inc(&cpu_buffer->pages_touched);
-
- entries += ret;
- entry_bytes += local_read(&head_page->page->commit);
- local_set(&head_page->entries, ret);
+ if (!discarded)
+ pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
+ cpu_buffer->cpu);
+ discarded++;
+ /* Instead of discard whole ring buffer, discard only this sub-buffer. */
+ local_set(&head_page->entries, 0);
+ local_set(&head_page->page->commit, 0);
+ } else {
+ /* If the buffer has content, update pages_touched */
+ if (ret)
+ local_inc(&cpu_buffer->pages_touched);
+ entries += ret;
+ entry_bytes += rb_page_size(head_page);
+ local_set(&head_page->entries, ret);
+ }
if (head_page == cpu_buffer->commit_page)
break;
}
@@ -2071,7 +2094,8 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
local_set(&cpu_buffer->entries, entries);
local_set(&cpu_buffer->entries_bytes, entry_bytes);
- pr_info("Ring buffer meta [%d] is from previous boot!\n", cpu_buffer->cpu);
+ pr_info("Ring buffer meta [%d] is from previous boot! (%d pages discarded)\n",
+ cpu_buffer->cpu, discarded);
return;
invalid:
@@ -3258,12 +3282,6 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
return NULL;
}
-/* Size is determined by what has been committed */
-static __always_inline unsigned rb_page_size(struct buffer_page *bpage)
-{
- return rb_page_commit(bpage) & ~RB_MISSED_MASK;
-}
-
static __always_inline unsigned
rb_commit_index(struct ring_buffer_per_cpu *cpu_buffer)
{
^ permalink raw reply related
* [PATCH v11 1/4] ring-buffer: Flush and stop persistent ring buffer on panic
From: Masami Hiramatsu (Google) @ 2026-03-23 12:30 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
In-Reply-To: <177426904455.3236905.2079719303397330696.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
On real hardware, panic and machine reboot may not flush hardware cache
to memory. This means the persistent ring buffer, which relies on a
coherent state of memory, may not have its events written to the buffer
and they may be lost. Moreover, there may be inconsistency with the
counters which are used for validation of the integrity of the
persistent ring buffer which may cause all data to be discarded.
To avoid this issue, stop recording of the ring buffer on panic and
flush the cache of the ring buffer's memory.
Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v11:
- Do nothing by default since flush_cache_vmap() does nothing on x86
but it can cause deadlock on some architectures via on_each_cpu()
because other CPUs will be stoppped when panic notifier is called.
Changes in v9:
- Fix typo of & to &&.
- Fix typo of "Generic"
Changes in v6:
- Introduce asm/ring_buffer.h for arch_ring_buffer_flush_range().
- Use flush_cache_vmap() instead of flush_cache_all().
Changes in v5:
- Use ring_buffer_record_off() instead of ring_buffer_record_disable().
- Use flush_cache_all() to ensure flush all cache.
Changes in v3:
- update patch description.
---
arch/alpha/include/asm/Kbuild | 1 +
arch/arc/include/asm/Kbuild | 1 +
arch/arm/include/asm/Kbuild | 1 +
arch/arm64/include/asm/ring_buffer.h | 10 ++++++++++
arch/csky/include/asm/Kbuild | 1 +
arch/hexagon/include/asm/Kbuild | 1 +
arch/loongarch/include/asm/Kbuild | 1 +
arch/m68k/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/Kbuild | 1 +
arch/mips/include/asm/Kbuild | 1 +
arch/nios2/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/Kbuild | 1 +
arch/parisc/include/asm/Kbuild | 1 +
arch/powerpc/include/asm/Kbuild | 1 +
arch/riscv/include/asm/Kbuild | 1 +
arch/s390/include/asm/Kbuild | 1 +
arch/sh/include/asm/Kbuild | 1 +
arch/sparc/include/asm/Kbuild | 1 +
arch/um/include/asm/Kbuild | 1 +
arch/x86/include/asm/Kbuild | 1 +
arch/xtensa/include/asm/Kbuild | 1 +
include/asm-generic/ring_buffer.h | 13 +++++++++++++
kernel/trace/ring_buffer.c | 22 ++++++++++++++++++++++
23 files changed, 65 insertions(+)
create mode 100644 arch/arm64/include/asm/ring_buffer.h
create mode 100644 include/asm-generic/ring_buffer.h
diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 483965c5a4de..b154b4e3dfa8 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -5,4 +5,5 @@ generic-y += agp.h
generic-y += asm-offsets.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 4c69522e0328..483caacc6988 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -5,5 +5,6 @@ generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 03657ff8fbe3..decad5f2c826 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -3,6 +3,7 @@ generic-y += early_ioremap.h
generic-y += extable.h
generic-y += flat.h
generic-y += parport.h
+generic-y += ring_buffer.h
generated-y += mach-types.h
generated-y += unistd-nr.h
diff --git a/arch/arm64/include/asm/ring_buffer.h b/arch/arm64/include/asm/ring_buffer.h
new file mode 100644
index 000000000000..62316c406888
--- /dev/null
+++ b/arch/arm64/include/asm/ring_buffer.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_ARM64_RING_BUFFER_H
+#define _ASM_ARM64_RING_BUFFER_H
+
+#include <asm/cacheflush.h>
+
+/* Flush D-cache on persistent ring buffer */
+#define arch_ring_buffer_flush_range(start, end) dcache_clean_pop(start, end)
+
+#endif /* _ASM_ARM64_RING_BUFFER_H */
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 3a5c7f6e5aac..7dca0c6cdc84 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -9,6 +9,7 @@ generic-y += qrwlock.h
generic-y += qrwlock_types.h
generic-y += qspinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += vmlinux.lds.h
generic-y += text-patching.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 1efa1e993d4b..0f887d4238ed 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -5,4 +5,5 @@ generic-y += extable.h
generic-y += iomap.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
index 9034b583a88a..7e92957baf6a 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -10,5 +10,6 @@ generic-y += qrwlock.h
generic-y += user.h
generic-y += ioctl.h
generic-y += mmzone.h
+generic-y += ring_buffer.h
generic-y += statfs.h
generic-y += text-patching.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index b282e0dd8dc1..62543bf305ff 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -3,5 +3,6 @@ generated-y += syscall_table.h
generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += spinlock.h
generic-y += text-patching.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index 7178f990e8b3..0030309b47ad 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += syscalls.h
generic-y += tlb.h
generic-y += user.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 684569b2ecd6..9771c3d85074 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -12,5 +12,6 @@ generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 28004301c236..0a2530964413 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += cmpxchg.h
generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += spinlock.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index cef49d60d74c..8aa34621702d 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -8,4 +8,5 @@ generic-y += spinlock_types.h
generic-y += spinlock.h
generic-y += qrwlock_types.h
generic-y += qrwlock.h
+generic-y += ring_buffer.h
generic-y += user.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 4fb596d94c89..d48d158f7241 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -4,4 +4,5 @@ generated-y += syscall_table_64.h
generic-y += agp.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 2e23533b67e3..805b5aeebb6f 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -5,4 +5,5 @@ generated-y += syscall_table_spu.h
generic-y += agp.h
generic-y += mcs_spinlock.h
generic-y += qrwlock.h
+generic-y += ring_buffer.h
generic-y += early_ioremap.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index bd5fc9403295..7721b63642f4 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -14,5 +14,6 @@ generic-y += ticket_spinlock.h
generic-y += qrwlock.h
generic-y += qrwlock_types.h
generic-y += qspinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += vmlinux.lds.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 80bad7de7a04..0c1fc47c3ba0 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -7,3 +7,4 @@ generated-y += unistd_nr.h
generic-y += asm-offsets.h
generic-y += mcs_spinlock.h
generic-y += mmzone.h
+generic-y += ring_buffer.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 4d3f10ed8275..f0403d3ee8ab 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -3,4 +3,5 @@ generated-y += syscall_table.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 17ee8a273aa6..49c6bb326b75 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -4,4 +4,5 @@ generated-y += syscall_table_64.h
generic-y += agp.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 1b9b82bbe322..2a1629ba8140 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -17,6 +17,7 @@ generic-y += module.lds.h
generic-y += parport.h
generic-y += percpu.h
generic-y += preempt.h
+generic-y += ring_buffer.h
generic-y += runtime-const.h
generic-y += softirq_stack.h
generic-y += switch_to.h
diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index 4566000e15c4..078fd2c0d69d 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -14,3 +14,4 @@ generic-y += early_ioremap.h
generic-y += fprobe.h
generic-y += mcs_spinlock.h
generic-y += mmzone.h
+generic-y += ring_buffer.h
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index 13fe45dea296..e57af619263a 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -6,5 +6,6 @@ generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/include/asm-generic/ring_buffer.h b/include/asm-generic/ring_buffer.h
new file mode 100644
index 000000000000..201d2aee1005
--- /dev/null
+++ b/include/asm-generic/ring_buffer.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generic arch dependent ring_buffer macros.
+ */
+#ifndef __ASM_GENERIC_RING_BUFFER_H__
+#define __ASM_GENERIC_RING_BUFFER_H__
+
+#include <linux/cacheflush.h>
+
+/* Flush cache on ring buffer range if needed. Do nothing by default. */
+#define arch_ring_buffer_flush_range(start, end) do { } while (0)
+
+#endif /* __ASM_GENERIC_RING_BUFFER_H__ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 170170bd83bd..0131ce4e018c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6,6 +6,7 @@
*/
#include <linux/sched/isolation.h>
#include <linux/trace_recursion.h>
+#include <linux/panic_notifier.h>
#include <linux/trace_events.h>
#include <linux/ring_buffer.h>
#include <linux/trace_clock.h>
@@ -30,6 +31,7 @@
#include <linux/oom.h>
#include <linux/mm.h>
+#include <asm/ring_buffer.h>
#include <asm/local64.h>
#include <asm/local.h>
#include <asm/setup.h>
@@ -589,6 +591,7 @@ struct trace_buffer {
unsigned long range_addr_start;
unsigned long range_addr_end;
+ struct notifier_block flush_nb;
struct ring_buffer_meta *meta;
@@ -2471,6 +2474,16 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
kfree(cpu_buffer);
}
+/* Stop recording on a persistent buffer and flush cache if needed. */
+static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
+{
+ struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
+
+ ring_buffer_record_off(buffer);
+ arch_ring_buffer_flush_range(buffer->range_addr_start, buffer->range_addr_end);
+ return NOTIFY_DONE;
+}
+
static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
int order, unsigned long start,
unsigned long end,
@@ -2590,6 +2603,12 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
mutex_init(&buffer->mutex);
+ /* Persistent ring buffer needs to flush cache before reboot. */
+ if (start && end) {
+ buffer->flush_nb.notifier_call = rb_flush_buffer_cb;
+ atomic_notifier_chain_register(&panic_notifier_list, &buffer->flush_nb);
+ }
+
return_ptr(buffer);
fail_free_buffers:
@@ -2677,6 +2696,9 @@ ring_buffer_free(struct trace_buffer *buffer)
{
int cpu;
+ if (buffer->range_addr_start && buffer->range_addr_end)
+ atomic_notifier_chain_unregister(&panic_notifier_list, &buffer->flush_nb);
+
cpuhp_state_remove_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
irq_work_sync(&buffer->irq_work.work);
^ permalink raw reply related
* [PATCH v11 0/4] ring-buffer: Making persistent ring buffers robust
From: Masami Hiramatsu (Google) @ 2026-03-23 12:30 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
Hi,
Here is the 12th version of improvement patches for making persistent
ring buffers robust to failures.
The previous version is here:
https://lore.kernel.org/all/177391152793.193994.8986943289250629418.stgit@mhiramat.tok.corp.google.com/
In this version, I rebased it on tracing/fixes branch (thus
the first fix patch has been removed) and fixed
a build error which came from a copy-paste mistake.
Thank you,
---
Masami Hiramatsu (Google) (4):
ring-buffer: Flush and stop persistent ring buffer on panic
ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
ring-buffer: Add persistent ring buffer selftest
arch/alpha/include/asm/Kbuild | 1
arch/arc/include/asm/Kbuild | 1
arch/arm/include/asm/Kbuild | 1
arch/arm64/include/asm/ring_buffer.h | 10 +
arch/csky/include/asm/Kbuild | 1
arch/hexagon/include/asm/Kbuild | 1
arch/loongarch/include/asm/Kbuild | 1
arch/m68k/include/asm/Kbuild | 1
arch/microblaze/include/asm/Kbuild | 1
arch/mips/include/asm/Kbuild | 1
arch/nios2/include/asm/Kbuild | 1
arch/openrisc/include/asm/Kbuild | 1
arch/parisc/include/asm/Kbuild | 1
arch/powerpc/include/asm/Kbuild | 1
arch/riscv/include/asm/Kbuild | 1
arch/s390/include/asm/Kbuild | 1
arch/sh/include/asm/Kbuild | 1
arch/sparc/include/asm/Kbuild | 1
arch/um/include/asm/Kbuild | 1
arch/x86/include/asm/Kbuild | 1
arch/xtensa/include/asm/Kbuild | 1
include/asm-generic/ring_buffer.h | 13 ++
include/linux/ring_buffer.h | 1
kernel/trace/Kconfig | 15 ++
kernel/trace/ring_buffer.c | 237 ++++++++++++++++++++++++++--------
kernel/trace/trace.c | 4 +
26 files changed, 241 insertions(+), 59 deletions(-)
create mode 100644 arch/arm64/include/asm/ring_buffer.h
create mode 100644 include/asm-generic/ring_buffer.h
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox