* [PATCH] perf: Extend per event callchain limit to branch stack
From: kan.liang @ 2025-03-10 18:15 UTC
To: peterz, mingo, acme, namhyung, linux-kernel; +Cc: ak, eranian, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Commit 97c79a38cd45 ("perf core: Per event callchain limit")
introduced a per-event term to allow finer tuning of the depth of
callchains to save space.

It should be applied to the branch stack as well. For example, autoFDO
collections require the maximum number of LBR entries, while other
system-wide LBR users may only be interested in the latest few LBRs.
A per-event LBR depth would save space in the perf output buffer.

The patch simply drops the branches that are not of interest; the HW
still collects the maximum number of branches. There may be a
model-specific optimization that can reduce the HW depth in some cases
to reduce the overhead further, but it isn't included in this patch
because it's not useful for all cases. For example, ARCH LBR can utilize
PEBS and XSAVE to collect LBRs, so the depth should have less impact on
the collection overhead. The model-specific optimization may be
implemented separately later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
include/linux/perf_event.h | 3 +++
include/uapi/linux/perf_event.h | 2 ++
2 files changed, 5 insertions(+)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 24f2eba200ac..bca1dfd30276 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1347,6 +1347,9 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,
 
 	if (branch_sample_hw_index(event))
 		size += sizeof(u64);
+
+	brs->nr = min_t(u16, event->attr.sample_max_stack, brs->nr);
+
 	size += brs->nr * sizeof(struct perf_branch_entry);
 
 	/*
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 0524d541d4e3..5fc753c23734 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -385,6 +385,8 @@ enum perf_event_read_format {
  *
  * @sample_max_stack: Max number of frame pointers in a callchain,
  *		      should be < /proc/sys/kernel/perf_event_max_stack
+ *		      Max number of entries of branch stack
+ *		      should be < hardware limit
  */
 struct perf_event_attr {
--
2.38.1
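
The effect of the clamp on the per-sample payload can be sketched in plain user-space C. This is only an illustration, not kernel code: `struct branch_entry`, `clamp_branch_nr()` and `branch_stack_size()` are hypothetical names, the entry is modelled as three 64-bit words, and the size accounting (one u64 for `nr`, one optional u64 for the hardware index) is assumed from the context lines of the hunk above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct perf_branch_entry: from, to, and a
 * 64-bit word of flags (assumed layout, 24 bytes in total). */
struct branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/* Models min_t(u16, sample_max_stack, nr): both operands are converted
 * to 16 bits before the smaller one is taken. */
static uint16_t clamp_branch_nr(uint16_t sample_max_stack, uint64_t hw_nr)
{
	uint16_t nr = (uint16_t)hw_nr;

	return sample_max_stack < nr ? sample_max_stack : nr;
}

/* Bytes the branch stack would add to one sample in this model. */
static size_t branch_stack_size(uint16_t sample_max_stack, uint64_t hw_nr,
				int has_hw_index)
{
	size_t size = sizeof(uint64_t);		/* the nr field */

	if (has_hw_index)
		size += sizeof(uint64_t);	/* optional hardware index */

	size += (size_t)clamp_branch_nr(sample_max_stack, hw_nr) *
		sizeof(struct branch_entry);
	return size;
}
```

In this model, a 32-entry branch stack limited to 8 entries shrinks from 776 to 200 bytes per sample when no hardware index is recorded, which is the output-buffer saving the commit message refers to.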
* Re: [PATCH] perf: Extend per event callchain limit to branch stack
From: Peter Zijlstra @ 2025-03-11 11:40 UTC
To: kan.liang; +Cc: mingo, acme, namhyung, linux-kernel, ak, eranian
On Mon, Mar 10, 2025 at 11:15:36AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Commit 97c79a38cd45 ("perf core: Per event callchain limit")
> introduced a per-event term to allow finer tuning of the depth of
> callchains to save space.
>
> It should be applied to the branch stack as well. For example, autoFDO
> collections require the maximum number of LBR entries, while other
> system-wide LBR users may only be interested in the latest few LBRs.
> A per-event LBR depth would save space in the perf output buffer.
>
> The patch simply drops the branches that are not of interest; the HW
> still collects the maximum number of branches. There may be a
> model-specific optimization that can reduce the HW depth in some cases
> to reduce the overhead further, but it isn't included in this patch
> because it's not useful for all cases. For example, ARCH LBR can utilize
> PEBS and XSAVE to collect LBRs, so the depth should have less impact on
> the collection overhead. The model-specific optimization may be
> implemented separately later.
>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Thanks!
> ---
> include/linux/perf_event.h | 3 +++
> include/uapi/linux/perf_event.h | 2 ++
> 2 files changed, 5 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 24f2eba200ac..bca1dfd30276 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1347,6 +1347,9 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,
>  
>  	if (branch_sample_hw_index(event))
>  		size += sizeof(u64);
> +
> +	brs->nr = min_t(u16, event->attr.sample_max_stack, brs->nr);
> +
>  	size += brs->nr * sizeof(struct perf_branch_entry);
>  
>  	/*
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 0524d541d4e3..5fc753c23734 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -385,6 +385,8 @@ enum perf_event_read_format {
>   *
>   * @sample_max_stack: Max number of frame pointers in a callchain,
>   *		      should be < /proc/sys/kernel/perf_event_max_stack
> + *		      Max number of entries of branch stack
> + *		      should be < hardware limit
>   */
>  struct perf_event_attr {
>
> --
> 2.38.1
>
* [tip: perf/core] perf: Extend per event callchain limit to branch stack
From: tip-bot2 for Kan Liang @ 2025-03-17 10:34 UTC
To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: c53e14f1ea4a8f8ddd9b2cd850fcbc0d934b79f5
Gitweb: https://git.kernel.org/tip/c53e14f1ea4a8f8ddd9b2cd850fcbc0d934b79f5
Author: Kan Liang <kan.liang@linux.intel.com>
AuthorDate: Mon, 10 Mar 2025 11:15:36 -07:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 17 Mar 2025 11:23:36 +01:00
perf: Extend per event callchain limit to branch stack
Commit 97c79a38cd45 ("perf core: Per event callchain limit")
introduced a per-event term to allow finer tuning of the depth of
callchains to save space.

It should be applied to the branch stack as well. For example, autoFDO
collections require the maximum number of LBR entries, while other
system-wide LBR users may only be interested in the latest few LBRs.
A per-event LBR depth would save space in the perf output buffer.

The patch simply drops the branches that are not of interest; the HW
still collects the maximum number of branches. There may be a
model-specific optimization that can reduce the HW depth in some cases
to reduce the overhead further, but it isn't included in this patch
because it's not useful for all cases. For example, ARCH LBR can utilize
PEBS and XSAVE to collect LBRs, so the depth should have less impact on
the collection overhead. The model-specific optimization may be
implemented separately later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250310181536.3645382-1-kan.liang@linux.intel.com
---
include/linux/perf_event.h | 3 +++
include/uapi/linux/perf_event.h | 2 ++
2 files changed, 5 insertions(+)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 76f4265..3e27082 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1347,6 +1347,9 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,
 
 	if (branch_sample_hw_index(event))
 		size += sizeof(u64);
+
+	brs->nr = min_t(u16, event->attr.sample_max_stack, brs->nr);
+
 	size += brs->nr * sizeof(struct perf_branch_entry);
 
 	/*
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 0524d54..5fc753c 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -385,6 +385,8 @@ enum perf_event_read_format {
  *
  * @sample_max_stack: Max number of frame pointers in a callchain,
  *		      should be < /proc/sys/kernel/perf_event_max_stack
+ *		      Max number of entries of branch stack
+ *		      should be < hardware limit
  */
 struct perf_event_attr {
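
From user space, the extended meaning of `sample_max_stack` might be used roughly as follows. This is a minimal sketch rather than anything taken from the patch: `init_lbr_attr()` is a hypothetical helper, and the event type, sample period and branch filter are arbitrary choices made for illustration. It only fills in the attribute; passing it to perf_event_open(2) is left out.

```c
#include <linux/perf_event.h>
#include <string.h>

/* Hypothetical helper: prepare a cycles event that samples the branch
 * stack and asks for at most 'depth' branch entries per sample. */
static void init_lbr_attr(struct perf_event_attr *attr, unsigned short depth)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100000;	/* arbitrary illustrative period */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_ANY;
	/* With this patch the same field also caps the branch stack. */
	attr->sample_max_stack = depth;
}
```

Because the one field limits both the callchain and the branch stack, an event that records both would presumably get the same cap for each.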