* [PATCH 0/2] perf,coresight: Reduce fragmentation with non-contiguous AUX pages for cs_etm
@ 2025-04-21 21:58 Yabin Cui
2025-04-21 21:58 ` [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability Yabin Cui
2025-04-21 21:58 ` [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU Yabin Cui
0 siblings, 2 replies; 14+ messages in thread
From: Yabin Cui @ 2025-04-21 21:58 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang Kan
Cc: coresight, linux-arm-kernel, linux-kernel, linux-perf-users,
Yabin Cui
Hi perf maintainers,
Hi coresight maintainers,
This patch set (2 patches) addresses memory fragmentation caused by
contiguous AUX buffer allocation for the cs_etm PMU on Android.
The cs_etm PMU doesn't need contiguous AUX pages, yet perf always allocates
contiguous AUX pages based on aux_watermark. So repeated use of cs_etm
with large buffers leads to memory fragmentation, negatively impacting
other processes.
This solution introduces PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES to allow
cs_etm to request non-contiguous AUX buffers, avoiding high-order page
allocations and reducing fragmentation.
This aims to reduce memory fragmentation for Android devices when using
cs_etm. Your review is appreciated.
Thanks,
Yabin
Yabin Cui (2):
perf: Allow non-contiguous AUX buffer pages via PMU capability
coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU
drivers/hwtracing/coresight/coresight-etm-perf.c | 3 ++-
include/linux/perf_event.h | 1 +
kernel/events/ring_buffer.c | 6 ++++++
3 files changed, 9 insertions(+), 1 deletion(-)
--
2.49.0.805.g082f7c87e0-goog
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability
2025-04-21 21:58 [PATCH 0/2] perf,coresight: Reduce fragmentation with non-contiguous AUX pages for cs_etm Yabin Cui
@ 2025-04-21 21:58 ` Yabin Cui
2025-04-22 10:21 ` James Clark
2025-04-21 21:58 ` [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU Yabin Cui
1 sibling, 1 reply; 14+ messages in thread
From: Yabin Cui @ 2025-04-21 21:58 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang Kan
Cc: coresight, linux-arm-kernel, linux-kernel, linux-perf-users,
Yabin Cui
For PMUs like ARM ETM/ETE, contiguous AUX buffers are unnecessary
and increase memory fragmentation.
This patch introduces PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES, allowing
PMUs to request non-contiguous pages for their AUX buffers.
Signed-off-by: Yabin Cui <yabinc@google.com>
---
include/linux/perf_event.h | 1 +
kernel/events/ring_buffer.c | 6 ++++++
2 files changed, 7 insertions(+)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0069ba6866a4..26ca35d6a9f2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -301,6 +301,7 @@ struct perf_event_pmu_context;
#define PERF_PMU_CAP_AUX_OUTPUT 0x0080
#define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
#define PERF_PMU_CAP_AUX_PAUSE 0x0200
+#define PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES 0x0400
/**
* pmu::scope
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 5130b119d0ae..87f42f4e8edc 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -710,6 +710,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
max_order = ilog2(nr_pages);
watermark = 0;
}
+ /*
+ * When the PMU doesn't prefer contiguous AUX buffer pages, favor
+ * low-order allocations to reduce memory fragmentation.
+ */
+ if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES)
+ max_order = 0;
/*
* kcalloc_node() is unable to allocate buffer if the size is larger
--
2.49.0.805.g082f7c87e0-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
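For readers following the thread, the effect of the new check can be modeled in userspace. This is only a sketch, not the kernel code: the model_* names and the 4 KiB page size are assumptions made for illustration, and the non-overwrite branch (default watermark of half the buffer, order derived from the watermark) is an assumption based on the cover letter's statement that the allocation is sized from aux_watermark. Only the flag value 0x0400 and the ilog2(nr_pages) branch come from the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag value taken from the patch; everything else is a userspace model. */
#define PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES 0x0400

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* floor(log2(n)) for n > 0, standing in for the kernel's ilog2(). */
int model_ilog2(unsigned long n)
{
	int order = -1;

	assert(n > 0);
	while (n) {
		n >>= 1;
		order++;
	}
	return order;
}

/* Smallest order whose chunk covers `bytes`, standing in for get_order(). */
int model_get_order(unsigned long bytes)
{
	unsigned long pages =
		(bytes + (1UL << MODEL_PAGE_SHIFT) - 1) >> MODEL_PAGE_SHIFT;
	int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}

/* Pick the largest chunk order the allocator may try for an AUX buffer. */
int model_aux_max_order(unsigned int pmu_caps, unsigned long nr_pages,
			unsigned long watermark_bytes, bool overwrite)
{
	int max_order;

	if (overwrite) {
		/* Branch visible in the diff context. */
		max_order = model_ilog2(nr_pages);
	} else {
		/* Assumed default: half the buffer when no watermark is set. */
		if (!watermark_bytes)
			watermark_bytes = nr_pages << (MODEL_PAGE_SHIFT - 1);
		max_order = model_get_order(watermark_bytes);
	}

	/* The check added by this patch: force single-page allocations. */
	if (pmu_caps & PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES)
		max_order = 0;

	return max_order;
}
```

Under these assumptions, a 1024-page buffer with no explicit watermark yields order 9 without the flag and order 0 with it, for example model_aux_max_order(PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES, 1024, 0, false).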
* [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU
2025-04-21 21:58 [PATCH 0/2] perf,coresight: Reduce fragmentation with non-contiguous AUX pages for cs_etm Yabin Cui
2025-04-21 21:58 ` [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability Yabin Cui
@ 2025-04-21 21:58 ` Yabin Cui
2025-04-22 14:21 ` Leo Yan
1 sibling, 1 reply; 14+ messages in thread
From: Yabin Cui @ 2025-04-21 21:58 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang Kan
Cc: coresight, linux-arm-kernel, linux-kernel, linux-perf-users,
Yabin Cui
The cs_etm PMU, regardless of the underlying trace sink (ETF, ETR or
TRBE), doesn't require contiguous pages for its AUX buffer.
This patch adds the PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES capability
to the cs_etm PMU. This allows the kernel to allocate non-contiguous
pages for the AUX buffer, reducing memory fragmentation when using
cs_etm.
Signed-off-by: Yabin Cui <yabinc@google.com>
---
drivers/hwtracing/coresight/coresight-etm-perf.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index f4cccd68e625..c98646eca7f8 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -899,7 +899,8 @@ int __init etm_perf_init(void)
int ret;
etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE |
- PERF_PMU_CAP_ITRACE);
+ PERF_PMU_CAP_ITRACE |
+ PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES);
etm_pmu.attr_groups = etm_pmu_attr_groups;
etm_pmu.task_ctx_nr = perf_sw_context;
--
2.49.0.805.g082f7c87e0-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability
2025-04-21 21:58 ` [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability Yabin Cui
@ 2025-04-22 10:21 ` James Clark
2025-04-22 12:49 ` Ingo Molnar
0 siblings, 1 reply; 14+ messages in thread
From: James Clark @ 2025-04-22 10:21 UTC (permalink / raw)
To: Yabin Cui
Cc: coresight, linux-arm-kernel, linux-kernel, linux-perf-users,
Suzuki K Poulose, Mike Leach, Alexander Shishkin, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan
On 21/04/2025 10:58 pm, Yabin Cui wrote:
> For PMUs like ARM ETM/ETE, contiguous AUX buffers are unnecessary
> and increase memory fragmentation.
>
> This patch introduces PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES, allowing
> PMUs to request non-contiguous pages for their AUX buffers.
>
> Signed-off-by: Yabin Cui <yabinc@google.com>
> ---
> include/linux/perf_event.h | 1 +
> kernel/events/ring_buffer.c | 6 ++++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0069ba6866a4..26ca35d6a9f2 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -301,6 +301,7 @@ struct perf_event_pmu_context;
> #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> +#define PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES 0x0400
>
> /**
> * pmu::scope
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5130b119d0ae..87f42f4e8edc 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -710,6 +710,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
> max_order = ilog2(nr_pages);
> watermark = 0;
> }
> + /*
> + * When the PMU doesn't prefer contiguous AUX buffer pages, favor
> + * low-order allocations to reduce memory fragmentation.
> + */
> + if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES)
> + max_order = 0;
>
> /*
> * kcalloc_node() is unable to allocate buffer if the size is larger
Hi Yabin,
I was wondering if this is just the opposite of PERF_PMU_CAP_AUX_NO_SG,
and that order 0 should be used by default for all devices to solve the
issue you describe. Because we already have PERF_PMU_CAP_AUX_NO_SG for
devices that need contiguous pages. Then I found commit 5768402fd9c6
("perf/ring_buffer: Use high order allocations for AUX buffers
optimistically") that explains that the current allocation strategy is
an optimization.
Your change seems to decide that for certain devices we want to optimize
for fragmentation rather than performance. If these are rarely used
features, used specifically when looking at performance, should we not
continue to optimize for performance? Or at least make it user configurable?
Thanks
James
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability
2025-04-22 10:21 ` James Clark
@ 2025-04-22 12:49 ` Ingo Molnar
2025-04-22 14:10 ` Leo Yan
0 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2025-04-22 12:49 UTC (permalink / raw)
To: James Clark
Cc: Yabin Cui, coresight, linux-arm-kernel, linux-kernel,
linux-perf-users, Suzuki K Poulose, Mike Leach,
Alexander Shishkin, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
Ian Rogers, Adrian Hunter, Liang Kan
* James Clark <james.clark@linaro.org> wrote:
>
>
> On 21/04/2025 10:58 pm, Yabin Cui wrote:
> > For PMUs like ARM ETM/ETE, contiguous AUX buffers are unnecessary
> > and increase memory fragmentation.
> >
> > This patch introduces PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES, allowing
> > PMUs to request non-contiguous pages for their AUX buffers.
> >
> > Signed-off-by: Yabin Cui <yabinc@google.com>
> > ---
> > include/linux/perf_event.h | 1 +
> > kernel/events/ring_buffer.c | 6 ++++++
> > 2 files changed, 7 insertions(+)
> >
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 0069ba6866a4..26ca35d6a9f2 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -301,6 +301,7 @@ struct perf_event_pmu_context;
> > #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
> > #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> > #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> > +#define PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES 0x0400
> > /**
> > * pmu::scope
> > diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> > index 5130b119d0ae..87f42f4e8edc 100644
> > --- a/kernel/events/ring_buffer.c
> > +++ b/kernel/events/ring_buffer.c
> > @@ -710,6 +710,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
> > max_order = ilog2(nr_pages);
> > watermark = 0;
> > }
> > + /*
> > + * When the PMU doesn't prefer contiguous AUX buffer pages, favor
> > + * low-order allocations to reduce memory fragmentation.
> > + */
> > + if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES)
> > + max_order = 0;
> > /*
> > * kcalloc_node() is unable to allocate buffer if the size is larger
>
> Hi Yabin,
>
> I was wondering if this is just the opposite of
> PERF_PMU_CAP_AUX_NO_SG, and that order 0 should be used by default
> for all devices to solve the issue you describe. Because we already
> have PERF_PMU_CAP_AUX_NO_SG for devices that need contiguous pages.
> Then I found commit 5768402fd9c6 ("perf/ring_buffer: Use high order
> allocations for AUX buffers optimistically") that explains that the
> current allocation strategy is an optimization.
>
> Your change seems to decide that for certain devices we want to
> optimize for fragmentation rather than performance. If these are
> rarely used features specifically when looking at performance should
> we not continue to optimize for performance? Or at least make it user
> configurable?
So there seem to be 3 categories:
- 1) Must have physically contiguous AUX buffers, it's a hardware ABI.
(PERF_PMU_CAP_AUX_NO_SG for Intel BTS and PT.)
- 2) Would be nice to have contiguous AUX buffers, for a bit more
performance.
- 3) Doesn't really care.
So we do have #1, and it appears Yabin's usecase is #3?
I strongly suspect that #2 and #3 are mostly the same in practice, and
that we don't really need a lot of differentiation and complexity here,
just the AUX_NO_SG flag that must have a max-order allocation - all
other cases should allocate the AUX buffer in a default-nice,
MM-friendly way.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability
2025-04-22 12:49 ` Ingo Molnar
@ 2025-04-22 14:10 ` Leo Yan
2025-04-23 19:52 ` Yabin Cui
0 siblings, 1 reply; 14+ messages in thread
From: Leo Yan @ 2025-04-22 14:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: James Clark, Yabin Cui, coresight, linux-arm-kernel, linux-kernel,
linux-perf-users, Mike Leach, Alexander Shishkin, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan
On Tue, Apr 22, 2025 at 02:49:54PM +0200, Ingo Molnar wrote:
[...]
> > Hi Yabin,
> >
> > I was wondering if this is just the opposite of
> > PERF_PMU_CAP_AUX_NO_SG, and that order 0 should be used by default
> > for all devices to solve the issue you describe. Because we already
> > have PERF_PMU_CAP_AUX_NO_SG for devices that need contiguous pages.
> > Then I found commit 5768402fd9c6 ("perf/ring_buffer: Use high order
> > allocations for AUX buffers optimistically") that explains that the
> > current allocation strategy is an optimization.
> >
> > Your change seems to decide that for certain devices we want to
> > optimize for fragmentation rather than performance. If these are
> > rarely used features specifically when looking at performance should
> > we not continue to optimize for performance? Or at least make it user
> > configurable?
>
> So there seems to be 3 categories:
>
> - 1) Must have physically contiguous AUX buffers, it's a hardware ABI.
> (PERF_PMU_CAP_AUX_NO_SG for Intel BTS and PT.)
>
> - 2) Would be nice to have contiguous AUX buffers, for a bit more
> performance.
>
> - 3) Doesn't really care.
>
> So we do have #1, and it appears Yabin's usecase is #3?
In Yabin's case, the AUX buffer works as a bounce buffer. The hardware
trace data is copied by a driver from the low-level contiguous buffer to
the AUX buffer.
In this case we cannot benefit much from contiguous AUX buffers.
Thanks,
Leo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU
2025-04-21 21:58 ` [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU Yabin Cui
@ 2025-04-22 14:21 ` Leo Yan
2025-04-23 20:01 ` Yabin Cui
0 siblings, 1 reply; 14+ messages in thread
From: Leo Yan @ 2025-04-22 14:21 UTC (permalink / raw)
To: Yabin Cui
Cc: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang Kan, coresight, linux-arm-kernel, linux-kernel,
linux-perf-users
On Mon, Apr 21, 2025 at 02:58:18PM -0700, Yabin Cui wrote:
> The cs_etm PMU, regardless of the underlying trace sink (ETF, ETR or
> TRBE), doesn't require contiguous pages for its AUX buffer.
Though contiguous pages are not mandatory for TRBE, I would set the
PERF_PMU_CAP_AUX_NO_SG flag for it. This can potentially benefit
performance.
For non per CPU sinks, it is fine to allocate non-contiguous pages.
Thanks,
Leo
> This patch adds the PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES capability
> to the cs_etm PMU. This allows the kernel to allocate non-contiguous
> pages for the AUX buffer, reducing memory fragmentation when using
> cs_etm.
>
> Signed-off-by: Yabin Cui <yabinc@google.com>
> ---
> drivers/hwtracing/coresight/coresight-etm-perf.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index f4cccd68e625..c98646eca7f8 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -899,7 +899,8 @@ int __init etm_perf_init(void)
> int ret;
>
> etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE |
> - PERF_PMU_CAP_ITRACE);
> + PERF_PMU_CAP_ITRACE |
> + PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES);
>
> etm_pmu.attr_groups = etm_pmu_attr_groups;
> etm_pmu.task_ctx_nr = perf_sw_context;
> --
> 2.49.0.805.g082f7c87e0-goog
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability
2025-04-22 14:10 ` Leo Yan
@ 2025-04-23 19:52 ` Yabin Cui
2025-04-28 8:56 ` James Clark
0 siblings, 1 reply; 14+ messages in thread
From: Yabin Cui @ 2025-04-23 19:52 UTC (permalink / raw)
To: Leo Yan
Cc: Ingo Molnar, James Clark, coresight, linux-arm-kernel,
linux-kernel, linux-perf-users, Mike Leach, Alexander Shishkin,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang Kan
On Tue, Apr 22, 2025 at 7:10 AM Leo Yan <leo.yan@arm.com> wrote:
>
> On Tue, Apr 22, 2025 at 02:49:54PM +0200, Ingo Molnar wrote:
>
> [...]
>
> > > Hi Yabin,
> > >
> > > I was wondering if this is just the opposite of
> > > PERF_PMU_CAP_AUX_NO_SG, and that order 0 should be used by default
> > > for all devices to solve the issue you describe. Because we already
> > > have PERF_PMU_CAP_AUX_NO_SG for devices that need contiguous pages.
> > > Then I found commit 5768402fd9c6 ("perf/ring_buffer: Use high order
> > > allocations for AUX buffers optimistically") that explains that the
> > > current allocation strategy is an optimization.
> > >
> > > Your change seems to decide that for certain devices we want to
> > > optimize for fragmentation rather than performance. If these are
> > > rarely used features specifically when looking at performance should
> > > we not continue to optimize for performance? Or at least make it user
> > > configurable?
> >
> > So there seems to be 3 categories:
> >
> > - 1) Must have physically contiguous AUX buffers, it's a hardware ABI.
> > (PERF_PMU_CAP_AUX_NO_SG for Intel BTS and PT.)
> >
> > - 2) Would be nice to have contiguous AUX buffers, for a bit more
> > performance.
> >
> > - 3) Doesn't really care.
> >
> > So we do have #1, and it appears Yabin's usecase is #3?
Yes, in my use case I care much more about being MM-friendly than about a
small potential performance gain when using the PMU. It's not a rarely used
feature. On Android, we collect ETM data periodically on internal user
devices for AutoFDO optimization (for both userspace libraries and the
kernel). Periodically allocating a large chunk of contiguous AUX pages
(4M for each CPU) is almost unbearable. The kernel may need to kill many
processes to fulfill the request, which affects user experience even after
PMU use has ended.
I am totally fine with reusing PERF_PMU_CAP_AUX_NO_SG. If PMUs don't want to
sacrifice performance for MM-friendliness, why support scatter-gather mode?
If there are strong performance reasons to allocate contiguous AUX pages in
scatter-gather mode, I hope max_order can be made configurable from
userspace.
Currently, max_order is affected by aux_watermark. But aux_watermark also
affects how frequently the PMU overflows the AUX buffer and notifies
userspace, so it's not ideal to set aux_watermark to one page. If we want
to make max_order user configurable, maybe we can add a one-bit field to
perf_event_attr?
>
> In Yabin's case, the AUX buffer works as a bounce buffer. The hardware
> trace data is copied by a driver from the low-level contiguous buffer to
> the AUX buffer.
>
> In this case we cannot benefit much from contiguous AUX buffers.
>
> Thanks,
> Leo
^ permalink raw reply [flat|nested] 14+ messages in thread
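The arithmetic behind the "4M for each CPU" figure can be sketched as follows. This is a userspace model under stated assumptions, not kernel code: the model_* names are hypothetical, pages are assumed to be 4 KiB (so 4M is 1024 pages), and the loop assumes each chunk is allocated at the smaller of max_order and floor(log2(pages still missing)).

```c
#include <assert.h>
#include <stddef.h>

/* floor(log2(n)) for n > 0; a stand-in for the kernel's ilog2(). */
int model_floor_log2(unsigned long n)
{
	int order = -1;

	assert(n > 0);
	while (n) {
		n >>= 1;
		order++;
	}
	return order;
}

/*
 * Count how many separate allocations are needed to back nr_pages when
 * no chunk may exceed max_order. Optionally report the largest order
 * actually requested, which is what drives fragmentation pressure.
 */
unsigned long model_count_chunks(unsigned long nr_pages, int max_order,
				 int *largest_order)
{
	unsigned long done = 0, chunks = 0;
	int largest = 0;

	assert(max_order >= 0);
	while (done < nr_pages) {
		int order = model_floor_log2(nr_pages - done);

		if (order > max_order)
			order = max_order;
		if (order > largest)
			largest = order;
		done += 1UL << order; /* pages covered by this chunk */
		chunks++;
	}
	if (largest_order)
		*largest_order = largest;
	return chunks;
}
```

With these assumptions, a 1024-page buffer capped at order 9 is backed by two 2M contiguous chunks per CPU, while capping at order 0 turns the same buffer into 1024 single-page allocations. The latter costs more allocator calls but never asks for physically contiguous memory.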
* Re: [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU
2025-04-22 14:21 ` Leo Yan
@ 2025-04-23 20:01 ` Yabin Cui
2025-04-24 11:29 ` Anshuman Khandual
0 siblings, 1 reply; 14+ messages in thread
From: Yabin Cui @ 2025-04-23 20:01 UTC (permalink / raw)
To: Leo Yan
Cc: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang Kan, coresight, linux-arm-kernel, linux-kernel,
linux-perf-users
On Tue, Apr 22, 2025 at 7:21 AM Leo Yan <leo.yan@arm.com> wrote:
>
> On Mon, Apr 21, 2025 at 02:58:18PM -0700, Yabin Cui wrote:
> > The cs_etm PMU, regardless of the underlying trace sink (ETF, ETR or
> > TRBE), doesn't require contiguous pages for its AUX buffer.
>
> Though contiguous pages are not mandatory for TRBE, I would set the
> PERF_PMU_CAP_AUX_NO_SG flag for it. This can potentially benefit
> performance.
As explained in patch 1/2, my use case periodically collects ETM data
from the field (using both TRBE and ETR), and needs to reduce memory
fragmentation. If the performance impact is big, we can make it user
configurable. Otherwise, shall we default it to non-contiguous pages?
>
> For non per CPU sinks, it is fine to allocate non-contiguous pages.
>
> Thanks,
> Leo
>
> > This patch adds the PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES capability
> > to the cs_etm PMU. This allows the kernel to allocate non-contiguous
> > pages for the AUX buffer, reducing memory fragmentation when using
> > cs_etm.
> >
> > Signed-off-by: Yabin Cui <yabinc@google.com>
> > ---
> > drivers/hwtracing/coresight/coresight-etm-perf.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > index f4cccd68e625..c98646eca7f8 100644
> > --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> > +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > @@ -899,7 +899,8 @@ int __init etm_perf_init(void)
> > int ret;
> >
> > etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE |
> > - PERF_PMU_CAP_ITRACE);
> > + PERF_PMU_CAP_ITRACE |
> > + PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES);
> >
> > etm_pmu.attr_groups = etm_pmu_attr_groups;
> > etm_pmu.task_ctx_nr = perf_sw_context;
> > --
> > 2.49.0.805.g082f7c87e0-goog
> >
> >
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU
2025-04-23 20:01 ` Yabin Cui
@ 2025-04-24 11:29 ` Anshuman Khandual
2025-04-24 18:32 ` Yabin Cui
0 siblings, 1 reply; 14+ messages in thread
From: Anshuman Khandual @ 2025-04-24 11:29 UTC (permalink / raw)
To: Yabin Cui, Leo Yan
Cc: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang Kan, coresight, linux-arm-kernel, linux-kernel,
linux-perf-users
On 4/24/25 01:31, Yabin Cui wrote:
> On Tue, Apr 22, 2025 at 7:21 AM Leo Yan <leo.yan@arm.com> wrote:
>>
>> On Mon, Apr 21, 2025 at 02:58:18PM -0700, Yabin Cui wrote:
>>> The cs_etm PMU, regardless of the underlying trace sink (ETF, ETR or
>>> TRBE), doesn't require contiguous pages for its AUX buffer.
>>
>> Though contiguous pages are not mandatory for TRBE, I would set the
>> PERF_PMU_CAP_AUX_NO_SG flag for it. This can potentially benefit
>> performance.
>
> As explained in the patch 1/2, my use case periodically collects ETM data
> from the field (using both TRBE and ETR), and needs to reduce memory
> fragmentation. If the performance impact is big, we can make it user
> configurable. Otherwise, shall we default it to non-contiguous pages?
But isn't that already happening? cs_etm does not set the PMU cap
PERF_PMU_CAP_AUX_NO_SG, which means it can allocate non-contiguous memory
chunks. What am I missing?
>
>>
>> For non per CPU sinks, it is fine to allocate non-contiguous pages.
>>
>> Thanks,
>> Leo
>>
>>> This patch adds the PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES capability
>>> to the cs_etm PMU. This allows the kernel to allocate non-contiguous
>>> pages for the AUX buffer, reducing memory fragmentation when using
>>> cs_etm.
>>>
>>> Signed-off-by: Yabin Cui <yabinc@google.com>
>>> ---
>>> drivers/hwtracing/coresight/coresight-etm-perf.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> index f4cccd68e625..c98646eca7f8 100644
>>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> @@ -899,7 +899,8 @@ int __init etm_perf_init(void)
>>> int ret;
>>>
>>> etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE |
>>> - PERF_PMU_CAP_ITRACE);
>>> + PERF_PMU_CAP_ITRACE |
>>> + PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES);
>>>
>>> etm_pmu.attr_groups = etm_pmu_attr_groups;
>>> etm_pmu.task_ctx_nr = perf_sw_context;
>>> --
>>> 2.49.0.805.g082f7c87e0-goog
>>>
>>>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU
2025-04-24 11:29 ` Anshuman Khandual
@ 2025-04-24 18:32 ` Yabin Cui
0 siblings, 0 replies; 14+ messages in thread
From: Yabin Cui @ 2025-04-24 18:32 UTC (permalink / raw)
To: Anshuman Khandual
Cc: Leo Yan, Suzuki K Poulose, Mike Leach, James Clark,
Alexander Shishkin, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
Ian Rogers, Adrian Hunter, Liang Kan, coresight, linux-arm-kernel,
linux-kernel, linux-perf-users
On Thu, Apr 24, 2025 at 4:29 AM Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
>
>
> On 4/24/25 01:31, Yabin Cui wrote:
> > On Tue, Apr 22, 2025 at 7:21 AM Leo Yan <leo.yan@arm.com> wrote:
> >>
> >> On Mon, Apr 21, 2025 at 02:58:18PM -0700, Yabin Cui wrote:
> >>> The cs_etm PMU, regardless of the underlying trace sink (ETF, ETR or
> >>> TRBE), doesn't require contiguous pages for its AUX buffer.
> >>
> >> Though contiguous pages are not mandatory for TRBE, I would set the
> >> PERF_PMU_CAP_AUX_NO_SG flag for it. This can potentially benefit
> >> performance.
> >
> > As explained in the patch 1/2, my use case periodically collects ETM data
> > from the field (using both TRBE and ETR), and needs to reduce memory
> > fragmentation. If the performance impact is big, we can make it user
> > configurable. Otherwise, shall we default it to non-contiguous pages?
>
> But isn't that already happening? cs_etm does not set the PMU cap
> PERF_PMU_CAP_AUX_NO_SG, which means it can allocate non-contiguous memory
> chunks. What am I missing?
Although cs_etm doesn't set the AUX_NO_SG flag, the perf component still
prefers to allocate contiguous AUX pages for it. The new flag asks the perf
component not to allocate contiguous AUX pages.
>
> >
> >>
> >> For non per CPU sinks, it is fine to allocate non-contiguous pages.
> >>
> >> Thanks,
> >> Leo
> >>
> >>> This patch adds the PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES capability
> >>> to the cs_etm PMU. This allows the kernel to allocate non-contiguous
> >>> pages for the AUX buffer, reducing memory fragmentation when using
> >>> cs_etm.
> >>>
> >>> Signed-off-by: Yabin Cui <yabinc@google.com>
> >>> ---
> >>> drivers/hwtracing/coresight/coresight-etm-perf.c | 3 ++-
> >>> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> >>> index f4cccd68e625..c98646eca7f8 100644
> >>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> >>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> >>> @@ -899,7 +899,8 @@ int __init etm_perf_init(void)
> >>> int ret;
> >>>
> >>> etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE |
> >>> - PERF_PMU_CAP_ITRACE);
> >>> + PERF_PMU_CAP_ITRACE |
> >>> + PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES);
> >>>
> >>> etm_pmu.attr_groups = etm_pmu_attr_groups;
> >>> etm_pmu.task_ctx_nr = perf_sw_context;
> >>> --
> >>> 2.49.0.805.g082f7c87e0-goog
> >>>
> >>>
> >
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability
2025-04-23 19:52 ` Yabin Cui
@ 2025-04-28 8:56 ` James Clark
2025-04-29 17:02 ` Yabin Cui
0 siblings, 1 reply; 14+ messages in thread
From: James Clark @ 2025-04-28 8:56 UTC (permalink / raw)
To: Yabin Cui, Leo Yan, Ingo Molnar
Cc: Ingo Molnar, coresight, linux-arm-kernel, linux-kernel,
linux-perf-users, Mike Leach, Alexander Shishkin, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan
On 23/04/2025 8:52 pm, Yabin Cui wrote:
> On Tue, Apr 22, 2025 at 7:10 AM Leo Yan <leo.yan@arm.com> wrote:
>>
>> On Tue, Apr 22, 2025 at 02:49:54PM +0200, Ingo Molnar wrote:
>>
>> [...]
>>
>>>> Hi Yabin,
>>>>
>>>> I was wondering if this is just the opposite of
>>>> PERF_PMU_CAP_AUX_NO_SG, and that order 0 should be used by default
>>>> for all devices to solve the issue you describe. Because we already
>>>> have PERF_PMU_CAP_AUX_NO_SG for devices that need contiguous pages.
>>>> Then I found commit 5768402fd9c6 ("perf/ring_buffer: Use high order
>>>> allocations for AUX buffers optimistically") that explains that the
>>>> current allocation strategy is an optimization.
>>>>
>>>> Your change seems to decide that for certain devices we want to
>>>> optimize for fragmentation rather than performance. If these are
>>>> rarely used features specifically when looking at performance should
>>>> we not continue to optimize for performance? Or at least make it user
>>>> configurable?
>>>
>>> So there seems to be 3 categories:
>>>
>>> - 1) Must have physically contiguous AUX buffers, it's a hardware ABI.
>>> (PERF_PMU_CAP_AUX_NO_SG for Intel BTS and PT.)
>>>
>>> - 2) Would be nice to have contiguous AUX buffers, for a bit more
>>> performance.
>>>
>>> - 3) Doesn't really care.
>>>
>>> So we do have #1, and it appears Yabin's usecase is #3?
>
> Yes, in my use case I care much more about being MM-friendly than about a
> small potential performance gain when using the PMU. It's not a rarely used
> feature. On Android, we collect ETM data periodically on internal user
> devices for AutoFDO optimization (for both userspace libraries and the
> kernel). Periodically allocating a large chunk of contiguous AUX pages
> (4M for each CPU) is almost unbearable. The kernel may need to kill many
> processes to fulfill the request, which affects user experience even after
> PMU use has ended.
>
> I am totally fine with reusing PERF_PMU_CAP_AUX_NO_SG. If PMUs don't want to
> sacrifice performance for MM-friendliness, why support scatter-gather mode?
> If there are strong performance reasons to allocate contiguous AUX pages in
> scatter-gather mode, I hope max_order can be made configurable from
> userspace.
>
> Currently, max_order is affected by aux_watermark. But aux_watermark also
> affects how frequently the PMU overflows the AUX buffer and notifies
> userspace, so it's not ideal to set aux_watermark to one page. If we want
> to make max_order user configurable, maybe we can add a one-bit field to
> perf_event_attr?
>
>>
>> In Yabin's case, the AUX buffer works as a bounce buffer. The hardware
>> trace data is copied by a driver from the low-level contiguous buffer to
>> the AUX buffer.
>>
>> In this case we cannot benefit much from contiguous AUX buffers.
>>
>> Thanks,
>> Leo
Hi Yabin,
So after doing some testing it looks like there is 0 difference in
overhead for max_order=0 vs ensuring the buffer is one contiguous
allocation for Arm SPE, and TRBE would be exactly the same. This makes
sense because we're vmapping pages individually anyway regardless of the
base allocation.
Seems like the performance optimization of the optimistically large
mappings is only for devices that require extra buffer management stuff
other than normal virtual memory. Can we add a new capability
PERF_PMU_CAP_AUX_PREFER_LARGE and apply it to Intel PT and BTS? Then the
old (before the optimistic large allocs change) max_order=0 behavior
becomes the default again, and PREFER_LARGE is just for those two
devices. Other and new devices would get the more memory friendly
allocations by default, as it's unlikely they'll benefit from anything
different.
Thanks
James
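
A minimal sketch of the policy proposed above, written as a standalone C
model rather than as the kernel's actual code. The flag value, the page
size, and both function names are assumptions made for illustration;
PERF_PMU_CAP_AUX_PREFER_LARGE is only a proposed name at this point in the
thread. The sketch also assumes that an unset watermark defaults to half of
the AUX buffer.

```c
#define SKETCH_PAGE_SHIFT 12              /* assumption: 4 KiB pages */
#define SKETCH_CAP_AUX_PREFER_LARGE 0x1u  /* hypothetical bit value */

/* Smallest order such that (1 << order) pages cover `bytes`. */
static int sketch_order_for_bytes(unsigned long bytes)
{
	unsigned long page_size = 1UL << SKETCH_PAGE_SHIFT;
	unsigned long pages = (bytes + page_size - 1) >> SKETCH_PAGE_SHIFT;
	int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}

/*
 * Hypothetical policy: order-0 (single page) chunks unless the PMU opts
 * in to large chunks, in which case the order follows the watermark.
 */
int sketch_pick_max_order(unsigned int caps, unsigned long nr_pages,
			  unsigned long watermark)
{
	if (!(caps & SKETCH_CAP_AUX_PREFER_LARGE))
		return 0;
	if (!watermark)	/* assumed default: half of the AUX buffer */
		watermark = nr_pages << (SKETCH_PAGE_SHIFT - 1);
	return sketch_order_for_bytes(watermark);
}
```

Under these assumptions, a 4M buffer (1024 pages) with no explicit watermark
gives order 9 (2M chunks) for a PMU that sets the flag, and order 0 for one
that does not.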
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability
2025-04-28 8:56 ` James Clark
@ 2025-04-29 17:02 ` Yabin Cui
2025-04-29 21:35 ` Yabin Cui
0 siblings, 1 reply; 14+ messages in thread
From: Yabin Cui @ 2025-04-29 17:02 UTC (permalink / raw)
To: James Clark
Cc: Leo Yan, Ingo Molnar, coresight, linux-arm-kernel,
linux-kernel, linux-perf-users, Mike Leach, Alexander Shishkin,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan
On Mon, Apr 28, 2025 at 1:56 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 23/04/2025 8:52 pm, Yabin Cui wrote:
> > On Tue, Apr 22, 2025 at 7:10 AM Leo Yan <leo.yan@arm.com> wrote:
> >>
> >> On Tue, Apr 22, 2025 at 02:49:54PM +0200, Ingo Molnar wrote:
> >>
> >> [...]
> >>
> >>>> Hi Yabin,
> >>>>
> >>>> I was wondering if this is just the opposite of
> >>>> PERF_PMU_CAP_AUX_NO_SG, and that order 0 should be used by default
> >>>> for all devices to solve the issue you describe. Because we already
> >>>> have PERF_PMU_CAP_AUX_NO_SG for devices that need contiguous pages.
> >>>> Then I found commit 5768402fd9c6 ("perf/ring_buffer: Use high order
> >>>> allocations for AUX buffers optimistically") that explains that the
> >>>> current allocation strategy is an optimization.
> >>>>
> >>>> Your change seems to decide that for certain devices we want to
> >>>> optimize for fragmentation rather than performance. If these are
> >>>> rarely used features specifically when looking at performance should
> >>>> we not continue to optimize for performance? Or at least make it user
> >>>> configurable?
> >>>
> >>> So there seems to be 3 categories:
> >>>
> >>> - 1) Must have physically contiguous AUX buffers, it's a hardware ABI.
> >>> (PERF_PMU_CAP_AUX_NO_SG for Intel BTS and PT.)
> >>>
> >>> - 2) Would be nice to have contiguous AUX buffers, for a bit more
> >>> performance.
> >>>
> >>> - 3) Doesn't really care.
> >>>
> >>> So we do have #1, and it appears Yabin's usecase is #3?
> >
> > Yes, in my usecase, I care much more about MM-friendly than a little potential
> > performance when using PMU. It's not a rarely used feature. On Android, we
> > collect ETM data periodically on internal user devices for AutoFDO optimization
> > (for both userspace libraries and the kernel). Allocating a large chunk of
> > contiguous AUX pages (4M for each CPU) periodically is almost unbearable.
> > The kernel may need to kill many processes to fulfill the request. It
> > affects user experience even after using PMU.
> >
> > I am totally fine to reuse PERF_PMU_CAP_AUX_NO_SG. If PMUs don't want to
> > sacrifice performance for MM-friendly, why support scatter gather mode? If there
> > are strong performance reasons to allocate contiguous AUX pages in scatter
> > gather mode, I hope max_order is configurable in userspace.
> >
> > Currently, max_order is affected by aux_watermark. But aux_watermark also
> > affects how frequently the PMU overflows AUX buffer and notifies userspace.
> > It's not ideal to set aux_watermark to 1 page size. So if we want to make
> > max_order user configurable, maybe we can add a one bit field in
> > perf_event_attr?
> >
> >>
> >> In Yabin's case, the AUX buffer works as a bounce buffer. The hardware
> >> trace data is copied by a driver from low level's contiguous buffer to
> >> the AUX buffer.
> >>
> >> In this case we cannot benefit much from contiguous AUX buffers.
> >>
> >> Thanks,
> >> Leo
>
> Hi Yabin,
>
> So after doing some testing it looks like there is 0 difference in
> overhead for max_order=0 vs ensuring the buffer is one contiguous
> allocation for Arm SPE, and TRBE would be exactly the same. This makes
> sense because we're vmapping pages individually anyway regardless of the
> base allocation.
>
> Seems like the performance optimization of the optimistically large
> mappings is only for devices that require extra buffer management stuff
> other than normal virtual memory. Can we add a new capability
> PERF_PMU_CAP_AUX_PREFER_LARGE and apply it to Intel PT and BTS? Then the
> old (before the optimistic large allocs change) max_order=0 behavior
> becomes the default again, and PREFER_LARGE is just for those two
> devices. Other and new devices would get the more memory friendly
> allocations by default, as it's unlikely they'll benefit from anything
> different.
>
Good suggestion! I will upload a v2 patch for that.
>
> Thanks
> James
>
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability
2025-04-29 17:02 ` Yabin Cui
@ 2025-04-29 21:35 ` Yabin Cui
0 siblings, 0 replies; 14+ messages in thread
From: Yabin Cui @ 2025-04-29 21:35 UTC (permalink / raw)
To: James Clark
Cc: Leo Yan, Ingo Molnar, coresight, linux-arm-kernel,
linux-kernel, linux-perf-users, Mike Leach, Alexander Shishkin,
Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan
On Tue, Apr 29, 2025 at 10:02 AM Yabin Cui <yabinc@google.com> wrote:
>
> On Mon, Apr 28, 2025 at 1:56 AM James Clark <james.clark@linaro.org> wrote:
> >
> >
> >
> > On 23/04/2025 8:52 pm, Yabin Cui wrote:
> > > On Tue, Apr 22, 2025 at 7:10 AM Leo Yan <leo.yan@arm.com> wrote:
> > >>
> > >> On Tue, Apr 22, 2025 at 02:49:54PM +0200, Ingo Molnar wrote:
> > >>
> > >> [...]
> > >>
> > >>>> Hi Yabin,
> > >>>>
> > >>>> I was wondering if this is just the opposite of
> > >>>> PERF_PMU_CAP_AUX_NO_SG, and that order 0 should be used by default
> > >>>> for all devices to solve the issue you describe. Because we already
> > >>>> have PERF_PMU_CAP_AUX_NO_SG for devices that need contiguous pages.
> > >>>> Then I found commit 5768402fd9c6 ("perf/ring_buffer: Use high order
> > >>>> allocations for AUX buffers optimistically") that explains that the
> > >>>> current allocation strategy is an optimization.
> > >>>>
> > >>>> Your change seems to decide that for certain devices we want to
> > >>>> optimize for fragmentation rather than performance. If these are
> > >>>> rarely used features specifically when looking at performance should
> > >>>> we not continue to optimize for performance? Or at least make it user
> > >>>> configurable?
> > >>>
> > >>> So there seems to be 3 categories:
> > >>>
> > >>> - 1) Must have physically contiguous AUX buffers, it's a hardware ABI.
> > >>> (PERF_PMU_CAP_AUX_NO_SG for Intel BTS and PT.)
> > >>>
> > >>> - 2) Would be nice to have contiguous AUX buffers, for a bit more
> > >>> performance.
> > >>>
> > >>> - 3) Doesn't really care.
> > >>>
> > >>> So we do have #1, and it appears Yabin's usecase is #3?
> > >
> > > Yes, in my usecase, I care much more about MM-friendly than a little potential
> > > performance when using PMU. It's not a rarely used feature. On Android, we
> > > collect ETM data periodically on internal user devices for AutoFDO optimization
> > > (for both userspace libraries and the kernel). Allocating a large chunk
> > > of contiguous AUX pages (4M for each CPU) periodically is almost
> > > unbearable. The kernel may need to kill many processes to fulfill the
> > > request. It affects user experience even after using PMU.
> > >
> > > I am totally fine to reuse PERF_PMU_CAP_AUX_NO_SG. If PMUs don't want to
> > > sacrifice performance for MM-friendly, why support scatter gather mode? If there
> > > are strong performance reasons to allocate contiguous AUX pages in
> > > scatter gather mode, I hope max_order is configurable in userspace.
> > >
> > > Currently, max_order is affected by aux_watermark. But aux_watermark also
> > > affects how frequently the PMU overflows AUX buffer and notifies
> > > userspace. It's not ideal to set aux_watermark to 1 page size. So if we
> > > want to make max_order user configurable, maybe we can add a one bit
> > > field in perf_event_attr?
> > >
> > >>
> > >> In Yabin's case, the AUX buffer works as a bounce buffer. The hardware
> > >> trace data is copied by a driver from low level's contiguous buffer to
> > >> the AUX buffer.
> > >>
> > >> In this case we cannot benefit much from contiguous AUX buffers.
> > >>
> > >> Thanks,
> > >> Leo
> >
> > Hi Yabin,
> >
> > So after doing some testing it looks like there is 0 difference in
> > overhead for max_order=0 vs ensuring the buffer is one contiguous
> > allocation for Arm SPE, and TRBE would be exactly the same. This makes
> > sense because we're vmapping pages individually anyway regardless of the
> > base allocation.
> >
> > Seems like the performance optimization of the optimistically large
> > mappings is only for devices that require extra buffer management stuff
> > other than normal virtual memory. Can we add a new capability
> > PERF_PMU_CAP_AUX_PREFER_LARGE and apply it to Intel PT and BTS? Then the
> > old (before the optimistic large allocs change) max_order=0 behavior
> > becomes the default again, and PREFER_LARGE is just for those two
> > devices. Other and new devices would get the more memory friendly
> > allocations by default, as it's unlikely they'll benefit from anything
> > different.
> >
> Good suggestion! I will upload a v2 patch for that.
Hi everyone,
I have sent the v2 patch for review, with the title
"[PATCH v2] perf: Allocate non-contiguous AUX pages by default".
Please help review it. Thanks!
> >
> > Thanks
> > James
> >
Thread overview: 14+ messages
2025-04-21 21:58 [PATCH 0/2] perf,coresight: Reduce fragmentation with non-contiguous AUX pages for cs_etm Yabin Cui
2025-04-21 21:58 ` [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability Yabin Cui
2025-04-22 10:21 ` James Clark
2025-04-22 12:49 ` Ingo Molnar
2025-04-22 14:10 ` Leo Yan
2025-04-23 19:52 ` Yabin Cui
2025-04-28 8:56 ` James Clark
2025-04-29 17:02 ` Yabin Cui
2025-04-29 21:35 ` Yabin Cui
2025-04-21 21:58 ` [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU Yabin Cui
2025-04-22 14:21 ` Leo Yan
2025-04-23 20:01 ` Yabin Cui
2025-04-24 11:29 ` Anshuman Khandual
2025-04-24 18:32 ` Yabin Cui