* [PATCH v3] perf: Allocate non-contiguous AUX pages by default
From: Yabin Cui @ 2025-05-01 19:35 UTC
To: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang Kan, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin
Cc: coresight, linux-arm-kernel, linux-kernel, linux-perf-users,
Yabin Cui
perf always allocates contiguous AUX pages based on aux_watermark.
However, this contiguous allocation doesn't benefit all PMUs. For
instance, ARM SPE and TRBE operate with virtual pages, and Coresight
ETR allocates a separate buffer. For these PMUs, allocating contiguous
AUX pages unnecessarily exacerbates memory fragmentation. This
fragmentation can prevent their use on long-running devices.
This patch modifies the perf driver to be memory-friendly by default,
by allocating non-contiguous AUX pages. For PMUs requiring contiguous
pages (Intel BTS and some Intel PT), the existing
PERF_PMU_CAP_AUX_NO_SG capability can be used. For PMUs that don't
require but can benefit from contiguous pages (some Intel PT), a new
capability, PERF_PMU_CAP_AUX_PREFER_LARGE, is added to maintain their
existing behavior.
Signed-off-by: Yabin Cui <yabinc@google.com>
---
Changes since v2:
Let NO_SG imply PREFER_LARGE. So PMUs don't need to set both flags.
Then the only place needing PREFER_LARGE is intel/pt.c.
Changes since v1:
In v1, default is preferring contiguous pages, and add a flag to
allocate non-contiguous pages. In v2, default is allocating
non-contiguous pages, and add a flag to prefer contiguous pages.
v1 patchset:
perf,coresight: Reduce fragmentation with non-contiguous AUX pages for
cs_etm
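
As an illustration of the intended usage (example only, not part of the
diff below; the driver name, init function, and flag combination are
made up), a PMU that merely benefits from large chunks would opt in at
registration time, while a PMU that sets neither capability now gets
order-0 AUX pages:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/perf_event.h>

/*
 * Hypothetical PMU registration, for illustration only. Hardware that
 * tolerates scattered pages but decodes faster from large chunks sets
 * PERF_PMU_CAP_AUX_PREFER_LARGE; hardware that requires one physically
 * contiguous region keeps PERF_PMU_CAP_AUX_NO_SG, which now implies
 * PREFER_LARGE. Setting neither flag yields order-0 AUX pages.
 */
static struct pmu example_pmu = {
	.module       = THIS_MODULE,
	.capabilities = PERF_PMU_CAP_AUX_PREFER_LARGE | PERF_PMU_CAP_ITRACE,
};

static int __init example_pmu_init(void)
{
	/* -1 asks the core for a dynamically allocated PMU type id. */
	return perf_pmu_register(&example_pmu, "example_aux", -1);
}
device_initcall(example_pmu_init);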
arch/x86/events/intel/pt.c | 2 ++
include/linux/perf_event.h | 1 +
kernel/events/ring_buffer.c | 20 +++++++++++++-------
3 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index fa37565f6418..25ead919fc48 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1863,6 +1863,8 @@ static __init int pt_init(void)
 
 	if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries))
 		pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG;
+	else
+		pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_PREFER_LARGE;
 
 	pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE |
 				   PERF_PMU_CAP_ITRACE |
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0069ba6866a4..56d77348c511 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -301,6 +301,7 @@ struct perf_event_pmu_context;
 #define PERF_PMU_CAP_AUX_OUTPUT			0x0080
 #define PERF_PMU_CAP_EXTENDED_HW_TYPE		0x0100
 #define PERF_PMU_CAP_AUX_PAUSE			0x0200
+#define PERF_PMU_CAP_AUX_PREFER_LARGE		0x0400
 
 /**
  * pmu::scope
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 5130b119d0ae..4d2f1c95673e 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -679,7 +679,7 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
 {
 	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
 	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
-	int ret = -ENOMEM, max_order;
+	int ret = -ENOMEM, max_order = 0;
 
 	if (!has_aux(event))
 		return -EOPNOTSUPP;
@@ -689,8 +689,8 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
 
 	if (!overwrite) {
 		/*
-		 * Watermark defaults to half the buffer, and so does the
-		 * max_order, to aid PMU drivers in double buffering.
+		 * Watermark defaults to half the buffer, to aid PMU drivers
+		 * in double buffering.
 		 */
 		if (!watermark)
 			watermark = min_t(unsigned long,
@@ -698,16 +698,22 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
 					  (unsigned long)nr_pages << (PAGE_SHIFT - 1));
 
 		/*
-		 * Use aux_watermark as the basis for chunking to
+		 * For PMUs that need or prefer large contiguous buffers,
+		 * use aux_watermark as the basis for chunking to
 		 * help PMU drivers honor the watermark.
 		 */
-		max_order = get_order(watermark);
+		if (event->pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
+						PERF_PMU_CAP_AUX_PREFER_LARGE))
+			max_order = get_order(watermark);
 	} else {
 		/*
-		 * We need to start with the max_order that fits in nr_pages,
+		 * For PMUs that need or prefer large contiguous buffers,
+		 * we need to start with the max_order that fits in nr_pages,
 		 * not the other way around, hence ilog2() and not get_order.
 		 */
-		max_order = ilog2(nr_pages);
+		if (event->pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
+						PERF_PMU_CAP_AUX_PREFER_LARGE))
+			max_order = ilog2(nr_pages);
 		watermark = 0;
 	}
 
--
2.49.0.967.g6a0df3ecc3-goog
* Re: [PATCH v3] perf: Allocate non-contiguous AUX pages by default
From: Anshuman Khandual @ 2025-05-02 10:51 UTC
To: Yabin Cui, Suzuki K Poulose, Mike Leach, James Clark,
Alexander Shishkin, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
Ian Rogers, Adrian Hunter, Liang Kan, Thomas Gleixner,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
Cc: coresight, linux-arm-kernel, linux-kernel, linux-perf-users
On 5/2/25 01:05, Yabin Cui wrote:
> perf always allocates contiguous AUX pages based on aux_watermark.
> However, this contiguous allocation doesn't benefit all PMUs. For
> instance, ARM SPE and TRBE operate with virtual pages, and Coresight
> ETR allocates a separate buffer. For these PMUs, allocating contiguous
> AUX pages unnecessarily exacerbates memory fragmentation. This
> fragmentation can prevent their use on long-running devices.
>
> This patch modifies the perf driver to be memory-friendly by default,
> by allocating non-contiguous AUX pages. For PMUs requiring contiguous
> pages (Intel BTS and some Intel PT), the existing
> PERF_PMU_CAP_AUX_NO_SG capability can be used. For PMUs that don't
> require but can benefit from contiguous pages (some Intel PT), a new
> capability, PERF_PMU_CAP_AUX_PREFER_LARGE, is added to maintain their
> existing behavior.
>
> Signed-off-by: Yabin Cui <yabinc@google.com>
> ---
> Changes since v2:
> Let NO_SG imply PREFER_LARGE. So PMUs don't need to set both flags.
> Then the only place needing PREFER_LARGE is intel/pt.c.
>
> Changes since v1:
> In v1, default is preferring contiguous pages, and add a flag to
> allocate non-contiguous pages. In v2, default is allocating
> non-contiguous pages, and add a flag to prefer contiguous pages.
>
> v1 patchset:
> perf,coresight: Reduce fragmentation with non-contiguous AUX pages for
> cs_etm
>
> arch/x86/events/intel/pt.c | 2 ++
> include/linux/perf_event.h | 1 +
> kernel/events/ring_buffer.c | 20 +++++++++++++-------
> 3 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index fa37565f6418..25ead919fc48 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -1863,6 +1863,8 @@ static __init int pt_init(void)
>
>  	if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries))
>  		pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG;
> +	else
> +		pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_PREFER_LARGE;
>
>  	pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE |
>  				   PERF_PMU_CAP_ITRACE |
Why does this PMU get the PERF_PMU_CAP_AUX_PREFER_LARGE fallback option
but not the other PMU in arch/x86/events/intel/bts.c, even though both
had PERF_PMU_CAP_AUX_NO_SG?
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0069ba6866a4..56d77348c511 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -301,6 +301,7 @@ struct perf_event_pmu_context;
>  #define PERF_PMU_CAP_AUX_OUTPUT			0x0080
>  #define PERF_PMU_CAP_EXTENDED_HW_TYPE		0x0100
>  #define PERF_PMU_CAP_AUX_PAUSE			0x0200
> +#define PERF_PMU_CAP_AUX_PREFER_LARGE		0x0400
>
>  /**
>   * pmu::scope
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5130b119d0ae..4d2f1c95673e 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -679,7 +679,7 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>  {
>  	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
>  	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
> -	int ret = -ENOMEM, max_order;
> +	int ret = -ENOMEM, max_order = 0;
Order 0 is now the default allocation granularity. This might benefit
from a comment above, explaining that max_order can change only with the
PERF_PMU_CAP_AUX_NO_SG or PERF_PMU_CAP_AUX_PREFER_LARGE PMU flags.
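Something along these lines, perhaps (wording is only a sketch):

	/*
	 * Default to order-0 allocations. max_order is only raised below
	 * when the PMU needs (PERF_PMU_CAP_AUX_NO_SG) or prefers
	 * (PERF_PMU_CAP_AUX_PREFER_LARGE) large contiguous AUX chunks.
	 */
	int ret = -ENOMEM, max_order = 0;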
>
>  	if (!has_aux(event))
>  		return -EOPNOTSUPP;
> @@ -689,8 +689,8 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>
>  	if (!overwrite) {
>  		/*
> -		 * Watermark defaults to half the buffer, and so does the
> -		 * max_order, to aid PMU drivers in double buffering.
> +		 * Watermark defaults to half the buffer, to aid PMU drivers
> +		 * in double buffering.
>  		 */
>  		if (!watermark)
>  			watermark = min_t(unsigned long,
> @@ -698,16 +698,22 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>  					  (unsigned long)nr_pages << (PAGE_SHIFT - 1));
>
>  		/*
> -		 * Use aux_watermark as the basis for chunking to
> +		 * For PMUs that need or prefer large contiguous buffers,
> +		 * use aux_watermark as the basis for chunking to
>  		 * help PMU drivers honor the watermark.
>  		 */
> -		max_order = get_order(watermark);
> +		if (event->pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
> +						PERF_PMU_CAP_AUX_PREFER_LARGE))
> +			max_order = get_order(watermark);
>  	} else {
>  		/*
> -		 * We need to start with the max_order that fits in nr_pages,
> +		 * For PMUs that need or prefer large contiguous buffers,
> +		 * we need to start with the max_order that fits in nr_pages,
>  		 * not the other way around, hence ilog2() and not get_order.
>  		 */
> -		max_order = ilog2(nr_pages);
> +		if (event->pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
> +						PERF_PMU_CAP_AUX_PREFER_LARGE))
> +			max_order = ilog2(nr_pages);
>  		watermark = 0;
>  	}
>
Although not really sure, could the event->pmu->capabilities check
against the ORed PMU flags PERF_PMU_CAP_AUX_NO_SG and
PERF_PMU_CAP_AUX_PREFER_LARGE be contained in a helper such as
pmu_prefers_cont_alloc(struct pmu *pmu), or something similar?
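For example (the name follows the suggestion above; the exact signature
and placement are only a sketch):

static inline bool pmu_prefers_cont_alloc(struct pmu *pmu)
{
	return pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
				    PERF_PMU_CAP_AUX_PREFER_LARGE);
}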
* Re: [PATCH v3] perf: Allocate non-contiguous AUX pages by default
From: Yabin Cui @ 2025-05-02 17:30 UTC
To: Anshuman Khandual
Cc: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang Kan, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, coresight, linux-arm-kernel, linux-kernel,
linux-perf-users
On Fri, May 2, 2025 at 3:51 AM Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
> On 5/2/25 01:05, Yabin Cui wrote:
> > perf always allocates contiguous AUX pages based on aux_watermark.
> > However, this contiguous allocation doesn't benefit all PMUs. For
> > instance, ARM SPE and TRBE operate with virtual pages, and Coresight
> > ETR allocates a separate buffer. For these PMUs, allocating contiguous
> > AUX pages unnecessarily exacerbates memory fragmentation. This
> > fragmentation can prevent their use on long-running devices.
> >
> > This patch modifies the perf driver to be memory-friendly by default,
> > by allocating non-contiguous AUX pages. For PMUs requiring contiguous
> > pages (Intel BTS and some Intel PT), the existing
> > PERF_PMU_CAP_AUX_NO_SG capability can be used. For PMUs that don't
> > require but can benefit from contiguous pages (some Intel PT), a new
> > capability, PERF_PMU_CAP_AUX_PREFER_LARGE, is added to maintain their
> > existing behavior.
> >
> > Signed-off-by: Yabin Cui <yabinc@google.com>
> > ---
> > Changes since v2:
> > Let NO_SG imply PREFER_LARGE. So PMUs don't need to set both flags.
> > Then the only place needing PREFER_LARGE is intel/pt.c.
> >
> > Changes since v1:
> > In v1, default is preferring contiguous pages, and add a flag to
> > allocate non-contiguous pages. In v2, default is allocating
> > non-contiguous pages, and add a flag to prefer contiguous pages.
> >
> > v1 patchset:
> > perf,coresight: Reduce fragmentation with non-contiguous AUX pages for
> > cs_etm
> >
> > arch/x86/events/intel/pt.c | 2 ++
> > include/linux/perf_event.h | 1 +
> > kernel/events/ring_buffer.c | 20 +++++++++++++-------
> > 3 files changed, 16 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> > index fa37565f6418..25ead919fc48 100644
> > --- a/arch/x86/events/intel/pt.c
> > +++ b/arch/x86/events/intel/pt.c
> > @@ -1863,6 +1863,8 @@ static __init int pt_init(void)
> >
> >  	if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries))
> >  		pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG;
> > +	else
> > +		pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_PREFER_LARGE;
> >
> >  	pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE |
> >  				   PERF_PMU_CAP_ITRACE |
>
> Why does this PMU get the PERF_PMU_CAP_AUX_PREFER_LARGE fallback option
> but not the other PMU in arch/x86/events/intel/bts.c, even though both
> had PERF_PMU_CAP_AUX_NO_SG?
Because Intel BTS always uses NO_SG, while in some cases Intel PT
doesn't.
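If I remember correctly (I haven't re-checked the exact flag set),
bts_init() unconditionally sets something like:

	/* arch/x86/events/intel/bts.c, bts_init() -- unchanged by this patch */
	bts_pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_ITRACE |
			       PERF_PMU_CAP_EXCLUSIVE;

and since v3 makes NO_SG imply PREFER_LARGE, bts.c needs no change here.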
>
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 0069ba6866a4..56d77348c511 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -301,6 +301,7 @@ struct perf_event_pmu_context;
> >  #define PERF_PMU_CAP_AUX_OUTPUT			0x0080
> >  #define PERF_PMU_CAP_EXTENDED_HW_TYPE		0x0100
> >  #define PERF_PMU_CAP_AUX_PAUSE			0x0200
> > +#define PERF_PMU_CAP_AUX_PREFER_LARGE		0x0400
> >
> >  /**
> >   * pmu::scope
> > diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> > index 5130b119d0ae..4d2f1c95673e 100644
> > --- a/kernel/events/ring_buffer.c
> > +++ b/kernel/events/ring_buffer.c
> > @@ -679,7 +679,7 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
> >  {
> >  	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
> >  	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
> > -	int ret = -ENOMEM, max_order;
> > +	int ret = -ENOMEM, max_order = 0;
>
> Order 0 is now the default allocation granularity. This might benefit
> from a comment above, explaining that max_order can change only with the
> PERF_PMU_CAP_AUX_NO_SG or PERF_PMU_CAP_AUX_PREFER_LARGE PMU flags.
>
Will add the comment in the next respin.
> >
> >  	if (!has_aux(event))
> >  		return -EOPNOTSUPP;
> > @@ -689,8 +689,8 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
> >
> >  	if (!overwrite) {
> >  		/*
> > -		 * Watermark defaults to half the buffer, and so does the
> > -		 * max_order, to aid PMU drivers in double buffering.
> > +		 * Watermark defaults to half the buffer, to aid PMU drivers
> > +		 * in double buffering.
> >  		 */
> >  		if (!watermark)
> >  			watermark = min_t(unsigned long,
> > @@ -698,16 +698,22 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
> >  					  (unsigned long)nr_pages << (PAGE_SHIFT - 1));
> >
> >  		/*
> > -		 * Use aux_watermark as the basis for chunking to
> > +		 * For PMUs that need or prefer large contiguous buffers,
> > +		 * use aux_watermark as the basis for chunking to
> >  		 * help PMU drivers honor the watermark.
> >  		 */
> > -		max_order = get_order(watermark);
> > +		if (event->pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
> > +						PERF_PMU_CAP_AUX_PREFER_LARGE))
> > +			max_order = get_order(watermark);
> >  	} else {
> >  		/*
> > -		 * We need to start with the max_order that fits in nr_pages,
> > +		 * For PMUs that need or prefer large contiguous buffers,
> > +		 * we need to start with the max_order that fits in nr_pages,
> >  		 * not the other way around, hence ilog2() and not get_order.
> >  		 */
> > -		max_order = ilog2(nr_pages);
> > +		if (event->pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
> > +						PERF_PMU_CAP_AUX_PREFER_LARGE))
> > +			max_order = ilog2(nr_pages);
> >  		watermark = 0;
> >  	}
> >
>
> Although not really sure, could the event->pmu->capabilities check
> against the ORed PMU flags PERF_PMU_CAP_AUX_NO_SG and
> PERF_PMU_CAP_AUX_PREFER_LARGE be contained in a helper such as
> pmu_prefers_cont_alloc(struct pmu *pmu), or something similar?
Sure, but I feel it's not very worthwhile. Maybe add a local variable,
use_contiguous_pages? It could also serve as a comment of sorts near
max_order.
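Roughly like this (a sketch only, not a respin yet; the variable name is
the one proposed above):

	bool use_contiguous_pages = event->pmu->capabilities &
				    (PERF_PMU_CAP_AUX_NO_SG |
				     PERF_PMU_CAP_AUX_PREFER_LARGE);
	int ret = -ENOMEM, max_order = 0;

	/* ... */
	if (!overwrite) {
		/* ... */
		if (use_contiguous_pages)
			max_order = get_order(watermark);
	} else {
		/* ... */
		if (use_contiguous_pages)
			max_order = ilog2(nr_pages);
		watermark = 0;
	}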
* Re: [PATCH v3] perf: Allocate non-contiguous AUX pages by default
From: Anshuman Khandual @ 2025-05-06 4:26 UTC
To: Yabin Cui
Cc: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang Kan, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, coresight, linux-arm-kernel, linux-kernel,
linux-perf-users
On 5/2/25 23:00, Yabin Cui wrote:
> On Fri, May 2, 2025 at 3:51 AM Anshuman Khandual
> <anshuman.khandual@arm.com> wrote:
>>
>> On 5/2/25 01:05, Yabin Cui wrote:
>>> perf always allocates contiguous AUX pages based on aux_watermark.
>>> However, this contiguous allocation doesn't benefit all PMUs. For
>>> instance, ARM SPE and TRBE operate with virtual pages, and Coresight
>>> ETR allocates a separate buffer. For these PMUs, allocating contiguous
>>> AUX pages unnecessarily exacerbates memory fragmentation. This
>>> fragmentation can prevent their use on long-running devices.
>>>
>>> This patch modifies the perf driver to be memory-friendly by default,
>>> by allocating non-contiguous AUX pages. For PMUs requiring contiguous
>>> pages (Intel BTS and some Intel PT), the existing
>>> PERF_PMU_CAP_AUX_NO_SG capability can be used. For PMUs that don't
>>> require but can benefit from contiguous pages (some Intel PT), a new
>>> capability, PERF_PMU_CAP_AUX_PREFER_LARGE, is added to maintain their
>>> existing behavior.
>>>
>>> Signed-off-by: Yabin Cui <yabinc@google.com>
>>> ---
>>> Changes since v2:
>>> Let NO_SG imply PREFER_LARGE. So PMUs don't need to set both flags.
>>> Then the only place needing PREFER_LARGE is intel/pt.c.
>>>
>>> Changes since v1:
>>> In v1, default is preferring contiguous pages, and add a flag to
>>> allocate non-contiguous pages. In v2, default is allocating
>>> non-contiguous pages, and add a flag to prefer contiguous pages.
>>>
>>> v1 patchset:
>>> perf,coresight: Reduce fragmentation with non-contiguous AUX pages for
>>> cs_etm
>>>
>>> arch/x86/events/intel/pt.c | 2 ++
>>> include/linux/perf_event.h | 1 +
>>> kernel/events/ring_buffer.c | 20 +++++++++++++-------
>>> 3 files changed, 16 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
>>> index fa37565f6418..25ead919fc48 100644
>>> --- a/arch/x86/events/intel/pt.c
>>> +++ b/arch/x86/events/intel/pt.c
>>> @@ -1863,6 +1863,8 @@ static __init int pt_init(void)
>>>
>>>  	if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries))
>>>  		pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG;
>>> +	else
>>> +		pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_PREFER_LARGE;
>>>
>>>  	pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE |
>>>  				   PERF_PMU_CAP_ITRACE |
>>
>> Why does this PMU get the PERF_PMU_CAP_AUX_PREFER_LARGE fallback option
>> but not the other PMU in arch/x86/events/intel/bts.c, even though both
>> had PERF_PMU_CAP_AUX_NO_SG?
>
> Because Intel BTS always uses NO_SG, while in some cases Intel PT
> doesn't.
Makes sense.
>>
>>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>>> index 0069ba6866a4..56d77348c511 100644
>>> --- a/include/linux/perf_event.h
>>> +++ b/include/linux/perf_event.h
>>> @@ -301,6 +301,7 @@ struct perf_event_pmu_context;
>>>  #define PERF_PMU_CAP_AUX_OUTPUT			0x0080
>>>  #define PERF_PMU_CAP_EXTENDED_HW_TYPE		0x0100
>>>  #define PERF_PMU_CAP_AUX_PAUSE			0x0200
>>> +#define PERF_PMU_CAP_AUX_PREFER_LARGE		0x0400
>>>
>>>  /**
>>>   * pmu::scope
>>> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
>>> index 5130b119d0ae..4d2f1c95673e 100644
>>> --- a/kernel/events/ring_buffer.c
>>> +++ b/kernel/events/ring_buffer.c
>>> @@ -679,7 +679,7 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>>>  {
>>>  	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
>>>  	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
>>> -	int ret = -ENOMEM, max_order;
>>> +	int ret = -ENOMEM, max_order = 0;
>>
>> Order 0 is now the default allocation granularity. This might benefit
>> from a comment above, explaining that max_order can change only with the
>> PERF_PMU_CAP_AUX_NO_SG or PERF_PMU_CAP_AUX_PREFER_LARGE PMU flags.
>>
> Will add the comment in the next respin.
>>>
>>>  	if (!has_aux(event))
>>>  		return -EOPNOTSUPP;
>>> @@ -689,8 +689,8 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>>>
>>>  	if (!overwrite) {
>>>  		/*
>>> -		 * Watermark defaults to half the buffer, and so does the
>>> -		 * max_order, to aid PMU drivers in double buffering.
>>> +		 * Watermark defaults to half the buffer, to aid PMU drivers
>>> +		 * in double buffering.
>>>  		 */
>>>  		if (!watermark)
>>>  			watermark = min_t(unsigned long,
>>> @@ -698,16 +698,22 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>>>  					  (unsigned long)nr_pages << (PAGE_SHIFT - 1));
>>>
>>>  		/*
>>> -		 * Use aux_watermark as the basis for chunking to
>>> +		 * For PMUs that need or prefer large contiguous buffers,
>>> +		 * use aux_watermark as the basis for chunking to
>>>  		 * help PMU drivers honor the watermark.
>>>  		 */
>>> -		max_order = get_order(watermark);
>>> +		if (event->pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
>>> +						PERF_PMU_CAP_AUX_PREFER_LARGE))
>>> +			max_order = get_order(watermark);
>>>  	} else {
>>>  		/*
>>> -		 * We need to start with the max_order that fits in nr_pages,
>>> +		 * For PMUs that need or prefer large contiguous buffers,
>>> +		 * we need to start with the max_order that fits in nr_pages,
>>>  		 * not the other way around, hence ilog2() and not get_order.
>>>  		 */
>>> -		max_order = ilog2(nr_pages);
>>> +		if (event->pmu->capabilities & (PERF_PMU_CAP_AUX_NO_SG |
>>> +						PERF_PMU_CAP_AUX_PREFER_LARGE))
>>> +			max_order = ilog2(nr_pages);
>>>  		watermark = 0;
>>>  	}
>>>
>>
>> Although not really sure, could the event->pmu->capabilities check
>> against the ORed PMU flags PERF_PMU_CAP_AUX_NO_SG and
>> PERF_PMU_CAP_AUX_PREFER_LARGE be contained in a helper such as
>> pmu_prefers_cont_alloc(struct pmu *pmu), or something similar?
>
> Sure, but I feel it's not very worthwhile. Maybe add a local variable,
> use_contiguous_pages? It could also serve as a comment of sorts near
> max_order.
Probably that will be better.