* [PATCH -next 0/2] perf stat: a set of small fixes for bperf
@ 2024-09-25 13:55 Tengda Wu
2024-09-25 13:55 ` [PATCH -next 1/2] perf stat: Increase perf_attr_map entries Tengda Wu
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Tengda Wu @ 2024-09-25 13:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: song, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, kan.liang, linux-perf-users, linux-kernel, bpf
Hi,
This is a set of small fixes for bperf (perf-stat --bpf-counters).
It aims to fix the following two issues:
1) bperf limited the number of events to a maximum of 16, which
caused failures in some scenarios without a friendly error message.
2) bperf failed to correctly handle whether events were supported,
resulting in an incorrect display when the event count was 0.
The reason for fixing these issues is that bperf is very useful in
cost-sensitive scenarios such as top-down analysis. Increasing the
attr map size allows these scenarios to collect more events at the
same time, making it possible to gather enough information to perform
the complex metric calculations that top-down analysis requires.
Thanks,
Tengda
Tengda Wu (2):
perf stat: Increase perf_attr_map entries
perf stat: Fix incorrect display of bperf when event count is 0
tools/lib/perf/include/perf/bpf_perf.h | 1 +
tools/perf/util/bpf_counter.c | 26 +++++++++++++++++---------
2 files changed, 18 insertions(+), 9 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH -next 1/2] perf stat: Increase perf_attr_map entries
2024-09-25 13:55 [PATCH -next 0/2] perf stat: a set of small fixes for bperf Tengda Wu
@ 2024-09-25 13:55 ` Tengda Wu
2024-09-26 4:16 ` Namhyung Kim
2024-09-25 13:55 ` [PATCH -next 2/2] perf stat: Fix incorrect display of bperf when event count is 0 Tengda Wu
2024-10-09 5:20 ` [PATCH -next 0/2] perf stat: a set of small fixes for bperf Namhyung Kim
2 siblings, 1 reply; 8+ messages in thread
From: Tengda Wu @ 2024-09-25 13:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: song, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, kan.liang, linux-perf-users, linux-kernel, bpf
bperf restricts the size of perf_attr_map to 16 entries, which
cannot hold all events in many scenarios. A typical example is
when the user specifies `-a -ddd` ([0]). Other cases, such as
top-down analysis, often require more than 16 PMU events to be
collected simultaneously.
Fix this by increasing the perf_attr_map entries to 100, and
introduce an event number check in bperf__load() so that users
receive a friendly prompt when the event limit is reached.
[0] https://lore.kernel.org/all/20230104064402.1551516-3-namhyung@kernel.org/
Fixes: 7fac83aaf2ee ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF")
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
tools/perf/util/bpf_counter.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
index 7a8af60e0f51..3346129c20cf 100644
--- a/tools/perf/util/bpf_counter.c
+++ b/tools/perf/util/bpf_counter.c
@@ -28,7 +28,7 @@
#include "bpf_skel/bperf_leader.skel.h"
#include "bpf_skel/bperf_follower.skel.h"
-#define ATTR_MAP_SIZE 16
+#define ATTR_MAP_SIZE 100
static inline void *u64_to_ptr(__u64 ptr)
{
@@ -451,6 +451,12 @@ static int bperf__load(struct evsel *evsel, struct target *target)
enum bperf_filter_type filter_type;
__u32 filter_entry_cnt, i;
+ if (evsel->evlist->core.nr_entries > ATTR_MAP_SIZE) {
+ pr_err("Too many events, please limit to %d or less\n",
+ ATTR_MAP_SIZE);
+ return -1;
+ }
+
if (bperf_check_target(evsel, target, &filter_type, &filter_entry_cnt))
return -1;
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH -next 2/2] perf stat: Fix incorrect display of bperf when event count is 0
2024-09-25 13:55 [PATCH -next 0/2] perf stat: a set of small fixes for bperf Tengda Wu
2024-09-25 13:55 ` [PATCH -next 1/2] perf stat: Increase perf_attr_map entries Tengda Wu
@ 2024-09-25 13:55 ` Tengda Wu
2024-10-09 5:20 ` [PATCH -next 0/2] perf stat: a set of small fixes for bperf Namhyung Kim
2 siblings, 0 replies; 8+ messages in thread
From: Tengda Wu @ 2024-09-25 13:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: song, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, kan.liang, linux-perf-users, linux-kernel, bpf
There are two possible reasons for an event count of 0: the event is
not supported, or it was not counted. Perf distinguishes between these
two possibilities through `evsel->supported`, but in bperf this value
is always false. This is because bperf breaks out of or continues the
evlist__for_each_cpu loop under __run_perf_stat() prematurely, skipping
the `counter->supported` assignment, which results in bperf incorrectly
displaying <not supported> when the count is 0.
The most direct way to fix this is to assign `evsel->supported` when
opening an event in bperf. However, since bperf only opens events when
loading the leader, the followers are not aware of whether the event is
supported. Therefore, we store the `evsel->supported` value in a common
location, the `perf_event_attr_map`, to synchronize event support
across perf sessions.
Fixes: 7fac83aaf2ee ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF")
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
tools/lib/perf/include/perf/bpf_perf.h | 1 +
tools/perf/util/bpf_counter.c | 18 ++++++++++--------
2 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/tools/lib/perf/include/perf/bpf_perf.h b/tools/lib/perf/include/perf/bpf_perf.h
index e7cf6ba7b674..64c8d211726d 100644
--- a/tools/lib/perf/include/perf/bpf_perf.h
+++ b/tools/lib/perf/include/perf/bpf_perf.h
@@ -23,6 +23,7 @@
struct perf_event_attr_map_entry {
__u32 link_id;
__u32 diff_map_id;
+ __u8 supported;
};
/* default attr_map name */
diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
index 3346129c20cf..c04b274c3c45 100644
--- a/tools/perf/util/bpf_counter.c
+++ b/tools/perf/util/bpf_counter.c
@@ -425,18 +425,19 @@ static int bperf_reload_leader_program(struct evsel *evsel, int attr_map_fd,
diff_map_fd = bpf_map__fd(skel->maps.diff_readings);
entry->link_id = bpf_link_get_id(link_fd);
entry->diff_map_id = bpf_map_get_id(diff_map_fd);
- err = bpf_map_update_elem(attr_map_fd, &evsel->core.attr, entry, BPF_ANY);
- assert(err == 0);
-
- evsel->bperf_leader_link_fd = bpf_link_get_fd_by_id(entry->link_id);
- assert(evsel->bperf_leader_link_fd >= 0);
-
/*
* save leader_skel for install_pe, which is called within
* following evsel__open_per_cpu call
*/
evsel->leader_skel = skel;
- evsel__open_per_cpu(evsel, all_cpu_map, -1);
+ if (!evsel__open_per_cpu(evsel, all_cpu_map, -1))
+ entry->supported = true;
+
+ err = bpf_map_update_elem(attr_map_fd, &evsel->core.attr, entry, BPF_ANY);
+ assert(err == 0);
+
+ evsel->bperf_leader_link_fd = bpf_link_get_fd_by_id(entry->link_id);
+ assert(evsel->bperf_leader_link_fd >= 0);
out:
bperf_leader_bpf__destroy(skel);
@@ -446,7 +447,7 @@ static int bperf_reload_leader_program(struct evsel *evsel, int attr_map_fd,
static int bperf__load(struct evsel *evsel, struct target *target)
{
- struct perf_event_attr_map_entry entry = {0xffffffff, 0xffffffff};
+ struct perf_event_attr_map_entry entry = {0xffffffff, 0xffffffff, false};
int attr_map_fd, diff_map_fd = -1, err;
enum bperf_filter_type filter_type;
__u32 filter_entry_cnt, i;
@@ -494,6 +495,7 @@ static int bperf__load(struct evsel *evsel, struct target *target)
err = -1;
goto out;
}
+ evsel->supported = entry.supported;
/*
* The bpf_link holds reference to the leader program, and the
* leader program holds reference to the maps. Therefore, if
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH -next 1/2] perf stat: Increase perf_attr_map entries
2024-09-25 13:55 ` [PATCH -next 1/2] perf stat: Increase perf_attr_map entries Tengda Wu
@ 2024-09-26 4:16 ` Namhyung Kim
2024-09-27 2:35 ` Tengda Wu
0 siblings, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2024-09-26 4:16 UTC (permalink / raw)
To: Tengda Wu
Cc: Peter Zijlstra, song, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, kan.liang, linux-perf-users, linux-kernel, bpf
On Wed, Sep 25, 2024 at 01:55:22PM +0000, Tengda Wu wrote:
> bperf restricts the size of perf_attr_map to 16 entries, which
> cannot hold all events in many scenarios. A typical example is
> when the user specifies `-a -ddd` ([0]). Other cases, such as
> top-down analysis, often require more than 16 PMU events to be
> collected simultaneously.
>
> Fix this by increasing the perf_attr_map entries to 100, and
> introduce an event number check in bperf__load() so that users
> receive a friendly prompt when the event limit is reached.
>
> [0] https://lore.kernel.org/all/20230104064402.1551516-3-namhyung@kernel.org/
Apparently this patch was never applied. I don't know how much you need
but having too many events at the same time won't be very useful because
multiplexing could reduce the accuracy.
Thanks,
Namhyung
>
> Fixes: 7fac83aaf2ee ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF")
> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
> ---
> tools/perf/util/bpf_counter.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
> index 7a8af60e0f51..3346129c20cf 100644
> --- a/tools/perf/util/bpf_counter.c
> +++ b/tools/perf/util/bpf_counter.c
> @@ -28,7 +28,7 @@
> #include "bpf_skel/bperf_leader.skel.h"
> #include "bpf_skel/bperf_follower.skel.h"
>
> -#define ATTR_MAP_SIZE 16
> +#define ATTR_MAP_SIZE 100
>
> static inline void *u64_to_ptr(__u64 ptr)
> {
> @@ -451,6 +451,12 @@ static int bperf__load(struct evsel *evsel, struct target *target)
> enum bperf_filter_type filter_type;
> __u32 filter_entry_cnt, i;
>
> + if (evsel->evlist->core.nr_entries > ATTR_MAP_SIZE) {
> + pr_err("Too many events, please limit to %d or less\n",
> + ATTR_MAP_SIZE);
> + return -1;
> + }
> +
> if (bperf_check_target(evsel, target, &filter_type, &filter_entry_cnt))
> return -1;
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -next 1/2] perf stat: Increase perf_attr_map entries
2024-09-26 4:16 ` Namhyung Kim
@ 2024-09-27 2:35 ` Tengda Wu
2024-09-27 17:12 ` Namhyung Kim
0 siblings, 1 reply; 8+ messages in thread
From: Tengda Wu @ 2024-09-27 2:35 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, song, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, kan.liang, linux-perf-users, linux-kernel, bpf
On 2024/9/26 12:16, Namhyung Kim wrote:
> On Wed, Sep 25, 2024 at 01:55:22PM +0000, Tengda Wu wrote:
>> bperf restricts the size of perf_attr_map to 16 entries, which
>> cannot hold all events in many scenarios. A typical example is
>> when the user specifies `-a -ddd` ([0]). Other cases, such as
>> top-down analysis, often require more than 16 PMU events to be
>> collected simultaneously.
>>
>> Fix this by increasing the perf_attr_map entries to 100, and
>> introduce an event number check in bperf__load() so that users
>> receive a friendly prompt when the event limit is reached.
>>
>> [0] https://lore.kernel.org/all/20230104064402.1551516-3-namhyung@kernel.org/
>
> Apparently this patch was never applied. I don't know how much you need
> but having too many events at the same time won't be very useful because
> multiplexing could reduce the accuracy.
>
Could you please explain why patch [0] was not merged at that time? I couldn't
find this information in the previous emails.
In my scenario, we collect more than 40 events to support the necessary
metric calculations, so multiplexing is inevitable. Although multiplexing
may reduce accuracy, that loss is acceptable for the purpose of metric
calculations, and perf has the same issue with multiplexing. Removing the
event limit for bperf gives users an additional option.
In addition to accuracy, we also care about overhead. I compared the
overhead of bperf and perf by testing ./lat_ctx in lmbench [1], and found
that the overhead of bperf stat is about 4% less than that of perf. This
is why we choose bperf in some extreme scenarios.
[1] https://github.com/intel/lmbench
Thanks,
Tengda
>
>>
>> Fixes: 7fac83aaf2ee ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF")
>> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
>> ---
>> tools/perf/util/bpf_counter.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
>> index 7a8af60e0f51..3346129c20cf 100644
>> --- a/tools/perf/util/bpf_counter.c
>> +++ b/tools/perf/util/bpf_counter.c
>> @@ -28,7 +28,7 @@
>> #include "bpf_skel/bperf_leader.skel.h"
>> #include "bpf_skel/bperf_follower.skel.h"
>>
>> -#define ATTR_MAP_SIZE 16
>> +#define ATTR_MAP_SIZE 100
>>
>> static inline void *u64_to_ptr(__u64 ptr)
>> {
>> @@ -451,6 +451,12 @@ static int bperf__load(struct evsel *evsel, struct target *target)
>> enum bperf_filter_type filter_type;
>> __u32 filter_entry_cnt, i;
>>
>> + if (evsel->evlist->core.nr_entries > ATTR_MAP_SIZE) {
>> + pr_err("Too many events, please limit to %d or less\n",
>> + ATTR_MAP_SIZE);
>> + return -1;
>> + }
>> +
>> if (bperf_check_target(evsel, target, &filter_type, &filter_entry_cnt))
>> return -1;
>>
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -next 1/2] perf stat: Increase perf_attr_map entries
2024-09-27 2:35 ` Tengda Wu
@ 2024-09-27 17:12 ` Namhyung Kim
2024-09-29 0:54 ` Tengda Wu
0 siblings, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2024-09-27 17:12 UTC (permalink / raw)
To: Tengda Wu
Cc: Peter Zijlstra, song, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, kan.liang, linux-perf-users, linux-kernel, bpf
On Fri, Sep 27, 2024 at 10:35:54AM +0800, Tengda Wu wrote:
>
>
> On 2024/9/26 12:16, Namhyung Kim wrote:
> > On Wed, Sep 25, 2024 at 01:55:22PM +0000, Tengda Wu wrote:
> >> bperf restricts the size of perf_attr_map to 16 entries, which
> >> cannot hold all events in many scenarios. A typical example is
> >> when the user specifies `-a -ddd` ([0]). Other cases, such as
> >> top-down analysis, often require more than 16 PMU events to be
> >> collected simultaneously.
> >>
> >> Fix this by increasing the perf_attr_map entries to 100, and
> >> introduce an event number check in bperf__load() so that users
> >> receive a friendly prompt when the event limit is reached.
> >>
> >> [0] https://lore.kernel.org/all/20230104064402.1551516-3-namhyung@kernel.org/
> >
> > Apparently this patch was never applied. I don't know how much you need
> > but having too many events at the same time won't be very useful because
> > multiplexing could reduce the accuracy.
> >
>
> Could you please explain why patch [0] was not merged at that time? I couldn't
> find this information in the previous emails.
I guess it just fell through the cracks. :)
>
> In my scenario, we collect more than 40 events to support the necessary
> metric calculations, so multiplexing is inevitable. Although multiplexing
> may reduce accuracy, that loss is acceptable for the purpose of metric
> calculations, and perf has the same issue with multiplexing. Removing the
> event limit for bperf gives users an additional option.
>
> In addition to accuracy, we also care about overhead. I compared the
> overhead of bperf and perf by testing ./lat_ctx in lmbench [1], and found
> that the overhead of bperf stat is about 4% less than that of perf. This
> is why we choose bperf in some extreme scenarios.
Ok, thanks for explanation. I think it's ok to increase the limit.
Thanks,
Namhyung
>
> [1] https://github.com/intel/lmbench
>
> Thanks,
> Tengda
>
> >
> >>
> >> Fixes: 7fac83aaf2ee ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF")
> >> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
> >> ---
> >> tools/perf/util/bpf_counter.c | 8 +++++++-
> >> 1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
> >> index 7a8af60e0f51..3346129c20cf 100644
> >> --- a/tools/perf/util/bpf_counter.c
> >> +++ b/tools/perf/util/bpf_counter.c
> >> @@ -28,7 +28,7 @@
> >> #include "bpf_skel/bperf_leader.skel.h"
> >> #include "bpf_skel/bperf_follower.skel.h"
> >>
> >> -#define ATTR_MAP_SIZE 16
> >> +#define ATTR_MAP_SIZE 100
> >>
> >> static inline void *u64_to_ptr(__u64 ptr)
> >> {
> >> @@ -451,6 +451,12 @@ static int bperf__load(struct evsel *evsel, struct target *target)
> >> enum bperf_filter_type filter_type;
> >> __u32 filter_entry_cnt, i;
> >>
> >> + if (evsel->evlist->core.nr_entries > ATTR_MAP_SIZE) {
> >> + pr_err("Too many events, please limit to %d or less\n",
> >> + ATTR_MAP_SIZE);
> >> + return -1;
> >> + }
> >> +
> >> if (bperf_check_target(evsel, target, &filter_type, &filter_entry_cnt))
> >> return -1;
> >>
> >> --
> >> 2.34.1
> >>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -next 1/2] perf stat: Increase perf_attr_map entries
2024-09-27 17:12 ` Namhyung Kim
@ 2024-09-29 0:54 ` Tengda Wu
0 siblings, 0 replies; 8+ messages in thread
From: Tengda Wu @ 2024-09-29 0:54 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, song, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, kan.liang, linux-perf-users, linux-kernel, bpf
On 2024/9/28 1:12, Namhyung Kim wrote:
> On Fri, Sep 27, 2024 at 10:35:54AM +0800, Tengda Wu wrote:
>>
>>
>> On 2024/9/26 12:16, Namhyung Kim wrote:
>>> On Wed, Sep 25, 2024 at 01:55:22PM +0000, Tengda Wu wrote:
>>>> bperf restricts the size of perf_attr_map to 16 entries, which
>>>> cannot hold all events in many scenarios. A typical example is
>>>> when the user specifies `-a -ddd` ([0]). Other cases, such as
>>>> top-down analysis, often require more than 16 PMU events to be
>>>> collected simultaneously.
>>>>
>>>> Fix this by increasing the perf_attr_map entries to 100, and
>>>> introduce an event number check in bperf__load() so that users
>>>> receive a friendly prompt when the event limit is reached.
>>>>
>>>> [0] https://lore.kernel.org/all/20230104064402.1551516-3-namhyung@kernel.org/
>>>
>>> Apparently this patch was never applied. I don't know how much you need
>>> but having too many events at the same time won't be very useful because
>>> multiplexing could reduce the accuracy.
>>>
>>
>> Could you please explain why patch [0] was not merged at that time? I couldn't
>> find this information in the previous emails.
>
> I guess it just fell through the cracks. :)
Hope it won't happen again. 😆
>
>>
>> In my scenario, we collect more than 40 events to support the necessary
>> metric calculations, so multiplexing is inevitable. Although multiplexing
>> may reduce accuracy, that loss is acceptable for the purpose of metric
>> calculations, and perf has the same issue with multiplexing. Removing the
>> event limit for bperf gives users an additional option.
>>
>> In addition to accuracy, we also care about overhead. I compared the
>> overhead of bperf and perf by testing ./lat_ctx in lmbench [1], and found
>> that the overhead of bperf stat is about 4% less than that of perf. This
>> is why we choose bperf in some extreme scenarios.
>
> Ok, thanks for explanation. I think it's ok to increase the limit.
>
> Thanks,
> Namhyung
>
>>
>> [1] https://github.com/intel/lmbench
>>
>> Thanks,
>> Tengda
>>
>>>
>>>>
>>>> Fixes: 7fac83aaf2ee ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF")
>>>> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
>>>> ---
>>>> tools/perf/util/bpf_counter.c | 8 +++++++-
>>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
>>>> index 7a8af60e0f51..3346129c20cf 100644
>>>> --- a/tools/perf/util/bpf_counter.c
>>>> +++ b/tools/perf/util/bpf_counter.c
>>>> @@ -28,7 +28,7 @@
>>>> #include "bpf_skel/bperf_leader.skel.h"
>>>> #include "bpf_skel/bperf_follower.skel.h"
>>>>
>>>> -#define ATTR_MAP_SIZE 16
>>>> +#define ATTR_MAP_SIZE 100
>>>>
>>>> static inline void *u64_to_ptr(__u64 ptr)
>>>> {
>>>> @@ -451,6 +451,12 @@ static int bperf__load(struct evsel *evsel, struct target *target)
>>>> enum bperf_filter_type filter_type;
>>>> __u32 filter_entry_cnt, i;
>>>>
>>>> + if (evsel->evlist->core.nr_entries > ATTR_MAP_SIZE) {
>>>> + pr_err("Too many events, please limit to %d or less\n",
>>>> + ATTR_MAP_SIZE);
>>>> + return -1;
>>>> + }
>>>> +
>>>> if (bperf_check_target(evsel, target, &filter_type, &filter_entry_cnt))
>>>> return -1;
>>>>
>>>> --
>>>> 2.34.1
>>>>
>>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -next 0/2] perf stat: a set of small fixes for bperf
2024-09-25 13:55 [PATCH -next 0/2] perf stat: a set of small fixes for bperf Tengda Wu
2024-09-25 13:55 ` [PATCH -next 1/2] perf stat: Increase perf_attr_map entries Tengda Wu
2024-09-25 13:55 ` [PATCH -next 2/2] perf stat: Fix incorrect display of bperf when event count is 0 Tengda Wu
@ 2024-10-09 5:20 ` Namhyung Kim
2 siblings, 0 replies; 8+ messages in thread
From: Namhyung Kim @ 2024-10-09 5:20 UTC (permalink / raw)
To: Tengda Wu
Cc: Peter Zijlstra, song, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, kan.liang, linux-perf-users, linux-kernel, bpf
Hello,
On Wed, Sep 25, 2024 at 01:55:21PM +0000, Tengda Wu wrote:
> Hi,
>
> This is a set of small fixes for bperf (perf-stat --bpf-counters).
>
> It aims to fix the following two issues:
> 1) bperf limited the number of events to a maximum of 16, which
> caused failures in some scenarios without a friendly error message.
> 2) bperf failed to correctly handle whether events were supported,
> resulting in an incorrect display when the event count was 0.
>
> The reason for fixing these issues is that bperf is very useful in
> cost-sensitive scenarios such as top-down analysis. Increasing the
> attr map size allows these scenarios to collect more events at the
> same time, making it possible to gather enough information to
> perform the complex metric calculations that top-down analysis
> requires.
So I tried this patchset and found a couple of issues.
1. Running bperf failed due to a perf_event_attr map locking issue.
It seems the message is misleading, since it didn't actually fail
to get the lock. I think it failed because the map is no longer
compatible, and the error message should say so.
2. It seems bperf keeps the map pinned in the file system. I'm not sure
if we have an option to unpin the map automatically. I had to find
the path and delete it manually.
3. Currently bperf doesn't support event groups. On Intel machines,
topdown metrics are enabled by default, and that makes perf stat
fail when the --bpf-counters option is used (after step 2).
$ sudo ./perf stat -a --bpf-counters true
bpf managed perf events do not yet support groups.
Maybe we need to drop the topdown metrics when bperf fails.
Thanks,
Namhyung
>
> Thanks,
> Tengda
>
> Tengda Wu (2):
> perf stat: Increase perf_attr_map entries
> perf stat: Fix incorrect display of bperf when event count is 0
>
> tools/lib/perf/include/perf/bpf_perf.h | 1 +
> tools/perf/util/bpf_counter.c | 26 +++++++++++++++++---------
> 2 files changed, 18 insertions(+), 9 deletions(-)
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 8+ messages in thread
Thread overview: 8+ messages
2024-09-25 13:55 [PATCH -next 0/2] perf stat: a set of small fixes for bperf Tengda Wu
2024-09-25 13:55 ` [PATCH -next 1/2] perf stat: Increase perf_attr_map entries Tengda Wu
2024-09-26 4:16 ` Namhyung Kim
2024-09-27 2:35 ` Tengda Wu
2024-09-27 17:12 ` Namhyung Kim
2024-09-29 0:54 ` Tengda Wu
2024-09-25 13:55 ` [PATCH -next 2/2] perf stat: Fix incorrect display of bperf when event count is 0 Tengda Wu
2024-10-09 5:20 ` [PATCH -next 0/2] perf stat: a set of small fixes for bperf Namhyung Kim