linux-perf-users.vger.kernel.org archive mirror
* [PATCH] perf topdown: Correct leader selection with sample_read enabled
@ 2024-06-14 21:39 Dapeng Mi
  2024-06-27 15:11 ` Liang, Kan
  0 siblings, 1 reply; 6+ messages in thread
From: Dapeng Mi @ 2024-06-14 21:39 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Kan Liang
  Cc: linux-perf-users, linux-kernel, Dapeng Mi, Dapeng Mi

Fix an issue where, when sample_read is enabled and the sampling group
contains no topdown metrics event, the slots event is incorrectly
bypassed as the sampling leader.

perf record -e '{slots,branches}:S' -c 10000 -vv sleep 1

In this case the slots event should be the sampling leader, but as the
verbose output below shows, the branches event is sampled instead.

perf_event_attr:
  type                             4 (cpu)
  size                             168
  config                           0x400 (slots)
  sample_type                      IP|TID|TIME|READ|CPU|IDENTIFIER
  read_format                      ID|GROUP|LOST
  disabled                         1
  sample_id_all                    1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             168
  config                           0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
  { sample_period, sample_freq }   10000
  sample_type                      IP|TID|TIME|READ|CPU|IDENTIFIER
  read_format                      ID|GROUP|LOST
  sample_id_all                    1
  exclude_guest                    1

The sample period of the slots event, rather than that of the branches
event, is reset to 0.

This fix ensures that the slots event remains the sampling leader under
these conditions.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 tools/perf/arch/x86/util/topdown.c | 42 ++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
index 3f9a267d4501..aea6896bbb57 100644
--- a/tools/perf/arch/x86/util/topdown.c
+++ b/tools/perf/arch/x86/util/topdown.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "api/fs/fs.h"
 #include "util/evsel.h"
+#include "util/evlist.h"
 #include "util/pmu.h"
 #include "util/pmus.h"
 #include "util/topdown.h"
@@ -31,6 +32,32 @@ bool topdown_sys_has_perf_metrics(void)
 	return has_perf_metrics;
 }
 
+static int perf_pmus__topdown_event(void *vstate, struct pmu_event_info *info)
+{
+	if (!strcmp(info->name, (char *)vstate))
+		return 1;
+
+	return 0;
+}
+
+static bool is_topdown_metric_event(struct evsel *event)
+{
+	struct perf_pmu *pmu;
+
+	if (!topdown_sys_has_perf_metrics())
+		return false;
+
+	if (event->core.attr.type != PERF_TYPE_RAW)
+		return false;
+
+	pmu = perf_pmus__find_by_type(PERF_TYPE_RAW);
+	if (pmu && perf_pmu__for_each_event(pmu, false, event->name,
+					    perf_pmus__topdown_event))
+		return true;
+
+	return false;
+}
+
 #define TOPDOWN_SLOTS		0x0400
 
 /*
@@ -41,11 +68,22 @@ bool topdown_sys_has_perf_metrics(void)
  */
 bool arch_topdown_sample_read(struct evsel *leader)
 {
+	struct evsel *event;
+
 	if (!evsel__sys_has_perf_metrics(leader))
 		return false;
 
-	if (leader->core.attr.config == TOPDOWN_SLOTS)
-		return true;
+	if (leader->core.attr.config != TOPDOWN_SLOTS)
+		return false;
+
+	/*
+	 * If the slots event leads a group with no topdown metric events,
+	 * the slots event should still be sampled as the leader.
+	 */
+	evlist__for_each_entry(leader->evlist, event) {
+		if (event != leader && is_topdown_metric_event(event))
+			return true;
+	}
 
 	return false;
 }

base-commit: 92e5605a199efbaee59fb19e15d6cc2103a04ec2
-- 
2.40.1
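For readers skimming the thread, the decision the patched arch_topdown_sample_read() makes can be sketched as a standalone Python model. This is a hypothetical illustration, not perf code: the dict events and the name-prefix check stand in for perf's evsel structs and the perf_pmu__for_each_event() name lookup.

```python
# Hypothetical model of the patched arch_topdown_sample_read() decision.
TOPDOWN_SLOTS = 0x0400

def is_topdown_metric_event(event):
    # Stand-in: perf actually walks the core PMU's event names; here we
    # just check for the topdown- prefix (topdown-retiring, ...).
    return event["name"].startswith("topdown-")

def arch_topdown_sample_read(leader, group):
    # Slots must lead the group for sampling read to be considered at all.
    if leader["config"] != TOPDOWN_SLOTS:
        return False
    # New behavior: defer to sampling read only if the group contains at
    # least one topdown metric event besides the leader; otherwise the
    # slots event keeps sampling as the leader.
    return any(e is not leader and is_topdown_metric_event(e) for e in group)

slots = {"name": "slots", "config": TOPDOWN_SLOTS}
branches = {"name": "branches", "config": 0x4}
retiring = {"name": "topdown-retiring", "config": 0x8000}

# {slots,branches}:S -> False: slots stays the sampling leader (the fix).
print(arch_topdown_sample_read(slots, [slots, branches]))
# {slots,topdown-retiring}:S -> True: sampling is deferred via sample_read.
print(arch_topdown_sample_read(slots, [slots, retiring]))
```

Before the patch, the model would return True whenever the leader config was TOPDOWN_SLOTS, which is what reset the slots sample period to 0 in the {slots,branches}:S case above.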



* Re: [PATCH] perf topdown: Correct leader selection with sample_read enabled
  2024-06-14 21:39 [PATCH] perf topdown: Correct leader selection with sample_read enabled Dapeng Mi
@ 2024-06-27 15:11 ` Liang, Kan
  2024-06-28  6:17   ` Mi, Dapeng
From: Liang, Kan @ 2024-06-27 15:11 UTC (permalink / raw)
  To: Dapeng Mi, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
  Cc: linux-perf-users, linux-kernel, Dapeng Mi

Hi Dapeng,

On 2024-06-14 5:39 p.m., Dapeng Mi wrote:
> Fix an issue where, when sample_read is enabled and the sampling group
> contains no topdown metrics event, the slots event is incorrectly
> bypassed as the sampling leader.
> 
> perf record -e '{slots,branches}:S' -c 10000 -vv sleep 1
> 
> In this case the slots event should be the sampling leader, but as the
> verbose output below shows, the branches event is sampled instead.
> 
> perf_event_attr:
>   type                             4 (cpu)
>   size                             168
>   config                           0x400 (slots)
>   sample_type                      IP|TID|TIME|READ|CPU|IDENTIFIER
>   read_format                      ID|GROUP|LOST
>   disabled                         1
>   sample_id_all                    1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             168
>   config                           0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
>   { sample_period, sample_freq }   10000
>   sample_type                      IP|TID|TIME|READ|CPU|IDENTIFIER
>   read_format                      ID|GROUP|LOST
>   sample_id_all                    1
>   exclude_guest                    1
> 
> The sample period of the slots event, rather than that of the branches
> event, is reset to 0.
> 
> This fix ensures that the slots event remains the sampling leader under
> these conditions.

This is likely just one of several issues with slots/topdown sampling
read.

If one more topdown event is added, sampling read may still be broken:
 perf record -e "{slots,instructions,topdown-retiring}:S"  -C0 sleep 1
 WARNING: events were regrouped to match PMUs
 Error:
 The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (topdown-retiring).

That may require Yanfei's patch.
https://lore.kernel.org/lkml/20240411144852.2507143-1-yanfei.xu@intel.com/

Please give it a try and summarize all the required patches for the
topdown sampling read feature.

Besides, we need a test for the sampling read as well.
Ian has provided a very good base. Please add a topdown sampling read
case on top of it as well.
https://lore.kernel.org/lkml/CAP-5=fUkg-cAXTb+3wbFOQCfdXgpQeZw40XHjfrNFbnBD=NMXg@mail.gmail.com/


Thanks,
Kan



* Re: [PATCH] perf topdown: Correct leader selection with sample_read enabled
  2024-06-27 15:11 ` Liang, Kan
@ 2024-06-28  6:17   ` Mi, Dapeng
  2024-06-28 18:28     ` Ian Rogers
From: Mi, Dapeng @ 2024-06-28  6:17 UTC (permalink / raw)
  To: Liang, Kan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
  Cc: linux-perf-users, linux-kernel, Dapeng Mi


On 6/27/2024 11:11 PM, Liang, Kan wrote:
> Hi Dapeng,
>
> On 2024-06-14 5:39 p.m., Dapeng Mi wrote:
>> Fix an issue where, when sample_read is enabled and the sampling group
>> contains no topdown metrics event, the slots event is incorrectly
>> bypassed as the sampling leader.
>>
>> perf record -e '{slots,branches}:S' -c 10000 -vv sleep 1
>>
>> In this case the slots event should be the sampling leader, but as the
>> verbose output below shows, the branches event is sampled instead.
>>
>> perf_event_attr:
>>   type                             4 (cpu)
>>   size                             168
>>   config                           0x400 (slots)
>>   sample_type                      IP|TID|TIME|READ|CPU|IDENTIFIER
>>   read_format                      ID|GROUP|LOST
>>   disabled                         1
>>   sample_id_all                    1
>>   exclude_guest                    1
>> ------------------------------------------------------------
>> sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
>> ------------------------------------------------------------
>> perf_event_attr:
>>   type                             0 (PERF_TYPE_HARDWARE)
>>   size                             168
>>   config                           0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
>>   { sample_period, sample_freq }   10000
>>   sample_type                      IP|TID|TIME|READ|CPU|IDENTIFIER
>>   read_format                      ID|GROUP|LOST
>>   sample_id_all                    1
>>   exclude_guest                    1
>>
>> The sample period of the slots event, rather than that of the branches
>> event, is reset to 0.
>>
>> This fix ensures that the slots event remains the sampling leader under
>> these conditions.
> This is likely just one of several issues with slots/topdown sampling
> read.
>
> If one more topdown event is added, sampling read may still be broken:
>  perf record -e "{slots,instructions,topdown-retiring}:S"  -C0 sleep 1
>  WARNING: events were regrouped to match PMUs
>  Error:
>  The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (topdown-retiring).
>
> That may require Yanfei's patch.
> https://lore.kernel.org/lkml/20240411144852.2507143-1-yanfei.xu@intel.com/

Yes, we need this patch. It should fix the error you see.


>
> Please give it a try and summarize all the required patches for the
> topdown sampling read feature.

I will talk with Yanfei and collect all the required patches into a
single patchset. That should make patch review easier.


>
> Besides, we need a test for the sampling read as well.
> Ian has provided a very good base. Please add a topdown sampling read
> case on top of it as well.
> https://lore.kernel.org/lkml/CAP-5=fUkg-cAXTb+3wbFOQCfdXgpQeZw40XHjfrNFbnBD=NMXg@mail.gmail.com/

Sure. I will look at it and add a test case.




* Re: [PATCH] perf topdown: Correct leader selection with sample_read enabled
  2024-06-28  6:17   ` Mi, Dapeng
@ 2024-06-28 18:28     ` Ian Rogers
  2024-06-28 20:27       ` Liang, Kan
From: Ian Rogers @ 2024-06-28 18:28 UTC (permalink / raw)
  To: Mi, Dapeng
  Cc: Liang, Kan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, linux-perf-users,
	linux-kernel, Dapeng Mi

On Thu, Jun 27, 2024 at 11:17 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
> On 6/27/2024 11:11 PM, Liang, Kan wrote:
> > On 2024-06-14 5:39 p.m., Dapeng Mi wrote:
> >
> > Besides, we need a test for the sampling read as well.
> > Ian has provided a very good base. Please add a topdown sampling read
> > case on top of it as well.
> > https://lore.kernel.org/lkml/CAP-5=fUkg-cAXTb+3wbFOQCfdXgpQeZw40XHjfrNFbnBD=NMXg@mail.gmail.com/
>
> Sure. I would look at it and add a test case.

Thanks Dapeng and thanks Kan too! I wonder if we can run a regular
counter and a leader sampling counter, then check that the counts are
reasonably consistent. Something like this:

```
$ perf stat -e instructions perf test -w noploop

Performance counter stats for '/tmp/perf/perf test -w noploop':

   25,779,785,496      instructions

      1.008047696 seconds time elapsed

      1.003754000 seconds user
      0.003999000 seconds sys
```

```
cat << "_end_of_file_" > a.py
last_count = None

def process_event(param_dict):
    if ("ev_name" in param_dict and "sample" in param_dict and
            param_dict["ev_name"] == "instructions"):
        sample = param_dict["sample"]
        if "values" in sample:
            global last_count
            last_count = sample["values"][1][1]

def trace_end():
    global last_count
    print(last_count)
_end_of_file_
$ sudo perf record -o - -e "{cycles,instructions}:S" perf test -w noploop | perf script -i - -s ./a.py
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.459 MB - ]
22195356100
```

I didn't see a simpler way to get the count, and I'm not sure this
approach is right. There is some similar perf script data checking in
tools/perf/tests/shell/test_intel_pt.sh.
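The consistency check described above (the perf stat total versus the last sampling-read sum) could be sketched as a simple tolerance comparison. This is a hypothetical sketch: the 20% bound and the two counts below are illustrative assumptions taken from the example outputs, not measured requirements.

```python
# Hypothetical consistency check between a `perf stat` count and the
# last sampling-read sum extracted by the perf script hook above.

def counts_consistent(stat_count, sample_read_count, tolerance=0.20):
    """Return True if the two counts agree within `tolerance` (fractional).

    The sampling-read sum only covers the window up to the last sample,
    so it is expected to fall somewhat below the perf-stat count.
    """
    if stat_count <= 0:
        return False
    return abs(stat_count - sample_read_count) / stat_count <= tolerance

stat_count = 25_779_785_496         # from `perf stat -e instructions ...`
sample_read_count = 22_195_356_100  # from the perf script hook above

print(counts_consistent(stat_count, sample_read_count))
```

With these example numbers the difference is about 14%, inside the assumed 20% bound.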

Thanks,
Ian


* Re: [PATCH] perf topdown: Correct leader selection with sample_read enabled
  2024-06-28 18:28     ` Ian Rogers
@ 2024-06-28 20:27       ` Liang, Kan
  2024-07-01  9:51         ` Mi, Dapeng
From: Liang, Kan @ 2024-06-28 20:27 UTC (permalink / raw)
  To: Ian Rogers, Mi, Dapeng
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, linux-perf-users,
	linux-kernel, Dapeng Mi



On 2024-06-28 2:28 p.m., Ian Rogers wrote:
> On Thu, Jun 27, 2024 at 11:17 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>> On 6/27/2024 11:11 PM, Liang, Kan wrote:
>>> On 2024-06-14 5:39 p.m., Dapeng Mi wrote:
>>>
>>> Besides, we need a test for the sampling read as well.
>>> Ian has provided a very good base. Please add a topdown sampling read
>>> case on top of it as well.
>>> https://lore.kernel.org/lkml/CAP-5=fUkg-cAXTb+3wbFOQCfdXgpQeZw40XHjfrNFbnBD=NMXg@mail.gmail.com/
>>
>> Sure. I would look at it and add a test case.
> 
> Thanks Dapeng and thanks Kan too! I wonder if we can run a regular
> counter and a leader sampling counter, then check that the counts are
> reasonably consistent. Something like this:
> 
> ```
> $ perf stat -e instructions perf test -w noploop
> 
> Performance counter stats for '/tmp/perf/perf test -w noploop':
> 
>    25,779,785,496      instructions
> 
>       1.008047696 seconds time elapsed
> 
>       1.003754000 seconds user
>       0.003999000 seconds sys
> ```
> 
> ```
> cat << "_end_of_file_" > a.py
> last_count = None
> 
> def process_event(param_dict):
>     if ("ev_name" in param_dict and "sample" in param_dict and
>             param_dict["ev_name"] == "instructions"):
>         sample = param_dict["sample"]
>         if "values" in sample:
>             global last_count
>             last_count = sample["values"][1][1]
> 
> def trace_end():
>     global last_count
>     print(last_count)
> _end_of_file_
> $ sudo perf record -o - -e "{cycles,instructions}:S" perf test -w noploop | perf script -i - -s ./a.py
> [ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 0.459 MB - ]
> 22195356100
> ```
> 
> I didn't see a simpler way to get the count, and I'm not sure this
> approach is right.

perf stat covers the whole life cycle of a workload, but I think
perf record can only give the sum from the beginning to the last
sample, so there will be some differences.

> There's some similar perf script checking of data in
> tools/perf/tests/shell/test_intel_pt.sh.
>

I think the test case should check the output of perf script, rather
than verify the accuracy of an event.

If so, we can run two identical events; they should show exactly the
same results in each sample.

For example,

perf record  -e "{branches,branches}:Su" -c 1000000 ./perf test -w brstack
perf script
perf  752598 349300.123884:    1000002 branches:      7f18676a875a do_lookup_x+0x2fa (/usr/lib64/l>
perf  752598 349300.123884:    1000002 branches:      7f18676a875a do_lookup_x+0x2fa (/usr/lib64/l>
perf  752598 349300.124854:    1000005 branches:      7f18676a90b6 _dl_lookup_symbol_x+0x56 (/usr/>
perf  752598 349300.124854:    1000005 branches:      7f18676a90b6 _dl_lookup_symbol_x+0x56 (/usr/>
perf  752598 349300.125914:     999998 branches:      7f18676a8556 do_lookup_x+0xf6 (/usr/lib64/ld>
perf  752598 349300.125914:     999998 branches:      7f18676a8556 do_lookup_x+0xf6 (/usr/lib64/ld>
perf  752598 349300.127401:    1000009 branches:            4c1adf brstack_bench+0x15 (/home/kan/o>
perf  752598 349300.127401:    1000009 branches:            4c1adf brstack_bench+0x15 (/home/kan/o>
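The pairwise check proposed above could be scripted along these lines. This is a hypothetical sketch that parses perf script text: the field layout and the regex are assumptions based on the example output, and the embedded sample lines are abridged from it.

```python
# Hypothetical check: with two identical events in a :S group, consecutive
# `perf script` lines should pair up with the same timestamp, period and IP.
import re

lines = """\
perf  752598 349300.123884:    1000002 branches:      7f18676a875a do_lookup_x+0x2fa
perf  752598 349300.123884:    1000002 branches:      7f18676a875a do_lookup_x+0x2fa
perf  752598 349300.124854:    1000005 branches:      7f18676a90b6 _dl_lookup_symbol_x+0x56
perf  752598 349300.124854:    1000005 branches:      7f18676a90b6 _dl_lookup_symbol_x+0x56
""".splitlines()

def paired_samples_match(lines):
    # Extract (timestamp, period, ip) from each line: comm, pid, then the
    # colon-terminated timestamp, the period, the event name, and the IP.
    fields = [re.match(r"\S+\s+\d+\s+(\S+):\s+(\d+)\s+\S+\s+(\S+)", l).groups()
              for l in lines]
    # Require every even/odd pair of samples to be identical.
    return all(fields[i] == fields[i + 1] for i in range(0, len(fields), 2))

print(paired_samples_match(lines))
```

A real test would feed the actual `perf record -e "{branches,branches}:Su" ...` output through `perf script` instead of the embedded lines.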

Thanks,
Kan



* Re: [PATCH] perf topdown: Correct leader selection with sample_read enabled
  2024-06-28 20:27       ` Liang, Kan
@ 2024-07-01  9:51         ` Mi, Dapeng
From: Mi, Dapeng @ 2024-07-01  9:51 UTC (permalink / raw)
  To: Liang, Kan, Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, linux-perf-users,
	linux-kernel, Dapeng Mi


On 6/29/2024 4:27 AM, Liang, Kan wrote:
>
> On 2024-06-28 2:28 p.m., Ian Rogers wrote:
>> On Thu, Jun 27, 2024 at 11:17 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>>> On 6/27/2024 11:11 PM, Liang, Kan wrote:
>>>> On 2024-06-14 5:39 p.m., Dapeng Mi wrote:
>>>>
>>>> Besides, we need a test for the sampling read as well.
>>>> Ian has provided a very good base. Please add a topdown sampling read
>>>> case on top of it as well.
>>>> https://lore.kernel.org/lkml/CAP-5=fUkg-cAXTb+3wbFOQCfdXgpQeZw40XHjfrNFbnBD=NMXg@mail.gmail.com/
>>> Sure. I would look at it and add a test case.
>> Thanks Dapeng and thanks Kan too! I wonder if we can do a regular
>> counter and a leader sample counter then compare the counts are
>> reasonably consistent. Something like this:
>>
>> ```
>> $ perf stat -e instructions perf test -w noploop
>>
>> Performance counter stats for '/tmp/perf/perf test -w noploop':
>>
>>    25,779,785,496      instructions
>>
>>       1.008047696 seconds time elapsed
>>
>>       1.003754000 seconds user
>>       0.003999000 seconds sys
>> ```
>>
>> ```
>> cat << "_end_of_file_" > a.py
>> last_count = None
>>
>> def process_event(param_dict):
>>    if ("ev_name" in param_dict and "sample" in param_dict and
>>        param_dict["ev_name"] == "instructions"):
>>        sample = param_dict["sample"]
>>        if "values" in sample:
>>            global last_count
>>            last_count = sample["values"][1][1]
>>
>> def trace_end():
>>    global last_count
>>    print(last_count)
>> _end_of_file_
>> $ sudo perf record -o -  -e "{cycles,instructions}:S" perf test -w
>> noploop|perf script -i - -s ./a.py
>> [ perf record: Woken up 2 times to write data ]
>> [ perf record: Captured and wrote 0.459 MB - ]
>> 22195356100
>> ```
>>
>> I didn't see a simpler way to get count and I don't think it is right.
> perf stat covers the whole life cycle of a workload, but I think
> perf record can only give the sum from the beginning to the last
> sample, so there will be some differences.
>
>> There's some similar perf script checking of data in
>> tools/perf/tests/shell/test_intel_pt.sh.
>>
> I think the test case should check the output of perf script, rather
> than verify the accuracy of an event.
>
> If so, we can run two identical events; they should show exactly the
> same results in each sample.
>
> For example,
>
> perf record  -e "{branches,branches}:Su" -c 1000000 ./perf test -w brstack
> perf script
> perf  752598 349300.123884:    1000002 branches:      7f18676a875a do_lookup_x+0x2fa (/usr/lib64/l>
> perf  752598 349300.123884:    1000002 branches:      7f18676a875a do_lookup_x+0x2fa (/usr/lib64/l>
> perf  752598 349300.124854:    1000005 branches:      7f18676a90b6 _dl_lookup_symbol_x+0x56 (/usr/>
> perf  752598 349300.124854:    1000005 branches:      7f18676a90b6 _dl_lookup_symbol_x+0x56 (/usr/>
> perf  752598 349300.125914:     999998 branches:      7f18676a8556 do_lookup_x+0xf6 (/usr/lib64/ld>
> perf  752598 349300.125914:     999998 branches:      7f18676a8556 do_lookup_x+0xf6 (/usr/lib64/ld>
> perf  752598 349300.127401:    1000009 branches:            4c1adf brstack_bench+0x15 (/home/kan/o>
> perf  752598 349300.127401:    1000009 branches:            4c1adf brstack_bench+0x15 (/home/kan/o>

This looks like a more accurate validation. I will add this test.



