Date: Sun, 14 May 2023 08:03:38 -0400
Subject: Re: [PATCH v4 00/44] Fix perf on Intel hybrid CPUs
To: Arnaldo Carvalho de Melo, Ian Rogers
Cc: Ahmad Yasin, Peter Zijlstra, Ingo Molnar, Stephane Eranian,
 Andi Kleen, Perry Taylor, Samantha Alt, Caleb Biggers, Weilin Wang,
 Edward Baker, Mark Rutland, Alexander Shishkin, Jiri Olsa,
 Namhyung Kim, Adrian Hunter, Florian Fischer, Rob Herring, John Garry,
 Kajol Jain, Sumanth Korikkar, Thomas Richter, Tiezhu Yang,
 Ravi Bangoria, Leo Yan, Yang Jihong, James Clark, Suzuki Poulouse,
 Kang Minchul, Athira Rajeev, linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org
References: <20230502223851.2234828-1-irogers@google.com>
From: "Liang, Kan"

On 2023-05-12 2:33 p.m., Arnaldo Carvalho de Melo wrote:
> On Wed, May 03, 2023 at 04:56:36PM -0400, Liang, Kan wrote:
>>
>>
>> On 2023-05-02 6:38 p.m., Ian Rogers wrote:
>>> TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
>>> or individually, event parsing doesn't always scan all PMUs, more and
>>> new tests that also run without hybrid, less code.
>>>
>>> The first 4 patches are aimed at Linux 6.4 to address issues raised,
>>> in particular by Kan, on the existing perf stat behavior with json
>>> metrics. They avoid duplicated events by removing groups.
>>> They don't hide events and metrics to make event multiplexing obvious.
>>> They avoid terminating perf when paranoia is higher due to certain
>>> events that always fail. They avoid rearranging events by PMUs when
>>> the events aren't in a group.
>>>
>>> The next 5 patches avoid grouping events for metrics where they could
>>> never succeed and were previously posted as:
>>> "perf vendor events intel: Add xxx metric constraints"
>>> https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
>>> In general the generated json is coming from:
>>> https://github.com/intel/perfmon/pull/73
>>>
>>> Next are some general and test improvements.
>>>
>>> Next event parsing is rewritten to not scan all PMUs for the benefit
>>> of raw and legacy cache parsing, instead these are handled by the
>>> lexer and a new term type. This ultimately removes the need for the
>>> event parser for hybrid to be recursive as legacy cache can be just a
>>> term. Tests are re-enabled for events with hyphens, so AMD's
>>> branch-brs event is now parsable.
>>>
>>> The cputype option is made a generic pmu filter flag and is tested
>>> even on non-hybrid systems.
>>>
>>> The final patches address specific json metric issues on hybrid, in
>>> both the json metrics and the metric code.
>>>
>>> The patches add slightly more code than they remove, in areas like
>>> better json metric constraints and tests, but in the core util code,
>>> the removal of hybrid is a net reduction:
>>>  22 files changed, 711 insertions(+), 1016 deletions(-)
>>>
>>> Sample output is contained in the v1 patch set:
>>> https://lore.kernel.org/lkml/bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com/
>>>
>>> Tested on Tigerlake, Skylake and Alderlake CPUs.
>>>
>>> The v4 patch set:
>>>  - rebase, 1 of the Linux 6.4 recommended patches are merged leaving:
>>>    1) perf metric: Change divide by zero and !support events behavior
>>>    2) perf stat: Introduce skippable evsels
>>>    3) perf metric: Json flag to not group events if gathering a metric group
>>>    4) perf parse-events: Don't reorder ungrouped events by pmu
>>>    whose diffstat is:
>>>    30 files changed, 326 insertions(+), 33 deletions(-)
>>>    but without the vendor event updates (the tend to be large as they
>>>    repeat something per architecture per metric) is just:
>>>    10 files changed, 90 insertions(+), 32 deletions(-)
>>
>> I have tested the 4 patches on top of the perf-tools-next branch on both
>> Cascade Lake and Raptor Lake. The result looks good to me.
>>
>> They address the permission error found in the default mode of perf stat
>> on the Cascade Lake. Thanks Ian for the fix.
>>
>> Arnaldo, could you please consider to back port them for the 6.4?
>
> Yes, its in perf-tools now, will go to Linus next week.

Thanks Arnaldo!

>
> What about the other patches? I saw some you provided your review, what
> about the others, are you ok with them?
>

Yes, I'm OK with the patch set. It fixes many issues. Thanks Ian.
(My tests mainly focus on the areas that the patch set may touch.
I ran the tests on various platforms: ADL (hybrid), Cascade Lake, SPR.)

Tested-by: Kan Liang

But there are still some issues. I don't think they are introduced by
this patch set. We may fix them later separately.

- Segmentation fault with perf stat --topdown
  on ADL (hybrid) and Cascade Lake.
  It looks like a legacy issue and may not have been introduced by this
  patch set.

  Here is the backtrace. It looks like there is a NULL metric_group.

(gdb) backtrace
#0  0x00007ffff73035d1 in __strstr_sse2_unaligned () from /lib64/libc.so.6
#1  0x00000000004f9019 in metricgroup__topdown_max_level_callback
    (pm=, table=, data=0x7fffffff92f4) at util/metricgroup.c:1722
#2  0x00000000005e8a31 in pmu_metrics_table_for_each_metric
    (table=0xcb74d0 , fn=fn@entry=0x4f8ff0 , data=data@entry=0x7fffffff92f4)
    at pmu-events/pmu-events.c:61123
#3  0x00000000004fbc3b in metricgroups__topdown_max_level ()
    at util/metricgroup.c:1742
#4  0x000000000042c135 in add_default_attributes () at builtin-stat.c:1845
#5  cmd_stat (argc=0, argv=0x7fffffffe3e0) at builtin-stat.c:2446
#6  0x00000000004b922b in run_builtin (p=p@entry=0xd5c530 ,
    argc=argc@entry=2, argv=argv@entry=0x7fffffffe3e0) at perf.c:323
#7  0x000000000040e373 in handle_internal_command (argv=0x7fffffffe3e0,
    argc=2) at perf.c:377
#8  run_argv (argv=, argcp=) at perf.c:421
#9  main (argc=2, argv=0x7fffffffe3e0) at perf.c:537
(gdb)

  Also, the return type is unsigned int, but a bool is given.

unsigned int metricgroups__topdown_max_level(void)
{
	unsigned int max_level = 0;
	const struct pmu_metrics_table *table = pmu_metrics_table__find();

	if (!table)
		return false;

- The perf metric and metricgroups fail on different platforms.
  Ian and I have discussed it. We agree to address it later separately.
  102: perf all metricgroups test
       ADL (hybrid)
  103: perf all metrics test
       ADL (hybrid), Cascade Lake, SPR

- perf list: The [Kernel PMU event] is missed for all the hardware cache
  events. It impacts both hybrid and non-hybrid platforms.
  It's a user-visible change introduced by the patch set. I don't know if
  anyone cares whether it's a kernel event or a regular event. It doesn't
  bother me. So I'm OK with it.

cpu:
  L1-dcache-loads OR cpu/L1-dcache-loads/
  L1-dcache-load-misses OR cpu/L1-dcache-load-misses/
  L1-dcache-stores OR cpu/L1-dcache-stores/
  L1-icache-load-misses OR cpu/L1-icache-load-misses/
  LLC-loads OR cpu/LLC-loads/
  LLC-load-misses OR cpu/LLC-load-misses/
  LLC-stores OR cpu/LLC-stores/
  LLC-store-misses OR cpu/LLC-store-misses/
  dTLB-loads OR cpu/dTLB-loads/
  dTLB-load-misses OR cpu/dTLB-load-misses/
  dTLB-stores OR cpu/dTLB-stores/
  dTLB-store-misses OR cpu/dTLB-store-misses/
  iTLB-load-misses OR cpu/iTLB-load-misses/
  branch-loads OR cpu/branch-loads/
  branch-load-misses OR cpu/branch-load-misses/
  node-loads OR cpu/node-loads/
  node-load-misses OR cpu/node-load-misses/
  branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
  branch-misses OR cpu/branch-misses/                [Kernel PMU event]
  bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
  cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
  cache-references OR cpu/cache-references/          [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
  instructions OR cpu/instructions/                  [Kernel PMU event]

- The --cputype only works for the metric in the default mode.
  I can still see the cpu_atom events with --cputype core.
  It may be something we can improve later.

# ./perf stat --cputype core sleep 2

 Performance counter stats for 'sleep 2':

              0.52 msec task-clock                       #    0.000 CPUs utilized
                 1      context-switches                 #    1.939 K/sec
                 0      cpu-migrations                   #    0.000 /sec
                69      page-faults                      #  133.770 K/sec
         2,569,423      cpu_core/cycles/                 #    4.981 G/sec
                        cpu_atom/cycles/                                   (0.00%)
         3,287,691      cpu_core/instructions/           #    6.374 G/sec
                        cpu_atom/instructions/                             (0.00%)
           555,848      cpu_core/branches/               #    1.078 G/sec
                        cpu_atom/branches/                                 (0.00%)
             8,398      cpu_core/branch-misses/          #   16.281 M/sec
                        cpu_atom/branch-misses/                            (0.00%)
        15,416,538      cpu_core/TOPDOWN.SLOTS/          #     36.1 %  tma_backend_bound
                                                         #     23.9 %  tma_retiring
                                                         #      5.6 %  tma_bad_speculation
                                                         #     34.4 %  tma_frontend_bound
         3,687,877      cpu_core/topdown-retiring/
           846,398      cpu_core/topdown-bad-spec/
         5,320,217      cpu_core/topdown-fe-bound/
         5,562,045      cpu_core/topdown-be-bound/
            14,149      cpu_core/INT_MISC.UOP_DROPPING/  #   27.431 M/sec

Thanks,
Kan
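
For reference, the two points raised about the --topdown crash (strstr()
on a NULL metric_group in frame #1, and "return false" from a function
returning unsigned int) could be handled roughly as sketched below. This
is a standalone illustration, not the perf code: struct demo_metric,
demo_topdown_level() and demo_topdown_max_level() are hypothetical names,
and the "TopdownL<N>" tag format in the group string is an assumption
made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a metric table entry; not perf's struct. */
struct demo_metric {
	const char *metric_name;
	const char *metric_group;	/* may be NULL, as in the backtrace */
};

/*
 * Return the largest N for which "TopdownL<N>" appears in the group
 * string, or 0 if the group is NULL or carries no such tag.
 */
static unsigned int demo_topdown_level(const struct demo_metric *m)
{
	static const char tag[] = "TopdownL";	/* assumed tag format */
	unsigned int level = 0;
	const char *p = m->metric_group;

	if (!p)		/* guard: calling strstr() with NULL is undefined */
		return 0;

	while ((p = strstr(p, tag)) != NULL) {
		unsigned int n = 0;

		p += sizeof(tag) - 1;	/* skip past the tag itself */
		while (*p >= '0' && *p <= '9')
			n = n * 10 + (unsigned int)(*p++ - '0');
		if (n > level)
			level = n;
	}
	return level;
}

/* Walk a table of metrics and return the highest topdown level seen. */
static unsigned int demo_topdown_max_level(const struct demo_metric *table,
					   size_t nr)
{
	unsigned int max_level = 0;
	size_t i;

	if (!table)
		return 0;	/* 0 rather than false: return type is unsigned int */

	for (i = 0; i < nr; i++) {
		unsigned int level = demo_topdown_level(&table[i]);

		if (level > max_level)
			max_level = level;
	}
	return max_level;
}
```

Note that "return false" already converts to 0 in C, so changing it to
"return 0" would only be a type-consistency cleanup; the NULL check
before strstr() is the part that addresses the crash.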