Message-ID: <80e5932e-df03-4619-bcd4-375537538c0d@linux.intel.com>
Date: Fri, 15 May 2026 14:25:36 +0800
Precedence: bulk
X-Mailing-List: linux-perf-users@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH] tools/perf/test: Check for perf stat return code in perf all PMU test
From: "Mi, Dapeng"
To: Ian Rogers, "Falcon, Thomas", "Kleen, Andi"
Cc: "atrajeev@linux.ibm.com", "venkat88@linux.ibm.com",
 "Shivani.Nittor@ibm.com", "tmricht@linux.ibm.com",
 "hbathini@linux.vnet.ibm.com", "mpetlan@redhat.com",
 "Tanushree.Shah@ibm.com", "Hunter, Adrian",
 "linux-perf-users@vger.kernel.org", "maddy@linux.ibm.com",
 "Chen, Zide", "vmolnaro@redhat.com", "Tejas.Manhas1@ibm.com",
 "linuxppc-dev@lists.ozlabs.org", "acme@kernel.org", "jolsa@kernel.org",
 "Mi, Dapeng1", "namhyung@kernel.org"
References:
 <20260315105751.86835-1-atrajeev@linux.ibm.com>
 <7B7E5C6C-D15A-4B79-925B-B5F3EDD84774@linux.ibm.com>
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On 4/7/2026 8:48 AM, Mi, Dapeng wrote:
> On 4/3/2026 11:39 PM, Ian Rogers wrote:
>> On Fri, Apr 3, 2026 at 12:36 AM Mi, Dapeng wrote:
>>> On 4/3/2026 1:32 AM, Falcon, Thomas wrote:
>>>> On Wed, 2026-04-01 at 13:40 -0700, Ian Rogers wrote:
>>>>> On Mon, Mar 23, 2026 at 3:40 AM Venkat wrote:
>>>>>>> On 15 Mar 2026, at 4:27 PM, Athira Rajeev wrote:
>>>>>>>
>>>>>>> Currently in "perf all PMU test", for "perf stat -e <event> true",
>>>>>>> the checks below are done:
>>>>>>> - if return code is zero, look for "not supported" to decide pass
>>>>>>>   scenario
>>>>>>> - check for "not supported" to ignore the event
>>>>>>> - look for "No permission to enable" to skip the event.
>>>>>>> - If output has "Bad event name", fail the test.
>>>>>>> - Use "Access to performance monitoring and observability
>>>>>>>   operations is limited." to ignore failures due to access limitations
>>>>>>>
>>>>>>> If we failed to see the event and it is supported, the test retries
>>>>>>> with a longer workload, "perf bench internals synthesize".
>>>>>>> - Here if output has <event>, the test is a pass.
>>>>>>>
>>>>>>> Snippet of code check:
>>>>>>> ```
>>>>>>> output=$(perf stat -e "$p" perf bench internals synthesize 2>&1)
>>>>>>> if echo "$output" | grep -q "$p"
>>>>>>> ```
>>>>>>> - if the output doesn't have the event printed in logs, it is
>>>>>>>   considered a fail.
>>>>>>>
>>>>>>> But this results in a false pass for events in some cases.
>>>>>>> For example, if perf stat fails as below:
>>>>>>>
>>>>>>> # ./perf stat -e pmu/event/ true
>>>>>>> event syntax error: 'pmu/event/'
>>>>>>>                      \___ Bad event or PMU
>>>>>>>
>>>>>>> Unable to find PMU or event on a PMU of 'pmu'
>>>>>>> Run 'perf list' for a list of valid events
>>>>>>>
>>>>>>> Usage: perf stat [<options>] [<command>]
>>>>>>>
>>>>>>>     -e, --event <event>   event selector.
>>>>>>>                           use 'perf list' to list available events
>>>>>>> # echo $?
>>>>>>> 129
>>>>>>>
>>>>>>> Since this has a non-zero return code and doesn't have the
>>>>>>> fail strings being checked in the test, it will enter the check using
>>>>>>> the longer workload, and since the failure output contains the event
>>>>>>> name, it declares the test as "supported".
>>>>>>>
>>>>>>> Since all the fail strings can't be added in the check, update
>>>>>>> the testcase to check the return code before proceeding to the longer
>>>>>>> workload run.
>>>>>>>
>>>>>>> Another missing scenario is when system wide monitoring is supported,
>>>>>>> for example:
>>>>>>> # ./perf stat -e pmu/event/ true
>>>>>>> Error:
>>>>>>> No supported events found.
>>>>>>> Unsupported event (pmu/event/H) in per-thread mode, enable system wide with '-a'.
>>>>>>>
>>>>>>> Update testcase to check with "perf stat -a -e $p" as well
>>>>>>>
>>>>>>> Signed-off-by: Athira Rajeev
>>>>>>> ---
>>>>>> Tested this patch.
>>>>>>
>>>>>> With this patch:
>>>>>>
>>>>>> Testing hv_24x7/CPM_ADJUNCT_INST/ -- perf stat failed with non-zero return code
>>>>>> Testing hv_24x7/CPM_ADJUNCT_PCYC/ -- perf stat failed with non-zero return code
>>>>>>
>>>>>> Tested-by: Venkat Rao Bagalkote
>>>>> Testing on an Intel Alderlake the test is now failing:
>>>>> ```
>>>>> ...
>>>>> Testing offcore_requests_outstanding.l3_miss_demand_data_rd -- supported
>>>>> Testing ocr.full_streaming_wr.any_response -- perf stat failed with non-zero return code
>>>>> Testing ocr.partial_streaming_wr.any_response -- perf stat failed with non-zero return code
>>>>> Testing ocr.streaming_wr.any_response -- supported
>>>>> ...
>>>>> ```
>>>>>
>>>>> Running `perf stat` manually reveals an issue with the event:
>>>>> ```
>>>>> $ sudo perf stat -vv -e ocr.full_streaming_wr.any_response -a sleep 1
>>>>> Using CPUID GenuineIntel-6-B7-1
>>>>> Attempt to add: cpu_atom/ocr.full_streaming_wr.any_response/
>>>>> ..after resolving event:
>>>>> cpu_atom/event=0xb7,period=0x186a3,umask=0x1,offcore_rsp=0x800000010000/
>>>>> ocr.full_streaming_wr.any_response -> cpu_atom/ocr.full_streaming_wr.any_response/
>>>>> Control descriptor is not initialized
>>>>> ------------------------------------------------------------
>>>>> perf_event_attr:
>>>>>   type                             10 (cpu_atom)
>>>>>   size                             144
>>>>> ------------------------------------------------------------
>>>>> perf_event_attr:
>>>>>   type                             0 (PERF_TYPE_HARDWARE)
>>>>>   config                           0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
>>>>>   disabled                         1
>>>>> ------------------------------------------------------------
>>>>> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
>>>>> ------------------------------------------------------------
>>>>> perf_event_attr:
>>>>>   type                             0 (PERF_TYPE_HARDWARE)
>>>>>   config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
>>>>>   disabled                         1
>>>>> ------------------------------------------------------------
>>>>> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
>>>>>   config                           0x1b7 (ocr.demand_data_rd.l3_hit.snoop_hit_no_fwd)
>>>>>   sample_type                      IDENTIFIER
>>>>>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>>>>   disabled                         1
>>>>>   inherit                          1
>>>>>   { bp_addr, config1 }             0x800000010000
>>>>> ------------------------------------------------------------
>>>>> sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8
>>>>> sys_perf_event_open failed, error -22
>>>>> switching off deferred callchain support
>>>>> Warning:
>>>>> ocr.full_streaming_wr.any_response event is not supported by the kernel.
>>>>> The sys_perf_event_open() syscall failed for event (ocr.full_streaming_wr.any_response): Invalid argument
>>>>> "dmesg | grep -i perf" may provide additional information.
>>>>>
>>>>> Error:
>>>>> No supported events found.
>>>>> The sys_perf_event_open() syscall failed for event (ocr.full_streaming_wr.any_response): Invalid argument
>>>>> "dmesg | grep -i perf" may provide additional information.
>>>>> ```
>>>>>
>>>>> This looks like a latent Intel cpu_atom PMU bug. Thomas, wdyt?
>>> Hmm, it looks like the error is caused by an invalid bitmask for the
>>> OFFCORE_RSP_x MSRs. Currently the valid bitmask of the OFFCORE_RSP_x MSRs
>>> is set to 0x3fffffffff in intel_grt_extra_regs[], while the MSR value is
>>> set to 0x800000010000 for the ocr.full_streaming_wr.any_response event.
>>> Bit 47 is treated as an invalid bit, so the event creation is aborted.
>>>
>>> Based on the description in "Table 21-56. MSR_OFFCORE_RSPx Request Type
>>> Definition" in the SDM, bit 47 should be a valid bit now. Presumably bit 47
>>> was not a valid bit when the ADL PMU support was added, but it was updated
>>> and became valid later.
>>>
>>> Along with the constant updates of the perf event lists
>>> (https://github.com/intel/perfmon), we have noticed some mismatches
>>> between the events hardcoded in the driver and the perfmon event list.
>>> We are currently summarizing the mismatches. Once they are finalized,
>>> we will submit a patchset to fix them.
>> That's great, if it takes too long perhaps we could just remove the
>> events for now.
> I expect it won't take too long. I plan to post the patchset in the next
> release cycle. The code changes are simple but need a lot of time to verify
> on all kinds of platforms. Thanks.

The patch
(https://lore.kernel.org/all/20260515061143.338553-5-dapeng1.mi@linux.intel.com/)
should fix this issue. Thanks.

>
>> Thanks,
>> Ian
>>
>>> Thanks.
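For anyone who wants to check the arithmetic behind the OFFCORE_RSP_x explanation, here is a minimal shell sketch. It only reuses the two values quoted in this thread (the 0x3fffffffff valid mask and the 0x800000010000 config1 value); the variable names are made up for illustration, and this is not the kernel's actual validation code.

```shell
#!/bin/sh
# Values quoted in this thread; variable names are illustrative only.
valid_mask=$((0x3fffffffff))    # valid bits as currently set in intel_grt_extra_regs[]
config1=$((0x800000010000))     # offcore_rsp value shown in the perf stat -vv output

# Bits set in config1 that fall outside the valid mask.
invalid_bits=$(( config1 & ~valid_mask ))
invalid_hex=$(printf '0x%x' "$invalid_bits")

if [ "$invalid_bits" -ne 0 ]; then
    echo "rejected: bits outside the valid mask: $invalid_hex"
else
    echo "accepted"
fi
```

The only bit left over is bit 47 (bit 16 is covered by the mask), which matches the explanation that bit 47 is what triggers the "Invalid argument" error.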
>>>
>>>> +Dapeng, Zide, Andi
>>>>
>>>> Thanks,
>>>> Tom
>>>>
>>>>> Thanks,
>>>>> Ian
>>>>>
>>>>>> Regards,
>>>>>> Venkat.
>>>>>>
>>>>>>> tools/perf/tests/shell/stat_all_pmu.sh | 20 ++++++++++++++++++++
>>>>>>> 1 file changed, 20 insertions(+)
>>>>>>>
>>>>>>> diff --git a/tools/perf/tests/shell/stat_all_pmu.sh b/tools/perf/tests/shell/stat_all_pmu.sh
>>>>>>> index 9c466c0efa85..6c4d59cbfa5f 100755
>>>>>>> --- a/tools/perf/tests/shell/stat_all_pmu.sh
>>>>>>> +++ b/tools/perf/tests/shell/stat_all_pmu.sh
>>>>>>> @@ -53,6 +53,26 @@ do
>>>>>>>      continue
>>>>>>>    fi
>>>>>>>
>>>>>>> +  # check with system wide if it is supported.
>>>>>>> +  output=$(perf stat -a -e "$p" true 2>&1)
>>>>>>> +  stat_result=$?
>>>>>>> +  if echo "$output" | grep -q "not supported"
>>>>>>> +  then
>>>>>>> +    # Event not supported, so ignore.
>>>>>>> +    echo "not supported"
>>>>>>> +    continue
>>>>>>> +  fi
>>>>>>> +
>>>>>>> +  # checked through possible access limitations and permissions.
>>>>>>> +  # At this step, non-zero return code from "perf stat" needs to
>>>>>>> +  # reported as fail for the user to investigate
>>>>>>> +  if [ $stat_result -ne 0 ]
>>>>>>> +  then
>>>>>>> +    echo "perf stat failed with non-zero return code"
>>>>>>> +    err=1
>>>>>>> +    continue
>>>>>>> +  fi
>>>>>>> +
>>>>>>>    # We failed to see the event and it is supported. Possibly the workload was
>>>>>>>    # too small so retry with something longer.
>>>>>>>    output=$(perf stat -e "$p" perf bench internals synthesize 2>&1)
>>>>>>> --
>>>>>>> 2.47.3
>>>>>>>
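
As a self-contained illustration of the false pass that the quoted patch addresses, the sketch below replaces perf with a stand-in function. `fake_perf_stat` and the `*_verdict` variables are hypothetical; the function only mimics the "event syntax error" output and the 129 return code shown in the commit message, and it does not reproduce the real test script.

```shell
#!/bin/sh
# Hypothetical stand-in for a failing "perf stat -e pmu/event/ true" run:
# it prints an error that mentions the event name and exits non-zero.
fake_perf_stat() {
    echo "event syntax error: 'pmu/event/'"
    return 129
}

p="pmu/event/"
output=$(fake_perf_stat 2>&1)
stat_result=$?

# Output-only check: matches because the error text contains the event name.
if echo "$output" | grep -q "$p"; then
    grep_verdict="supported"
else
    grep_verdict="not seen"
fi

# Return-code-first check: a non-zero status is reported before any grep.
if [ "$stat_result" -ne 0 ]; then
    rc_verdict="failed"
else
    rc_verdict="$grep_verdict"
fi

echo "grep only: $grep_verdict; return code first: $rc_verdict"
```

With the output-only check the failed run is still classified as "supported", while checking the return code first reports the failure, which is the behaviour the patch aims for.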