From: James Clark <james.clark@linaro.org>
To: Anubhav Shelat <ashelat@redhat.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>,
Namhyung Kim <namhyung@kernel.org>,
mpetlan@redhat.com, acme@kernel.org, irogers@google.com,
linux-perf-users@vger.kernel.org, peterz@infradead.org,
mingo@redhat.com, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
adrian.hunter@intel.com, kan.liang@linux.intel.com,
dapeng1.mi@linux.intel.com
Subject: Re: [PATCH] perf tests record: allow for some difference in cycle count in leader sampling test on aarch64
Date: Thu, 9 Oct 2025 15:08:08 +0100
Message-ID: <bda238f1-fd21-4411-8611-1bc246ec254c@linaro.org>
In-Reply-To: <CA+G8Dh+Odf40jdY4h1knjU+3sSjZokMx6OdzRT3o9v1=ndKORQ@mail.gmail.com>
On 09/10/2025 2:43 pm, Anubhav Shelat wrote:
> I tested on a new Arm machine and I'm getting a similar issue to Thomas's,
Which are your new and old Arm machines exactly? And which kernel
versions did you run the test on?
> but the test fails every 20 or so runs and I'm not getting the issue that I
> previously mentioned.
>
What do you mean here? Below I see the leader sampling test failure,
which I thought was the same issue that was previously mentioned?
> Running test #15
> 10bc60-10bcc4 g test_loop
> perf does have symbol 'test_loop'
> 10c354-10c418 l brstack
> perf does have symbol 'brstack'
> Basic leader sampling test
> Basic leader sampling test [Success]
> Invalid Counts: 1
> Valid Counts: 27
> Running test #16
> 10bc60-10bcc4 g test_loop
> perf does have symbol 'test_loop'
> 10c354-10c418 l brstack
> perf does have symbol 'brstack'
> Basic leader sampling test
> Basic leader sampling test [Success]
> Invalid Counts: 1
> Valid Counts: 27
> Running test #17
> 10bc60-10bcc4 g test_loop
> perf does have symbol 'test_loop'
> 10c354-10c418 l brstack
> perf does have symbol 'brstack'
> Basic leader sampling test
> Leader sampling [Failed inconsistent cycles count]
> Invalid Counts: 8
> Valid Counts: 28
>
> Initially I thought it was the throttling issue mentioned in the comment in
> test_leader_sampling, but there's another thread that says it's fixed:
> https://lore.kernel.org/lkml/20250520181644.2673067-2-kan.liang@linux.intel.com/
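>
> (For reference, I'm counting failures with a throwaway loop along these
> lines -- the test-name string is an assumption, use whatever "perf test
> list" shows on your build:)
>
>     # Repeatedly run the record shell tests and tally pass/fail.
>     pass=0; fail=0
>     for i in $(seq 1 30); do
>         echo "Running test #$i"
>         if ./perf test "perf record tests" >/dev/null 2>&1; then
>             pass=$((pass + 1))
>         else
>             fail=$((fail + 1))
>         fi
>     done
>     echo "passed: $pass, failed: $fail"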
>
>
>
> On Wed, Oct 8, 2025 at 12:24 PM James Clark <james.clark@linaro.org> wrote:
>
>>
>>
>> On 08/10/2025 11:48 am, Thomas Richter wrote:
>>> On 10/7/25 14:34, James Clark wrote:
>>>>
>>>>
>>>> On 07/10/2025 6:47 am, Thomas Richter wrote:
>>>>> On 10/2/25 15:39, Anubhav Shelat wrote:
>>>>>> On Oct 1, 2025 at 9:44 PM, Ian Rogers wrote:
>>>>>>> If cycles is 0 then this will always pass, should this be checking a
>>>>>> range?
>>>>>>
>>>>>> Yes you're right this will be better.
>>>>>>
>>>>>> On Oct 2, 2025 at 7:56 AM, Thomas Richter wrote:
>>>>>>> Can we use a larger range to allow the test to pass?
>>>>>>
>>>>>> What range do you get on s390? When I do group measurements using
>>>>>> "perf record -e "{cycles,cycles}:Su" perf test -w brstack" like in
>>>>>> the test, I always get somewhere between 20 and 50 cycles difference.
>>>>>> I haven't tested on s390x, but I see no cycle count difference when
>>>>>> testing the same command on x86. I have observed much larger, more
>>>>>> varied differences when using software events.
>>>>>> Anubhav
>>>>>>
>>>>>
>>>>> Here is the output of the following commands:
>>>>>
>>>>> # perf record -e "{cycles,cycles}:Su" -- ./perf test -w brstack
>>>>> # perf script | grep brstack
>>>>>
>>>>> perf 1110782 426394.696874:  6885000 cycles: 116fc9e brstack_bench+0xae (/r>
>>>>> perf 1110782 426394.696875:  1377000 cycles: 116fb98 brstack_foo+0x0 (/root>
>>>>> perf 1110782 426394.696877:  1377000 cycles: 116fb48 brstack_bar+0x0 (/root>
>>>>> perf 1110782 426394.696878:  1377000 cycles: 116fc94 brstack_bench+0xa4 (/r>
>>>>> perf 1110782 426394.696880:  1377000 cycles: 116fc84 brstack_bench+0x94 (/r>
>>>>> perf 1110782 426394.696881:  1377000 cycles: 116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.696883:  1377000 cycles: 116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.696884:  1377000 cycles: 116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.696885:  1377000 cycles: 116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.696887:  1377000 cycles: 116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.696888:  1377000 cycles: 116fc98 brstack_bench+0xa8 (/r>
>>>>> perf 1110782 426394.696890:  1377000 cycles: 116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.696891:  1377000 cycles: 116fc9e brstack_bench+0xae (/r>
>>>>> perf 1110782 426394.703542:  1377000 cycles: 116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.703542: 30971975 cycles: 116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.703543:  1377000 cycles: 116fc76 brstack_bench+0x86 (/r>
>>>>> perf 1110782 426394.703545:  1377000 cycles: 116fc06 brstack_bench+0x16 (/r>
>>>>> perf 1110782 426394.703546:  1377000 cycles: 116fc9e brstack_bench+0xae (/r>
>>>>> perf 1110782 426394.703547:  1377000 cycles: 116fc20 brstack_bench+0x30 (/r>
>>>>> perf 1110782 426394.703549:  1377000 cycles: 116fc9e brstack_bench+0xae (/r>
>>>>> perf 1110782 426394.703550:  1377000 cycles: 116fcbc brstack_bench+0xcc
>>>>>
>>>>> They are usually identical values apart from one or two which are way
>>>>> off. Ignoring those would be good.
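>>>>>
>>>>> (Something along these lines would express that -- only a sketch, with
>>>>> the 1% tolerance and the single-outlier allowance picked arbitrarily:)
>>>>>
>>>>>     # Pass if at most one sample pair differs by more than 1%.
>>>>>     perf script | grep cycles: | awk '
>>>>>         NR % 2 == 0 {
>>>>>             d = $4 - prev; if (d < 0) d = -d
>>>>>             if (d > prev * 0.01) bad++
>>>>>         }
>>>>>         { prev = $4 }
>>>>>         END { exit (bad > 1) }'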
>>>>>
>>>>
>>>> FWIW I ran 100+ iterations on my Arm Juno and N1SDP boards and the test
>>>> passed every time.
>>>>
>>>> Are we sure there isn't some kind of race condition or bug that the
>>>> test has found, rather than a bug in the test?
>>> There is always a possibility of a bug; that cannot be ruled out for
>>> certain. However, as LPARs on s390 run on top of a hypervisor, there is
>>> a chance of the Linux guest being stopped while the hardware keeps
>>> running.
>>>
>>
>> I have no idea what's going on or how that works, so maybe this question
>> is useless, but doesn't that mean that guests can determine/guess the
>> counter values from other guests? If the hardware keeps the counter
>> running when the guest isn't, that sounds like something is leaking from
>> one guest to another? Should the hypervisor not be saving and restoring
>> context?
>>
>>> I see these runoff values time and again; roughly every second run
>>> fails with one runoff value.
>>>
>>> Hope this helps
>>>
>>
>> That may explain the issue for s390 then, but I'm assuming it doesn't
>> explain the issues on Arm if the failures there aren't in a VM. But even
>> if they were in a VM, the PMU is fully virtualised and the events would
>> be stopped and resumed when the guest is switched out.
>>
>> James
>>
>>
>