linux-perf-users.vger.kernel.org archive mirror
From: James Clark <james.clark@linaro.org>
To: Anubhav Shelat <ashelat@redhat.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>,
	Namhyung Kim <namhyung@kernel.org>,
	mpetlan@redhat.com, acme@kernel.org, irogers@google.com,
	linux-perf-users@vger.kernel.org, peterz@infradead.org,
	mingo@redhat.com, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	adrian.hunter@intel.com, kan.liang@linux.intel.com,
	dapeng1.mi@linux.intel.com
Subject: Re: [PATCH] perf tests record: allow for some difference in cycle count in leader sampling test on aarch64
Date: Thu, 9 Oct 2025 15:08:08 +0100
Message-ID: <bda238f1-fd21-4411-8611-1bc246ec254c@linaro.org>
In-Reply-To: <CA+G8Dh+Odf40jdY4h1knjU+3sSjZokMx6OdzRT3o9v1=ndKORQ@mail.gmail.com>



On 09/10/2025 2:43 pm, Anubhav Shelat wrote:
> I tested on a new Arm machine and I'm getting a similar issue to Thomas's,

Which are your new and old Arm machines exactly? And which kernel 
versions did you run the test on?

> but the test fails every 20 or so runs and I'm not getting the issue that I
> previously mentioned.
> 

What do you mean here? Below I see the leader sampling test failure, 
which I thought was the same issue that was previously mentioned?

> Running test #15
>   10bc60-10bcc4 g test_loop
> perf does have symbol 'test_loop'
>   10c354-10c418 l brstack
> perf does have symbol 'brstack'
> Basic leader sampling test
> Basic leader sampling test [Success]
> Invalid Counts: 1
> Valid Counts: 27
> Running test #16
>   10bc60-10bcc4 g test_loop
> perf does have symbol 'test_loop'
>   10c354-10c418 l brstack
> perf does have symbol 'brstack'
> Basic leader sampling test
> Basic leader sampling test [Success]
> Invalid Counts: 1
> Valid Counts: 27
> Running test #17
>   10bc60-10bcc4 g test_loop
> perf does have symbol 'test_loop'
>   10c354-10c418 l brstack
> perf does have symbol 'brstack'
> Basic leader sampling test
> Leader sampling [Failed inconsistent cycles count]
> Invalid Counts: 8
> Valid Counts: 28
> 
> Initially I thought it was the throttling issue mentioned in the comment
> in test_leader_sampling, but there's another thread that says it's fixed:
> https://lore.kernel.org/lkml/20250520181644.2673067-2-kan.liang@linux.intel.com/
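For reference, the range check being discussed boils down to something like
the sketch below: pair each leader sample with the sibling sample that
follows it and compare the two cycle counts. The tolerance of 1000 cycles
and the consecutive-line pairing here are illustrative only, not the actual
record.sh test code:

   # perf record -o perf.data -e "{cycles,cycles}:Su" -- perf test -w brstack
   # perf script -i perf.data | awk -v tol=1000 '
       /cycles/ {
           if (prev == "") { prev = $4 }        # leader sample: remember count
           else {                               # sibling sample: compare
               d = $4 - prev
               if (d < 0) d = -d
               if (d > tol) bad++; else good++
               prev = ""
           }
       }
       END {
           printf "Valid Counts: %d, Invalid Counts: %d\n", good, bad
           exit (bad > 0)                       # non-zero exit on any outlier
       }'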
> 
> On Wed, Oct 8, 2025 at 12:24 PM James Clark <james.clark@linaro.org> wrote:
> 
>>
>>
>> On 08/10/2025 11:48 am, Thomas Richter wrote:
>>> On 10/7/25 14:34, James Clark wrote:
>>>>
>>>>
>>>> On 07/10/2025 6:47 am, Thomas Richter wrote:
>>>>> On 10/2/25 15:39, Anubhav Shelat wrote:
>>>>>> On Oct 1, 2025 at 9:44 PM, Ian Rogers wrote:
>>>>>>> If cycles is 0 then this will always pass, should this be checking
>>>>>>> a range?
>>>>>>
>>>>>> Yes, you're right, this will be better.
>>>>>>
>>>>>> On Oct 2, 2025 at 7:56 AM, Thomas Richter wrote:
>>>>>>> Can we use a larger range to allow the test to pass?
>>>>>>
>>>>>> What range do you get on s390? When I do group measurements using
>>>>>> "perf record -e "{cycles,cycles}:Su" perf test -w brstack" like in
>>>>>> the test I always get somewhere between 20 and 50 cycles difference.
>>>>>> I haven't tested on s390x, but I see no cycle count difference when
>>>>>> testing the same command on x86. I have observed much larger, more
>>>>>> varied differences when using software events.
>>>>>>
>>>>>> Anubhav
>>>>>>
>>>>>
>>>>> Here is the output of the
>>>>>
>>>>>     # perf record  -e "{cycles,cycles}:Su" -- ./perf test -w brstack
>>>>>     # perf script | grep brstack
>>>>>
>>>>> commands:
>>>>>
>>>>> perf 1110782 426394.696874:    6885000 cycles:           116fc9e brstack_bench+0xae (/r>
>>>>> perf 1110782 426394.696875:    1377000 cycles:           116fb98 brstack_foo+0x0 (/root>
>>>>> perf 1110782 426394.696877:    1377000 cycles:           116fb48 brstack_bar+0x0 (/root>
>>>>> perf 1110782 426394.696878:    1377000 cycles:           116fc94 brstack_bench+0xa4 (/r>
>>>>> perf 1110782 426394.696880:    1377000 cycles:           116fc84 brstack_bench+0x94 (/r>
>>>>> perf 1110782 426394.696881:    1377000 cycles:           116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.696883:    1377000 cycles:           116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.696884:    1377000 cycles:           116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.696885:    1377000 cycles:           116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.696887:    1377000 cycles:           116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.696888:    1377000 cycles:           116fc98 brstack_bench+0xa8 (/r>
>>>>> perf 1110782 426394.696890:    1377000 cycles:           116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.696891:    1377000 cycles:           116fc9e brstack_bench+0xae (/r>
>>>>> perf 1110782 426394.703542:    1377000 cycles:           116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.703542:   30971975 cycles:           116fb7c brstack_bar+0x34 (/roo>
>>>>> perf 1110782 426394.703543:    1377000 cycles:           116fc76 brstack_bench+0x86 (/r>
>>>>> perf 1110782 426394.703545:    1377000 cycles:           116fc06 brstack_bench+0x16 (/r>
>>>>> perf 1110782 426394.703546:    1377000 cycles:           116fc9e brstack_bench+0xae (/r>
>>>>> perf 1110782 426394.703547:    1377000 cycles:           116fc20 brstack_bench+0x30 (/r>
>>>>> perf 1110782 426394.703549:    1377000 cycles:           116fc9e brstack_bench+0xae (/r>
>>>>> perf 1110782 426394.703550:    1377000 cycles:           116fcbc brstack_bench+0xcc
>>>>>
>>>>> They are usually identical values, apart from one or two which are
>>>>> way off. Ignoring those would be good.
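One way to ignore those would be to fail only when more than a small
fraction of the leader/sibling pairs disagree, rather than on the first
mismatch. A rough sketch, where the ~10% threshold is an arbitrary
illustration and not something taken from the existing test:

   # perf script -i perf.data | awk '
       /cycles/ { c[++n] = $4 }                 # collect all cycle counts
       END {
           for (i = 2; i <= n; i += 2) {        # walk leader/sibling pairs
               d = c[i] - c[i-1]
               if (d < 0) d = -d
               if (d > 0) bad++; else good++
           }
           # tolerate a handful of runaway pairs (e.g. hypervisor blips)
           exit (bad * 10 > good + bad)         # fail only above ~10%
       }'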
>>>>>
>>>>
>>>> FWIW I ran 100+ iterations on my Arm Juno and N1SDP boards and the
>>>> test passed every time.
>>>>
>>>> Are we sure there isn't some kind of race condition or bug that the
>>>> test has found, rather than a bug in the test?
>>> There is always a possibility of a bug; that cannot be ruled out for
>>> certain. However, as LPARs on s390 run on top of a hypervisor, there is
>>> a chance of the Linux guest being stopped while the hardware keeps
>>> running.
>>>
>>
>> I have no idea what's going on or how that works, so maybe this question
>> is useless, but doesn't that mean that guests can determine/guess the
>> counter values from other guests? If the hardware keeps the counter
>> running when the guest isn't, that sounds like something is leaking from
>> one guest to another? Should the hypervisor not be saving and restoring
>> context?
>>
>>> I see these outlier values time and again; roughly every second run
>>> fails with one outlier value.
>>>
>>> Hope this helps
>>>
>>
>> That may explain the issue for s390 then, but I'm assuming it doesn't
>> explain the issues on Arm if the failures there aren't in a VM. But even
>> if they were in a VM, the PMU is fully virtualised and the events would
>> be stopped and resumed when the guest is switched out.
>>
>> James
>>
>>
> 



Thread overview: 16+ messages
2025-10-01 19:50 [PATCH] perf tests record: allow for some difference in cycle count in leader sampling test on aarch64 Anubhav Shelat
2025-10-01 20:43 ` Ian Rogers
2025-10-02  6:55 ` Thomas Richter
     [not found]   ` <CA+G8DhL49FWD47bkbcXYeb9T=AbxNhC-ypqjkNxRnW0JqmYnPw@mail.gmail.com>
2025-10-02 17:44     ` Anubhav Shelat
2025-10-07  5:47     ` Thomas Richter
2025-10-07 12:34       ` James Clark
2025-10-08  7:52         ` Namhyung Kim
2025-10-08 10:48         ` Thomas Richter
2025-10-08 11:24           ` James Clark
2025-10-09 12:14             ` Thomas Richter
     [not found]             ` <CA+G8Dh+Odf40jdY4h1knjU+3sSjZokMx6OdzRT3o9v1=ndKORQ@mail.gmail.com>
2025-10-09 13:55               ` Anubhav Shelat
2025-10-09 14:17                 ` James Clark
     [not found]                   ` <CA+G8DhKQkTKoNer5GfZedPUj4xMizWVJUWFocP2eQ_cmPJtBOQ@mail.gmail.com>
2025-10-09 14:59                     ` James Clark
2025-10-09 15:22                       ` Anubhav Shelat
2025-10-13 15:36                       ` Arnaldo Carvalho de Melo
2025-10-09 14:08               ` James Clark [this message]
