public inbox for linux-s390@vger.kernel.org
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Thomas Richter <tmricht@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	linux-perf-users@vger.kernel.org, namhyung@kernel.org,
	agordeev@linux.ibm.com, gor@linux.ibm.com,
	sumanthk@linux.ibm.com, hca@linux.ibm.com, japo@linux.ibm.com,
	James Clark <james.clark@linaro.org>
Subject: Re: [PATCH] perf/test: Fix test case Leader sampling on s390.
Date: Fri, 6 Feb 2026 18:41:46 -0300
Message-ID: <aYZgGlh3e84ZrUNQ@x1>
In-Reply-To: <20251128091139.1309755-1-tmricht@linux.ibm.com>

On Fri, Nov 28, 2025 at 10:11:39AM +0100, Thomas Richter wrote:
> The subtest 'Leader sampling' sometimes fails on s390.
> - for z/VM guest: Disable the test for z/VM guest. There is no
>   CPU Measurement facility to run the test successfully.
> - for LPAR: Use correct event names.

This one fell through the cracks. It still applies cleanly and the extra
logic affects only s390, so I'm applying it to perf-tools-next.

- Arnaldo
 
> A detailed analysis of the debugging and investigation follows:
> 1. With command
>        perf record -e '{cycles,cycles}:S' -- ....
>    the first cycles event starts sampling.
>    On s390 this sets up sampling with a frequency of 4000 Hz, which
>    translates to a hardware sampling interval of 1377000 instructions
>    between two samples.
> 
> 2. With first event cycles now sampling into a hardware buffer, an
>    interrupt is triggered each time a sampling buffer gets full.
>    The interrupt handler is then invoked and debug output shows the
>    processing of samples.  The size of one hardware sample is 32 bytes.
>    With an interrupt triggered when the hardware buffer page of 4KB
>    gets full, the interrupt handler processes 128 samples.
>    (This is taken from s390 specific fast debug data gathering)
>    2025-11-07 14:35:51.977248  000003ffe013cbfa \
> 	   perf_event_count_update event->count 0x0 count 0x1502e8
>    2025-11-07 14:35:51.977248  000003ffe013cbfa \
> 	   perf_event_count_update event->count 0x1502e8 count 0x1502e8
>    2025-11-07 14:35:51.977248  000003ffe013cbfa \
> 	   perf_event_count_update event->count 0x2a05d0 count 0x1502e8
>    2025-11-07 14:35:51.977252  000003ffe013cbfa \
> 	   perf_event_count_update event->count 0x3f08b8 count 0x1502e8
>    2025-11-07 14:35:51.977252  000003ffe013cbfa \
> 	   perf_event_count_update event->count 0x540ba0 count 0x1502e8
>    2025-11-07 14:35:51.977253  000003ffe013cbfa \
> 	   perf_event_count_update event->count 0x690e88 count 0x1502e8
>    2025-11-07 14:35:51.977254  000003ffe013cbfa \
> 	   perf_event_count_update event->count 0x7e1170 count 0x1502e8
>    2025-11-07 14:35:51.977254  000003ffe013cbfa \
> 	   perf_event_count_update event->count 0x931458 count 0x1502e8
>    2025-11-07 14:35:51.977254  000003ffe013cbfa \
> 	   perf_event_count_update event->count 0xa81740 count 0x1502e8
> 
> 3. The value constantly increases by the number of instructions
>    executed to generate a sample entry.  In the perf script output this
>    is the first line of each pair of lines: count 0x1502e8 --> 1377000
> 
>    # perf script | grep 1377000 | wc -l
>    214
>    # perf script | wc -l
>    428
>    #
>    That is 428 lines in total, and half of the lines contain value
>    1377000.
> 
> 4. The second event cycles is opened against the counting PMU, which
>    is an independent PMU and is not interrupt driven.  Once enabled it
>    keeps running in the background, silently incrementing more than
>    400 counters. The counter values are read via assembly
>    instructions.
> 
>    The read callback function of this second, counting PMU is called
>    when the interrupt handler of the sampling facility processes each
>    sample. The function call sequence is:
> 
>    perf_event_overflow()
>    +--> __perf_event_overflow()
>         +--> __perf_event_output()
>                +--> perf_output_sample()
>                     +--> perf_output_read()
>                          +--> perf_output_read_group()
>                               for_each_sibling_event(sub, leader) {
>                                   values[n++] = perf_event_count(sub, self);
>                                   printk("%s sub %p values %#lx\n", __func__, sub, values[n-1]);
>                               }
> 
>    The last function perf_event_count() is invoked on the second event
>    cycles *on* the counting PMU. An added printk statement shows the
>    following lines in the dmesg output:
> 
>    # dmesg|grep perf_output_read_group |head -10
>    [  332.368620] perf_output_read_group sub 00000000d80b7c1f values 0x3a80917 (1)
>    [  332.368624] perf_output_read_group sub 00000000d80b7c1f values 0x3a86c7f (2)
>    [  332.368627] perf_output_read_group sub 00000000d80b7c1f values 0x3a89c15 (3)
>    [  332.368629] perf_output_read_group sub 00000000d80b7c1f values 0x3a8c895 (4)
>    [  332.368631] perf_output_read_group sub 00000000d80b7c1f values 0x3a8f569 (5)
>    [  332.368633] perf_output_read_group sub 00000000d80b7c1f values 0x3a9204b
>    [  332.368635] perf_output_read_group sub 00000000d80b7c1f values 0x3a94790
>    [  332.368637] perf_output_read_group sub 00000000d80b7c1f values 0x3a9704b
>    [  332.368638] perf_output_read_group sub 00000000d80b7c1f values 0x3a99888
>    #
> 
>    This correlates with the output of
>    # perf report -D | grep 'id 00000000000000'|head -10
>    ..... id 0000000000000006, value 00000000001502e8, lost 0
>    ..... id 000000000000000e, value 0000000003a80917, lost 0 --> line (1) above
>    ..... id 0000000000000006, value 00000000002a05d0, lost 0
>    ..... id 000000000000000e, value 0000000003a86c7f, lost 0 --> line (2) above
>    ..... id 0000000000000006, value 00000000003f08b8, lost 0
>    ..... id 000000000000000e, value 0000000003a89c15, lost 0 --> line (3) above
>    ..... id 0000000000000006, value 0000000000540ba0, lost 0
>    ..... id 000000000000000e, value 0000000003a8c895, lost 0 --> line (4) above
>    ..... id 0000000000000006, value 0000000000690e88, lost 0
>    ..... id 000000000000000e, value 0000000003a8f569, lost 0 --> line (5) above
> 
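
The arithmetic in steps 1 to 3 can be checked with a small shell sketch.
The hex values are copied from the debug trace quoted above; the function
name count_deltas is made up for this illustration and is not part of perf.

```sh
#!/bin/sh
# Sampling interval reported in the trace (count 0x1502e8).
PERIOD_HEX=0x1502e8

# Print the difference between consecutive event->count values,
# starting from 0, one decimal number per line.
count_deltas() {
    prev=0
    for v in "$@"; do
        echo $(( $v - $prev ))
        prev=$v
    done
}

# The interval in decimal.
echo "period: $(( $PERIOD_HEX ))"

# Units counted per second when 4000 samples are taken per second.
echo "per second at 4000 Hz: $(( $PERIOD_HEX * 4000 ))"

# Each leader reading advances by exactly one interval.
count_deltas 0x1502e8 0x2a05d0 0x3f08b8 0x540ba0 0x690e88
```

Every delta printed by the last command equals the interval, which is why
half of the perf script lines in step 3 carry the value 1377000.
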
> Summary:
> - The above command starts the CPU sampling facility, which runs
>   interrupt driven: an interrupt is raised when a 4KB page is full.
>   The interrupt handler processes the 128 samples and eventually calls
>   perf_output_read_group() for each sample to save it in the event's
>   ring buffer.
> 
> - At that time the CPU counting facility is invoked to read the value of
>   the event cycles. This value is saved as the second value in the
>   sample_read structure.
> 
> - The first and all other odd lines in the perf script output display
>   the period value between 2 samples being created by hardware. It is
>   the number of instructions executed before the hardware writes a sample.
> 
> - The second and all other even lines in the perf script output display
>   the number of CPU cycles needed to process each sample and save it in
>   the event's ring buffer.
> These 2 different values can never be identical on s390.
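
A rough way to see the mismatch described in the summary is to compare the
per-sample increments of the two events. The sibling readings below are
the first five values from the quoted dmesg output; sibling_deltas is a
hypothetical helper name used only for this sketch.

```sh
#!/bin/sh
# Leader period taken from the trace (0x1502e8).
LEADER_PERIOD=$(( 0x1502e8 ))

# Print the difference between consecutive sibling readings.
# The first argument is the baseline and produces no output itself.
sibling_deltas() {
    prev=$1
    shift
    for v in "$@"; do
        echo $(( $v - $prev ))
        prev=$v
    done
}

echo "leader period: $LEADER_PERIOD"
echo "sibling increments per sample:"
sibling_deltas 0x3a80917 0x3a86c7f 0x3a89c15 0x3a8c895 0x3a8f569
```

The sibling increments vary from sample to sample and are far below the
leader period, so an equality check between the two columns cannot pass
with this data.
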
> 
> Since event leader sampling is not possible on s390 the perf tool will
> return EOPNOTSUPP soon. Prepare the test case for that.
> 
> Suggested-by: James Clark <james.clark@linaro.org>
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> Tested-by: Jan Polensky <japo@linux.ibm.com>
> Reviewed-by: Jan Polensky <japo@linux.ibm.com>
> ---
>  tools/perf/tests/shell/record.sh | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
> index 0f5841c479e7..46b96d565680 100755
> --- a/tools/perf/tests/shell/record.sh
> +++ b/tools/perf/tests/shell/record.sh
> @@ -260,7 +260,21 @@ test_uid() {
>  
>  test_leader_sampling() {
>    echo "Basic leader sampling test"
> -  if ! perf record -o "${perfdata}" -e "{cycles,cycles}:Su" -- \
> +  events="{cycles,cycles}:Su"
> +  [ $(uname -m) = "s390x" ] && {
> +    [ ! -d /sys/devices/cpum_sf ] && {
> +      echo "No CPUMF [Skipped record]"
> +      return
> +    }
> +    events="{cpum_sf/SF_CYCLES_BASIC/,cycles}:Su"
> +    perf record -o "${perfdata}" -e "$events" -- perf test -w brstack 2> /dev/null
> +    # Perf grouping might be unsupported, depends on version.
> +    [ "$?" -ne 0 ] && {
> +      echo "Grouping not support [Skipped record]"
> +      return
> +    }
> +  }
> +  if ! perf record -o "${perfdata}" -e "$events" -- \
>      perf test -w brstack 2> /dev/null
>    then
>      echo "Leader sampling [Failed record]"
> -- 
> 2.52.0
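
The event selection in the hunk above can be read as a small decision
function. The following is only a sketch of that logic, not the code in
record.sh: the function name leader_sampling_events and its two parameters
are invented here so the choice can be exercised without an s390 machine,
and the extra 'perf record' probe for grouping support is left out.

```sh
#!/bin/sh
# Hypothetical helper: print the event group for the leader sampling
# test, or return 1 if the test should be skipped.
#   $1: machine name as printed by 'uname -m'
#   $2: sysfs directory of the sampling PMU
#       (the patch checks /sys/devices/cpum_sf)
leader_sampling_events() {
    arch=$1
    sf_dir=$2
    if [ "$arch" != "s390x" ]; then
        echo "{cycles,cycles}:Su"
        return 0
    fi
    # On s390 without the sampling facility (e.g. z/VM guest): skip.
    if [ ! -d "$sf_dir" ]; then
        return 1
    fi
    echo "{cpum_sf/SF_CYCLES_BASIC/,cycles}:Su"
}

if events=$(leader_sampling_events "$(uname -m)" /sys/devices/cpum_sf); then
    echo "events: $events"
else
    echo "No CPUMF [Skipped record]"
fi
```

Passing the architecture and the sysfs path as arguments is a choice made
for this sketch; the patch itself reads both directly inside
test_leader_sampling().
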
