From: Andi Kleen <ak@linux.intel.com>
To: Ben Gainey <ben.gainey@arm.com>
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, peterz@infradead.org,
mingo@redhat.com, acme@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
namhyung@kernel.org, irogers@google.com,
adrian.hunter@intel.com, will@kernel.org
Subject: Re: [RFC PATCH 0/2] A mechanism for efficient support for per-function metrics
Date: Wed, 14 Feb 2024 01:55:23 -0800
Message-ID: <87r0hfwet0.fsf@linux.intel.com>
In-Reply-To: <20240123113420.1928154-1-ben.gainey@arm.com> (Ben Gainey's message of "Tue, 23 Jan 2024 11:34:18 +0000")
Ben Gainey <ben.gainey@arm.com> writes:
> I've been working on an approach to supporting per-function metrics for
> aarch64 cores, which requires some changes to the arm_pmuv3 driver, and
> I'm wondering if this approach would make sense as a generic feature
> that could be used to enable the same on other architectures?
>
> The basic idea is as follows:
>
> * Periodically sample one or more counters as needed for the chosen
> set of metrics.
> * Record a sample count for each symbol so as to identify hot
> functions.
> * Accumulate counter totals for each of the counters in each of the
> metrics *but* only do this where the previous sample's symbol
> matches the current sample's symbol.
It sounds very similar to what perf script -F +metric already does
(or did; it may currently be broken). It would be a straightforward
extension to add this "same as previous" check there.
Of course the feature is somewhat dubious, in that it will have a very
strong systematic bias against short functions, and even against long
functions in some alternating execution patterns. I assume you did some
experiments to characterize this. It would be important to emphasize
this in any documentation.
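A minimal sketch of the "same as previous" check, for illustration only. It assumes the samples have already been decoded (for example from perf script output) into a hypothetical Sample record holding the resolved symbol and the per-counter deltas since the previous sample; it is not how perf script -F +metric is actually implemented, and the names Sample, per_symbol_totals and ratio are made up here.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Sample:
    # Hypothetical decoded sample: resolved symbol plus the number of
    # events each counter saw since the previous sample.
    symbol: str
    deltas: Dict[str, int]


def per_symbol_totals(
    samples: Iterable[Sample],
) -> Tuple[Counter, Dict[str, Counter]]:
    hits: Counter = Counter()  # sample count per symbol (hotness)
    totals: Dict[str, Counter] = defaultdict(Counter)  # kept counter totals
    prev_symbol: Optional[str] = None
    for s in samples:
        hits[s.symbol] += 1
        # Keep the deltas only when both ends of the interval landed in
        # the same symbol, i.e. it plausibly ran in one function.
        if s.symbol == prev_symbol:
            totals[s.symbol].update(s.deltas)
        prev_symbol = s.symbol
    return hits, totals


def ratio(
    totals: Dict[str, Counter], symbol: str, num: str, den: str
) -> Optional[float]:
    # Per-symbol metric such as cycles-per-uop; None if nothing was kept.
    t = totals.get(symbol)
    if not t or t[den] == 0:
        return None
    return t[num] / t[den]
```

For example, with the sample sequence foo, foo, bar, foo only the second foo sample contributes to foo's totals, although foo is counted as hot three times. A function has to be hit by two consecutive samples before any of its counts are kept, which is where the bias against short functions comes from.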
> For this to work efficiently, it is useful to provide a means to
> decouple the sample window (time over which events are counted) from
> the sample period (time between interesting samples). This patchset
> modifies the Arm PMU driver to support alternating between two
> sample_period values, providing a simple and inexpensive way for tools
> to separate out the sample period and the sample window. It is expected
> to be used with the cycle counter event, alternating between a long and
> short period and subsequently discarding the counter data for samples
> with the long period. The combined long and short period gives the
> overall sampling period, and the short sample period gives the sample
> window. The symbol taken from the sample at the end of the long period
> can be used by tools to ensure correct attribution as described
> previously. The cycle counter is recommended as it provides fair
> temporal distribution of samples as would be required for the
> per-symbol sample count mentioned previously, and because the PMU can
> be programmed to overflow after a sufficiently short window; this may
> not be possible with a software timer (for example). This patch does not
> restrict this to the cycle counter; there could be other
> novel uses based on different events.
I don't see anything ARM-specific about the technique, so if it's done,
it should be done generically, IMHO.
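A sketch of how a tool might consume the alternating periods, under stated assumptions: each decoded sample carries a hypothetical is_long flag saying whether it closed the long period (how a tool would learn this, for instance from a period recorded with each sample, is assumed here rather than taken from the patches). As described above, counter data from long-period samples is discarded, and a short window is kept only when the symbol at its start (the long-period sample) matches the symbol at its end. PeriodSample, sampling_params and kept_windows are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PeriodSample:
    symbol: str
    is_long: bool           # hypothetical: True if this sample closed the long period
    deltas: Dict[str, int]  # counter deltas since the previous sample


def sampling_params(long_period: int, short_period: int) -> Tuple[int, int]:
    # Overall sampling period is long + short; the counting window is short.
    return long_period + short_period, short_period


def kept_windows(
    samples: List[PeriodSample],
) -> List[Tuple[str, Dict[str, int]]]:
    kept: List[Tuple[str, Dict[str, int]]] = []
    opener: Optional[PeriodSample] = None
    for s in samples:
        if s.is_long:
            # End of the long period: remember the symbol, drop its counters.
            opener = s
            continue
        # End of a short window: keep its deltas only if the preceding
        # long-period sample was seen and landed in the same symbol.
        if opener is not None and opener.symbol == s.symbol:
            kept.append((s.symbol, s.deltas))
        opener = None
    return kept
```

With a long period of 9700 cycles and a short period of 300, sampling_params(9700, 300) returns (10000, 300): one 300-cycle counting window every 10000 cycles. A short-period sample with no preceding long-period sample (for example at the start of the stream) is simply dropped.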
> Cursory testing on a Xeon(R) W-2145 sampling every 300 cycles (without
> the patch) suggests this approach would work for some counters.
> Calculating branch miss rates for example appears to be correct,
> likewise UOPS_EXECUTED.THREAD seems to give something like a sensible
> cycles-per-uop value. On the other hand the fixed function instructions
> counter does not appear to sample correctly (it seems to report either
> very small or very large numbers). No idea what's going on there, so any
> insight welcome...
If you use precise samples with 3p, there is a restriction on the periods
that is enforced by the kernel. Non-precise or single/double p events should
support arbitrary periods, except that with any p the period is always
period + 1.
One drawback of the technique on x86 is that it won't allow multi-record
PEBS (collecting samples without interrupts), so the overhead might
be intrinsically higher.
-Andi