From: Eric Anholt <eric@anholt.net>
To: Boris Brezillon <boris.brezillon@collabora.com>,
Steven Price <steven.price@arm.com>
Cc: Neil Armstrong <narmstrong@baylibre.com>,
Emil Velikov <emil.l.velikov@gmail.com>,
dri-devel@lists.freedesktop.org, Rob Herring <robh+dt@kernel.org>,
Mark Janes <mark.a.janes@intel.com>,
kernel@collabora.com, Alyssa Rosenzweig <alyssa@rosenzweig.io>
Subject: Re: [PATCH 0/3] drm/panfrost: Expose HW counters to userspace
Date: Wed, 01 May 2019 10:12:42 -0700
Message-ID: <87zho614l1.fsf@anholt.net>
In-Reply-To: <20190430144238.49963521@collabora.com>
Boris Brezillon <boris.brezillon@collabora.com> writes:
> +Rob, Eric, Mark and more
>
> Hi,
>
> On Fri, 5 Apr 2019 16:20:45 +0100
> Steven Price <steven.price@arm.com> wrote:
>
>> On 04/04/2019 16:20, Boris Brezillon wrote:
>> > Hello,
>> >
>> > This patch adds new ioctls to expose GPU counters to userspace.
>> > These will be used by the mesa driver (should be posted soon).
>> >
>> > A few words about the implementation: I followed the VC4/Etnaviv model
>> > where perf counters are retrieved on a per-job basis. This allows one
>> > to get accurate results when there are users using the GPU
>> > concurrently.
>> > AFAICT, the mali kbase is using a different approach where several
>> > users can register a performance monitor but with no way to have
>> > fine-grained control over what job/GPU-context to track.
>>
>> mali_kbase submits overlapping jobs. The jobs on slot 0 and slot 1 can
>> be from different contexts (address spaces), and mali_kbase also fully
>> uses the _NEXT registers. So there can be a job from one context
>> executing on slot 0 and a job from a different context waiting in the
>> _NEXT registers. (And the same for slot 1). This means that there's no
>> (visible) gap between the first job finishing and the second job
>> starting. Early versions of the driver even had a throttle to avoid
>> interrupt storms (see JOB_IRQ_THROTTLE) which would further delay the
>> IRQ - but thankfully that's gone.
>>
>> The upshot is that it's basically impossible to measure "per-job"
>> counters when running at full speed, because multiple jobs are running
>> and the driver doesn't actually know when one ends and the next starts.
>>
>> Since one of the primary use cases is to draw pretty graphs of the
>> system load [1], this "per-job" information isn't all that relevant (and
>> minimal performance overhead is important). And if you want to monitor
>> just one application it is usually easiest to ensure that it is the only
>> thing running.
>>
>> [1]
>> https://developer.arm.com/tools-and-software/embedded/arm-development-studio/components/streamline-performance-analyzer
>>
>> > This design choice comes at a cost: every time the perfmon context
>> > changes (the perfmon context is the list of currently active
>> > perfmons), the driver has to add a fence to prevent new jobs from
>> > corrupting counters that will be dumped by previous jobs.
>> >
>> > Let me know if that's an issue and if you think we should approach
>> > things differently.
>>
>> It depends what you expect to do with the counters. Per-job counters are
>> certainly useful sometimes. But serialising all jobs can mess up the
>> thing you are trying to measure the performance of.
>
> I finally found some time to work on v2 this morning, and it turns out
> implementing global perf monitors as done in mali_kbase means rewriting
> almost everything (apart from the perfcnt layout stuff). I'm not against
> doing that, but I'd like to be sure this is really what we want.
>
> Eric, Rob, any opinion on that? Is it acceptable to expose counters
> through the pipe_query/AMD_perfmon interface if we don't have this
> job (or at least draw call) granularity? If not, should we keep the
> solution I'm proposing here to make sure counter values are accurate,
> or should we expose perf counters through a non-standard API?

You should definitely not count perf results from someone else's context
against your own! People doing perf analysis will expect slight
performance changes (like missing bin/render parallelism between
contexts) when doing perf queries, but they will be absolutely lost if
their non-texturing job starts showing texturing results from some
unrelated context.
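
To make the cost of the per-job approach concrete, here is a minimal sketch of the fencing rule described in the quoted cover letter: a job has to wait for earlier jobs whenever its set of active perfmons differs from that of the job queued before it. This is plain Python for illustration, not driver code; Job, perfmons and plan_submission() are hypothetical names, and the actual patch may track the perfmon context differently.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass
class Job:
    """Hypothetical model of a GPU job and the perfmon IDs it wants active."""
    name: str
    perfmons: FrozenSet[int] = frozenset()


def plan_submission(jobs: List[Job]) -> List[Tuple[str, bool]]:
    """Return (job name, must_wait) pairs for jobs in submission order.

    must_wait is True when the job's perfmon set differs from the previous
    job's, i.e. when the "perfmon context" changes and the new job must not
    start until earlier jobs have dumped their counters.
    """
    plan = []
    prev = None
    for job in jobs:
        must_wait = prev is not None and job.perfmons != prev.perfmons
        plan.append((job.name, must_wait))
        prev = job
    return plan


# Jobs without perfmons, or sharing the same perfmon set, can still overlap;
# only the transitions (b -> c and d -> e) force serialisation.
jobs = [
    Job("a"),
    Job("b"),
    Job("c", frozenset({1})),
    Job("d", frozenset({1})),
    Job("e"),
]
for name, must_wait in plan_submission(jobs):
    print(name, "waits for previous jobs" if must_wait else "may overlap")
```

In this model the serialisation cost is paid only at the points where the perfmon set changes, which is the "slight performance change" a person running perf queries would expect, while counters never mix results from another context.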