From: Boris Brezillon <boris.brezillon@collabora.com>
To: Eric Anholt <eric@anholt.net>
Cc: Neil Armstrong <narmstrong@baylibre.com>,
Emil Velikov <emil.l.velikov@gmail.com>,
dri-devel@lists.freedesktop.org,
Steven Price <steven.price@arm.com>,
Rob Herring <robh+dt@kernel.org>,
Mark Janes <mark.a.janes@intel.com>,
kernel@collabora.com, Alyssa Rosenzweig <alyssa@rosenzweig.io>
Subject: Re: [PATCH 0/3] drm/panfrost: Expose HW counters to userspace
Date: Sun, 12 May 2019 15:17:10 +0200
Message-ID: <20190512151710.4bea379d@collabora.com>
In-Reply-To: <87zho614l1.fsf@anholt.net>

On Wed, 01 May 2019 10:12:42 -0700
Eric Anholt <eric@anholt.net> wrote:
> Boris Brezillon <boris.brezillon@collabora.com> writes:
>
> > +Rob, Eric, Mark and more
> >
> > Hi,
> >
> > On Fri, 5 Apr 2019 16:20:45 +0100
> > Steven Price <steven.price@arm.com> wrote:
> >
> >> On 04/04/2019 16:20, Boris Brezillon wrote:
> >> > Hello,
> >> >
> >> > This patch adds new ioctls to expose GPU counters to userspace.
> >> > These will be used by the mesa driver (should be posted soon).
> >> >
> >> > A few words about the implementation: I followed the VC4/Etnaviv model
> >> > where perf counters are retrieved on a per-job basis. This allows one
> >> > to get accurate results when there are users using the GPU
> >> > concurrently.
> >> > AFAICT, the mali kbase is using a different approach where several
> >> > users can register a performance monitor but with no way to have
> >> > fine-grained control over what job/GPU-context to track.
> >>
> >> mali_kbase submits overlapping jobs. The jobs on slot 0 and slot 1 can
> >> be from different contexts (address spaces), and mali_kbase also fully
> >> uses the _NEXT registers. So there can be a job from one context
> >> executing on slot 0 and a job from a different context waiting in the
> >> _NEXT registers. (And the same for slot 1). This means that there's no
> >> (visible) gap between the first job finishing and the second job
> >> starting. Early versions of the driver even had a throttle to avoid
> >> interrupt storms (see JOB_IRQ_THROTTLE) which would further delay the
> >> IRQ - but thankfully that's gone.
> >>
> >> The upshot is that it's basically impossible to measure "per-job"
> >> counters when running at full speed. Because multiple jobs are running
> >> and the driver doesn't actually know when one ends and the next starts.
> >>
> >> Since one of the primary use cases is to draw pretty graphs of the
> >> system load [1], this "per-job" information isn't all that relevant (and
> >> minimal performance overhead is important). And if you want to monitor
> >> just one application it is usually easiest to ensure that it is the only
> >> thing running.
> >>
> >> [1]
> >> https://developer.arm.com/tools-and-software/embedded/arm-development-studio/components/streamline-performance-analyzer
> >>
> >> > This design choice comes at a cost: every time the perfmon context
> >> > changes (the perfmon context is the list of currently active
> >> > perfmons), the driver has to add a fence to prevent new jobs from
> >> > corrupting counters that will be dumped by previous jobs.
> >> >
> >> > Let me know if that's an issue and if you think we should approach
> >> > things differently.
> >>
> >> It depends what you expect to do with the counters. Per-job counters are
> >> certainly useful sometimes. But serialising all jobs can mess up the
> >> thing you are trying to measure the performance of.
> >
> > I finally found some time to work on v2 this morning, and it turns out
> > implementing global perf monitors as done in mali_kbase means rewriting
> > almost everything (apart from the perfcnt layout stuff). I'm not against
> > doing that, but I'd like to be sure this is really what we want.
> >
> > Eric, Rob, any opinion on that? Is it acceptable to expose counters
> > through the pipe_query/AMD_perfmon interface if we don't have this
> > job (or at least draw call) granularity? If not, should we keep the
> > solution I'm proposing here to make sure counter values are accurate,
> > or should we expose perf counters through a non-standard API?
>
> You should definitely not count perf results from someone else's context
> against your own!

I also had the feeling that doing that was a bad idea (which you and Rob
confirmed), but I listed it here to clear up my doubts.

> People doing perf analysis will expect slight
> performance changes (like missing bin/render parallelism between
> contexts) when doing perf queries, but they will be absolutely lost if
> their non-texturing job starts showing texturing results from some
> unrelated context.

Given the feedback I've had so far, it looks like defining a new interface
for this type of 'global perfmon' is the way to go if we want to make
everyone happy. Also note that the work I've done on VC4 has the same
limitation: a perfmon context change between 2 jobs will introduce a
serialization point. It seems that you're not as worried as Steven or
Rob are about the extra cost of this serialization, but I wanted to point
that out.
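
To make the serialization cost concrete, here is a toy model (Python, not
driver code) of the rule described in this thread: whenever the set of
active perfmons differs between two consecutive jobs, the second job has to
wait for the first one to finish so it cannot corrupt counters that are
about to be dumped. The Job class, its perfmons field and
serialization_points() are names made up for this illustration; the actual
drivers express the wait with fences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    # IDs of the perfmons attached to this job (hypothetical representation).
    perfmons: frozenset = frozenset()


def serialization_points(queue):
    """Return the indices of jobs that must wait for their predecessor.

    A wait is needed whenever the set of active perfmons changes between
    two consecutive jobs in the queue.
    """
    points = []
    for i in range(1, len(queue)):
        if queue[i].perfmons != queue[i - 1].perfmons:
            points.append(i)
    return points


queue = [
    Job("a0"),
    Job("a1"),
    Job("b0", frozenset({1})),
    Job("b1", frozenset({1})),
    Job("a2"),
]
print(serialization_points(queue))  # [2, 4]
```

In this model, jobs that share the same perfmon set (or use none) still
overlap freely; only the two transitions into and out of the monitored jobs
cost a full drain of the pipeline.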