From: Boris Brezillon <boris.brezillon@collabora.com>
To: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Cc: Neil Armstrong <narmstrong@baylibre.com>,
	Emil Velikov <emil.l.velikov@gmail.com>,
	dri-devel@lists.freedesktop.org,
	Steven Price <steven.price@arm.com>,
	Rob Herring <robh+dt@kernel.org>,
	Mark Janes <mark.a.janes@intel.com>,
	kernel@collabora.com
Subject: Re: [PATCH 0/3] drm/panfrost: Expose HW counters to userspace
Date: Sun, 12 May 2019 15:38:03 +0200
Message-ID: <20190512153803.471ef410@collabora.com>
In-Reply-To: <20190511223220.GA15155@rosenzweig.io>

On Sat, 11 May 2019 15:32:20 -0700
Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:

> Hi all,
> 
> As Steven Price explained, the "GPU top" kbase approach is often more
> useful and accurate than per-draw timing. 
> 
> For a 3D game inside a GPU-accelerated desktop, the game's counters
> *should* include desktop overhead. This external overhead does affect
> the game's performance, especially if the contexts are competing for
> resources like memory bandwidth. An isolated sample is easy to achieve
> running only the app of interest; in ideal conditions (zero-copy
> fullscreen), desktop interference is negligible. 
> 
> For driver developers, the system-wide measurements are preferable,
> painting a complete system picture and avoiding disruptions. There is no
> risk of confusion, as the driver developers understand how the counters
> are exposed. Further, benchmarks rendering direct to a GBM surface are
> available (glmark2-es2-drm), eliminating interference even with poor
> desktop performance.
> 
> For app developers, the confusion of multi-context interference is
> unfortunate. Nevertheless, if enabling counters were to slow down an
> app, the confusion could be worse. Consider second-order changes in the
> app's performance characteristics due to slowdown: if techniques like
> dynamic resolution scaling are employed, the counters' results can be
> invalid.  Likewise, even if the lower-performance counters are
> appropriate for purely graphical workloads, complex apps with variable
> CPU overhead (e.g. from an FPS-dependent physics engine) can further
> confound counters. Low-overhead system-wide measurements mitigate these
> concerns.

I'd just like to point out that dumping counters the way
mali_kbase/gator does likely has an impact on performance (at least on
some GPUs) because of the clean+invalidate-cache that happens before (or
after, I don't remember) each dump. If I understand correctly and this
cache is actually global rather than per address space (a
per-address-space cache would be possible if cache lines carried a tag
attaching them to a specific address space), that means all jobs running
when the clean+invalidate happens will see extra cache misses after each
dump. Of course, that's not as invasive as the full serialization that
happens with my solution, but it's not free either.

> 
> As Rob Clark suggested, system-wide counters could be exposed via a
> semi-standardized interface, perhaps within debugfs/sysfs. The interface
> could not be completely standard, as the list of counters exposed varies
> substantially by vendor and model. Nevertheless, the mechanics of
> discovering, enabling, reading, and disabling counters can be
> standardized, as can a small set of universally meaningful counters like
> total GPU utilization. This would permit a vendor-independent GPU top
> app as suggested, as is I believe currently possible with
> vendor-specific downstream kernels (e.g. via Gator/Streamline for Mali)
> 
> It looks like this discussion is dormant. Could we try to get this
> sorted? For Panfrost, I'm hitting GPU-side bottlenecks that I'm unable
> to diagnose without access to the counters, so I'm eager for a mainline
> solution to be implemented.

I spent a bit of time thinking about it and looking at different
solutions.

debugfs/sysfs might not be the best solution, especially if we think
about the multi-user case (several instances of a GPU perfmon tool
running in parallel). For that to work properly, we need a way to
instantiate several perf monitors and let the driver add values to all
active perfmons every time a dump happens (no matter who triggered the
dump). That's exactly what mali_kbase/gator does, BTW. That's achievable
through debugfs if we consider exposing a knob to instantiate such
perfmon instances, but that also means risking perfmon leaks if the
user does not take care of destroying the perfmon it created when it's
done with it (or when it crashes). It might also prove hard to expose
that to non-root users in a secure way.
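
To make the multi-instance scheme concrete, here is a minimal userspace
sketch in C of "add values to all active perfmons every time a dump
happens". It is not the mali_kbase/gator code and not a proposed
Panfrost interface. All names (struct perfmon, struct perfmon_pool,
perfmon_create(), perfmon_destroy(), perfmon_accumulate()) and the
NUM_COUNTERS/MAX_PERFMONS sizes are hypothetical, and it assumes each
dump yields per-counter deltas since the previous dump.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_COUNTERS 4  /* hypothetical; real GPUs expose many more */
#define MAX_PERFMONS 8  /* hypothetical cap on concurrent monitors */

/* One monitor instance: accumulates every dump seen while it is active. */
struct perfmon {
	int active;
	uint64_t values[NUM_COUNTERS];
};

/* All monitor slots known to the (simulated) driver. */
struct perfmon_pool {
	struct perfmon mons[MAX_PERFMONS];
};

/* Claim a free slot with zeroed counters; returns its index, or -1 if full. */
static int perfmon_create(struct perfmon_pool *pool)
{
	for (int i = 0; i < MAX_PERFMONS; i++) {
		if (!pool->mons[i].active) {
			memset(&pool->mons[i], 0, sizeof(pool->mons[i]));
			pool->mons[i].active = 1;
			return i;
		}
	}
	return -1;
}

/* Release a slot. If the owner never calls this, the slot leaks. */
static void perfmon_destroy(struct perfmon_pool *pool, int id)
{
	assert(id >= 0 && id < MAX_PERFMONS);
	pool->mons[id].active = 0;
}

/*
 * Called once per dump, no matter who triggered it: the delta is added
 * to every active monitor, so each one sees all activity since it was
 * created.
 */
static void perfmon_accumulate(struct perfmon_pool *pool,
			       const uint32_t delta[NUM_COUNTERS])
{
	for (int i = 0; i < MAX_PERFMONS; i++) {
		if (!pool->mons[i].active)
			continue;
		for (int c = 0; c < NUM_COUNTERS; c++)
			pool->mons[i].values[c] += delta[c];
	}
}
```

A monitor created after the first dump only sees later deltas, while an
older one keeps the running total. The explicit perfmon_destroy() call
is also where the leak concern comes from: nothing in this sketch
reclaims a slot if its owner crashes.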

I also had a quick look at the perf_event interface to see if we could
extend it to support monitoring GPU events. I might be wrong, as I
didn't spend much time investigating how it works, but it seems that
perf counters are saved/dumped/restored at each thread context switch,
which is not what we want here (it might add extra perfcnt dump points,
impacting GPU performance more than we expect).

So maybe the best option is a pseudo-generic ioctl-based interface to
expose those perf counters.
