linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Robert Bragg <robert@sixbynine.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Matt Fleming <matt.fleming@intel.com>,
	dri-devel@lists.freedesktop.org, David Airlie <airlied@linux.ie>,
	intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Sourab Gupta <sourab.gupta@intel.com>,
	linux-api@vger.kernel.org, Zheng Yan <zheng.z.yan@intel.com>,
	Daniel Vetter <daniel.vetter@intel.com>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [RFC 0/6] Non perf based Gen Graphics OA unit driver
Date: Fri, 16 Oct 2015 11:43:06 +0200	[thread overview]
Message-ID: <20151016094306.GA3604@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <1443537549-6905-1-git-send-email-robert@sixbynine.org>

On Tue, Sep 29, 2015 at 03:39:03PM +0100, Robert Bragg wrote:
> - We're bridging two complex architectures
> 
>     To review this work I think it will be relevant to have a good
>     general familiarity with Gen graphics (e.g. thinking about the OA
>     unit's interaction with the command streamer and execlist
>     scheduling) as well as our userspace architecture and how we're
>     consuming OA data within Mesa to implement the
>     INTEL_performance_query extension.
>
>     On the flip side here, its necessary to understand the perf
>     userspace interface (for most this is hidden by tools so the details
>     aren't common knowledge) as well as the internal design, considering
>     that the PMU we're looking at seems to break several current design
>     assumptions. I can only claim a limited familiarity with perf's
>     design, just as a result of this work.

Right; but a little effort and patience on both sides should get us
there I think. At worst we'll both learn something new ;-)

> - The current OA PMU driver breaks some significant design assumptions.
> 
>     Existing perf pmus are used for profiling work on a cpu and we're
>     introducing the idea of _IS_DEVICE pmus with different security
>     implications, the need to fake cpu-related data (such as user/kernel
>     registers) to fit with perf's current design, and adding _DEVICE
>     records as a way to forward device-specific status records.

There are more devices with counters on than GPUs, so I think it might
make sense to look at extending perf to better deal with this.

>     The OA unit writes reports of counters into a circular buffer,
>     without involvement from the CPU, making our PMU driver the first of
>     a kind.

Agreed, this is somewhat 'odd' from where we are today.

>     Perf supports groups of counters and allows those to be read via
>     transactions internally but transactions currently seem designed to
>     be explicitly initiated from the cpu (say in response to a userspace
>     read()) and while we could pull a report out of the OA buffer we
>     can't trigger a report from the cpu on demand.
>
>     Related to being report based; the OA counters are configured in HW
>     as a set while perf generally expects counter configurations to be
>     orthogonal. Although counters can be associated with a group leader
>     as they are opened, there's no clear precedent for being able to
>     provide group-wide configuration attributes and no obvious solution
>     as yet that's expected to be acceptable to upstream and meets our
>     userspace needs.

I'm not entirely sure what you mean with group-wide configuration
attributes; could you elaborate?

>     We currently avoid using perf's grouping feature
>     and forward OA reports to userspace via perf's 'raw' sample field.
>     This suits our userspace well considering how coupled the counters
>     are when dealing with normalizing. It would be inconvenient to split
>     counters up into separate events, only to require userspace to
>     recombine them. 

So IF you were using a group, a single read from the leader can return
you a vector of all values (PERF_FORMAT_GROUP), this avoids having to
do that recombine.

Another option would be to view the arrival of an OA vector in the
datastream as an 'event' and generate a PERF_RECORD_READ in the perf
buffer (which again can use the GROUP vector format).

>     Related to counter orthogonality; we can't time share the OA unit,
>     while event scheduling is a central design idea within perf for
>     allowing userspace to open + enable more events than can be
>     configured in HW at any one time.

So we have other PMUs that cannot do this; Gen OA would not be unique in
this. Intel PT for example only allows a single active event.

That said; earlier today I saw:

  https://www.youtube.com/watch?v=9J3BQcAeHpI&list=PLe6I3NKr-I4J2oLGXhGOeBMEjh8h10jT3&index=7

where exactly this feature was mentioned as not fitting well into the
existing GPU performance interfaces (GL_AMD_performance_monitor /
GL_INTEL_performance_query).

So there is hardware (Nvidia) out there that does support this. Also
mentioned was that this hardware has global and local counters, where
the local ones are specific to a rendering context. That is not unlike
the per-cpu / per-task stuff perf does.

>     The OA unit is not designed to
>     allow re-configuration while in use. We can't reconfigure the OA
>     unit without loosing internal OA unit state which we can't access
>     explicitly to save and restore. Reconfiguring the OA unit is also
>     relatively slow, involving ~100 register writes. From userspace Mesa
>     also depends on a stable OA configuration when emitting
>     MI_REPORT_PERF_COUNT commands and importantly the OA unit can't be
>     disabled while there are outstanding MI_RPC commands lest we hang
>     the command streamer.

Right; see the PERF_PMU_CAP_EXCLUSIVE stuff.

> - We may be making some technical compromises a.t.m for the sake of
>   using perf.
> 
>     perf_event_open() requires events to either relate to a pid or a
>     specific cpu core, while our device pmu relates to neither.  Events
>     opened with a pid will be automatically enabled/disabled according
>     to the scheduling of that process - so not appropriate for us.

Right; the traditional cpu/pid mapping doesn't work well for devices;
but maybe, with some work, we can create something like that
global/local render context from it; although I've no clue what form
that would need at this time.

>     When
>     an event is related to a cpu id, perf ensures pmu methods will be
>     invoked via an inter process interrupt on that core. To avoid
>     invasive changes our userspace opens OA perf events for a specific
>     cpu. 

Some of that might still make sense in the sense that GPUs are subject
to the NUMA topology of machines. I would think you would want most
such things to be done on the node the device is attached to.

Granted, this might not be a concern for Intel graphics, but it might be
relevant for some of the discrete GPUs.

> - I'm not confident our use case benefits much from building on perf:
> 
>     We aren't using existing perf based tooling with our PMU. Existing
>     tools typically assume you're profiling work running on a cpu, e.g.
>     expecting samples to be associated with instruction pointers and
>     user/kernel registers and aiming to represent metrics in relation
>     to application source code. We're forwarding fake register values
>     and userspace needs needs to know how to decode the raw OA reports
>     before anything can be reported to a user.
>     
>     With the buffering done by the OA unit I don't think we currently
>     benefit from perf's mmapped circular buffer interface. We already
>     have a decoupled producer and consumer and since we have to copy out
>     of the OA buffer, it would work well for us to hide that copy in
>     a simpler read() based interface.
> 
> 
> - Logistically it might be more practical to contain this to the
>   graphics stack.
> 
>     It seems fair to consider that if we can't see a very compelling
>     benefit to building on perf, then containing this work to
>     drivers/gpu/drm/i915 may simplify the review process as well as
>     future maintenance and development.

> Peter; I wonder if you would tend to agree too that it could make sense
> for us to go with our own interface here?

Sorry this took so long; this wanted a well considered response and
those tend to get delayed in light of 'urgent' stuff.

While I can certainly see the pain points and why you would rather not
deal with them. I think it would make Linux a better place if we could
manage to come up with a generic interface that would work for 'all'
GPUs (and possibly more devices).
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2015-10-16  9:43 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-29 14:39 [RFC 0/6] Non perf based Gen Graphics OA unit driver Robert Bragg
2015-09-29 14:39 ` [RFC 1/6] drm/i915: Add i915 perf infrastructure Robert Bragg
2015-09-29 14:39 ` [RFC 2/6] drm/i915: rename OACONTROL GEN7_OACONTROL Robert Bragg
2015-09-29 14:39 ` [RFC 3/6] drm/i915: Add static '3D' Haswell OA unit config Robert Bragg
2015-09-29 14:39 ` [RFC 4/6] drm/i915: Add i915 perf event for Haswell OA unit Robert Bragg
2015-09-29 14:55   ` [Intel-gfx] " kbuild test robot
2015-09-29 15:18     ` Peter Zijlstra
2015-09-29 23:19       ` [kbuild-all] " Fengguang Wu
     [not found] ` <1443537549-6905-1-git-send-email-robert-St23OQVBDYPNLxjTenLetw@public.gmane.org>
2015-09-29 14:39   ` [RFC 5/6] drm/i915: Add dev.i915.perf_event_paranoid sysctl option Robert Bragg
2015-09-30  8:30   ` [RFC 0/6] Non perf based Gen Graphics OA unit driver Chris Wilson
2015-09-30 13:36     ` Robert Bragg
2015-09-29 14:39 ` [RFC 6/6] drm/i915: add oa_event_min_timer_exponent sysctl Robert Bragg
2015-09-30  3:23 ` [RFC 0/6] Non perf based Gen Graphics OA unit driver Zhenyu Wang
2015-10-16  9:43 ` Peter Zijlstra [this message]
2015-10-16 10:02   ` Ingo Molnar
2015-10-16 10:33     ` Peter Zijlstra
     [not found]       ` <20151016103345.GS3816-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2015-10-16 12:08         ` Robert Bragg
2015-10-20 20:16   ` Robert Bragg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151016094306.GA3604@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=airlied@linux.ie \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=daniel.vetter@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=matt.fleming@intel.com \
    --cc=mingo@kernel.org \
    --cc=robert@sixbynine.org \
    --cc=sourab.gupta@intel.com \
    --cc=zheng.z.yan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).