All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Peres <martin.peres@free.fr>
To: Robert Bragg <robert@sixbynine.org>, intel-gfx@lists.freedesktop.org
Cc: Deepak S <deepak.s@intel.com>,
	Daniel Vetter <daniel.vetter@intel.com>,
	Sourab Gupta <sourab.gupta@intel.com>,
	dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 5/9] drm/i915: Enable i915 perf stream for Haswell OA unit
Date: Sat, 23 Apr 2016 13:34:57 +0300	[thread overview]
Message-ID: <571B4FD1.60702@free.fr> (raw)
In-Reply-To: <1461162194-1424-6-git-send-email-robert@sixbynine.org>

On 20/04/16 17:23, Robert Bragg wrote:
> Gen graphics hardware can be set up to periodically write snapshots of
> performance counters into a circular buffer via its Observation
> Architecture and this patch exposes that capability to userspace via the
> i915 perf interface.
>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Robert Bragg <robert@sixbynine.org>
> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h         |  56 +-
>   drivers/gpu/drm/i915/i915_gem_context.c |  24 +-
>   drivers/gpu/drm/i915/i915_perf.c        | 940 +++++++++++++++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_reg.h         | 338 ++++++++++++
>   include/uapi/drm/i915_drm.h             |  70 ++-
>   5 files changed, 1408 insertions(+), 20 deletions(-)
>
> +
> +
> +	/* It takes a fairly long time for a new MUX configuration to
> +	 * be be applied after these register writes. This delay
> +	 * duration was derived empirically based on the render_basic
> +	 * config but hopefully it covers the maximum configuration
> +	 * latency...
> +	 */
> +	mdelay(100);

With such a HW and SW design, how can we ever expose hope to get any
kind of performance when we are trying to monitor different metrics on each
draw call? This may be acceptable for system monitoring, but it is 
problematic
for the GL extensions :s

Since it seems like we are going for a perf API, it means that for every 
change
of metrics, we need to flush the commands, wait for the GPU to be done, then
program the new set of metrics via an IOCTL, wait 100 ms, and then we may
resume rendering ... until the next change. We are talking about a 
latency of
6-7 frames at 60 Hz here... this is non-negligeable...

I understand that we have a ton of counters and we may hide latency by not
allowing using more than half of the counters for every draw call or 
frame, but
even then, this 100ms delay is killing this approach altogether.

To be honest, if it indeed is an HW bug, then the approach that Samuel 
Pitoiset
and I used for Nouveau involving pushing an handle representing a
pre-computed configuration to the command buffer so as a software method
can be ask the kernel to reprogram the counters with as little idle time as
possible, would be useless as waiting for the GPU to be idle would 
usually not
take more than a few ms... which is nothing compared to waiting 100ms.

So, now, the elephant in the room, how can it take that long to apply the
change? Are the OA registers double buffered (NVIDIA's are, so as we can
reconfigure and start monitoring multiple counters at the same time)?

Maybe this 100ms is the polling period and the HW does not allow changing
the configuration in the middle of a polling session. In this case, this 
delay
should be dependent on the polling frequency. But even then, I would really
hope that the HW would allow us to tear down everything, reconfigure and
start polling again without waiting for the next tick. If not possible, 
maybe we
can change the frequency for the polling clock to make the polling event 
happen
sooner.

HW delays are usually a few microseconds, not milliseconds, that really 
suggests
that something funny is happening and the HW design is not understood 
properly.
If the documentation has nothing on this and the HW teams cannot help, 
then I
suggest a little REing session. I really want to see this work land, but 
the way I see
it right now is that we cannot rely on it because of this bug. Maybe 
fixing this bug
would require changing the architecture, so better address it before 
landing the
patches.

Worst case scenario, do not hesitate to contact me if non of the proposed
explanation pans out, I will take the time to read through the OA 
material and try my
REing skills on it. As I said, I really want to see this upstream!

Sorry...

Martin
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2016-04-23 10:34 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-20 14:23 [PATCH 0/9] Enable Gen 7 Observation Architecture Robert Bragg
2016-04-20 14:23 ` [PATCH 1/9] drm/i915: Add i915 perf infrastructure Robert Bragg
2016-04-20 22:41   ` Chris Wilson
2016-04-20 14:23 ` [PATCH 2/9] drm/i915: rename OACONTROL GEN7_OACONTROL Robert Bragg
2016-04-20 14:23 ` [PATCH 3/9] drm/i915: don't whitelist oacontrol in cmd parser Robert Bragg
2016-04-20 14:23 ` [PATCH 4/9] drm/i915: Add 'render basic' Haswell OA unit config Robert Bragg
2016-04-20 14:23 ` [PATCH 5/9] drm/i915: Enable i915 perf stream for Haswell OA unit Robert Bragg
2016-04-20 16:16   ` kbuild test robot
2016-04-20 20:30   ` [Intel-gfx] " kbuild test robot
2016-04-20 21:11   ` Chris Wilson
2016-04-21 16:15     ` Robert Bragg
2016-04-23  8:48       ` Chris Wilson
2016-04-20 22:15   ` Chris Wilson
2016-04-20 22:46   ` Chris Wilson
2016-04-22 11:04     ` Robert Bragg
2016-04-22 11:18       ` Chris Wilson
2016-04-20 22:52   ` Chris Wilson
2016-04-21 15:43     ` Robert Bragg
2016-04-21 16:21       ` Chris Wilson
2016-04-20 23:09   ` Chris Wilson
2016-04-21 15:18     ` Robert Bragg
2016-04-22  1:10       ` Robert Bragg
2016-04-20 23:16   ` Chris Wilson
2016-04-21 15:01     ` Robert Bragg
2016-04-23 10:34   ` Martin Peres [this message]
2016-05-03 19:34     ` Robert Bragg
2016-05-03 20:03       ` Robert Bragg
2016-05-04  9:09         ` Martin Peres
2016-05-04  9:49           ` Robert Bragg
2016-05-04 12:24             ` Daniel Vetter
2016-05-04 13:24               ` Robert Bragg
2016-05-04 13:33                 ` Robert Bragg
2016-05-04 13:51                 ` Daniel Vetter
2016-05-04  9:04       ` [Intel-gfx] " Martin Peres
2016-05-04 11:15         ` Robert Bragg
2016-04-20 14:23 ` [PATCH 6/9] drm/i915: advertise available metrics via sysfs Robert Bragg
2016-04-20 14:23 ` [PATCH 7/9] drm/i915: Add dev.i915.perf_event_paranoid sysctl option Robert Bragg
2016-04-20 14:23 ` [PATCH 8/9] drm/i915: add oa_event_min_timer_exponent sysctl Robert Bragg
2016-04-20 14:23 ` [PATCH 9/9] drm/i915: Add more Haswell OA metric sets Robert Bragg
2016-04-20 14:56 ` [PATCH 0/9] Enable Gen 7 Observation Architecture Robert Bragg
2016-04-21  7:46 ` ✓ Fi.CI.BAT: success for Enable Gen 7 Observation Architecture (rev3) Patchwork
2016-04-21 12:41 ` ✗ Fi.CI.BAT: failure " Patchwork
2016-04-23  8:31 ` ✗ Fi.CI.BAT: warning " Patchwork
2016-04-24 17:23 ` ✓ Fi.CI.BAT: success " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=571B4FD1.60702@free.fr \
    --to=martin.peres@free.fr \
    --cc=daniel.vetter@intel.com \
    --cc=deepak.s@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=robert@sixbynine.org \
    --cc=sourab.gupta@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.