public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Dave Gordon <david.s.gordon@intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/i915/guc: default to using GuC submission where possible
Date: Tue, 26 Apr 2016 14:36:21 +0100	[thread overview]
Message-ID: <571F6ED5.60306@intel.com> (raw)
In-Reply-To: <20160426103533.GG27856@nuc-i3427.alporthouse.com>

On 26/04/16 11:35, Chris Wilson wrote:
> On Tue, Apr 26, 2016 at 10:52:41AM +0100, Dave Gordon wrote:
>> On 26/04/16 09:49, Dave Gordon wrote:
>>> On 25/04/16 11:39, Chris Wilson wrote:
>>>> On Mon, Apr 25, 2016 at 11:07:13AM +0100, Dave Gordon wrote:
>>>>> On 22/04/16 19:45, Chris Wilson wrote:
>>
>> [snip]
>>
>>>>>> And what exactly is that atomic64_cmpxchg() serialising with? There are
>>>>>> no other CPUs contending with the write, and neither does the GuC
>>>>>> (and I
>>>>>> doubt it is taking any notice of the lock cmpxchg). Using cmpxchg where
>>>>>> a single WRITE_ONCE() of a 32bit value wins the perf prize for hotest
>>>>>> instruction and function in the kernel.
>>>>>
>>>>> The doorbell controller hardware, I should think. The BSpec
>>>>> describes using LOCK_CMPXCHG8B to update doorbells, so I think this
>>>>> code is just based on what it says there. If the CPU hardware
>>>>> doesn't implement it efficiently, surely the GPU h/w designers
>>>>> wouldn't have mandated it in this way?
>>>>
>>>> Wow, I'm surprised that they would put into the same domain. Still,
>>>> unless you are actually serialising with another writer, what is the
>>>> point of using lock cmpxchg? E.g. an xchg would be enough to enforce
>>>> ordering, and you should ask them again if this is not a little overkill
>>>> for one-way signaling.
>>>> -Chris
>>
>> As for performance, while LOCK_CMPXCHG8B might be an expensive
>> instruction, we're only executing ONE per request. I suspect that
>> the cumulative cost of all the extra memory accesses caused by extra
>> indirections and poor structure layout cost far more than any single
>> instruction ever can.
>>
>> Top things in this area might be:
>>
>> * all the macros taking "dev" instead of "dev_priv"
>> * pointer dances in general (a->b->c.d->e) where we could add a
>> shortcut pointer from a to c (or c.d), or from a or b to e.
>> * way too much repetition of a->b->c, a->b->d, a->b->e in the same
>> function, which the compiler *may* optimise, but probably won't if
>> there are any function calls around. Adding a local for a->b will
>> almost certainly help, or at least incur no penalty and be easier to
>> read.
>> * awkwardly sized or misaligned structure members, and bitfield
>> bools rather than 1-byte flags
>>
>> So let's nibble away at these before we worry about the cost of a
>> single x86 instruction!
>
> You can either assume that I applied the patches I sent to the ml for
> the above points, or just look at the delta between execlists and guc
> and worry about the regressions.
> -Chris

With the latest version of this patchset (not yet posted), I see GuC 
submission just slightly *faster* than execlists on the render ring :)
However it's still slower on the others, where the minimum execlist time 
is less.

Just running noops, execlist submission on render shows a downward trend 
as more noop batches are submitted, flattening to a limit of just over 
3us/batch. GuC mode shows the same trend, and bottoms out at just UNDER 
3us/batch. The crossover seems to be 512-1024 batches/cycle, where the 
reduced interrupt overhead outweighs the added latency.

It probably helps that we're no longer mapping & unmapping the doorbell 
page on each request!

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2016-04-26 13:36 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-22 18:22 [PATCH 1/2] drm/i915/guc: add enable_guc_loading parameter Dave Gordon
2016-04-22 18:22 ` [PATCH 2/2] drm/i915/guc: default to using GuC submission where possible Dave Gordon
2016-04-22 18:45   ` Chris Wilson
2016-04-22 18:51     ` Chris Wilson
2016-04-25  7:31       ` Dave Gordon
2016-04-25  8:29         ` Chris Wilson
2016-04-26 14:00           ` Daniel Vetter
2016-04-27 17:53             ` Dave Gordon
2016-04-25 10:07     ` Dave Gordon
2016-04-25 10:39       ` Chris Wilson
2016-04-26  8:49         ` Dave Gordon
2016-04-26  9:52           ` Dave Gordon
2016-04-26 10:35             ` Chris Wilson
2016-04-26 13:36               ` Dave Gordon [this message]
2016-04-24 10:23 ` ✗ Fi.CI.BAT: warning for series starting with [1/2] drm/i915/guc: add enable_guc_loading parameter Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=571F6ED5.60306@intel.com \
    --to=david.s.gordon@intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox