All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	Intel-gfx@lists.freedesktop.org,
	Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Subject: Re: [RFC] drm/i915/bdw+: Do not emit user interrupts when not needed
Date: Fri, 18 Dec 2015 13:51:38 +0000	[thread overview]
Message-ID: <56740F6A.2080907@linux.intel.com> (raw)
In-Reply-To: <20151218122836.GH26780@nuc-i3427.alporthouse.com>


On 18/12/15 12:28, Chris Wilson wrote:
> On Fri, Dec 18, 2015 at 11:59:41AM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> We can rely on context complete interrupt to wake up the waiters
>> apart in the case where requests are merged into a single ELSP
>> submission. In this case we inject MI_USER_INTERRUPTS in the
>> ring buffer to ensure prompt wake-ups.
>>
>> This optimization has the effect on for example GLBenchmark
>> Egypt off-screen test of decreasing the number of generated
>> interrupts per second by a factor of two, and context switched
>> by factor of five to six.
>
> I half like it. Are the interupts a limiting factor in this case though?
> This should be ~100 waits/second with ~1000 batches/second, right? What
> is the delay between request completion and client wakeup - difficult to
> measure after you remove the user interrupt though! But I estimate it
> should be on the order of just a few GPU cycles.

Neither of the two benchmarks I ran (trex onscreen and egypt offscreen) 
show any framerate improvements.

The only thing I did manage to measure is that CPU energy usage goes 
down with the optimisation. Roughly 8-10%, courtesy of RAPL script 
someone posted here.

Benchmarking is generally very hard so it is a pity we don't have a farm 
similar to CI which does it all in a repeatable and solid manner.

>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 27f06198a51e..d9be878dbde7 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -359,6 +359,13 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
>>   	spin_unlock(&dev_priv->uncore.lock);
>>   }
>>
>> +static void execlists_emit_user_interrupt(struct drm_i915_gem_request *req)
>> +{
>> +	struct intel_ringbuffer *ringbuf = req->ringbuf;
>> +
>> +	iowrite32(MI_USER_INTERRUPT, ringbuf->virtual_start + req->tail - 8);
>> +}
>> +
>>   static int execlists_update_context(struct drm_i915_gem_request *rq)
>>   {
>>   	struct intel_engine_cs *ring = rq->ring;
>> @@ -433,6 +440,12 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>>   			cursor->elsp_submitted = req0->elsp_submitted;
>>   			list_move_tail(&req0->execlist_link,
>>   				       &ring->execlist_retired_req_list);
>> +			/*
>> +			 * When merging requests make sure there is still
>> +			 * something after each batch buffer to wake up waiters.
>> +			 */
>> +			if (cursor != req0)
>> +				execlists_emit_user_interrupt(req0);
>
> You may have already missed this instruction as you patch it, and keep
> doing so as long as the context is resubmitted. I think to be safe, you
> need to patch cursor as well. You could then MI_NOOP out the MI_INTERUPT
> on the terminal request.

I don't at the moment see it could miss it? We don't do preemption, but 
granted I don't understand this code fully.

But patching it out definitely looks safer. And I even don't have to 
unbreak GuC in that case. So I'll try that approach.

> An interesting igt experiement I think would be:
>
> thread A, keep queuing batches with just a single MI_STORE_DWORD_IMM *addr
> thread B, waits on batch from A, reads *addr (asynchronously), measures
> latency (actual value - expected(batch))
>
> Run for 10s, report min/max/median latency.
>
> Repeat for more threads/contexts and more waiters. Ah, that may be the
> demonstration for the thundering herd I've been looking for!

Hm I'll think about it.

Wrt your second reply, that is an interesting question.

All I can tell that empirically it looks interrupts do arrive split, 
otherwise there would be no reduction in interrupt numbers. But why are 
they split I don't know.

I'll try adding some counters to get a feel how often does that happen 
in various scenarios.

Regards,

Tvrtko



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2015-12-18 13:51 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-18 11:59 [RFC] drm/i915/bdw+: Do not emit user interrupts when not needed Tvrtko Ursulin
2015-12-18 12:28 ` Chris Wilson
2015-12-18 13:51   ` Tvrtko Ursulin [this message]
2015-12-18 14:29     ` Chris Wilson
2015-12-19  2:11       ` Chris Wilson
2015-12-18 12:30 ` ✗ failure: Fi.CI.BAT Patchwork
2015-12-18 12:31 ` [RFC] drm/i915/bdw+: Do not emit user interrupts when not needed Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56740F6A.2080907@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=Intel-gfx@lists.freedesktop.org \
    --cc=chris@chris-wilson.co.uk \
    --cc=tvrtko.ursulin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.