From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter
Date: Wed, 8 Jun 2016 13:44:27 +0100	[thread overview]
Message-ID: <5758132B.4050409@linux.intel.com> (raw)
In-Reply-To: <20160608123446.GA32344@nuc-i3427.alporthouse.com>


On 08/06/16 13:34, Chris Wilson wrote:
> On Wed, Jun 08, 2016 at 12:47:28PM +0100, Tvrtko Ursulin wrote:
>>
>> On 08/06/16 12:24, Chris Wilson wrote:
>>> On Wed, Jun 08, 2016 at 11:16:13AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 08/06/16 10:48, Chris Wilson wrote:
>>>>> On Tue, Jun 07, 2016 at 01:04:22PM +0100, Tvrtko Ursulin wrote:
>>>>>>> +static int intel_breadcrumbs_signaler(void *arg)
>>>>>>> +{
>>>>>>> +	struct intel_engine_cs *engine = arg;
>>>>>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>>>>>> +	struct signal *signal;
>>>>>>> +
>>>>>>> +	/* Install ourselves with high priority to reduce signalling latency */
>>>>>>> +	signaler_set_rtpriority();
>>>>>>> +
>>>>>>> +	do {
>>>>>>> +		set_current_state(TASK_INTERRUPTIBLE);
>>>>>>> +
>>>>>>> +		/* We are either woken up by the interrupt bottom-half,
>>>>>>> +		 * or by a client adding a new signaller. In both cases,
>>>>>>> +		 * the GPU seqno may have advanced beyond our oldest signal.
>>>>>>> +		 * If it has, propagate the signal, remove the waiter and
>>>>>>> +		 * check again with the next oldest signal. Otherwise we
>>>>>>> +		 * need to wait for a new interrupt from the GPU or for
>>>>>>> +		 * a new client.
>>>>>>> +		 */
>>>>>>> +		signal = READ_ONCE(b->first_signal);
>>>>>>> +		if (signal_complete(signal)) {
>>>>>>> +			/* Wake up all other completed waiters and select the
>>>>>>> +			 * next bottom-half for the next user interrupt.
>>>>>>> +			 */
>>>>>>> +			intel_engine_remove_wait(engine, &signal->wait);
>>>>>>> +
>>>>>>> +			i915_gem_request_unreference(signal->request);
>>>>>>> +
>>>>>>> +			/* Find the next oldest signal. Note that as we have
>>>>>>> +			 * not been holding the lock, another client may
>>>>>>> +			 * have installed an even older signal than the one
>>>>>>> +			 * we just completed - so double check we are still
>>>>>>> +			 * the oldest before picking the next one.
>>>>>>> +			 */
>>>>>>> +			spin_lock(&b->lock);
>>>>>>> +			if (signal == b->first_signal)
>>>>>>> +				b->first_signal = rb_next(&signal->node);
>>>>>>> +			rb_erase(&signal->node, &b->signals);
>>>>>>> +			spin_unlock(&b->lock);
>>>>>>> +
>>>>>>> +			kfree(signal);
>>>>>>> +		} else {
>>>>>>> +			if (kthread_should_stop())
>>>>>>> +				break;
>>>>>>> +
>>>>>>> +			schedule();
>>>>>>> +		}
>>>>>>> +	} while (1);
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>
>>>>>> So the thread is only there because it is convenient to plug it
>>>>>> into the breadcrumbs infrastructure. Otherwise the processing
>>>>>> above could be done from a lighter-weight context as well, since
>>>>>> nothing seems to need process context.
>>>>>
>>>>> No, seqno processing requires process/sleepable context. The delays we
>>>>> incur can be >100us and not suitable for irq/softirq context.
>>>>
>>>> Nothing in this patch needs it - please say in the commit why it is
>>>> choosing the process context then.
>>>
>>> Bottom half processing requires it. irq_seqno_barrier is not suitable
>>> for irq/softirq context.
>>
>> Why? Because of a single clflush? How long does that take?
>
> Because both Ironlake and Baytrail require definite delays on the order of
> 100us. Haswell, Broadwell, Skylake all need an extra delay that we don't
> yet have.

Okay, please mention in the commit so the choice is documented.

>>>> And why so long delays? It looks pretty lightweight to me.
>>>>
>>>>>> One alternative could perhaps be to add a waiter->wake_up vfunc and
>>>>>> signalers could then potentially use a tasklet?
>>>>>
>>>>> Hmm, I did find that in order to reduce execlists latency, I had to
>>>>> drive the tasklet processing from the signaler.
>>>>
>>>> What do you mean? The existing execlists tasklet? How would that work?
>>>
>>> Due to how dma-fence signals, the softirq is never kicked
>>> (spin_lock_irq doesn't handle local_bh_enable()) and so we would only
>>> submit a new task via execlists on a reschedule. That latency added
>>> about 30% (30s on bsw) to gem_exec_parallel.
>>
>> I don't follow. User interrupts are separate from context complete
>> which drives the submission. How do fences interfere with the
>> latter?
>
> The biggest user benchmark (ala sysmark) regression we have for
> execlists is the latency in submitting the first request to hardware via
> elsp (or at least the hw responding to and executing that batch,
> the per-bb and per-ctx w/a are not free either). If we incur extra
> latency in the driver in even adding the request to the queue for an
> idle GPU, that is easily felt by userspace.

I still don't see how fences tie into that. But it is not that 
important, since my question was along the lines of "do we really need 
a thread".

>>>>>>> +int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
>>>>>>> +{
>>>>>>> +	struct intel_engine_cs *engine = request->engine;
>>>>>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>>>>>> +	struct rb_node *parent, **p;
>>>>>>> +	struct signal *signal;
>>>>>>> +	bool first, wakeup;
>>>>>>> +
>>>>>>> +	if (unlikely(IS_ERR(b->signaler)))
>>>>>>> +		return PTR_ERR(b->signaler);
>>>>>>
>>>>>> I don't see that there is a fallback if kthread creation failed. It
>>>>>> should just fail in intel_engine_init_breadcrumbs if that happens.
>>>>>
>>>>> Because it is not fatal to using the GPU, just one optional function.
>>>>
>>>> But we never expect it to fail, and it is not even dependent on
>>>> anything user controllable. It is just a random error which would
>>>> degrade the user experience. If thread creation failed, it means the
>>>> system is in such poor shape that I would just fail driver init.
>>>
>>> A minimally functional system is better than nothing at all.
>>> GEM is not required for driver loading, interrupt driven dma-fences less
>>> so.
>>
>> If you are so hot for that, how about vfuncing enable signaling in
>> that case? Because I find the "have we created our kthread at driver
>> init time successfully" question for every fence a bit too much.
>
> read + conditional that pulls in the cacheline we want? You can place
> the test after the spinlock if you want to avoid the cost I suppose.
> Or we just mark the GPU as wedged.

What I meant was to pass in different fence_ops at fence_init time, 
depending on whether or not the signaler thread was created. If the 
driver is to remain functional in that case, and 
fence->enable_signaling needs to keep returning errors, that sounds 
like a much more elegant solution than repeating the check at every 
fence->enable_signaling call.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
