Re: [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter

From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter
Date: Wed, 8 Jun 2016 12:47:28 +0100	[thread overview]
Message-ID: <575805D0.1070504@linux.intel.com> (raw)
In-Reply-To: <20160608112442.GX32344@nuc-i3427.alporthouse.com>

On 08/06/16 12:24, Chris Wilson wrote:
> On Wed, Jun 08, 2016 at 11:16:13AM +0100, Tvrtko Ursulin wrote:
>>
>> On 08/06/16 10:48, Chris Wilson wrote:
>>> On Tue, Jun 07, 2016 at 01:04:22PM +0100, Tvrtko Ursulin wrote:
>>>>> +static int intel_breadcrumbs_signaler(void *arg)
>>>>> +{
>>>>> +	struct intel_engine_cs *engine = arg;
>>>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>>>> +	struct signal *signal;
>>>>> +
>>>>> +	/* Install ourselves with high priority to reduce signalling latency */
>>>>> +	signaler_set_rtpriority();
>>>>> +
>>>>> +	do {
>>>>> +		set_current_state(TASK_INTERRUPTIBLE);
>>>>> +
>>>>> +		/* We are either woken up by the interrupt bottom-half,
>>>>> +		 * or by a client adding a new signaller. In both cases,
>>>>> +		 * the GPU seqno may have advanced beyond our oldest signal.
>>>>> +		 * If it has, propagate the signal, remove the waiter and
>>>>> +		 * check again with the next oldest signal. Otherwise we
>>>>> +		 * need to wait for a new interrupt from the GPU or for
>>>>> +		 * a new client.
>>>>> +		 */
>>>>> +		signal = READ_ONCE(b->first_signal);
>>>>> +		if (signal_complete(signal)) {
>>>>> +			/* Wake up all other completed waiters and select the
>>>>> +			 * next bottom-half for the next user interrupt.
>>>>> +			 */
>>>>> +			intel_engine_remove_wait(engine, &signal->wait);
>>>>> +
>>>>> +			i915_gem_request_unreference(signal->request);
>>>>> +
>>>>> +			/* Find the next oldest signal. Note that as we have
>>>>> +			 * not been holding the lock, another client may
>>>>> +			 * have installed an even older signal than the one
>>>>> +			 * we just completed - so double check we are still
>>>>> +			 * the oldest before picking the next one.
>>>>> +			 */
>>>>> +			spin_lock(&b->lock);
>>>>> +			if (signal == b->first_signal)
>>>>> +				b->first_signal = rb_next(&signal->node);
>>>>> +			rb_erase(&signal->node, &b->signals);
>>>>> +			spin_unlock(&b->lock);
>>>>> +
>>>>> +			kfree(signal);
>>>>> +		} else {
>>>>> +			if (kthread_should_stop())
>>>>> +				break;
>>>>> +
>>>>> +			schedule();
>>>>> +		}
>>>>> +	} while (1);
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>
>>>> So the thread is only because it is convenient to plug it in the
>>>> breadcrumbs infrastructure. Otherwise the processing above could be
>>>> done from a lighter weight context as well since nothing seems to
>>>> need the process context.
>>>
>>> No, seqno processing requires process/sleepable context. The delays we
>>> incur can be >100us and not suitable for irq/softirq context.
>>
>> Nothing in this patch needs it - please say in the commit why it is
>> choosing the process context then.
>
> Bottom half processing requires it. irq_seqno_barrier is not suitable
> for irq/softirq context.

Why? Because of a single clflush? How long does that take?

>> And why so long delays? It looks pretty lightweight to me.
>>
>>>> One alternative could perhaps be to add a waiter->wake_up vfunc and
>>>> signalers could then potentially use a tasklet?
>>>
>>> Hmm, I did find that in order to reduce execlists latency, I had to
>>> drive the tasklet processing from the signaler.
>>
>> What do you mean? The existing execlists tasklet? Now would that work?
>
> Due to how dma-fence signals, the softirq is never kicked
> (spin_lock_irq doesn't handle local_bh_enable()) and so we would only
> submit a new task via execlists on a reschedule. That latency added
> about 30% (30s on bsw) to gem_exec_parallel.

I don't follow. User interrupts are separate from context complete which 
drives the submission. How do fences interfere with the latter?

>>>>> +int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
>>>>> +{
>>>>> +	struct intel_engine_cs *engine = request->engine;
>>>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>>>> +	struct rb_node *parent, **p;
>>>>> +	struct signal *signal;
>>>>> +	bool first, wakeup;
>>>>> +
>>>>> +	if (unlikely(IS_ERR(b->signaler)))
>>>>> +		return PTR_ERR(b->signaler);
>>>>
>>>> I don't see that there is a fallback is kthread creation failed. It
>>>> should just fail in intel_engine_init_breadcrumbs if that happens.
>>>
>>> Because it is not fatal to using the GPU, just one optional function.
>>
>> But we never expect it to fail and it is not even dependent on
>> anything user controllable. Just a random error which would cause
>> user experience to degrade. If thread creation failed it means
>> system is in such a poor shape I would just fail the driver init.
>
> A minimally functional system is better than nothing at all.
> GEM is not required for driver loading, interrupt driven dma-fences less
> so.

If you are so hot for that, how about vfuncing enable signaling in that 
case? Because I find the "have we created our kthread at driver init 
time successfuly" question for every fence a bit too much.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx