All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: John Harrison <John.C.Harrison@Intel.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Intel-GFX@Lists.FreeDesktop.Org
Subject: Re: [PATCH v9 4/6] drm/i915: Interrupt driven fences
Date: Tue, 14 Jun 2016 12:35:30 +0100	[thread overview]
Message-ID: <575FEC02.9020306@linux.intel.com> (raw)
In-Reply-To: <b667a9b7-f3a4-f4ff-9da3-a9d6a09e66ac@Intel.com>


On 13/06/16 16:51, John Harrison wrote:

[snip]

>>>> diff --git a/drivers/gpu/drm/i915/i915_dma.c
>>>> b/drivers/gpu/drm/i915/i915_dma.c
>>>> index 07edaed..f8f60bb 100644
>>>> --- a/drivers/gpu/drm/i915/i915_dma.c
>>>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>>>> @@ -1019,9 +1019,13 @@ static int i915_workqueues_init(struct
>>>> drm_i915_private *dev_priv)
>>>>        if (dev_priv->wq == NULL)
>>>>            goto out_err;
>>>>
>>>> +    dev_priv->req_wq = alloc_ordered_workqueue("i915-rq", 0);
>>>> +    if (dev_priv->req_wq == NULL)
>>>> +        goto out_free_wq;
>>>> +
>>> Single (per-device) ordered workqueue will serialize interrupt
>>> processing across all engines to one thread. Together with the fact
>>> request worker does not seem to need the sleeping context, I am
>>> thinking that a tasklet per engine would be much better (see
>>> engine->irq_tasklet for an example).
> Other conversations have stated that tasklets are not the best option. I
> did think about having a work queue per engine but that seemed
> excessive. Plus any subsequent work triggered by the fence completion
> will almost certainly require grabbing the driver mutex lock (because
> everything requires the mutex lock) so serialisation of engines doesn't
> sound like much of an issue.

In this patch AFAICS i915_gem_request_worker calls 
i915_gem_request_notify on the engine which only holds the 
engine->fence_lock. So if you had a per engine wq all engines could do 
that processing in parallel.

It is only a matter of when fence_signal_locked wakes up the waiters 
that they might go and hammer on struct_mutex, but for any work they 
want to do between waking up and submitting new work, serializing via a 
single wq will be bad for latency.

>>>
>>>> +        ret = __i915_spin_request(req, state);
>>>> +        if (ret == 0)
>>>> +            goto out;
>>>>        }
>>>>
>>>> +    /*
>>>> +     * Enable interrupt completion of the request.
>>>> +     */
>>>> +    fence_enable_sw_signaling(&req->fence);
>>>> +
>>>>        for (;;) {
>>>>            struct timer_list timer;
>>>>
>>>> @@ -1306,6 +1306,21 @@ int __i915_wait_request(struct
>>>> drm_i915_gem_request *req,
>>>>                break;
>>>>            }
>>>>
>>>> +        if (req->seqno) {
>>>> +            /*
>>>> +             * There is quite a lot of latency in the user interrupt
>>>> +             * path. So do an explicit seqno check and potentially
>>>> +             * remove all that delay.
>>>> +             */
>>>> +            if (req->engine->irq_seqno_barrier)
>>>> +                req->engine->irq_seqno_barrier(req->engine);
>>>> +            seqno = engine->get_seqno(engine);
>>>> +            if (i915_seqno_passed(seqno, req->seqno)) {
>>>> +                ret = 0;
>>>> +                break;
>>>> +            }
>>>> +        }
>>>> +
>>>>            if (signal_pending_state(state, current)) {
>>>>                ret = -ERESTARTSYS;
>>>>                break;
>>>> @@ -1332,14 +1347,32 @@ int __i915_wait_request(struct
>>>> drm_i915_gem_request *req,
>>>>                destroy_timer_on_stack(&timer);
>>>>            }
>>>>        }
>>>> -    if (!irq_test_in_progress)
>>>> -        engine->irq_put(engine);
>>>>
>>>>        finish_wait(&engine->irq_queue, &wait);
>>> Hm I don't understand why our custom waiting remains? Shouldn't
>>> fence_wait just be called after the optimistic spin, more or less?
> That would solve the 'thundering problem' if we could. In theory, the
> entire wait function should just be a call to 'fence_wait(&req->fence)'.
> Unfortunately, the wait function goes to sleep holding the mutex lock
> and requires having a bail out option on the wait which is not currently
> part of the fence API. I have a work-in-progress patch that almost
> solves the issues but it isn't quite there yet (and I haven't had much
> chance to work on it for a while).

[Btw does it also mean i915 can not handle the incoming 3rd party fences?]

So the userspace waiters can either wait on a dma-buf or i915 wq, 
depending on what ioctl they have called. Then on interrupt i915 waiters 
are woken directly, while the fence api ones go via a worker.

i915 waiters also do a second wake up the fence API waiters by...

>>>
>>>>    out:
>>>>        trace_i915_gem_request_wait_end(req);
>>>>
>>>> +    if ((ret == 0) && (req->seqno)) {
>>>> +        if (req->engine->irq_seqno_barrier)
>>>> +            req->engine->irq_seqno_barrier(req->engine);
>>>> +        seqno = engine->get_seqno(engine);
>>>> +        if (i915_seqno_passed(seqno, req->seqno) &&
>>>> +            !i915_gem_request_completed(req)) {
>>>> +            /*
>>>> +             * Make sure the request is marked as completed before
>>>> +             * returning. NB: Need to acquire the spinlock around
>>>> +             * the whole call to avoid a race condition with the
>>>> +             * interrupt handler is running concurrently and could
>>>> +             * cause this invocation to early exit even though the
>>>> +             * request has not actually been fully processed yet.
>>>> +             */
>>>> +            spin_lock_irq(&req->engine->fence_lock);
>>>> +            i915_gem_request_notify(req->engine, true);

... this call here.

If there are both classes of waiters one of those calls will be a waste 
of cycles (spin lock, irq toggle, list iter, coherent seqno read) correct?

>>>> +            spin_unlock_irq(&req->engine->fence_lock);
>>>> +        }
>>>> +    }
>>>> +
>>>>        if (timeout) {
>>>>            s64 tres = *timeout - (ktime_get_raw_ns() - before);
>>>>
>>>> @@ -1405,6 +1438,11 @@ static void i915_gem_request_retire(struct
>>>> drm_i915_gem_request *request)
>>>>    {
>>>>        trace_i915_gem_request_retire(request);
>>>>
>>>> +    if (request->irq_enabled) {
>>>> +        request->engine->irq_put(request->engine);
>>>> +        request->irq_enabled = false;
>>> What protects request->irq_enabled? Here versus enable_signalling
>>> bit? It can be called from the external fence users which would take
>>> the fence_lock, but here it does not.
> The flag can only be set when enabling interrupt driven completion
> (which can only happen once and only if the fence is not already
> signalled). The flag can only be cleared when the fence is signalled or
> when the request is retired. And retire without signal can only happen
> if the request is being cancelled in some way (e.g. GPU reset) and thus
> will not ever be signalled. So if we get here then none of the other
> paths are possible anymore.

Couldn't retire without signal happen via execbuf paths calling 
i915_gem_retire_requests_ring?

>>>> +        i915_gem_request_enable_interrupt(req, true);
>>>> +
>>>> +    return true;
>>>> +}
>>>> +
>>>> +/**
>>>> + * i915_gem_request_worker - request work handler callback.
>>>> + * @work: Work structure
>>>> + * Called in response to a seqno interrupt to process the completed
>>>> requests.
>>>> + */
>>>> +void i915_gem_request_worker(struct work_struct *work)
>>>> +{
>>>> +    struct intel_engine_cs *engine;
>>>> +
>>>> +    engine = container_of(work, struct intel_engine_cs, request_work);
>>>> +    i915_gem_request_notify(engine, false);
>>>> +}
>>>> +
>>>> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool
>>>> fence_locked)
>>>> +{
>>>> +    struct drm_i915_gem_request *req, *req_next;
>>>> +    unsigned long flags;
>>>>        u32 seqno;
>>>>
>>>> -    seqno = req->engine->get_seqno(req->engine);
>>>> +    if (list_empty(&engine->fence_signal_list))
>>> Okay this without the lock still makes me nervous. I'd rather not
>>> having to think about why it is safe and can't miss a wakeup.
> I don't see how list_empty() can return a false negative. Even if the
> implementation was such that it could see a partially updated state
> across multiple memory accesses, that will just lead to it thinking
> not-empty which is fine. Any update which takes it from empty to
> not-empty is guaranteed to occur before the act of enabling interrupts
> and thus before notify() can be called. So while it could potentially do
> the full processing when an early exit was fine, it can never early exit
> when it needs to do something.

Something like that sounds like a good comment to put above then! :)

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2016-06-14 11:35 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-01 17:07 [PATCH v9 0/6] Convert requests to use struct fence John.C.Harrison
2016-06-01 17:07 ` [PATCH v9 1/6] drm/i915: Add per context timelines for fence objects John.C.Harrison
2016-06-02 10:28   ` Tvrtko Ursulin
2016-06-09 16:08     ` John Harrison
2016-06-07 11:17   ` Maarten Lankhorst
2016-06-09 17:22     ` John Harrison
2016-06-01 17:07 ` [PATCH v9 2/6] drm/i915: Convert requests to use struct fence John.C.Harrison
2016-06-02 11:07   ` Tvrtko Ursulin
2016-06-07 11:42     ` Maarten Lankhorst
2016-06-07 12:11       ` Tvrtko Ursulin
2016-06-10 11:26       ` John Harrison
2016-06-13 10:16         ` Maarten Lankhorst
2016-06-01 17:07 ` [PATCH v9 3/6] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
2016-06-07 12:07   ` Maarten Lankhorst
2016-06-01 17:07 ` [PATCH v9 4/6] drm/i915: Interrupt driven fences John.C.Harrison
2016-06-02 13:25   ` Tvrtko Ursulin
2016-06-07 12:02     ` Maarten Lankhorst
2016-06-07 12:19       ` Tvrtko Ursulin
2016-06-13 15:51       ` John Harrison
2016-06-14 11:35         ` Tvrtko Ursulin [this message]
2016-06-01 17:07 ` [PATCH v9 5/6] drm/i915: Updated request structure tracing John.C.Harrison
2016-06-07 12:15   ` Maarten Lankhorst
2016-06-01 17:07 ` [PATCH v9 6/6] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
2016-06-07 12:47   ` Maarten Lankhorst
2016-06-16 12:10     ` John Harrison
2016-06-02 11:17 ` ✗ Ro.CI.BAT: failure for Convert requests to use struct fence (rev6) Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=575FEC02.9020306@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=Intel-GFX@Lists.FreeDesktop.Org \
    --cc=John.C.Harrison@Intel.com \
    --cc=maarten.lankhorst@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.