public inbox for intel-gfx@lists.freedesktop.org
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine
Date: Thu, 19 Nov 2020 16:17:38 +0000	[thread overview]
Message-ID: <e6cc158c-2db1-88d1-e8bf-f0bf3173f282@linux.intel.com> (raw)
In-Reply-To: <160579572865.3416.10178835579091430788@build.alporthouse.com>


On 19/11/2020 14:22, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-11-19 14:06:00)
>>
>> On 18/11/2020 12:10, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2020-11-18 11:38:43)
>>>>
>>>> On 18/11/2020 11:24, Chris Wilson wrote:
>>>>> Quoting Tvrtko Ursulin (2020-11-18 11:05:24)
>>>>>>
>>>>>> On 17/11/2020 11:30, Chris Wilson wrote:
>>>>>>> Since preempt-to-busy, we may unsubmit a request while it is still on
>>>>>>> the HW and completes asynchronously. That means it may be retired and in
>>>>>>> the process destroy the virtual engine (as the user has closed their
>>>>>>> context), but that engine may still be holding onto the unsubmitted
>>>>>>> completed request. Therefore we need to potentially clean up the old
>>>>>>> request on destroying the virtual engine. We also have to keep the
>>>>>>> virtual_engine alive until after the sibling's execlists_dequeue() have
>>>>>>> finished peeking into the virtual engines, for which we serialise with
>>>>>>> RCU.
>>>>>>>
>>>>>>> v2: Be paranoid and flush the tasklet as well.
>>>>>>> v3: And flush the tasklet before the engines, as the tasklet may
>>>>>>> re-attach an rb_node after our removal from the siblings.
>>>>>>>
>>>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/i915/gt/intel_lrc.c | 61 +++++++++++++++++++++++++----
>>>>>>>      1 file changed, 54 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>>> index 17cb7060eb29..c11433884cf6 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>>> @@ -182,6 +182,7 @@
>>>>>>>      struct virtual_engine {
>>>>>>>          struct intel_engine_cs base;
>>>>>>>          struct intel_context context;
>>>>>>> +     struct rcu_work rcu;
>>>>>>>      
>>>>>>>          /*
>>>>>>>           * We allow only a single request through the virtual engine at a time
>>>>>>> @@ -5470,44 +5471,90 @@ static struct list_head *virtual_queue(struct virtual_engine *ve)
>>>>>>>          return &ve->base.execlists.default_priolist.requests[0];
>>>>>>>      }
>>>>>>>      
>>>>>>> -static void virtual_context_destroy(struct kref *kref)
>>>>>>> +static void rcu_virtual_context_destroy(struct work_struct *wrk)
>>>>>>>      {
>>>>>>>          struct virtual_engine *ve =
>>>>>>> -             container_of(kref, typeof(*ve), context.ref);
>>>>>>> +             container_of(wrk, typeof(*ve), rcu.work);
>>>>>>>          unsigned int n;
>>>>>>>      
>>>>>>> -     GEM_BUG_ON(!list_empty(virtual_queue(ve)));
>>>>>>> -     GEM_BUG_ON(ve->request);
>>>>>>>          GEM_BUG_ON(ve->context.inflight);
>>>>>>>      
>>>>>>> +     /* Preempt-to-busy may leave a stale request behind. */
>>>>>>> +     if (unlikely(ve->request)) {
>>>>>>> +             struct i915_request *old;
>>>>>>> +
>>>>>>> +             spin_lock_irq(&ve->base.active.lock);
>>>>>>> +
>>>>>>> +             old = fetch_and_zero(&ve->request);
>>>>>>> +             if (old) {
>>>>>>> +                     GEM_BUG_ON(!i915_request_completed(old));
>>>>>>> +                     __i915_request_submit(old);
>>>>>>> +                     i915_request_put(old);
>>>>>>> +             }
>>>>>>> +
>>>>>>> +             spin_unlock_irq(&ve->base.active.lock);
>>>>>>> +     }
>>>>>>> +
>>>>>>> +     /*
>>>>>>> +      * Flush the tasklet in case it is still running on another core.
>>>>>>> +      *
>>>>>>> +      * This needs to be done before we remove ourselves from the siblings'
>>>>>>> +      * rbtrees as in the case it is running in parallel, it may reinsert
>>>>>>> +      * the rb_node into a sibling.
>>>>>>> +      */
>>>>>>> +     tasklet_kill(&ve->base.execlists.tasklet);
>>>>>>
>>>>>> Can it still be running after an RCU period?
>>>>>
>>>>> I think there is a window between checking whether the request is
>>>>> completed and kicking the tasklet that is not under the RCU lock,
>>>>> leaving an opportunity for the request to be retired and the barrier
>>>>> flushed to drop the context references.
>>>>
>>>>    From where would that check come?
>>>
>>> The window of opportunity extends all the way from the
>>> i915_request_completed check during unsubmit right until the virtual
>>> engine tasklet is executed -- we do not hold a reference to the virtual
>>> engine for the tasklet, and that request may be retired in the
>>> background, and along with it the virtual engine destroyed.
>>
>> In this case aren't sibling tasklets also a problem?
> 
> The next stanza decouples the siblings. At this point, we know that the
> request must have been completed (to retire and drop the context
> reference) so at this point nothing should be allowed to kick the
> virtual engine tasklet, it's just the outstanding execution we need to
> serialise. So the following assertion that nothing did kick the tasklet
> as we decoupled the siblings holds. After that assertion, there should
> be nothing else that knows about the virtual tasklet.
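
For reference, my reading of that "next stanza" (not quoted above, so exact
names may differ from what lands) is roughly:

```c
	/* Decouple ourselves from the siblings; no new access allowed. */
	for (n = 0; n < ve->num_siblings; n++) {
		struct intel_engine_cs *sibling = ve->siblings[n];
		struct rb_node *node = &ve->nodes[sibling->id].rb;

		if (RB_EMPTY_NODE(node))
			continue;

		spin_lock_irq(&sibling->active.lock);

		/* Detachment is lazily performed in the execlists tasklet */
		if (!RB_EMPTY_NODE(node))
			rb_erase_cached(node, &sibling->execlists.virtual);

		spin_unlock_irq(&sibling->active.lock);
	}

	/* Nothing may have rescheduled us after the tasklet_kill() above */
	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
```

So the tasklet_kill() has to come first, otherwise a tasklet running in
parallel could re-insert the rb_node we just erased.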

Let me go through it step by step because I am slow today.

1. Tasklet runs, decides to preempt the VE away.
2. VE completes despite that.
3. Userspace closes the context.
4. RCU period.
5. All tasklets which were active during 1 have exited.
6. VE decoupled, unlinked from the siblings and destroyed.

The re-submitted VE tasklet may start as soon as 1 and at any time after. 
If it starts after 4, then the RCU grace period from the context close does 
not see it, hence the explicit tasklet_kill() before decoupling. Okay, 
makes sense.
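
For completeness, since the hunk above only shows the worker: I take it the
kref release side (not quoted here) just defers the actual teardown past an
RCU grace period, something along these lines (my sketch, exact shape may
differ):

```c
static void virtual_context_destroy(struct kref *kref)
{
	struct virtual_engine *ve =
		container_of(kref, typeof(*ve), context.ref);

	/*
	 * The engine may still be reached from hardirq/softirq context
	 * (preempt-to-busy completing and resubmitting a stale request),
	 * and flushing the tasklet requires process context, so defer the
	 * free to a worker queued after an RCU grace period.
	 */
	INIT_RCU_WORK(&ve->rcu, rcu_virtual_context_destroy);
	queue_rcu_work(system_wq, &ve->rcu);
}
```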

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Thread overview: 52+ messages
2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 02/28] drm/i915/selftests: Small tweak to put the termination conditions together Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 03/28] drm/i915/gem: Drop free_work for GEM contexts Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 04/28] drm/i915/gt: Ignore dt==0 for reporting underflows Chris Wilson
2020-11-17 11:42   ` Tvrtko Ursulin
2020-11-17 11:30 ` [Intel-gfx] [PATCH 05/28] drm/i915/gt: Track the overall busy time Chris Wilson
2020-11-17 12:44   ` Tvrtko Ursulin
2020-11-17 13:05     ` Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 06/28] drm/i915/gt: Include semaphore status in print_request() Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 07/28] drm/i915: Lift i915_request_show() Chris Wilson
2020-11-17 12:51   ` Tvrtko Ursulin
2020-11-17 11:30 ` [Intel-gfx] [PATCH 08/28] drm/i915/gt: Show all active timelines for debugging Chris Wilson
2020-11-17 12:59   ` Tvrtko Ursulin
2020-11-17 13:25     ` Chris Wilson
2020-11-18 15:51       ` Tvrtko Ursulin
2020-11-19 10:47         ` Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 09/28] drm/i915: Lift waiter/signaler iterators Chris Wilson
2020-11-17 13:00   ` Tvrtko Ursulin
2020-11-17 11:30 ` [Intel-gfx] [PATCH 10/28] drm/i915: Show timeline dependencies for debug Chris Wilson
2020-11-17 13:06   ` Tvrtko Ursulin
2020-11-17 13:30     ` Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 11/28] drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 12/28] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 13/28] drm/i915/gt: Don't cancel the interrupt shadow too early Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine Chris Wilson
2020-11-18 11:05   ` Tvrtko Ursulin
2020-11-18 11:24     ` Chris Wilson
2020-11-18 11:38       ` Tvrtko Ursulin
2020-11-18 12:10         ` Chris Wilson
2020-11-19 14:06           ` Tvrtko Ursulin
2020-11-19 14:22             ` Chris Wilson
2020-11-19 16:17               ` Tvrtko Ursulin [this message]
2020-11-17 11:30 ` [Intel-gfx] [PATCH 15/28] drm/i915/gt: Protect context lifetime with RCU Chris Wilson
2020-11-18 11:36   ` Tvrtko Ursulin
2020-11-17 11:30 ` [Intel-gfx] [PATCH 16/28] drm/i915/gt: Split the breadcrumb spinlock between global and contexts Chris Wilson
2020-11-18 11:35   ` Tvrtko Ursulin
2020-11-17 11:30 ` [Intel-gfx] [PATCH 17/28] drm/i915/gt: Move the breadcrumb to the signaler if completed upon cancel Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 18/28] drm/i915/gt: Decouple completed requests on unwind Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 19/28] drm/i915/gt: Check for a completed last request once Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 20/28] drm/i915/gt: Replace direct submit with direct call to tasklet Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 21/28] drm/i915/gt: ce->inflight updates are now serialised Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 22/28] drm/i915/gt: Use virtual_engine during execlists_dequeue Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 23/28] drm/i915/gt: Decouple inflight virtual engines Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 24/28] drm/i915/gt: Defer schedule_out until after the next dequeue Chris Wilson
2020-11-17 11:31 ` [Intel-gfx] [PATCH 25/28] drm/i915/gt: Remove virtual breadcrumb before transfer Chris Wilson
2020-11-17 11:31 ` [Intel-gfx] [PATCH 26/28] drm/i915/gt: Shrink the critical section for irq signaling Chris Wilson
2020-11-17 11:31 ` [Intel-gfx] [PATCH 27/28] drm/i915/gt: Resubmit the virtual engine on schedule-out Chris Wilson
2020-11-17 11:31 ` [Intel-gfx] [PATCH 28/28] drm/i915/gt: Simplify virtual engine handling for execlists_hold() Chris Wilson
2020-11-17 18:54 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/28] drm/i915/selftests: Improve granularity for mocs reset checks Patchwork
2020-11-17 18:56 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-11-17 19:24 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-11-17 22:56 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
