From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine
Date: Wed, 18 Nov 2020 11:05:24 +0000
Message-ID: <86cb67d0-ac39-7f83-8ff8-eed3ef7c5943@linux.intel.com>
In-Reply-To: <20201117113103.21480-14-chris@chris-wilson.co.uk>
On 17/11/2020 11:30, Chris Wilson wrote:
> Since preempt-to-busy, we may unsubmit a request while it is still on
> the HW, where it may complete asynchronously. That means it may be retired
> and in the process destroy the virtual engine (as the user has closed
> their context), but that engine may still be holding onto the unsubmitted
> completed request. Therefore we need to potentially clean up the old
> request on destroying the virtual engine. We also have to keep the
> virtual_engine alive until after the siblings' execlists_dequeue() calls
> have finished peeking into the virtual engines, for which we serialise
> with RCU.
>
> v2: Be paranoid and flush the tasklet as well.
> v3: And flush the tasklet before the engines, as the tasklet may
> re-attach an rb_node after our removal from the siblings.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_lrc.c | 61 +++++++++++++++++++++++++----
> 1 file changed, 54 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 17cb7060eb29..c11433884cf6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -182,6 +182,7 @@
> struct virtual_engine {
> struct intel_engine_cs base;
> struct intel_context context;
> + struct rcu_work rcu;
>
> /*
> * We allow only a single request through the virtual engine at a time
> @@ -5470,44 +5471,90 @@ static struct list_head *virtual_queue(struct virtual_engine *ve)
> return &ve->base.execlists.default_priolist.requests[0];
> }
>
> -static void virtual_context_destroy(struct kref *kref)
> +static void rcu_virtual_context_destroy(struct work_struct *wrk)
> {
> struct virtual_engine *ve =
> - container_of(kref, typeof(*ve), context.ref);
> + container_of(wrk, typeof(*ve), rcu.work);
> unsigned int n;
>
> - GEM_BUG_ON(!list_empty(virtual_queue(ve)));
> - GEM_BUG_ON(ve->request);
> GEM_BUG_ON(ve->context.inflight);
>
> + /* Preempt-to-busy may leave a stale request behind. */
> + if (unlikely(ve->request)) {
> + struct i915_request *old;
> +
> + spin_lock_irq(&ve->base.active.lock);
> +
> + old = fetch_and_zero(&ve->request);
> + if (old) {
> + GEM_BUG_ON(!i915_request_completed(old));
> + __i915_request_submit(old);
> + i915_request_put(old);
> + }
> +
> + spin_unlock_irq(&ve->base.active.lock);
> + }
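A minimal userspace sketch of the take-once cleanup above, assuming a pthread mutex in place of ve->base.active.lock. toy_request, toy_engine, toy_fetch_and_zero() and toy_release_stale() are hypothetical stand-ins, not i915 API. The point it illustrates is that reading and clearing the pointer under the lock guarantees the stale request is submitted and put exactly once.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a completed i915_request. */
struct toy_request {
	int refcount;
	int completed;
	int submitted;
};

/* Hypothetical stand-in for the virtual engine's request slot and lock. */
struct toy_engine {
	pthread_mutex_t lock;		/* plays the role of ve->base.active.lock */
	struct toy_request *request;	/* plays the role of ve->request */
};

/* Userspace analogue of fetch_and_zero(): return the old value, leave NULL. */
static struct toy_request *toy_fetch_and_zero(struct toy_request **slot)
{
	struct toy_request *old = *slot;

	*slot = NULL;
	return old;
}

/* Release a stale request if one is present; returns 1 if one was released. */
static int toy_release_stale(struct toy_engine *e)
{
	struct toy_request *old;

	pthread_mutex_lock(&e->lock);
	old = toy_fetch_and_zero(&e->request);
	if (old) {
		assert(old->completed);	/* mirrors GEM_BUG_ON(!i915_request_completed(old)) */
		old->submitted = 1;	/* stands in for __i915_request_submit(old) */
		old->refcount--;	/* stands in for i915_request_put(old) */
	}
	pthread_mutex_unlock(&e->lock);

	return old != NULL;
}
```

Calling toy_release_stale() a second time finds the slot already empty and does nothing, which is what makes the cleanup safe to reach from more than one path.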
> +
> + /*
> + * Flush the tasklet in case it is still running on another core.
> + *
> + * This needs to be done before we remove ourselves from the siblings'
> + * rbtrees as in the case it is running in parallel, it may reinsert
> + * the rb_node into a sibling.
> + */
> + tasklet_kill(&ve->base.execlists.tasklet);
Can it still be running after an RCU grace period?
> +
> + /* Decouple ourselves from the siblings, no more access allowed. */
> for (n = 0; n < ve->num_siblings; n++) {
> struct intel_engine_cs *sibling = ve->siblings[n];
> struct rb_node *node = &ve->nodes[sibling->id].rb;
> - unsigned long flags;
>
> if (RB_EMPTY_NODE(node))
> continue;
>
> - spin_lock_irqsave(&sibling->active.lock, flags);
> + spin_lock_irq(&sibling->active.lock);
>
> /* Detachment is lazily performed in the execlists tasklet */
> if (!RB_EMPTY_NODE(node))
> rb_erase_cached(node, &sibling->execlists.virtual);
>
> - spin_unlock_irqrestore(&sibling->active.lock, flags);
> + spin_unlock_irq(&sibling->active.lock);
> }
> GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
> + GEM_BUG_ON(!list_empty(virtual_queue(ve)));
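The sibling loop above uses a check, lock, recheck pattern: an unlocked RB_EMPTY_NODE() test skips siblings we were never queued on, and the test is repeated under sibling->active.lock because the execlists tasklet detaches nodes lazily. A hedged userspace sketch, assuming a C11 atomic flag stands in for the rb_node's linked state and a plain counter stands in for the sibling's execlists.virtual tree; all toy_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one sibling engine. */
struct toy_sibling {
	pthread_mutex_t lock;	/* plays the role of sibling->active.lock */
	int nr_queued;		/* plays the role of the execlists.virtual rbtree */
};

/* Hypothetical model of ve->nodes[sibling->id].rb. */
struct toy_node {
	atomic_bool linked;	/* true means !RB_EMPTY_NODE(node) */
};

/* Unlink n from s; returns true only if this call did the removal. */
static bool toy_detach(struct toy_sibling *s, struct toy_node *n)
{
	bool removed = false;

	/* Cheap unlocked check: skip siblings we were never queued on. */
	if (!atomic_load_explicit(&n->linked, memory_order_relaxed))
		return false;

	pthread_mutex_lock(&s->lock);
	/* Recheck under the lock: a lazy detach may have raced with us. */
	if (atomic_load_explicit(&n->linked, memory_order_relaxed)) {
		atomic_store_explicit(&n->linked, false, memory_order_relaxed);
		s->nr_queued--;
		removed = true;
	}
	pthread_mutex_unlock(&s->lock);

	return removed;
}
```

The second check is what prevents a double erase when the tasklet has already detached the node between the unlocked test and taking the lock.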
>
> if (ve->context.state)
> __execlists_context_fini(&ve->context);
> intel_context_fini(&ve->context);
>
> intel_engine_free_request_pool(&ve->base);
> + intel_breadcrumbs_free(ve->base.breadcrumbs);
This looks like it belongs to some other patch.
Regards,
Tvrtko
>
> kfree(ve->bonds);
> kfree(ve);
> }
>
> +static void virtual_context_destroy(struct kref *kref)
> +{
> + struct virtual_engine *ve =
> + container_of(kref, typeof(*ve), context.ref);
> +
> + GEM_BUG_ON(!list_empty(&ve->context.signals));
> +
> + /*
> + * When destroying the virtual engine, we have to be aware that
> + * it may still be in use from a hardirq/softirq context causing
> + * the resubmission of a completed request (background completion
> + * due to preempt-to-busy). Before we can free the engine, we need
> + * to flush the submission code and tasklets that are still potentially
> + * accessing the engine. Flushing the tasklets requires process context,
> + * and since we can guard the resubmit onto the engine with an RCU read
> + * lock, we can delegate the free of the engine to an RCU worker.
> + */
> + INIT_RCU_WORK(&ve->rcu, rcu_virtual_context_destroy);
> + queue_rcu_work(system_wq, &ve->rcu);
> +}
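As a rough illustration of why the free is deferred rather than done inline, here is a toy, single-threaded model of a deferred destructor. It is not RCU: toy_poll() waits for every reader to leave, whereas a real RCU grace period only waits for readers that were already running when the work was queued. toy_domain, toy_queue_deferred(), toy_poll() and toy_vengine are hypothetical analogues of the RCU read lock, queue_rcu_work() and struct virtual_engine.

```c
#include <assert.h>
#include <stddef.h>

/* A deferred callback, loosely analogous to struct rcu_work. */
struct toy_deferred {
	struct toy_deferred *next;
	void (*fn)(struct toy_deferred *);
};

/* Tracks readers and callbacks waiting for them (toy model, not RCU). */
struct toy_domain {
	int readers;
	struct toy_deferred *pending;
};

static void toy_read_lock(struct toy_domain *d) { d->readers++; }
static void toy_read_unlock(struct toy_domain *d) { d->readers--; }

/* Analogue of queue_rcu_work(): record fn instead of running it now. */
static void toy_queue_deferred(struct toy_domain *d, struct toy_deferred *w,
			       void (*fn)(struct toy_deferred *))
{
	w->fn = fn;
	w->next = d->pending;
	d->pending = w;
}

/* Run pending callbacks only when no reader can still see the object. */
static int toy_poll(struct toy_domain *d)
{
	int ran = 0;

	if (d->readers)
		return 0;

	while (d->pending) {
		struct toy_deferred *w = d->pending;

		d->pending = w->next;
		w->fn(w);
		ran++;
	}
	return ran;
}

/* Hypothetical engine embedding its deferred work, like ve->rcu. */
struct toy_vengine {
	int freed;
	struct toy_deferred rcu;
};

static void toy_vengine_destroy(struct toy_deferred *w)
{
	struct toy_vengine *ve = (struct toy_vengine *)
		((char *)w - offsetof(struct toy_vengine, rcu));

	ve->freed = 1;	/* the real code would kfree() here */
}
```

A sibling that is still "peeking" holds the read side, so the destroy callback stays queued until that reader leaves.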
> +
> static void virtual_engine_initial_hint(struct virtual_engine *ve)
> {
> int swp;
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx