All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: intel-xe@lists.freedesktop.org, jose.souza@intel.com,
	"Matthew Auld" <matthew.auld@intel.com>,
	"Paulo Zanoni" <paulo.r.zanoni@intel.com>,
	"Francois Dugast" <francois.dugast@intel.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Matthew Brost" <matthew.brost@intel.com>
Subject: Re: [PATCH 12/12] drm/xe: Fix missing runtime outer protection for ggtt_remove_node
Date: Sat, 17 Aug 2024 06:14:18 -0400	[thread overview]
Message-ID: <ZsB3-p0Dho-f8jcl@intel.com> (raw)
In-Reply-To: <mjynpl7ebpbmpzqncu2meyhwb7hfqpwxmu33pizqrfyluslxri@nvvhez7ybc45>

On Fri, Aug 16, 2024 at 11:08:26AM -0500, Lucas De Marchi wrote:
> On Fri, Aug 16, 2024 at 11:02:43AM GMT, Rodrigo Vivi wrote:
> > Defer the ggtt node removal to a thread if runtime_pm is not active.
> > 
> > The ggtt node removal can be called from multiple places, including
> > places where we cannot protect with outer callers and places we are
> > within other locks. So, try to grab the runtime reference if the
> > device is already active, otherwise defer the removal to a separate
> > thread from where we are sure we can wake the device up.
> > 
> > v2: - use xe wq instead of system wq (Matt and CI)
> >    - Avoid GFP_KERNEL to be future proof since this removal can
> >    be called from outside our drivers and we don't want to block
> >    if atomic is needed. (Brost)
> > v3: amend forgot chunk declaring xe_device.
> > v4: Use a xe_ggtt_region to encapsulate the node and remova info,
> >    wihtout the need for any memory allocation at runtime.
> > v5: Actually fill the delayed_removal.invalidate (Brost)
> > v6: - Ensure that ggtt_region is not freed before work finishes (Auld)
> >    - Own wq to ensures that the queued works are flushed before
> >      ggtt_fini (Brost)
> > v7: also free ggtt_region on early !bound return (Auld)
> > v8: Address the null deref (CI)
> > v9: Based on the new xe_ggtt_node for the proper care of the lifetime
> >    of the object.
> > v10: Redo the lost v5 change. (Brost)
> > 
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> > Cc: Francois Dugast <francois.dugast@intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_ggtt.c       | 107 ++++++++++++++++++-----------
> > drivers/gpu/drm/xe/xe_ggtt_types.h |  12 ++++
> > 2 files changed, 79 insertions(+), 40 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > index 5c04c1bc8417..110acf828974 100644
> > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > @@ -161,6 +161,7 @@ static void ggtt_fini_early(struct drm_device *drm, void *arg)
> > {
> > 	struct xe_ggtt *ggtt = arg;
> > 
> > +	destroy_workqueue(ggtt->wq);
> 
> better to follow the inverse order of init_early, but doesn't matter
> much in this case.

hmm...
maybe it does matter:
https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-137398v1/bat-adlp-vf/igt@core_hotunplug@unbind-rebind.html

but maybe this was only a missed case of xe_ggtt_node_fini that I just
fixed on VF case...

But I didn't understand why you believe this doesn't follow the init_early
order? I intended to flush the wq and get all the nodes removed before
we destroy the mutex and take mm down...

What am I missing?

Btw, thanks for all the comments. Addressed almost all of them with
the exception of s/invalidate/flag I believe...

> 
> > 	mutex_destroy(&ggtt->lock);
> > 	drm_mm_takedown(&ggtt->mm);
> > }
> > @@ -242,6 +243,8 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
> > 	else
> > 		ggtt->pt_ops = &xelp_pt_ops;
> > 
> > +	ggtt->wq = alloc_workqueue("xe-ggtt-wq", 0, 0);
> > +
> > 	drm_mm_init(&ggtt->mm, xe_wopcm_size(xe),
> > 		    ggtt->size - xe_wopcm_size(xe));
> > 	mutex_init(&ggtt->lock);
> > @@ -276,6 +279,68 @@ static void xe_ggtt_initial_clear(struct xe_ggtt *ggtt)
> > 	mutex_unlock(&ggtt->lock);
> > }
> > 
> > +static void ggtt_node_remove(struct xe_ggtt *ggtt, struct xe_ggtt_node *node,
> > +			     bool invalidate)
> 
> you don't need the invalidate arg anymore. Just make sure it's always
> set in node.
> 
> > +{
> > +	struct xe_device *xe = tile_to_xe(ggtt->tile);
> > +	bool bound;
> > +	int idx;
> > +
> > +	if (!node || !node->ggtt)
> > +		return;
> > +
> > +	bound = drm_dev_enter(&xe->drm, &idx);
> > +
> > +	mutex_lock(&ggtt->lock);
> > +	if (bound)
> > +		xe_ggtt_clear(ggtt, node->base.start, node->base.size);
> > +	drm_mm_remove_node(&node->base);
> > +	node->base.size = 0;
> > +	mutex_unlock(&ggtt->lock);
> > +
> > +	if (!bound)
> > +		goto free_node;
> > +
> > +	if (invalidate)
> > +		xe_ggtt_invalidate(ggtt);
> > +
> > +	drm_dev_exit(idx);
> > +
> > +free_node:
> > +	xe_ggtt_node_fini(node);
> > +}
> > +
> > +static void ggtt_node_remove_work_func(struct work_struct *work)
> > +{
> > +	struct xe_ggtt_node *node = container_of(work, typeof(*node),
> > +						 delayed_removal.work);
> > +	struct xe_device *xe = tile_to_xe(node->ggtt->tile);
> > +
> > +	xe_pm_runtime_get(xe);
> > +	ggtt_node_remove(node->ggtt, node, node->delayed_removal.invalidate);
> > +	xe_pm_runtime_put(xe);
> > +}
> > +
> > +/**
> > + * xe_ggtt_node_remove - Remove a &xe_ggtt_node from the GGTT
> > + * @ggtt: the &xe_ggtt where node will be removed
> > + * @node: the &xe_ggtt_node to be removed
> > + * @invalidate: if node needs invalidation upon removal
> > + */
> > +void xe_ggtt_node_remove(struct xe_ggtt *ggtt, struct xe_ggtt_node *node,
> > +			 bool invalidate)
> > +{
> > +	struct xe_device *xe = tile_to_xe(ggtt->tile);
> > +
> > +	if (xe_pm_runtime_get_if_active(xe)) {
> > +		ggtt_node_remove(ggtt, node, invalidate);
> > +		xe_pm_runtime_put(xe);
> > +	} else {
> > +		node->delayed_removal.invalidate = invalidate;
> > +		queue_work(ggtt->wq, &node->delayed_removal.work);
> > +	}
> > +}
> > +
> > /**
> >  * xe_ggtt_init - Regular non-early GGTT initialization
> >  * @ggtt: the &xe_ggtt to be initialized
> > @@ -482,7 +547,9 @@ struct xe_ggtt_node *xe_ggtt_node_init(struct xe_ggtt *ggtt)
> > 	if (!node)
> > 		return ERR_PTR(-ENOMEM);
> > 
> > +	INIT_WORK(&node->delayed_removal.work, ggtt_node_remove_work_func);
> > 	node->ggtt = ggtt;
> > +
> > 	return node;
> > }
> > 
> > @@ -499,46 +566,6 @@ void xe_ggtt_node_fini(struct xe_ggtt_node *node)
> > 	kfree(node);
> > }
> > 
> > -/**
> > - * xe_ggtt_node_remove - Remove a &xe_ggtt_node from the GGTT
> > - * @ggtt: the &xe_ggtt where node will be removed
> > - * @node: the &xe_ggtt_node to be removed
> > - * @invalidate: if node needs invalidation upon removal
> > - */
> > -void xe_ggtt_node_remove(struct xe_ggtt *ggtt, struct xe_ggtt_node *node,
> > -			 bool invalidate)
> > -{
> > -	struct xe_device *xe = tile_to_xe(ggtt->tile);
> > -	bool bound;
> > -	int idx;
> > -
> > -	if (!node || !node->ggtt)
> > -		return;
> > -
> > -	bound = drm_dev_enter(&xe->drm, &idx);
> > -	if (bound)
> > -		xe_pm_runtime_get_noresume(xe);
> > -
> > -	mutex_lock(&ggtt->lock);
> > -	if (bound)
> > -		xe_ggtt_clear(ggtt, node->base.start, node->base.size);
> > -	drm_mm_remove_node(&node->base);
> > -	node->base.size = 0;
> > -	mutex_unlock(&ggtt->lock);
> > -
> > -	if (!bound)
> > -		goto free_node;
> > -
> > -	if (invalidate)
> > -		xe_ggtt_invalidate(ggtt);
> > -
> > -	xe_pm_runtime_put(xe);
> > -	drm_dev_exit(idx);
> > -
> > -free_node:
> > -	xe_ggtt_node_fini(node);
> > -}
> > -
> > /**
> >  * xe_ggtt_node_allocated - Check if node is allocated in GGTT
> >  * @node: the &xe_ggtt_node to be inspected
> > diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
> > index 0e8822ae13fc..8b83610c6ee6 100644
> > --- a/drivers/gpu/drm/xe/xe_ggtt_types.h
> > +++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
> > @@ -47,6 +47,8 @@ struct xe_ggtt {
> > 	struct drm_mm mm;
> > 	/** @access_count: counts GGTT writes */
> > 	unsigned int access_count;
> > +	/** @wq: Dedicated unordered work queue to process node removals */
> > +	struct workqueue_struct *wq;
> > };
> > 
> > /**
> > @@ -61,6 +63,16 @@ struct xe_ggtt_node {
> > 	struct xe_ggtt *ggtt;
> > 	/** @base: A drm_mm_node */
> > 	struct drm_mm_node base;
> > +	/**
> > +	 * @delayed_removal: Information for removal through work thread when
> > +	 * device runtime_pm is suspended
> > +	 */
> > +	struct {
> > +		/** @delayed_removal.work: The work struct for the delayed removal */
> > +		struct work_struct work;
> > +		/** @delayed_removal.invalidate: If it needs invalidation upon removal */
> > +		bool invalidate;
> 
> as noted above, I'd move this outside and always use it.
> 
> node->invalidate_on_remove
> 
> or something like that.... should make it simpler IMO so you can ignore
> my comment about using a flags arg in a previous patch. Up to you if
> doing in this patch or as a follow up
> 
> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
> 
> Lucas De Marchi
> 
> > +	} delayed_removal;
> > };
> > 
> > /**
> > -- 
> > 2.46.0
> > 

  reply	other threads:[~2024-08-17 10:14 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-16 15:02 [PATCH 01/12] drm/xe: Removed unused xe_ggtt_printk Rodrigo Vivi
2024-08-16 15:02 ` [PATCH 02/12] drm/xe: Introduce GGTT documentation Rodrigo Vivi
2024-08-16 15:13   ` Lucas De Marchi
2024-08-16 15:02 ` [PATCH 03/12] drm/xe: Remove unnecessary drm_mm.h includes Rodrigo Vivi
2024-08-16 15:02 ` [PATCH 04/12] drm/{i915, xe}: Avoid direct inspection of dpt_vma from outside dpt Rodrigo Vivi
2024-08-16 15:02 ` [PATCH 05/12] drm/xe: Encapsulate drm_mm_node inside xe_ggtt_node Rodrigo Vivi
2024-08-16 15:02 ` [PATCH 06/12] drm/xe: Rename xe_ggtt_node related functions Rodrigo Vivi
2024-08-16 15:24   ` Lucas De Marchi
2024-08-16 15:02 ` [PATCH 07/12] drm/xe: Limit drm_mm_node_allocated access to xe_ggtt_node Rodrigo Vivi
2024-08-16 15:26   ` Lucas De Marchi
2024-08-16 15:02 ` [PATCH 08/12] drm/xe: Introduce xe_ggtt_largest_hole Rodrigo Vivi
2024-08-16 15:35   ` Lucas De Marchi
2024-08-16 15:02 ` [PATCH 09/12] drm/xe: Introduce xe_ggtt_print_holes Rodrigo Vivi
2024-08-16 15:02 ` [PATCH 10/12] drm/xe: Rename xe_ggtt balloon functions to make the node clear Rodrigo Vivi
2024-08-16 15:45   ` Lucas De Marchi
2024-08-16 15:02 ` [PATCH 11/12] drm/xe: Make xe_ggtt_node struct independent Rodrigo Vivi
2024-08-16 15:02 ` [PATCH 12/12] drm/xe: Fix missing runtime outer protection for ggtt_remove_node Rodrigo Vivi
2024-08-16 16:08   ` Lucas De Marchi
2024-08-17 10:14     ` Rodrigo Vivi [this message]
2024-08-16 15:07 ` [PATCH 01/12] drm/xe: Removed unused xe_ggtt_printk Lucas De Marchi
2024-08-16 15:09 ` ✓ CI.Patch_applied: success for series starting with [01/12] " Patchwork
2024-08-16 15:09 ` ✓ CI.checkpatch: " Patchwork
2024-08-16 15:12 ` ✓ CI.KUnit: " Patchwork
2024-08-16 15:27 ` ✓ CI.Build: " Patchwork
2024-08-16 15:30 ` ✓ CI.Hooks: " Patchwork
2024-08-16 15:31 ` ✗ CI.checksparse: warning " Patchwork
2024-08-16 16:29 ` ✗ CI.BAT: failure " Patchwork
2024-08-17  1:50 ` ✗ CI.FULL: " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2024-08-21 19:38 [PATCH 01/12] " Rodrigo Vivi
2024-08-21 19:38 ` [PATCH 12/12] drm/xe: Fix missing runtime outer protection for ggtt_remove_node Rodrigo Vivi
2024-08-20 20:25 [PATCH 01/12] drm/xe: Removed unused xe_ggtt_printk Rodrigo Vivi
2024-08-20 20:25 ` [PATCH 12/12] drm/xe: Fix missing runtime outer protection for ggtt_remove_node Rodrigo Vivi
2024-08-17 10:35 [PATCH 01/12] drm/xe: Removed unused xe_ggtt_printk Rodrigo Vivi
2024-08-17 10:35 ` [PATCH 12/12] drm/xe: Fix missing runtime outer protection for ggtt_remove_node Rodrigo Vivi
2024-08-15 22:07 [PATCH 01/12] drm/xe: Removed unused xe_ggtt_printk Rodrigo Vivi
2024-08-15 22:07 ` [PATCH 12/12] drm/xe: Fix missing runtime outer protection for ggtt_remove_node Rodrigo Vivi
2024-07-11 17:11 [PATCH 01/12] drm/xe: Removed unused xe_ggtt_printk Rodrigo Vivi
2024-07-11 17:11 ` [PATCH 12/12] drm/xe: Fix missing runtime outer protection for ggtt_remove_node Rodrigo Vivi
2024-07-16 17:32   ` Matthew Brost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZsB3-p0Dho-f8jcl@intel.com \
    --to=rodrigo.vivi@intel.com \
    --cc=francois.dugast@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jose.souza@intel.com \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.auld@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=paulo.r.zanoni@intel.com \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.