From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-xe@lists.freedesktop.org,
	Paulo Zanoni <paulo.zanoni@intel.com>,
	 Francois Dugast <francois.dugast@intel.com>
Subject: Re: [PATCH] drm/xe: Add missing runtime outer protection
Date: Thu, 30 May 2024 12:09:11 +0200
Message-ID: <9deabf92ea72cd8001fa03fde607901340285e2c.camel@linux.intel.com>
In-Reply-To: <ZlgR3gA9ye8MhPEw@DUT025-TGLU.fm.intel.com>

On Thu, 2024-05-30 at 05:42 +0000, Matthew Brost wrote:
> On Wed, May 29, 2024 at 05:56:03PM -0400, Rodrigo Vivi wrote:
> > TTM BO destroy is called unlocked as a ref count worker when it
> > gets 0 users. When that happens we could be runtime suspended,
> > and waking up from inner locked places like ggtt_remove_node,
> > could potentially lead to deadlocks. Our warning system against
> > this case hit this case:
> > 
> > [ 2295.891269] xe 0000:03:00.0: Missing outer runtime PM protection
> > [snip]
> > [ 2295.891604]  ? xe_pm_runtime_get_noresume+0x5c/0x70 [xe]
> > [ 2295.891717]  ? report_bug+0x18d/0x1c0
> > [ 2295.891722]  ? handle_bug+0x3c/0x80
> > [ 2295.891724]  ? exc_invalid_op+0x13/0x60
> > [ 2295.891726]  ? asm_exc_invalid_op+0x16/0x20
> > [ 2295.891730]  ? xe_pm_runtime_get_noresume+0x5c/0x70 [xe]
> > [ 2295.891816]  xe_ggtt_remove_node+0x93/0xf0 [xe]
> > [ 2295.891870]  xe_ttm_bo_destroy+0xe9/0xf0 [xe]
> > [ 2295.891935]  process_one_work+0x225/0x730
> > [ 2295.891940]  worker_thread+0x1d8/0x3c0
> > 
> > Add this outer protection to avoid any potential deadlock.
> > 
> > Reported-by: Paulo Zanoni <paulo.zanoni@intel.com>
> > Cc: Francois Dugast <francois.dugast@intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_bo.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c
> > b/drivers/gpu/drm/xe/xe_bo.c
> > index 2bae01ce4e5b..a902f23bec0c 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -1066,6 +1066,8 @@ static void xe_ttm_bo_destroy(struct
> > ttm_buffer_object *ttm_bo)
> >  	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
> >  	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
> >  
> > +	xe_pm_runtime_get(xe);
> 
> Should we only do this if we are in a kthread? i.e. !current->mm

First, I think the xe_ggtt_remove_node() should be moved to
delete_mem_notify(), because all backing memory is released at that
point.

But I guess this needs careful consideration.
If we only do this from a kthread, then every caller of
xe_bo_put() needs to hold a runtime pm ref. OTOH, putting a bo from
reclaim context would cause a lockdep splat if it called
runtime_pm_get synchronously, wouldn't it?

I figure we need something like

if (xe_pm_runtime_get_if_active() || (current->flags & PF_WQ_WORKER))
	xe_ggtt_remove_node()
else
	remove_ggtt_node_from_worker()


Unless we can queue ggtt manipulations while the device is inactive and
only execute them at wakeup.
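
To make the queueing idea concrete, here is a minimal userspace sketch in
C. It is not xe code: struct fake_dev, do_remove(), remove_or_defer() and
resume() are all hypothetical names made up for illustration, and a real
implementation would need locking and an unbounded list rather than a
fixed array. It only shows the shape: remove immediately when the device
is active, otherwise record the node and replay the removals on wakeup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical stand-in for the device; not the real struct xe_device. */
struct fake_dev {
	bool active;              /* true while "runtime resumed" */
	int pending[MAX_PENDING]; /* node ids whose removal is deferred */
	size_t npending;
	int removed;              /* number of removals actually executed */
};

/* Stand-in for the removal that touches hardware. */
static void do_remove(struct fake_dev *dev, int node)
{
	assert(dev->active); /* never touch hardware while suspended */
	(void)node;
	dev->removed++;
}

/* Remove now if active, otherwise queue; returns false if the queue is full. */
static bool remove_or_defer(struct fake_dev *dev, int node)
{
	if (dev->active) {
		do_remove(dev, node);
		return true;
	}
	if (dev->npending == MAX_PENDING)
		return false;
	dev->pending[dev->npending++] = node;
	return true;
}

/* Wakeup path: mark the device active, then flush what was queued. */
static void resume(struct fake_dev *dev)
{
	dev->active = true;
	for (size_t i = 0; i < dev->npending; i++)
		do_remove(dev, dev->pending[i]);
	dev->npending = 0;
}
```

With this shape the destroy path never has to wake the device itself, so
it does not matter which context the last put comes from. The cost is
that the deferred state has to stay alive until the next resume.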

/Thomas

> 
> Matt
> 
> > +
> >  	if (bo->ttm.base.import_attach)
> >  		drm_prime_gem_destroy(&bo->ttm.base, NULL);
> >  	drm_gem_object_release(&bo->ttm.base);
> > @@ -1089,6 +1091,8 @@ static void xe_ttm_bo_destroy(struct
> > ttm_buffer_object *ttm_bo)
> >  	mutex_unlock(&xe->mem_access.vram_userfault.lock);
> >  
> >  	kfree(bo);
> > +
> > +	xe_pm_runtime_put(xe);
> >  }
> >  
> >  static void xe_gem_object_free(struct drm_gem_object *obj)
> > -- 
> > 2.45.1
> > 

