From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Matthew Auld <matthew.auld@intel.com>
Cc: intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH] drm/xe: Fix an invalid locking wait context bug
Date: Wed, 26 Jul 2023 13:24:56 -0400 [thread overview]
Message-ID: <ZMFW6AkUQCWNGdpl@intel.com> (raw)
In-Reply-To: <5041fe3d-2183-86fc-ce5a-db2537713eba@intel.com>
On Wed, Jul 26, 2023 at 12:38:53PM +0100, Matthew Auld wrote:
> On 25/07/2023 21:17, Rodrigo Vivi wrote:
> > We cannot have spin locks around xe_irq_reset, since it will
> > call the intel_display_power_is_enabled() function, and
> > that needs a mutex lock. Hence causing the undesired
> > "[ BUG: Invalid wait context ]"
> >
> > We cannot convert i915's power domain lock to spin lock
> > due to the nested dependency of non-atomic context waits.
> >
> > So, let's move the xe_irq_reset functions from the
> > critical area, while still ensuring that we are protecting
> > the irq.enabled and ensuring the right serialization
> > in the irq handlers.
> >
> > v2: On the first version, I had missed the fact that
> > irq.enabled is checked on the xe/display glue layer,
> > and that i915 display code is actually using the irq
> > spin lock properly. So, this got changed to a version
> > suggested by Matthew Auld, along with introducing
> > the lockdep_assert_held at the display glue code.
> >
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/463
> > Suggested-by: Matthew Auld <matthew.auld@intel.com>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> > drivers/gpu/drm/xe/display/ext/i915_irq.c | 1 +
> > drivers/gpu/drm/xe/xe_irq.c | 32 ++++++++++++++++++-----
> > 2 files changed, 27 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/display/ext/i915_irq.c b/drivers/gpu/drm/xe/display/ext/i915_irq.c
> > index fbb0e99143f6..ac435f7f854b 100644
> > --- a/drivers/gpu/drm/xe/display/ext/i915_irq.c
> > +++ b/drivers/gpu/drm/xe/display/ext/i915_irq.c
> > @@ -107,6 +107,7 @@ void intel_display_irq_uninstall(struct drm_i915_private *dev_priv)
> > bool intel_irqs_enabled(struct xe_device *xe)
> > {
> > + lockdep_assert_held(&xe->irq.lock);
> > return xe->irq.enabled;
> > }
> > diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c
> > index eae190cb0969..d6b001f6a3c5 100644
> > --- a/drivers/gpu/drm/xe/xe_irq.c
> > +++ b/drivers/gpu/drm/xe/xe_irq.c
> > @@ -309,6 +309,13 @@ static irqreturn_t xelp_irq_handler(int irq, void *arg)
> > unsigned long intr_dw[2];
> > u32 identity[32];
> > + spin_lock_irq(&xe->irq.lock);
> > + if (!xe->irq.enabled) {
> > + spin_unlock_irq(&xe->irq.lock);
> > + return IRQ_NONE;
> > + }
> > + spin_unlock_irq(&xe->irq.lock);
>
> I guess spin_lock() or spin_lock_irq_save() here and below, since this is
> inside hard irq?
I had considered saving and restoring the flags, but exactly because we are
inside it and we sync before disabling, we don't have the risk of blindly
re-enable with the spin_unlock_irq(), no? or what could I be missing on
that way?
But well, maybe the spin_lock() and spin_unlock() are the right one?
I'm doing some experiments here.... so far with the flags I'm getting
some strange warnings...
>
> > +
> > master_ctl = xelp_intr_disable(xe);
> > if (!master_ctl) {
> > xelp_intr_enable(xe, false);
> > @@ -371,6 +378,13 @@ static irqreturn_t dg1_irq_handler(int irq, void *arg)
> > /* TODO: This really shouldn't be copied+pasted */
> > + spin_lock_irq(&xe->irq.lock);
> > + if (!xe->irq.enabled) {
> > + spin_unlock_irq(&xe->irq.lock);
> > + return IRQ_NONE;
> > + }
> > + spin_unlock_irq(&xe->irq.lock);
> > +
> > master_tile_ctl = dg1_intr_disable(xe);
> > if (!master_tile_ctl) {
> > dg1_intr_enable(xe, false);
> > @@ -574,10 +588,14 @@ void xe_irq_shutdown(struct xe_device *xe)
> > void xe_irq_suspend(struct xe_device *xe)
> > {
> > + int irq = to_pci_dev(xe->drm.dev)->irq;
> > +
> > spin_lock_irq(&xe->irq.lock);
> > - xe->irq.enabled = false;
> > - xe_irq_reset(xe);
> > + xe->irq.enabled = false; /* no new irqs */
> > spin_unlock_irq(&xe->irq.lock);
> > +
> > + synchronize_irq(irq); /* flush irqs */
> > + xe_irq_reset(xe); /* turn irqs off */
> > }
> > void xe_irq_resume(struct xe_device *xe)
> > @@ -585,13 +603,15 @@ void xe_irq_resume(struct xe_device *xe)
> > struct xe_gt *gt;
> > int id;
> > - spin_lock_irq(&xe->irq.lock);
> > + /*
> > + * lock not needed:
> > + * 1. no irq will arrive before the postinstall
> > + * 2. display is not yet resumed
> > + */
> > xe->irq.enabled = true;
> > xe_irq_reset(xe);
> > - xe_irq_postinstall(xe);
> > + xe_irq_postinstall(xe); /* turn irqs on */
> > for_each_gt(gt, xe, id)
> > xe_irq_enable_hwe(gt);
> > -
> > - spin_unlock_irq(&xe->irq.lock);
> > }
next prev parent reply other threads:[~2023-07-26 17:25 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-19 19:27 [Intel-xe] [PATCH] drm/xe: Fix an invalid locking wait context bug Rodrigo Vivi
2023-07-19 21:07 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
2023-07-19 21:07 ` [Intel-xe] ✓ CI.checkpatch: " Patchwork
2023-07-19 21:08 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-07-19 21:12 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-07-19 21:12 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-07-19 21:14 ` [Intel-xe] ✓ CI.checksparse: " Patchwork
2023-07-19 21:14 ` [Intel-xe] [PATCH] " Matthew Brost
2023-07-20 9:11 ` Matthew Auld
2023-07-20 12:01 ` Rodrigo Vivi
2023-07-20 12:38 ` Matthew Auld
2023-07-20 15:42 ` Rodrigo Vivi
2023-07-21 11:17 ` Matthew Auld
2023-07-21 18:44 ` Rodrigo Vivi
2023-07-25 20:17 ` Rodrigo Vivi
2023-07-26 11:38 ` Matthew Auld
2023-07-26 17:24 ` Rodrigo Vivi [this message]
2023-07-26 21:30 ` Rodrigo Vivi
2023-07-27 16:08 ` Matthew Auld
2023-07-25 20:20 ` [Intel-xe] ✓ CI.Patch_applied: success for drm/xe: Fix an invalid locking wait context bug (rev2) Patchwork
2023-07-25 20:20 ` [Intel-xe] ✓ CI.checkpatch: " Patchwork
2023-07-25 20:21 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-07-25 20:25 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-07-25 20:25 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-07-25 20:27 ` [Intel-xe] ✓ CI.checksparse: " Patchwork
2023-07-25 20:45 ` [Intel-xe] ○ CI.BAT: info " Patchwork
2023-07-26 22:29 ` [Intel-xe] ✓ CI.Patch_applied: success for drm/xe: Fix an invalid locking wait context bug (rev3) Patchwork
2023-07-26 22:29 ` [Intel-xe] ✓ CI.checkpatch: " Patchwork
2023-07-26 22:30 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-07-26 22:34 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-07-26 22:34 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-07-26 22:35 ` [Intel-xe] ✓ CI.checksparse: " Patchwork
2023-07-26 23:10 ` [Intel-xe] ○ CI.BAT: info " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZMFW6AkUQCWNGdpl@intel.com \
--to=rodrigo.vivi@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.auld@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.