From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Matthew Auld <matthew.auld@intel.com>
Cc: intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH] drm/xe: Fix an invalid locking wait context bug
Date: Wed, 26 Jul 2023 13:24:56 -0400 [thread overview]
Message-ID: <ZMFW6AkUQCWNGdpl@intel.com> (raw)
In-Reply-To: <5041fe3d-2183-86fc-ce5a-db2537713eba@intel.com>
On Wed, Jul 26, 2023 at 12:38:53PM +0100, Matthew Auld wrote:
> On 25/07/2023 21:17, Rodrigo Vivi wrote:
> > We cannot have spin locks around xe_irq_reset, since it will
> > call the intel_display_power_is_enabled() function, and
> > that needs a mutex lock. Hence causing the undesired
> > "[ BUG: Invalid wait context ]"
> >
> > We cannot convert i915's power domain lock to spin lock
> > due to the nested dependency of non-atomic context waits.
> >
> > So, let's move the xe_irq_reset functions from the
> > critical area, while still ensuring that we are protecting
> > the irq.enabled and ensuring the right serialization
> > in the irq handlers.
> >
> > v2: On the first version, I had missed the fact that
> > irq.enabled is checked on the xe/display glue layer,
> > and that i915 display code is actually using the irq
> > spin lock properly. So, this got changed to a version
> > suggested by Matthew Auld, along with introducing
> > the lockdep_assert_held at the display glue code.
> >
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/463
> > Suggested-by: Matthew Auld <matthew.auld@intel.com>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> > drivers/gpu/drm/xe/display/ext/i915_irq.c | 1 +
> > drivers/gpu/drm/xe/xe_irq.c | 32 ++++++++++++++++++-----
> > 2 files changed, 27 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/display/ext/i915_irq.c b/drivers/gpu/drm/xe/display/ext/i915_irq.c
> > index fbb0e99143f6..ac435f7f854b 100644
> > --- a/drivers/gpu/drm/xe/display/ext/i915_irq.c
> > +++ b/drivers/gpu/drm/xe/display/ext/i915_irq.c
> > @@ -107,6 +107,7 @@ void intel_display_irq_uninstall(struct drm_i915_private *dev_priv)
> > bool intel_irqs_enabled(struct xe_device *xe)
> > {
> > + lockdep_assert_held(&xe->irq.lock);
> > return xe->irq.enabled;
> > }
> > diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c
> > index eae190cb0969..d6b001f6a3c5 100644
> > --- a/drivers/gpu/drm/xe/xe_irq.c
> > +++ b/drivers/gpu/drm/xe/xe_irq.c
> > @@ -309,6 +309,13 @@ static irqreturn_t xelp_irq_handler(int irq, void *arg)
> > unsigned long intr_dw[2];
> > u32 identity[32];
> > + spin_lock_irq(&xe->irq.lock);
> > + if (!xe->irq.enabled) {
> > + spin_unlock_irq(&xe->irq.lock);
> > + return IRQ_NONE;
> > + }
> > + spin_unlock_irq(&xe->irq.lock);
>
> I guess spin_lock() or spin_lock_irq_save() here and below, since this is
> inside hard irq?
I had considered saving and restoring the flags, but exactly because we are
inside it and we sync before disabling, we don't have the risk of blindly
re-enable with the spin_unlock_irq(), no? or what could I be missing on
that way?
But well, maybe the spin_lock() and spin_unlock() are the right one?
I'm doing some experiments here.... so far with the flags I'm getting
some strange warnings...
>
> > +
> > master_ctl = xelp_intr_disable(xe);
> > if (!master_ctl) {
> > xelp_intr_enable(xe, false);
> > @@ -371,6 +378,13 @@ static irqreturn_t dg1_irq_handler(int irq, void *arg)
> > /* TODO: This really shouldn't be copied+pasted */
> > + spin_lock_irq(&xe->irq.lock);
> > + if (!xe->irq.enabled) {
> > + spin_unlock_irq(&xe->irq.lock);
> > + return IRQ_NONE;
> > + }
> > + spin_unlock_irq(&xe->irq.lock);
> > +
> > master_tile_ctl = dg1_intr_disable(xe);
> > if (!master_tile_ctl) {
> > dg1_intr_enable(xe, false);
> > @@ -574,10 +588,14 @@ void xe_irq_shutdown(struct xe_device *xe)
> > void xe_irq_suspend(struct xe_device *xe)
> > {
> > + int irq = to_pci_dev(xe->drm.dev)->irq;
> > +
> > spin_lock_irq(&xe->irq.lock);
> > - xe->irq.enabled = false;
> > - xe_irq_reset(xe);
> > + xe->irq.enabled = false; /* no new irqs */
> > spin_unlock_irq(&xe->irq.lock);
> > +
> > + synchronize_irq(irq); /* flush irqs */
> > + xe_irq_reset(xe); /* turn irqs off */
> > }
> > void xe_irq_resume(struct xe_device *xe)
> > @@ -585,13 +603,15 @@ void xe_irq_resume(struct xe_device *xe)
> > struct xe_gt *gt;
> > int id;
> > - spin_lock_irq(&xe->irq.lock);
> > + /*
> > + * lock not needed:
> > + * 1. no irq will arrive before the postinstall
> > + * 2. display is not yet resumed
> > + */
> > xe->irq.enabled = true;
> > xe_irq_reset(xe);
> > - xe_irq_postinstall(xe);
> > + xe_irq_postinstall(xe); /* turn irqs on */
> > for_each_gt(gt, xe, id)
> > xe_irq_enable_hwe(gt);
> > -
> > - spin_unlock_irq(&xe->irq.lock);
> > }
next prev parent reply other threads:[~2023-07-26 17:25 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-19 19:27 [Intel-xe] [PATCH] drm/xe: Fix an invalid locking wait context bug Rodrigo Vivi
2023-07-19 21:07 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
2023-07-19 21:07 ` [Intel-xe] ✓ CI.checkpatch: " Patchwork
2023-07-19 21:08 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-07-19 21:12 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-07-19 21:12 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-07-19 21:14 ` [Intel-xe] ✓ CI.checksparse: " Patchwork
2023-07-19 21:14 ` [Intel-xe] [PATCH] " Matthew Brost
2023-07-20 9:11 ` Matthew Auld
2023-07-20 12:01 ` Rodrigo Vivi
2023-07-20 12:38 ` Matthew Auld
2023-07-20 15:42 ` Rodrigo Vivi
2023-07-21 11:17 ` Matthew Auld
2023-07-21 18:44 ` Rodrigo Vivi
2023-07-25 20:17 ` Rodrigo Vivi
2023-07-26 11:38 ` Matthew Auld
2023-07-26 17:24 ` Rodrigo Vivi [this message]
2023-07-26 21:30 ` Rodrigo Vivi
2023-07-27 16:08 ` Matthew Auld
2023-07-25 20:20 ` [Intel-xe] ✓ CI.Patch_applied: success for drm/xe: Fix an invalid locking wait context bug (rev2) Patchwork
2023-07-25 20:20 ` [Intel-xe] ✓ CI.checkpatch: " Patchwork
2023-07-25 20:21 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-07-25 20:25 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-07-25 20:25 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-07-25 20:27 ` [Intel-xe] ✓ CI.checksparse: " Patchwork
2023-07-25 20:45 ` [Intel-xe] ○ CI.BAT: info " Patchwork
2023-07-26 22:29 ` [Intel-xe] ✓ CI.Patch_applied: success for drm/xe: Fix an invalid locking wait context bug (rev3) Patchwork
2023-07-26 22:29 ` [Intel-xe] ✓ CI.checkpatch: " Patchwork
2023-07-26 22:30 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-07-26 22:34 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-07-26 22:34 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-07-26 22:35 ` [Intel-xe] ✓ CI.checksparse: " Patchwork
2023-07-26 23:10 ` [Intel-xe] ○ CI.BAT: info " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZMFW6AkUQCWNGdpl@intel.com \
--to=rodrigo.vivi@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.auld@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox