* [PATCH] drm/i915/guc: Add synchronization on interrupt enable flag
@ 2025-08-19 16:00 Zhanjun Dong
2025-08-20 10:17 ` Andi Shyti
0 siblings, 1 reply; 5+ messages in thread
From: Zhanjun Dong @ 2025-08-19 16:00 UTC (permalink / raw)
To: intel-gfx, dri-devel; +Cc: Zhanjun Dong, Daniele Ceraolo Spurio
Boolean flag access from interrupt context might have synchronous issue on
multiple processor platform, flage modified by one core might be read as an
old value by another core. This issue on interrupt enable flag might causes
interrupt missing or leakage.
Fixed by change the data type as automic to add memory synchronization.
Fixes: a187f13d51fa0 ("drm/i915/guc: handle interrupts from media GuC")
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
drivers/gpu/drm/i915/gt/intel_gt_irq.c | 2 +-
drivers/gpu/drm/i915/gt/uc/intel_guc.c | 8 ++++----
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 4 ++--
drivers/gpu/drm/i915/gt/uc/intel_uc.c | 4 ++--
4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
index 75e802e10be2..21804eec8320 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
@@ -20,7 +20,7 @@
static void guc_irq_handler(struct intel_guc *guc, u16 iir)
{
- if (unlikely(!guc->interrupts.enabled))
+ if (unlikely(!atomic_read(&guc->interrupts.enabled)))
return;
if (iir & GUC_INTR_GUC2HOST)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index f360f020d8f1..398651b1ba60 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -101,7 +101,7 @@ static void gen9_enable_guc_interrupts(struct intel_guc *guc)
gen6_gt_pm_enable_irq(gt, gt->pm_guc_events);
spin_unlock_irq(gt->irq_lock);
- guc->interrupts.enabled = true;
+ atomic_set(&guc->interrupts.enabled, true);
}
static void gen9_disable_guc_interrupts(struct intel_guc *guc)
@@ -109,7 +109,7 @@ static void gen9_disable_guc_interrupts(struct intel_guc *guc)
struct intel_gt *gt = guc_to_gt(guc);
assert_rpm_wakelock_held(>->i915->runtime_pm);
- guc->interrupts.enabled = false;
+ atomic_set(&guc->interrupts.enabled, false);
spin_lock_irq(gt->irq_lock);
@@ -146,14 +146,14 @@ static void gen11_enable_guc_interrupts(struct intel_guc *guc)
__gen11_reset_guc_interrupts(gt);
spin_unlock_irq(gt->irq_lock);
- guc->interrupts.enabled = true;
+ atomic_set(&guc->interrupts.enabled, true);
}
static void gen11_disable_guc_interrupts(struct intel_guc *guc)
{
struct intel_gt *gt = guc_to_gt(guc);
- guc->interrupts.enabled = false;
+ atomic_set(&guc->interrupts.enabled, false);
intel_synchronize_irq(gt->i915);
gen11_reset_guc_interrupts(guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 053780f562c1..40242bbb166e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -93,7 +93,7 @@ struct intel_guc {
/** @interrupts: pointers to GuC interrupt-managing functions. */
struct {
- bool enabled;
+ atomic_t enabled;
void (*reset)(struct intel_guc *guc);
void (*enable)(struct intel_guc *guc);
void (*disable)(struct intel_guc *guc);
@@ -393,7 +393,7 @@ static inline int intel_guc_send_busy_loop(struct intel_guc *guc,
/* Only call this from the interrupt handler code */
static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
{
- if (guc->interrupts.enabled)
+ if (atomic_read(&guc->interrupts.enabled))
intel_guc_ct_event_handler(&guc->ct);
}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 4a3493e8d433..964917387c08 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -659,7 +659,7 @@ void intel_uc_runtime_suspend(struct intel_uc *uc)
struct intel_guc *guc = &uc->guc;
if (!intel_guc_is_ready(guc)) {
- guc->interrupts.enabled = false;
+ atomic_set(&guc->interrupts.enabled, false);
return;
}
@@ -687,7 +687,7 @@ void intel_uc_suspend(struct intel_uc *uc)
wake_up_all_tlb_invalidate(guc);
if (!intel_guc_is_ready(guc)) {
- guc->interrupts.enabled = false;
+ atomic_set(&guc->interrupts.enabled, false);
return;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] drm/i915/guc: Add synchronization on interrupt enable flag
2025-08-19 16:00 [PATCH] drm/i915/guc: Add synchronization on interrupt enable flag Zhanjun Dong
@ 2025-08-20 10:17 ` Andi Shyti
2025-08-20 18:00 ` Dong, Zhanjun
0 siblings, 1 reply; 5+ messages in thread
From: Andi Shyti @ 2025-08-20 10:17 UTC (permalink / raw)
To: Zhanjun Dong; +Cc: intel-gfx, dri-devel, Daniele Ceraolo Spurio
Hi Zhanjun,
On Tue, Aug 19, 2025 at 12:00:10PM -0400, Zhanjun Dong wrote:
> Boolean flag access from interrupt context might have synchronous issue on
/synchronous/synchronization/
/issue/issues/
> multiple processor platform, flage modified by one core might be read as an
/flage/flags/
> old value by another core. This issue on interrupt enable flag might causes
> interrupt missing or leakage.
/missing/misses/
> Fixed by change the data type as automic to add memory synchronization.
This can be re-written: "Change the interrupts.enable type to
atomic to ensure memory synchronization."
>
> Fixes: a187f13d51fa0 ("drm/i915/guc: handle interrupts from media GuC")
What issue are you exactly trying to fix? And have you tested
that this patch is this really fixing it? Where is the flag's
misalignment happening?
Please remember that when in interrupt context you are already
running in atomic, so that probably you don't need to have an
additional atomic access to variables.
Andi
> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] drm/i915/guc: Add synchronization on interrupt enable flag
2025-08-20 10:17 ` Andi Shyti
@ 2025-08-20 18:00 ` Dong, Zhanjun
2025-08-20 22:05 ` Andi Shyti
0 siblings, 1 reply; 5+ messages in thread
From: Dong, Zhanjun @ 2025-08-20 18:00 UTC (permalink / raw)
To: Andi Shyti; +Cc: intel-gfx, dri-devel, Daniele Ceraolo Spurio
See my comments inline below.
Regards,
Zhanjun Dong
On 2025-08-20 6:17 a.m., Andi Shyti wrote:
> Hi Zhanjun,
>
> On Tue, Aug 19, 2025 at 12:00:10PM -0400, Zhanjun Dong wrote:
>> Boolean flag access from interrupt context might have synchronous issue on
>
> /synchronous/synchronization/
> /issue/issues/
>
>> multiple processor platform, flage modified by one core might be read as an
>
> /flage/flags/
>
>> old value by another core. This issue on interrupt enable flag might causes
>> interrupt missing or leakage.
>
> /missing/misses/
>
>> Fixed by change the data type as automic to add memory synchronization.
>
> This can be re-written: "Change the interrupts.enable type to
> atomic to ensure memory synchronization."
Will follow all above comments
>
>>
>> Fixes: a187f13d51fa0 ("drm/i915/guc: handle interrupts from media GuC")
>
> What issue are you exactly trying to fix? And have you tested
Will add:
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14834
> that this patch is this really fixing it? Where is the flag's
CI report this issue about every 1 or 2 months and close it because no
reproduce.
I can reproduce it in about 1000 rounds(about 9 hours) of IGT test but
not at 100% rate, so it is hard to say really fixed because of hard to
reproduce.
My latest test runs 2200 rounds in total has no reproduce.
> misalignment happening?
>
> Please remember that when in interrupt context you are already
> running in atomic, so that probably you don't need to have an
> additional atomic access to variables.
Interrupt context only read it. When the flag was changed and interrupt
was triggered in very short time, the flag read at interrupt context
might read out old value, if other core(non interrupt context) set the
flag and ISR read out 0, ISR will do simple return and misses interrupt
handling, making it appear as lost interrupt; if other core clear the
flag and ISR read out 1, ISR will continue to run and not stop as
expected, making it appear as an interrupt leak. >
> Andi
>
>> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] drm/i915/guc: Add synchronization on interrupt enable flag
2025-08-20 18:00 ` Dong, Zhanjun
@ 2025-08-20 22:05 ` Andi Shyti
2025-08-21 14:39 ` Dong, Zhanjun
0 siblings, 1 reply; 5+ messages in thread
From: Andi Shyti @ 2025-08-20 22:05 UTC (permalink / raw)
To: Dong, Zhanjun; +Cc: intel-gfx, dri-devel, Daniele Ceraolo Spurio
Hi Zhanjun,
...
> > On Tue, Aug 19, 2025 at 12:00:10PM -0400, Zhanjun Dong wrote:
> > > Boolean flag access from interrupt context might have synchronous issue on
> >
> > /synchronous/synchronization/
> > /issue/issues/
> >
> > > multiple processor platform, flage modified by one core might be read as an
> >
> > /flage/flags/
> >
> > > old value by another core. This issue on interrupt enable flag might causes
> > > interrupt missing or leakage.
> >
> > /missing/misses/
> >
> > > Fixed by change the data type as automic to add memory synchronization.
> >
> > This can be re-written: "Change the interrupts.enable type to
> > atomic to ensure memory synchronization."
> Will follow all above comments
No need to resend for this if the patch is fine.
> > > Fixes: a187f13d51fa0 ("drm/i915/guc: handle interrupts from media GuC")
> >
> > What issue are you exactly trying to fix? And have you tested
> Will add:
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14834
>
> > that this patch is this really fixing it? Where is the flag's
> CI report this issue about every 1 or 2 months and close it because no
> reproduce.
> I can reproduce it in about 1000 rounds(about 9 hours) of IGT test but not
> at 100% rate, so it is hard to say really fixed because of hard to
> reproduce.
> My latest test runs 2200 rounds in total has no reproduce.
>
> > misalignment happening?
> >
> > Please remember that when in interrupt context you are already
> > running in atomic, so that probably you don't need to have an
> > additional atomic access to variables.
> Interrupt context only read it. When the flag was changed and interrupt was
> triggered in very short time, the flag read at interrupt context might read
> out old value, if other core(non interrupt context) set the flag and ISR
> read out 0, ISR will do simple return and misses interrupt handling, making
> it appear as lost interrupt; if other core clear the flag and ISR read out
> 1, ISR will continue to run and not stop as expected, making it appear as an
> interrupt leak. >
Heh... it looks to me very unlikely to happen, but if you say
that this fixes it, then I'm OK with the patch.
I see still one case missing: the error comes here:
ct_try_receive_message+0x336/0xa10
that is the tasklet that starts after the IRQ. Has the flag
changed from the irq to the tasklet? :-)
Would it make sense to add something like:
@@ -1338,6 +1338,9 @@ static void ct_receive_tasklet_func(struct tasklet_struct *t)
{
struct intel_guc_ct *ct = from_tasklet(ct, t, receive_tasklet);
+ if (!atomic_read(&guc->interrupts.enabled))
+ return;
+
ct_try_receive_message(ct);
}
BTW, have you tried adding a simple memory barrier (even though
in the i915 community people dislike memory barriers, but with a
proper explanation it might save us all this churn).
Thanks,
Andi
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] drm/i915/guc: Add synchronization on interrupt enable flag
2025-08-20 22:05 ` Andi Shyti
@ 2025-08-21 14:39 ` Dong, Zhanjun
0 siblings, 0 replies; 5+ messages in thread
From: Dong, Zhanjun @ 2025-08-21 14:39 UTC (permalink / raw)
To: Andi Shyti; +Cc: intel-gfx, dri-devel, Daniele Ceraolo Spurio
On 2025-08-20 6:05 p.m., Andi Shyti wrote:
> Hi Zhanjun,
>
> ...
>
>>> On Tue, Aug 19, 2025 at 12:00:10PM -0400, Zhanjun Dong wrote:
>>>> Boolean flag access from interrupt context might have synchronous issue on
>>>
>>> /synchronous/synchronization/
>>> /issue/issues/
>>>
>>>> multiple processor platform, flage modified by one core might be read as an
>>>
>>> /flage/flags/
>>>
>>>> old value by another core. This issue on interrupt enable flag might causes
>>>> interrupt missing or leakage.
>>>
>>> /missing/misses/
>>>
>>>> Fixed by change the data type as automic to add memory synchronization.
>>>
>>> This can be re-written: "Change the interrupts.enable type to
>>> atomic to ensure memory synchronization."
>> Will follow all above comments
>
> No need to resend for this if the patch is fine.
>
>>>> Fixes: a187f13d51fa0 ("drm/i915/guc: handle interrupts from media GuC")
>>>
>>> What issue are you exactly trying to fix? And have you tested
>> Will add:
>> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14834
>>
>>> that this patch is this really fixing it? Where is the flag's
>> CI report this issue about every 1 or 2 months and close it because no
>> reproduce.
>> I can reproduce it in about 1000 rounds(about 9 hours) of IGT test but not
>> at 100% rate, so it is hard to say really fixed because of hard to
>> reproduce.
>> My latest test runs 2200 rounds in total has no reproduce.
>>
>>> misalignment happening?
>>>
>>> Please remember that when in interrupt context you are already
>>> running in atomic, so that probably you don't need to have an
>>> additional atomic access to variables.
>> Interrupt context only read it. When the flag was changed and interrupt was
>> triggered in very short time, the flag read at interrupt context might read
>> out old value, if other core(non interrupt context) set the flag and ISR
>> read out 0, ISR will do simple return and misses interrupt handling, making
>> it appear as lost interrupt; if other core clear the flag and ISR read out
>> 1, ISR will continue to run and not stop as expected, making it appear as an
>> interrupt leak. >
>
> Heh... it looks to me very unlikely to happen, but if you say
> that this fixes it, then I'm OK with the patch.
>
> I see still one case missing: the error comes here:
>
> ct_try_receive_message+0x336/0xa10
>
> that is the tasklet that starts after the IRQ. Has the flag
> changed from the irq to the tasklet? :-)
>
> Would it make sense to add something like:
>
> @@ -1338,6 +1338,9 @@ static void ct_receive_tasklet_func(struct tasklet_struct *t)
> {
> struct intel_guc_ct *ct = from_tasklet(ct, t, receive_tasklet);
>
> + if (!atomic_read(&guc->interrupts.enabled))
> + return;
> +
> ct_try_receive_message(ct);
> }
Yes, I did test with this check, result looks the same as not add.
While, consider if the system is under heavy load, tasklet might get
effected as well, so I will add it in next rev.
>
> BTW, have you tried adding a simple memory barrier (even though
> in the i915 community people dislike memory barriers, but with a
> proper explanation it might save us all this churn).
Yes, will add it.
New rev to be posted after another 9 hours test.
Regards,
Zhanjun Dong.
>
> Thanks,
> Andi
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-08-21 14:39 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-19 16:00 [PATCH] drm/i915/guc: Add synchronization on interrupt enable flag Zhanjun Dong
2025-08-20 10:17 ` Andi Shyti
2025-08-20 18:00 ` Dong, Zhanjun
2025-08-20 22:05 ` Andi Shyti
2025-08-21 14:39 ` Dong, Zhanjun
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).