public inbox for linux-kernel@vger.kernel.org
* [PATCH v2] irqchip/sifive-plic: enable interrupt if needed before EOI
@ 2024-01-31  8:19 Nam Cao
  2024-02-13 10:26 ` Thomas Gleixner
  2024-02-19 14:08 ` [tip: irq/urgent] irqchip/sifive-plic: Enable " tip-bot2 for Nam Cao
  0 siblings, 2 replies; 5+ messages in thread
From: Nam Cao @ 2024-01-31  8:19 UTC
  To: Thomas Gleixner, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
	Marc Zyngier, Guo Ren, linux-kernel, linux-riscv
  Cc: Nam Cao, stable

RISC-V PLIC cannot "end-of-interrupt" (EOI) disabled interrupts, as
explained in the description of Interrupt Completion in the PLIC spec:

"The PLIC signals it has completed executing an interrupt handler by
writing the interrupt ID it received from the claim to the claim/complete
register. The PLIC does not check whether the completion ID is the same
as the last claim ID for that target. If the completion ID does not match
an interrupt source that *is currently enabled* for the target, the
completion is silently ignored."

Commit 69ea463021be ("irqchip/sifive-plic: Fixup EOI failed when masked")
ensured that EOI is successful by enabling the interrupt first, before EOI.

Commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask
operations") removed the interrupt enabling code from the previous
commit, because it assumed that the interrupt would already be enabled at
the point of EOI. However, this is incorrect: there is a window after a
hart claims an interrupt and before irq_desc->lock is acquired, and the
interrupt can be disabled during this window. Thus, EOI can be invoked
while the interrupt is disabled, which effectively nullifies the EOI. As a
result, the interrupt never gets asserted again, and the device which
uses this interrupt appears frozen.

Make sure that interrupt is really enabled before EOI.

Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations")
Cc: <stable@vger.kernel.org>
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
v2:
  - add unlikely() for optimization
  - re-word commit message to make it clearer

 drivers/irqchip/irq-sifive-plic.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index e1484905b7bd..0a233e9d9607 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -148,7 +148,13 @@ static void plic_irq_eoi(struct irq_data *d)
 {
 	struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
 
-	writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+	if (unlikely(irqd_irq_disabled(d))) {
+		plic_toggle(handler, d->hwirq, 1);
+		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+		plic_toggle(handler, d->hwirq, 0);
+	} else {
+		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+	}
 }
 
 #ifdef CONFIG_SMP
-- 
2.39.2
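
The race described in the commit message can be replayed in a small
user-space model. This is only a sketch under stated assumptions: the
struct plic_model type and every model_*/eoi_* name are hypothetical and
are not part of the driver, the "in_flight" flag stands in for the PLIC
gating a claimed source until a completion is accepted, and the patched
path tests the modelled enable bit, whereas the real patch tests
irqd_irq_disabled(d).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_IRQS 8u

/* Toy model of one PLIC context (hypothetical, not the driver's layout). */
struct plic_model {
	bool enabled[NUM_IRQS];   /* per-source enable bit for this context */
	bool in_flight[NUM_IRQS]; /* claimed, completion not yet accepted */
};

/* Plays the role of plic_toggle(): set or clear the enable bit. */
static void model_toggle(struct plic_model *p, uint32_t irq, bool enable)
{
	assert(irq < NUM_IRQS);
	p->enabled[irq] = enable;
}

/* Claim: the source stays gated until a completion is accepted. */
static void model_claim(struct plic_model *p, uint32_t irq)
{
	assert(irq < NUM_IRQS);
	p->in_flight[irq] = true;
}

/* Completion as in the quoted spec text: ignored if the source is disabled. */
static void model_complete(struct plic_model *p, uint32_t irq)
{
	assert(irq < NUM_IRQS);
	if (!p->enabled[irq])
		return; /* silently ignored */
	p->in_flight[irq] = false;
}

/* EOI before the patch: a bare completion write. */
static void eoi_unpatched(struct plic_model *p, uint32_t irq)
{
	model_complete(p, irq);
}

/* EOI after the patch: enable, complete, then disable again. */
static void eoi_patched(struct plic_model *p, uint32_t irq)
{
	if (!p->enabled[irq]) {
		model_toggle(p, irq, true);
		model_complete(p, irq);
		model_toggle(p, irq, false);
	} else {
		model_complete(p, irq);
	}
}

/* Replays claim -> disable -> EOI; returns true if the source stays stuck. */
static bool race_leaves_irq_stuck(void (*eoi)(struct plic_model *, uint32_t))
{
	struct plic_model p = { 0 };
	const uint32_t irq = 3;

	model_toggle(&p, irq, true);
	model_claim(&p, irq);
	model_toggle(&p, irq, false); /* disable lands inside the window */
	eoi(&p, irq);
	return p.in_flight[irq];
}
```

Calling race_leaves_irq_stuck(eoi_unpatched) leaves the modelled source
in flight, which corresponds to the frozen device, while
race_leaves_irq_stuck(eoi_patched) clears it.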


* Re: [PATCH v2] irqchip/sifive-plic: enable interrupt if needed before EOI
  2024-01-31  8:19 [PATCH v2] irqchip/sifive-plic: enable interrupt if needed before EOI Nam Cao
@ 2024-02-13 10:26 ` Thomas Gleixner
  2024-03-20 14:17   ` Palmer Dabbelt
  2024-02-19 14:08 ` [tip: irq/urgent] irqchip/sifive-plic: Enable " tip-bot2 for Nam Cao
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2024-02-13 10:26 UTC
  To: Nam Cao, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
	Marc Zyngier, Guo Ren, linux-kernel, linux-riscv
  Cc: Nam Cao, stable

Nam!

On Wed, Jan 31 2024 at 09:19, Nam Cao wrote:
> RISC-V PLIC cannot "end-of-interrupt" (EOI) disabled interrupts, as
> explained in the description of Interrupt Completion in the PLIC spec:
>
> "The PLIC signals it has completed executing an interrupt handler by
> writing the interrupt ID it received from the claim to the claim/complete
> register. The PLIC does not check whether the completion ID is the same
> as the last claim ID for that target. If the completion ID does not match
> an interrupt source that *is currently enabled* for the target, the
> completion is silently ignored."
>
> Commit 69ea463021be ("irqchip/sifive-plic: Fixup EOI failed when masked")
> ensured that EOI is successful by enabling the interrupt first, before EOI.
>
> Commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask
> operations") removed the interrupt enabling code from the previous
> commit, because it assumed that the interrupt would already be enabled at
> the point of EOI. However, this is incorrect: there is a window after a
> hart claims an interrupt and before irq_desc->lock is acquired, and the
> interrupt can be disabled during this window. Thus, EOI can be invoked
> while the interrupt is disabled, which effectively nullifies the EOI. As a
> result, the interrupt never gets asserted again, and the device which
> uses this interrupt appears frozen.

Nice detective work!

> Make sure that interrupt is really enabled before EOI.
>
> Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---
> v2:
>   - add unlikely() for optimization
>   - re-word commit message to make it clearer
>
>  drivers/irqchip/irq-sifive-plic.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
> index e1484905b7bd..0a233e9d9607 100644
> --- a/drivers/irqchip/irq-sifive-plic.c
> +++ b/drivers/irqchip/irq-sifive-plic.c
> @@ -148,7 +148,13 @@ static void plic_irq_eoi(struct irq_data *d)
>  {
>  	struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
>  
> -	writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
> +	if (unlikely(irqd_irq_disabled(d))) {
> +		plic_toggle(handler, d->hwirq, 1);
> +		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
> +		plic_toggle(handler, d->hwirq, 0);

It's unfortunate to have this condition in the hotpath, though it should
be cache hot, easy to predict and compared to the writel() completely in
the noise.

> +	} else {
> +		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
> +	}
>  }

Can the RISCV folks please have a look at this?

Thanks,

        tglx
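
For readers unfamiliar with the hint discussed above: with branch
profiling disabled, the kernel's unlikely() expands to
__builtin_expect(!!(x), 0), which tells GCC or Clang to lay out the code
so that the common case falls straight through. The sketch below uses a
local my_unlikely() macro and a hypothetical eoi_steps() helper that
only counts the high-level steps each branch of the patched
plic_irq_eoi() performs; it is an illustration, not driver code.

```c
#include <stdbool.h>

/* Same shape as the kernel's unlikely(); renamed because this is not the kernel header. */
#define my_unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical helper: counts the steps taken by each branch of the
 * patched EOI path (enable, complete, disable versus complete only).
 */
static int eoi_steps(bool irq_disabled)
{
	int steps = 0;

	if (my_unlikely(irq_disabled)) {
		steps += 1; /* plic_toggle(..., 1) */
		steps += 1; /* completion write */
		steps += 1; /* plic_toggle(..., 0) */
	} else {
		steps += 1; /* completion write only */
	}
	return steps;
}
```

The hint does not change behavior, only code layout, so the common
enabled case still costs a single completion write plus one
well-predicted branch.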

* [tip: irq/urgent] irqchip/sifive-plic: Enable interrupt if needed before EOI
  2024-01-31  8:19 [PATCH v2] irqchip/sifive-plic: enable interrupt if needed before EOI Nam Cao
  2024-02-13 10:26 ` Thomas Gleixner
@ 2024-02-19 14:08 ` tip-bot2 for Nam Cao
  1 sibling, 0 replies; 5+ messages in thread
From: tip-bot2 for Nam Cao @ 2024-02-19 14:08 UTC
  To: linux-tip-commits
  Cc: Nam Cao, Thomas Gleixner, Palmer Dabbelt, Paul Walmsley,
	Samuel Holland, Marc Zyngier, Guo Ren, linux-riscv, stable, x86,
	linux-kernel

The following commit has been merged into the irq/urgent branch of tip:

Commit-ID:     9c92006b896c767218aabe8947b62026a571cfd0
Gitweb:        https://git.kernel.org/tip/9c92006b896c767218aabe8947b62026a571cfd0
Author:        Nam Cao <namcao@linutronix.de>
AuthorDate:    Wed, 31 Jan 2024 09:19:33 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 19 Feb 2024 15:05:18 +01:00

irqchip/sifive-plic: Enable interrupt if needed before EOI

RISC-V PLIC cannot "end-of-interrupt" (EOI) disabled interrupts, as
explained in the description of Interrupt Completion in the PLIC spec:

"The PLIC signals it has completed executing an interrupt handler by
writing the interrupt ID it received from the claim to the claim/complete
register. The PLIC does not check whether the completion ID is the same
as the last claim ID for that target. If the completion ID does not match
an interrupt source that *is currently enabled* for the target, the
completion is silently ignored."

Commit 69ea463021be ("irqchip/sifive-plic: Fixup EOI failed when masked")
ensured that EOI is successful by enabling the interrupt first, before EOI.

Commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask
operations") removed the interrupt enabling code from the previous
commit, because it assumed that the interrupt would already be enabled at
the point of EOI.

However, this is incorrect: there is a window after a hart claims an
interrupt and before irq_desc->lock is acquired, and the interrupt can be
disabled during this window. Thus, EOI can be invoked while the interrupt
is disabled, which effectively nullifies the EOI. As a result, the
interrupt never gets asserted again, and the device which uses this
interrupt appears frozen.

Make sure that interrupt is really enabled before EOI.

Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Samuel Holland <samuel@sholland.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: linux-riscv@lists.infradead.org
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240131081933.144512-1-namcao@linutronix.de
---
 drivers/irqchip/irq-sifive-plic.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 5b7bc4f..bf0b40b 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -148,7 +148,13 @@ static void plic_irq_eoi(struct irq_data *d)
 {
 	struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
 
-	writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+	if (unlikely(irqd_irq_disabled(d))) {
+		plic_toggle(handler, d->hwirq, 1);
+		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+		plic_toggle(handler, d->hwirq, 0);
+	} else {
+		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+	}
 }
 
 #ifdef CONFIG_SMP

* Re: [PATCH v2] irqchip/sifive-plic: enable interrupt if needed before EOI
  2024-02-13 10:26 ` Thomas Gleixner
@ 2024-03-20 14:17   ` Palmer Dabbelt
  2024-03-20 15:12     ` Nam Cao
  0 siblings, 1 reply; 5+ messages in thread
From: Palmer Dabbelt @ 2024-03-20 14:17 UTC
  To: tglx
  Cc: namcao, Paul Walmsley, samuel, Marc Zyngier, guoren, linux-kernel,
	linux-riscv, namcao, stable

On Tue, 13 Feb 2024 02:26:40 PST (-0800), tglx@linutronix.de wrote:
> Nam!
>
> On Wed, Jan 31 2024 at 09:19, Nam Cao wrote:
>> RISC-V PLIC cannot "end-of-interrupt" (EOI) disabled interrupts, as
>> explained in the description of Interrupt Completion in the PLIC spec:
>>
>> "The PLIC signals it has completed executing an interrupt handler by
>> writing the interrupt ID it received from the claim to the claim/complete
>> register. The PLIC does not check whether the completion ID is the same
>> as the last claim ID for that target. If the completion ID does not match
>> an interrupt source that *is currently enabled* for the target, the
>> completion is silently ignored."
>>
>> Commit 69ea463021be ("irqchip/sifive-plic: Fixup EOI failed when masked")
>> ensured that EOI is successful by enabling the interrupt first, before EOI.
>>
>> Commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask
>> operations") removed the interrupt enabling code from the previous
>> commit, because it assumed that the interrupt would already be enabled at
>> the point of EOI. However, this is incorrect: there is a window after a
>> hart claims an interrupt and before irq_desc->lock is acquired, and the
>> interrupt can be disabled during this window. Thus, EOI can be invoked
>> while the interrupt is disabled, which effectively nullifies the EOI. As a
>> result, the interrupt never gets asserted again, and the device which
>> uses this interrupt appears frozen.
>
> Nice detective work!
>
>> Make sure that interrupt is really enabled before EOI.
>>
>> Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Nam Cao <namcao@linutronix.de>
>> ---
>> v2:
>>   - add unlikely() for optimization
>>   - re-word commit message to make it clearer
>>
>>  drivers/irqchip/irq-sifive-plic.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
>> index e1484905b7bd..0a233e9d9607 100644
>> --- a/drivers/irqchip/irq-sifive-plic.c
>> +++ b/drivers/irqchip/irq-sifive-plic.c
>> @@ -148,7 +148,13 @@ static void plic_irq_eoi(struct irq_data *d)
>>  {
>>  	struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
>>
>> -	writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
>> +	if (unlikely(irqd_irq_disabled(d))) {
>> +		plic_toggle(handler, d->hwirq, 1);
>> +		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
>> +		plic_toggle(handler, d->hwirq, 0);
>
> It's unfortunate to have this condition in the hotpath, though it should
> be cache hot, easy to predict and compared to the writel() completely in
> the noise.

Ya, I think it's fine.

I guess we could try and play some tricks.  Maybe hide the load latency 
with a relaxed writel and some explicit fencing, or claim interrupts when
enabling them.  Those both seem somewhat race-prone, though, so I'm not 
even sure if they're sane.

Anything with a PLIC is going to have pretty poor interrupt latency 
already, so I doubt it's worth the headache.

>> +	} else {
>> +		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
>> +	}
>>  }
>
> Can the RISCV folks please have a look at this?

Sorry I missed this.

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>

in case anyone was worried, though I saw it got merged so I think we're 
safe there.  I'm always a bit lost with the IRQ stuff, I didn't even 
know that race condition was possible.

Thanks for the fix!

>
> Thanks,
>
>         tglx

* Re: [PATCH v2] irqchip/sifive-plic: enable interrupt if needed before EOI
  2024-03-20 14:17   ` Palmer Dabbelt
@ 2024-03-20 15:12     ` Nam Cao
  0 siblings, 0 replies; 5+ messages in thread
From: Nam Cao @ 2024-03-20 15:12 UTC
  To: Palmer Dabbelt
  Cc: tglx, Paul Walmsley, samuel, Marc Zyngier, guoren, linux-kernel,
	linux-riscv, stable

On 20/Mar/2024 Palmer Dabbelt wrote:
> On Tue, 13 Feb 2024 02:26:40 PST (-0800), tglx@linutronix.de wrote:
> > Nam!
> >
> > On Wed, Jan 31 2024 at 09:19, Nam Cao wrote:  
> >> RISC-V PLIC cannot "end-of-interrupt" (EOI) disabled interrupts, as
> >> explained in the description of Interrupt Completion in the PLIC spec:
> >>
> >> "The PLIC signals it has completed executing an interrupt handler by
> >> writing the interrupt ID it received from the claim to the claim/complete
> >> register. The PLIC does not check whether the completion ID is the same
> >> as the last claim ID for that target. If the completion ID does not match
> >> an interrupt source that *is currently enabled* for the target, the
> >> completion is silently ignored."
> >>
> >> Commit 69ea463021be ("irqchip/sifive-plic: Fixup EOI failed when masked")
> >> ensured that EOI is successful by enabling the interrupt first, before EOI.
> >>
> >> Commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask
> >> operations") removed the interrupt enabling code from the previous
> >> commit, because it assumed that the interrupt would already be enabled at
> >> the point of EOI. However, this is incorrect: there is a window after a
> >> hart claims an interrupt and before irq_desc->lock is acquired, and the
> >> interrupt can be disabled during this window. Thus, EOI can be invoked
> >> while the interrupt is disabled, which effectively nullifies the EOI. As a
> >> result, the interrupt never gets asserted again, and the device which
> >> uses this interrupt appears frozen.
> >
> > Nice detective work!
> >  
> >> Make sure that interrupt is really enabled before EOI.
> >>
> >> Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations")
> >> Cc: <stable@vger.kernel.org>
> >> Signed-off-by: Nam Cao <namcao@linutronix.de>
> >> ---
> >> v2:
> >>   - add unlikely() for optimization
> >>   - re-word commit message to make it clearer
> >>
> >>  drivers/irqchip/irq-sifive-plic.c | 8 +++++++-
> >>  1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
> >> index e1484905b7bd..0a233e9d9607 100644
> >> --- a/drivers/irqchip/irq-sifive-plic.c
> >> +++ b/drivers/irqchip/irq-sifive-plic.c
> >> @@ -148,7 +148,13 @@ static void plic_irq_eoi(struct irq_data *d)
> >>  {
> >>  	struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
> >>
> >> -	writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
> >> +	if (unlikely(irqd_irq_disabled(d))) {
> >> +		plic_toggle(handler, d->hwirq, 1);
> >> +		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
> >> +		plic_toggle(handler, d->hwirq, 0);  
> >
> > It's unfortunate to have this condition in the hotpath, though it should
> > be cache hot, easy to predict and compared to the writel() completely in
> > the noise.  
> 
> Ya, I think it's fine.
> 
> I guess we could try and play some tricks.  Maybe hide the load latency 
> with a relaxed writel and some explicit fencing, or claim interrupts when
                                                      ^ you mean complete?
> enabling them.  Those both seem somewhat race-prone, though, so I'm not 
> even sure if they're sane.

The latter option is what I also have in mind. Just need to make sure the
interrupt is masked and we should be safe. Though there is the question of
whether it's worth the effort.

I may do that one day when I stop being lazy.

Best regards,
Nam

