* [PATCH 1/2] genirq: Expand doc for PENDING and REPLAY flags
@ 2023-05-30 21:38 James Gowans
  2023-05-30 21:38 ` [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke James Gowans
  0 siblings, 1 reply; 6+ messages in thread
From: James Gowans @ 2023-05-30 21:38 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, James Gowans, Marc Zyngier, Liao Chang

Adding a bit more info about what the flags are used for may help future
code readers.

Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Liao Chang <liaochang1@huawei.com>
---
 kernel/irq/internals.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 5fdc0b557579..c443a0ddc07e 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -47,9 +47,12 @@ enum {
  *				  detection
  * IRQS_POLL_INPROGRESS		- polling in progress
  * IRQS_ONESHOT			- irq is not unmasked in primary handler
- * IRQS_REPLAY			- irq is replayed
+ * IRQS_REPLAY			- irq has been resent and will not be resent
+ *				  again until the handler has run and cleared
+ *				  this flag.
  * IRQS_WAITING			- irq is waiting
- * IRQS_PENDING			- irq is pending and replayed later
+ * IRQS_PENDING			- irq needs to be resent and should be resent
+ *				  at the next available opportunity.
  * IRQS_SUSPENDED		- irq is suspended
  * IRQS_NMI			- irq line is used to deliver NMIs
  * IRQS_SYSFS			- descriptor has been added to sysfs
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread
* [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke
  2023-05-30 21:38 [PATCH 1/2] genirq: Expand doc for PENDING and REPLAY flags James Gowans
@ 2023-05-30 21:38 ` James Gowans
  2023-05-30 23:32   ` Randy Dunlap
  2023-05-31  7:00   ` Marc Zyngier
  0 siblings, 2 replies; 6+ messages in thread
From: James Gowans @ 2023-05-30 21:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, James Gowans, Liao Chang, Marc Zyngier,
	KarimAllah Raslan, Yipeng Zou, Zhang Jianhua

Update the generic handle_fasteoi_irq to cater for the case when the
next interrupt comes in while the previous handler is still running.
Currently when that happens the irq_may_run() early out causes the next
IRQ to be lost. Change the behaviour to mark the interrupt as pending
and re-send the interrupt when handle_fasteoi_irq sees that the pending
flag has been set. This is largely inspired by handle_edge_irq.

Generally it should not be possible for the next interrupt to arrive
while the previous handler is still running: the next interrupt should
only arrive after the EOI message has been sent and the previous handler
has returned. However, there is a race where if the interrupt affinity
is changed while the previous handler is running, then the next
interrupt can arrive at a different CPU while the previous handler is
still running. In that case there will be a concurrent invoke and the
early out will be taken.

For example:

           CPU 0             |          CPU 1
-----------------------------|-----------------------------
interrupt start              |
  handle_fasteoi_irq         | set_affinity(CPU 1)
    handler                  |
    ...                      | interrupt start
    ...                      |   handle_fasteoi_irq -> early out
  handle_fasteoi_irq return  | interrupt end
interrupt end                |

This issue was observed specifically on an arm64 system with a GIC-v3
handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
that the global ITS is responsible for affinity but does not know
whether interrupts are pending/running, only the CPU-local redistributor
handles the EOI. Hence when the affinity is changed in the ITS, the new
CPU's redistributor does not know that the original CPU is still running
the handler.

Implementation notes:

It is believed that it's NOT necessary to mask the interrupt in
handle_fasteoi_irq() the way that handle_edge_irq() does. This is
because handle_edge_irq() caters for controllers which are too simple to
gate interrupts from the same source, so the kernel explicitly masks the
interrupt if it re-occurs [0].

[0] https://lore.kernel.org/all/bf94a380-fadd-8c38-cc51-4b54711d84b3@huawei.com/

Suggested-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: KarimAllah Raslan <karahmed@amazon.com>
Cc: Yipeng Zou <zouyipeng@huawei.com>
Cc: Zhang Jianhua <chris.zjh@huawei.com>
---
 kernel/irq/chip.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 49e7bc871fec..42f33e77c16b 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -692,8 +692,15 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 
 	raw_spin_lock(&desc->lock);
 
-	if (!irq_may_run(desc))
+	/*
+	 * When an affinity change races with IRQ delivery, the next interrupt
+	 * can arrive on the new CPU before the original CPU has completed
+	 * handling the previous one. Mark it as pending and return EOI.
+	 */
+	if (!irq_may_run(desc)) {
+		desc->istate |= IRQS_PENDING;
 		goto out;
+	}
 
 	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
 
@@ -715,6 +722,12 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 
 	cond_unmask_eoi_irq(desc, chip);
 
+	/*
+	 * When the race descibed above happens, this will resend the interrupt.
+	 */
+	if (unlikely(desc->istate & IRQS_PENDING))
+		check_irq_resend(desc, false);
+
 	raw_spin_unlock(&desc->lock);
 	return;
 out:
-- 
2.25.1
* Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke
  2023-05-30 21:38 ` [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke James Gowans
@ 2023-05-30 23:32   ` Randy Dunlap
  2023-05-31  7:00   ` Marc Zyngier
  1 sibling, 0 replies; 6+ messages in thread
From: Randy Dunlap @ 2023-05-30 23:32 UTC (permalink / raw)
  To: James Gowans, Thomas Gleixner
  Cc: linux-kernel, Liao Chang, Marc Zyngier, KarimAllah Raslan,
	Yipeng Zou, Zhang Jianhua

Hi--

On 5/30/23 14:38, James Gowans wrote:
> ---
>  kernel/irq/chip.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..42f33e77c16b 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,15 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> 
>  	raw_spin_lock(&desc->lock);
> 
> -	if (!irq_may_run(desc))
> +	/*
> +	 * When an affinity change races with IRQ delivery, the next interrupt
> +	 * can arrive on the new CPU before the original CPU has completed
> +	 * handling the previous one. Mark it as pending and return EOI.
> +	 */
> +	if (!irq_may_run(desc)) {
> +		desc->istate |= IRQS_PENDING;
>  		goto out;
> +	}
> 
>  	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
> 
> @@ -715,6 +722,12 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> 
>  	cond_unmask_eoi_irq(desc, chip);
> 
> +	/*
> +	 * When the race descibed above happens, this will resend the interrupt.

	described

> +	 */
> +	if (unlikely(desc->istate & IRQS_PENDING))
> +		check_irq_resend(desc, false);
> +
>  	raw_spin_unlock(&desc->lock);
>  	return;
>  out:

-- 
~Randy
* Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke
  2023-05-30 21:38 ` [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke James Gowans
  2023-05-30 23:32   ` Randy Dunlap
@ 2023-05-31  7:00   ` Marc Zyngier
  2023-06-01  7:24     ` Gowans, James
  1 sibling, 1 reply; 6+ messages in thread
From: Marc Zyngier @ 2023-05-31  7:00 UTC (permalink / raw)
  To: James Gowans
  Cc: Thomas Gleixner, linux-kernel, Liao Chang, KarimAllah Raslan,
	Yipeng Zou, Zhang Jianhua

On Tue, 30 May 2023 22:38:48 +0100,
James Gowans <jgowans@amazon.com> wrote:
> 
> Update the generic handle_fasteoi_irq to cater for the case when the
> next interrupt comes in while the previous handler is still running.
> Currently when that happens the irq_may_run() early out causes the next
> IRQ to be lost. Change the behaviour to mark the interrupt as pending
> and re-send the interrupt when handle_fasteoi_irq sees that the pending
> flag has been set. This is largely inspired by handle_edge_irq.
> 
> Generally it should not be possible for the next interrupt to arrive
> while the previous handler is still running: the next interrupt should
> only arrive after the EOI message has been sent and the previous handler
> has returned.

There is no such message with LPIs. I pointed that out previously.

> However, there is a race where if the interrupt affinity
> is changed while the previous handler is running, then the next
> interrupt can arrive at a different CPU while the previous handler is
> still running. In that case there will be a concurrent invoke and the
> early out will be taken.
> 
> For example:
> 
>            CPU 0             |          CPU 1
> -----------------------------|-----------------------------
> interrupt start              |
>   handle_fasteoi_irq         | set_affinity(CPU 1)
>     handler                  |
>     ...                      | interrupt start
>     ...                      |   handle_fasteoi_irq -> early out
>   handle_fasteoi_irq return  | interrupt end
> interrupt end                |
> 
> This issue was observed specifically on an arm64 system with a GIC-v3
> handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
> that the global ITS is responsible for affinity but does not know
> whether interrupts are pending/running, only the CPU-local redistributor
> handles the EOI. Hence when the affinity is changed in the ITS, the new
> CPU's redistributor does not know that the original CPU is still running
> the handler.

Similar to your previous patch, you don't explain *why* the interrupt
gets delivered when it is an LPI, and not for any of the other GICv3
interrupt types. That's an important point.

> 
> Implementation notes:
> 
> It is believed that it's NOT necessary to mask the interrupt in
> handle_fasteoi_irq() the way that handle_edge_irq() does. This is
> because handle_edge_irq() caters for controllers which are too simple to
> gate interrupts from the same source, so the kernel explicitly masks the
> interrupt if it re-occurs [0].
> 
> [0] https://lore.kernel.org/all/bf94a380-fadd-8c38-cc51-4b54711d84b3@huawei.com/
> 
> Suggested-by: Liao Chang <liaochang1@huawei.com>
> Signed-off-by: James Gowans <jgowans@amazon.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: KarimAllah Raslan <karahmed@amazon.com>
> Cc: Yipeng Zou <zouyipeng@huawei.com>
> Cc: Zhang Jianhua <chris.zjh@huawei.com>
> ---
>  kernel/irq/chip.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..42f33e77c16b 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,15 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> 
>  	raw_spin_lock(&desc->lock);
> 
> -	if (!irq_may_run(desc))
> +	/*
> +	 * When an affinity change races with IRQ delivery, the next interrupt
> +	 * can arrive on the new CPU before the original CPU has completed
> +	 * handling the previous one. Mark it as pending and return EOI.
> +	 */
> +	if (!irq_may_run(desc)) {
> +		desc->istate |= IRQS_PENDING;
>  		goto out;
> +	}
> 
>  	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
> 
> @@ -715,6 +722,12 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> 
>  	cond_unmask_eoi_irq(desc, chip);
> 
> +	/*
> +	 * When the race descibed above happens, this will resend the interrupt.
> +	 */
> +	if (unlikely(desc->istate & IRQS_PENDING))
> +		check_irq_resend(desc, false);
> +
>  	raw_spin_unlock(&desc->lock);
>  	return;
>  out:

While I'm glad that you eventually decided to use the resend mechanism
instead of spinning on the "old" CPU, I still think imposing this
behaviour on all users without any discrimination is wrong.

Look at what it does if an interrupt is a wake-up source. You'd
pointlessly requeue the interrupt (bonus points if the irqchip doesn't
provide a HW-based retrigger mechanism).

I still maintain that this change should only be applied for the
particular interrupts that *require* it, and not as a blanket change
affecting everything under the sun. I have proposed such a change in
the past, feel free to use it or roll your own.

In the meantime, I strongly oppose this change as proposed.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
* Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke
  2023-05-31  7:00   ` Marc Zyngier
@ 2023-06-01  7:24     ` Gowans, James
  2023-06-05 16:10       ` Gowans, James
  0 siblings, 1 reply; 6+ messages in thread
From: Gowans, James @ 2023-06-01  7:24 UTC (permalink / raw)
  To: maz@kernel.org
  Cc: tglx@linutronix.de, Raslan, KarimAllah, liaochang1@huawei.com,
	linux-kernel@vger.kernel.org, zouyipeng@huawei.com,
	chris.zjh@huawei.com

On Wed, 2023-05-31 at 08:00 +0100, Marc Zyngier wrote:
> > Generally it should not be possible for the next interrupt to arrive
> > while the previous handler is still running: the next interrupt should
> > only arrive after the EOI message has been sent and the previous handler
> > has returned.
> 
> There is no such message with LPIs. I pointed that out previously.

Arg, thanks, I'll re-word this to:

"Generally it should not be possible for the next interrupt to arrive
while the previous handler is still running: the CPU will not preempt an
interrupt with another from the same source or same priority."

I hope that's more accurate?

> > This issue was observed specifically on an arm64 system with a GIC-v3
> > handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
> > that the global ITS is responsible for affinity but does not know
> > whether interrupts are pending/running, only the CPU-local redistributor
> > handles the EOI. Hence when the affinity is changed in the ITS, the new
> > CPU's redistributor does not know that the original CPU is still running
> > the handler.
> 
> Similar to your previous patch, you don't explain *why* the interrupt
> gets delivered when it is an LPI, and not for any of the other GICv3
> interrupt types. That's an important point.

Right, you pointed out the issue with this sentence too and I missed
updating it. :-/ How about:

"This issue was observed specifically on an arm64 system with a GIC-v3
handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
that the GIC-v3's physical LPIs do not have a global active state. If
LPIs had an active state, then it would not be able to be retriggered
until the first CPU had issued a deactivation."

> > 
> > +	/*
> > +	 * When the race descibed above happens, this will resend the interrupt.
> > +	 */
> > +	if (unlikely(desc->istate & IRQS_PENDING))
> > +		check_irq_resend(desc, false);
> > +
> >  	raw_spin_unlock(&desc->lock);
> >  	return;
> >  out:
> 
> While I'm glad that you eventually decided to use the resend mechanism
> instead of spinning on the "old" CPU, I still think imposing this
> behaviour on all users without any discrimination is wrong.
> 
> Look at what it does if an interrupt is a wake-up source. You'd
> pointlessly requeue the interrupt (bonus points if the irqchip doesn't
> provide a HW-based retrigger mechanism).
> 
> I still maintain that this change should only be applied for the
> particular interrupts that *require* it, and not as a blanket change
> affecting everything under the sun. I have proposed such a change in
> the past, feel free to use it or roll your own.

Thanks for the example of where this blanket functionality wouldn't be
desired - I'll re-work this to introduce and use the
IRQD_RESEND_WHEN_IN_PROGRESS flag as you originally suggested.

Just one more thing before I post V3: are you okay with doing the resend
here *after* the handler has finished running, and using the IRQS_PENDING
flag to know to resend it? Or would you like it to be resent in the
!irq_may_run(desc) block as you suggested?

I have a slight preference to do it after, only when we know it's ready to
be run again, and hence not needing to modify check_irq_resend() to cater
for multiple retries.

JG
* Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke
  2023-06-01  7:24     ` Gowans, James
@ 2023-06-05 16:10       ` Gowans, James
  0 siblings, 0 replies; 6+ messages in thread
From: Gowans, James @ 2023-06-05 16:10 UTC (permalink / raw)
  To: maz@kernel.org
  Cc: tglx@linutronix.de, Raslan, KarimAllah, liaochang1@huawei.com,
	linux-kernel@vger.kernel.org, zouyipeng@huawei.com,
	chris.zjh@huawei.com

Hi Marc,

On Thu, 2023-06-01 at 09:24 +0200, James Gowans wrote:
> > While I'm glad that you eventually decided to use the resend mechanism
> > instead of spinning on the "old" CPU, I still think imposing this
> > behaviour on all users without any discrimination is wrong.
> > 
> > Look at what it does if an interrupt is a wake-up source. You'd
> > pointlessly requeue the interrupt (bonus points if the irqchip doesn't
> > provide a HW-based retrigger mechanism).
> > 
> > I still maintain that this change should only be applied for the
> > particular interrupts that *require* it, and not as a blanket change
> > affecting everything under the sun. I have proposed such a change in
> > the past, feel free to use it or roll your own.
> 
> Thanks for the example of where this blanket functionality wouldn't be
> desired - I'll re-work this to introduce and use the
> IRQD_RESEND_WHEN_IN_PROGRESS flag as you originally suggested.
> 
> Just one more thing before I post V3: are you okay with doing the resend
> here *after* the handler has finished running, and using the IRQS_PENDING
> flag to know to resend it? Or would you like it to be resent in the
> !irq_may_run(desc) block as you suggested?
> 
> I have a slight preference to do it after, only when we know it's ready to
> be run again, and hence not needing to modify check_irq_resend() to cater
> for multiple retries.

Hoping/assuming that you're okay with keeping the resend at the end of
the function, here's V3:
https://lore.kernel.org/lkml/20230605155723.2628097-2-jgowans@amazon.com/

It's now very close to what you suggested originally. :-)

One thing: I'm not totally sure that it's necessary or correct to set
this flag on its_vpe_irq_domain_alloc() too - on my system (r6g.metal
Graviton 2 host) that doesn't seem to be run. My guess is that it's for
a GIC-v4 system which supports posted interrupts, which generally get
delivered directly to the vCPU if it is running, but sometimes need to
be delivered to the hypervisor (for example if the vCPU is not running).
I can try to test this on r7g.metal, which might have GIC-v4, to be
sure...

JG
end of thread, other threads:[~2023-06-05 16:11 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2023-05-30 21:38 [PATCH 1/2] genirq: Expand doc for PENDING and REPLAY flags James Gowans
2023-05-30 21:38 ` [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke James Gowans
2023-05-30 23:32   ` Randy Dunlap
2023-05-31  7:00   ` Marc Zyngier
2023-06-01  7:24     ` Gowans, James
2023-06-05 16:10       ` Gowans, James