* [PATCH 1/2] genirq: Expand doc for PENDING and REPLAY flags
@ 2023-05-30 21:38 James Gowans
  2023-05-30 21:38 ` [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke James Gowans
  0 siblings, 1 reply; 6+ messages in thread
From: James Gowans @ 2023-05-30 21:38 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, James Gowans, Marc Zyngier, Liao Chang

Adding a bit more info about what the flags are used for may help future
code readers.

Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Liao Chang <liaochang1@huawei.com>
---
 kernel/irq/internals.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 5fdc0b557579..c443a0ddc07e 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -47,9 +47,12 @@ enum {
  *				  detection
  * IRQS_POLL_INPROGRESS		- polling in progress
  * IRQS_ONESHOT			- irq is not unmasked in primary handler
- * IRQS_REPLAY			- irq is replayed
+ * IRQS_REPLAY			- irq has been resent and will not be resent
+ *				  again until the handler has run and cleared
+ *				  this flag.
  * IRQS_WAITING			- irq is waiting
- * IRQS_PENDING			- irq is pending and replayed later
+ * IRQS_PENDING			- irq needs to be resent and should be resent
+ *				  at the next available opportunity.
  * IRQS_SUSPENDED		- irq is suspended
  * IRQS_NMI			- irq line is used to deliver NMIs
  * IRQS_SYSFS			- descriptor has been added to sysfs
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread
* [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke
  2023-05-30 21:38 [PATCH 1/2] genirq: Expand doc for PENDING and REPLAY flags James Gowans
@ 2023-05-30 21:38 ` James Gowans
  2023-05-30 23:32   ` Randy Dunlap
  2023-05-31  7:00   ` Marc Zyngier
  0 siblings, 2 replies; 6+ messages in thread
From: James Gowans @ 2023-05-30 21:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, James Gowans, Liao Chang, Marc Zyngier,
	KarimAllah Raslan, Yipeng Zou, Zhang Jianhua

Update the generic handle_fasteoi_irq to cater for the case when the
next interrupt comes in while the previous handler is still running.
Currently when that happens the irq_may_run() early out causes the next
IRQ to be lost. Change the behaviour to mark the interrupt as pending
and re-send the interrupt when handle_fasteoi_irq sees that the pending
flag has been set. This is largely inspired by handle_edge_irq.

Generally it should not be possible for the next interrupt to arrive
while the previous handler is still running: the next interrupt should
only arrive after the EOI message has been sent and the previous handler
has returned. However, there is a race where if the interrupt affinity
is changed while the previous handler is running, then the next
interrupt can arrive at a different CPU while the previous handler is
still running. In that case there will be a concurrent invoke and the
early out will be taken.

For example:

           CPU 0             |          CPU 1
-----------------------------|-----------------------------
interrupt start              |
  handle_fasteoi_irq         | set_affinity(CPU 1)
    handler                  |
    ...                      | interrupt start
    ...                      |   handle_fasteoi_irq -> early out
  handle_fasteoi_irq return  | interrupt end
interrupt end                |

This issue was observed specifically on an arm64 system with a GIC-v3
handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
that the global ITS is responsible for affinity but does not know
whether interrupts are pending/running, only the CPU-local redistributor
handles the EOI. Hence when the affinity is changed in the ITS, the new
CPU's redistributor does not know that the original CPU is still running
the handler.

Implementation notes:

It is believed that it's NOT necessary to mask the interrupt in
handle_fasteoi_irq() the way that handle_edge_irq() does. This is
because handle_edge_irq() caters for controllers which are too simple to
gate interrupts from the same source, so the kernel explicitly masks the
interrupt if it re-occurs [0].

[0] https://lore.kernel.org/all/bf94a380-fadd-8c38-cc51-4b54711d84b3@huawei.com/

Suggested-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: KarimAllah Raslan <karahmed@amazon.com>
Cc: Yipeng Zou <zouyipeng@huawei.com>
Cc: Zhang Jianhua <chris.zjh@huawei.com>
---
 kernel/irq/chip.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 49e7bc871fec..42f33e77c16b 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -692,8 +692,15 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 
 	raw_spin_lock(&desc->lock);
 
-	if (!irq_may_run(desc))
+	/*
+	 * When an affinity change races with IRQ delivery, the next interrupt
+	 * can arrive on the new CPU before the original CPU has completed
+	 * handling the previous one. Mark it as pending and return EOI.
+	 */
+	if (!irq_may_run(desc)) {
+		desc->istate |= IRQS_PENDING;
 		goto out;
+	}
 
 	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
 
@@ -715,6 +722,12 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 
 	cond_unmask_eoi_irq(desc, chip);
 
+	/*
+	 * When the race descibed above happens, this will resend the interrupt.
+	 */
+	if (unlikely(desc->istate & IRQS_PENDING))
+		check_irq_resend(desc, false);
+
 	raw_spin_unlock(&desc->lock);
 	return;
 out:
-- 
2.25.1
* Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke
  2023-05-30 21:38 ` [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke James Gowans
@ 2023-05-30 23:32   ` Randy Dunlap
  2023-05-31  7:00   ` Marc Zyngier
  1 sibling, 0 replies; 6+ messages in thread
From: Randy Dunlap @ 2023-05-30 23:32 UTC (permalink / raw)
  To: James Gowans, Thomas Gleixner
  Cc: linux-kernel, Liao Chang, Marc Zyngier, KarimAllah Raslan,
	Yipeng Zou, Zhang Jianhua

Hi--

On 5/30/23 14:38, James Gowans wrote:
> ---
>  kernel/irq/chip.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..42f33e77c16b 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,15 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> 
>  	raw_spin_lock(&desc->lock);
> 
> -	if (!irq_may_run(desc))
> +	/*
> +	 * When an affinity change races with IRQ delivery, the next interrupt
> +	 * can arrive on the new CPU before the original CPU has completed
> +	 * handling the previous one. Mark it as pending and return EOI.
> +	 */
> +	if (!irq_may_run(desc)) {
> +		desc->istate |= IRQS_PENDING;
>  		goto out;
> +	}
> 
>  	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
> 
> @@ -715,6 +722,12 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> 
>  	cond_unmask_eoi_irq(desc, chip);
> 
> +	/*
> +	 * When the race descibed above happens, this will resend the interrupt.

	described

> +	 */
> +	if (unlikely(desc->istate & IRQS_PENDING))
> +		check_irq_resend(desc, false);
> +
>  	raw_spin_unlock(&desc->lock);
>  	return;
>  out:

-- 
~Randy
* Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke
  2023-05-30 21:38 ` [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke James Gowans
  2023-05-30 23:32   ` Randy Dunlap
@ 2023-05-31  7:00   ` Marc Zyngier
  2023-06-01  7:24     ` Gowans, James
  1 sibling, 1 reply; 6+ messages in thread
From: Marc Zyngier @ 2023-05-31  7:00 UTC (permalink / raw)
  To: James Gowans
  Cc: Thomas Gleixner, linux-kernel, Liao Chang, KarimAllah Raslan,
	Yipeng Zou, Zhang Jianhua

On Tue, 30 May 2023 22:38:48 +0100,
James Gowans <jgowans@amazon.com> wrote:
> 
> Update the generic handle_fasteoi_irq to cater for the case when the
> next interrupt comes in while the previous handler is still running.
> Currently when that happens the irq_may_run() early out causes the next
> IRQ to be lost. Change the behaviour to mark the interrupt as pending
> and re-send the interrupt when handle_fasteoi_irq sees that the pending
> flag has been set. This is largely inspired by handle_edge_irq.
> 
> Generally it should not be possible for the next interrupt to arrive
> while the previous handler is still running: the next interrupt should
> only arrive after the EOI message has been sent and the previous handler
> has returned.

There is no such message with LPIs. I pointed that out previously.

> However, there is a race where if the interrupt affinity
> is changed while the previous handler is running, then the next
> interrupt can arrive at a different CPU while the previous handler is
> still running. In that case there will be a concurrent invoke and the
> early out will be taken.
> 
> For example:
> 
>            CPU 0             |          CPU 1
> -----------------------------|-----------------------------
> interrupt start              |
>   handle_fasteoi_irq         | set_affinity(CPU 1)
>     handler                  |
>     ...                      | interrupt start
>     ...                      |   handle_fasteoi_irq -> early out
>   handle_fasteoi_irq return  | interrupt end
> interrupt end                |
> 
> This issue was observed specifically on an arm64 system with a GIC-v3
> handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
> that the global ITS is responsible for affinity but does not know
> whether interrupts are pending/running, only the CPU-local redistributor
> handles the EOI. Hence when the affinity is changed in the ITS, the new
> CPU's redistributor does not know that the original CPU is still running
> the handler.

Similar to your previous patch, you don't explain *why* the interrupt
gets delivered when it is an LPI, and not for any of the other GICv3
interrupt types. That's an important point.

> 
> Implementation notes:
> 
> It is believed that it's NOT necessary to mask the interrupt in
> handle_fasteoi_irq() the way that handle_edge_irq() does. This is
> because handle_edge_irq() caters for controllers which are too simple to
> gate interrupts from the same source, so the kernel explicitly masks the
> interrupt if it re-occurs [0].
> 
> [0] https://lore.kernel.org/all/bf94a380-fadd-8c38-cc51-4b54711d84b3@huawei.com/
> 
> Suggested-by: Liao Chang <liaochang1@huawei.com>
> Signed-off-by: James Gowans <jgowans@amazon.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: KarimAllah Raslan <karahmed@amazon.com>
> Cc: Yipeng Zou <zouyipeng@huawei.com>
> Cc: Zhang Jianhua <chris.zjh@huawei.com>
> ---
>  kernel/irq/chip.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..42f33e77c16b 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,15 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> 
>  	raw_spin_lock(&desc->lock);
> 
> -	if (!irq_may_run(desc))
> +	/*
> +	 * When an affinity change races with IRQ delivery, the next interrupt
> +	 * can arrive on the new CPU before the original CPU has completed
> +	 * handling the previous one. Mark it as pending and return EOI.
> +	 */
> +	if (!irq_may_run(desc)) {
> +		desc->istate |= IRQS_PENDING;
>  		goto out;
> +	}
> 
>  	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
> 
> @@ -715,6 +722,12 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> 
>  	cond_unmask_eoi_irq(desc, chip);
> 
> +	/*
> +	 * When the race descibed above happens, this will resend the interrupt.
> +	 */
> +	if (unlikely(desc->istate & IRQS_PENDING))
> +		check_irq_resend(desc, false);
> +
>  	raw_spin_unlock(&desc->lock);
>  	return;
>  out:

While I'm glad that you eventually decided to use the resend mechanism
instead of spinning on the "old" CPU, I still think imposing this
behaviour on all users without any discrimination is wrong.

Look at what it does if an interrupt is a wake-up source. You'd
pointlessly requeue the interrupt (bonus points if the irqchip doesn't
provide a HW-based retrigger mechanism).

I still maintain that this change should only be applied for the
particular interrupts that *require* it, and not as a blanket change
affecting everything under the sun. I have proposed such a change in
the past, feel free to use it or roll your own.

In the meantime, I strongly oppose this change as proposed.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
* Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke
  2023-05-31  7:00   ` Marc Zyngier
@ 2023-06-01  7:24     ` Gowans, James
  2023-06-05 16:10       ` Gowans, James
  0 siblings, 1 reply; 6+ messages in thread
From: Gowans, James @ 2023-06-01  7:24 UTC (permalink / raw)
  To: maz@kernel.org
  Cc: tglx@linutronix.de, Raslan, KarimAllah, liaochang1@huawei.com,
	linux-kernel@vger.kernel.org, zouyipeng@huawei.com,
	chris.zjh@huawei.com

On Wed, 2023-05-31 at 08:00 +0100, Marc Zyngier wrote:
> > Generally it should not be possible for the next interrupt to arrive
> > while the previous handler is still running: the next interrupt should
> > only arrive after the EOI message has been sent and the previous handler
> > has returned.
> 
> There is no such message with LPIs. I pointed that out previously.

Arg, thanks, I'll re-word this to:

"Generally it should not be possible for the next interrupt to arrive
while the previous handler is still running: the CPU will not preempt an
interrupt with another from the same source or same priority."

I hope that's more accurate?

> > This issue was observed specifically on an arm64 system with a GIC-v3
> > handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
> > that the global ITS is responsible for affinity but does not know
> > whether interrupts are pending/running, only the CPU-local redistributor
> > handles the EOI. Hence when the affinity is changed in the ITS, the new
> > CPU's redistributor does not know that the original CPU is still running
> > the handler.
> 
> Similar to your previous patch, you don't explain *why* the interrupt
> gets delivered when it is an LPI, and not for any of the other GICv3
> interrupt types. That's an important point.

Right, you pointed out the issue with this sentence too and I missed
updating it. :-/ How about:

"This issue was observed specifically on an arm64 system with a GIC-v3
handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
that the GIC-v3's physical LPIs do not have a global active state. If
LPIs had an active state, then it would not be able to be retriggered
until the first CPU had issued a deactivation."

> > 
> > +	/*
> > +	 * When the race descibed above happens, this will resend the interrupt.
> > +	 */
> > +	if (unlikely(desc->istate & IRQS_PENDING))
> > +		check_irq_resend(desc, false);
> > +
> >  	raw_spin_unlock(&desc->lock);
> >  	return;
> >  out:
> 
> While I'm glad that you eventually decided to use the resend mechanism
> instead of spinning on the "old" CPU, I still think imposing this
> behaviour on all users without any discrimination is wrong.
> 
> Look at what it does if an interrupt is a wake-up source. You'd
> pointlessly requeue the interrupt (bonus points if the irqchip doesn't
> provide a HW-based retrigger mechanism).
> 
> I still maintain that this change should only be applied for the
> particular interrupts that *require* it, and not as a blanket change
> affecting everything under the sun. I have proposed such a change in
> the past, feel free to use it or roll your own.

Thanks for the example of where this blanket functionality wouldn't be
desired - I'll re-work this to introduce and use the
IRQD_RESEND_WHEN_IN_PROGRESS flag as you originally suggested.

Just one more thing before I post V3: are you okay with doing the resend
here *after* the handler has finished running, and using the IRQS_PENDING
flag to know to resend it? Or would you like it to be resent in the
!irq_may_run(desc) block as you suggested?

I have a slight preference to do it after, only when we know it's ready to
be run again, and hence not needing to modify check_irq_resend() to cater
for multiple retries.

JG
* Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke
  2023-06-01  7:24     ` Gowans, James
@ 2023-06-05 16:10       ` Gowans, James
  0 siblings, 0 replies; 6+ messages in thread
From: Gowans, James @ 2023-06-05 16:10 UTC (permalink / raw)
  To: maz@kernel.org
  Cc: tglx@linutronix.de, Raslan, KarimAllah, liaochang1@huawei.com,
	linux-kernel@vger.kernel.org, zouyipeng@huawei.com,
	chris.zjh@huawei.com

Hi Marc,

On Thu, 2023-06-01 at 09:24 +0200, James Gowans wrote:
> > While I'm glad that you eventually decided to use the resend mechanism
> > instead of spinning on the "old" CPU, I still think imposing this
> > behaviour on all users without any discrimination is wrong.
> > 
> > Look at what it does if an interrupt is a wake-up source. You'd
> > pointlessly requeue the interrupt (bonus points if the irqchip doesn't
> > provide a HW-based retrigger mechanism).
> > 
> > I still maintain that this change should only be applied for the
> > particular interrupts that *require* it, and not as a blanket change
> > affecting everything under the sun. I have proposed such a change in
> > the past, feel free to use it or roll your own.
> 
> Thanks for the example of where this blanket functionality wouldn't be
> desired - I'll re-work this to introduce and use the
> IRQD_RESEND_WHEN_IN_PROGRESS flag as you originally suggested.
> 
> Just one more thing before I post V3: are you okay with doing the resend
> here *after* the handler has finished running, and using the IRQS_PENDING
> flag to know to resend it? Or would you like it to be resent in the
> !irq_may_run(desc) block as you suggested?
> 
> I have a slight preference to do it after, only when we know it's ready to
> be run again, and hence not needing to modify check_irq_resend() to cater
> for multiple retries.

Hoping/assuming that you're okay with keeping the resend at the end of
the function, here's V3:
https://lore.kernel.org/lkml/20230605155723.2628097-2-jgowans@amazon.com/

It's now very close to what you suggested originally. :-)

One thing: I'm not totally sure that it's necessary or correct to set
this flag on its_vpe_irq_domain_alloc() too - on my system (r6g.metal
Graviton 2 host) that doesn't seem to be run. My guess is that it's for
a GIC-v4 system which supports posted interrupts, which generally get
delivered directly to the vCPU if it is running, but sometimes need to
be delivered to the hypervisor (for example if the vCPU is not running).
I can try to test this on r7g.metal, which might have GIC-v4, to be
sure...

JG
end of thread, other threads:[~2023-06-05 16:11 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2023-05-30 21:38 [PATCH 1/2] genirq: Expand doc for PENDING and REPLAY flags James Gowans
2023-05-30 21:38 ` [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent invoke James Gowans
2023-05-30 23:32   ` Randy Dunlap
2023-05-31  7:00   ` Marc Zyngier
2023-06-01  7:24     ` Gowans, James
2023-06-05 16:10       ` Gowans, James