* Query on handling some special Group0 interrupt in Linux
From: Mukesh Ojha @ 2022-11-09 16:20 UTC
  To: maz, linux-arm-kernel, catalin.marinas, will, Thomas Gleixner
  Cc: lkml

Hi,

I am working on a use case where both EL2 and EL3 are implemented and
we have a watchdog interrupt (an SPI) that is used to detect software
hangs and trigger a device reset. If that interrupt's current CPU
affinity targets a core where interrupts are disabled, we won't be able
to service it. If instead it arrives on a core that has interrupts
enabled and the handler calls panic() or smp_send_stop(), we still
cannot get the call stacks of the other cores that are running with
interrupts disabled.

I was thinking of configuring both the watchdog IRQ (SPI) and IPI_STOP
(SGI), or any reserved IPI, as FIQs. The watchdog IRQ handler would
call panic(), which eventually sends IPI_STOP (as an SGI FIQ) to all
the cores. With this we would be able to dump the call stack of every
core.

I am able to achieve this, but wanted to know whether it is acceptable
to the community to support such use cases and to enable Group-0
interrupts from the GIC for some special use cases.

-Mukesh

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: Query on handling some special Group0 interrupt in Linux
From: Marc Zyngier @ 2022-11-09 18:20 UTC
  To: Mukesh Ojha
  Cc: linux-arm-kernel, catalin.marinas, will, Thomas Gleixner, lkml

On Wed, 09 Nov 2022 16:20:35 +0000,
Mukesh Ojha <quic_mojha@quicinc.com> wrote:
>
> Hi,
>
> I was working on a use case where both el2/el3 are implemented and we
> have a watchdog interrupt (SPI), which is used for detecting software
> hangs and cause device reset; If that interrupt's current cpu affinity
> is on a core, where interrupts are disabled, we won't be able to serve
> it or if this interrupt comes on a core which has interrupt enabled,
> calling panic() or with smp_send_stop(), we would not be able
> to know the call stack of the other cores which is running with
> interrupt disabled.
>
> I was thinking of configuring both a watchdog irq(SPI) and IPI_STOP
> (SGI) or any reserve IPI as an FIQ. And from the watchdog irq handler,
> I was thinking of calling panic() which eventually sends IPI_STOP(SGI
> FIQ) to all the cores. And with this we will able to dump all the core
> call stack.
>
> I am able to achieve this but wanted to know if this is acceptable to
> the community to support/allow such use cases like above and enable
> group0 interrupt from GIC for some special use cases.

For a start, we only deal with Group-1 interrupts in Linux. Group-0
interrupts are for the firmware, and we really don't want to see them
(this is consistent with your HW having EL3).

We also mask IRQ and FIQ at the same time, so this is a non-starter.

If you want to be able to deliver an interrupt while the interrupts
are masked, what you are looking for is the NMI framework, for which
you can register SPIs as (pseudo-)NMI.

This is of course assuming that you're using GICv3.
If you're using an older version of the architecture, we don't have a
good solution for you, unfortunately.

Thanks,

        M.

--
Without deviation from the norm, progress is not possible.
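
As an illustration of the "we also mask IRQ and FIQ at the same time" point in the reply above, here is a small userspace C model; it is not kernel code. It assumes the arm64 encoding in which the `daifset`/`daifclr` immediate uses bit 1 for I (IRQ) and bit 0 for F (FIQ), and that the interrupt-disable path sets both bits together (immediate 3). The names `cpu_model`, `model_local_irq_disable`, `model_local_irq_enable` and `can_take` are hypothetical and exist only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* DAIFSet/DAIFClr immediate bits (assumed architectural encoding). */
#define DAIF_IMM_F (1u << 0) /* FIQ mask */
#define DAIF_IMM_I (1u << 1) /* IRQ mask */

enum signal_kind { SIG_IRQ, SIG_FIQ };

/* Hypothetical per-CPU model: only the I and F mask bits are tracked. */
struct cpu_model {
	uint32_t daif_imm; /* mask bits currently set, in immediate encoding */
};

/* Models "msr daifset, #3": IRQ and FIQ are masked together. */
static void model_local_irq_disable(struct cpu_model *cpu)
{
	cpu->daif_imm |= DAIF_IMM_I | DAIF_IMM_F;
}

/* Models "msr daifclr, #3": IRQ and FIQ are unmasked together. */
static void model_local_irq_enable(struct cpu_model *cpu)
{
	cpu->daif_imm &= ~(DAIF_IMM_I | DAIF_IMM_F);
}

/* A signal can be taken only while its mask bit is clear. */
static bool can_take(const struct cpu_model *cpu, enum signal_kind kind)
{
	uint32_t bit = (kind == SIG_FIQ) ? DAIF_IMM_F : DAIF_IMM_I;

	return (cpu->daif_imm & bit) == 0;
}
```

In this model, a Group-0 interrupt signalled as FIQ is no more deliverable than an ordinary IRQ once interrupts are disabled, so moving the watchdog to FIQ would not help without also changing how masking is done.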
* Re: Query on handling some special Group0 interrupt in Linux
From: Mukesh Ojha @ 2022-11-09 19:57 UTC
  To: Marc Zyngier
  Cc: linux-arm-kernel, catalin.marinas, will, Thomas Gleixner, lkml

Hi Marc,

Thanks for your reply.

On 11/9/2022 11:50 PM, Marc Zyngier wrote:
> On Wed, 09 Nov 2022 16:20:35 +0000,
> Mukesh Ojha <quic_mojha@quicinc.com> wrote:
>>
>> Hi,
>>
>> I was working on a use case where both el2/el3 are implemented and we
>> have a watchdog interrupt (SPI), which is used for detecting software
>> hangs and cause device reset; If that interrupt's current cpu affinity
>> is on a core, where interrupts are disabled, we won't be able to serve
>> it or if this interrupt comes on a core which has interrupt enabled,
>> calling panic() or with smp_send_stop(), we would not be able
>> to know the call stack of the other cores which is running with
>> interrupt disabled.
>>
>> I was thinking of configuring both a watchdog irq(SPI) and IPI_STOP
>> (SGI) or any reserve IPI as an FIQ. And from the watchdog irq handler,
>> I was thinking of calling panic() which eventually sends IPI_STOP(SGI
>> FIQ) to all the cores. And with this we will able to dump all the core
>> call stack.
>>
>> I am able to achieve this but wanted to know if this is acceptable to
>> the community to support/allow such use cases like above and enable
>> group0 interrupt from GIC for some special use cases.
>
> For a start, we only deal with Group-1 interrupts in Linux. Group-0
> interrupts are for the firmware, and we really don't want to see them
> (this is consistent with your HW having EL3).

What is the downside if we support this? I see one such implementation
here:

https://elixir.bootlin.com/linux/v6.0.7/source/drivers/irqchip/irq-apple-aic.c#L510

> We also mask IRQ and FIQ at the same time, so this is a non-starter.
This can be taken care of if we support this.

>
> If you want to be able to deliver an interrupt while the interrupts
> are masked, what you are looking for is the NMI framework, for which
> you can register SPIs as (pseudo-)NMI.

Yes, something like an NMI. I have already looked into this. In our
system EL2 is implemented, so each physical interrupt gets routed to
the hypervisor and later arrives at EL1 as a vIRQ. Every interrupt
enable/disable call then triggers a trap on the PMR register, which
can add latency in normal operation (for example with multiple VMs).

Some use cases can be special, like the one I mentioned in my initial
mail, where the interrupt is fatal and the system gets reset
afterwards. I cannot think of any use case other than this one, but
could this not be considered as a feature?

>
> This is of course assuming that you're using GICv3. If you're using an
> older version of the architecture, we don't have a good solution for
> you, unfortunately.
>

We are using GICv3.

> Thanks,
>
>         M.
>

-Mukesh
* Re: Query on handling some special Group0 interrupt in Linux
From: Marc Zyngier @ 2022-11-10 7:54 UTC
  To: Mukesh Ojha
  Cc: linux-arm-kernel, catalin.marinas, will, Thomas Gleixner, lkml

On Wed, 09 Nov 2022 19:57:24 +0000,
Mukesh Ojha <quic_mojha@quicinc.com> wrote:
>
> Hi Marc,
>
> Thanks for your reply.
>
> On 11/9/2022 11:50 PM, Marc Zyngier wrote:
> > On Wed, 09 Nov 2022 16:20:35 +0000,
> > Mukesh Ojha <quic_mojha@quicinc.com> wrote:
> >>
> >> Hi,
> >>
> >> I was working on a use case where both el2/el3 are implemented and we
> >> have a watchdog interrupt (SPI), which is used for detecting software
> >> hangs and cause device reset; If that interrupt's current cpu affinity
> >> is on a core, where interrupts are disabled, we won't be able to serve
> >> it or if this interrupt comes on a core which has interrupt enabled,
> >> calling panic() or with smp_send_stop(), we would not be able
> >> to know the call stack of the other cores which is running with
> >> interrupt disabled.
> >>
> >> I was thinking of configuring both a watchdog irq(SPI) and IPI_STOP
> >> (SGI) or any reserve IPI as an FIQ. And from the watchdog irq handler,
> >> I was thinking of calling panic() which eventually sends IPI_STOP(SGI
> >> FIQ) to all the cores. And with this we will able to dump all the core
> >> call stack.
> >>
> >> I am able to achieve this but wanted to know if this is acceptable to
> >> the community to support/allow such use cases like above and enable
> >> group0 interrupt from GIC for some special use cases.
> >
> > For a start, we only deal with Group-1 interrupts in Linux. Group-0
> > interrupts are for the firmware, and we really don't want to see them
> > (this is consistent with your HW having EL3).
>
> What is the downside of it we support this ? I see one of the
> implementation here.
>
> https://elixir.bootlin.com/linux/v6.0.7/source/drivers/irqchip/irq-apple-aic.c#L510

You do realise that this system doesn't even have a GIC, and only uses
FIQ to represent per-CPU interrupts, right?

>
> > We also mask IRQ and FIQ at the same time, so this is a non-starter.
> This can be taken care if we support this.

No. We've made the decision not to treat IRQ and FIQ differently,
because FIQ only matters for systems with a single security domain
such as VMs or wonky systems such as the above. With that, all systems
behave the same and are treated the same, making the rules for
interrupt preemption understandable and we don't have to think of IRQ
and FIQ racing with each other.

>
> >
> > If you want to be able to deliver an interrupt while the interrupts
> > are masked, what you are looking for is the NMI framework, for which
> > you can register SPIs as (pseudo-)NMI.
>
> Yes, kind of NMI. I have already looked into this. Since, in our
> system El2 is implemented and each physical interrupt get routed to
> hypervisor and later vIrq comes to El1 and each interrupt
> enable/disable call exercise pmr register trap can cause latency in
> regular run(like multiple VM).

Then your hypervisor needs fixing. There is no need to trap accesses
to PMR. Also, PMR being per-CPU, there should be no extra overhead
depending on the number of VM even if you were trapping PMR (for
example to work around broken HW).

To sum it up, none of the above makes much sense to me.

> Since, some of the use-case could be special like i have mentioned
> in my initial mail where such interrupt will be fatal and system will
> get reset after that. I am not able to think of any other use case than
> this but can this not be considered as one of the feature.

Well, we don't add stuff to the kernel based on idle considerations,
and what you are describing so far matches 100% the requirement for an
NMI-like feature.
The architecture has two ways to implement almost-NMIs: interrupt
priorities (our current crop of pseudo-NMIs) and the ARMv8.8 FEAT_NMI.
The former is already there, and there are patches on the list for the
latter.

Do we need a third way that only works for odd corner cases and that
adds a huge amount of complexity? No, thank you.

        M.

--
Without deviation from the norm, progress is not possible.
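
The priority-based pseudo-NMI scheme mentioned in the reply above can be illustrated with a small userspace C model; it is not kernel code. It assumes the GIC priority-masking rule that an interrupt is signalled only when its priority value is numerically lower than the priority mask (PMR). The four constants are illustrative values chosen for this sketch, not a statement of what any given kernel version programs, and `irq_deliverable` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only (assumption): lower number = higher priority. */
#define PRIO_NORMAL  0xa0u /* ordinary interrupts */
#define PRIO_NMI     0x20u /* interrupts registered as pseudo-NMI */
#define PMR_UNMASKED 0xe0u /* "interrupts enabled": both priorities pass */
#define PMR_MASKED   0x60u /* "interrupts disabled": only PRIO_NMI passes */

/* Priority masking rule: signal only if priority is strictly below PMR. */
static bool irq_deliverable(uint8_t priority, uint8_t pmr)
{
	return priority < pmr;
}
```

Under these assumptions, a watchdog SPI given the NMI priority is still signalled on a CPU that has "disabled interrupts" by lowering PMR, while ordinary interrupts stay blocked. That is the behaviour the original question is after, without involving Group-0 or FIQ.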