linux-arm-kernel.lists.infradead.org archive mirror
* Query on handling some special Group0 interrupt in Linux
@ 2022-11-09 16:20 Mukesh Ojha
  2022-11-09 18:20 ` Marc Zyngier
  0 siblings, 1 reply; 4+ messages in thread
From: Mukesh Ojha @ 2022-11-09 16:20 UTC (permalink / raw)
  To: maz, linux-arm-kernel, catalin.marinas, will, Thomas Gleixner; +Cc: lkml

Hi,

I am working on a use case where both EL2 and EL3 are implemented and we
have a watchdog interrupt (SPI) that is used to detect software hangs
and trigger a device reset. If that interrupt's current CPU affinity
targets a core where interrupts are disabled, we cannot service it.
If it instead arrives on a core that has interrupts enabled and we call
panic() or smp_send_stop(), we still cannot capture the call stacks of
the other cores that are running with interrupts disabled.

I was thinking of configuring both the watchdog IRQ (SPI) and IPI_STOP
(SGI), or any reserved IPI, as FIQs. The watchdog IRQ handler would
call panic(), which eventually sends IPI_STOP (as an SGI FIQ) to all
the cores. With this we would be able to dump the call stacks of all
the cores.

I am able to achieve this, but wanted to know whether the community
would accept supporting such use cases and enabling Group-0 interrupts
from the GIC for them.

-Mukesh

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: Query on handling some special Group0 interrupt in Linux
  2022-11-09 16:20 Query on handling some special Group0 interrupt in Linux Mukesh Ojha
@ 2022-11-09 18:20 ` Marc Zyngier
  2022-11-09 19:57   ` Mukesh Ojha
  0 siblings, 1 reply; 4+ messages in thread
From: Marc Zyngier @ 2022-11-09 18:20 UTC (permalink / raw)
  To: Mukesh Ojha
  Cc: linux-arm-kernel, catalin.marinas, will, Thomas Gleixner, lkml

On Wed, 09 Nov 2022 16:20:35 +0000,
Mukesh Ojha <quic_mojha@quicinc.com> wrote:
> 
> Hi,
> 
> I am working on a use case where both EL2 and EL3 are implemented and we
> have a watchdog interrupt (SPI) that is used to detect software hangs
> and trigger a device reset. If that interrupt's current CPU affinity
> targets a core where interrupts are disabled, we cannot service it.
> If it instead arrives on a core that has interrupts enabled and we call
> panic() or smp_send_stop(), we still cannot capture the call stacks of
> the other cores that are running with interrupts disabled.
> 
> I was thinking of configuring both the watchdog IRQ (SPI) and IPI_STOP
> (SGI), or any reserved IPI, as FIQs. The watchdog IRQ handler would
> call panic(), which eventually sends IPI_STOP (as an SGI FIQ) to all
> the cores. With this we would be able to dump the call stacks of all
> the cores.
> 
> I am able to achieve this, but wanted to know whether the community
> would accept supporting such use cases and enabling Group-0 interrupts
> from the GIC for them.

For a start, we only deal with Group-1 interrupts in Linux. Group-0
interrupts are for the firmware, and we really don't want to see them
(this is consistent with your HW having EL3). We also mask IRQ and FIQ
at the same time, so this is a non-starter.

If you want to be able to deliver an interrupt while the interrupts
are masked, what you are looking for is the NMI framework, for which
you can register SPIs as (pseudo-)NMI.

This is of course assuming that you're using GICv3. If you're using an
older version of the architecture, we don't have a good solution for
you, unfortunately.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: Query on handling some special Group0 interrupt in Linux
  2022-11-09 18:20 ` Marc Zyngier
@ 2022-11-09 19:57   ` Mukesh Ojha
  2022-11-10  7:54     ` Marc Zyngier
  0 siblings, 1 reply; 4+ messages in thread
From: Mukesh Ojha @ 2022-11-09 19:57 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, catalin.marinas, will, Thomas Gleixner, lkml

Hi Marc,

Thanks for your reply.

On 11/9/2022 11:50 PM, Marc Zyngier wrote:
> On Wed, 09 Nov 2022 16:20:35 +0000,
> Mukesh Ojha <quic_mojha@quicinc.com> wrote:
>>
>> Hi,
>>
>> I am working on a use case where both EL2 and EL3 are implemented and we
>> have a watchdog interrupt (SPI) that is used to detect software hangs
>> and trigger a device reset. If that interrupt's current CPU affinity
>> targets a core where interrupts are disabled, we cannot service it.
>> If it instead arrives on a core that has interrupts enabled and we call
>> panic() or smp_send_stop(), we still cannot capture the call stacks of
>> the other cores that are running with interrupts disabled.
>>
>> I was thinking of configuring both the watchdog IRQ (SPI) and IPI_STOP
>> (SGI), or any reserved IPI, as FIQs. The watchdog IRQ handler would
>> call panic(), which eventually sends IPI_STOP (as an SGI FIQ) to all
>> the cores. With this we would be able to dump the call stacks of all
>> the cores.
>>
>> I am able to achieve this, but wanted to know whether the community
>> would accept supporting such use cases and enabling Group-0 interrupts
>> from the GIC for them.
> 
> For a start, we only deal with Group-1 interrupts in Linux. Group-0
> interrupts are for the firmware, and we really don't want to see them
> (this is consistent with your HW having EL3). 

What is the downside if we support this? I see one such
implementation here:

https://elixir.bootlin.com/linux/v6.0.7/source/drivers/irqchip/irq-apple-aic.c#L510

> We also mask IRQ and FIQ at the same time, so this is a non-starter.

This can be taken care of if we support this.

> 
> If you want to be able to deliver an interrupt while the interrupts
> are masked, what you are looking for is the NMI framework, for which
> you can register SPIs as (pseudo-)NMI.

Yes, a kind of NMI. I have already looked into this.
In our system EL2 is implemented, so each physical interrupt gets
routed to the hypervisor and later arrives at EL1 as a vIRQ. Each
interrupt enable/disable call then exercises a PMR register trap, which
can add latency in regular operation (for example with multiple VMs).

Some use cases could be special, like the one I mentioned in my
initial mail, where such an interrupt is fatal and the system gets
reset afterwards. I cannot think of any other use case than this, but
could this not be considered as a feature?

> 
> This is of course assuming that you're using GICv3. If you're using an
> older version of the architecture, we don't have a good solution for
> you, unfortunately.
> 

We are using GICv3.

> Thanks,
> 
> 	M.
> 

-Mukesh

* Re: Query on handling some special Group0 interrupt in Linux
  2022-11-09 19:57   ` Mukesh Ojha
@ 2022-11-10  7:54     ` Marc Zyngier
  0 siblings, 0 replies; 4+ messages in thread
From: Marc Zyngier @ 2022-11-10  7:54 UTC (permalink / raw)
  To: Mukesh Ojha
  Cc: linux-arm-kernel, catalin.marinas, will, Thomas Gleixner, lkml

On Wed, 09 Nov 2022 19:57:24 +0000,
Mukesh Ojha <quic_mojha@quicinc.com> wrote:
> 
> Hi Marc,
> 
> Thanks for your reply.
> 
> On 11/9/2022 11:50 PM, Marc Zyngier wrote:
> > On Wed, 09 Nov 2022 16:20:35 +0000,
> > Mukesh Ojha <quic_mojha@quicinc.com> wrote:
> >> 
> >> Hi,
> >> 
> >> I am working on a use case where both EL2 and EL3 are implemented and we
> >> have a watchdog interrupt (SPI) that is used to detect software hangs
> >> and trigger a device reset. If that interrupt's current CPU affinity
> >> targets a core where interrupts are disabled, we cannot service it.
> >> If it instead arrives on a core that has interrupts enabled and we call
> >> panic() or smp_send_stop(), we still cannot capture the call stacks of
> >> the other cores that are running with interrupts disabled.
> >> 
> >> I was thinking of configuring both the watchdog IRQ (SPI) and IPI_STOP
> >> (SGI), or any reserved IPI, as FIQs. The watchdog IRQ handler would
> >> call panic(), which eventually sends IPI_STOP (as an SGI FIQ) to all
> >> the cores. With this we would be able to dump the call stacks of all
> >> the cores.
> >> 
> >> I am able to achieve this, but wanted to know whether the community
> >> would accept supporting such use cases and enabling Group-0 interrupts
> >> from the GIC for them.
> > 
> > For a start, we only deal with Group-1 interrupts in Linux. Group-0
> > interrupts are for the firmware, and we really don't want to see them
> > (this is consistent with your HW having EL3). 
> 
> What is the downside if we support this? I see one such
> implementation here:
> 
> https://elixir.bootlin.com/linux/v6.0.7/source/drivers/irqchip/irq-apple-aic.c#L510

You do realise that this system doesn't even have a GIC, and only uses
FIQ to represent per-CPU interrupts, right?

>
> > We also mask IRQ and FIQ at the same time, so this is a non-starter.
> This can be taken care of if we support this.

No. We've made the decision not to treat IRQ and FIQ differently,
because FIQ only matters for systems with a single security domain
such as VMs or wonky systems such as the above. With that, all systems
behave the same and are treated the same, making the rules for
interrupt preemption understandable and we don't have to think of IRQ
and FIQ racing with each other.

> 
> > 
> > If you want to be able to deliver an interrupt while the interrupts
> > are masked, what you are looking for is the NMI framework, for which
> > you can register SPIs as (pseudo-)NMI.
> 
> Yes, a kind of NMI. I have already looked into this.
> In our system EL2 is implemented, so each physical interrupt gets
> routed to the hypervisor and later arrives at EL1 as a vIRQ. Each
> interrupt enable/disable call then exercises a PMR register trap, which
> can add latency in regular operation (for example with multiple VMs).

Then your hypervisor needs fixing. There is no need to trap accesses
to PMR. Also, PMR being per-CPU, there should be no extra overhead
depending on the number of VM even if you were trapping PMR (for
example to work around broken HW).

To sum it up, none of the above makes much sense to me.

> Some use cases could be special, like the one I mentioned in my
> initial mail, where such an interrupt is fatal and the system gets
> reset afterwards. I cannot think of any other use case than this, but
> could this not be considered as a feature?

Well, we don't add stuff to the kernel based on idle considerations,
and what you are describing so far matches 100% the requirement for an
NMI-like feature.

The architecture has two ways to implement almost-NMIs: interrupt
priorities (our current crop of pseudo-NMIs) and the ARMv8.8
FEAT_NMI. The former is already there, and there are patches on the
list for the latter.

Do we need a third way that only works for odd corner cases and that
adds a huge amount of complexity? No, thank you.

	M.

-- 
Without deviation from the norm, progress is not possible.
