All of lore.kernel.org
 help / color / mirror / Atom feed
* irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
@ 2023-08-30  3:05 Chen, Hongzhan
  2023-08-30  7:11 ` Jan Kiszka
  2023-08-30  7:20 ` Florian Bezdeka
  0 siblings, 2 replies; 21+ messages in thread
From: Chen, Hongzhan @ 2023-08-30  3:05 UTC (permalink / raw)
  To: xenomai@lists.linux.dev

Hi Philippe/Jan

We found there is irq storm issue like [1] on sd/mmc card detect interrupt alloced from 
pinctrl device with Xenomai 3.2 Dovetail 5.10.*
According to our test,  the patch [2] works for such issue in our case. It seems that handle_edge_irq 
never ack the irq to clear the interrupt status for the Linux gpio interrupt. 

I am wondering if there is any side-effect for this patch, Please suggest.

Regards

Hongzhan Chen

[1]:
cat /proc/interrupts
  14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi   INTC1020:00, INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd

[2]:
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 6aca6c036ada..4d08eaf076e9 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc *desc)
        kstat_incr_irqs_this_cpu(desc);

        /* Start handling the irq */
-       if (!irqs_pipelined())
                chip->irq_ack(&desc->irq_data);



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-08-30  3:05 irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device Chen, Hongzhan
@ 2023-08-30  7:11 ` Jan Kiszka
  2023-08-30  7:32   ` Chen, Hongzhan
  2023-08-30  7:20 ` Florian Bezdeka
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2023-08-30  7:11 UTC (permalink / raw)
  To: Chen, Hongzhan, xenomai@lists.linux.dev, Florian Bezdeka,
	Philippe Gerum

On 30.08.23 05:05, Chen, Hongzhan wrote:
> Hi Philippe/Jan
> 
> We found there is irq storm issue like [1] on sd/mmc card detect interrupt alloced from 
> pinctrl device with Xenomai 3.2 Dovetail 5.10.*

Is this 5.10-only or also affecting newer dovetail kernels?

> According to our test,  the patch [2] works for such issue in our case. It seems that handle_edge_irq 
> never ack the irq to clear the interrupt status for the Linux gpio interrupt. 
> 
> I am wondering if there is any side-effect for this patch, Please suggest.
> 

Reminds me a bit of Florian's issue, though he was seeing that with 
level-triggered interrupts.

Are those interrupts surely edge-triggered? Just to exclude that we are 
seeing a wrong handling in general.

> Regards
> 
> Hongzhan Chen
> 
> [1]:
> cat /proc/interrupts
>   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi   INTC1020:00, INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
> 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
> 
> [2]:
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 6aca6c036ada..4d08eaf076e9 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc *desc)
>         kstat_incr_irqs_this_cpu(desc);
> 
>         /* Start handling the irq */
> -       if (!irqs_pipelined())
>                 chip->irq_ack(&desc->irq_data);
> 
> 

Looking at a larger block of this function

	if (start_irq_flow()) {
		desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);

		if (!irq_may_run(desc)) {
			desc->istate |= IRQS_PENDING;
			mask_ack_irq(desc);
			goto out_unlock;
		}

		/*
		 * If its disabled or no action available then mask it
		 * and get out of here.
		 */
		if (irqd_irq_disabled(&desc->irq_data) || !desc->action) {
			desc->istate |= IRQS_PENDING;
			mask_ack_irq(desc);
			goto out_unlock;
		}
	}

	if (on_pipeline_entry()) {
		chip->irq_ack(&desc->irq_data);
		desc->istate |= IRQS_EDGE;
		handle_oob_irq(desc);
		goto out_unlock;
	}

	kstat_incr_irqs_this_cpu(desc);

	/* Start handling the irq */
	if (!irqs_pipelined())
		chip->irq_ack(&desc->irq_data);


it is indeed unclear to me why that !irqs_pipelined() check is there.
When we are entering with irqs_pipelined() and !in_pipeline(), neither
start_irq_flow() nor on_pipeline_entry() will takes care of the ack. Who
else will do it instead?

Jan

-- 
Siemens AG, Technology
Linux Expert Center


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-08-30  3:05 irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device Chen, Hongzhan
  2023-08-30  7:11 ` Jan Kiszka
@ 2023-08-30  7:20 ` Florian Bezdeka
  2023-08-30  7:54   ` Chen, Hongzhan
  1 sibling, 1 reply; 21+ messages in thread
From: Florian Bezdeka @ 2023-08-30  7:20 UTC (permalink / raw)
  To: Chen, Hongzhan, xenomai@lists.linux.dev

On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
> Hi Philippe/Jan
> 
> We found there is irq storm issue like [1] on sd/mmc card detect interrupt alloced from 
> pinctrl device with Xenomai 3.2 Dovetail 5.10.*
> According to our test,  the patch [2] works for such issue in our case. It seems that handle_edge_irq 
> never ack the irq to clear the interrupt status for the Linux gpio interrupt. 
> 
> I am wondering if there is any side-effect for this patch, Please suggest.
> 
> Regards
> 
> Hongzhan Chen
> 
> [1]:
> cat /proc/interrupts
>   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi   INTC1020:00, INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
> 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
> 
> [2]:
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 6aca6c036ada..4d08eaf076e9 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc *desc)
>         kstat_incr_irqs_this_cpu(desc);
> 
>         /* Start handling the irq */
> -       if (!irqs_pipelined())
>                 chip->irq_ack(&desc->irq_data);
> 
> 

That doesn't make much sense. A IRQ storm means that we unmasked the
IRQ to early or forgot to mask it.

A forgotten ACK would mean the IRQ stalls, so it would never be fired
again. No?

Could you please check if we call unmask_irq() with HW IRQs enabled? If
so, I have an idea what is going on...

Florian


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-08-30  7:11 ` Jan Kiszka
@ 2023-08-30  7:32   ` Chen, Hongzhan
  0 siblings, 0 replies; 21+ messages in thread
From: Chen, Hongzhan @ 2023-08-30  7:32 UTC (permalink / raw)
  To: Kiszka, Jan, xenomai@lists.linux.dev, Bezdeka, Florian,
	Philippe Gerum



>-----Original Message-----
>From: Jan Kiszka <jan.kiszka@siemens.com>
>Sent: Wednesday, August 30, 2023 3:12 PM
>To: Chen, Hongzhan <hongzhan.chen@intel.com>; xenomai@lists.linux.dev;
>Bezdeka, Florian <florian.bezdeka@siemens.com>; Philippe Gerum
><rpm@xenomai.org>
>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>pinctrl device
>
>On 30.08.23 05:05, Chen, Hongzhan wrote:
>> Hi Philippe/Jan
>>
>> We found there is irq storm issue like [1] on sd/mmc card detect interrupt
>alloced from
>> pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>
>Is this 5.10-only or also affecting newer dovetail kernels?
>
>> According to our test,  the patch [2] works for such issue in our case. It
>seems that handle_edge_irq
>> never ack the irq to clear the interrupt status for the Linux gpio interrupt.
>>
>> I am wondering if there is any side-effect for this patch, Please suggest.
>>
>
>Reminds me a bit of Florian's issue, though he was seeing that with
>level-triggered interrupts.
>
>Are those interrupts surely edge-triggered? Just to exclude that we are
>seeing a wrong handling in general.

Yes, the GPIO interrupt is set edge triggred type. After patch, the interrupt count seems correct like following.

14:          2          0          0          0  IR-IO-APIC   14    -fasteoi   INTC1020:00, INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1 cd

Regards

Hongzhan Chen
>
>> Regards
>>
>> Hongzhan Chen
>>
>> [1]:
>> cat /proc/interrupts
>>   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi   INTC1020:00,
>INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>> 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
>>
>> [2]:
>> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>> index 6aca6c036ada..4d08eaf076e9 100644
>> --- a/kernel/irq/chip.c
>> +++ b/kernel/irq/chip.c
>> @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc *desc)
>>         kstat_incr_irqs_this_cpu(desc);
>>
>>         /* Start handling the irq */
>> -       if (!irqs_pipelined())
>>                 chip->irq_ack(&desc->irq_data);
>>
>>
>
>Looking at a larger block of this function
>
>	if (start_irq_flow()) {
>		desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
>
>		if (!irq_may_run(desc)) {
>			desc->istate |= IRQS_PENDING;
>			mask_ack_irq(desc);
>			goto out_unlock;
>		}
>
>		/*
>		 * If its disabled or no action available then mask it
>		 * and get out of here.
>		 */
>		if (irqd_irq_disabled(&desc->irq_data) || !desc->action) {
>			desc->istate |= IRQS_PENDING;
>			mask_ack_irq(desc);
>			goto out_unlock;
>		}
>	}
>
>	if (on_pipeline_entry()) {
>		chip->irq_ack(&desc->irq_data);
>		desc->istate |= IRQS_EDGE;
>		handle_oob_irq(desc);
>		goto out_unlock;
>	}
>
>	kstat_incr_irqs_this_cpu(desc);
>
>	/* Start handling the irq */
>	if (!irqs_pipelined())
>		chip->irq_ack(&desc->irq_data);
>
>
>it is indeed unclear to me why that !irqs_pipelined() check is there.
>When we are entering with irqs_pipelined() and !in_pipeline(), neither
>start_irq_flow() nor on_pipeline_entry() will takes care of the ack. Who
>else will do it instead?

Yes, that is the problem. For oob interrupt ,it will be  took care of by on_pipeline_entry.

Regards

Hongzhan Chen
>
>Jan
>
>--
>Siemens AG, Technology
>Linux Expert Center


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-08-30  7:20 ` Florian Bezdeka
@ 2023-08-30  7:54   ` Chen, Hongzhan
  2023-08-30 13:23     ` Florian Bezdeka
  0 siblings, 1 reply; 21+ messages in thread
From: Chen, Hongzhan @ 2023-08-30  7:54 UTC (permalink / raw)
  To: Bezdeka, Florian, xenomai@lists.linux.dev



>-----Original Message-----
>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>Sent: Wednesday, August 30, 2023 3:20 PM
>To: Chen, Hongzhan <hongzhan.chen@intel.com>; xenomai@lists.linux.dev
>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>pinctrl device
>
>On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>> Hi Philippe/Jan
>>
>> We found there is irq storm issue like [1] on sd/mmc card detect interrupt
>alloced from
>> pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>> According to our test,  the patch [2] works for such issue in our case. It
>seems that handle_edge_irq
>> never ack the irq to clear the interrupt status for the Linux gpio interrupt.
>>
>> I am wondering if there is any side-effect for this patch, Please suggest.
>>
>> Regards
>>
>> Hongzhan Chen
>>
>> [1]:
>> cat /proc/interrupts
>>   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi   INTC1020:00,
>INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>> 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
>>
>> [2]:
>> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>> index 6aca6c036ada..4d08eaf076e9 100644
>> --- a/kernel/irq/chip.c
>> +++ b/kernel/irq/chip.c
>> @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc *desc)
>>         kstat_incr_irqs_this_cpu(desc);
>>
>>         /* Start handling the irq */
>> -       if (!irqs_pipelined())
>>                 chip->irq_ack(&desc->irq_data);
>>
>>
>
>That doesn't make much sense. A IRQ storm means that we unmasked the
>IRQ to early or forgot to mask it.
>
>A forgotten ACK would mean the IRQ stalls, so it would never be fired
>again. No?

According to the test, it seems forgotten ACK will keep firing interrupt here like
following comparing:

After patching:
128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1 cd

Before patching:
128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd

Regards

Hongzhan Chen
>
>Could you please check if we call unmask_irq() with HW IRQs enabled? If
>so, I have an idea what is going on...
>
>Florian


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-08-30  7:54   ` Chen, Hongzhan
@ 2023-08-30 13:23     ` Florian Bezdeka
  2023-08-31  0:45       ` Chen, Hongzhan
  0 siblings, 1 reply; 21+ messages in thread
From: Florian Bezdeka @ 2023-08-30 13:23 UTC (permalink / raw)
  To: Chen, Hongzhan; +Cc: Kiszka, Jan, xenomai@lists.linux.dev, Philippe Gerum

On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
> 
> > -----Original Message-----
> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
> > Sent: Wednesday, August 30, 2023 3:20 PM
> > To: Chen, Hongzhan <hongzhan.chen@intel.com>; xenomai@lists.linux.dev
> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
> > pinctrl device
> > 
> > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
> > > Hi Philippe/Jan
> > > 
> > > We found there is irq storm issue like [1] on sd/mmc card detect interrupt
> > alloced from
> > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
> > > According to our test,  the patch [2] works for such issue in our case. It
> > seems that handle_edge_irq
> > > never ack the irq to clear the interrupt status for the Linux gpio interrupt.
> > > 
> > > I am wondering if there is any side-effect for this patch, Please suggest.
> > > 
> > > Regards
> > > 
> > > Hongzhan Chen
> > > 
> > > [1]:
> > > cat /proc/interrupts
> > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi   INTC1020:00,
> > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
> > > 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
> > > 
> > > [2]:
> > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> > > index 6aca6c036ada..4d08eaf076e9 100644
> > > --- a/kernel/irq/chip.c
> > > +++ b/kernel/irq/chip.c
> > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc *desc)
> > >         kstat_incr_irqs_this_cpu(desc);
> > > 
> > >         /* Start handling the irq */
> > > -       if (!irqs_pipelined())
> > >                 chip->irq_ack(&desc->irq_data);
> > > 
> > > 
> > 
> > That doesn't make much sense. A IRQ storm means that we unmasked the
> > IRQ to early or forgot to mask it.
> > 
> > A forgotten ACK would mean the IRQ stalls, so it would never be fired
> > again. No?
> 
> According to the test, it seems forgotten ACK will keep firing interrupt here like
> following comparing:

This doesn't make sense (IMHO).
Can you provide a callstack from handle_edge_irq() in case
on_pipeline_entry() returns false?

The IRQ should be ACKed on pipeline entry already, even if
handle_edge_irq() is called again (from Linux/Inband context) the IRQ
should already be ACKed.

But: If we use the dovetail sources to build a kernel with pipelining
disabled we have to ACK the IRQ. I assume pipelining is active on your
end.

As Jan said, I'm searching for a similar problem for some time now that
I can only reproduce on VirtualBox VMM. In my case I'm seeing a stall,
not a storm. I will give it a try, let's see.

> 
> After patching:
> 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
> 
> Before patching:
> 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
> 
> Regards
> 
> Hongzhan Chen
> > 
> > Could you please check if we call unmask_irq() with HW IRQs enabled? If
> > so, I have an idea what is going on...
> > 
> > Florian
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-08-30 13:23     ` Florian Bezdeka
@ 2023-08-31  0:45       ` Chen, Hongzhan
  2023-08-31  7:09         ` Florian Bezdeka
  0 siblings, 1 reply; 21+ messages in thread
From: Chen, Hongzhan @ 2023-08-31  0:45 UTC (permalink / raw)
  To: Bezdeka, Florian; +Cc: Kiszka, Jan, xenomai@lists.linux.dev, Philippe Gerum



>-----Original Message-----
>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>Sent: Wednesday, August 30, 2023 9:23 PM
>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev; Philippe
>Gerum <rpm@xenomai.org>
>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>pinctrl device
>
>On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>
>> > -----Original Message-----
>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > Sent: Wednesday, August 30, 2023 3:20 PM
>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>xenomai@lists.linux.dev
>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>> > pinctrl device
>> >
>> > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>> > > Hi Philippe/Jan
>> > >
>> > > We found there is irq storm issue like [1] on sd/mmc card detect
>interrupt
>> > alloced from
>> > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>> > > According to our test,  the patch [2] works for such issue in our case. It
>> > seems that handle_edge_irq
>> > > never ack the irq to clear the interrupt status for the Linux gpio interrupt.
>> > >
>> > > I am wondering if there is any side-effect for this patch, Please suggest.
>> > >
>> > > Regards
>> > >
>> > > Hongzhan Chen
>> > >
>> > > [1]:
>> > > cat /proc/interrupts
>> > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>INTC1020:00,
>> > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>> > > 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
>> > >
>> > > [2]:
>> > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>> > > index 6aca6c036ada..4d08eaf076e9 100644
>> > > --- a/kernel/irq/chip.c
>> > > +++ b/kernel/irq/chip.c
>> > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc *desc)
>> > >         kstat_incr_irqs_this_cpu(desc);
>> > >
>> > >         /* Start handling the irq */
>> > > -       if (!irqs_pipelined())
>> > >                 chip->irq_ack(&desc->irq_data);
>> > >
>> > >
>> >
>> > That doesn't make much sense. A IRQ storm means that we unmasked the
>> > IRQ to early or forgot to mask it.
>> >
>> > A forgotten ACK would mean the IRQ stalls, so it would never be fired
>> > again. No?
>>
>> According to the test, it seems forgotten ACK will keep firing interrupt here
>like
>> following comparing:
>
>This doesn't make sense (IMHO).
>Can you provide a callstack from handle_edge_irq() in case
>on_pipeline_entry() returns false?
>
>The IRQ should be ACKed on pipeline entry already, even if
>handle_edge_irq() is called again (from Linux/Inband context) the IRQ
>should already be ACKed.

Following stack is dumped after patching my patch. Without patching there is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from which it already called irq_exit_pipeline before sync_current_irq_stage. That is why on_pipeline_entry returned false in handle_edge_irq



[  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U            5.10.179-intel-ese-standard-lts-dovetail+ #1
[  291.009331] Hardware name: Intel Corporation Elkhart Lake Embedded Platform/ElkhartLake LPDDR4x T3 CRB, BIOS EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
[  291.009332] IRQ stage: Linux
[  291.009332] Call Trace:
[  291.009332]  <IRQ>
[  291.009333]  dump_stack+0x7b/0x93
[  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
[  291.009334]  handle_edge_irq+0xe4/0x310
[  291.009334]  generic_handle_irq+0x4b/0x60
[  291.009335]  intel_gpio_irq+0x11f/0x1c0
[  291.009335]  __handle_irq_event_percpu+0x77/0x190
[  291.009336]  ? handle_level_irq+0x200/0x200
[  291.009336]  handle_irq_event+0x6c/0x100
[  291.009337]  handle_fasteoi_irq+0xd8/0x250
[  291.009337]  asm_call_irq_on_stack+0xf/0x20
[  291.009338]  </IRQ>
[  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
[  291.009339]  sync_current_irq_stage+0x157/0x1f0
[  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
[  291.009340]  arch_pipeline_entry+0xdb/0x120
[  291.009340]  asm_common_interrupt+0x1e/0x40
[  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
[  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b 1c 25 00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8> 08 75 13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
[  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202

Regards

Hongzhan Chen
>
>But: If we use the dovetail sources to build a kernel with pipelining
>disabled we have to ACK the IRQ. I assume pipelining is active on your
>end.
>
>As Jan said, I'm searching for a similar problem for some time now that
>I can only reproduce on VirtualBox VMM. In my case I'm seeing a stall,
>not a storm. I will give it a try, let's see.
>
>>
>> After patching:
>> 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
>>
>> Before patching:
>> 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
>>
>> Regards
>>
>> Hongzhan Chen
>> >
>> > Could you please check if we call unmask_irq() with HW IRQs enabled? If
>> > so, I have an idea what is going on...
>> >
>> > Florian
>>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-08-31  0:45       ` Chen, Hongzhan
@ 2023-08-31  7:09         ` Florian Bezdeka
  2023-08-31  7:39           ` Chen, Hongzhan
  0 siblings, 1 reply; 21+ messages in thread
From: Florian Bezdeka @ 2023-08-31  7:09 UTC (permalink / raw)
  To: Chen, Hongzhan; +Cc: Kiszka, Jan, xenomai@lists.linux.dev, Philippe Gerum

On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
> 
> > -----Original Message-----
> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
> > Sent: Wednesday, August 30, 2023 9:23 PM
> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev; Philippe
> > Gerum <rpm@xenomai.org>
> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
> > pinctrl device
> > 
> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
> > > 
> > > > -----Original Message-----
> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
> > > > Sent: Wednesday, August 30, 2023 3:20 PM
> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
> > xenomai@lists.linux.dev
> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
> > > > pinctrl device
> > > > 
> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
> > > > > Hi Philippe/Jan
> > > > > 
> > > > > We found there is irq storm issue like [1] on sd/mmc card detect
> > interrupt
> > > > alloced from
> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
> > > > > According to our test,  the patch [2] works for such issue in our case. It
> > > > seems that handle_edge_irq
> > > > > never ack the irq to clear the interrupt status for the Linux gpio interrupt.
> > > > > 
> > > > > I am wondering if there is any side-effect for this patch, Please suggest.
> > > > > 
> > > > > Regards
> > > > > 
> > > > > Hongzhan Chen
> > > > > 
> > > > > [1]:
> > > > > cat /proc/interrupts
> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
> > INTC1020:00,
> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
> > > > > 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
> > > > > 
> > > > > [2]:
> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> > > > > index 6aca6c036ada..4d08eaf076e9 100644
> > > > > --- a/kernel/irq/chip.c
> > > > > +++ b/kernel/irq/chip.c
> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc *desc)
> > > > >         kstat_incr_irqs_this_cpu(desc);
> > > > > 
> > > > >         /* Start handling the irq */
> > > > > -       if (!irqs_pipelined())
> > > > >                 chip->irq_ack(&desc->irq_data);
> > > > > 
> > > > > 
> > > > 
> > > > That doesn't make much sense. A IRQ storm means that we unmasked the
> > > > IRQ to early or forgot to mask it.
> > > > 
> > > > A forgotten ACK would mean the IRQ stalls, so it would never be fired
> > > > again. No?
> > > 
> > > According to the test, it seems forgotten ACK will keep firing interrupt here
> > like
> > > following comparing:
> > 
> > This doesn't make sense (IMHO).
> > Can you provide a callstack from handle_edge_irq() in case
> > on_pipeline_entry() returns false?
> > 
> > The IRQ should be ACKed on pipeline entry already, even if
> > handle_edge_irq() is called again (from Linux/Inband context) the IRQ
> > should already be ACKed.
> 
> Following stack is dumped after patching my patch. Without patching there is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from which it already called irq_exit_pipeline before sync_current_irq_stage. That is why on_pipeline_entry returned false in handle_edge_irq
> 
> 
> 
> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U            5.10.179-intel-ese-standard-lts-dovetail+ #1
> [  291.009331] Hardware name: Intel Corporation Elkhart Lake Embedded Platform/ElkhartLake LPDDR4x T3 CRB, BIOS EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
> [  291.009332] IRQ stage: Linux
> [  291.009332] Call Trace:
> [  291.009332]  <IRQ>
> [  291.009333]  dump_stack+0x7b/0x93
> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
> [  291.009334]  handle_edge_irq+0xe4/0x310
> [  291.009334]  generic_handle_irq+0x4b/0x60
> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
> [  291.009336]  ? handle_level_irq+0x200/0x200
> [  291.009336]  handle_irq_event+0x6c/0x100
> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
> [  291.009338]  </IRQ>
> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
                   ^^
My understanding is that you are not handling an HW IRQ, you are in the
stage where the inband IRQ log is being processed. HW IRQs are already
ACKed at that stage.

Maybe we have to understand your IRQ chip routing. Is there an IO APIC
involved? handle_fasteoi_irq() would be the first level IRQ entry point
which takes care of masking + ack.

> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
> [  291.009340]  arch_pipeline_entry+0xdb/0x120
> [  291.009340]  asm_common_interrupt+0x1e/0x40
> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b 1c 25 00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8> 08 75 13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
> 
> Regards
> 
> Hongzhan Chen
> > 
> > But: If we use the dovetail sources to build a kernel with pipelining
> > disabled we have to ACK the IRQ. I assume pipelining is active on your
> > end.
> > 
> > As Jan said, I'm searching for a similar problem for some time now that
> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a stall,
> > not a storm. I will give it a try, let's see.
> > 
> > > 
> > > After patching:
> > > 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
> > > 
> > > Before patching:
> > > 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
> > > 
> > > Regards
> > > 
> > > Hongzhan Chen
> > > > 
> > > > Could you please check if we call unmask_irq() with HW IRQs enabled? If
> > > > so, I have an idea what is going on...
> > > > 
> > > > Florian
> > > 
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-08-31  7:09         ` Florian Bezdeka
@ 2023-08-31  7:39           ` Chen, Hongzhan
  2023-08-31  8:06             ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Chen, Hongzhan @ 2023-08-31  7:39 UTC (permalink / raw)
  To: Bezdeka, Florian; +Cc: Kiszka, Jan, xenomai@lists.linux.dev, Philippe Gerum



>-----Original Message-----
>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>Sent: Thursday, August 31, 2023 3:10 PM
>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev; Philippe
>Gerum <rpm@xenomai.org>
>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>pinctrl device
>
>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>
>> > -----Original Message-----
>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > Sent: Wednesday, August 30, 2023 9:23 PM
>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>Philippe
>> > Gerum <rpm@xenomai.org>
>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>> > pinctrl device
>> >
>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>> > >
>> > > > -----Original Message-----
>> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > > > Sent: Wednesday, August 30, 2023 3:20 PM
>> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>> > xenomai@lists.linux.dev
>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>from
>> > > > pinctrl device
>> > > >
>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>> > > > > Hi Philippe/Jan
>> > > > >
>> > > > > We found there is irq storm issue like [1] on sd/mmc card detect
>> > interrupt
>> > > > alloced from
>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>> > > > > According to our test,  the patch [2] works for such issue in our case.
>It
>> > > > seems that handle_edge_irq
>> > > > > never ack the irq to clear the interrupt status for the Linux gpio
>interrupt.
>> > > > >
>> > > > > I am wondering if there is any side-effect for this patch, Please
>suggest.
>> > > > >
>> > > > > Regards
>> > > > >
>> > > > > Hongzhan Chen
>> > > > >
>> > > > > [1]:
>> > > > > cat /proc/interrupts
>> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>> > INTC1020:00,
>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>> > > > > 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1
>cd
>> > > > >
>> > > > > [2]:
>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>> > > > > index 6aca6c036ada..4d08eaf076e9 100644
>> > > > > --- a/kernel/irq/chip.c
>> > > > > +++ b/kernel/irq/chip.c
>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc *desc)
>> > > > >         kstat_incr_irqs_this_cpu(desc);
>> > > > >
>> > > > >         /* Start handling the irq */
>> > > > > -       if (!irqs_pipelined())
>> > > > >                 chip->irq_ack(&desc->irq_data);
>> > > > >
>> > > > >
>> > > >
>> > > > That doesn't make much sense. A IRQ storm means that we unmasked
>the
>> > > > IRQ to early or forgot to mask it.
>> > > >
>> > > > A forgotten ACK would mean the IRQ stalls, so it would never be fired
>> > > > again. No?
>> > >
>> > > According to the test, it seems forgotten ACK will keep firing interrupt
>here
>> > like
>> > > following comparing:
>> >
>> > This doesn't make sense (IMHO).
>> > Can you provide a callstack from handle_edge_irq() in case
>> > on_pipeline_entry() returns false?
>> >
>> > The IRQ should be ACKed on pipeline entry already, even if
>> > handle_edge_irq() is called again (from Linux/Inband context) the IRQ
>> > should already be ACKed.
>>
>> Following stack is dumped after patching my patch. Without patching there
>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from which
>it already called irq_exit_pipeline before sync_current_irq_stage. That is why
>on_pipeline_entry returned false in handle_edge_irq
>>
>>
>>
>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>5.10.179-intel-ese-standard-lts-dovetail+ #1
>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake Embedded
>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>> [  291.009332] IRQ stage: Linux
>> [  291.009332] Call Trace:
>> [  291.009332]  <IRQ>
>> [  291.009333]  dump_stack+0x7b/0x93
>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>> [  291.009334]  handle_edge_irq+0xe4/0x310
>> [  291.009334]  generic_handle_irq+0x4b/0x60
>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>> [  291.009336]  ? handle_level_irq+0x200/0x200
>> [  291.009336]  handle_irq_event+0x6c/0x100
>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>> [  291.009338]  </IRQ>
>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>                   ^^
>My understanding is that you are not handling an HW IRQ, you are in the
>stage where the inband IRQ log is being processed. HW IRQs are already
>ACKed at that stage.
>
>Maybe we have to understand your IRQ chip routing. Is there an IO APIC
>involved? handle_fasteoi_irq() would be the first level IRQ entry point
>which takes care of masking + ack.

Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is registered in io_apic driver. There is second level pinctrl device acting as gpiochip which has irq domains where sd/mmc cd interrupt alloced from , which need to be acked in handle_edge_irq.

Regards

Hongzhan Chen
>
>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b 1c 25
>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8> 08 75
>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>
>> Regards
>>
>> Hongzhan Chen
>> >
>> > But: If we use the dovetail sources to build a kernel with pipelining
>> > disabled we have to ACK the IRQ. I assume pipelining is active on your
>> > end.
>> >
>> > As Jan said, I'm searching for a similar problem for some time now that
>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a stall,
>> > not a storm. I will give it a try, let's see.
>> >
>> > >
>> > > After patching:
>> > > 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
>> > >
>> > > Before patching:
>> > > 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
>> > >
>> > > Regards
>> > >
>> > > Hongzhan Chen
>> > > >
>> > > > Could you please check if we call unmask_irq() with HW IRQs enabled?
>If
>> > > > so, I have an idea what is going on...
>> > > >
>> > > > Florian
>> > >
>>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-08-31  7:39           ` Chen, Hongzhan
@ 2023-08-31  8:06             ` Philippe Gerum
  2023-08-31  8:35               ` Chen, Hongzhan
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2023-08-31  8:06 UTC (permalink / raw)
  To: Chen, Hongzhan; +Cc: Bezdeka, Florian, Kiszka, Jan, xenomai@lists.linux.dev


"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:

>>-----Original Message-----
>>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>Sent: Thursday, August 31, 2023 3:10 PM
>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev; Philippe
>>Gerum <rpm@xenomai.org>
>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>pinctrl device
>>
>>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>>
>>> > -----Original Message-----
>>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>> > Sent: Wednesday, August 30, 2023 9:23 PM
>>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>Philippe
>>> > Gerum <rpm@xenomai.org>
>>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>> > pinctrl device
>>> >
>>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>> > >
>>> > > > -----Original Message-----
>>> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>> > > > Sent: Wednesday, August 30, 2023 3:20 PM
>>> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>>> > xenomai@lists.linux.dev
>>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>from
>>> > > > pinctrl device
>>> > > >
>>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>>> > > > > Hi Philippe/Jan
>>> > > > >
>>> > > > > We found there is irq storm issue like [1] on sd/mmc card detect
>>> > interrupt
>>> > > > alloced from
>>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>>> > > > > According to our test,  the patch [2] works for such issue in our case.
>>It
>>> > > > seems that handle_edge_irq
>>> > > > > never ack the irq to clear the interrupt status for the Linux gpio
>>interrupt.
>>> > > > >
>>> > > > > I am wondering if there is any side-effect for this patch, Please
>>suggest.
>>> > > > >
>>> > > > > Regards
>>> > > > >
>>> > > > > Hongzhan Chen
>>> > > > >
>>> > > > > [1]:
>>> > > > > cat /proc/interrupts
>>> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>>> > INTC1020:00,
>>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>>> > > > > 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1
>>cd
>>> > > > >
>>> > > > > [2]:
>>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>> > > > > index 6aca6c036ada..4d08eaf076e9 100644
>>> > > > > --- a/kernel/irq/chip.c
>>> > > > > +++ b/kernel/irq/chip.c
>>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc *desc)
>>> > > > >         kstat_incr_irqs_this_cpu(desc);
>>> > > > >
>>> > > > >         /* Start handling the irq */
>>> > > > > -       if (!irqs_pipelined())
>>> > > > >                 chip->irq_ack(&desc->irq_data);
>>> > > > >
>>> > > > >
>>> > > >
>>> > > > That doesn't make much sense. A IRQ storm means that we unmasked
>>the
>>> > > > IRQ to early or forgot to mask it.
>>> > > >
>>> > > > A forgotten ACK would mean the IRQ stalls, so it would never be fired
>>> > > > again. No?
>>> > >
>>> > > According to the test, it seems forgotten ACK will keep firing interrupt
>>here
>>> > like
>>> > > following comparing:
>>> >
>>> > This doesn't make sense (IMHO).
>>> > Can you provide a callstack from handle_edge_irq() in case
>>> > on_pipeline_entry() returns false?
>>> >
>>> > The IRQ should be ACKed on pipeline entry already, even if
>>> > handle_edge_irq() is called again (from Linux/Inband context) the IRQ
>>> > should already be ACKed.
>>>
>>> Following stack is dumped after patching my patch. Without patching there
>>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from which
>>it already called irq_exit_pipeline before sync_current_irq_stage. That is why
>>on_pipeline_entry returned false in handle_edge_irq
>>>
>>>
>>>
>>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>>5.10.179-intel-ese-standard-lts-dovetail+ #1
>>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake Embedded
>>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>>> [  291.009332] IRQ stage: Linux
>>> [  291.009332] Call Trace:
>>> [  291.009332]  <IRQ>
>>> [  291.009333]  dump_stack+0x7b/0x93
>>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>>> [  291.009334]  handle_edge_irq+0xe4/0x310
>>> [  291.009334]  generic_handle_irq+0x4b/0x60
>>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>>> [  291.009336]  ? handle_level_irq+0x200/0x200
>>> [  291.009336]  handle_irq_event+0x6c/0x100
>>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>>> [  291.009338]  </IRQ>
>>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>>                   ^^
>>My understanding is that you are not handling an HW IRQ, you are in the
>>stage where the inband IRQ log is being processed. HW IRQs are already
>>ACKed at that stage.
>>
>>Maybe we have to understand your IRQ chip routing. Is there an IO APIC
>>involved? handle_fasteoi_irq() would be the first level IRQ entry point
>>which takes care of masking + ack.
>
> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is registered in io_apic driver. There is second level pinctrl device acting as gpiochip which has irq domains where sd/mmc cd interrupt alloced from , which need to be acked in handle_edge_irq.
>
> Regards
>
> Hongzhan Chen
>>
>>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b 1c 25
>>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8> 08 75
>>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>>
>>> Regards
>>>
>>> Hongzhan Chen
>>> >
>>> > But: If we use the dovetail sources to build a kernel with pipelining
>>> > disabled we have to ACK the IRQ. I assume pipelining is active on your
>>> > end.
>>> >
>>> > As Jan said, I'm searching for a similar problem for some time now that
>>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a stall,
>>> > not a storm. I will give it a try, let's see.
>>> >
>>> > >
>>> > > After patching:
>>> > > 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
>>> > >
>>> > > Before patching:
>>> > > 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
>>> > >
>>> > > Regards
>>> > >
>>> > > Hongzhan Chen
>>> > > >
>>> > > > Could you please check if we call unmask_irq() with HW IRQs enabled?
>>If
>>> > > > so, I have an idea what is going on...
>>> > > >
>>> > > > Florian
>>> > >
>>>

To start with, prior to any further analysis:

diff --git a/drivers/pinctrl/intel/pinctrl-intel.h b/drivers/pinctrl/intel/pinctrl-intel.h
index 65628423bf639..b7afa763ee079 100644
--- a/drivers/pinctrl/intel/pinctrl-intel.h
+++ b/drivers/pinctrl/intel/pinctrl-intel.h
@@ -222,7 +222,7 @@ struct intel_pinctrl_context {
  */
 struct intel_pinctrl {
 	struct device *dev;
-	raw_spinlock_t lock;
+	hard_spinlock_t lock;
 	struct pinctrl_desc pctldesc;
 	struct pinctrl_dev *pctldev;
 	struct gpio_chip chip;
        
-- 
Philippe.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* RE: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-08-31  8:06             ` Philippe Gerum
@ 2023-08-31  8:35               ` Chen, Hongzhan
  2023-08-31  8:56                 ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Chen, Hongzhan @ 2023-08-31  8:35 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Bezdeka, Florian, Kiszka, Jan, xenomai@lists.linux.dev



>-----Original Message-----
>From: Philippe Gerum <rpm@xenomai.org>
>Sent: Thursday, August 31, 2023 4:06 PM
>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>pinctrl device
>
>
>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>
>>>-----Original Message-----
>>>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>Sent: Thursday, August 31, 2023 3:10 PM
>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>Philippe
>>>Gerum <rpm@xenomai.org>
>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>pinctrl device
>>>
>>>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>>>
>>>> > -----Original Message-----
>>>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>> > Sent: Wednesday, August 30, 2023 9:23 PM
>>>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>Philippe
>>>> > Gerum <rpm@xenomai.org>
>>>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>from
>>>> > pinctrl device
>>>> >
>>>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>>> > >
>>>> > > > -----Original Message-----
>>>> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>> > > > Sent: Wednesday, August 30, 2023 3:20 PM
>>>> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>>>> > xenomai@lists.linux.dev
>>>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>>from
>>>> > > > pinctrl device
>>>> > > >
>>>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>>>> > > > > Hi Philippe/Jan
>>>> > > > >
>>>> > > > > We found there is irq storm issue like [1] on sd/mmc card detect
>>>> > interrupt
>>>> > > > alloced from
>>>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>>>> > > > > According to our test,  the patch [2] works for such issue in our
>case.
>>>It
>>>> > > > seems that handle_edge_irq
>>>> > > > > never ack the irq to clear the interrupt status for the Linux gpio
>>>interrupt.
>>>> > > > >
>>>> > > > > I am wondering if there is any side-effect for this patch, Please
>>>suggest.
>>>> > > > >
>>>> > > > > Regards
>>>> > > > >
>>>> > > > > Hongzhan Chen
>>>> > > > >
>>>> > > > > [1]:
>>>> > > > > cat /proc/interrupts
>>>> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>>>> > INTC1020:00,
>>>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>>>> > > > > 128:    2440190          0          0          0  INTC1020:01  112
>0000:00:1a.1
>>>cd
>>>> > > > >
>>>> > > > > [2]:
>>>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>> > > > > index 6aca6c036ada..4d08eaf076e9 100644
>>>> > > > > --- a/kernel/irq/chip.c
>>>> > > > > +++ b/kernel/irq/chip.c
>>>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc
>*desc)
>>>> > > > >         kstat_incr_irqs_this_cpu(desc);
>>>> > > > >
>>>> > > > >         /* Start handling the irq */
>>>> > > > > -       if (!irqs_pipelined())
>>>> > > > >                 chip->irq_ack(&desc->irq_data);
>>>> > > > >
>>>> > > > >
>>>> > > >
>>>> > > > That doesn't make much sense. A IRQ storm means that we
>unmasked
>>>the
>>>> > > > IRQ to early or forgot to mask it.
>>>> > > >
>>>> > > > A forgotten ACK would mean the IRQ stalls, so it would never be
>fired
>>>> > > > again. No?
>>>> > >
>>>> > > According to the test, it seems forgotten ACK will keep firing interrupt
>>>here
>>>> > like
>>>> > > following comparing:
>>>> >
>>>> > This doesn't make sense (IMHO).
>>>> > Can you provide a callstack from handle_edge_irq() in case
>>>> > on_pipeline_entry() returns false?
>>>> >
>>>> > The IRQ should be ACKed on pipeline entry already, even if
>>>> > handle_edge_irq() is called again (from Linux/Inband context) the IRQ
>>>> > should already be ACKed.
>>>>
>>>> Following stack is dumped after patching my patch. Without patching
>there
>>>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from
>which
>>>it already called irq_exit_pipeline before sync_current_irq_stage. That is
>why
>>>on_pipeline_entry returned false in handle_edge_irq
>>>>
>>>>
>>>>
>>>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>>>5.10.179-intel-ese-standard-lts-dovetail+ #1
>>>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake Embedded
>>>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>>>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>>>> [  291.009332] IRQ stage: Linux
>>>> [  291.009332] Call Trace:
>>>> [  291.009332]  <IRQ>
>>>> [  291.009333]  dump_stack+0x7b/0x93
>>>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>>>> [  291.009334]  handle_edge_irq+0xe4/0x310
>>>> [  291.009334]  generic_handle_irq+0x4b/0x60
>>>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>>>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>>>> [  291.009336]  ? handle_level_irq+0x200/0x200
>>>> [  291.009336]  handle_irq_event+0x6c/0x100
>>>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>>>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>>>> [  291.009338]  </IRQ>
>>>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>>>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>>>                   ^^
>>>My understanding is that you are not handling an HW IRQ, you are in the
>>>stage where the inband IRQ log is being processed. HW IRQs are already
>>>ACKed at that stage.
>>>
>>>Maybe we have to understand your IRQ chip routing. Is there an IO APIC
>>>involved? handle_fasteoi_irq() would be the first level IRQ entry point
>>>which takes care of masking + ack.
>>
>> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is
>registered in io_apic driver. There is second level pinctrl device acting as
>gpiochip which has irq domains where sd/mmc cd interrupt alloced from ,
>which need to be acked in handle_edge_irq.
>>
>> Regards
>>
>> Hongzhan Chen
>>>
>>>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>>>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>>>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>>>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>>>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b 1c
>25
>>>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8> 08
>75
>>>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>>>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>>>
>>>> Regards
>>>>
>>>> Hongzhan Chen
>>>> >
>>>> > But: If we use the dovetail sources to build a kernel with pipelining
>>>> > disabled we have to ACK the IRQ. I assume pipelining is active on your
>>>> > end.
>>>> >
>>>> > As Jan said, I'm searching for a similar problem for some time now that
>>>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a stall,
>>>> > not a storm. I will give it a try, let's see.
>>>> >
>>>> > >
>>>> > > After patching:
>>>> > > 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
>>>> > >
>>>> > > Before patching:
>>>> > > 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1
>cd
>>>> > >
>>>> > > Regards
>>>> > >
>>>> > > Hongzhan Chen
>>>> > > >
>>>> > > > Could you please check if we call unmask_irq() with HW IRQs
>enabled?
>>>If
>>>> > > > so, I have an idea what is going on...
>>>> > > >
>>>> > > > Florian
>>>> > >
>>>>
>
>To start with, prior to any further analysis:
>
>diff --git a/drivers/pinctrl/intel/pinctrl-intel.h b/drivers/pinctrl/intel/pinctrl-
>intel.h
>index 65628423bf639..b7afa763ee079 100644
>--- a/drivers/pinctrl/intel/pinctrl-intel.h
>+++ b/drivers/pinctrl/intel/pinctrl-intel.h
>@@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>  */
> struct intel_pinctrl {
> 	struct device *dev;
>-	raw_spinlock_t lock;
>+	hard_spinlock_t lock;

I already modified it at first when we found the issue.  But it does not work with it.

After patching my patch[2] I mentioned, the issue disappear finally.

Regards

Hongzhan Chen

> 	struct pinctrl_desc pctldesc;
> 	struct pinctrl_dev *pctldev;
> 	struct gpio_chip chip;
>
>--
>Philippe.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-08-31  8:35               ` Chen, Hongzhan
@ 2023-08-31  8:56                 ` Philippe Gerum
  2023-09-01  2:28                   ` Chen, Hongzhan
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2023-08-31  8:56 UTC (permalink / raw)
  To: Chen, Hongzhan; +Cc: Bezdeka, Florian, Kiszka, Jan, xenomai@lists.linux.dev


"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:

>>-----Original Message-----
>>From: Philippe Gerum <rpm@xenomai.org>
>>Sent: Thursday, August 31, 2023 4:06 PM
>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>pinctrl device
>>
>>
>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>
>>>>-----Original Message-----
>>>>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>Sent: Thursday, August 31, 2023 3:10 PM
>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>Philippe
>>>>Gerum <rpm@xenomai.org>
>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>pinctrl device
>>>>
>>>>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>>>>
>>>>> > -----Original Message-----
>>>>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>> > Sent: Wednesday, August 30, 2023 9:23 PM
>>>>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>Philippe
>>>>> > Gerum <rpm@xenomai.org>
>>>>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>from
>>>>> > pinctrl device
>>>>> >
>>>>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>>>> > >
>>>>> > > > -----Original Message-----
>>>>> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>> > > > Sent: Wednesday, August 30, 2023 3:20 PM
>>>>> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>>>>> > xenomai@lists.linux.dev
>>>>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>>>from
>>>>> > > > pinctrl device
>>>>> > > >
>>>>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>>>>> > > > > Hi Philippe/Jan
>>>>> > > > >
>>>>> > > > > We found there is irq storm issue like [1] on sd/mmc card detect
>>>>> > interrupt
>>>>> > > > alloced from
>>>>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>>>>> > > > > According to our test,  the patch [2] works for such issue in our
>>case.
>>>>It
>>>>> > > > seems that handle_edge_irq
>>>>> > > > > never ack the irq to clear the interrupt status for the Linux gpio
>>>>interrupt.
>>>>> > > > >
>>>>> > > > > I am wondering if there is any side-effect for this patch, Please
>>>>suggest.
>>>>> > > > >
>>>>> > > > > Regards
>>>>> > > > >
>>>>> > > > > Hongzhan Chen
>>>>> > > > >
>>>>> > > > > [1]:
>>>>> > > > > cat /proc/interrupts
>>>>> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>>>>> > INTC1020:00,
>>>>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>>>>> > > > > 128:    2440190          0          0          0  INTC1020:01  112
>>0000:00:1a.1
>>>>cd
>>>>> > > > >
>>>>> > > > > [2]:
>>>>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>>> > > > > index 6aca6c036ada..4d08eaf076e9 100644
>>>>> > > > > --- a/kernel/irq/chip.c
>>>>> > > > > +++ b/kernel/irq/chip.c
>>>>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc
>>*desc)
>>>>> > > > >         kstat_incr_irqs_this_cpu(desc);
>>>>> > > > >
>>>>> > > > >         /* Start handling the irq */
>>>>> > > > > -       if (!irqs_pipelined())
>>>>> > > > >                 chip->irq_ack(&desc->irq_data);
>>>>> > > > >
>>>>> > > > >
>>>>> > > >
>>>>> > > > That doesn't make much sense. A IRQ storm means that we
>>unmasked
>>>>the
>>>>> > > > IRQ to early or forgot to mask it.
>>>>> > > >
>>>>> > > > A forgotten ACK would mean the IRQ stalls, so it would never be
>>fired
>>>>> > > > again. No?
>>>>> > >
>>>>> > > According to the test, it seems forgotten ACK will keep firing interrupt
>>>>here
>>>>> > like
>>>>> > > following comparing:
>>>>> >
>>>>> > This doesn't make sense (IMHO).
>>>>> > Can you provide a callstack from handle_edge_irq() in case
>>>>> > on_pipeline_entry() returns false?
>>>>> >
>>>>> > The IRQ should be ACKed on pipeline entry already, even if
>>>>> > handle_edge_irq() is called again (from Linux/Inband context) the IRQ
>>>>> > should already be ACKed.
>>>>>
>>>>> Following stack is dumped after patching my patch. Without patching
>>there
>>>>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from
>>which
>>>>it already called irq_exit_pipeline before sync_current_irq_stage. That is
>>why
>>>>on_pipeline_entry returned false in handle_edge_irq
>>>>>
>>>>>
>>>>>
>>>>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>>>>5.10.179-intel-ese-standard-lts-dovetail+ #1
>>>>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake Embedded
>>>>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>>>>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>>>>> [  291.009332] IRQ stage: Linux
>>>>> [  291.009332] Call Trace:
>>>>> [  291.009332]  <IRQ>
>>>>> [  291.009333]  dump_stack+0x7b/0x93
>>>>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>>>>> [  291.009334]  handle_edge_irq+0xe4/0x310
>>>>> [  291.009334]  generic_handle_irq+0x4b/0x60
>>>>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>>>>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>>>>> [  291.009336]  ? handle_level_irq+0x200/0x200
>>>>> [  291.009336]  handle_irq_event+0x6c/0x100
>>>>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>>>>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>>>>> [  291.009338]  </IRQ>
>>>>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>>>>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>>>>                   ^^
>>>>My understanding is that you are not handling an HW IRQ, you are in the
>>>>stage where the inband IRQ log is being processed. HW IRQs are already
>>>>ACKed at that stage.
>>>>
>>>>Maybe we have to understand your IRQ chip routing. Is there an IO APIC
>>>>involved? handle_fasteoi_irq() would be the first level IRQ entry point
>>>>which takes care of masking + ack.
>>>
>>> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is
>>registered in io_apic driver. There is second level pinctrl device acting as
>>gpiochip which has irq domains where sd/mmc cd interrupt alloced from ,
>>which need to be acked in handle_edge_irq.
>>>
>>> Regards
>>>
>>> Hongzhan Chen
>>>>
>>>>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>>>>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>>>>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>>>>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>>>>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b 1c
>>25
>>>>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8> 08
>>75
>>>>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>>>>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>>>>
>>>>> Regards
>>>>>
>>>>> Hongzhan Chen
>>>>> >
>>>>> > But: If we use the dovetail sources to build a kernel with pipelining
>>>>> > disabled we have to ACK the IRQ. I assume pipelining is active on your
>>>>> > end.
>>>>> >
>>>>> > As Jan said, I'm searching for a similar problem for some time now that
>>>>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a stall,
>>>>> > not a storm. I will give it a try, let's see.
>>>>> >
>>>>> > >
>>>>> > > After patching:
>>>>> > > 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1 cd
>>>>> > >
>>>>> > > Before patching:
>>>>> > > 128:    2440190          0          0          0  INTC1020:01  112      0000:00:1a.1
>>cd
>>>>> > >
>>>>> > > Regards
>>>>> > >
>>>>> > > Hongzhan Chen
>>>>> > > >
>>>>> > > > Could you please check if we call unmask_irq() with HW IRQs
>>enabled?
>>>>If
>>>>> > > > so, I have an idea what is going on...
>>>>> > > >
>>>>> > > > Florian
>>>>> > >
>>>>>
>>
>>To start with, prior to any further analysis:
>>
>>diff --git a/drivers/pinctrl/intel/pinctrl-intel.h b/drivers/pinctrl/intel/pinctrl-
>>intel.h
>>index 65628423bf639..b7afa763ee079 100644
>>--- a/drivers/pinctrl/intel/pinctrl-intel.h
>>+++ b/drivers/pinctrl/intel/pinctrl-intel.h
>>@@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>>  */
>> struct intel_pinctrl {
>> 	struct device *dev;
>>-	raw_spinlock_t lock;
>>+	hard_spinlock_t lock;
>
> I already modified it at first when we found the issue.  But it does not work with it.
>
> After patching my patch[2] I mentioned, the issue disappear finally.
>

Ok, but the patch is papering over the real issue I believe.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-08-31  8:56                 ` Philippe Gerum
@ 2023-09-01  2:28                   ` Chen, Hongzhan
  2023-09-01  6:35                     ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Chen, Hongzhan @ 2023-09-01  2:28 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Bezdeka, Florian, Kiszka, Jan, xenomai@lists.linux.dev



>-----Original Message-----
>From: Philippe Gerum <rpm@xenomai.org>
>Sent: Thursday, August 31, 2023 4:57 PM
>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>pinctrl device
>
>
>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>
>>>-----Original Message-----
>>>From: Philippe Gerum <rpm@xenomai.org>
>>>Sent: Thursday, August 31, 2023 4:06 PM
>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>pinctrl device
>>>
>>>
>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>
>>>>>-----Original Message-----
>>>>>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>Sent: Thursday, August 31, 2023 3:10 PM
>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>Philippe
>>>>>Gerum <rpm@xenomai.org>
>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>>pinctrl device
>>>>>
>>>>>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>>>>>
>>>>>> > -----Original Message-----
>>>>>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>> > Sent: Wednesday, August 30, 2023 9:23 PM
>>>>>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>Philippe
>>>>>> > Gerum <rpm@xenomai.org>
>>>>>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>>from
>>>>>> > pinctrl device
>>>>>> >
>>>>>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>>>>> > >
>>>>>> > > > -----Original Message-----
>>>>>> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>> > > > Sent: Wednesday, August 30, 2023 3:20 PM
>>>>>> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>>>>>> > xenomai@lists.linux.dev
>>>>>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>alloced
>>>>>from
>>>>>> > > > pinctrl device
>>>>>> > > >
>>>>>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>>>>>> > > > > Hi Philippe/Jan
>>>>>> > > > >
>>>>>> > > > > We found there is irq storm issue like [1] on sd/mmc card detect
>>>>>> > interrupt
>>>>>> > > > alloced from
>>>>>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>>>>>> > > > > According to our test,  the patch [2] works for such issue in our
>>>case.
>>>>>It
>>>>>> > > > seems that handle_edge_irq
>>>>>> > > > > never ack the irq to clear the interrupt status for the Linux gpio
>>>>>interrupt.
>>>>>> > > > >
>>>>>> > > > > I am wondering if there is any side-effect for this patch, Please
>>>>>suggest.
>>>>>> > > > >
>>>>>> > > > > Regards
>>>>>> > > > >
>>>>>> > > > > Hongzhan Chen
>>>>>> > > > >
>>>>>> > > > > [1]:
>>>>>> > > > > cat /proc/interrupts
>>>>>> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>>>>>> > INTC1020:00,
>>>>>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>>>>>> > > > > 128:    2440190          0          0          0  INTC1020:01  112
>>>0000:00:1a.1
>>>>>cd
>>>>>> > > > >
>>>>>> > > > > [2]:
>>>>>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>>>> > > > > index 6aca6c036ada..4d08eaf076e9 100644
>>>>>> > > > > --- a/kernel/irq/chip.c
>>>>>> > > > > +++ b/kernel/irq/chip.c
>>>>>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc
>>>*desc)
>>>>>> > > > >         kstat_incr_irqs_this_cpu(desc);
>>>>>> > > > >
>>>>>> > > > >         /* Start handling the irq */
>>>>>> > > > > -       if (!irqs_pipelined())
>>>>>> > > > >                 chip->irq_ack(&desc->irq_data);
>>>>>> > > > >
>>>>>> > > > >
>>>>>> > > >
>>>>>> > > > That doesn't make much sense. A IRQ storm means that we
>>>unmasked
>>>>>the
>>>>>> > > > IRQ to early or forgot to mask it.
>>>>>> > > >
>>>>>> > > > A forgotten ACK would mean the IRQ stalls, so it would never be
>>>fired
>>>>>> > > > again. No?
>>>>>> > >
>>>>>> > > According to the test, it seems forgotten ACK will keep firing
>interrupt
>>>>>here
>>>>>> > like
>>>>>> > > following comparing:
>>>>>> >
>>>>>> > This doesn't make sense (IMHO).
>>>>>> > Can you provide a callstack from handle_edge_irq() in case
>>>>>> > on_pipeline_entry() returns false?
>>>>>> >
>>>>>> > The IRQ should be ACKed on pipeline entry already, even if
>>>>>> > handle_edge_irq() is called again (from Linux/Inband context) the IRQ
>>>>>> > should already be ACKed.
>>>>>>
>>>>>> Following stack is dumped after patching my patch. Without patching
>>>there
>>>>>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from
>>>which
>>>>>it already called irq_exit_pipeline before sync_current_irq_stage. That is
>>>why
>>>>>on_pipeline_entry returned false in handle_edge_irq
>>>>>>
>>>>>>
>>>>>>
>>>>>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>>>>>5.10.179-intel-ese-standard-lts-dovetail+ #1
>>>>>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake
>Embedded
>>>>>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>>>>>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>>>>>> [  291.009332] IRQ stage: Linux
>>>>>> [  291.009332] Call Trace:
>>>>>> [  291.009332]  <IRQ>
>>>>>> [  291.009333]  dump_stack+0x7b/0x93
>>>>>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>>>>>> [  291.009334]  handle_edge_irq+0xe4/0x310
>>>>>> [  291.009334]  generic_handle_irq+0x4b/0x60
>>>>>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>>>>>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>>>>>> [  291.009336]  ? handle_level_irq+0x200/0x200
>>>>>> [  291.009336]  handle_irq_event+0x6c/0x100
>>>>>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>>>>>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>>>>>> [  291.009338]  </IRQ>
>>>>>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>>>>>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>>>>>                   ^^
>>>>>My understanding is that you are not handling an HW IRQ, you are in the
>>>>>stage where the inband IRQ log is being processed. HW IRQs are already
>>>>>ACKed at that stage.
>>>>>
>>>>>Maybe we have to understand your IRQ chip routing. Is there an IO APIC
>>>>>involved? handle_fasteoi_irq() would be the first level IRQ entry point
>>>>>which takes care of masking + ack.
>>>>
>>>> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is
>>>registered in io_apic driver. There is second level pinctrl device acting as
>>>gpiochip which has irq domains where sd/mmc cd interrupt alloced from ,
>>>which need to be acked in handle_edge_irq.
>>>>
>>>> Regards
>>>>
>>>> Hongzhan Chen
>>>>>
>>>>>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>>>>>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>>>>>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>>>>>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>>>>>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b 1c
>>>25
>>>>>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8>
>08
>>>75
>>>>>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>>>>>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Hongzhan Chen
>>>>>> >
>>>>>> > But: If we use the dovetail sources to build a kernel with pipelining
>>>>>> > disabled we have to ACK the IRQ. I assume pipelining is active on your
>>>>>> > end.
>>>>>> >
>>>>>> > As Jan said, I'm searching for a similar problem for some time now
>that
>>>>>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a stall,
>>>>>> > not a storm. I will give it a try, let's see.
>>>>>> >
>>>>>> > >
>>>>>> > > After patching:
>>>>>> > > 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1
>cd
>>>>>> > >
>>>>>> > > Before patching:
>>>>>> > > 128:    2440190          0          0          0  INTC1020:01  112
>0000:00:1a.1
>>>cd
>>>>>> > >
>>>>>> > > Regards
>>>>>> > >
>>>>>> > > Hongzhan Chen
>>>>>> > > >
>>>>>> > > > Could you please check if we call unmask_irq() with HW IRQs
>>>enabled?
>>>>>If
>>>>>> > > > so, I have an idea what is going on...
>>>>>> > > >
>>>>>> > > > Florian
>>>>>> > >
>>>>>>
>>>
>>>To start with, prior to any further analysis:
>>>
>>>diff --git a/drivers/pinctrl/intel/pinctrl-intel.h b/drivers/pinctrl/intel/pinctrl-
>>>intel.h
>>>index 65628423bf639..b7afa763ee079 100644
>>>--- a/drivers/pinctrl/intel/pinctrl-intel.h
>>>+++ b/drivers/pinctrl/intel/pinctrl-intel.h
>>>@@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>>>  */
>>> struct intel_pinctrl {
>>> 	struct device *dev;
>>>-	raw_spinlock_t lock;
>>>+	hard_spinlock_t lock;
>>
>> I already modified it at first when we found the issue.  But it does not work
>with it.
>>
>> After patching my patch[2] I mentioned, the issue disappear finally.
>>
>
>Ok, but the patch is papering over the real issue I believe.

What kind of real issue it would be? What checkpoint I need to check to find the real issue out? Please suggest.
Continuous fired irq is obviously caused by forgotten ack behavior for the issue. The ack is supposed to happen elsewhere?

Regards

Hongzhan Chen

>
>--
>Philippe.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-09-01  2:28                   ` Chen, Hongzhan
@ 2023-09-01  6:35                     ` Philippe Gerum
  2023-09-04  8:20                       ` Chen, Hongzhan
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2023-09-01  6:35 UTC (permalink / raw)
  To: Chen, Hongzhan; +Cc: Bezdeka, Florian, Kiszka, Jan, xenomai@lists.linux.dev


"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:

>>-----Original Message-----
>>From: Philippe Gerum <rpm@xenomai.org>
>>Sent: Thursday, August 31, 2023 4:57 PM
>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>pinctrl device
>>
>>
>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>
>>>>-----Original Message-----
>>>>From: Philippe Gerum <rpm@xenomai.org>
>>>>Sent: Thursday, August 31, 2023 4:06 PM
>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>pinctrl device
>>>>
>>>>
>>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>
>>>>>>-----Original Message-----
>>>>>>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>Sent: Thursday, August 31, 2023 3:10 PM
>>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>Philippe
>>>>>>Gerum <rpm@xenomai.org>
>>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>>>pinctrl device
>>>>>>
>>>>>>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>>>>>>
>>>>>>> > -----Original Message-----
>>>>>>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>> > Sent: Wednesday, August 30, 2023 9:23 PM
>>>>>>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>>Philippe
>>>>>>> > Gerum <rpm@xenomai.org>
>>>>>>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>>>from
>>>>>>> > pinctrl device
>>>>>>> >
>>>>>>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>>>>>> > >
>>>>>>> > > > -----Original Message-----
>>>>>>> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>> > > > Sent: Wednesday, August 30, 2023 3:20 PM
>>>>>>> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>>>>>>> > xenomai@lists.linux.dev
>>>>>>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>>alloced
>>>>>>from
>>>>>>> > > > pinctrl device
>>>>>>> > > >
>>>>>>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>>>>>>> > > > > Hi Philippe/Jan
>>>>>>> > > > >
>>>>>>> > > > > We found there is irq storm issue like [1] on sd/mmc card detect
>>>>>>> > interrupt
>>>>>>> > > > alloced from
>>>>>>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>>>>>>> > > > > According to our test,  the patch [2] works for such issue in our
>>>>case.
>>>>>>It
>>>>>>> > > > seems that handle_edge_irq
>>>>>>> > > > > never ack the irq to clear the interrupt status for the Linux gpio
>>>>>>interrupt.
>>>>>>> > > > >
>>>>>>> > > > > I am wondering if there is any side-effect for this patch, Please
>>>>>>suggest.
>>>>>>> > > > >
>>>>>>> > > > > Regards
>>>>>>> > > > >
>>>>>>> > > > > Hongzhan Chen
>>>>>>> > > > >
>>>>>>> > > > > [1]:
>>>>>>> > > > > cat /proc/interrupts
>>>>>>> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>>>>>>> > INTC1020:00,
>>>>>>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>>>>>>> > > > > 128:    2440190          0          0          0  INTC1020:01  112
>>>>0000:00:1a.1
>>>>>>cd
>>>>>>> > > > >
>>>>>>> > > > > [2]:
>>>>>>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>>>>> > > > > index 6aca6c036ada..4d08eaf076e9 100644
>>>>>>> > > > > --- a/kernel/irq/chip.c
>>>>>>> > > > > +++ b/kernel/irq/chip.c
>>>>>>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc
>>>>*desc)
>>>>>>> > > > >         kstat_incr_irqs_this_cpu(desc);
>>>>>>> > > > >
>>>>>>> > > > >         /* Start handling the irq */
>>>>>>> > > > > -       if (!irqs_pipelined())
>>>>>>> > > > >                 chip->irq_ack(&desc->irq_data);
>>>>>>> > > > >
>>>>>>> > > > >
>>>>>>> > > >
>>>>>>> > > > That doesn't make much sense. A IRQ storm means that we
>>>>unmasked
>>>>>>the
>>>>>>> > > > IRQ to early or forgot to mask it.
>>>>>>> > > >
>>>>>>> > > > A forgotten ACK would mean the IRQ stalls, so it would never be
>>>>fired
>>>>>>> > > > again. No?
>>>>>>> > >
>>>>>>> > > According to the test, it seems forgotten ACK will keep firing
>>interrupt
>>>>>>here
>>>>>>> > like
>>>>>>> > > following comparing:
>>>>>>> >
>>>>>>> > This doesn't make sense (IMHO).
>>>>>>> > Can you provide a callstack from handle_edge_irq() in case
>>>>>>> > on_pipeline_entry() returns false?
>>>>>>> >
>>>>>>> > The IRQ should be ACKed on pipeline entry already, even if
>>>>>>> > handle_edge_irq() is called again (from Linux/Inband context) the IRQ
>>>>>>> > should already be ACKed.
>>>>>>>
>>>>>>> Following stack is dumped after patching my patch. Without patching
>>>>there
>>>>>>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from
>>>>which
>>>>>>it already called irq_exit_pipeline before sync_current_irq_stage. That is
>>>>why
>>>>>>on_pipeline_entry returned false in handle_edge_irq
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>>>>>>5.10.179-intel-ese-standard-lts-dovetail+ #1
>>>>>>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake
>>Embedded
>>>>>>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>>>>>>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>>>>>>> [  291.009332] IRQ stage: Linux
>>>>>>> [  291.009332] Call Trace:
>>>>>>> [  291.009332]  <IRQ>
>>>>>>> [  291.009333]  dump_stack+0x7b/0x93
>>>>>>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>>>>>>> [  291.009334]  handle_edge_irq+0xe4/0x310
>>>>>>> [  291.009334]  generic_handle_irq+0x4b/0x60
>>>>>>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>>>>>>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>>>>>>> [  291.009336]  ? handle_level_irq+0x200/0x200
>>>>>>> [  291.009336]  handle_irq_event+0x6c/0x100
>>>>>>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>>>>>>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>>>>>>> [  291.009338]  </IRQ>
>>>>>>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>>>>>>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>>>>>>                   ^^
>>>>>>My understanding is that you are not handling an HW IRQ, you are in the
>>>>>>stage where the inband IRQ log is being processed. HW IRQs are already
>>>>>>ACKed at that stage.
>>>>>>
>>>>>>Maybe we have to understand your IRQ chip routing. Is there an IO APIC
>>>>>>involved? handle_fasteoi_irq() would be the first level IRQ entry point
>>>>>>which takes care of masking + ack.
>>>>>
>>>>> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is
>>>>registered in io_apic driver. There is second level pinctrl device acting as
>>>>gpiochip which has irq domains where sd/mmc cd interrupt alloced from ,
>>>>which need to be acked in handle_edge_irq.
>>>>>
>>>>> Regards
>>>>>
>>>>> Hongzhan Chen
>>>>>>
>>>>>>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>>>>>>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>>>>>>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>>>>>>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>>>>>>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b 1c
>>>>25
>>>>>>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8>
>>08
>>>>75
>>>>>>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>>>>>>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Hongzhan Chen
>>>>>>> >
>>>>>>> > But: If we use the dovetail sources to build a kernel with pipelining
>>>>>>> > disabled we have to ACK the IRQ. I assume pipelining is active on your
>>>>>>> > end.
>>>>>>> >
>>>>>>> > As Jan said, I'm searching for a similar problem for some time now
>>that
>>>>>>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a stall,
>>>>>>> > not a storm. I will give it a try, let's see.
>>>>>>> >
>>>>>>> > >
>>>>>>> > > After patching:
>>>>>>> > > 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1
>>cd
>>>>>>> > >
>>>>>>> > > Before patching:
>>>>>>> > > 128:    2440190          0          0          0  INTC1020:01  112
>>0000:00:1a.1
>>>>cd
>>>>>>> > >
>>>>>>> > > Regards
>>>>>>> > >
>>>>>>> > > Hongzhan Chen
>>>>>>> > > >
>>>>>>> > > > Could you please check if we call unmask_irq() with HW IRQs
>>>>enabled?
>>>>>>If
>>>>>>> > > > so, I have an idea what is going on...
>>>>>>> > > >
>>>>>>> > > > Florian
>>>>>>> > >
>>>>>>>
>>>>
>>>>To start with, prior to any further analysis:
>>>>
>>>>diff --git a/drivers/pinctrl/intel/pinctrl-intel.h b/drivers/pinctrl/intel/pinctrl-
>>>>intel.h
>>>>index 65628423bf639..b7afa763ee079 100644
>>>>--- a/drivers/pinctrl/intel/pinctrl-intel.h
>>>>+++ b/drivers/pinctrl/intel/pinctrl-intel.h
>>>>@@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>>>>  */
>>>> struct intel_pinctrl {
>>>> 	struct device *dev;
>>>>-	raw_spinlock_t lock;
>>>>+	hard_spinlock_t lock;
>>>
>>> I already modified it at first when we found the issue.  But it does not work
>>with it.
>>>
>>> After patching my patch[2] I mentioned, the issue disappear finally.
>>>
>>
>>Ok, but the patch is papering over the real issue I believe.
>
> What kind of real issue it would be? What checkpoint I need to check to find the real issue out? Please suggest.
> Continuous fired irq is obviously caused by forgotten ack behavior for the issue. The ack is supposed to happen elsewhere?
>

Edge ack is supposed to happen in handle_edge_irq() when a pipeline
entry is detected a few lines earlier, your patch would cause a double
ACK in the common case when pipelining, which might cause events to be
lost. As you found out already, the problem is that such flow handler is
called _outside_ of the pipeline entry path, so the ack is not
delivered.

This happens due to this particular interrupt path, which chains two
irqchips, edge on top of fasteoi, combined to the fact that there is no
oob handling of the fasteoi IRQ, so interrupt delivery has to go through
the per-CPU pipeline log first, therefore handling is postponed until
the pipeline is synchronized, _after_ we left the pipeline entry
context. As a result, the fasteoi flow handler does have in_pipeline()
set, but the edge flow handler does not, causing this bug.

handle_irq_pipelined_prepare()
[on_pipeline_entry: true]
arch_handle_irq()
        handle_fasteoi_irq()
                handle_oob_irq()
                        mask_cond_eoi_irq()
handle_irq_pipelined_finish()
[on_pipeline_entry: false]
        synchronize_pipeline_on_irq()
                handle_fasteoi_irq() [!on_pipeline_entry]
                      intel_gpio_irq()
                           handle_edge_irq() <<< failed to ack

This is a bug in the Dovetail logic for stacked IRQ chips. I may have
some cycles spare the next couple of weeks to work on this issue and the
rest of my Dovetail backlog if nobody comes up with a fix earlier.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-09-01  6:35                     ` Philippe Gerum
@ 2023-09-04  8:20                       ` Chen, Hongzhan
  2023-09-04  8:56                         ` Jan Kiszka
                                           ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Chen, Hongzhan @ 2023-09-04  8:20 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Bezdeka, Florian, Kiszka, Jan, xenomai@lists.linux.dev



>-----Original Message-----
>From: Philippe Gerum <rpm@xenomai.org>
>Sent: Friday, September 1, 2023 2:35 PM
>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>pinctrl device
>
>
>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>
>>>-----Original Message-----
>>>From: Philippe Gerum <rpm@xenomai.org>
>>>Sent: Thursday, August 31, 2023 4:57 PM
>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>pinctrl device
>>>
>>>
>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>
>>>>>-----Original Message-----
>>>>>From: Philippe Gerum <rpm@xenomai.org>
>>>>>Sent: Thursday, August 31, 2023 4:06 PM
>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>>pinctrl device
>>>>>
>>>>>
>>>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>Sent: Thursday, August 31, 2023 3:10 PM
>>>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>Philippe
>>>>>>>Gerum <rpm@xenomai.org>
>>>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>from
>>>>>>>pinctrl device
>>>>>>>
>>>>>>>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>>>>>>>
>>>>>>>> > -----Original Message-----
>>>>>>>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>> > Sent: Wednesday, August 30, 2023 9:23 PM
>>>>>>>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>>>Philippe
>>>>>>>> > Gerum <rpm@xenomai.org>
>>>>>>>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>>>>from
>>>>>>>> > pinctrl device
>>>>>>>> >
>>>>>>>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>>>>>>> > >
>>>>>>>> > > > -----Original Message-----
>>>>>>>> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>> > > > Sent: Wednesday, August 30, 2023 3:20 PM
>>>>>>>> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>>>>>>>> > xenomai@lists.linux.dev
>>>>>>>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>>>alloced
>>>>>>>from
>>>>>>>> > > > pinctrl device
>>>>>>>> > > >
>>>>>>>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>>>>>>>> > > > > Hi Philippe/Jan
>>>>>>>> > > > >
>>>>>>>> > > > > We found there is irq storm issue like [1] on sd/mmc card
>detect
>>>>>>>> > interrupt
>>>>>>>> > > > alloced from
>>>>>>>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>>>>>>>> > > > > According to our test,  the patch [2] works for such issue in
>our
>>>>>case.
>>>>>>>It
>>>>>>>> > > > seems that handle_edge_irq
>>>>>>>> > > > > never ack the irq to clear the interrupt status for the Linux
>gpio
>>>>>>>interrupt.
>>>>>>>> > > > >
>>>>>>>> > > > > I am wondering if there is any side-effect for this patch, Please
>>>>>>>suggest.
>>>>>>>> > > > >
>>>>>>>> > > > > Regards
>>>>>>>> > > > >
>>>>>>>> > > > > Hongzhan Chen
>>>>>>>> > > > >
>>>>>>>> > > > > [1]:
>>>>>>>> > > > > cat /proc/interrupts
>>>>>>>> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>>>>>>>> > INTC1020:00,
>>>>>>>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>>>>>>>> > > > > 128:    2440190          0          0          0  INTC1020:01  112
>>>>>0000:00:1a.1
>>>>>>>cd
>>>>>>>> > > > >
>>>>>>>> > > > > [2]:
>>>>>>>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>>>>>> > > > > index 6aca6c036ada..4d08eaf076e9 100644
>>>>>>>> > > > > --- a/kernel/irq/chip.c
>>>>>>>> > > > > +++ b/kernel/irq/chip.c
>>>>>>>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc
>>>>>*desc)
>>>>>>>> > > > >         kstat_incr_irqs_this_cpu(desc);
>>>>>>>> > > > >
>>>>>>>> > > > >         /* Start handling the irq */
>>>>>>>> > > > > -       if (!irqs_pipelined())
>>>>>>>> > > > >                 chip->irq_ack(&desc->irq_data);
>>>>>>>> > > > >
>>>>>>>> > > > >
>>>>>>>> > > >
>>>>>>>> > > > That doesn't make much sense. A IRQ storm means that we
>>>>>unmasked
>>>>>>>the
>>>>>>>> > > > IRQ to early or forgot to mask it.
>>>>>>>> > > >
>>>>>>>> > > > A forgotten ACK would mean the IRQ stalls, so it would never be
>>>>>fired
>>>>>>>> > > > again. No?
>>>>>>>> > >
>>>>>>>> > > According to the test, it seems forgotten ACK will keep firing
>>>interrupt
>>>>>>>here
>>>>>>>> > like
>>>>>>>> > > following comparing:
>>>>>>>> >
>>>>>>>> > This doesn't make sense (IMHO).
>>>>>>>> > Can you provide a callstack from handle_edge_irq() in case
>>>>>>>> > on_pipeline_entry() returns false?
>>>>>>>> >
>>>>>>>> > The IRQ should be ACKed on pipeline entry already, even if
>>>>>>>> > handle_edge_irq() is called again (from Linux/Inband context) the
>IRQ
>>>>>>>> > should already be ACKed.
>>>>>>>>
>>>>>>>> Following stack is dumped after patching my patch. Without patching
>>>>>there
>>>>>>>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from
>>>>>which
>>>>>>>it already called irq_exit_pipeline before sync_current_irq_stage. That
>is
>>>>>why
>>>>>>>on_pipeline_entry returned false in handle_edge_irq
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>>>>>>>5.10.179-intel-ese-standard-lts-dovetail+ #1
>>>>>>>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake
>>>Embedded
>>>>>>>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>>>>>>>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>>>>>>>> [  291.009332] IRQ stage: Linux
>>>>>>>> [  291.009332] Call Trace:
>>>>>>>> [  291.009332]  <IRQ>
>>>>>>>> [  291.009333]  dump_stack+0x7b/0x93
>>>>>>>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>>>>>>>> [  291.009334]  handle_edge_irq+0xe4/0x310
>>>>>>>> [  291.009334]  generic_handle_irq+0x4b/0x60
>>>>>>>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>>>>>>>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>>>>>>>> [  291.009336]  ? handle_level_irq+0x200/0x200
>>>>>>>> [  291.009336]  handle_irq_event+0x6c/0x100
>>>>>>>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>>>>>>>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>>>>>>>> [  291.009338]  </IRQ>
>>>>>>>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>>>>>>>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>>>>>>>                   ^^
>>>>>>>My understanding is that you are not handling an HW IRQ, you are in
>the
>>>>>>>stage where the inband IRQ log is being processed. HW IRQs are
>already
>>>>>>>ACKed at that stage.
>>>>>>>
>>>>>>>Maybe we have to understand your IRQ chip routing. Is there an IO
>APIC
>>>>>>>involved? handle_fasteoi_irq() would be the first level IRQ entry point
>>>>>>>which takes care of masking + ack.
>>>>>>
>>>>>> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is
>>>>>registered in io_apic driver. There is second level pinctrl device acting as
>>>>>gpiochip which has irq domains where sd/mmc cd interrupt alloced from ,
>>>>>which need to be acked in handle_edge_irq.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Hongzhan Chen
>>>>>>>
>>>>>>>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>>>>>>>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>>>>>>>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>>>>>>>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>>>>>>>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b
>1c
>>>>>25
>>>>>>>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8>
>>>08
>>>>>75
>>>>>>>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>>>>>>>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Hongzhan Chen
>>>>>>>> >
>>>>>>>> > But: If we use the dovetail sources to build a kernel with pipelining
>>>>>>>> > disabled we have to ACK the IRQ. I assume pipelining is active on
>your
>>>>>>>> > end.
>>>>>>>> >
>>>>>>>> > As Jan said, I'm searching for a similar problem for some time now
>>>that
>>>>>>>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a
>stall,
>>>>>>>> > not a storm. I will give it a try, let's see.
>>>>>>>> >
>>>>>>>> > >
>>>>>>>> > > After patching:
>>>>>>>> > > 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1
>>>cd
>>>>>>>> > >
>>>>>>>> > > Before patching:
>>>>>>>> > > 128:    2440190          0          0          0  INTC1020:01  112
>>>0000:00:1a.1
>>>>>cd
>>>>>>>> > >
>>>>>>>> > > Regards
>>>>>>>> > >
>>>>>>>> > > Hongzhan Chen
>>>>>>>> > > >
>>>>>>>> > > > Could you please check if we call unmask_irq() with HW IRQs
>>>>>enabled?
>>>>>>>If
>>>>>>>> > > > so, I have an idea what is going on...
>>>>>>>> > > >
>>>>>>>> > > > Florian
>>>>>>>> > >
>>>>>>>>
>>>>>
>>>>>To start with, prior to any further analysis:
>>>>>
>>>>>diff --git a/drivers/pinctrl/intel/pinctrl-intel.h
>b/drivers/pinctrl/intel/pinctrl-
>>>>>intel.h
>>>>>index 65628423bf639..b7afa763ee079 100644
>>>>>--- a/drivers/pinctrl/intel/pinctrl-intel.h
>>>>>+++ b/drivers/pinctrl/intel/pinctrl-intel.h
>>>>>@@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>>>>>  */
>>>>> struct intel_pinctrl {
>>>>> 	struct device *dev;
>>>>>-	raw_spinlock_t lock;
>>>>>+	hard_spinlock_t lock;
>>>>
>>>> I already modified it at first when we found the issue.  But it does not
>work
>>>with it.
>>>>
>>>> After patching my patch[2] I mentioned, the issue disappear finally.
>>>>
>>>
>>>Ok, but the patch is papering over the real issue I believe.
>>
>> What kind of real issue it would be? What checkpoint I need to check to find
>the real issue out? Please suggest.
>> Continuous fired irq is obviously caused by forgotten ack behavior for the
>issue. The ack is supposed to happen elsewhere?
>>
>
>Edge ack is supposed to happen in handle_edge_irq() when a pipeline
>entry is detected a few lines earlier, your patch would cause a double

Following patch is OK for you? It should avoid double ACK.  Because on_pipeline_entry after  ack
it set IRQS_EDGE. so we can ack here when find that  IRQS_EDGE is not set to avoid double ack.

index 6aca6c036ada..d06cf4a79f2a 10064
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -868,8 +868,10 @@ void handle_edge_irq(struct irq_desc *desc)
        if (on_pipeline_entry()) {
                chip->irq_ack(&desc->irq_data);
                desc->istate |= IRQS_EDGE;
                handle_oob_irq(desc);
                goto out_unlock;
        }

        kstat_incr_irqs_this_cpu(desc);

-       /* Start handling the irq */
-       if (!irqs_pipelined())
+       /* Start handling the irq
+        * Avoid duplicating ack when IRQS_EDGE set
+        */
+       if (!(desc->istate & IRQS_EDGE))
                chip->irq_ack(&desc->irq_data);

Regards

Hongzhan Chen

>ACK in the common case when pipelining, which might cause events to be
>lost. As you found out already, the problem is that such flow handler is
>called _outside_ of the pipeline entry path, so the ack is not
>delivered.
>
>This happens due to this particular interrupt path, which chains two
>irqchips, edge on top of fasteoi, combined to the fact that there is no
>oob handling of the fasteoi IRQ, so interrupt delivery has to go through
>the per-CPU pipeline log first, therefore handling is postponed until
>the pipeline is synchronized, _after_ we left the pipeline entry
>context. As a result, the fasteoi flow handler does have in_pipeline()
>set, but the edge flow handler does not, causing this bug.
>
>handle_irq_pipelined_prepare()
>[on_pipeline_entry: true]
>arch_handle_irq()
>        handle_fasteoi_irq()
>                handle_oob_irq()
>                        mask_cond_eoi_irq()
>handle_irq_pipelined_finish()
>[on_pipeline_entry: false]
>        synchronize_pipeline_on_irq()
>                handle_fasteoi_irq() [!on_pipeline_entry]
>                      intel_gpio_irq()
>                           handle_edge_irq() <<< failed to ack
>
>This is a bug in the Dovetail logic for stacked IRQ chips. I may have
>some cycles spare the next couple of weeks to work on this issue and the
>rest of my Dovetail backlog if nobody comes up with a fix earlier.
>
>--
>Philippe.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-09-04  8:20                       ` Chen, Hongzhan
@ 2023-09-04  8:56                         ` Jan Kiszka
  2023-09-04  9:34                           ` Florian Bezdeka
  2023-09-04  9:21                         ` Philippe Gerum
  2023-09-04  9:36                         ` Chen, Hongzhan
  2 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2023-09-04  8:56 UTC (permalink / raw)
  To: Chen, Hongzhan, Philippe Gerum; +Cc: Bezdeka, Florian, xenomai@lists.linux.dev

On 04.09.23 10:20, Chen, Hongzhan wrote:
> 
> 
>> -----Original Message-----
>> From: Philippe Gerum <rpm@xenomai.org>
>> Sent: Friday, September 1, 2023 2:35 PM
>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>> <jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>> Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>> pinctrl device
>>
>>
>> "Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>
>>>> -----Original Message-----
>>>> From: Philippe Gerum <rpm@xenomai.org>
>>>> Sent: Thursday, August 31, 2023 4:57 PM
>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>> Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>> <jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>> Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>> pinctrl device
>>>>
>>>>
>>>> "Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>
>>>>>> -----Original Message-----
>>>>>> From: Philippe Gerum <rpm@xenomai.org>
>>>>>> Sent: Thursday, August 31, 2023 4:06 PM
>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>> Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>>>> <jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>>>> Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>>> pinctrl device
>>>>>>
>>>>>>
>>>>>> "Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>> Sent: Thursday, August 31, 2023 3:10 PM
>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>> Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>> Philippe
>>>>>>>> Gerum <rpm@xenomai.org>
>>>>>>>> Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>> from
>>>>>>>> pinctrl device
>>>>>>>>
>>>>>>>> On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>>> Sent: Wednesday, August 30, 2023 9:23 PM
>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>>> Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>>>> Philippe
>>>>>>>>>> Gerum <rpm@xenomai.org>
>>>>>>>>>> Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>>>>> from
>>>>>>>>>> pinctrl device
>>>>>>>>>>
>>>>>>>>>> On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>>>>> Sent: Wednesday, August 30, 2023 3:20 PM
>>>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>>>>>>>>>> xenomai@lists.linux.dev
>>>>>>>>>>>> Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>>>> alloced
>>>>>>>> from
>>>>>>>>>>>> pinctrl device
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>>>>>>>>>>>>> Hi Philippe/Jan
>>>>>>>>>>>>>
>>>>>>>>>>>>> We found there is irq storm issue like [1] on sd/mmc card
>> detect
>>>>>>>>>> interrupt
>>>>>>>>>>>> alloced from
>>>>>>>>>>>>> pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>>>>>>>>>>>>> According to our test,  the patch [2] works for such issue in
>> our
>>>>>> case.
>>>>>>>> It
>>>>>>>>>>>> seems that handle_edge_irq
>>>>>>>>>>>>> never ack the irq to clear the interrupt status for the Linux
>> gpio
>>>>>>>> interrupt.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am wondering if there is any side-effect for this patch, Please
>>>>>>>> suggest.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hongzhan Chen
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]:
>>>>>>>>>>>>> cat /proc/interrupts
>>>>>>>>>>>>>   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>>>>>>>>>> INTC1020:00,
>>>>>>>>>>>> INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>>>>>>>>>>>>> 128:    2440190          0          0          0  INTC1020:01  112
>>>>>> 0000:00:1a.1
>>>>>>>> cd
>>>>>>>>>>>>>
>>>>>>>>>>>>> [2]:
>>>>>>>>>>>>> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>>>>>>>>>>> index 6aca6c036ada..4d08eaf076e9 100644
>>>>>>>>>>>>> --- a/kernel/irq/chip.c
>>>>>>>>>>>>> +++ b/kernel/irq/chip.c
>>>>>>>>>>>>> @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc
>>>>>> *desc)
>>>>>>>>>>>>>         kstat_incr_irqs_this_cpu(desc);
>>>>>>>>>>>>>
>>>>>>>>>>>>>         /* Start handling the irq */
>>>>>>>>>>>>> -       if (!irqs_pipelined())
>>>>>>>>>>>>>                 chip->irq_ack(&desc->irq_data);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> That doesn't make much sense. A IRQ storm means that we
>>>>>> unmasked
>>>>>>>> the
>>>>>>>>>>>> IRQ to early or forgot to mask it.
>>>>>>>>>>>>
>>>>>>>>>>>> A forgotten ACK would mean the IRQ stalls, so it would never be
>>>>>> fired
>>>>>>>>>>>> again. No?
>>>>>>>>>>>
>>>>>>>>>>> According to the test, it seems forgotten ACK will keep firing
>>>> interrupt
>>>>>>>> here
>>>>>>>>>> like
>>>>>>>>>>> following comparing:
>>>>>>>>>>
>>>>>>>>>> This doesn't make sense (IMHO).
>>>>>>>>>> Can you provide a callstack from handle_edge_irq() in case
>>>>>>>>>> on_pipeline_entry() returns false?
>>>>>>>>>>
>>>>>>>>>> The IRQ should be ACKed on pipeline entry already, even if
>>>>>>>>>> handle_edge_irq() is called again (from Linux/Inband context) the
>> IRQ
>>>>>>>>>> should already be ACKed.
>>>>>>>>>
>>>>>>>>> Following stack is dumped after patching my patch. Without patching
>>>>>> there
>>>>>>>> is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from
>>>>>> which
>>>>>>>> it already called irq_exit_pipeline before sync_current_irq_stage. That
>> is
>>>>>> why
>>>>>>>> on_pipeline_entry returned false in handle_edge_irq
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>>>>>>>> 5.10.179-intel-ese-standard-lts-dovetail+ #1
>>>>>>>>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake
>>>> Embedded
>>>>>>>> Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>>>>>>>> EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>>>>>>>>> [  291.009332] IRQ stage: Linux
>>>>>>>>> [  291.009332] Call Trace:
>>>>>>>>> [  291.009332]  <IRQ>
>>>>>>>>> [  291.009333]  dump_stack+0x7b/0x93
>>>>>>>>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>>>>>>>>> [  291.009334]  handle_edge_irq+0xe4/0x310
>>>>>>>>> [  291.009334]  generic_handle_irq+0x4b/0x60
>>>>>>>>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>>>>>>>>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>>>>>>>>> [  291.009336]  ? handle_level_irq+0x200/0x200
>>>>>>>>> [  291.009336]  handle_irq_event+0x6c/0x100
>>>>>>>>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>>>>>>>>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>>>>>>>>> [  291.009338]  </IRQ>
>>>>>>>>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>>>>>>>>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>>>>>>>>                   ^^
>>>>>>>> My understanding is that you are not handling an HW IRQ, you are in
>> the
>>>>>>>> stage where the inband IRQ log is being processed. HW IRQs are
>> already
>>>>>>>> ACKed at that stage.
>>>>>>>>
>>>>>>>> Maybe we have to understand your IRQ chip routing. Is there an IO
>> APIC
>>>>>>>> involved? handle_fasteoi_irq() would be the first level IRQ entry point
>>>>>>>> which takes care of masking + ack.
>>>>>>>
>>>>>>> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is
>>>>>> registered in io_apic driver. There is second level pinctrl device acting as
>>>>>> gpiochip which has irq domains where sd/mmc cd interrupt alloced from ,
>>>>>> which need to be acked in handle_edge_irq.
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Hongzhan Chen
>>>>>>>>
>>>>>>>>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>>>>>>>>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>>>>>>>>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>>>>>>>>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>>>>>>>>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b
>> 1c
>>>>>> 25
>>>>>>>> 00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8>
>>>> 08
>>>>>> 75
>>>>>>>> 13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>>>>>>>>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> Hongzhan Chen
>>>>>>>>>>
>>>>>>>>>> But: If we use the dovetail sources to build a kernel with pipelining
>>>>>>>>>> disabled we have to ACK the IRQ. I assume pipelining is active on
>> your
>>>>>>>>>> end.
>>>>>>>>>>
>>>>>>>>>> As Jan said, I'm searching for a similar problem for some time now
>>>> that
>>>>>>>>>> I can only reproduce on VirtualBox VMM. In my case I'm seeing a
>> stall,
>>>>>>>>>> not a storm. I will give it a try, let's see.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> After patching:
>>>>>>>>>>> 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1
>>>> cd
>>>>>>>>>>>
>>>>>>>>>>> Before patching:
>>>>>>>>>>> 128:    2440190          0          0          0  INTC1020:01  112
>>>> 0000:00:1a.1
>>>>>> cd
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>> Hongzhan Chen
>>>>>>>>>>>>
>>>>>>>>>>>> Could you please check if we call unmask_irq() with HW IRQs
>>>>>> enabled?
>>>>>>>> If
>>>>>>>>>>>> so, I have an idea what is going on...
>>>>>>>>>>>>
>>>>>>>>>>>> Florian
>>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>> To start with, prior to any further analysis:
>>>>>>
>>>>>> diff --git a/drivers/pinctrl/intel/pinctrl-intel.h
>> b/drivers/pinctrl/intel/pinctrl-
>>>>>> intel.h
>>>>>> index 65628423bf639..b7afa763ee079 100644
>>>>>> --- a/drivers/pinctrl/intel/pinctrl-intel.h
>>>>>> +++ b/drivers/pinctrl/intel/pinctrl-intel.h
>>>>>> @@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>>>>>>  */
>>>>>> struct intel_pinctrl {
>>>>>> 	struct device *dev;
>>>>>> -	raw_spinlock_t lock;
>>>>>> +	hard_spinlock_t lock;
>>>>>
>>>>> I already modified it at first when we found the issue.  But it does not
>> work
>>>> with it.
>>>>>
>>>>> After patching my patch[2] I mentioned, the issue disappear finally.
>>>>>
>>>>
>>>> Ok, but the patch is papering over the real issue I believe.
>>>
>>> What kind of real issue it would be? What checkpoint I need to check to find
>> the real issue out? Please suggest.
>>> Continuous fired irq is obviously caused by forgotten ack behavior for the
>> issue. The ack is supposed to happen elsewhere?
>>>
>>
>> Edge ack is supposed to happen in handle_edge_irq() when a pipeline
>> entry is detected a few lines earlier, your patch would cause a double
> 
> Following patch is OK for you? It should avoid double ACK.  Because on_pipeline_entry after  ack
> it set IRQS_EDGE. so we can ack here when find that  IRQS_EDGE is not set to avoid double ack.
> 
> index 6aca6c036ada..d06cf4a79f2a 10064
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -868,8 +868,10 @@ void handle_edge_irq(struct irq_desc *desc)
>         if (on_pipeline_entry()) {
>                 chip->irq_ack(&desc->irq_data);
>                 desc->istate |= IRQS_EDGE;
>                 handle_oob_irq(desc);
>                 goto out_unlock;
>         }
> 
>         kstat_incr_irqs_this_cpu(desc);
> 
> -       /* Start handling the irq */
> -       if (!irqs_pipelined())
> +       /* Start handling the irq
> +        * Avoid duplicating ack when IRQS_EDGE set
> +        */
> +       if (!(desc->istate & IRQS_EDGE))
>                 chip->irq_ack(&desc->irq_data);
> 

Doesn't answer why we get here without an ack while the pipeline is
enabled, does it?

Jan

-- 
Siemens AG, Technology
Linux Expert Center


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-09-04  8:20                       ` Chen, Hongzhan
  2023-09-04  8:56                         ` Jan Kiszka
@ 2023-09-04  9:21                         ` Philippe Gerum
  2023-09-04 13:26                           ` Chen, Hongzhan
  2023-09-04  9:36                         ` Chen, Hongzhan
  2 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2023-09-04  9:21 UTC (permalink / raw)
  To: Chen, Hongzhan; +Cc: Bezdeka, Florian, Kiszka, Jan, xenomai@lists.linux.dev


"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:

>>-----Original Message-----
>>From: Philippe Gerum <rpm@xenomai.org>
>>Sent: Friday, September 1, 2023 2:35 PM
>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>pinctrl device
>>
>>
>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>
>>>>-----Original Message-----
>>>>From: Philippe Gerum <rpm@xenomai.org>
>>>>Sent: Thursday, August 31, 2023 4:57 PM
>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>pinctrl device
>>>>
>>>>
>>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>
>>>>>>-----Original Message-----
>>>>>>From: Philippe Gerum <rpm@xenomai.org>
>>>>>>Sent: Thursday, August 31, 2023 4:06 PM
>>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>>>pinctrl device
>>>>>>
>>>>>>
>>>>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>Sent: Thursday, August 31, 2023 3:10 PM
>>>>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>>Philippe
>>>>>>>>Gerum <rpm@xenomai.org>
>>>>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>from
>>>>>>>>pinctrl device
>>>>>>>>
>>>>>>>>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>>>>>>>>
>>>>>>>>> > -----Original Message-----
>>>>>>>>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>> > Sent: Wednesday, August 30, 2023 9:23 PM
>>>>>>>>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>>>>Philippe
>>>>>>>>> > Gerum <rpm@xenomai.org>
>>>>>>>>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>>>>>from
>>>>>>>>> > pinctrl device
>>>>>>>>> >
>>>>>>>>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>>>>>>>> > >
>>>>>>>>> > > > -----Original Message-----
>>>>>>>>> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>> > > > Sent: Wednesday, August 30, 2023 3:20 PM
>>>>>>>>> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>>>>>>>>> > xenomai@lists.linux.dev
>>>>>>>>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>>>>alloced
>>>>>>>>from
>>>>>>>>> > > > pinctrl device
>>>>>>>>> > > >
>>>>>>>>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>>>>>>>>> > > > > Hi Philippe/Jan
>>>>>>>>> > > > >
>>>>>>>>> > > > > We found there is irq storm issue like [1] on sd/mmc card
>>detect
>>>>>>>>> > interrupt
>>>>>>>>> > > > alloced from
>>>>>>>>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>>>>>>>>> > > > > According to our test,  the patch [2] works for such issue in
>>our
>>>>>>case.
>>>>>>>>It
>>>>>>>>> > > > seems that handle_edge_irq
>>>>>>>>> > > > > never ack the irq to clear the interrupt status for the Linux
>>gpio
>>>>>>>>interrupt.
>>>>>>>>> > > > >
>>>>>>>>> > > > > I am wondering if there is any side-effect for this patch, Please
>>>>>>>>suggest.
>>>>>>>>> > > > >
>>>>>>>>> > > > > Regards
>>>>>>>>> > > > >
>>>>>>>>> > > > > Hongzhan Chen
>>>>>>>>> > > > >
>>>>>>>>> > > > > [1]:
>>>>>>>>> > > > > cat /proc/interrupts
>>>>>>>>> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>>>>>>>>> > INTC1020:00,
>>>>>>>>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>>>>>>>>> > > > > 128:    2440190          0          0          0  INTC1020:01  112
>>>>>>0000:00:1a.1
>>>>>>>>cd
>>>>>>>>> > > > >
>>>>>>>>> > > > > [2]:
>>>>>>>>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>>>>>>> > > > > index 6aca6c036ada..4d08eaf076e9 100644
>>>>>>>>> > > > > --- a/kernel/irq/chip.c
>>>>>>>>> > > > > +++ b/kernel/irq/chip.c
>>>>>>>>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc
>>>>>>*desc)
>>>>>>>>> > > > >         kstat_incr_irqs_this_cpu(desc);
>>>>>>>>> > > > >
>>>>>>>>> > > > >         /* Start handling the irq */
>>>>>>>>> > > > > -       if (!irqs_pipelined())
>>>>>>>>> > > > >                 chip->irq_ack(&desc->irq_data);
>>>>>>>>> > > > >
>>>>>>>>> > > > >
>>>>>>>>> > > >
>>>>>>>>> > > > That doesn't make much sense. A IRQ storm means that we
>>>>>>unmasked
>>>>>>>>the
>>>>>>>>> > > > IRQ to early or forgot to mask it.
>>>>>>>>> > > >
>>>>>>>>> > > > A forgotten ACK would mean the IRQ stalls, so it would never be
>>>>>>fired
>>>>>>>>> > > > again. No?
>>>>>>>>> > >
>>>>>>>>> > > According to the test, it seems forgotten ACK will keep firing
>>>>interrupt
>>>>>>>>here
>>>>>>>>> > like
>>>>>>>>> > > following comparing:
>>>>>>>>> >
>>>>>>>>> > This doesn't make sense (IMHO).
>>>>>>>>> > Can you provide a callstack from handle_edge_irq() in case
>>>>>>>>> > on_pipeline_entry() returns false?
>>>>>>>>> >
>>>>>>>>> > The IRQ should be ACKed on pipeline entry already, even if
>>>>>>>>> > handle_edge_irq() is called again (from Linux/Inband context) the
>>IRQ
>>>>>>>>> > should already be ACKed.
>>>>>>>>>
>>>>>>>>> Following stack is dumped after patching my patch. Without patching
>>>>>>there
>>>>>>>>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from
>>>>>>which
>>>>>>>>it already called irq_exit_pipeline before sync_current_irq_stage. That
>>is
>>>>>>why
>>>>>>>>on_pipeline_entry returned false in handle_edge_irq
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>>>>>>>>5.10.179-intel-ese-standard-lts-dovetail+ #1
>>>>>>>>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake
>>>>Embedded
>>>>>>>>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>>>>>>>>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>>>>>>>>> [  291.009332] IRQ stage: Linux
>>>>>>>>> [  291.009332] Call Trace:
>>>>>>>>> [  291.009332]  <IRQ>
>>>>>>>>> [  291.009333]  dump_stack+0x7b/0x93
>>>>>>>>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>>>>>>>>> [  291.009334]  handle_edge_irq+0xe4/0x310
>>>>>>>>> [  291.009334]  generic_handle_irq+0x4b/0x60
>>>>>>>>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>>>>>>>>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>>>>>>>>> [  291.009336]  ? handle_level_irq+0x200/0x200
>>>>>>>>> [  291.009336]  handle_irq_event+0x6c/0x100
>>>>>>>>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>>>>>>>>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>>>>>>>>> [  291.009338]  </IRQ>
>>>>>>>>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>>>>>>>>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>>>>>>>>                   ^^
>>>>>>>>My understanding is that you are not handling an HW IRQ, you are in
>>the
>>>>>>>>stage where the inband IRQ log is being processed. HW IRQs are
>>already
>>>>>>>>ACKed at that stage.
>>>>>>>>
>>>>>>>>Maybe we have to understand your IRQ chip routing. Is there an IO
>>APIC
>>>>>>>>involved? handle_fasteoi_irq() would be the first level IRQ entry point
>>>>>>>>which takes care of masking + ack.
>>>>>>>
>>>>>>> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is
>>>>>>registered in io_apic driver. There is second level pinctrl device acting as
>>>>>>gpiochip which has irq domains where sd/mmc cd interrupt alloced from ,
>>>>>>which need to be acked in handle_edge_irq.
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Hongzhan Chen
>>>>>>>>
>>>>>>>>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>>>>>>>>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>>>>>>>>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>>>>>>>>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>>>>>>>>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b
>>1c
>>>>>>25
>>>>>>>>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8>
>>>>08
>>>>>>75
>>>>>>>>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>>>>>>>>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> Hongzhan Chen
>>>>>>>>> >
>>>>>>>>> > But: If we use the dovetail sources to build a kernel with pipelining
>>>>>>>>> > disabled we have to ACK the IRQ. I assume pipelining is active on
>>your
>>>>>>>>> > end.
>>>>>>>>> >
>>>>>>>>> > As Jan said, I'm searching for a similar problem for some time now
>>>>that
>>>>>>>>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a
>>stall,
>>>>>>>>> > not a storm. I will give it a try, let's see.
>>>>>>>>> >
>>>>>>>>> > >
>>>>>>>>> > > After patching:
>>>>>>>>> > > 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1
>>>>cd
>>>>>>>>> > >
>>>>>>>>> > > Before patching:
>>>>>>>>> > > 128:    2440190          0          0          0  INTC1020:01  112
>>>>0000:00:1a.1
>>>>>>cd
>>>>>>>>> > >
>>>>>>>>> > > Regards
>>>>>>>>> > >
>>>>>>>>> > > Hongzhan Chen
>>>>>>>>> > > >
>>>>>>>>> > > > Could you please check if we call unmask_irq() with HW IRQs
>>>>>>enabled?
>>>>>>>>If
>>>>>>>>> > > > so, I have an idea what is going on...
>>>>>>>>> > > >
>>>>>>>>> > > > Florian
>>>>>>>>> > >
>>>>>>>>>
>>>>>>
>>>>>>To start with, prior to any further analysis:
>>>>>>
>>>>>>diff --git a/drivers/pinctrl/intel/pinctrl-intel.h
>>b/drivers/pinctrl/intel/pinctrl-
>>>>>>intel.h
>>>>>>index 65628423bf639..b7afa763ee079 100644
>>>>>>--- a/drivers/pinctrl/intel/pinctrl-intel.h
>>>>>>+++ b/drivers/pinctrl/intel/pinctrl-intel.h
>>>>>>@@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>>>>>>  */
>>>>>> struct intel_pinctrl {
>>>>>> 	struct device *dev;
>>>>>>-	raw_spinlock_t lock;
>>>>>>+	hard_spinlock_t lock;
>>>>>
>>>>> I already modified it at first when we found the issue.  But it does not
>>work
>>>>with it.
>>>>>
>>>>> After patching my patch[2] I mentioned, the issue disappear finally.
>>>>>
>>>>
>>>>Ok, but the patch is papering over the real issue I believe.
>>>
>>> What kind of real issue it would be? What checkpoint I need to check to find
>>the real issue out? Please suggest.
>>> Continuous fired irq is obviously caused by forgotten ack behavior for the
>>issue. The ack is supposed to happen elsewhere?
>>>
>>
>>Edge ack is supposed to happen in handle_edge_irq() when a pipeline
>>entry is detected a few lines earlier, your patch would cause a double
>
> Following patch is OK for you? It should avoid double ACK.  Because on_pipeline_entry after  ack
> it set IRQS_EDGE. so we can ack here when find that  IRQS_EDGE is not set to avoid double ack.
>
> index 6aca6c036ada..d06cf4a79f2a 10064
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -868,8 +868,10 @@ void handle_edge_irq(struct irq_desc *desc)
>         if (on_pipeline_entry()) {
>                 chip->irq_ack(&desc->irq_data);
>                 desc->istate |= IRQS_EDGE;
>                 handle_oob_irq(desc);
>                 goto out_unlock;
>         }
>
>         kstat_incr_irqs_this_cpu(desc);
>
> -       /* Start handling the irq */
> -       if (!irqs_pipelined())
> +       /* Start handling the irq
> +        * Avoid duplicating ack when IRQS_EDGE set
> +        */
> +       if (!(desc->istate & IRQS_EDGE))
>                 chip->irq_ack(&desc->irq_data);

This is getting closer but not yet correct. The idea of tracking the
pending acknowledge using a per-event bit from the internal descriptor
state looks right, we need something along these lines.

However you may not reuse IRQS_EDGE for this, first because the latter
deals with IRQ resend, so the semantics are different, but also because
this bit is cleared early when leaving the flow handler upon its initial
invocation for an IRQ. This won't work because we are interested by the
second invocation of such handler for the same IRQ event, the one which
replays the event from the in-band stage. There are some gory details
about the interrupt flow at [1] which may help understanding.

[1] https://evlproject.org/dovetail/porting/irqflow/

-- 
Philippe.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-09-04  8:56                         ` Jan Kiszka
@ 2023-09-04  9:34                           ` Florian Bezdeka
  2023-09-04 13:35                             ` Chen, Hongzhan
  0 siblings, 1 reply; 21+ messages in thread
From: Florian Bezdeka @ 2023-09-04  9:34 UTC (permalink / raw)
  To: Jan Kiszka, Chen, Hongzhan, Philippe Gerum; +Cc: xenomai@lists.linux.dev

On Mon, 2023-09-04 at 10:56 +0200, Jan Kiszka wrote:
> On 04.09.23 10:20, Chen, Hongzhan wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Philippe Gerum <rpm@xenomai.org>
> > > Sent: Friday, September 1, 2023 2:35 PM
> > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
> > > Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
> > > <jan.kiszka@siemens.com>; xenomai@lists.linux.dev
> > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
> > > pinctrl device
> > > 
> > > 
> > > "Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
> > > 
> > > > > -----Original Message-----
> > > > > From: Philippe Gerum <rpm@xenomai.org>
> > > > > Sent: Thursday, August 31, 2023 4:57 PM
> > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
> > > > > Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
> > > > > <jan.kiszka@siemens.com>; xenomai@lists.linux.dev
> > > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
> > > > > pinctrl device
> > > > > 
> > > > > 
> > > > > "Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
> > > > > 
> > > > > > > -----Original Message-----
> > > > > > > From: Philippe Gerum <rpm@xenomai.org>
> > > > > > > Sent: Thursday, August 31, 2023 4:06 PM
> > > > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
> > > > > > > Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
> > > > > > > <jan.kiszka@siemens.com>; xenomai@lists.linux.dev
> > > > > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
> > > > > > > pinctrl device
> > > > > > > 
> > > > > > > 
> > > > > > > "Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
> > > > > > > 
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
> > > > > > > > > Sent: Thursday, August 31, 2023 3:10 PM
> > > > > > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
> > > > > > > > > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
> > > > > > > Philippe
> > > > > > > > > Gerum <rpm@xenomai.org>
> > > > > > > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
> > > from
> > > > > > > > > pinctrl device
> > > > > > > > > 
> > > > > > > > > On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
> > > > > > > > > > 
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
> > > > > > > > > > > Sent: Wednesday, August 30, 2023 9:23 PM
> > > > > > > > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
> > > > > > > > > > > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
> > > > > > > > > Philippe
> > > > > > > > > > > Gerum <rpm@xenomai.org>
> > > > > > > > > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
> > > > > > > from
> > > > > > > > > > > pinctrl device
> > > > > > > > > > > 
> > > > > > > > > > > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
> > > > > > > > > > > > > Sent: Wednesday, August 30, 2023 3:20 PM
> > > > > > > > > > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
> > > > > > > > > > > xenomai@lists.linux.dev
> > > > > > > > > > > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
> > > > > alloced
> > > > > > > > > from
> > > > > > > > > > > > > pinctrl device
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
> > > > > > > > > > > > > > Hi Philippe/Jan
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > We found there is irq storm issue like [1] on sd/mmc card
> > > detect
> > > > > > > > > > > interrupt
> > > > > > > > > > > > > alloced from
> > > > > > > > > > > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
> > > > > > > > > > > > > > According to our test,  the patch [2] works for such issue in
> > > our
> > > > > > > case.
> > > > > > > > > It
> > > > > > > > > > > > > seems that handle_edge_irq
> > > > > > > > > > > > > > never ack the irq to clear the interrupt status for the Linux
> > > gpio
> > > > > > > > > interrupt.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I am wondering if there is any side-effect for this patch, Please
> > > > > > > > > suggest.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Hongzhan Chen
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > [1]:
> > > > > > > > > > > > > > cat /proc/interrupts
> > > > > > > > > > > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
> > > > > > > > > > > INTC1020:00,
> > > > > > > > > > > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
> > > > > > > > > > > > > > 128:    2440190          0          0          0  INTC1020:01  112
> > > > > > > 0000:00:1a.1
> > > > > > > > > cd
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > [2]:
> > > > > > > > > > > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> > > > > > > > > > > > > > index 6aca6c036ada..4d08eaf076e9 100644
> > > > > > > > > > > > > > --- a/kernel/irq/chip.c
> > > > > > > > > > > > > > +++ b/kernel/irq/chip.c
> > > > > > > > > > > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc
> > > > > > > *desc)
> > > > > > > > > > > > > >         kstat_incr_irqs_this_cpu(desc);
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >         /* Start handling the irq */
> > > > > > > > > > > > > > -       if (!irqs_pipelined())
> > > > > > > > > > > > > >                 chip->irq_ack(&desc->irq_data);
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > That doesn't make much sense. A IRQ storm means that we
> > > > > > > unmasked
> > > > > > > > > the
> > > > > > > > > > > > > IRQ to early or forgot to mask it.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > A forgotten ACK would mean the IRQ stalls, so it would never be
> > > > > > > fired
> > > > > > > > > > > > > again. No?
> > > > > > > > > > > > 
> > > > > > > > > > > > According to the test, it seems forgotten ACK will keep firing
> > > > > interrupt
> > > > > > > > > here
> > > > > > > > > > > like
> > > > > > > > > > > > following comparing:
> > > > > > > > > > > 
> > > > > > > > > > > This doesn't make sense (IMHO).
> > > > > > > > > > > Can you provide a callstack from handle_edge_irq() in case
> > > > > > > > > > > on_pipeline_entry() returns false?
> > > > > > > > > > > 
> > > > > > > > > > > The IRQ should be ACKed on pipeline entry already, even if
> > > > > > > > > > > handle_edge_irq() is called again (from Linux/Inband context) the
> > > IRQ
> > > > > > > > > > > should already be ACKed.
> > > > > > > > > > 
> > > > > > > > > > Following stack is dumped after patching my patch. Without patching
> > > > > > > there
> > > > > > > > > is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from
> > > > > > > which
> > > > > > > > > it already called irq_exit_pipeline before sync_current_irq_stage. That
> > > is
> > > > > > > why
> > > > > > > > > on_pipeline_entry returned false in handle_edge_irq
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
> > > > > > > > > 5.10.179-intel-ese-standard-lts-dovetail+ #1
> > > > > > > > > > [  291.009331] Hardware name: Intel Corporation Elkhart Lake
> > > > > Embedded
> > > > > > > > > Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
> > > > > > > > > EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
> > > > > > > > > > [  291.009332] IRQ stage: Linux
> > > > > > > > > > [  291.009332] Call Trace:
> > > > > > > > > > [  291.009332]  <IRQ>
> > > > > > > > > > [  291.009333]  dump_stack+0x7b/0x93
> > > > > > > > > > [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
> > > > > > > > > > [  291.009334]  handle_edge_irq+0xe4/0x310
> > > > > > > > > > [  291.009334]  generic_handle_irq+0x4b/0x60
> > > > > > > > > > [  291.009335]  intel_gpio_irq+0x11f/0x1c0
> > > > > > > > > > [  291.009335]  __handle_irq_event_percpu+0x77/0x190
> > > > > > > > > > [  291.009336]  ? handle_level_irq+0x200/0x200
> > > > > > > > > > [  291.009336]  handle_irq_event+0x6c/0x100
> > > > > > > > > > [  291.009337]  handle_fasteoi_irq+0xd8/0x250
> > > > > > > > > > [  291.009337]  asm_call_irq_on_stack+0xf/0x20
> > > > > > > > > > [  291.009338]  </IRQ>
> > > > > > > > > > [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
> > > > > > > > > > [  291.009339]  sync_current_irq_stage+0x157/0x1f0
> > > > > > > > >                   ^^
> > > > > > > > > My understanding is that you are not handling an HW IRQ, you are in
> > > the
> > > > > > > > > stage where the inband IRQ log is being processed. HW IRQs are
> > > already
> > > > > > > > > ACKed at that stage.
> > > > > > > > > 
> > > > > > > > > Maybe we have to understand your IRQ chip routing. Is there an IO
> > > APIC
> > > > > > > > > involved? handle_fasteoi_irq() would be the first level IRQ entry point
> > > > > > > > > which takes care of masking + ack.
> > > > > > > > 
> > > > > > > > Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is
> > > > > > > registered in io_apic driver. There is second level pinctrl device acting as
> > > > > > > gpiochip which has irq domains where sd/mmc cd interrupt alloced from ,
> > > > > > > which need to be acked in handle_edge_irq.
> > > > > > > > 
> > > > > > > > Regards
> > > > > > > > 
> > > > > > > > Hongzhan Chen
> > > > > > > > > 
> > > > > > > > > > [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
> > > > > > > > > > [  291.009340]  arch_pipeline_entry+0xdb/0x120
> > > > > > > > > > [  291.009340]  asm_common_interrupt+0x1e/0x40
> > > > > > > > > > [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
> > > > > > > > > > [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b
> > > 1c
> > > > > > > 25
> > > > > > > > > 00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8>
> > > > > 08
> > > > > > > 75
> > > > > > > > > 13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
> > > > > > > > > > [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
> > > > > > > > > > 
> > > > > > > > > > Regards
> > > > > > > > > > 
> > > > > > > > > > Hongzhan Chen
> > > > > > > > > > > 
> > > > > > > > > > > But: If we use the dovetail sources to build a kernel with pipelining
> > > > > > > > > > > disabled we have to ACK the IRQ. I assume pipelining is active on
> > > your
> > > > > > > > > > > end.
> > > > > > > > > > > 
> > > > > > > > > > > As Jan said, I'm searching for a similar problem for some time now
> > > > > that
> > > > > > > > > > > I can only reproduce on VirtualBox VMM. In my case I'm seeing a
> > > stall,
> > > > > > > > > > > not a storm. I will give it a try, let's see.
> > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > After patching:
> > > > > > > > > > > > 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1
> > > > > cd
> > > > > > > > > > > > 
> > > > > > > > > > > > Before patching:
> > > > > > > > > > > > 128:    2440190          0          0          0  INTC1020:01  112
> > > > > 0000:00:1a.1
> > > > > > > cd
> > > > > > > > > > > > 
> > > > > > > > > > > > Regards
> > > > > > > > > > > > 
> > > > > > > > > > > > Hongzhan Chen
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Could you please check if we call unmask_irq() with HW IRQs
> > > > > > > enabled?
> > > > > > > > > If
> > > > > > > > > > > > > so, I have an idea what is going on...
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Florian
> > > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > 
> > > > > > > To start with, prior to any further analysis:
> > > > > > > 
> > > > > > > diff --git a/drivers/pinctrl/intel/pinctrl-intel.h
> > > b/drivers/pinctrl/intel/pinctrl-
> > > > > > > intel.h
> > > > > > > index 65628423bf639..b7afa763ee079 100644
> > > > > > > --- a/drivers/pinctrl/intel/pinctrl-intel.h
> > > > > > > +++ b/drivers/pinctrl/intel/pinctrl-intel.h
> > > > > > > @@ -222,7 +222,7 @@ struct intel_pinctrl_context {
> > > > > > >  */
> > > > > > > struct intel_pinctrl {
> > > > > > > 	struct device *dev;
> > > > > > > -	raw_spinlock_t lock;
> > > > > > > +	hard_spinlock_t lock;
> > > > > > 
> > > > > > I already modified it at first when we found the issue.  But it does not
> > > work
> > > > > with it.
> > > > > > 
> > > > > > After patching my patch[2] I mentioned, the issue disappear finally.
> > > > > > 
> > > > > 
> > > > > Ok, but the patch is papering over the real issue I believe.
> > > > 
> > > > What kind of real issue it would be? What checkpoint I need to check to find
> > > the real issue out? Please suggest.
> > > > Continuous fired irq is obviously caused by forgotten ack behavior for the
> > > issue. The ack is supposed to happen elsewhere?
> > > > 
> > > 
> > > Edge ack is supposed to happen in handle_edge_irq() when a pipeline
> > > entry is detected a few lines earlier, your patch would cause a double
> > 
> > Following patch is OK for you? It should avoid double ACK.  Because on_pipeline_entry after  ack
> > it set IRQS_EDGE. so we can ack here when find that  IRQS_EDGE is not set to avoid double ack.
> > 
> > index 6aca6c036ada..d06cf4a79f2a 10064
> > --- a/kernel/irq/chip.c
> > +++ b/kernel/irq/chip.c
> > @@ -868,8 +868,10 @@ void handle_edge_irq(struct irq_desc *desc)
> >         if (on_pipeline_entry()) {
> >                 chip->irq_ack(&desc->irq_data);
> >                 desc->istate |= IRQS_EDGE;
> >                 handle_oob_irq(desc);
> >                 goto out_unlock;
> >         }
> > 
> >         kstat_incr_irqs_this_cpu(desc);
> > 
> > -       /* Start handling the irq */
> > -       if (!irqs_pipelined())
> > +       /* Start handling the irq
> > +        * Avoid duplicating ack when IRQS_EDGE set
> > +        */
> > +       if (!(desc->istate & IRQS_EDGE))
> >                 chip->irq_ack(&desc->irq_data);
> > 
> 
> Doesn't answer why we get here without an ack while the pipeline is
> enabled, does it?

The problem in this case is the "stacking" of IRQ chips. Philippe
already confirmed that we have a problem here. 

This patch is still papering over the real issue.

As already pointed out by Philippe we have two work items here:
 - Missing pipelining support within the Intel GPIO pin control
 - Dovetail logic regarding stacked IRQ chips needs reworking

Florian

> 
> Jan
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-09-04  8:20                       ` Chen, Hongzhan
  2023-09-04  8:56                         ` Jan Kiszka
  2023-09-04  9:21                         ` Philippe Gerum
@ 2023-09-04  9:36                         ` Chen, Hongzhan
  2 siblings, 0 replies; 21+ messages in thread
From: Chen, Hongzhan @ 2023-09-04  9:36 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Bezdeka, Florian, Kiszka, Jan, xenomai@lists.linux.dev



>-----Original Message-----
>From: Chen, Hongzhan
>Sent: Monday, September 4, 2023 4:20 PM
>To: Philippe Gerum <rpm@xenomai.org>
>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>Subject: RE: irq storm on sd/mmc card detect gpio interrupt alloced from
>pinctrl device
>
>
>
>>-----Original Message-----
>>From: Philippe Gerum <rpm@xenomai.org>
>>Sent: Friday, September 1, 2023 2:35 PM
>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>pinctrl device
>>
>>
>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>
>>>>-----Original Message-----
>>>>From: Philippe Gerum <rpm@xenomai.org>
>>>>Sent: Thursday, August 31, 2023 4:57 PM
>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>pinctrl device
>>>>
>>>>
>>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>
>>>>>>-----Original Message-----
>>>>>>From: Philippe Gerum <rpm@xenomai.org>
>>>>>>Sent: Thursday, August 31, 2023 4:06 PM
>>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>from
>>>>>>pinctrl device
>>>>>>
>>>>>>
>>>>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>Sent: Thursday, August 31, 2023 3:10 PM
>>>>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>>Philippe
>>>>>>>>Gerum <rpm@xenomai.org>
>>>>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>from
>>>>>>>>pinctrl device
>>>>>>>>
>>>>>>>>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>>>>>>>>
>>>>>>>>> > -----Original Message-----
>>>>>>>>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>> > Sent: Wednesday, August 30, 2023 9:23 PM
>>>>>>>>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>;
>xenomai@lists.linux.dev;
>>>>>>>>Philippe
>>>>>>>>> > Gerum <rpm@xenomai.org>
>>>>>>>>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>alloced
>>>>>>from
>>>>>>>>> > pinctrl device
>>>>>>>>> >
>>>>>>>>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>>>>>>>> > >
>>>>>>>>> > > > -----Original Message-----
>>>>>>>>> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>> > > > Sent: Wednesday, August 30, 2023 3:20 PM
>>>>>>>>> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>>>>>>>>> > xenomai@lists.linux.dev
>>>>>>>>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>>>>alloced
>>>>>>>>from
>>>>>>>>> > > > pinctrl device
>>>>>>>>> > > >
>>>>>>>>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>>>>>>>>> > > > > Hi Philippe/Jan
>>>>>>>>> > > > >
>>>>>>>>> > > > > We found there is irq storm issue like [1] on sd/mmc card
>>detect
>>>>>>>>> > interrupt
>>>>>>>>> > > > alloced from
>>>>>>>>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>>>>>>>>> > > > > According to our test,  the patch [2] works for such issue in
>>our
>>>>>>case.
>>>>>>>>It
>>>>>>>>> > > > seems that handle_edge_irq
>>>>>>>>> > > > > never ack the irq to clear the interrupt status for the Linux
>>gpio
>>>>>>>>interrupt.
>>>>>>>>> > > > >
>>>>>>>>> > > > > I am wondering if there is any side-effect for this patch,
>Please
>>>>>>>>suggest.
>>>>>>>>> > > > >
>>>>>>>>> > > > > Regards
>>>>>>>>> > > > >
>>>>>>>>> > > > > Hongzhan Chen
>>>>>>>>> > > > >
>>>>>>>>> > > > > [1]:
>>>>>>>>> > > > > cat /proc/interrupts
>>>>>>>>> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>>>>>>>>> > INTC1020:00,
>>>>>>>>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>>>>>>>>> > > > > 128:    2440190          0          0          0  INTC1020:01  112
>>>>>>0000:00:1a.1
>>>>>>>>cd
>>>>>>>>> > > > >
>>>>>>>>> > > > > [2]:
>>>>>>>>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>>>>>>> > > > > index 6aca6c036ada..4d08eaf076e9 100644
>>>>>>>>> > > > > --- a/kernel/irq/chip.c
>>>>>>>>> > > > > +++ b/kernel/irq/chip.c
>>>>>>>>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc
>>>>>>*desc)
>>>>>>>>> > > > >         kstat_incr_irqs_this_cpu(desc);
>>>>>>>>> > > > >
>>>>>>>>> > > > >         /* Start handling the irq */
>>>>>>>>> > > > > -       if (!irqs_pipelined())
>>>>>>>>> > > > >                 chip->irq_ack(&desc->irq_data);
>>>>>>>>> > > > >
>>>>>>>>> > > > >
>>>>>>>>> > > >
>>>>>>>>> > > > That doesn't make much sense. A IRQ storm means that we
>>>>>>unmasked
>>>>>>>>the
>>>>>>>>> > > > IRQ to early or forgot to mask it.
>>>>>>>>> > > >
>>>>>>>>> > > > A forgotten ACK would mean the IRQ stalls, so it would never
>be
>>>>>>fired
>>>>>>>>> > > > again. No?
>>>>>>>>> > >
>>>>>>>>> > > According to the test, it seems forgotten ACK will keep firing
>>>>interrupt
>>>>>>>>here
>>>>>>>>> > like
>>>>>>>>> > > following comparing:
>>>>>>>>> >
>>>>>>>>> > This doesn't make sense (IMHO).
>>>>>>>>> > Can you provide a callstack from handle_edge_irq() in case
>>>>>>>>> > on_pipeline_entry() returns false?
>>>>>>>>> >
>>>>>>>>> > The IRQ should be ACKed on pipeline entry already, even if
>>>>>>>>> > handle_edge_irq() is called again (from Linux/Inband context) the
>>IRQ
>>>>>>>>> > should already be ACKed.
>>>>>>>>>
>>>>>>>>> Following stack is dumped after patching my patch. Without
>patching
>>>>>>there
>>>>>>>>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish
>from
>>>>>>which
>>>>>>>>it already called irq_exit_pipeline before sync_current_irq_stage. That
>>is
>>>>>>why
>>>>>>>>on_pipeline_entry returned false in handle_edge_irq
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>>>>>>>>5.10.179-intel-ese-standard-lts-dovetail+ #1
>>>>>>>>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake
>>>>Embedded
>>>>>>>>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>>>>>>>>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>>>>>>>>> [  291.009332] IRQ stage: Linux
>>>>>>>>> [  291.009332] Call Trace:
>>>>>>>>> [  291.009332]  <IRQ>
>>>>>>>>> [  291.009333]  dump_stack+0x7b/0x93
>>>>>>>>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>>>>>>>>> [  291.009334]  handle_edge_irq+0xe4/0x310
>>>>>>>>> [  291.009334]  generic_handle_irq+0x4b/0x60
>>>>>>>>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>>>>>>>>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>>>>>>>>> [  291.009336]  ? handle_level_irq+0x200/0x200
>>>>>>>>> [  291.009336]  handle_irq_event+0x6c/0x100
>>>>>>>>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>>>>>>>>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>>>>>>>>> [  291.009338]  </IRQ>
>>>>>>>>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>>>>>>>>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>>>>>>>>                   ^^
>>>>>>>>My understanding is that you are not handling an HW IRQ, you are in
>>the
>>>>>>>>stage where the inband IRQ log is being processed. HW IRQs are
>>already
>>>>>>>>ACKed at that stage.
>>>>>>>>
>>>>>>>>Maybe we have to understand your IRQ chip routing. Is there an IO
>>APIC
>>>>>>>>involved? handle_fasteoi_irq() would be the first level IRQ entry point
>>>>>>>>which takes care of masking + ack.
>>>>>>>
>>>>>>> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq()
>is
>>>>>>registered in io_apic driver. There is second level pinctrl device acting as
>>>>>>gpiochip which has irq domains where sd/mmc cd interrupt alloced
>from ,
>>>>>>which need to be acked in handle_edge_irq.
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Hongzhan Chen
>>>>>>>>
>>>>>>>>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>>>>>>>>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>>>>>>>>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>>>>>>>>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>>>>>>>>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48
>8b
>>1c
>>>>>>25
>>>>>>>>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03
><a8>
>>>>08
>>>>>>75
>>>>>>>>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>>>>>>>>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> Hongzhan Chen
>>>>>>>>> >
>>>>>>>>> > But: If we use the dovetail sources to build a kernel with pipelining
>>>>>>>>> > disabled we have to ACK the IRQ. I assume pipelining is active on
>>your
>>>>>>>>> > end.
>>>>>>>>> >
>>>>>>>>> > As Jan said, I'm searching for a similar problem for some time now
>>>>that
>>>>>>>>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a
>>stall,
>>>>>>>>> > not a storm. I will give it a try, let's see.
>>>>>>>>> >
>>>>>>>>> > >
>>>>>>>>> > > After patching:
>>>>>>>>> > > 128:          2          0          0          0  INTC1020:01  112
>0000:00:1a.1
>>>>cd
>>>>>>>>> > >
>>>>>>>>> > > Before patching:
>>>>>>>>> > > 128:    2440190          0          0          0  INTC1020:01  112
>>>>0000:00:1a.1
>>>>>>cd
>>>>>>>>> > >
>>>>>>>>> > > Regards
>>>>>>>>> > >
>>>>>>>>> > > Hongzhan Chen
>>>>>>>>> > > >
>>>>>>>>> > > > Could you please check if we call unmask_irq() with HW IRQs
>>>>>>enabled?
>>>>>>>>If
>>>>>>>>> > > > so, I have an idea what is going on...
>>>>>>>>> > > >
>>>>>>>>> > > > Florian
>>>>>>>>> > >
>>>>>>>>>
>>>>>>
>>>>>>To start with, prior to any further analysis:
>>>>>>
>>>>>>diff --git a/drivers/pinctrl/intel/pinctrl-intel.h
>>b/drivers/pinctrl/intel/pinctrl-
>>>>>>intel.h
>>>>>>index 65628423bf639..b7afa763ee079 100644
>>>>>>--- a/drivers/pinctrl/intel/pinctrl-intel.h
>>>>>>+++ b/drivers/pinctrl/intel/pinctrl-intel.h
>>>>>>@@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>>>>>>  */
>>>>>> struct intel_pinctrl {
>>>>>> 	struct device *dev;
>>>>>>-	raw_spinlock_t lock;
>>>>>>+	hard_spinlock_t lock;
>>>>>
>>>>> I already modified it at first when we found the issue.  But it does not
>>work
>>>>with it.
>>>>>
>>>>> After patching my patch[2] I mentioned, the issue disappear finally.
>>>>>
>>>>
>>>>Ok, but the patch is papering over the real issue I believe.
>>>
>>> What kind of real issue it would be? What checkpoint I need to check to
>find
>>the real issue out? Please suggest.
>>> Continuous fired irq is obviously caused by forgotten ack behavior for the
>>issue. The ack is supposed to happen elsewhere?
>>>
>>
>>Edge ack is supposed to happen in handle_edge_irq() when a pipeline
>>entry is detected a few lines earlier, your patch would cause a double
>
>Following patch is OK for you? It should avoid double ACK.  Because
>on_pipeline_entry after  ack
>it set IRQS_EDGE. so we can ack here when find that  IRQS_EDGE is not set to
>avoid double ack.
>
>index 6aca6c036ada..d06cf4a79f2a 10064
>--- a/kernel/irq/chip.c
>+++ b/kernel/irq/chip.c
>@@ -868,8 +868,10 @@ void handle_edge_irq(struct irq_desc *desc)
>        if (on_pipeline_entry()) {
>                chip->irq_ack(&desc->irq_data);
>                desc->istate |= IRQS_EDGE;
>                handle_oob_irq(desc);
>                goto out_unlock;
>        }
>
>        kstat_incr_irqs_this_cpu(desc);
>
>-       /* Start handling the irq */
>-       if (!irqs_pipelined())
>+       /* Start handling the irq
>+        * Avoid duplicating ack when IRQS_EDGE set
>+        */
>+       if (!(desc->istate & IRQS_EDGE))
>                chip->irq_ack(&desc->irq_data);
>
Please ignore it. It does not work. IRQS_EDGE flag will be cleaned before quitting handle_edge_irq.

Regards

Hongzhan Chen
>Regards

>
>Hongzhan Chen
>
>>ACK in the common case when pipelining, which might cause events to be
>>lost. As you found out already, the problem is that such flow handler is
>>called _outside_ of the pipeline entry path, so the ack is not
>>delivered.
>>
>>This happens due to this particular interrupt path, which chains two
>>irqchips, edge on top of fasteoi, combined to the fact that there is no
>>oob handling of the fasteoi IRQ, so interrupt delivery has to go through
>>the per-CPU pipeline log first, therefore handling is postponed until
>>the pipeline is synchronized, _after_ we left the pipeline entry
>>context. As a result, the fasteoi flow handler does have in_pipeline()
>>set, but the edge flow handler does not, causing this bug.
>>
>>handle_irq_pipelined_prepare()
>>[on_pipeline_entry: true]
>>arch_handle_irq()
>>        handle_fasteoi_irq()
>>                handle_oob_irq()
>>                        mask_cond_eoi_irq()
>>handle_irq_pipelined_finish()
>>[on_pipeline_entry: false]
>>        synchronize_pipeline_on_irq()
>>                handle_fasteoi_irq() [!on_pipeline_entry]
>>                      intel_gpio_irq()
>>                           handle_edge_irq() <<< failed to ack
>>
>>This is a bug in the Dovetail logic for stacked IRQ chips. I may have
>>some cycles spare the next couple of weeks to work on this issue and the
>>rest of my Dovetail backlog if nobody comes up with a fix earlier.
>>
>>--
>>Philippe.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-09-04  9:21                         ` Philippe Gerum
@ 2023-09-04 13:26                           ` Chen, Hongzhan
  0 siblings, 0 replies; 21+ messages in thread
From: Chen, Hongzhan @ 2023-09-04 13:26 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Bezdeka, Florian, Kiszka, Jan, xenomai@lists.linux.dev



>-----Original Message-----
>From: Philippe Gerum <rpm@xenomai.org>
>Sent: Monday, September 4, 2023 5:22 PM
>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>pinctrl device
>
>
>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>
>>>-----Original Message-----
>>>From: Philippe Gerum <rpm@xenomai.org>
>>>Sent: Friday, September 1, 2023 2:35 PM
>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>pinctrl device
>>>
>>>
>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>
>>>>>-----Original Message-----
>>>>>From: Philippe Gerum <rpm@xenomai.org>
>>>>>Sent: Thursday, August 31, 2023 4:57 PM
>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>>pinctrl device
>>>>>
>>>>>
>>>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: Philippe Gerum <rpm@xenomai.org>
>>>>>>>Sent: Thursday, August 31, 2023 4:06 PM
>>>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>>>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>from
>>>>>>>pinctrl device
>>>>>>>
>>>>>>>
>>>>>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>>>>
>>>>>>>>>-----Original Message-----
>>>>>>>>>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>>Sent: Thursday, August 31, 2023 3:10 PM
>>>>>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>>>Philippe
>>>>>>>>>Gerum <rpm@xenomai.org>
>>>>>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>>from
>>>>>>>>>pinctrl device
>>>>>>>>>
>>>>>>>>>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>>>>>>>>>
>>>>>>>>>> > -----Original Message-----
>>>>>>>>>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>>> > Sent: Wednesday, August 30, 2023 9:23 PM
>>>>>>>>>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>>> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>;
>xenomai@lists.linux.dev;
>>>>>>>>>Philippe
>>>>>>>>>> > Gerum <rpm@xenomai.org>
>>>>>>>>>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>alloced
>>>>>>>from
>>>>>>>>>> > pinctrl device
>>>>>>>>>> >
>>>>>>>>>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>>>>>>>>> > >
>>>>>>>>>> > > > -----Original Message-----
>>>>>>>>>> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>>> > > > Sent: Wednesday, August 30, 2023 3:20 PM
>>>>>>>>>> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>>>>>>>>>> > xenomai@lists.linux.dev
>>>>>>>>>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>>>>>alloced
>>>>>>>>>from
>>>>>>>>>> > > > pinctrl device
>>>>>>>>>> > > >
>>>>>>>>>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>>>>>>>>>> > > > > Hi Philippe/Jan
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > We found there is irq storm issue like [1] on sd/mmc card
>>>detect
>>>>>>>>>> > interrupt
>>>>>>>>>> > > > alloced from
>>>>>>>>>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>>>>>>>>>> > > > > According to our test,  the patch [2] works for such issue in
>>>our
>>>>>>>case.
>>>>>>>>>It
>>>>>>>>>> > > > seems that handle_edge_irq
>>>>>>>>>> > > > > never ack the irq to clear the interrupt status for the Linux
>>>gpio
>>>>>>>>>interrupt.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > I am wondering if there is any side-effect for this patch,
>Please
>>>>>>>>>suggest.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Regards
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Hongzhan Chen
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > [1]:
>>>>>>>>>> > > > > cat /proc/interrupts
>>>>>>>>>> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -
>fasteoi
>>>>>>>>>> > INTC1020:00,
>>>>>>>>>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>>>>>>>>>> > > > > 128:    2440190          0          0          0  INTC1020:01  112
>>>>>>>0000:00:1a.1
>>>>>>>>>cd
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > [2]:
>>>>>>>>>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>>>>>>>> > > > > index 6aca6c036ada..4d08eaf076e9 100644
>>>>>>>>>> > > > > --- a/kernel/irq/chip.c
>>>>>>>>>> > > > > +++ b/kernel/irq/chip.c
>>>>>>>>>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct
>irq_desc
>>>>>>>*desc)
>>>>>>>>>> > > > >         kstat_incr_irqs_this_cpu(desc);
>>>>>>>>>> > > > >
>>>>>>>>>> > > > >         /* Start handling the irq */
>>>>>>>>>> > > > > -       if (!irqs_pipelined())
>>>>>>>>>> > > > >                 chip->irq_ack(&desc->irq_data);
>>>>>>>>>> > > > >
>>>>>>>>>> > > > >
>>>>>>>>>> > > >
>>>>>>>>>> > > > That doesn't make much sense. A IRQ storm means that we
>>>>>>>unmasked
>>>>>>>>>the
>>>>>>>>>> > > > IRQ to early or forgot to mask it.
>>>>>>>>>> > > >
>>>>>>>>>> > > > A forgotten ACK would mean the IRQ stalls, so it would never
>be
>>>>>>>fired
>>>>>>>>>> > > > again. No?
>>>>>>>>>> > >
>>>>>>>>>> > > According to the test, it seems forgotten ACK will keep firing
>>>>>interrupt
>>>>>>>>>here
>>>>>>>>>> > like
>>>>>>>>>> > > following comparing:
>>>>>>>>>> >
>>>>>>>>>> > This doesn't make sense (IMHO).
>>>>>>>>>> > Can you provide a callstack from handle_edge_irq() in case
>>>>>>>>>> > on_pipeline_entry() returns false?
>>>>>>>>>> >
>>>>>>>>>> > The IRQ should be ACKed on pipeline entry already, even if
>>>>>>>>>> > handle_edge_irq() is called again (from Linux/Inband context)
>the
>>>IRQ
>>>>>>>>>> > should already be ACKed.
>>>>>>>>>>
>>>>>>>>>> Following stack is dumped after patching my patch. Without
>patching
>>>>>>>there
>>>>>>>>>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish
>from
>>>>>>>which
>>>>>>>>>it already called irq_exit_pipeline before sync_current_irq_stage.
>That
>>>is
>>>>>>>why
>>>>>>>>>on_pipeline_entry returned false in handle_edge_irq
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>>>>>>>>>5.10.179-intel-ese-standard-lts-dovetail+ #1
>>>>>>>>>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake
>>>>>Embedded
>>>>>>>>>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>>>>>>>>>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>>>>>>>>>> [  291.009332] IRQ stage: Linux
>>>>>>>>>> [  291.009332] Call Trace:
>>>>>>>>>> [  291.009332]  <IRQ>
>>>>>>>>>> [  291.009333]  dump_stack+0x7b/0x93
>>>>>>>>>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>>>>>>>>>> [  291.009334]  handle_edge_irq+0xe4/0x310
>>>>>>>>>> [  291.009334]  generic_handle_irq+0x4b/0x60
>>>>>>>>>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>>>>>>>>>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>>>>>>>>>> [  291.009336]  ? handle_level_irq+0x200/0x200
>>>>>>>>>> [  291.009336]  handle_irq_event+0x6c/0x100
>>>>>>>>>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>>>>>>>>>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>>>>>>>>>> [  291.009338]  </IRQ>
>>>>>>>>>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>>>>>>>>>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>>>>>>>>>                   ^^
>>>>>>>>>My understanding is that you are not handling an HW IRQ, you are
>in
>>>the
>>>>>>>>>stage where the inband IRQ log is being processed. HW IRQs are
>>>already
>>>>>>>>>ACKed at that stage.
>>>>>>>>>
>>>>>>>>>Maybe we have to understand your IRQ chip routing. Is there an IO
>>>APIC
>>>>>>>>>involved? handle_fasteoi_irq() would be the first level IRQ entry
>point
>>>>>>>>>which takes care of masking + ack.
>>>>>>>>
>>>>>>>> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq()
>is
>>>>>>>registered in io_apic driver. There is second level pinctrl device acting
>as
>>>>>>>gpiochip which has irq domains where sd/mmc cd interrupt alloced
>from ,
>>>>>>>which need to be acked in handle_edge_irq.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Hongzhan Chen
>>>>>>>>>
>>>>>>>>>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>>>>>>>>>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>>>>>>>>>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>>>>>>>>>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>>>>>>>>>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48
>8b
>>>1c
>>>>>>>25
>>>>>>>>>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03
><a8>
>>>>>08
>>>>>>>75
>>>>>>>>>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>>>>>>>>>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> Hongzhan Chen
>>>>>>>>>> >
>>>>>>>>>> > But: If we use the dovetail sources to build a kernel with
>pipelining
>>>>>>>>>> > disabled we have to ACK the IRQ. I assume pipelining is active on
>>>your
>>>>>>>>>> > end.
>>>>>>>>>> >
>>>>>>>>>> > As Jan said, I'm searching for a similar problem for some time
>now
>>>>>that
>>>>>>>>>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a
>>>stall,
>>>>>>>>>> > not a storm. I will give it a try, let's see.
>>>>>>>>>> >
>>>>>>>>>> > >
>>>>>>>>>> > > After patching:
>>>>>>>>>> > > 128:          2          0          0          0  INTC1020:01  112
>0000:00:1a.1
>>>>>cd
>>>>>>>>>> > >
>>>>>>>>>> > > Before patching:
>>>>>>>>>> > > 128:    2440190          0          0          0  INTC1020:01  112
>>>>>0000:00:1a.1
>>>>>>>cd
>>>>>>>>>> > >
>>>>>>>>>> > > Regards
>>>>>>>>>> > >
>>>>>>>>>> > > Hongzhan Chen
>>>>>>>>>> > > >
>>>>>>>>>> > > > Could you please check if we call unmask_irq() with HW IRQs
>>>>>>>enabled?
>>>>>>>>>If
>>>>>>>>>> > > > so, I have an idea what is going on...
>>>>>>>>>> > > >
>>>>>>>>>> > > > Florian
>>>>>>>>>> > >
>>>>>>>>>>
>>>>>>>
>>>>>>>To start with, prior to any further analysis:
>>>>>>>
>>>>>>>diff --git a/drivers/pinctrl/intel/pinctrl-intel.h
>>>b/drivers/pinctrl/intel/pinctrl-
>>>>>>>intel.h
>>>>>>>index 65628423bf639..b7afa763ee079 100644
>>>>>>>--- a/drivers/pinctrl/intel/pinctrl-intel.h
>>>>>>>+++ b/drivers/pinctrl/intel/pinctrl-intel.h
>>>>>>>@@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>>>>>>>  */
>>>>>>> struct intel_pinctrl {
>>>>>>> 	struct device *dev;
>>>>>>>-	raw_spinlock_t lock;
>>>>>>>+	hard_spinlock_t lock;
>>>>>>
>>>>>> I already modified it at first when we found the issue.  But it does not
>>>work
>>>>>with it.
>>>>>>
>>>>>> After patching my patch[2] I mentioned, the issue disappear finally.
>>>>>>
>>>>>
>>>>>Ok, but the patch is papering over the real issue I believe.
>>>>
>>>> What kind of real issue it would be? What checkpoint I need to check to
>find
>>>the real issue out? Please suggest.
>>>> Continuous fired irq is obviously caused by forgotten ack behavior for the
>>>issue. The ack is supposed to happen elsewhere?
>>>>
>>>
>>>Edge ack is supposed to happen in handle_edge_irq() when a pipeline
>>>entry is detected a few lines earlier, your patch would cause a double
>>
>> Following patch is OK for you? It should avoid double ACK.  Because
>on_pipeline_entry after  ack
>> it set IRQS_EDGE. so we can ack here when find that  IRQS_EDGE is not set
>to avoid double ack.
>>
>> index 6aca6c036ada..d06cf4a79f2a 10064
>> --- a/kernel/irq/chip.c
>> +++ b/kernel/irq/chip.c
>> @@ -868,8 +868,10 @@ void handle_edge_irq(struct irq_desc *desc)
>>         if (on_pipeline_entry()) {
>>                 chip->irq_ack(&desc->irq_data);
>>                 desc->istate |= IRQS_EDGE;
>>                 handle_oob_irq(desc);
>>                 goto out_unlock;
>>         }
>>
>>         kstat_incr_irqs_this_cpu(desc);
>>
>> -       /* Start handling the irq */
>> -       if (!irqs_pipelined())
>> +       /* Start handling the irq
>> +        * Avoid duplicating ack when IRQS_EDGE set
>> +        */
>> +       if (!(desc->istate & IRQS_EDGE))
>>                 chip->irq_ack(&desc->irq_data);
>
>This is getting closer but not yet correct. The idea of tracking the
>pending acknowledge using a per-event bit from the internal descriptor
>state looks right, we need something along these lines.
>
>However you may not reuse IRQS_EDGE for this, first because the latter

Yes, it does not work because the IRQS_EDGE will be cleaned. I realized it was wrong as soon as  I sent out the email. What about adding
New flag IRQS_ACK to differentiate the ACKed status for the delayed inband irq like
following patch?

commit 98350106b7546b2082455132e75f6054b2e06b4a (HEAD -> storm/dovetail-xenomai)
Author: Hongzhan Chen <hongzhan.chen@intel.com>
Date:   Mon Sep 4 06:33:58 2023 -0400

    pipeline: irq: ack the irq when it is delayed inband irq in
    synchronize_pipeline_on_irq

    1. To avoid double ack, we add new flag to differentiate delayed
       acked inband irq.
    2. For stacked irq, as discussed in [1], to fix the forgotten irq
       issue, ack the irq directly when the flag is supposed to be unset.

    [1]: https://lore.kernel.org/xenomai/BY5PR11MB4276FEFA2C6D88C4B6CFFEFCF2E9A@BY5PR11MB4276.namprd11.prod.outlook.com/T/#t

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 6aca6c036ada..7280a338bd3e 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -863,14 +863,23 @@ void handle_edge_irq(struct irq_desc *desc)
                chip->irq_ack(&desc->irq_data);
                desc->istate |= IRQS_EDGE;
                handle_oob_irq(desc);
+
+               /* For inband irq, we tag acked status
+                * to avoid double ack
+                */
+               if (!irq_settings_is_oob(desc))
+                       desc->istate |= IRQS_ACK;
+
                goto out_unlock;
        }
        kstat_incr_irqs_this_cpu(desc);

        /* Start handling the irq */
-       if (!irqs_pipelined())
+       if (!(desc->istate & IRQS_ACK))
                chip->irq_ack(&desc->irq_data);
+       else
+               desc->istate &= ~IRQS_ACK;

        do {
                if (unlikely(!desc->action)) {
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index a9121f6e6f1c..1b43467b5407 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -54,6 +54,7 @@ enum {
  * IRQS_NMI                    - irq line is used to deliver NMIs
  * IRQS_SYSFS                  - descriptor has been added to sysfs
  * IRQS_EDGE                   - irq line received an edge event
+ * IRQS_ACK                    - acked irq line
  */
 enum {
        IRQS_AUTODETECT         = 0x00000001,
@@ -68,6 +69,7 @@ enum {
        IRQS_NMI                = 0x00002000,
        IRQS_SYSFS              = 0x00004000,
        IRQS_EDGE               = 0x00008000,
+       IRQS_ACK                = 0x00010000,
 };

>deals with IRQ resend, so the semantics are different, but also because
>this bit is cleared early when leaving the flow handler upon its initial
>invocation for an IRQ. This won't work because we are interested by the
>second invocation of such handler for the same IRQ event, the one which
>replays the event from the in-band stage. There are some gory details
>about the interrupt flow at [1] which may help understanding.
>
>[1] https://evlproject.org/dovetail/porting/irqflow/
>
>--
>Philippe.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* RE: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
  2023-09-04  9:34                           ` Florian Bezdeka
@ 2023-09-04 13:35                             ` Chen, Hongzhan
  0 siblings, 0 replies; 21+ messages in thread
From: Chen, Hongzhan @ 2023-09-04 13:35 UTC (permalink / raw)
  To: Bezdeka, Florian, Kiszka, Jan, Philippe Gerum; +Cc: xenomai@lists.linux.dev



>-----Original Message-----
>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>Sent: Monday, September 4, 2023 5:35 PM
>To: Kiszka, Jan <jan.kiszka@siemens.com>; Chen, Hongzhan
><hongzhan.chen@intel.com>; Philippe Gerum <rpm@xenomai.org>
>Cc: xenomai@lists.linux.dev
>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>pinctrl device
>
>On Mon, 2023-09-04 at 10:56 +0200, Jan Kiszka wrote:
>> On 04.09.23 10:20, Chen, Hongzhan wrote:
>> >
>> >
>> > > -----Original Message-----
>> > > From: Philippe Gerum <rpm@xenomai.org>
>> > > Sent: Friday, September 1, 2023 2:35 PM
>> > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> > > Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>> > > <jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>> > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>> > > pinctrl device
>> > >
>> > >
>> > > "Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>> > >
>> > > > > -----Original Message-----
>> > > > > From: Philippe Gerum <rpm@xenomai.org>
>> > > > > Sent: Thursday, August 31, 2023 4:57 PM
>> > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> > > > > Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>> > > > > <jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>> > > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>from
>> > > > > pinctrl device
>> > > > >
>> > > > >
>> > > > > "Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>> > > > >
>> > > > > > > -----Original Message-----
>> > > > > > > From: Philippe Gerum <rpm@xenomai.org>
>> > > > > > > Sent: Thursday, August 31, 2023 4:06 PM
>> > > > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> > > > > > > Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>> > > > > > > <jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>> > > > > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>alloced from
>> > > > > > > pinctrl device
>> > > > > > >
>> > > > > > >
>> > > > > > > "Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>> > > > > > >
>> > > > > > > > > -----Original Message-----
>> > > > > > > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > > > > > > > > Sent: Thursday, August 31, 2023 3:10 PM
>> > > > > > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> > > > > > > > > Cc: Kiszka, Jan <jan.kiszka@siemens.com>;
>xenomai@lists.linux.dev;
>> > > > > > > Philippe
>> > > > > > > > > Gerum <rpm@xenomai.org>
>> > > > > > > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>alloced
>> > > from
>> > > > > > > > > pinctrl device
>> > > > > > > > >
>> > > > > > > > > On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>> > > > > > > > > >
>> > > > > > > > > > > -----Original Message-----
>> > > > > > > > > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > > > > > > > > > > Sent: Wednesday, August 30, 2023 9:23 PM
>> > > > > > > > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> > > > > > > > > > > Cc: Kiszka, Jan <jan.kiszka@siemens.com>;
>xenomai@lists.linux.dev;
>> > > > > > > > > Philippe
>> > > > > > > > > > > Gerum <rpm@xenomai.org>
>> > > > > > > > > > > Subject: Re: irq storm on sd/mmc card detect gpio
>interrupt alloced
>> > > > > > > from
>> > > > > > > > > > > pinctrl device
>> > > > > > > > > > >
>> > > > > > > > > > > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan
>wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > -----Original Message-----
>> > > > > > > > > > > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > > > > > > > > > > > > Sent: Wednesday, August 30, 2023 3:20 PM
>> > > > > > > > > > > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>> > > > > > > > > > > xenomai@lists.linux.dev
>> > > > > > > > > > > > > Subject: Re: irq storm on sd/mmc card detect gpio
>interrupt
>> > > > > alloced
>> > > > > > > > > from
>> > > > > > > > > > > > > pinctrl device
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan
>wrote:
>> > > > > > > > > > > > > > Hi Philippe/Jan
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > We found there is irq storm issue like [1] on sd/mmc
>card
>> > > detect
>> > > > > > > > > > > interrupt
>> > > > > > > > > > > > > alloced from
>> > > > > > > > > > > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>> > > > > > > > > > > > > > According to our test,  the patch [2] works for such
>issue in
>> > > our
>> > > > > > > case.
>> > > > > > > > > It
>> > > > > > > > > > > > > seems that handle_edge_irq
>> > > > > > > > > > > > > > never ack the irq to clear the interrupt status for the
>Linux
>> > > gpio
>> > > > > > > > > interrupt.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > I am wondering if there is any side-effect for this
>patch, Please
>> > > > > > > > > suggest.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Regards
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hongzhan Chen
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > [1]:
>> > > > > > > > > > > > > > cat /proc/interrupts
>> > > > > > > > > > > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -
>fasteoi
>> > > > > > > > > > > INTC1020:00,
>> > > > > > > > > > > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>> > > > > > > > > > > > > > 128:    2440190          0          0          0  INTC1020:01  112
>> > > > > > > 0000:00:1a.1
>> > > > > > > > > cd
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > [2]:
>> > > > > > > > > > > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>> > > > > > > > > > > > > > index 6aca6c036ada..4d08eaf076e9 100644
>> > > > > > > > > > > > > > --- a/kernel/irq/chip.c
>> > > > > > > > > > > > > > +++ b/kernel/irq/chip.c
>> > > > > > > > > > > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct
>irq_desc
>> > > > > > > *desc)
>> > > > > > > > > > > > > >         kstat_incr_irqs_this_cpu(desc);
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >         /* Start handling the irq */
>> > > > > > > > > > > > > > -       if (!irqs_pipelined())
>> > > > > > > > > > > > > >                 chip->irq_ack(&desc->irq_data);
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > That doesn't make much sense. A IRQ storm means
>that we
>> > > > > > > unmasked
>> > > > > > > > > the
>> > > > > > > > > > > > > IRQ to early or forgot to mask it.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > A forgotten ACK would mean the IRQ stalls, so it would
>never be
>> > > > > > > fired
>> > > > > > > > > > > > > again. No?
>> > > > > > > > > > > >
>> > > > > > > > > > > > According to the test, it seems forgotten ACK will keep
>firing
>> > > > > interrupt
>> > > > > > > > > here
>> > > > > > > > > > > like
>> > > > > > > > > > > > following comparing:
>> > > > > > > > > > >
>> > > > > > > > > > > This doesn't make sense (IMHO).
>> > > > > > > > > > > Can you provide a callstack from handle_edge_irq() in case
>> > > > > > > > > > > on_pipeline_entry() returns false?
>> > > > > > > > > > >
>> > > > > > > > > > > The IRQ should be ACKed on pipeline entry already, even if
>> > > > > > > > > > > handle_edge_irq() is called again (from Linux/Inband
>context) the
>> > > IRQ
>> > > > > > > > > > > should already be ACKed.
>> > > > > > > > > >
>> > > > > > > > > > Following stack is dumped after patching my patch. Without
>patching
>> > > > > > > there
>> > > > > > > > > is no intel_gpio_irq_ack. It is called in
>handle_irq_pipelined_finish from
>> > > > > > > which
>> > > > > > > > > it already called irq_exit_pipeline before
>sync_current_irq_stage. That
>> > > is
>> > > > > > > why
>> > > > > > > > > on_pipeline_entry returned false in handle_edge_irq
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G
>U
>> > > > > > > > > 5.10.179-intel-ese-standard-lts-dovetail+ #1
>> > > > > > > > > > [  291.009331] Hardware name: Intel Corporation Elkhart
>Lake
>> > > > > Embedded
>> > > > > > > > > Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>> > > > > > > > > EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>> > > > > > > > > > [  291.009332] IRQ stage: Linux
>> > > > > > > > > > [  291.009332] Call Trace:
>> > > > > > > > > > [  291.009332]  <IRQ>
>> > > > > > > > > > [  291.009333]  dump_stack+0x7b/0x93
>> > > > > > > > > > [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>> > > > > > > > > > [  291.009334]  handle_edge_irq+0xe4/0x310
>> > > > > > > > > > [  291.009334]  generic_handle_irq+0x4b/0x60
>> > > > > > > > > > [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>> > > > > > > > > > [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>> > > > > > > > > > [  291.009336]  ? handle_level_irq+0x200/0x200
>> > > > > > > > > > [  291.009336]  handle_irq_event+0x6c/0x100
>> > > > > > > > > > [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>> > > > > > > > > > [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>> > > > > > > > > > [  291.009338]  </IRQ>
>> > > > > > > > > > [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>> > > > > > > > > > [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>> > > > > > > > >                   ^^
>> > > > > > > > > My understanding is that you are not handling an HW IRQ,
>you are in
>> > > the
>> > > > > > > > > stage where the inband IRQ log is being processed. HW IRQs
>are
>> > > already
>> > > > > > > > > ACKed at that stage.
>> > > > > > > > >
>> > > > > > > > > Maybe we have to understand your IRQ chip routing. Is there
>an IO
>> > > APIC
>> > > > > > > > > involved? handle_fasteoi_irq() would be the first level IRQ
>entry point
>> > > > > > > > > which takes care of masking + ack.
>> > > > > > > >
>> > > > > > > > Yes. IOAPIC is the first level for external devices.
>handle_fasteoi_irq() is
>> > > > > > > registered in io_apic driver. There is second level pinctrl device
>acting as
>> > > > > > > gpiochip which has irq domains where sd/mmc cd interrupt
>alloced from ,
>> > > > > > > which need to be acked in handle_edge_irq.
>> > > > > > > >
>> > > > > > > > Regards
>> > > > > > > >
>> > > > > > > > Hongzhan Chen
>> > > > > > > > >
>> > > > > > > > > > [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>> > > > > > > > > > [  291.009340]  arch_pipeline_entry+0xdb/0x120
>> > > > > > > > > > [  291.009340]  asm_common_interrupt+0x1e/0x40
>> > > > > > > > > > [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>> > > > > > > > > > [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff
>65 48 8b
>> > > 1c
>> > > > > > > 25
>> > > > > > > > > 00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48
>8b 03 <a8>
>> > > > > 08
>> > > > > > > 75
>> > > > > > > > > 13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>> > > > > > > > > > [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>> > > > > > > > > >
>> > > > > > > > > > Regards
>> > > > > > > > > >
>> > > > > > > > > > Hongzhan Chen
>> > > > > > > > > > >
>> > > > > > > > > > > But: If we use the dovetail sources to build a kernel with
>pipelining
>> > > > > > > > > > > disabled we have to ACK the IRQ. I assume pipelining is
>active on
>> > > your
>> > > > > > > > > > > end.
>> > > > > > > > > > >
>> > > > > > > > > > > As Jan said, I'm searching for a similar problem for some
>time now
>> > > > > that
>> > > > > > > > > > > I can only reproduce on VirtualBox VMM. In my case I'm
>seeing a
>> > > stall,
>> > > > > > > > > > > not a storm. I will give it a try, let's see.
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > After patching:
>> > > > > > > > > > > > 128:          2          0          0          0  INTC1020:01  112
>0000:00:1a.1
>> > > > > cd
>> > > > > > > > > > > >
>> > > > > > > > > > > > Before patching:
>> > > > > > > > > > > > 128:    2440190          0          0          0  INTC1020:01  112
>> > > > > 0000:00:1a.1
>> > > > > > > cd
>> > > > > > > > > > > >
>> > > > > > > > > > > > Regards
>> > > > > > > > > > > >
>> > > > > > > > > > > > Hongzhan Chen
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Could you please check if we call unmask_irq() with HW
>IRQs
>> > > > > > > enabled?
>> > > > > > > > > If
>> > > > > > > > > > > > > so, I have an idea what is going on...
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Florian
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > >
>> > > > > > > To start with, prior to any further analysis:
>> > > > > > >
>> > > > > > > diff --git a/drivers/pinctrl/intel/pinctrl-intel.h
>> > > b/drivers/pinctrl/intel/pinctrl-
>> > > > > > > intel.h
>> > > > > > > index 65628423bf639..b7afa763ee079 100644
>> > > > > > > --- a/drivers/pinctrl/intel/pinctrl-intel.h
>> > > > > > > +++ b/drivers/pinctrl/intel/pinctrl-intel.h
>> > > > > > > @@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>> > > > > > >  */
>> > > > > > > struct intel_pinctrl {
>> > > > > > > 	struct device *dev;
>> > > > > > > -	raw_spinlock_t lock;
>> > > > > > > +	hard_spinlock_t lock;
>> > > > > >
>> > > > > > I already modified it at first when we found the issue.  But it does
>not
>> > > work
>> > > > > with it.
>> > > > > >
>> > > > > > After patching my patch[2] I mentioned, the issue disappear finally.
>> > > > > >
>> > > > >
>> > > > > Ok, but the patch is papering over the real issue I believe.
>> > > >
>> > > > What kind of real issue it would be? What checkpoint I need to check to
>find
>> > > the real issue out? Please suggest.
>> > > > Continuous fired irq is obviously caused by forgotten ack behavior for
>the
>> > > issue. The ack is supposed to happen elsewhere?
>> > > >
>> > >
>> > > Edge ack is supposed to happen in handle_edge_irq() when a pipeline
>> > > entry is detected a few lines earlier, your patch would cause a double
>> >
>> > Following patch is OK for you? It should avoid double ACK.  Because
>on_pipeline_entry after  ack
>> > it set IRQS_EDGE. so we can ack here when find that  IRQS_EDGE is not set
>to avoid double ack.
>> >
>> > index 6aca6c036ada..d06cf4a79f2a 10064
>> > --- a/kernel/irq/chip.c
>> > +++ b/kernel/irq/chip.c
>> > @@ -868,8 +868,10 @@ void handle_edge_irq(struct irq_desc *desc)
>> >         if (on_pipeline_entry()) {
>> >                 chip->irq_ack(&desc->irq_data);
>> >                 desc->istate |= IRQS_EDGE;
>> >                 handle_oob_irq(desc);
>> >                 goto out_unlock;
>> >         }
>> >
>> >         kstat_incr_irqs_this_cpu(desc);
>> >
>> > -       /* Start handling the irq */
>> > -       if (!irqs_pipelined())
>> > +       /* Start handling the irq
>> > +        * Avoid duplicating ack when IRQS_EDGE set
>> > +        */
>> > +       if (!(desc->istate & IRQS_EDGE))
>> >                 chip->irq_ack(&desc->irq_data);
>> >
>>
>> Doesn't answer why we get here without an ack while the pipeline is
>> enabled, does it?
>
>The problem in this case is the "stacking" of IRQ chips. Philippe
>already confirmed that we have a problem here.
>
>This patch is still papering over the real issue.
>
>As already pointed out by Philippe we have two work items here:
> - Missing pipelining support within the Intel GPIO pin control

Yes, the patches already submitted to review.

> - Dovetail logic regarding stacked IRQ chips needs reworking

Per my understanding , there is only one side-effect need to fix is double ack issue based on my
first patch as Philippe explained. I do not know if my last patch of adding new flag address double ack issue.

Regards

Hongzhan Chen
>
>Florian
>
>>
>> Jan
>>


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-09-04 13:35 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-30  3:05 irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device Chen, Hongzhan
2023-08-30  7:11 ` Jan Kiszka
2023-08-30  7:32   ` Chen, Hongzhan
2023-08-30  7:20 ` Florian Bezdeka
2023-08-30  7:54   ` Chen, Hongzhan
2023-08-30 13:23     ` Florian Bezdeka
2023-08-31  0:45       ` Chen, Hongzhan
2023-08-31  7:09         ` Florian Bezdeka
2023-08-31  7:39           ` Chen, Hongzhan
2023-08-31  8:06             ` Philippe Gerum
2023-08-31  8:35               ` Chen, Hongzhan
2023-08-31  8:56                 ` Philippe Gerum
2023-09-01  2:28                   ` Chen, Hongzhan
2023-09-01  6:35                     ` Philippe Gerum
2023-09-04  8:20                       ` Chen, Hongzhan
2023-09-04  8:56                         ` Jan Kiszka
2023-09-04  9:34                           ` Florian Bezdeka
2023-09-04 13:35                             ` Chen, Hongzhan
2023-09-04  9:21                         ` Philippe Gerum
2023-09-04 13:26                           ` Chen, Hongzhan
2023-09-04  9:36                         ` Chen, Hongzhan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.