Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Philippe Gerum <rpm@xenomai.org>
To: "Chen, Hongzhan" <hongzhan.chen@intel.com>
Cc: "Bezdeka, Florian" <florian.bezdeka@siemens.com>,
	"Kiszka, Jan" <jan.kiszka@siemens.com>,
	"xenomai@lists.linux.dev" <xenomai@lists.linux.dev>
Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
Date: Mon, 04 Sep 2023 11:21:35 +0200	[thread overview]
Message-ID: <87jzt6p9qu.fsf@xenomai.org> (raw)
In-Reply-To: <BY5PR11MB42769E5D589385FDF64A7C7EF2E9A@BY5PR11MB4276.namprd11.prod.outlook.com>


"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:

>>-----Original Message-----
>>From: Philippe Gerum <rpm@xenomai.org>
>>Sent: Friday, September 1, 2023 2:35 PM
>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>pinctrl device
>>
>>
>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>
>>>>-----Original Message-----
>>>>From: Philippe Gerum <rpm@xenomai.org>
>>>>Sent: Thursday, August 31, 2023 4:57 PM
>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>pinctrl device
>>>>
>>>>
>>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>
>>>>>>-----Original Message-----
>>>>>>From: Philippe Gerum <rpm@xenomai.org>
>>>>>>Sent: Thursday, August 31, 2023 4:06 PM
>>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>>>pinctrl device
>>>>>>
>>>>>>
>>>>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>Sent: Thursday, August 31, 2023 3:10 PM
>>>>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>>Philippe
>>>>>>>>Gerum <rpm@xenomai.org>
>>>>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>from
>>>>>>>>pinctrl device
>>>>>>>>
>>>>>>>>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>>>>>>>>
>>>>>>>>> > -----Original Message-----
>>>>>>>>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>> > Sent: Wednesday, August 30, 2023 9:23 PM
>>>>>>>>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>>>>Philippe
>>>>>>>>> > Gerum <rpm@xenomai.org>
>>>>>>>>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>>>>>from
>>>>>>>>> > pinctrl device
>>>>>>>>> >
>>>>>>>>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>>>>>>>> > >
>>>>>>>>> > > > -----Original Message-----
>>>>>>>>> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>>>> > > > Sent: Wednesday, August 30, 2023 3:20 PM
>>>>>>>>> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>>>>>>>>> > xenomai@lists.linux.dev
>>>>>>>>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>>>>alloced
>>>>>>>>from
>>>>>>>>> > > > pinctrl device
>>>>>>>>> > > >
>>>>>>>>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>>>>>>>>> > > > > Hi Philippe/Jan
>>>>>>>>> > > > >
>>>>>>>>> > > > > We found there is irq storm issue like [1] on sd/mmc card
>>detect
>>>>>>>>> > interrupt
>>>>>>>>> > > > alloced from
>>>>>>>>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>>>>>>>>> > > > > According to our test,  the patch [2] works for such issue in
>>our
>>>>>>case.
>>>>>>>>It
>>>>>>>>> > > > seems that handle_edge_irq
>>>>>>>>> > > > > never ack the irq to clear the interrupt status for the Linux
>>gpio
>>>>>>>>interrupt.
>>>>>>>>> > > > >
>>>>>>>>> > > > > I am wondering if there is any side-effect for this patch, Please
>>>>>>>>suggest.
>>>>>>>>> > > > >
>>>>>>>>> > > > > Regards
>>>>>>>>> > > > >
>>>>>>>>> > > > > Hongzhan Chen
>>>>>>>>> > > > >
>>>>>>>>> > > > > [1]:
>>>>>>>>> > > > > cat /proc/interrupts
>>>>>>>>> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>>>>>>>>> > INTC1020:00,
>>>>>>>>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>>>>>>>>> > > > > 128:    2440190          0          0          0  INTC1020:01  112
>>>>>>0000:00:1a.1
>>>>>>>>cd
>>>>>>>>> > > > >
>>>>>>>>> > > > > [2]:
>>>>>>>>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>>>>>>> > > > > index 6aca6c036ada..4d08eaf076e9 100644
>>>>>>>>> > > > > --- a/kernel/irq/chip.c
>>>>>>>>> > > > > +++ b/kernel/irq/chip.c
>>>>>>>>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc
>>>>>>*desc)
>>>>>>>>> > > > >         kstat_incr_irqs_this_cpu(desc);
>>>>>>>>> > > > >
>>>>>>>>> > > > >         /* Start handling the irq */
>>>>>>>>> > > > > -       if (!irqs_pipelined())
>>>>>>>>> > > > >                 chip->irq_ack(&desc->irq_data);
>>>>>>>>> > > > >
>>>>>>>>> > > > >
>>>>>>>>> > > >
>>>>>>>>> > > > That doesn't make much sense. A IRQ storm means that we
>>>>>>unmasked
>>>>>>>>the
>>>>>>>>> > > > IRQ to early or forgot to mask it.
>>>>>>>>> > > >
>>>>>>>>> > > > A forgotten ACK would mean the IRQ stalls, so it would never be
>>>>>>fired
>>>>>>>>> > > > again. No?
>>>>>>>>> > >
>>>>>>>>> > > According to the test, it seems forgotten ACK will keep firing
>>>>interrupt
>>>>>>>>here
>>>>>>>>> > like
>>>>>>>>> > > following comparing:
>>>>>>>>> >
>>>>>>>>> > This doesn't make sense (IMHO).
>>>>>>>>> > Can you provide a callstack from handle_edge_irq() in case
>>>>>>>>> > on_pipeline_entry() returns false?
>>>>>>>>> >
>>>>>>>>> > The IRQ should be ACKed on pipeline entry already, even if
>>>>>>>>> > handle_edge_irq() is called again (from Linux/Inband context) the
>>IRQ
>>>>>>>>> > should already be ACKed.
>>>>>>>>>
>>>>>>>>> Following stack is dumped after patching my patch. Without patching
>>>>>>there
>>>>>>>>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from
>>>>>>which
>>>>>>>>it already called irq_exit_pipeline before sync_current_irq_stage. That
>>is
>>>>>>why
>>>>>>>>on_pipeline_entry returned false in handle_edge_irq
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>>>>>>>>5.10.179-intel-ese-standard-lts-dovetail+ #1
>>>>>>>>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake
>>>>Embedded
>>>>>>>>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>>>>>>>>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>>>>>>>>> [  291.009332] IRQ stage: Linux
>>>>>>>>> [  291.009332] Call Trace:
>>>>>>>>> [  291.009332]  <IRQ>
>>>>>>>>> [  291.009333]  dump_stack+0x7b/0x93
>>>>>>>>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>>>>>>>>> [  291.009334]  handle_edge_irq+0xe4/0x310
>>>>>>>>> [  291.009334]  generic_handle_irq+0x4b/0x60
>>>>>>>>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>>>>>>>>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>>>>>>>>> [  291.009336]  ? handle_level_irq+0x200/0x200
>>>>>>>>> [  291.009336]  handle_irq_event+0x6c/0x100
>>>>>>>>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>>>>>>>>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>>>>>>>>> [  291.009338]  </IRQ>
>>>>>>>>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>>>>>>>>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>>>>>>>>                   ^^
>>>>>>>>My understanding is that you are not handling an HW IRQ, you are in
>>the
>>>>>>>>stage where the inband IRQ log is being processed. HW IRQs are
>>already
>>>>>>>>ACKed at that stage.
>>>>>>>>
>>>>>>>>Maybe we have to understand your IRQ chip routing. Is there an IO
>>APIC
>>>>>>>>involved? handle_fasteoi_irq() would be the first level IRQ entry point
>>>>>>>>which takes care of masking + ack.
>>>>>>>
>>>>>>> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is
>>>>>>registered in io_apic driver. There is second level pinctrl device acting as
>>>>>>gpiochip which has irq domains where sd/mmc cd interrupt alloced from ,
>>>>>>which need to be acked in handle_edge_irq.
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Hongzhan Chen
>>>>>>>>
>>>>>>>>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>>>>>>>>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>>>>>>>>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>>>>>>>>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>>>>>>>>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b
>>1c
>>>>>>25
>>>>>>>>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8>
>>>>08
>>>>>>75
>>>>>>>>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>>>>>>>>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> Hongzhan Chen
>>>>>>>>> >
>>>>>>>>> > But: If we use the dovetail sources to build a kernel with pipelining
>>>>>>>>> > disabled we have to ACK the IRQ. I assume pipelining is active on
>>your
>>>>>>>>> > end.
>>>>>>>>> >
>>>>>>>>> > As Jan said, I'm searching for a similar problem for some time now
>>>>that
>>>>>>>>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a
>>stall,
>>>>>>>>> > not a storm. I will give it a try, let's see.
>>>>>>>>> >
>>>>>>>>> > >
>>>>>>>>> > > After patching:
>>>>>>>>> > > 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1
>>>>cd
>>>>>>>>> > >
>>>>>>>>> > > Before patching:
>>>>>>>>> > > 128:    2440190          0          0          0  INTC1020:01  112
>>>>0000:00:1a.1
>>>>>>cd
>>>>>>>>> > >
>>>>>>>>> > > Regards
>>>>>>>>> > >
>>>>>>>>> > > Hongzhan Chen
>>>>>>>>> > > >
>>>>>>>>> > > > Could you please check if we call unmask_irq() with HW IRQs
>>>>>>enabled?
>>>>>>>>If
>>>>>>>>> > > > so, I have an idea what is going on...
>>>>>>>>> > > >
>>>>>>>>> > > > Florian
>>>>>>>>> > >
>>>>>>>>>
>>>>>>
>>>>>>To start with, prior to any further analysis:
>>>>>>
>>>>>>diff --git a/drivers/pinctrl/intel/pinctrl-intel.h
>>b/drivers/pinctrl/intel/pinctrl-
>>>>>>intel.h
>>>>>>index 65628423bf639..b7afa763ee079 100644
>>>>>>--- a/drivers/pinctrl/intel/pinctrl-intel.h
>>>>>>+++ b/drivers/pinctrl/intel/pinctrl-intel.h
>>>>>>@@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>>>>>>  */
>>>>>> struct intel_pinctrl {
>>>>>> 	struct device *dev;
>>>>>>-	raw_spinlock_t lock;
>>>>>>+	hard_spinlock_t lock;
>>>>>
>>>>> I already modified it at first when we found the issue.  But it does not
>>work
>>>>with it.
>>>>>
>>>>> After patching my patch[2] I mentioned, the issue disappear finally.
>>>>>
>>>>
>>>>Ok, but the patch is papering over the real issue I believe.
>>>
>>> What kind of real issue it would be? What checkpoint I need to check to find
>>the real issue out? Please suggest.
>>> Continuous fired irq is obviously caused by forgotten ack behavior for the
>>issue. The ack is supposed to happen elsewhere？
>>>
>>
>>Edge ack is supposed to happen in handle_edge_irq() when a pipeline
>>entry is detected a few lines earlier, your patch would cause a double
>
> Following patch is OK for you? It should avoid double ACK.  Because on_pipeline_entry after  ack
> it set IRQS_EDGE. so we can ack here when find that  IRQS_EDGE is not set to avoid double ack.
>
> index 6aca6c036ada..d06cf4a79f2a 10064
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -868,8 +868,10 @@ void handle_edge_irq(struct irq_desc *desc)
>         if (on_pipeline_entry()) {
>                 chip->irq_ack(&desc->irq_data);
>                 desc->istate |= IRQS_EDGE;
>                 handle_oob_irq(desc);
>                 goto out_unlock;
>         }
>
>         kstat_incr_irqs_this_cpu(desc);
>
> -       /* Start handling the irq */
> -       if (!irqs_pipelined())
> +       /* Start handling the irq
> +        * Avoid duplicating ack when IRQS_EDGE set
> +        */
> +       if (!(desc->istate & IRQS_EDGE))
>                 chip->irq_ack(&desc->irq_data);

This is getting closer but not yet correct. The idea of tracking the
pending acknowledge using a per-event bit from the internal descriptor
state looks right, we need something along these lines.

However you may not reuse IRQS_EDGE for this, first because the latter
deals with IRQ resend, so the semantics are different, but also because
this bit is cleared early when leaving the flow handler upon its initial
invocation for an IRQ. This won't work because we are interested by the
second invocation of such handler for the same IRQ event, the one which
replays the event from the in-band stage. There are some gory details
about the interrupt flow at [1] which may help understanding.

[1] https://evlproject.org/dovetail/porting/irqflow/

-- 
Philippe.

next prev parent reply	other threads:[~2023-09-04  9:29 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-30  3:05 irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device Chen, Hongzhan
2023-08-30  7:11 ` Jan Kiszka
2023-08-30  7:32   ` Chen, Hongzhan
2023-08-30  7:20 ` Florian Bezdeka
2023-08-30  7:54   ` Chen, Hongzhan
2023-08-30 13:23     ` Florian Bezdeka
2023-08-31  0:45       ` Chen, Hongzhan
2023-08-31  7:09         ` Florian Bezdeka
2023-08-31  7:39           ` Chen, Hongzhan
2023-08-31  8:06             ` Philippe Gerum
2023-08-31  8:35               ` Chen, Hongzhan
2023-08-31  8:56                 ` Philippe Gerum
2023-09-01  2:28                   ` Chen, Hongzhan
2023-09-01  6:35                     ` Philippe Gerum
2023-09-04  8:20                       ` Chen, Hongzhan
2023-09-04  8:56                         ` Jan Kiszka
2023-09-04  9:34                           ` Florian Bezdeka
2023-09-04 13:35                             ` Chen, Hongzhan
2023-09-04  9:21                         ` Philippe Gerum [this message]
2023-09-04 13:26                           ` Chen, Hongzhan
2023-09-04  9:36                         ` Chen, Hongzhan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87jzt6p9qu.fsf@xenomai.org \
    --to=rpm@xenomai.org \
    --cc=florian.bezdeka@siemens.com \
    --cc=hongzhan.chen@intel.com \
    --cc=jan.kiszka@siemens.com \
    --cc=xenomai@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.