All of lore.kernel.org
 help / color / mirror / Atom feed
From: Philippe Gerum <rpm@xenomai.org>
To: "Chen, Hongzhan" <hongzhan.chen@intel.com>
Cc: "Bezdeka, Florian" <florian.bezdeka@siemens.com>,
	"Kiszka, Jan" <jan.kiszka@siemens.com>,
	"xenomai@lists.linux.dev" <xenomai@lists.linux.dev>
Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device
Date: Fri, 01 Sep 2023 08:35:15 +0200	[thread overview]
Message-ID: <874jkewdwi.fsf@xenomai.org> (raw)
In-Reply-To: <MN2PR11MB4285669E0D22FE1D5B103E2FF2E4A@MN2PR11MB4285.namprd11.prod.outlook.com>


"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:

>>-----Original Message-----
>>From: Philippe Gerum <rpm@xenomai.org>
>>Sent: Thursday, August 31, 2023 4:57 PM
>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>pinctrl device
>>
>>
>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>
>>>>-----Original Message-----
>>>>From: Philippe Gerum <rpm@xenomai.org>
>>>>Sent: Thursday, August 31, 2023 4:06 PM
>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>Cc: Bezdeka, Florian <florian.bezdeka@siemens.com>; Kiszka, Jan
>>>><jan.kiszka@siemens.com>; xenomai@lists.linux.dev
>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>pinctrl device
>>>>
>>>>
>>>>"Chen, Hongzhan" <hongzhan.chen@intel.com> writes:
>>>>
>>>>>>-----Original Message-----
>>>>>>From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>Sent: Thursday, August 31, 2023 3:10 PM
>>>>>>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>Philippe
>>>>>>Gerum <rpm@xenomai.org>
>>>>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from
>>>>>>pinctrl device
>>>>>>
>>>>>>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote:
>>>>>>>
>>>>>>> > -----Original Message-----
>>>>>>> > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>> > Sent: Wednesday, August 30, 2023 9:23 PM
>>>>>>> > To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>> > Cc: Kiszka, Jan <jan.kiszka@siemens.com>; xenomai@lists.linux.dev;
>>>>>>Philippe
>>>>>>> > Gerum <rpm@xenomai.org>
>>>>>>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced
>>>>from
>>>>>>> > pinctrl device
>>>>>>> >
>>>>>>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote:
>>>>>>> > >
>>>>>>> > > > -----Original Message-----
>>>>>>> > > > From: Florian Bezdeka <florian.bezdeka@siemens.com>
>>>>>>> > > > Sent: Wednesday, August 30, 2023 3:20 PM
>>>>>>> > > > To: Chen, Hongzhan <hongzhan.chen@intel.com>;
>>>>>>> > xenomai@lists.linux.dev
>>>>>>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt
>>alloced
>>>>>>from
>>>>>>> > > > pinctrl device
>>>>>>> > > >
>>>>>>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote:
>>>>>>> > > > > Hi Philippe/Jan
>>>>>>> > > > >
>>>>>>> > > > > We found there is irq storm issue like [1] on sd/mmc card detect
>>>>>>> > interrupt
>>>>>>> > > > alloced from
>>>>>>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.*
>>>>>>> > > > > According to our test,  the patch [2] works for such issue in our
>>>>case.
>>>>>>It
>>>>>>> > > > seems that handle_edge_irq
>>>>>>> > > > > never ack the irq to clear the interrupt status for the Linux gpio
>>>>>>interrupt.
>>>>>>> > > > >
>>>>>>> > > > > I am wondering if there is any side-effect for this patch, Please
>>>>>>suggest.
>>>>>>> > > > >
>>>>>>> > > > > Regards
>>>>>>> > > > >
>>>>>>> > > > > Hongzhan Chen
>>>>>>> > > > >
>>>>>>> > > > > [1]:
>>>>>>> > > > > cat /proc/interrupts
>>>>>>> > > > >   14:    2440190          0          0          0  IR-IO-APIC   14    -fasteoi
>>>>>>> > INTC1020:00,
>>>>>>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05
>>>>>>> > > > > 128:    2440190          0          0          0  INTC1020:01  112
>>>>0000:00:1a.1
>>>>>>cd
>>>>>>> > > > >
>>>>>>> > > > > [2]:
>>>>>>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>>>>> > > > > index 6aca6c036ada..4d08eaf076e9 100644
>>>>>>> > > > > --- a/kernel/irq/chip.c
>>>>>>> > > > > +++ b/kernel/irq/chip.c
>>>>>>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc
>>>>*desc)
>>>>>>> > > > >         kstat_incr_irqs_this_cpu(desc);
>>>>>>> > > > >
>>>>>>> > > > >         /* Start handling the irq */
>>>>>>> > > > > -       if (!irqs_pipelined())
>>>>>>> > > > >                 chip->irq_ack(&desc->irq_data);
>>>>>>> > > > >
>>>>>>> > > > >
>>>>>>> > > >
>>>>>>> > > > That doesn't make much sense. A IRQ storm means that we
>>>>unmasked
>>>>>>the
>>>>>>> > > > IRQ to early or forgot to mask it.
>>>>>>> > > >
>>>>>>> > > > A forgotten ACK would mean the IRQ stalls, so it would never be
>>>>fired
>>>>>>> > > > again. No?
>>>>>>> > >
>>>>>>> > > According to the test, it seems forgotten ACK will keep firing
>>interrupt
>>>>>>here
>>>>>>> > like
>>>>>>> > > following comparing:
>>>>>>> >
>>>>>>> > This doesn't make sense (IMHO).
>>>>>>> > Can you provide a callstack from handle_edge_irq() in case
>>>>>>> > on_pipeline_entry() returns false?
>>>>>>> >
>>>>>>> > The IRQ should be ACKed on pipeline entry already, even if
>>>>>>> > handle_edge_irq() is called again (from Linux/Inband context) the IRQ
>>>>>>> > should already be ACKed.
>>>>>>>
>>>>>>> Following stack is dumped after patching my patch. Without patching
>>>>there
>>>>>>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from
>>>>which
>>>>>>it already called irq_exit_pipeline before sync_current_irq_stage. That is
>>>>why
>>>>>>on_pipeline_entry returned false in handle_edge_irq
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [  291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U
>>>>>>5.10.179-intel-ese-standard-lts-dovetail+ #1
>>>>>>> [  291.009331] Hardware name: Intel Corporation Elkhart Lake
>>Embedded
>>>>>>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS
>>>>>>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023
>>>>>>> [  291.009332] IRQ stage: Linux
>>>>>>> [  291.009332] Call Trace:
>>>>>>> [  291.009332]  <IRQ>
>>>>>>> [  291.009333]  dump_stack+0x7b/0x93
>>>>>>> [  291.009333]  intel_gpio_irq_ack.cold+0x22/0x27
>>>>>>> [  291.009334]  handle_edge_irq+0xe4/0x310
>>>>>>> [  291.009334]  generic_handle_irq+0x4b/0x60
>>>>>>> [  291.009335]  intel_gpio_irq+0x11f/0x1c0
>>>>>>> [  291.009335]  __handle_irq_event_percpu+0x77/0x190
>>>>>>> [  291.009336]  ? handle_level_irq+0x200/0x200
>>>>>>> [  291.009336]  handle_irq_event+0x6c/0x100
>>>>>>> [  291.009337]  handle_fasteoi_irq+0xd8/0x250
>>>>>>> [  291.009337]  asm_call_irq_on_stack+0xf/0x20
>>>>>>> [  291.009338]  </IRQ>
>>>>>>> [  291.009338]  arch_do_IRQ_pipelined+0x11d/0x230
>>>>>>> [  291.009339]  sync_current_irq_stage+0x157/0x1f0
>>>>>>                   ^^
>>>>>>My understanding is that you are not handling an HW IRQ, you are in the
>>>>>>stage where the inband IRQ log is being processed. HW IRQs are already
>>>>>>ACKed at that stage.
>>>>>>
>>>>>>Maybe we have to understand your IRQ chip routing. Is there an IO APIC
>>>>>>involved? handle_fasteoi_irq() would be the first level IRQ entry point
>>>>>>which takes care of masking + ack.
>>>>>
>>>>> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is
>>>>registered in io_apic driver. There is second level pinctrl device acting as
>>>>gpiochip which has irq domains where sd/mmc cd interrupt alloced from ,
>>>>which need to be acked in handle_edge_irq.
>>>>>
>>>>> Regards
>>>>>
>>>>> Hongzhan Chen
>>>>>>
>>>>>>> [  291.009339]  handle_irq_pipelined_finish+0xa2/0x1a0
>>>>>>> [  291.009340]  arch_pipeline_entry+0xdb/0x120
>>>>>>> [  291.009340]  asm_common_interrupt+0x1e/0x40
>>>>>>> [  291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110
>>>>>>> [  291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b 1c
>>>>25
>>>>>>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 <a8>
>>08
>>>>75
>>>>>>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75
>>>>>>> [  291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Hongzhan Chen
>>>>>>> >
>>>>>>> > But: If we use the dovetail sources to build a kernel with pipelining
>>>>>>> > disabled we have to ACK the IRQ. I assume pipelining is active on your
>>>>>>> > end.
>>>>>>> >
>>>>>>> > As Jan said, I'm searching for a similar problem for some time now
>>that
>>>>>>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a stall,
>>>>>>> > not a storm. I will give it a try, let's see.
>>>>>>> >
>>>>>>> > >
>>>>>>> > > After patching:
>>>>>>> > > 128:          2          0          0          0  INTC1020:01  112      0000:00:1a.1
>>cd
>>>>>>> > >
>>>>>>> > > Before patching:
>>>>>>> > > 128:    2440190          0          0          0  INTC1020:01  112
>>0000:00:1a.1
>>>>cd
>>>>>>> > >
>>>>>>> > > Regards
>>>>>>> > >
>>>>>>> > > Hongzhan Chen
>>>>>>> > > >
>>>>>>> > > > Could you please check if we call unmask_irq() with HW IRQs
>>>>enabled?
>>>>>>If
>>>>>>> > > > so, I have an idea what is going on...
>>>>>>> > > >
>>>>>>> > > > Florian
>>>>>>> > >
>>>>>>>
>>>>
>>>>To start with, prior to any further analysis:
>>>>
>>>>diff --git a/drivers/pinctrl/intel/pinctrl-intel.h b/drivers/pinctrl/intel/pinctrl-
>>>>intel.h
>>>>index 65628423bf639..b7afa763ee079 100644
>>>>--- a/drivers/pinctrl/intel/pinctrl-intel.h
>>>>+++ b/drivers/pinctrl/intel/pinctrl-intel.h
>>>>@@ -222,7 +222,7 @@ struct intel_pinctrl_context {
>>>>  */
>>>> struct intel_pinctrl {
>>>> 	struct device *dev;
>>>>-	raw_spinlock_t lock;
>>>>+	hard_spinlock_t lock;
>>>
>>> I already modified it at first when we found the issue.  But it does not work
>>with it.
>>>
>>> After patching my patch[2] I mentioned, the issue disappear finally.
>>>
>>
>>Ok, but the patch is papering over the real issue I believe.
>
> What kind of real issue it would be? What checkpoint I need to check to find the real issue out? Please suggest.
> Continuous fired irq is obviously caused by forgotten ack behavior for the issue. The ack is supposed to happen elsewhere?
>

Edge ack is supposed to happen in handle_edge_irq() when a pipeline
entry is detected a few lines earlier, your patch would cause a double
ACK in the common case when pipelining, which might cause events to be
lost. As you found out already, the problem is that such flow handler is
called _outside_ of the pipeline entry path, so the ack is not
delivered.

This happens due to this particular interrupt path, which chains two
irqchips, edge on top of fasteoi, combined to the fact that there is no
oob handling of the fasteoi IRQ, so interrupt delivery has to go through
the per-CPU pipeline log first, therefore handling is postponed until
the pipeline is synchronized, _after_ we left the pipeline entry
context. As a result, the fasteoi flow handler does have in_pipeline()
set, but the edge flow handler does not, causing this bug.

handle_irq_pipelined_prepare()
[on_pipeline_entry: true]
arch_handle_irq()
        handle_fasteoi_irq()
                handle_oob_irq()
                        mask_cond_eoi_irq()
handle_irq_pipelined_finish()
[on_pipeline_entry: false]
        synchronize_pipeline_on_irq()
                handle_fasteoi_irq() [!on_pipeline_entry]
                      intel_gpio_irq()
                           handle_edge_irq() <<< failed to ack

This is a bug in the Dovetail logic for stacked IRQ chips. I may have
some cycles spare the next couple of weeks to work on this issue and the
rest of my Dovetail backlog if nobody comes up with a fix earlier.

-- 
Philippe.

  reply	other threads:[~2023-09-01  7:43 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-30  3:05 irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device Chen, Hongzhan
2023-08-30  7:11 ` Jan Kiszka
2023-08-30  7:32   ` Chen, Hongzhan
2023-08-30  7:20 ` Florian Bezdeka
2023-08-30  7:54   ` Chen, Hongzhan
2023-08-30 13:23     ` Florian Bezdeka
2023-08-31  0:45       ` Chen, Hongzhan
2023-08-31  7:09         ` Florian Bezdeka
2023-08-31  7:39           ` Chen, Hongzhan
2023-08-31  8:06             ` Philippe Gerum
2023-08-31  8:35               ` Chen, Hongzhan
2023-08-31  8:56                 ` Philippe Gerum
2023-09-01  2:28                   ` Chen, Hongzhan
2023-09-01  6:35                     ` Philippe Gerum [this message]
2023-09-04  8:20                       ` Chen, Hongzhan
2023-09-04  8:56                         ` Jan Kiszka
2023-09-04  9:34                           ` Florian Bezdeka
2023-09-04 13:35                             ` Chen, Hongzhan
2023-09-04  9:21                         ` Philippe Gerum
2023-09-04 13:26                           ` Chen, Hongzhan
2023-09-04  9:36                         ` Chen, Hongzhan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874jkewdwi.fsf@xenomai.org \
    --to=rpm@xenomai.org \
    --cc=florian.bezdeka@siemens.com \
    --cc=hongzhan.chen@intel.com \
    --cc=jan.kiszka@siemens.com \
    --cc=xenomai@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.