From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mslow1.mail.gandi.net (mslow1.mail.gandi.net [217.70.178.240]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 373E5210A for ; Thu, 31 Aug 2023 09:05:55 +0000 (UTC) Received: from relay2-d.mail.gandi.net (unknown [IPv6:2001:4b98:dc4:8::222]) by mslow1.mail.gandi.net (Postfix) with ESMTP id CCB26CC37B for ; Thu, 31 Aug 2023 08:57:13 +0000 (UTC) Received: by mail.gandi.net (Postfix) with ESMTPSA id 1EDE44001A; Thu, 31 Aug 2023 08:57:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=xenomai.org; s=gm1; t=1693472225; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=O8TUSHlUM1htJ0SIo5r8qqkYKKjRXXGYPtQmsBcCS4I=; b=dqx2Ra7xC9PWIztIQWTh8fzd1KN/G9UCTAY/x4V+8jsDrRnUwINb0plizjC8nKXlaTslPE GssyD+TlUkA8SUGgBM7CFqC1PD0U+RJDUaLBLERCxDCSBPS4eGU6SeS3QiMOV0wvLZ5LJw ZQtV2nmb6PHjMKe7ZyUlo6Ao6CQ4jgRMFimtZpE7Psyto6lZfp1fl9iIJfNDunW5Requ4k 3blZ0LNRHN+zDIpso9Mdku5ElKt+TkEBCtEZAeCt8b5rwmUngB7042sDVU+fK9tJqWwylW BxrciGya7P7hlBb1OWyD3mmj0uDMo/30d7KPOmQ60j+K7Vln9D93a/JLwV2jhA== References: <463948a9bd09ddf6062f879fa9746b0c70e6a9ab.camel@siemens.com> <42d72c52569d30b9ecaf8fd30dd777abd422adf1.camel@siemens.com> <874jkffxe0.fsf@xenomai.org> User-agent: mu4e 1.8.11; emacs 28.2 From: Philippe Gerum To: "Chen, Hongzhan" Cc: "Bezdeka, Florian" , "Kiszka, Jan" , "xenomai@lists.linux.dev" Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from pinctrl device Date: Thu, 31 Aug 2023 10:56:31 +0200 In-reply-to: Message-ID: <87zg27egin.fsf@xenomai.org> Precedence: bulk X-Mailing-List: xenomai@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain X-GND-Sasl: rpm@xenomai.org "Chen, Hongzhan" writes: >>-----Original Message----- >>From: Philippe Gerum >>Sent: Thursday, August 31, 2023 4:06 PM >>To: Chen, Hongzhan >>Cc: Bezdeka, Florian ; Kiszka, Jan >>; xenomai@lists.linux.dev >>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from >>pinctrl device >> >> >>"Chen, Hongzhan" writes: >> >>>>-----Original Message----- >>>>From: Florian Bezdeka >>>>Sent: Thursday, August 31, 2023 3:10 PM >>>>To: Chen, Hongzhan >>>>Cc: Kiszka, Jan ; xenomai@lists.linux.dev; >>Philippe >>>>Gerum >>>>Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced from >>>>pinctrl device >>>> >>>>On Thu, 2023-08-31 at 00:45 +0000, Chen, Hongzhan wrote: >>>>> >>>>> > -----Original Message----- >>>>> > From: Florian Bezdeka >>>>> > Sent: Wednesday, August 30, 2023 9:23 PM >>>>> > To: Chen, Hongzhan >>>>> > Cc: Kiszka, Jan ; xenomai@lists.linux.dev; >>>>Philippe >>>>> > Gerum >>>>> > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced >>from >>>>> > pinctrl device >>>>> > >>>>> > On Wed, 2023-08-30 at 07:54 +0000, Chen, Hongzhan wrote: >>>>> > > >>>>> > > > -----Original Message----- >>>>> > > > From: Florian Bezdeka >>>>> > > > Sent: Wednesday, August 30, 2023 3:20 PM >>>>> > > > To: Chen, Hongzhan ; >>>>> > xenomai@lists.linux.dev >>>>> > > > Subject: Re: irq storm on sd/mmc card detect gpio interrupt alloced >>>>from >>>>> > > > pinctrl device >>>>> > > > >>>>> > > > On Wed, 2023-08-30 at 03:05 +0000, Chen, Hongzhan wrote: >>>>> > > > > Hi Philippe/Jan >>>>> > > > > >>>>> > > > > We found there is irq storm issue like [1] on sd/mmc card detect >>>>> > interrupt >>>>> > > > alloced from >>>>> > > > > pinctrl device with Xenomai 3.2 Dovetail 5.10.* >>>>> > > > > According to our test, the patch [2] works for such issue in our >>case. >>>>It >>>>> > > > seems that handle_edge_irq >>>>> > > > > never ack the irq to clear the interrupt status for the Linux gpio >>>>interrupt. >>>>> > > > > >>>>> > > > > I am wondering if there is any side-effect for this patch, Please >>>>suggest. >>>>> > > > > >>>>> > > > > Regards >>>>> > > > > >>>>> > > > > Hongzhan Chen >>>>> > > > > >>>>> > > > > [1]: >>>>> > > > > cat /proc/interrupts >>>>> > > > > 14: 2440190 0 0 0 IR-IO-APIC 14 -fasteoi >>>>> > INTC1020:00, >>>>> > > > INTC1020:01, INTC1020:03, INTC1020:04, INTC1020:05 >>>>> > > > > 128: 2440190 0 0 0 INTC1020:01 112 >>0000:00:1a.1 >>>>cd >>>>> > > > > >>>>> > > > > [2]: >>>>> > > > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c >>>>> > > > > index 6aca6c036ada..4d08eaf076e9 100644 >>>>> > > > > --- a/kernel/irq/chip.c >>>>> > > > > +++ b/kernel/irq/chip.c >>>>> > > > > @@ -869,7 +869,6 @@ void handle_edge_irq(struct irq_desc >>*desc) >>>>> > > > > kstat_incr_irqs_this_cpu(desc); >>>>> > > > > >>>>> > > > > /* Start handling the irq */ >>>>> > > > > - if (!irqs_pipelined()) >>>>> > > > > chip->irq_ack(&desc->irq_data); >>>>> > > > > >>>>> > > > > >>>>> > > > >>>>> > > > That doesn't make much sense. A IRQ storm means that we >>unmasked >>>>the >>>>> > > > IRQ to early or forgot to mask it. >>>>> > > > >>>>> > > > A forgotten ACK would mean the IRQ stalls, so it would never be >>fired >>>>> > > > again. No? >>>>> > > >>>>> > > According to the test, it seems forgotten ACK will keep firing interrupt >>>>here >>>>> > like >>>>> > > following comparing: >>>>> > >>>>> > This doesn't make sense (IMHO). >>>>> > Can you provide a callstack from handle_edge_irq() in case >>>>> > on_pipeline_entry() returns false? >>>>> > >>>>> > The IRQ should be ACKed on pipeline entry already, even if >>>>> > handle_edge_irq() is called again (from Linux/Inband context) the IRQ >>>>> > should already be ACKed. >>>>> >>>>> Following stack is dumped after patching my patch. Without patching >>there >>>>is no intel_gpio_irq_ack. It is called in handle_irq_pipelined_finish from >>which >>>>it already called irq_exit_pipeline before sync_current_irq_stage. That is >>why >>>>on_pipeline_entry returned false in handle_edge_irq >>>>> >>>>> >>>>> >>>>> [ 291.009330] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G U >>>>5.10.179-intel-ese-standard-lts-dovetail+ #1 >>>>> [ 291.009331] Hardware name: Intel Corporation Elkhart Lake Embedded >>>>Platform/ElkhartLake LPDDR4x T3 CRB, BIOS >>>>EHLSFWI1.R00.5204.A01.2305180757 05/18/2023 >>>>> [ 291.009332] IRQ stage: Linux >>>>> [ 291.009332] Call Trace: >>>>> [ 291.009332] >>>>> [ 291.009333] dump_stack+0x7b/0x93 >>>>> [ 291.009333] intel_gpio_irq_ack.cold+0x22/0x27 >>>>> [ 291.009334] handle_edge_irq+0xe4/0x310 >>>>> [ 291.009334] generic_handle_irq+0x4b/0x60 >>>>> [ 291.009335] intel_gpio_irq+0x11f/0x1c0 >>>>> [ 291.009335] __handle_irq_event_percpu+0x77/0x190 >>>>> [ 291.009336] ? handle_level_irq+0x200/0x200 >>>>> [ 291.009336] handle_irq_event+0x6c/0x100 >>>>> [ 291.009337] handle_fasteoi_irq+0xd8/0x250 >>>>> [ 291.009337] asm_call_irq_on_stack+0xf/0x20 >>>>> [ 291.009338] >>>>> [ 291.009338] arch_do_IRQ_pipelined+0x11d/0x230 >>>>> [ 291.009339] sync_current_irq_stage+0x157/0x1f0 >>>> ^^ >>>>My understanding is that you are not handling an HW IRQ, you are in the >>>>stage where the inband IRQ log is being processed. HW IRQs are already >>>>ACKed at that stage. >>>> >>>>Maybe we have to understand your IRQ chip routing. Is there an IO APIC >>>>involved? handle_fasteoi_irq() would be the first level IRQ entry point >>>>which takes care of masking + ack. >>> >>> Yes. IOAPIC is the first level for external devices. handle_fasteoi_irq() is >>registered in io_apic driver. There is second level pinctrl device acting as >>gpiochip which has irq domains where sd/mmc cd interrupt alloced from , >>which need to be acked in handle_edge_irq. >>> >>> Regards >>> >>> Hongzhan Chen >>>> >>>>> [ 291.009339] handle_irq_pipelined_finish+0xa2/0x1a0 >>>>> [ 291.009340] arch_pipeline_entry+0xdb/0x120 >>>>> [ 291.009340] asm_common_interrupt+0x1e/0x40 >>>>> [ 291.009341] RIP: 0010:cpu_idle_poll.isra.0+0x3a/0x110 >>>>> [ 291.009342] Code: 00 85 d2 0f 8f 9c 00 00 00 e8 72 33 28 ff 65 48 8b 1c >>25 >>>>00 ad 01 00 e8 a4 29 27 ff 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b 03 08 >>75 >>>>13 8b 05 cc 6b 01 01 85 c0 75 ed e8 f3 22 2a ff 85 c0 75 >>>>> [ 291.009342] RSP: 0018:ffffffffa1803ec0 EFLAGS: 00000202 >>>>> >>>>> Regards >>>>> >>>>> Hongzhan Chen >>>>> > >>>>> > But: If we use the dovetail sources to build a kernel with pipelining >>>>> > disabled we have to ACK the IRQ. I assume pipelining is active on your >>>>> > end. >>>>> > >>>>> > As Jan said, I'm searching for a similar problem for some time now that >>>>> > I can only reproduce on VirtualBox VMM. In my case I'm seeing a stall, >>>>> > not a storm. I will give it a try, let's see. >>>>> > >>>>> > > >>>>> > > After patching: >>>>> > > 128: 2 0 0 0 INTC1020:01 112 0000:00:1a.1 cd >>>>> > > >>>>> > > Before patching: >>>>> > > 128: 2440190 0 0 0 INTC1020:01 112 0000:00:1a.1 >>cd >>>>> > > >>>>> > > Regards >>>>> > > >>>>> > > Hongzhan Chen >>>>> > > > >>>>> > > > Could you please check if we call unmask_irq() with HW IRQs >>enabled? >>>>If >>>>> > > > so, I have an idea what is going on... >>>>> > > > >>>>> > > > Florian >>>>> > > >>>>> >> >>To start with, prior to any further analysis: >> >>diff --git a/drivers/pinctrl/intel/pinctrl-intel.h b/drivers/pinctrl/intel/pinctrl- >>intel.h >>index 65628423bf639..b7afa763ee079 100644 >>--- a/drivers/pinctrl/intel/pinctrl-intel.h >>+++ b/drivers/pinctrl/intel/pinctrl-intel.h >>@@ -222,7 +222,7 @@ struct intel_pinctrl_context { >> */ >> struct intel_pinctrl { >> struct device *dev; >>- raw_spinlock_t lock; >>+ hard_spinlock_t lock; > > I already modified it at first when we found the issue. But it does not work with it. > > After patching my patch[2] I mentioned, the issue disappear finally. > Ok, but the patch is papering over the real issue I believe. -- Philippe.