Date: Fri, 16 Apr 2021 15:15:00 +0100
Message-ID: <87y2dis4d7.wl-maz@kernel.org>
From: Marc Zyngier
To: He Ying
Subject: Re: [RFC PATCH] irqchip/gic-v3: Do not enable irqs when handling spurious interrups
In-Reply-To: <20210416062217.25157-1-heying24@huawei.com>

[+ Mark]

On Fri, 16 Apr 2021 07:22:17 +0100,
He Ying wrote:
> 
> We found this problem in our kernel src tree:
> 
> [ 14.816231] ------------[ cut here ]------------
> [ 14.816231] kernel BUG at irq.c:99!
> [ 14.816232] Internal error: Oops - BUG: 0 [#1] SMP
> [ 14.816232] Process swapper/0 (pid: 0, stack limit = 0x(____ptrval____))
> [ 14.816233] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.19.95-1.h1.AOS2.0.aarch64 #14
> [ 14.816233] Hardware name: evb (DT)
> [ 14.816234] pstate: 80400085 (Nzcv daIf +PAN -UAO)
> [ 14.816234] pc : asm_nmi_enter+0x94/0x98
> [ 14.816235] lr : asm_nmi_enter+0x18/0x98
> [ 14.816235] sp : ffff000008003c50
> [ 14.816235] pmr_save: 00000070
> [ 14.816237] x29: ffff000008003c50 x28: ffff0000095f56c0
> [ 14.816238] x27: 0000000000000000 x26: ffff000008004000
> [ 14.816239] x25: 00000000015e0000 x24: ffff8008fb916000
> [ 14.816240] x23: 0000000020400005 x22: ffff0000080817cc
> [ 14.816241] x21: ffff000008003da0 x20: 0000000000000060
> [ 14.816242] x19: 00000000000003ff x18: ffffffffffffffff
> [ 14.816243] x17: 0000000000000008 x16: 003d090000000000
> [ 14.816244] x15: ffff0000095ea6c8 x14: ffff8008fff5ab40
> [ 14.816244] x13: ffff8008fff58b9d x12: 0000000000000000
> [ 14.816245] x11: ffff000008c8a200 x10: 000000008e31fca5
> [ 14.816246] x9 : ffff000008c8a208 x8 : 000000000000000f
> [ 14.816247] x7 : 0000000000000004 x6 : ffff8008fff58b9e
> [ 14.816248] x5 : 0000000000000000 x4 : 0000000080000000
> [ 14.816249] x3 : 0000000000000000 x2 : 0000000080000000
> [ 14.816250] x1 : 0000000000120000 x0 : ffff0000095f56c0
> [ 14.816251] Call trace:
> [ 14.816251]  asm_nmi_enter+0x94/0x98
> [ 14.816251]  el1_irq+0x8c/0x180
> [ 14.816252]  gic_handle_irq+0xbc/0x2e4
> [ 14.816252]  el1_irq+0xcc/0x180
> [ 14.816253]  arch_timer_handler_virt+0x38/0x58
> [ 14.816253]  handle_percpu_devid_irq+0x90/0x240
> [ 14.816253]  generic_handle_irq+0x34/0x50
> [ 14.816254]  __handle_domain_irq+0x68/0xc0
> [ 14.816254]  gic_handle_irq+0xf8/0x2e4
> [ 14.816255]  el1_irq+0xcc/0x180
> [ 14.816255]  arch_cpu_idle+0x34/0x1c8
> [ 14.816255]  default_idle_call+0x24/0x44
> [ 14.816256]  do_idle+0x1d0/0x2c8
> [ 14.816256]  cpu_startup_entry+0x28/0x30
> [ 14.816256]  rest_init+0xb8/0xc8
> [ 14.816257]  start_kernel+0x4c8/0x4f4
> [ 14.816257] Code: 940587f1 d5384100 b9401001 36a7fd01 (d4210000)
> [ 14.816258] Modules linked in: start_dp(O) smeth(O)
> [ 15.103092] ---[ end trace 701753956cb14aa8 ]---
> [ 15.103093] Kernel panic - not syncing: Fatal exception in interrupt
> [ 15.103099] SMP: stopping secondary CPUs
> [ 15.103100] Kernel Offset: disabled
> [ 15.103100] CPU features: 0x36,a2400218
> [ 15.103100] Memory Limit: none

Urgh...

> Our kernel src tree is based on 4.19.95 and backports arm64 pseudo-NMI
> patches but doesn't support nested NMI. Its top relative commit is
> commit 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of NMIs").

Can you please reproduce it with mainline and without any backport? It
is hard to reason about something that isn't a vanilla kernel.

> I look into this issue and find that it's caused by 'BUG_ON(in_nmi())'
> in nmi_enter(). From the call trace, we find two 'el1_irqs' which
> means an interrupt preempts the other one and the new one is an NMI.
> Furthermore, by adding some prints, we find the first irq also calls
> nmi_enter(), but its priority is not GICD_INT_NMI_PRI and its irq number
> is 1023. It enables irq by calling gic_arch_enable_irqs() in
> gic_handle_irq(). At this moment, the second irq preempts the first irq
> and it's an NMI but current context is already in nmi. So that may be
> the problem.

I'm not sure I get it. From the stack trace, I see this:

[ 14.816251]  asm_nmi_enter+0x94/0x98
[ 14.816251]  el1_irq+0x8c/0x180                  (C)
[ 14.816252]  gic_handle_irq+0xbc/0x2e4
[ 14.816252]  el1_irq+0xcc/0x180                  (B)
[ 14.816253]  arch_timer_handler_virt+0x38/0x58
[ 14.816253]  handle_percpu_devid_irq+0x90/0x240
[ 14.816253]  generic_handle_irq+0x34/0x50
[ 14.816254]  __handle_domain_irq+0x68/0xc0
[ 14.816254]  gic_handle_irq+0xf8/0x2e4
[ 14.816255]  el1_irq+0xcc/0x180                  (A)

which indicates that we preempted a timer interrupt (A) with another
IRQ (B), itself immediately preempted by another IRQ (C)?
That's indeed at least one too many. Can you please describe for each
of (A), (B) and (C) whether they are spurious or not, what their
priorities are if they aren't spurious?

> In my opinion, when handling spurious interrupts, we shouldn't enable irqs.
> My reason is that for spurious interrupts we may enter nmi context in
> el1_irq() because current PMR may be GIC_PRIO_IRQOFF. If we enable irqs
> at this time, another NMI may happen and preempt this spurious interrupt
> but the context is already in nmi. That causes a bug on if nested NMI is
> not supported. Even for nested nmi, I think it's not a normal scenario.

I would tend to agree that this isn't great. Actually, I'd probably
move the check for a spurious interrupt right after the read of
ICC_IAR1_EL1, because there is no real need to do anything else at
that point.

However, upstream is quite different from 4.19 in that respect, and
I'm not sure if what I am looking at is what you are seeing with your
older kernel.

Thanks,

	M.

> Fixes: 17ce302f3117 ("arm64: Fix interrupt tracing in the presence of NMIs")
> Signed-off-by: He Ying
> ---
>  drivers/irqchip/irq-gic-v3.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 94b89258d045..d3b52734a2c5 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -654,15 +654,15 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
>  		return;
>  	}
>  
> +	/* Check for special IDs first */
> +	if ((irqnr >= 1020 && irqnr <= 1023))
> +		return;
> +
>  	if (gic_prio_masking_enabled()) {
>  		gic_pmr_mask_irqs();
>  		gic_arch_enable_irqs();
>  	}
>  
> -	/* Check for special IDs first */
> -	if ((irqnr >= 1020 && irqnr <= 1023))
> -		return;
> -
>  	if (static_branch_likely(&supports_deactivate_key))
>  		gic_write_eoir(irqnr);
>  	else
> -- 
> 2.17.1
> 
> 

-- 
Without deviation from the norm, progress is not possible.
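He Ying's report describes an ordering problem, and the RFC patch reorders the handler to avoid it. A small user-space model can make that ordering concrete. It is only a sketch under stated assumptions and is not kernel code. model_nmi_enter(), model_enable_irqs(), handle_irq_old(), handle_irq_patched() and run_scenario() are hypothetical names. NMI context is reduced to a depth counter, and a pending pseudo-NMI is assumed to be taken the instant IRQs are unmasked. The model also does not try to reproduce the three-level (A)/(B)/(C) nesting that Marc points out in the trace.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the ordering discussed in this thread.
 * None of these functions are kernel APIs; they only mimic the control flow.
 */
#define SPECIAL_INTID_MIN 1020u
#define SPECIAL_INTID_MAX 1023u
#define SPURIOUS_INTID    1023u

static int nmi_depth;        /* > 0 models in_nmi() returning true */
static int nested_nmi_bugs;  /* times BUG_ON(in_nmi()) would have fired */
static bool nmi_pending;     /* a pseudo-NMI waiting for IRQs to be unmasked */

/* Models nmi_enter() on a kernel without nested NMI support. */
static void model_nmi_enter(void)
{
	if (nmi_depth > 0)
		nested_nmi_bugs++;
	nmi_depth++;
}

static void model_nmi_exit(void)
{
	assert(nmi_depth > 0);
	nmi_depth--;
}

/*
 * Models unmasking after gic_pmr_mask_irqs(): normal IRQs stay masked by
 * PMR, so only a pending NMI can be taken, and it is taken immediately.
 */
static void model_enable_irqs(void)
{
	if (nmi_pending) {
		nmi_pending = false;
		model_nmi_enter();	/* the NMI's own el1_irq entry */
		model_nmi_exit();
	}
}

static bool is_special_intid(unsigned int irqnr)
{
	return irqnr >= SPECIAL_INTID_MIN && irqnr <= SPECIAL_INTID_MAX;
}

/* Ordering before the RFC patch: unmask first, then filter special IDs. */
void handle_irq_old(unsigned int irqnr)
{
	model_enable_irqs();
	if (is_special_intid(irqnr))
		return;
	/* EOI and dispatch of a real interrupt elided */
}

/* Ordering with the RFC patch: filter special IDs before unmasking. */
void handle_irq_patched(unsigned int irqnr)
{
	if (is_special_intid(irqnr))
		return;
	model_enable_irqs();
	/* EOI and dispatch of a real interrupt elided */
}

/*
 * Scenario from the report: INTID 1023 is taken while PMR is
 * GIC_PRIO_IRQOFF, so the entry path has already entered NMI context,
 * and a real pseudo-NMI is pending. Returns the nested-entry count.
 */
int run_scenario(void (*handler)(unsigned int))
{
	nmi_depth = 0;
	nested_nmi_bugs = 0;
	nmi_pending = true;

	model_nmi_enter();	/* el1_irq treats the spurious IRQ as an NMI */
	handler(SPURIOUS_INTID);
	model_nmi_exit();

	model_enable_irqs();	/* exception return: a still-pending NMI lands here */
	return nested_nmi_bugs;
}
```

With the old ordering, run_scenario(handle_irq_old) counts one nested NMI entry. That is the condition the BUG_ON(in_nmi()) in the report catches. With the patched ordering the count is 0, because the pending NMI is only taken once the spurious interrupt has returned. In this simplified model, Marc's alternative of checking for spurious INTIDs right after reading ICC_IAR1_EL1 would behave the same as handle_irq_patched(). The model has no NMI-priority check that the spurious check could be reordered against.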
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel