Date: Wed, 9 Mar 2022 00:01:26 +0000
From: "Russell King (Oracle)"
To: Ard Biesheuvel
Cc: Corentin Labbe, Linus Walleij, Arnd Bergmann, Linux ARM, Linux Kernel Mailing List
Subject: Re: boot flooded with unwind: Index not found

On Wed, Mar 02, 2022 at 11:22:29AM +0000, Russell King (Oracle) wrote:
> On Wed, Mar 02, 2022 at 12:19:40PM +0100, Ard Biesheuvel wrote:
> > On Wed, 2 Mar 2022 at 12:12, Russell King (Oracle)
> > wrote:
> > >
> > > On Wed, Mar 02, 2022 at 11:09:49AM +0100, Corentin Labbe wrote:
> > > > The crash disappeared (but the suspicious RCU usage is still here).
> > >
> > > As the trace on those is:
> > >
> > > [ 0.239629] unwind_backtrace from show_stack+0x10/0x14
> > > [ 0.239654] show_stack from init_stack+0x1c54/0x2000
> > >
> > > unwind_backtrace() and show_stack() are both C code, the compiler will
> > > emit the unwind information for it. show_stack() isn't called from
> > > assembly code, only from C code, so the next function's unwind
> > > information should also be generated by the compiler.
> > >
> > > However, init_stack is not a function - it's an array of unsigned long.
> > > There is no way this should appear in the trace, and this suggests that
> > > the unwind of show_stack() has gone wrong.
> > >
> > > I don't see anything obvious in Ard's changes that would cause that
> > > though.
> > >
> > > Did it used to work fine with previous versions of linux-next - those
> > > versions where we had Ard's "arm-vmap-stacks-v6" tag merged in
> > > (commit 2fa394824493) and did this only appear when I merged
> > > "arm-ftrace-for-rmk" (commit 74aaaa1e9bba) ? Did merging
> > > "arm-ftrace-for-rmk" cause any change in your .config?
> > >
> >
> > I can reproduce the RCU warnings, and I have tracked this down to the
> > change I made to return_address() for the graph tracer, which I
> > thought was justified after removing the call to
> > kernel_text_address():
> >
> > --- a/arch/arm/include/asm/ftrace.h
> > +++ b/arch/arm/include/asm/ftrace.h
> > @@ -35,26 +35,8 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
> >
> >  #ifndef __ASSEMBLY__
> >
> > -#if defined(CONFIG_FRAME_POINTER) && !defined(CONFIG_ARM_UNWIND)
> > -/*
> > - * return_address uses walk_stackframe to do it's work. If both
> > - * CONFIG_FRAME_POINTER=y and CONFIG_ARM_UNWIND=y walk_stackframe uses unwind
> > - * information. For this to work in the function tracer many functions would
> > - * have to be marked with __notrace. So for now just depend on
> > - * !CONFIG_ARM_UNWIND.
> > - */
> > -
> >  void *return_address(unsigned int);
> >
> > -#else
> > -
> > -static inline void *return_address(unsigned int level)
> > -{
> > -	return NULL;
> > -}
> > -
> > -#endif
> > -
> >  #define ftrace_return_address(n) return_address(n)
> >
> >  #define ARCH_HAS_SYSCALL_MATCH_SYM_NAME
> >
> > However, the function graph tracer works happily with this bit
> > reverted, and so that is probably the best course of action here.
> >
> > I have already sent the patch that reintroduces the
> > kernel_text_address() check - would you prefer a v2 of that one with
> > this change incorporated? Or a second patch that just reverts the
> > above? (Given that the bogus dereference was invoked from
> > return_address() as well, I suspect that this change would make the
> > get_kernel_nofault() change I proposed in this thread redundant)
>
> I'd prefer patches on top of my devel-stable branch, thanks.

To reiterate what I've just put on IRC - we have not got to the bottom
of this problem yet - it still very much exists.

There seems to be something of a fundamental issue with the unwinder:
it now appears to be going wrong and failing to unwind beyond a couple
of functions, and the address it's coming out with appears to be
incorrect.

I've only just discovered this because I created my very own bug, and
yet again, the timing sucks with the proximity of the merge window.

I'm getting:

[ 13.198803] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[ 13.198820] [] (show_stack) from [] (0xc2be78d4)

for the WARN_ON() stacktrace, and the address that apparently called
show_stack() is most definitely rubbish and incorrect. This makes any
WARN_ON() condition undebuggable.

This is with both 9183/1 and 9184/1 applied on top of pulling your
"arm-ftrace-for-rmk" tag and also with just the "arm-vmap-stacks-v6"
tag.
This seems to point at one of these patches breaking the unwinder:

a1c510d0adc6 ARM: implement support for vmap'ed stacks
532319b9c418 ARM: unwind: disregard unwind info before stack frame is set up
4ab6827081c6 ARM: unwind: dump exception stack from calling frame
b6506981f880 ARM: unwind: support unwinding across multiple stacks

Given that the unwinder is broken, I wonder whether 9183/1 and 9184/1
are actually required.

I did try to point this problem out a few emails back:

"As the trace on those is:

[ 0.239629] unwind_backtrace from show_stack+0x10/0x14
[ 0.239654] show_stack from init_stack+0x1c54/0x2000

unwind_backtrace() and show_stack() are both C code, the compiler will
emit the unwind information for it. show_stack() isn't called from
assembly code, only from C code, so the next function's unwind
information should also be generated by the compiler.

However, init_stack is not a function - it's an array of unsigned long.
There is no way this should appear in the trace, and this suggests that
the unwind of show_stack() has gone wrong."

In Corentin's case, there is no way init_stack should ever appear in
the stack trace. In my case, it's not init_stack, but 0xc2be78d4.

Can you try testing out a dummy WARN_ON(1) test in your kernel please?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
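
Editorial illustration (not part of the original message): the argument
above rests on one invariant, namely that every return address in a
backtrace should point into executable text, so a data symbol such as
init_stack or a stray value such as 0xc2be78d4 shows that the unwind
went wrong. The following is a minimal userspace sketch of that check,
not the kernel's implementation. The names text_range, addr_in_text()
and first_bogus_frame() are hypothetical, and any bounds passed in are
assumptions made up for illustration; the kernel's own
kernel_text_address() covers more cases than a single range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical description of one text section. Real bounds would come
 * from the kernel image; nothing here is taken from the thread.
 */
struct text_range {
	unsigned long start;	/* first text address (inclusive) */
	unsigned long end;	/* one past the last text address */
};

/* True if addr lies inside the text range [start, end). */
static bool addr_in_text(const struct text_range *r, unsigned long addr)
{
	assert(r->start <= r->end);
	return addr >= r->start && addr < r->end;
}

/*
 * Walk the return addresses of a backtrace, innermost frame first, and
 * return the index of the first one that is not a text address, or -1
 * if every frame looks plausible.
 */
static int first_bogus_frame(const struct text_range *r,
			     const unsigned long *ret_addrs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!addr_in_text(r, ret_addrs[i]))
			return (int)i;
	}
	return -1;
}
```

Under an assumed text range of 0xc0008000 to 0xc1000000, a trace whose
third return address is 0xc2be78d4 would be flagged at index 2, which
is the kind of frame the message above calls rubbish.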