From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3193DC7EE24 for ; Tue, 16 May 2023 10:09:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231908AbjEPKJq (ORCPT ); Tue, 16 May 2023 06:09:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232252AbjEPKJn (ORCPT ); Tue, 16 May 2023 06:09:43 -0400 Received: from mail-vk1-xa2b.google.com (mail-vk1-xa2b.google.com [IPv6:2607:f8b0:4864:20::a2b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E9D4E1984 for ; Tue, 16 May 2023 03:09:33 -0700 (PDT) Received: by mail-vk1-xa2b.google.com with SMTP id 71dfb90a1353d-4536ec953c1so3979143e0c.3 for ; Tue, 16 May 2023 03:09:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1684231773; x=1686823773; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=/lEOYeeU5eTLm7de7UnRvbY+9wcsAqP7XOcqCDjWADA=; b=gK8/96Mwc9pJtoGDw27JnaeCATyvSkorVev6aES+s3evOWGv2eZ4S4ZQmr3dLrS1ji P75ZaYIR630JDnG938lXkgTpAzoCDm25UQTJicgxSxrC3C5xMLuqUAPnjpjFrMhTltvf 5nOShkedMWMDsdiQC31T8R7CGcauuP/Smnv++IlvHsPbYsldV+8gK/muWzPtBUnpiWnm Edf/xMXcnyiN3A+Aoitl4bcEl0LNRK/bxRLNRAcSY4buq6wTKN2GYV7kQg0AX0DercuO 4x6z9zmJyS5hNtZA48bVcCbR5vHvKIoMRTebZWJTjQO8cX5BYJGXGALHEndzBJd2BeEM QaIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684231773; x=1686823773; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/lEOYeeU5eTLm7de7UnRvbY+9wcsAqP7XOcqCDjWADA=; b=eVCrktWkqeg2b1Oicc5MReDYY0n82qEEMz3fNHZXE9V5s3Dv84fxxOsihxaIxLTi2c ahKxCJN+nBeXGyXZqoSgCcT8NtsMtnP0rc5U9JV/pJ+EN9rjyOggLZ4wXMsje3KPZ74C 8OEEmVT1P0iA1OL7ZtDjLh2p0iZetzI+Ahjt6rQB9sFY/mRYeK18pOCzMG1TWjpYoyhm sRP0zH41AmcI7nBp1SZjghGXOZedl/39+r9TEwVnEfpSla7yGaqR0I7siOvc31oMUyZp aRLcK2/lUeZkp2dB46FhNMcHnta6hRP3EDcBpLko5xDIOpXG2Nr94KvuzlbLlI7LiJTL eiHA== X-Gm-Message-State: AC+VfDzDracYNaQuivNSVa2M+L53P5+UItFZigwCE7se/My0zGfjBt1V OgNTnQkjsKYw0jRRPsCH0aC5P8xY2BRfN2UR4J+qQA== X-Google-Smtp-Source: ACHHUZ6TLbLYd474B27i7fOyl3BsnPpQ4fbmukWnlRJS2Ui51PNk1Fw9UBPkdN9uxCCarlSOI3HX0KcO/kJKITuV9Sk= X-Received: by 2002:a1f:6dc6:0:b0:44f:eb0a:77db with SMTP id i189-20020a1f6dc6000000b0044feb0a77dbmr13395898vkc.11.1684231772863; Tue, 16 May 2023 03:09:32 -0700 (PDT) MIME-Version: 1.0 References: <20230419225604.21204-1-dianders@chromium.org> In-Reply-To: From: Sumit Garg Date: Tue, 16 May 2023 15:39:21 +0530 Message-ID: Subject: Re: [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI To: Doug Anderson Cc: Mark Rutland , Catalin Marinas , Will Deacon , Daniel Thompson , Marc Zyngier , ito-yuichi@fujitsu.com, kgdb-bugreport@lists.sourceforge.net, Chen-Yu Tsai , Masayoshi Mizuma , Peter Zijlstra , Ard Biesheuvel , "Rafael J . Wysocki" , linux-arm-kernel@lists.infradead.org, Stephen Boyd , Lecopzer Chen , Thomas Gleixner , linux-perf-users@vger.kernel.org, Alexandru Elisei , Andrey Konovalov , Ben Dooks , Borislav Petkov , Christophe Leroy , "Darrick J. Wong" , Dave Hansen , "David S. Miller" , "Eric W. Biederman" , Frederic Weisbecker , Gaosheng Cui , "Gautham R. Shenoy" , Greg Kroah-Hartman , "Guilherme G. Piccoli" , Guo Ren , "H. Peter Anvin" , Huacai Chen , Ingo Molnar , Ingo Molnar , "Jason A. Donenfeld" , Jason Wessel , Jianmin Lv , Jiaxun Yang , Jinyang He , Joey Gouly , Kees Cook , Laurent Dufour , Masahiro Yamada , Masayoshi Mizuma , Michael Ellerman , Nicholas Piggin , "Paul E. McKenney" , =?UTF-8?Q?Philippe_Mathieu=2DDaud=C3=A9?= , Pierre Gondois , Qing Zhang , "Russell King (Oracle)" , Russell King , Thomas Bogendoerfer , Ulf Hansson , WANG Xuerui , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev, sparclinux@vger.kernel.org, x86@kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-perf-users@vger.kernel.org On Wed, 10 May 2023 at 22:20, Doug Anderson wrote: > > Hi, > > On Wed, May 10, 2023 at 9:30=E2=80=AFAM Mark Rutland wrote: > > > > On Wed, May 10, 2023 at 08:28:17AM -0700, Doug Anderson wrote: > > > Hi, > > > > Hi Doug, > > > > > On Wed, Apr 19, 2023 at 3:57=E2=80=AFPM Douglas Anderson wrote: > > > > This is an attempt to resurrect Sumit's old patch series [1] that > > > > allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs an= d > > > > also to round up CPUs in kdb/kgdb. The last post from Sumit that I > > > > could find was v7, so I called this series v8. I haven't copied all= of > > > > his old changelongs here, but you can find them from the link. > > > > Thanks Doug for picking up this work and for all your additions/improvement= s. > > > > Since v7, I have: > > > > * Addressed the small amount of feedback that was there for v7. > > > > * Rebased. > > > > * Added a new patch that prevents us from spamming the logs with id= le > > > > tasks. > > > > * Added an extra patch to gracefully fall back to regular IPIs if > > > > pseudo-NMIs aren't there. > > > > > > > > Since there appear to be a few different patches series related to > > > > being able to use NMIs to get stack traces of crashed systems, let = me > > > > try to organize them to the best of my understanding: > > > > > > > > a) This series. On its own, a) will (among other things) enable sta= ck > > > > traces of all running processes with the soft lockup detector if > > > > you've enabled the sysctl "kernel.softlockup_all_cpu_backtrace".= On > > > > its own, a) doesn't give a hard lockup detector. > > > > > > > > b) A different recently-posted series [2] that adds a hard lockup > > > > detector based on perf. On its own, b) gives a stack crawl of th= e > > > > locked up CPU but no stack crawls of other CPUs (even if they're > > > > locked too). Together with a) + b) we get everything (full locku= p > > > > detect, full ability to get stack crawls). > > > > > > > > c) The old Android "buddy" hard lockup detector [3] that I'm > > > > considering trying to upstream. If b) lands then I believe c) wo= uld > > > > be redundant (at least for arm64). c) on its own is really only > > > > useful on arm64 for platforms that can print CPU_DBGPCSR somehow > > > > (see [4]). a) + c) is roughly as good as a) + b). > > > > > It's been 3 weeks and I haven't heard a peep on this series. That > > > means nobody has any objections and it's all good to land, right? > > > Right? :-P For me it was months waiting without any feedback. So I think you are lucky :) or atleast better than me at poking arm64 maintainers. > > > > FWIW, there are still longstanding soundness issues in the arm64 pseudo= -NMI > > support (and fixing that requires an overhaul of our DAIF / IRQ flag > > management, which I've been chipping away at for a number of releases),= so I > > hadn't looked at this in detail yet because the foundations are still s= omewhat > > dodgy. > > > > I appreciate that this has been around for a while, and it's on my queu= e to > > look at. > > Ah, thanks for the heads up! We've been thinking about turning this on > in production in ChromeOS because it will help us track down a whole > class of field-generated crash reports that are otherwise opaque to > us. It sounds as if maybe that's not a good idea quite yet? Do you > have any idea of how much farther along this needs to go? ...of > course, we've also run into issues with Mediatek devices because they > don't save/restore GICR registers properly [1]. In theory, we might be > able to work around that in the kernel. > > In any case, even if there are bugs that would prevent turning this on > for production, it still seems like we could still land this series. > It simply wouldn't do anything until someone turned on pseudo NMIs, > which wouldn't happen till the kinks are worked out. I agree here. We should be able to make the foundations robust later on. IMHO, until we turn on features surrounding pseudo NMIs, I am not sure how we can have true confidence in the underlying robustness. -Sumit > > ...actually, I guess I should say that if all the patches of the > current series do land then it actually _would_ still do something, > even without pseudo-NMI. Assuming the last patch looks OK, it would at > least start falling back to using regular IPIs to do backtraces. That > wouldn't get backtraces on hard locked up CPUs but it would be better > than what we have today where we don't get any backtraces. This would > get arm64 on par with arm32... > > [1] https://issuetracker.google.com/281831288