From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6D421C77B7A for ; Tue, 16 May 2023 23:47:23 +0000 (UTC) Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4QLXw92Zrqz3cg0 for ; Wed, 17 May 2023 09:47:21 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=gK8/96Mw; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=linaro.org (client-ip=2607:f8b0:4864:20::a2b; helo=mail-vk1-xa2b.google.com; envelope-from=sumit.garg@linaro.org; receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=gK8/96Mw; dkim-atps=neutral Received: from mail-vk1-xa2b.google.com (mail-vk1-xa2b.google.com [IPv6:2607:f8b0:4864:20::a2b]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4QLBmf6Bvnz3c7K for ; Tue, 16 May 2023 20:09:36 +1000 (AEST) Received: by mail-vk1-xa2b.google.com with SMTP id 71dfb90a1353d-4501ca552a3so7412232e0c.2 for ; Tue, 16 May 2023 03:09:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1684231773; x=1686823773; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=/lEOYeeU5eTLm7de7UnRvbY+9wcsAqP7XOcqCDjWADA=; b=gK8/96Mwc9pJtoGDw27JnaeCATyvSkorVev6aES+s3evOWGv2eZ4S4ZQmr3dLrS1ji P75ZaYIR630JDnG938lXkgTpAzoCDm25UQTJicgxSxrC3C5xMLuqUAPnjpjFrMhTltvf 5nOShkedMWMDsdiQC31T8R7CGcauuP/Smnv++IlvHsPbYsldV+8gK/muWzPtBUnpiWnm Edf/xMXcnyiN3A+Aoitl4bcEl0LNRK/bxRLNRAcSY4buq6wTKN2GYV7kQg0AX0DercuO 4x6z9zmJyS5hNtZA48bVcCbR5vHvKIoMRTebZWJTjQO8cX5BYJGXGALHEndzBJd2BeEM QaIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684231773; x=1686823773; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/lEOYeeU5eTLm7de7UnRvbY+9wcsAqP7XOcqCDjWADA=; b=ZnMROzp32xGtCmoZPHjvkprhOCAsJzHA8K7AwvN2erCPl30TQJENZozmIZFzbB53WM o5n7elUFTC3xxHSpCTjDCNhHf7vQeYKIg4A8HdJkSjyJPeh7ZSMaEL9C4DWTEBASirZj XViK4MLSetquBc42FVVQd4yt7RBCo2nZOLC2lMOYnBU14hRH9GKLmDFSg+gl5wwMTwAd 73h5COhc00NxY8LEP8eXsYFqnV936aAP1nvnnQBRwJqOd8MAhAaLT9IeT2xe+/JHahqx ZQqhQ4aZzzNJKw13U9yuz/LHgDwqZQD9+qQJlwURHs9xqBHESKfajdcQlw9wr2OUyTN/ I2wQ== X-Gm-Message-State: AC+VfDxLaQ85ZjlGbwFdzYPASTDqi/VL+w0AxPWifpC103PzHXFTM6pN vv8SLOmcwBAcL6ArbO7BNvvU6WdqxQzYbmESCQv44g== X-Google-Smtp-Source: ACHHUZ6TLbLYd474B27i7fOyl3BsnPpQ4fbmukWnlRJS2Ui51PNk1Fw9UBPkdN9uxCCarlSOI3HX0KcO/kJKITuV9Sk= X-Received: by 2002:a1f:6dc6:0:b0:44f:eb0a:77db with SMTP id i189-20020a1f6dc6000000b0044feb0a77dbmr13395898vkc.11.1684231772863; Tue, 16 May 2023 03:09:32 -0700 (PDT) MIME-Version: 1.0 References: <20230419225604.21204-1-dianders@chromium.org> In-Reply-To: From: Sumit Garg Date: Tue, 16 May 2023 15:39:21 +0530 Message-ID: Subject: Re: [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI To: Doug Anderson Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Mailman-Approved-At: Wed, 17 May 2023 09:46:34 +1000 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mark Rutland , Masayoshi Mizuma , "Jason A. Donenfeld" , Huacai Chen , Jinyang He , Lecopzer Chen , Peter Zijlstra , Catalin Marinas , Dave Hansen , Qing Zhang , Jiaxun Yang , linux-kernel@vger.kernel.org, Guo Ren , "H. Peter Anvin" , sparclinux@vger.kernel.org, WANG Xuerui , Will Deacon , Ard Biesheuvel , Daniel Thompson , Marc Zyngier , Masahiro Yamada , "Rafael J . Wysocki" , Russell King , Gaosheng Cui , Chen-Yu Tsai , Ingo Molnar , "Darrick J. Wong" , kgdb-bugreport@lists.sourceforge.net, Ulf Hansson , Masayoshi Mizuma , Ingo Molnar , Pierre Gondois , Kees Cook , "Paul E. McKenney" , Frederic Weisbecker , Nicholas Piggin , Stephen Boyd , loongarch@lists.linux.dev, "Russell King \(Oracle\)" , Jianmin Lv , Borislav Petkov , Joey Gouly , Ben Dooks , Thomas Gleixner , Laurent Dufour , Alexandru Elisei , linux-arm-kernel@lists.infradead.org, Andrey Konovalov , x86@kernel.org, Thomas Bogendoerfer , linux-mips@vger.kernel.org, Greg Kroah-Hartman , ito-yuichi@fujitsu.com, =?UTF-8?Q?Philippe_ Mathieu=2DDaud=C3=A9?= , linux-perf-users@vger.kernel.org, "Guilherme G. Piccoli" , "Eric W. Biederman" , "Gautham R. Shenoy" , Jason Wessel , linuxppc-dev@lists.ozlabs.org, "David S. Miller" Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Wed, 10 May 2023 at 22:20, Doug Anderson wrote: > > Hi, > > On Wed, May 10, 2023 at 9:30=E2=80=AFAM Mark Rutland wrote: > > > > On Wed, May 10, 2023 at 08:28:17AM -0700, Doug Anderson wrote: > > > Hi, > > > > Hi Doug, > > > > > On Wed, Apr 19, 2023 at 3:57=E2=80=AFPM Douglas Anderson wrote: > > > > This is an attempt to resurrect Sumit's old patch series [1] that > > > > allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs an= d > > > > also to round up CPUs in kdb/kgdb. The last post from Sumit that I > > > > could find was v7, so I called this series v8. I haven't copied all= of > > > > his old changelongs here, but you can find them from the link. > > > > Thanks Doug for picking up this work and for all your additions/improvement= s. > > > > Since v7, I have: > > > > * Addressed the small amount of feedback that was there for v7. > > > > * Rebased. > > > > * Added a new patch that prevents us from spamming the logs with id= le > > > > tasks. > > > > * Added an extra patch to gracefully fall back to regular IPIs if > > > > pseudo-NMIs aren't there. > > > > > > > > Since there appear to be a few different patches series related to > > > > being able to use NMIs to get stack traces of crashed systems, let = me > > > > try to organize them to the best of my understanding: > > > > > > > > a) This series. On its own, a) will (among other things) enable sta= ck > > > > traces of all running processes with the soft lockup detector if > > > > you've enabled the sysctl "kernel.softlockup_all_cpu_backtrace".= On > > > > its own, a) doesn't give a hard lockup detector. > > > > > > > > b) A different recently-posted series [2] that adds a hard lockup > > > > detector based on perf. On its own, b) gives a stack crawl of th= e > > > > locked up CPU but no stack crawls of other CPUs (even if they're > > > > locked too). Together with a) + b) we get everything (full locku= p > > > > detect, full ability to get stack crawls). > > > > > > > > c) The old Android "buddy" hard lockup detector [3] that I'm > > > > considering trying to upstream. If b) lands then I believe c) wo= uld > > > > be redundant (at least for arm64). c) on its own is really only > > > > useful on arm64 for platforms that can print CPU_DBGPCSR somehow > > > > (see [4]). a) + c) is roughly as good as a) + b). > > > > > It's been 3 weeks and I haven't heard a peep on this series. That > > > means nobody has any objections and it's all good to land, right? > > > Right? :-P For me it was months waiting without any feedback. So I think you are lucky :) or atleast better than me at poking arm64 maintainers. > > > > FWIW, there are still longstanding soundness issues in the arm64 pseudo= -NMI > > support (and fixing that requires an overhaul of our DAIF / IRQ flag > > management, which I've been chipping away at for a number of releases),= so I > > hadn't looked at this in detail yet because the foundations are still s= omewhat > > dodgy. > > > > I appreciate that this has been around for a while, and it's on my queu= e to > > look at. > > Ah, thanks for the heads up! We've been thinking about turning this on > in production in ChromeOS because it will help us track down a whole > class of field-generated crash reports that are otherwise opaque to > us. It sounds as if maybe that's not a good idea quite yet? Do you > have any idea of how much farther along this needs to go? ...of > course, we've also run into issues with Mediatek devices because they > don't save/restore GICR registers properly [1]. In theory, we might be > able to work around that in the kernel. > > In any case, even if there are bugs that would prevent turning this on > for production, it still seems like we could still land this series. > It simply wouldn't do anything until someone turned on pseudo NMIs, > which wouldn't happen till the kinks are worked out. I agree here. We should be able to make the foundations robust later on. IMHO, until we turn on features surrounding pseudo NMIs, I am not sure how we can have true confidence in the underlying robustness. -Sumit > > ...actually, I guess I should say that if all the patches of the > current series do land then it actually _would_ still do something, > even without pseudo-NMI. Assuming the last patch looks OK, it would at > least start falling back to using regular IPIs to do backtraces. That > wouldn't get backtraces on hard locked up CPUs but it would be better > than what we have today where we don't get any backtraces. This would > get arm64 on par with arm32... > > [1] https://issuetracker.google.com/281831288