From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8843EE95A82 for ; Sun, 8 Oct 2023 10:16:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344616AbjJHKNa (ORCPT ); Sun, 8 Oct 2023 06:13:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58890 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230412AbjJHKN2 (ORCPT ); Sun, 8 Oct 2023 06:13:28 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 335A2B9 for ; Sun, 8 Oct 2023 03:13:27 -0700 (PDT) From: John Ogness DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1696760005; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=IyfYsQzaOeLTXQBHv3VBHVIY5Xsq/KdLp69dFVT7Uxc=; b=B9yMbKj+Ld4oAx8iNBwUTVe9fhZZ9OV/bhBArBYfFcz1cUUc96jT017mfeXCEMYiXEitg7 r/DW8iwe5KpB1VIS7rGeel0TO014yF/4qFtsO3RokUtFKG5K0uRcOC6PFv/4lEz9CvrOEl AZAeDUu8U2nn0MZEn9oYldZE5VeUPYW/m/j9soZiCT9ToNc3BZ7+RCnw+FZAX1RqjVdM58 WDBejKIYbmwFdYSFIZkztA3w4wMDTXyx8MrkUDzAaBlGXm5UmOvr3TfLHOx7IaGqErPjEH lnKIpwYtehNZuTpmLVq0+is/jRqfok+ziHez4H8wFp+rTk8cZNqXeKU/D+EJMA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1696760005; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=IyfYsQzaOeLTXQBHv3VBHVIY5Xsq/KdLp69dFVT7Uxc=; b=6Y5TkTky3+TMm/7Vo2O4V7hJ4YfP8LJCRen2M97wkx+dJN9CrkX7j3we+MNj0CuwINs2WE Fa03uwJLm/k3Y6DQ== To: Petr Mladek , Linus Torvalds Cc: Sergey Senozhatsky , Steven Rostedt , Thomas Gleixner , linux-kernel@vger.kernel.org, Greg Kroah-Hartman Subject: Re: panic context: was: Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections In-Reply-To: References: <87h6n5teos.fsf@jogness.linutronix.de> Date: Sun, 08 Oct 2023 12:19:21 +0206 Message-ID: <87il7hv2v2.fsf@jogness.linutronix.de> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Petr, On 2023-10-06, Petr Mladek wrote: >> During the demo at LPC2022 we had the situation that there was a large >> backlog when a WARN was hit. With current mainline the first line of the >> WARN is put into the ringbuffer and then the entire backlog is flushed >> before storing the rest of the WARN into the ringbuffer. At the time it >> was obvious that we should finish storing the WARN message and then >> start flushing the backlog. > > This talks about the "emergency" context (WARN/OOPS/watchdog). > The system might be in big troubles but it would still try to continue. > > Do we really want to defer the flush also for panic() context? We can start flushing right after the backtrace is in the ringbuffer. But flushing the backlog _before_ putting the backtrace into the ringbuffer was not desired because if there is a large backlog, the machine may not survive to get the backtrace out. And in that case it won't even be in the ringbuffer to be used by other debugging tools. > I ask because I was not on LPC 2022 in person and I do not remember > all details. The LPC2022 demo/talk was recorded: https://www.youtube.com/watch?v=TVhNcKQvzxI At 55:55 is where the situation occurred and triggered the conversation, ultimately leading to this new feature. You may also want to reread my summary: https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de as well as Linus' follow-up message: https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@mail.gmail.com > But it is tricky in panic(), see 8th patch at > https://lore.kernel.org/r/20230919230856.661435-9-john.ogness@linutronix.de > > + nbcon_atomic_exit() is called only in one code path. Correct. When panic() is complete and the machine goes into its infinite loop. This is also the point where it will attempt an unsafe flush, if it could not get the messages out yet. > + nbcon_atomic_flush_all() is used in other paths. It looks like > a "Whack a mole" game to me. Several different outputs occur during panic(). The flush is everywhere where something significant has been put into the ringbuffer and now it would be OK to flush it. > + messages are never emitted by printk kthread either because > CPUs are stopped or the kthread is not allowed to get the lock Correct. > I see only one positive of the explicit flush. The consoles would > not delay crash_exec() and the crash dump might be closer to > the point where panic() was called. It's only about getting the critical messages into the ringbuffer before flushing. And since various things can go wrong during the many actions within panic(), it makes sense to flush in between those actions. > Otherwise I see only negatives => IMHO, we want to flush atomic > consoles synchronously from printk() in panic(). > > Does anyone really want explicit flushes in panic()? So far you are the only one speaking against it. I expect as time goes on it will get even more complex as it becomes tunable (also something we talked about during the demo). John