Message-ID: <970366ee-0fc0-4a64-816e-3c3ac738e24a@gmail.com>
Date: Wed, 20 Aug 2025 14:22:52 +0800
Subject: Re: [PATCH] panic: call hardlockup_detector_perf_stop in panic
From: Jinchao Wang
To: Petr Mladek
Cc: John Ogness, Thomas Gleixner, Joel Granados, Dave Jiang,
 Josh Poimboeuf, Sravan Kumar Gundu, Ryo Takakura,
 linux-kernel@vger.kernel.org, Wei Liu, Jason Gunthorpe
References: <20250730030638.65355-1-wangjinchao600@gmail.com>

On 8/19/25 23:01, Petr Mladek wrote:
> On Wed 2025-07-30 11:06:33, Wang Jinchao wrote:
>> When a panic happens, it blocks the cpu, which may
>> trigger the hardlockup detector if some dump is slow.
>> So call hardlockup_detector_perf_stop() to disable
>> hardlockup detector.
>
> Could you please provide more details, especially the log showing
> the problem?

Here's what happened:

I configured the kernel to use efi-pstore for kdump logging with the
perf hard lockup detector (NMI) enabled. The efi-pstore backend was
probably slow and there was a lot of log data to write.

When the first panic was triggered, the pstore dump callback in
kmsg_dump()->dumper->dump() took long enough to trigger the NMI
watchdog. The watchdog then panicked in turn, and emergency_restart()
rebooted the machine before the efi-pstore operation had finished.
The function call flow looked like this:

```c
real panic() {
    kmsg_dump() {
        ...
        pstore_dump() {
            start_dump();
            ...
            // long time operation triggers NMI watchdog
            nmi panic() {
                ...
                emergency_restart(); // pstore unfinished
            }
            ...
            finish_dump(); // never reached
        }
    }
}
```

This created a nested panic situation where the second panic interrupted
the crash dump process, causing the loss of the original panic information.

>
> I wonder if this is similar to
> https://lore.kernel.org/all/SN6PR02MB4157A4C5E8CB219A75263A17D46DA@SN6PR02MB4157.namprd02.prod.outlook.com/
>
> There was a problem that a non-panic CPU might get stuck in
> pl011_console_write_thread() or any other con->write_thread()
> callback because nbcon_reacquire_nobuf(wctxt) ended in an infinite
> loop.
>
> It was a real lockup. It has got recently fixed in 6.17-rc1 by
> the commit 571c1ea91a73db56bd94 ("printk: nbcon: Allow reacquire
> during panic"), see
> https://patch.msgid.link/20250606185549.900611-1-john.ogness@linutronix.de
> It is possible that it fixed your problem as well.
>
> That said, it might make sense to disable the hardlockup
> detector during panic. But I do not like the proposed way,
> see below.
>
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -339,6 +339,7 @@ void panic(const char *fmt, ...)
>>       */
>>      local_irq_disable();
>>      preempt_disable_notrace();
>> +    hardlockup_detector_perf_stop();
>
> I see the following in kernel/watchdog_perf.c:
>
> /**
>  * hardlockup_detector_perf_stop - Globally stop watchdog events
>  *
>  * Special interface for x86 to handle the perf HT bug.
>  */
> void __init hardlockup_detector_perf_stop(void)
> {
>     [...]
>     lockdep_assert_cpus_held();
>     [...]
> }
>
> 1. It is suspicious to see an x86-specific "hacky" function called in
>    the generic panic().
>
>    Is this safe?
>    What about other hardlockup detectors?
>
>
> 2. I expect that lockdep_assert_cpus_held() would complain
>    when CONFIG_LOCKDEP was enabled.
>
>
> Anyway, it does not look safe.
> panic() might be called in any context,
> including NMI, and I see:
>
>   + hardlockup_detector_perf_stop()
>     + perf_event_disable()
>       + perf_event_ctx_lock()
>         + mutex_lock_nested()
>
> This might cause deadlock when called in NMI, definitely.
>
> Alternative:
>
> A conservative approach would be to update watchdog_hardlockup_check()
> so that it does nothing when panic_in_progress() returns true. It
> would even work for both hardlockup detector implementations.

Yes, I think that is a better solution.
I had not found panic_in_progress(); I only found
hardlockup_detector_perf_stop() available, so I used that instead :)
I will send another patch.

>
> Best Regards,
> Petr

-- 
Best regards,
Jinchao
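
For reference, a minimal userspace sketch of the alternative discussed
above. It is not kernel code and not the patch that will be sent. It
assumes that panic_in_progress() amounts to checking whether some CPU has
claimed a global panic_cpu variable. The atomic variable, the
PANIC_CPU_INVALID value and the cpu_looks_stuck parameter are illustrative
stand-ins, and the real watchdog_hardlockup_check() has a different
signature and body.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed sentinel meaning "no CPU is panicking" (illustrative). */
#define PANIC_CPU_INVALID (-1)

/* Stand-in for the kernel's record of which CPU owns the panic. */
static atomic_int panic_cpu = PANIC_CPU_INVALID;

/* Model of panic_in_progress(): true once any CPU has claimed the panic. */
static bool panic_in_progress(void)
{
    return atomic_load(&panic_cpu) != PANIC_CPU_INVALID;
}

/*
 * Model of the proposed check. Returns true if a hard lockup would be
 * reported. cpu_looks_stuck is a hypothetical input replacing the real
 * detector's "no timer interrupts seen" test.
 */
static bool watchdog_hardlockup_check(bool cpu_looks_stuck)
{
    /* A slow dump during panic is expected, so do not report it. */
    if (panic_in_progress())
        return false;

    return cpu_looks_stuck;
}
```

Because the check only reads an atomic variable, it takes no locks and
should be usable from NMI context, unlike the
perf_event_disable() path quoted above. Storing a CPU number into
panic_cpu before calling watchdog_hardlockup_check(true) makes the model
return false.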