From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Feng Tang <feng.tang@linux.alibaba.com>,
rafael@kernel.org, Len Brown <lenb@kernel.org>,
James Morse <james.morse@arm.com>,
Tony Luck <tony.luck@intel.com>,
linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
Ira Weiny <ira.weiny@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Andi Kleen <ak@linux.intel.com>
Subject: Re: [PATCH] acpi/ghes: Make ghes_panic_timeout adjustable as a parameter
Date: Thu, 02 Jan 2025 10:46:54 +0800 [thread overview]
Message-ID: <87ikqydja9.fsf@DESKTOP-5N7EMDA> (raw)
In-Reply-To: <20241231111314.GDZ3PRyq_tiU002p5d@fat_crate.local> (Borislav Petkov's message of "Tue, 31 Dec 2024 12:13:14 +0100")
Borislav Petkov <bp@alien8.de> writes:
> On Tue, Dec 31, 2024 at 06:15:59PM +0800, Feng Tang wrote:
>> Thanks for the hint! IIUC, you are mentioning the set_arch_panic_timeout().
>> One thing is, most ARCHs' default timeout is 0, while in our case, the user
>> will also set 'panic=0' :), so we can't easily detect if the 0 is the user-set
>> value or the OS default one. Originally I even thought about adding a flag
>> of 'timeout_user_changed'. Any suggestion?
>
> Ok, enough talking. Let's get concrete:
>
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
> Date: Tue, 31 Dec 2024 12:03:55 +0100
> Subject: [PATCH] APEI: GHES: Have GHES honor the panic= setting
>
> The GHES driver overrides the panic= setting by rebooting the system
> after a fatal hw error has been reported. The intent being that such an
> error would be hopefully written out faster on non-volatile storage for
> later inspection.
IIUC, the hardware error will be written out on non-volatile storage at
the same time with or without ghes_panic_timeout overriding. The
difference is that after rebooting, the error information in
non-volatile storage can be extracted and reported via UI, SNMP, or
MCTP earlier.
> However, this is not optimal when a hard-to-debug issue requires long
> time to reproduce and when that happens, the box will get rebooted after
> 30 seconds and thus destroy the whole hw context of when the error
> happened.
>
> So rip out the default GHES panic timeout and honor the global one.
>
> In the panic disabled (panic=0) case, the error will still be logged to
> dmesg for later inspection and if panic after a hw error is really
> required, then that can be controlled the usual way - use panic= on the
> cmdline or set it in the kernel .config's CONFIG_PANIC_TIMEOUT.
>
> Reported-by: Feng Tang <feng.tang@linux.alibaba.com>
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> ---
> drivers/acpi/apei/ghes.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 07789f0b59bc..b72772494655 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -173,8 +173,6 @@ static struct gen_pool *ghes_estatus_pool;
> static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> static atomic_t ghes_estatus_cache_alloced;
>
> -static int ghes_panic_timeout __read_mostly = 30;
> -
> static void __iomem *ghes_map(u64 pfn, enum fixed_addresses fixmap_idx)
> {
> phys_addr_t paddr;
> @@ -983,14 +981,16 @@ static void __ghes_panic(struct ghes *ghes,
> struct acpi_hest_generic_status *estatus,
> u64 buf_paddr, enum fixed_addresses fixmap_idx)
> {
> + const char *msg = GHES_PFX "Fatal hardware error";
> +
> __ghes_print_estatus(KERN_EMERG, ghes->generic, estatus);
>
> ghes_clear_estatus(ghes, estatus, buf_paddr, fixmap_idx);
>
> - /* reboot to log the error! */
> if (!panic_timeout)
> - panic_timeout = ghes_panic_timeout;
> - panic("Fatal hardware error!");
> + pr_emerg("%s but panic disabled\n", msg);
> +
> + panic(msg);
> }
>
> static int ghes_proc(struct ghes *ghes)
---
Best Regards,
Huang, Ying
next prev parent reply other threads:[~2025-01-02 2:52 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-27 9:54 [PATCH] acpi/ghes: Make ghes_panic_timeout adjustable as a parameter Feng Tang
2024-12-27 10:09 ` Borislav Petkov
2024-12-30 5:54 ` Huang, Ying
2024-12-30 10:16 ` Borislav Petkov
2024-12-30 11:04 ` Huang, Ying
2024-12-30 11:26 ` Borislav Petkov
2024-12-30 11:40 ` Feng Tang
2024-12-30 12:10 ` Borislav Petkov
2024-12-30 13:04 ` Feng Tang
2024-12-30 13:24 ` Borislav Petkov
2024-12-31 6:44 ` Feng Tang
2024-12-31 9:23 ` Borislav Petkov
2024-12-31 10:15 ` Feng Tang
2024-12-31 11:13 ` Borislav Petkov
2024-12-31 12:02 ` Feng Tang
2025-01-02 2:46 ` Huang, Ying [this message]
2025-01-02 8:35 ` Borislav Petkov
2025-01-13 12:52 ` [PATCH] APEI: GHES: Have GHES honor the panic= setting Borislav Petkov
2025-01-13 20:08 ` Ira Weiny
2025-01-14 17:29 ` Rafael J. Wysocki
2025-01-09 13:47 ` [PATCH] acpi/ghes: Make ghes_panic_timeout adjustable as a parameter Feng Tang
2025-01-13 12:50 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ikqydja9.fsf@DESKTOP-5N7EMDA \
--to=ying.huang@linux.alibaba.com \
--cc=ak@linux.intel.com \
--cc=bp@alien8.de \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=feng.tang@linux.alibaba.com \
--cc=ira.weiny@intel.com \
--cc=james.morse@arm.com \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=rafael@kernel.org \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.