From: Nikolay Borisov <nik.borisov@suse.com>
To: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: x86@kernel.org, Tony Luck <tony.luck@intel.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
Smita.KoralahalliChannabasappa@amd.com,
Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
linux-acpi@vger.kernel.org
Subject: Re: [PATCH v6 15/15] x86/mce: Save and use APEI corrected threshold limit
Date: Fri, 19 Sep 2025 13:42:57 +0300 [thread overview]
Message-ID: <6e2d5351-dbd2-4e28-8bdb-b961fade5ebc@suse.com> (raw)
In-Reply-To: <20250915173332.GA869676@yaz-khff2.amd.com>
On 9/15/25 20:33, Yazen Ghannam wrote:
> On Thu, Sep 11, 2025 at 08:01:17PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 9/8/25 18:40, Yazen Ghannam wrote:
>>> The MCA threshold limit generally is not something that needs to change
>>> during runtime. It is common for a system administrator to decide on a
>>> policy for their managed systems.
>>>
>>> If MCA thresholding is OS-managed, then the threshold limit must be set
>>> at every boot. However, many systems allow the user to set a value in
>>> their BIOS. And this is reported through an APEI HEST entry even if
>>> thresholding is not in FW-First mode.
>>>
>>> Use this value, if available, to set the OS-managed threshold limit.
>>> Users can still override it through sysfs if desired for testing or
>>> debug.
>>>
>>> APEI is parsed after MCE is initialized. So reset the thresholding
>>> blocks later to pick up the threshold limit.
>>>
>>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>>> ---
>>>
>>> Notes:
>>> Link:
>>> https://lore.kernel.org/r/20250825-wip-mca-updates-v5-20-865768a2eef8@amd.com
>>> v5->v6:
>>> * No change.
>>> v4->v5:
>>> * No change.
>>> v3->v4:
>>> * New in v4.
>>>
>>> arch/x86/include/asm/mce.h | 6 ++++++
>>> arch/x86/kernel/acpi/apei.c | 2 ++
>>> arch/x86/kernel/cpu/mce/amd.c | 18 ++++++++++++++++--
>>> arch/x86/kernel/cpu/mce/internal.h | 2 ++
>>> arch/x86/kernel/cpu/mce/threshold.c | 13 +++++++++++++
>>> 5 files changed, 39 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>>> index 7d6588195d56..1cfbfff0be3f 100644
>>> --- a/arch/x86/include/asm/mce.h
>>> +++ b/arch/x86/include/asm/mce.h
>>> @@ -308,6 +308,12 @@ DECLARE_PER_CPU(struct mce, injectm);
>>> /* Disable CMCI/polling for MCA bank claimed by firmware */
>>> extern void mce_disable_bank(int bank);
>>> +#ifdef CONFIG_X86_MCE_THRESHOLD
>>> +void mce_save_apei_thr_limit(u32 thr_limit);
>>> +#else
>>> +static inline void mce_save_apei_thr_limit(u32 thr_limit) { }
>>> +#endif /* CONFIG_X86_MCE_THRESHOLD */
>>> +
>>> /*
>>> * Exception handler
>>> */
>>> diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.c
>>> index 0916f00a992e..e21419e686eb 100644
>>> --- a/arch/x86/kernel/acpi/apei.c
>>> +++ b/arch/x86/kernel/acpi/apei.c
>>> @@ -19,6 +19,8 @@ int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data)
>>> if (!cmc->enabled)
>>> return 0;
>>> + mce_save_apei_thr_limit(cmc->notify.error_threshold_value);
>>> +
>>> /*
>>> * We expect HEST to provide a list of MC banks that report errors
>>> * in firmware first mode. Otherwise, return non-zero value to
>>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>>> index b895559e80ad..9b746080351f 100644
>>> --- a/arch/x86/kernel/cpu/mce/amd.c
>>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>>> @@ -489,6 +489,18 @@ static void threshold_restart_bank(unsigned int bank, bool intr_en)
>>> }
>>> }
>>> +/* Try to use the threshold limit reported through APEI. */
>>> +static u16 get_thr_limit(void)
>>> +{
>>> + u32 thr_limit = mce_get_apei_thr_limit();
>>> +
>>> + /* Fallback to old default if APEI limit is not available. */
>>> + if (!thr_limit)
>>> + return THRESHOLD_MAX;
>>> +
>>> + return min(thr_limit, THRESHOLD_MAX);
>>> +}
>>> +
>>> static void mce_threshold_block_init(struct threshold_block *b, int offset)
>>> {
>>> struct thresh_restart tr = {
>>> @@ -497,7 +509,7 @@ static void mce_threshold_block_init(struct threshold_block *b, int offset)
>>> .lvt_off = offset,
>>> };
>>> - b->threshold_limit = THRESHOLD_MAX;
>>> + b->threshold_limit = get_thr_limit();
>>> threshold_restart_block(&tr);
>>> };
>>> @@ -1071,7 +1083,7 @@ static int allocate_threshold_blocks(unsigned int cpu, struct threshold_bank *tb
>>> b->address = address;
>>> b->interrupt_enable = 0;
>>> b->interrupt_capable = lvt_interrupt_supported(bank, high);
>>> - b->threshold_limit = THRESHOLD_MAX;
>>> + b->threshold_limit = get_thr_limit();
>>> if (b->interrupt_capable) {
>>> default_attrs[2] = &interrupt_enable.attr;
>>> @@ -1082,6 +1094,8 @@ static int allocate_threshold_blocks(unsigned int cpu, struct threshold_bank *tb
>>> list_add(&b->miscj, &tb->miscj);
>>> + mce_threshold_block_init(b, (high & MASK_LVTOFF_HI) >> 20);
>>
>> Why is this necessary? Shouldn't this patch consist of mainly
>> s/THRESHOLD_MAX/get_thr_limit();
>>
>>
>> In allocate_threshold_block have already properly set threshold_limit. So
>> this change really ensures threshold_restart_block is being called for the
>> given block being initialized. Ignoring the changed threshold limit logic,
>> why is this extra call necessary now and wasn't before?
>>
>
> It is necessary to apply the threshold limit to the hardware register.
> The MCA thresholding registers are accessed in two passes: first during
> per-CPU init, and second when the MCE subsystem devices are created.
>
> The hardware registers are updated in the first pass, and they are left
> as-is in the second pass assuming no configuration has changed. That's
> why there isn't a "reset" in the second pass.
>
> The APEI tables are parsed between the first and second passes. So now
> we need to update the registers during the second pass to apply the
> value found from HEST.
So APEI is initialized as part of the subsys_initcall which is processed
via:
start_kernel
rest_init
kernel_init
kernel_init_freeable
do_basic_setup
do_initcalls
And the first mce_threshold_block_init() happens from :
start_kernel
arch_cpu_finalize_init <---- way before rest_init()
identify_boot_cpu
identify_cpu
mcheck_cpu_init
mcheck_cpu_init_vendor
mce_amd_feature_init
prepare_threshold_block
mce_threshold_block_init
Finally the per-cpu hotplug callback is installed via:
mcheck_init_device <- initiated from a device_initcall, happens after
APEI subsys init.
mce_cpu_online - called on every cpu from the HP machinery
mce_threshold_create_device
threshold_create_bank
allocate_threshold_blocks
mce_threshold_block_init - newly added call in alloc block, used to
update the register with the new limit
Given that mce_cpu_online is already called on every online cpu at the
time of installation of the callback, and every subsequent cpu that will
come online I can't help but wonder why do we have to do the mce
initialization from start_kernel, can't we move the mcheck_cpu_init call
into mce_cpu_online, or have the latter subsume the former?
Sorry if I'm being too nitpicky, I just want to have proper
understanding of the subsystem and the various (implicit) requirements
it has.
At the very least I believe this commit message should at least allude
to the fact that mce threshold devices have a strict requirement to be
created after the APEI subsys has been created.
>
> Thanks,
> Yazen
next prev parent reply other threads:[~2025-09-19 10:43 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-08 15:40 [PATCH v6 00/15] AMD MCA interrupts rework Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 01/15] x86/mce: Set CR4.MCE last during init Yazen Ghannam
2025-09-10 11:43 ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 02/15] x86/mce: Define BSP-only init Yazen Ghannam
2025-09-10 11:47 ` Nikolay Borisov
2025-09-10 13:53 ` Yazen Ghannam
2025-09-10 14:28 ` Nikolay Borisov
2025-09-10 17:23 ` Luck, Tony
2025-09-08 15:40 ` [PATCH v6 03/15] x86/mce: Define BSP-only SMCA init Yazen Ghannam
2025-09-10 11:48 ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 04/15] x86/mce: Do 'UNKNOWN' vendor check early Yazen Ghannam
2025-09-10 13:27 ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 05/15] x86/mce: Separate global and per-CPU quirks Yazen Ghannam
2025-09-10 13:29 ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 06/15] x86/mce: Move machine_check_poll() status checks to helper functions Yazen Ghannam
2025-09-10 15:09 ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 07/15] x86/mce: Add clear_bank() helper Yazen Ghannam
2025-09-10 15:22 ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 08/15] x86/mce: Unify AMD THR handler with MCA Polling Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 09/15] x86/mce: Unify AMD DFR " Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 10/15] x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems Yazen Ghannam
2025-09-11 10:22 ` Nikolay Borisov
2025-09-11 15:53 ` Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 11/15] x86/mce/amd: Support SMCA Corrected Error Interrupt Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 12/15] x86/mce/amd: Remove redundant reset_block() Yazen Ghannam
2025-09-11 14:42 ` Nikolay Borisov
2025-09-11 16:11 ` Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 13/15] x86/mce/amd: Define threshold restart function for banks Yazen Ghannam
2025-09-11 14:49 ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 14/15] x86/mce: Handle AMD threshold interrupt storms Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 15/15] x86/mce: Save and use APEI corrected threshold limit Yazen Ghannam
2025-09-11 17:01 ` Nikolay Borisov
2025-09-15 17:33 ` Yazen Ghannam
2025-09-19 10:42 ` Nikolay Borisov [this message]
2025-09-22 13:58 ` Yazen Ghannam
2025-09-08 16:10 ` [PATCH v6 00/15] AMD MCA interrupts rework Luck, Tony
2025-09-09 13:36 ` Yazen Ghannam
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6e2d5351-dbd2-4e28-8bdb-b961fade5ebc@suse.com \
--to=nik.borisov@suse.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=qiuxu.zhuo@intel.com \
--cc=rafael@kernel.org \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
--cc=yazen.ghannam@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).