Re: [PATCH v6 15/15] x86/mce: Save and use APEI corrected threshold limit

linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Nikolay Borisov <nik.borisov@suse.com>
To: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: x86@kernel.org, Tony Luck <tony.luck@intel.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
	Smita.KoralahalliChannabasappa@amd.com,
	Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	linux-acpi@vger.kernel.org
Subject: Re: [PATCH v6 15/15] x86/mce: Save and use APEI corrected threshold limit
Date: Fri, 19 Sep 2025 13:42:57 +0300	[thread overview]
Message-ID: <6e2d5351-dbd2-4e28-8bdb-b961fade5ebc@suse.com> (raw)
In-Reply-To: <20250915173332.GA869676@yaz-khff2.amd.com>



On 9/15/25 20:33, Yazen Ghannam wrote:
> On Thu, Sep 11, 2025 at 08:01:17PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 9/8/25 18:40, Yazen Ghannam wrote:
>>> The MCA threshold limit generally is not something that needs to change
>>> during runtime. It is common for a system administrator to decide on a
>>> policy for their managed systems.
>>>
>>> If MCA thresholding is OS-managed, then the threshold limit must be set
>>> at every boot. However, many systems allow the user to set a value in
>>> their BIOS. And this is reported through an APEI HEST entry even if
>>> thresholding is not in FW-First mode.
>>>
>>> Use this value, if available, to set the OS-managed threshold limit.
>>> Users can still override it through sysfs if desired for testing or
>>> debug.
>>>
>>> APEI is parsed after MCE is initialized. So reset the thresholding
>>> blocks later to pick up the threshold limit.
>>>
>>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>>> ---
>>>
>>> Notes:
>>>       Link:
>>>       https://lore.kernel.org/r/20250825-wip-mca-updates-v5-20-865768a2eef8@amd.com
>>>       v5->v6:
>>>       * No change.
>>>       v4->v5:
>>>       * No change.
>>>       v3->v4:
>>>       * New in v4.
>>>
>>>    arch/x86/include/asm/mce.h          |  6 ++++++
>>>    arch/x86/kernel/acpi/apei.c         |  2 ++
>>>    arch/x86/kernel/cpu/mce/amd.c       | 18 ++++++++++++++++--
>>>    arch/x86/kernel/cpu/mce/internal.h  |  2 ++
>>>    arch/x86/kernel/cpu/mce/threshold.c | 13 +++++++++++++
>>>    5 files changed, 39 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>>> index 7d6588195d56..1cfbfff0be3f 100644
>>> --- a/arch/x86/include/asm/mce.h
>>> +++ b/arch/x86/include/asm/mce.h
>>> @@ -308,6 +308,12 @@ DECLARE_PER_CPU(struct mce, injectm);
>>>    /* Disable CMCI/polling for MCA bank claimed by firmware */
>>>    extern void mce_disable_bank(int bank);
>>> +#ifdef CONFIG_X86_MCE_THRESHOLD
>>> +void mce_save_apei_thr_limit(u32 thr_limit);
>>> +#else
>>> +static inline void mce_save_apei_thr_limit(u32 thr_limit) { }
>>> +#endif /* CONFIG_X86_MCE_THRESHOLD */
>>> +
>>>    /*
>>>     * Exception handler
>>>     */
>>> diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.c
>>> index 0916f00a992e..e21419e686eb 100644
>>> --- a/arch/x86/kernel/acpi/apei.c
>>> +++ b/arch/x86/kernel/acpi/apei.c
>>> @@ -19,6 +19,8 @@ int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data)
>>>    	if (!cmc->enabled)
>>>    		return 0;
>>> +	mce_save_apei_thr_limit(cmc->notify.error_threshold_value);
>>> +
>>>    	/*
>>>    	 * We expect HEST to provide a list of MC banks that report errors
>>>    	 * in firmware first mode. Otherwise, return non-zero value to
>>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>>> index b895559e80ad..9b746080351f 100644
>>> --- a/arch/x86/kernel/cpu/mce/amd.c
>>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>>> @@ -489,6 +489,18 @@ static void threshold_restart_bank(unsigned int bank, bool intr_en)
>>>    	}
>>>    }
>>> +/* Try to use the threshold limit reported through APEI. */
>>> +static u16 get_thr_limit(void)
>>> +{
>>> +	u32 thr_limit = mce_get_apei_thr_limit();
>>> +
>>> +	/* Fallback to old default if APEI limit is not available. */
>>> +	if (!thr_limit)
>>> +		return THRESHOLD_MAX;
>>> +
>>> +	return min(thr_limit, THRESHOLD_MAX);
>>> +}
>>> +
>>>    static void mce_threshold_block_init(struct threshold_block *b, int offset)
>>>    {
>>>    	struct thresh_restart tr = {
>>> @@ -497,7 +509,7 @@ static void mce_threshold_block_init(struct threshold_block *b, int offset)
>>>    		.lvt_off		= offset,
>>>    	};
>>> -	b->threshold_limit		= THRESHOLD_MAX;
>>> +	b->threshold_limit		= get_thr_limit();
>>>    	threshold_restart_block(&tr);
>>>    };
>>> @@ -1071,7 +1083,7 @@ static int allocate_threshold_blocks(unsigned int cpu, struct threshold_bank *tb
>>>    	b->address		= address;
>>>    	b->interrupt_enable	= 0;
>>>    	b->interrupt_capable	= lvt_interrupt_supported(bank, high);
>>> -	b->threshold_limit	= THRESHOLD_MAX;
>>> +	b->threshold_limit	= get_thr_limit();
>>>    	if (b->interrupt_capable) {
>>>    		default_attrs[2] = &interrupt_enable.attr;
>>> @@ -1082,6 +1094,8 @@ static int allocate_threshold_blocks(unsigned int cpu, struct threshold_bank *tb
>>>    	list_add(&b->miscj, &tb->miscj);
>>> +	mce_threshold_block_init(b, (high & MASK_LVTOFF_HI) >> 20);
>>
>> Why is this necessary? Shouldn't this patch consist of mainly
>> s/THRESHOLD_MAX/get_thr_limit();
>>
>>
>> In allocate_threshold_block have already properly set threshold_limit. So
>> this change really ensures threshold_restart_block is being called for the
>> given block being initialized. Ignoring the changed threshold limit logic,
>> why is this extra call necessary now and wasn't before?
>>
> 
> It is necessary to apply the threshold limit to the hardware register.
> The MCA thresholding registers are accessed in two passes: first during
> per-CPU init, and second when the MCE subsystem devices are created.
> 
> The hardware registers are updated in the first pass, and they are left
> as-is in the second pass assuming no configuration has changed. That's
> why there isn't a "reset" in the second pass.
> 
> The APEI tables are parsed between the first and second passes. So now
> we need to update the registers during the second pass to apply the
> value found from HEST.

So APEI is initialized as part of the subsys_initcall which is processed 
via:

start_kernel
rest_init
kernel_init
kernel_init_freeable
do_basic_setup
do_initcalls

And the first mce_threshold_block_init() happens from :

start_kernel
arch_cpu_finalize_init <---- way before rest_init()
identify_boot_cpu
identify_cpu
mcheck_cpu_init
mcheck_cpu_init_vendor
mce_amd_feature_init
prepare_threshold_block
mce_threshold_block_init


Finally the per-cpu hotplug callback is installed via:

mcheck_init_device <- initiated from a device_initcall, happens after 
APEI subsys init.
mce_cpu_online  - called on every cpu from the HP machinery
mce_threshold_create_device
threshold_create_bank
allocate_threshold_blocks
mce_threshold_block_init - newly added call in alloc block, used to 
update the register with the new limit


Given that mce_cpu_online is already called on every online cpu at the 
time of installation of the callback, and every subsequent cpu that will 
come online I can't help but wonder why do we have to do the mce 
initialization from start_kernel, can't we move the mcheck_cpu_init call 
into mce_cpu_online, or have the latter subsume the former?

Sorry if I'm being too nitpicky, I just want to have proper 
understanding of the subsystem and the various (implicit) requirements 
it has.


At the very least I believe this commit message should at least allude 
to the fact that mce threshold devices have a strict requirement to be 
created after the APEI subsys has been created.


> 
> Thanks,
> Yazen

next prev parent reply	other threads:[~2025-09-19 10:43 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-08 15:40 [PATCH v6 00/15] AMD MCA interrupts rework Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 01/15] x86/mce: Set CR4.MCE last during init Yazen Ghannam
2025-09-10 11:43   ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 02/15] x86/mce: Define BSP-only init Yazen Ghannam
2025-09-10 11:47   ` Nikolay Borisov
2025-09-10 13:53     ` Yazen Ghannam
2025-09-10 14:28       ` Nikolay Borisov
2025-09-10 17:23     ` Luck, Tony
2025-09-08 15:40 ` [PATCH v6 03/15] x86/mce: Define BSP-only SMCA init Yazen Ghannam
2025-09-10 11:48   ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 04/15] x86/mce: Do 'UNKNOWN' vendor check early Yazen Ghannam
2025-09-10 13:27   ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 05/15] x86/mce: Separate global and per-CPU quirks Yazen Ghannam
2025-09-10 13:29   ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 06/15] x86/mce: Move machine_check_poll() status checks to helper functions Yazen Ghannam
2025-09-10 15:09   ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 07/15] x86/mce: Add clear_bank() helper Yazen Ghannam
2025-09-10 15:22   ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 08/15] x86/mce: Unify AMD THR handler with MCA Polling Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 09/15] x86/mce: Unify AMD DFR " Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 10/15] x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems Yazen Ghannam
2025-09-11 10:22   ` Nikolay Borisov
2025-09-11 15:53     ` Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 11/15] x86/mce/amd: Support SMCA Corrected Error Interrupt Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 12/15] x86/mce/amd: Remove redundant reset_block() Yazen Ghannam
2025-09-11 14:42   ` Nikolay Borisov
2025-09-11 16:11     ` Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 13/15] x86/mce/amd: Define threshold restart function for banks Yazen Ghannam
2025-09-11 14:49   ` Nikolay Borisov
2025-09-08 15:40 ` [PATCH v6 14/15] x86/mce: Handle AMD threshold interrupt storms Yazen Ghannam
2025-09-08 15:40 ` [PATCH v6 15/15] x86/mce: Save and use APEI corrected threshold limit Yazen Ghannam
2025-09-11 17:01   ` Nikolay Borisov
2025-09-15 17:33     ` Yazen Ghannam
2025-09-19 10:42       ` Nikolay Borisov [this message]
2025-09-22 13:58         ` Yazen Ghannam
2025-09-08 16:10 ` [PATCH v6 00/15] AMD MCA interrupts rework Luck, Tony
2025-09-09 13:36   ` Yazen Ghannam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6e2d5351-dbd2-4e28-8bdb-b961fade5ebc@suse.com \
    --to=nik.borisov@suse.com \
    --cc=Smita.KoralahalliChannabasappa@amd.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=qiuxu.zhuo@intel.com \
    --cc=rafael@kernel.org \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=yazen.ghannam@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).