public inbox for linux-kernel@vger.kernel.org
From: Shuai Xue <xueshuai@linux.alibaba.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Borislav Petkov <bp@alien8.de>, Hanjun Guo <guohanjun@huawei.com>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev, Andi Kleen <andi.kleen@intel.com>
Subject: Re: [PATCH] ACPI: APEI: GHES: Improve ghes_notify_nmi() status check
Date: Thu, 6 Nov 2025 13:04:05 +0800
Message-ID: <d49c1287-6a55-4eff-a908-b8a878bdd08a@linux.alibaba.com>
In-Reply-To: <aQwDd-Nhgxpkdrcb@agluck-desk3>



On 2025/11/6 10:09, Luck, Tony wrote:
> On Thu, Nov 06, 2025 at 09:46:33AM +0800, Shuai Xue wrote:
>>
>>
>> On 2025/11/4 07:05, Tony Luck wrote:
>>> ghes_notify_nmi() is called for every NMI and must check whether the NMI was
>>> generated because an error was signalled by platform firmware.
>>>
>>> This check is very expensive as for each registered GHES NMI source it reads
>>> from the acpi generic address attached to this error source to get the physical
>>> address of the acpi_hest_generic_status block.  It then checks the "block_status"
>>> to see if an error was logged.
>>>
>>> The ACPI/APEI code must create virtual mappings for each of those physical
>>> addresses, and tear them down afterwards. On an Icelake system this takes around
>>> 15,000 TSC cycles. Enough to disturb efforts to profile system performance.
>>
>> Hi, Tony
>>
>> Interesting.
>>
>> If I understand correctly, you mean ghes_peek_estatus() and
>> ghes_clear_estatus().
>>
>> I conducted performance testing on our system (ARM v8) and found the
>> following average costs:
>>
>> - ghes_peek_estatus(): 8,138.3 ns (21,160 cycles)
>> - ghes_clear_estatus(): 2,038.3 ns (5,300 cycles)
> 
> ARM doesn't use the NMI path (HAVE_ACPI_APEI_NMI is only set on X86).
> But maybe you are looking at ghes_notify_sea() which seems similar?

Yes. It was measured in ghes_notify_sea().

>>
>>>
>>> If that were not bad enough, there are some atomic accesses in the code path
>>> that will cause cache line bounces between CPUs. A problem that gets worse as
>>> the core count increases.
>>
>> Could you elaborate on which specific atomic accesses you're referring to?
> 
> ghes_notify_nmi() starts with code to ensure only one CPU executes the
> GHES NMI path.
> 
> 	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
> 		return ret;
> 
> Looks like an optimization to avoid having a bunch of CPUs queue up
> waiting for this spinlock:
> 
> 	raw_spin_lock(&ghes_notify_lock_nmi);
> 
> when the first one to get it will find and handle the logged error.

If an NMI is issued, at least one status block is active. I don't see how
the code path is different.
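
For reference, a minimal userspace sketch of the gate quoted above. It
assumes C11 <stdatomic.h> in place of the kernel's atomic_t, and the names
add_unless_sketch(), nmi_gate_enter() and nmi_gate_exit() are hypothetical,
not kernel functions. It only illustrates that a caller which passes the
gate writes the shared counter on entry and again on exit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's ghes_in_nmi counter. */
static atomic_int gate_sketch;

/*
 * Same contract as the kernel's atomic_add_unless(v, a, u):
 * add 'a' to '*v' unless '*v' equals 'u'. Returns true if the add happened.
 */
static bool add_unless_sketch(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	do {
		if (old == u)
			return false;
		/* On failure, 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(v, &old, old + a));

	return true;
}

/* Only the first caller gets in; later callers are turned away. */
static bool nmi_gate_enter(void)
{
	return add_unless_sketch(&gate_sketch, 1, 1);
}

/* Must only be called by the caller that passed nmi_gate_enter(). */
static void nmi_gate_exit(void)
{
	int prev = atomic_fetch_sub(&gate_sketch, 1);

	assert(prev == 1);
	(void)prev;
}
```

A second caller arriving while the first is still inside gets false from
nmi_gate_enter() and returns without taking the spinlock.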

>>
>>>
>>> But BIOS changes neither the acpi generic address nor the physical address of
>>> the acpi_hest_generic_status block. So this walk can be done once when the NMI is
>>> registered to save the virtual address (unmapping if the NMI is ever unregistered).
>>> The "block_status" can be checked directly in the NMI handler. This can be done
>>> without any atomic accesses.
>>>
>>> Resulting time to check that there is not an error record is around 900 cycles.
>>>
>>> Reported-by: Andi Kleen <andi.kleen@intel.com>
>>> Signed-off-by: Tony Luck <tony.luck@intel.com>
>>>
>>> ---
>>> N.B. I only talked to an Intel BIOS expert about this. GHES code is shared by
>>> other architectures, so it would be wise to get confirmation on whether this
>>> assumption applies to all, or is Intel (or X86) specific.
>>
>> The assumption is "BIOS changes neither the acpi generic address nor the
>> physical address of the acpi_hest_generic_status block."?
>>
>> I've consulted with our BIOS experts from both ARM and RISC-V platform
>> teams, and they confirmed that error status blocks are reserved at boot
>> time and remain unchanged during runtime.
> 
> Thanks. Good to have this confirmation.
> 
>>> ---
>>>    include/acpi/ghes.h      |  1 +
>>>    drivers/acpi/apei/ghes.c | 39 ++++++++++++++++++++++++++++++++++++---
>>>    2 files changed, 37 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
>>> index ebd21b05fe6e..58655d313a1f 100644
>>> --- a/include/acpi/ghes.h
>>> +++ b/include/acpi/ghes.h
>>> @@ -29,6 +29,7 @@ struct ghes {
>>>    	};
>>>    	struct device *dev;
>>>    	struct list_head elist;
>>> +	void __iomem *error_status_vaddr;
>>>    };
>>>    struct ghes_estatus_node {
>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>> index 97ee19f2cae0..62713b612865 100644
>>> --- a/drivers/acpi/apei/ghes.c
>>> +++ b/drivers/acpi/apei/ghes.c
>>> @@ -1425,7 +1425,21 @@ static LIST_HEAD(ghes_nmi);
>>>    static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>>>    {
>>>    	static DEFINE_RAW_SPINLOCK(ghes_notify_lock_nmi);
>>> +	bool active_error = false;
>>>    	int ret = NMI_DONE;
>>> +	struct ghes *ghes;
>>> +
>>> +	rcu_read_lock();
>>> +	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
>>> +		if (ghes->error_status_vaddr && readl(ghes->error_status_vaddr)) {
>>> +			active_error = true;
>>> +			break;
>>> +		}
>>> +	}
>>> +	rcu_read_unlock();
>>> +
>>> +	if (!active_error)
>>> +		return ret;
>>
>> Should we put active_error into struct ghes? If we know it is active, we
>> do not need to call __ghes_peek_estatus() to check estatus->block_status.
> 
> That might be a useful addition. I was primarily concerned with fixing the
> "no error" case that happens at a very high rate while profiling the
> system with "perf".

Do you mean you see "NMI received for unknown reason" when profiling with
"perf"? And is the unknown error handled by ghes_notify_nmi()?

I see some unknown NMIs in production on Intel platforms, but I have not
figured out how they happen. Can you help explain it?

> But skipping (or just removing?
> __ghes_peek_estatus()) if you have already confirmed that there is
> a logged error would be good.
> 
> If you can use the same technique for ghes_notify_sea() then it would be
> sensible to move the code I added to ghes_nmi_add() to ghes_new() to
> save the virtual address for every type of GHES notification.

Sure, I'd like to add it for ghes_notify_sea().
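
A minimal sketch of the cached-address check discussed above, written as
plain userspace C rather than kernel code. struct ghes_sketch and
any_error_logged() are hypothetical names for illustration only; the patch
itself uses readl() on an __iomem pointer while walking the list under RCU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-source state; the patch adds error_status_vaddr to struct ghes. */
struct ghes_sketch {
	/* Mapped once at registration; NULL if the mapping was not set up. */
	volatile const uint32_t *status_vaddr;
};

/*
 * Fast path: report whether any source has a non-zero block_status.
 * No mapping, unmapping, or atomic read-modify-write is needed here.
 */
static bool any_error_logged(const struct ghes_sketch *srcs, size_t n)
{
	assert(srcs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (srcs[i].status_vaddr && *srcs[i].status_vaddr)
			return true;
	}
	return false;
}
```

When any_error_logged() returns false the handler can return immediately;
only a true result needs the existing locked path.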

Thanks.
Shuai

