* [PATCH] ACPI: APEI: Handle repeated SEA error interrupt storm scenarios
@ 2025-10-30 7:13 Junhao He
2025-11-03 16:19 ` Rafael J. Wysocki
0 siblings, 1 reply; 12+ messages in thread
From: Junhao He @ 2025-10-30 7:13 UTC (permalink / raw)
To: rafael, tony.luck, bp, guohanjun, mchehab, xueshuai, jarkko,
yazen.ghannam, jane.chu, lenb, Jonathan.Cameron
Cc: linux-acpi, linux-arm-kernel, linux-kernel, linux-edac,
shiju.jose, tanxiaofei, linuxarm, hejunhao3
The do_sea() function defaults to firmware-first mode, if supported,
and invokes the APEI GHES helper ghes_notify_sea() to report and handle
the SEA error. GHES keeps a cache of the 4 most recently seen kinds of
SEA error. If the same kind of SEA error keeps occurring, GHES skips
reporting it and does not add it to the "ghes_estatus_llist" list until
the cache entry times out after 10 seconds, at which point the SEA
error is processed again.
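The suppression window described above can be sketched as a toy model
(all names here are hypothetical; the real implementation in
drivers/acpi/apei/ghes.c hashes the full error status record and is
more involved):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the GHES estatus cache: up to 4 recently seen
 * error signatures are remembered, and a repeat of a cached signature
 * within EXPIRE_NS (10 s) is suppressed instead of being re-reported. */
#define CACHE_SLOTS 4
#define EXPIRE_NS   (10ULL * 1000 * 1000 * 1000) /* 10 seconds */

struct cache_entry {
	uint64_t sig;     /* signature (hash) of the error status record */
	uint64_t seen_ns; /* timestamp when it was cached */
	bool used;
};

static struct cache_entry cache[CACHE_SLOTS];

/* Returns true if this signature was reported within the last 10 s
 * (caller should skip re-reporting it), false if it is new or expired. */
static bool estatus_cached(uint64_t sig, uint64_t now_ns)
{
	int i, victim = 0;

	for (i = 0; i < CACHE_SLOTS; i++) {
		if (cache[i].used && cache[i].sig == sig) {
			if (now_ns - cache[i].seen_ns < EXPIRE_NS)
				return true;       /* duplicate: suppress */
			cache[i].used = false;     /* expired: treat as new */
		}
	}
	/* Cache it, evicting the first free or the oldest slot. */
	for (i = 0; i < CACHE_SLOTS; i++) {
		if (!cache[i].used) {
			victim = i;
			break;
		}
		if (cache[i].seen_ns < cache[victim].seen_ns)
			victim = i;
	}
	cache[victim] = (struct cache_entry){
		.sig = sig, .seen_ns = now_ns, .used = true,
	};
	return false;
}
```

Under this model, the same error repeating inside the 10-second window
is silently dropped, which is exactly the window during which the storm
described below can build up.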
GHES invokes ghes_proc_in_irq() to handle the SEA error, which
ultimately calls memory_failure() to process the page with hardware
memory corruption. If the same SEA error appears multiple times in a
row, the previous handling was incomplete or unable to resolve the
fault. In such cases it is more appropriate to return a failure when
the same error is encountered again, and then fall back to
arm64_do_kernel_sea() for further processing.
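The intended return-value plumbing can be modeled roughly like this
(simplified, hypothetical helper names; only the decision is shown, not
the real ghes_in_nmi_queue_one_entry() logic):

```c
#include <errno.h>
#include <stdbool.h>

/* If GHES has already cached (and therefore suppressed) this error
 * status, fail with -ECANCELED instead of claiming success. */
static int queue_one_entry(bool already_cached)
{
	if (already_cached)
		return -ECANCELED; /* reported before, not queued again */
	return 0;                  /* queued on ghes_estatus_llist */
}

/* 0 means GHES claimed the error; a nonzero return lets the SEA path
 * fall back to arm64_do_kernel_sea() for further processing. */
static int notify_sea(bool already_cached)
{
	return queue_one_entry(already_cached);
}
```

With the pre-patch behavior, the cached case returned success, so the
SEA handler believed the error was dealt with and simply resumed the
faulting process.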
When hardware memory corruption occurs, a memory error interrupt is
triggered. If the kernel accesses the corrupted data, the SEA error
exception handler is triggered as well. Both paths call
memory_failure() to handle the faulty page.
If the memory error interrupt occurs first, followed by the SEA error,
the faulty page is first marked as poisoned by the memory error
interrupt path, and the SEA error handling path then sends a SIGBUS
signal to the process accessing the poisoned page.
However, if the SEA is reported first, the following exceptional
scenario occurs:
When a user process maps and accesses a page with hardware memory
corruption directly via mmap (as devmem does), the page containing that
address may still be a free buddy page in the kernel. In that case the
page is marked as poisoned when the SEA path claims it in
memory_failure(). However, since the process does not obtain the page
through the kernel's MMU, the kernel cannot send a SIGBUS signal to the
process, and the memory error interrupt path does not support sending
SIGBUS either. As a result, the process keeps accessing the faulty
page, repeatedly re-entering the SEA exception handler and causing an
SEA error interrupt storm.
Fix this by returning a failure when the same error is encountered again.
The following error log illustrates the problem using the devmem process:
NOTICE: SEA Handle
NOTICE: SpsrEl3 = 0x60001000, ELR_EL3 = 0xffffc6ab42671400
NOTICE: skt[0x0]die[0x0]cluster[0x0]core[0x1]
NOTICE: EsrEl3 = 0x92000410
NOTICE: PA is valid: 0x1000093c00
NOTICE: Hest Set GenericError Data
[ 1419.542401][ C1] {57}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
[ 1419.551435][ C1] {57}[Hardware Error]: event severity: recoverable
[ 1419.557865][ C1] {57}[Hardware Error]: Error 0, type: recoverable
[ 1419.564295][ C1] {57}[Hardware Error]: section_type: ARM processor error
[ 1419.571421][ C1] {57}[Hardware Error]: MIDR: 0x0000000000000000
[ 1419.571434][ C1] {57}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081000100
[ 1419.586813][ C1] {57}[Hardware Error]: error affinity level: 0
[ 1419.586821][ C1] {57}[Hardware Error]: running state: 0x1
[ 1419.602714][ C1] {57}[Hardware Error]: Power State Coordination Interface state: 0
[ 1419.602724][ C1] {57}[Hardware Error]: Error info structure 0:
[ 1419.614797][ C1] {57}[Hardware Error]: num errors: 1
[ 1419.614804][ C1] {57}[Hardware Error]: error_type: 0, cache error
[ 1419.629226][ C1] {57}[Hardware Error]: error_info: 0x0000000020400014
[ 1419.629234][ C1] {57}[Hardware Error]: cache level: 1
[ 1419.642006][ C1] {57}[Hardware Error]: the error has not been corrected
[ 1419.642013][ C1] {57}[Hardware Error]: physical fault address: 0x0000001000093c00
[ 1419.654001][ C1] {57}[Hardware Error]: Vendor specific error info has 48 bytes:
[ 1419.654014][ C1] {57}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
[ 1419.670685][ C1] {57}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
[ 1419.670692][ C1] {57}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
[ 1419.783606][T54990] Memory failure: 0x1000093: recovery action for free buddy page: Recovered
[ 1419.919580][ T9955] EDAC MC0: 1 UE Multi-bit ECC on unknown memory (node:0 card:1 module:71 bank:7 row:0 col:0 page:0x1000093 offset:0xc00 grain:1 - APEI location: node:0 card:257 module:71 bank:7 row:0 col:0)
NOTICE: SEA Handle
NOTICE: SpsrEl3 = 0x60001000, ELR_EL3 = 0xffffc6ab42671400
NOTICE: skt[0x0]die[0x0]cluster[0x0]core[0x1]
NOTICE: EsrEl3 = 0x92000410
NOTICE: PA is valid: 0x1000093c00
NOTICE: Hest Set GenericError Data
NOTICE: SEA Handle
NOTICE: SpsrEl3 = 0x60001000, ELR_EL3 = 0xffffc6ab42671400
NOTICE: skt[0x0]die[0x0]cluster[0x0]core[0x1]
NOTICE: EsrEl3 = 0x92000410
NOTICE: PA is valid: 0x1000093c00
NOTICE: Hest Set GenericError Data
...
... ---> SEA error interrupt storm happens
...
NOTICE: SEA Handle
NOTICE: SpsrEl3 = 0x60001000, ELR_EL3 = 0xffffc6ab42671400
NOTICE: skt[0x0]die[0x0]cluster[0x0]core[0x1]
NOTICE: EsrEl3 = 0x92000410
NOTICE: PA is valid: 0x1000093c00
NOTICE: Hest Set GenericError Data
[ 1429.818080][ T9955] Memory failure: 0x1000093: already hardware poisoned
[ 1429.825760][ C1] ghes_print_estatus: 1 callbacks suppressed
[ 1429.825763][ C1] {59}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
[ 1429.843731][ C1] {59}[Hardware Error]: event severity: recoverable
[ 1429.861800][ C1] {59}[Hardware Error]: Error 0, type: recoverable
[ 1429.874658][ C1] {59}[Hardware Error]: section_type: ARM processor error
[ 1429.887516][ C1] {59}[Hardware Error]: MIDR: 0x0000000000000000
[ 1429.901159][ C1] {59}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081000100
[ 1429.901166][ C1] {59}[Hardware Error]: error affinity level: 0
[ 1429.914896][ C1] {59}[Hardware Error]: running state: 0x1
[ 1429.914903][ C1] {59}[Hardware Error]: Power State Coordination Interface state: 0
[ 1429.933319][ C1] {59}[Hardware Error]: Error info structure 0:
[ 1429.946261][ C1] {59}[Hardware Error]: num errors: 1
[ 1429.946269][ C1] {59}[Hardware Error]: error_type: 0, cache error
[ 1429.970847][ C1] {59}[Hardware Error]: error_info: 0x0000000020400014
[ 1429.970854][ C1] {59}[Hardware Error]: cache level: 1
[ 1429.988406][ C1] {59}[Hardware Error]: the error has not been corrected
[ 1430.013419][ C1] {59}[Hardware Error]: physical fault address: 0x0000001000093c00
[ 1430.013425][ C1] {59}[Hardware Error]: Vendor specific error info has 48 bytes:
[ 1430.025424][ C1] {59}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
[ 1430.053736][ C1] {59}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
[ 1430.066341][ C1] {59}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
[ 1430.294255][T54990] Memory failure: 0x1000093: already hardware poisoned
[ 1430.305518][T54990] 0x1000093: Sending SIGBUS to devmem:54990 due to hardware memory corruption
Signed-off-by: Junhao He <hejunhao3@h-partners.com>
---
drivers/acpi/apei/ghes.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 005de10d80c3..eebda39bfc30 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1343,8 +1343,10 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
/* This error has been reported before, don't process it again. */
- if (ghes_estatus_cached(estatus))
+ if (ghes_estatus_cached(estatus)) {
+ rc = -ECANCELED;
goto no_work;
+ }
llist_add(&estatus_node->llnode, &ghes_estatus_llist);
--
2.33.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] ACPI: APEI: Handle repeated SEA error interrupt storm scenarios
2025-10-30 7:13 [PATCH] ACPI: APEI: Handle repeated SEA error interrupt storm scenarios Junhao He
@ 2025-11-03 16:19 ` Rafael J. Wysocki
2025-11-04 1:32 ` Shuai Xue
0 siblings, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2025-11-03 16:19 UTC (permalink / raw)
To: Junhao He
Cc: rafael, tony.luck, bp, guohanjun, mchehab, xueshuai, jarkko,
yazen.ghannam, jane.chu, lenb, Jonathan.Cameron, linux-acpi,
linux-arm-kernel, linux-kernel, linux-edac, shiju.jose,
tanxiaofei, linuxarm
On Thu, Oct 30, 2025 at 8:13 AM Junhao He <hejunhao3@h-partners.com> wrote:
>
> [...]
This needs a response from the APEI reviewers as per MAINTAINERS, thanks!
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] ACPI: APEI: Handle repeated SEA error interrupt storm scenarios
2025-11-03 16:19 ` Rafael J. Wysocki
@ 2025-11-04 1:32 ` Shuai Xue
2026-02-27 12:12 ` hejunhao
0 siblings, 1 reply; 12+ messages in thread
From: Shuai Xue @ 2025-11-04 1:32 UTC (permalink / raw)
To: Rafael J. Wysocki, Junhao He, Luck, Tony
Cc: tony.luck, bp, guohanjun, mchehab, jarkko, yazen.ghannam,
jane.chu, lenb, Jonathan.Cameron, linux-acpi, linux-arm-kernel,
linux-kernel, linux-edac, shiju.jose, tanxiaofei, linuxarm
On 2025/11/4 00:19, Rafael J. Wysocki wrote:
> On Thu, Oct 30, 2025 at 8:13 AM Junhao He <hejunhao3@h-partners.com> wrote:
>>
>> [...]
>
> This needs a response from the APEI reviewers as per MAINTAINERS, thanks!
Hi, Rafael and Junhao,
Sorry for the late response. I tried to reproduce the issue, but it
seems that EINJ is broken in 6.18.0-rc1+:
[ 3950.741186] CPU: 36 UID: 0 PID: 74112 Comm: einj_mem_uc Tainted: G E 6.18.0-rc1+ #227 PREEMPT(none)
[ 3950.751749] Tainted: [E]=UNSIGNED_MODULE
[ 3950.755655] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 1.91 07/29/2022
[ 3950.763797] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3950.770729] pc : acpi_os_write_memory+0x108/0x150
[ 3950.775419] lr : acpi_os_write_memory+0x28/0x150
[ 3950.780017] sp : ffff800093fbba40
[ 3950.783319] x29: ffff800093fbba40 x28: 0000000000000000 x27: 0000000000000000
[ 3950.790425] x26: 0000000000000002 x25: ffffffffffffffff x24: 000000403f20e400
[ 3950.797530] x23: 0000000000000000 x22: 0000000000000008 x21: 000000000000ffff
[ 3950.804635] x20: 0000000000000040 x19: 000000002f7d0018 x18: 0000000000000000
[ 3950.811741] x17: 0000000000000000 x16: ffffae52d36ae5d0 x15: 000000001ba8e890
[ 3950.818847] x14: 0000000000000000 x13: 0000000000000000 x12: 0000005fffffffff
[ 3950.825952] x11: 0000000000000001 x10: ffff00400d761b90 x9 : ffffae52d365b198
[ 3950.833058] x8 : 0000280000000000 x7 : 000000002f7d0018 x6 : ffffae52d5198548
[ 3950.840164] x5 : 000000002f7d1000 x4 : 0000000000000018 x3 : ffff204016735060
[ 3950.847269] x2 : 0000000000000040 x1 : 0000000000000000 x0 : ffff8000845bd018
[ 3950.854376] Call trace:
[ 3950.856814] acpi_os_write_memory+0x108/0x150 (P)
[ 3950.861500] apei_write+0xb4/0xd0
[ 3950.864806] apei_exec_write_register_value+0x88/0xc0
[ 3950.869838] __apei_exec_run+0xac/0x120
[ 3950.873659] __einj_error_inject+0x88/0x408 [einj]
[ 3950.878434] einj_error_inject+0x168/0x1f0 [einj]
[ 3950.883120] error_inject_set+0x48/0x60 [einj]
[ 3950.887548] simple_attr_write_xsigned.constprop.0.isra.0+0x14c/0x1d0
[ 3950.893964] simple_attr_write+0x1c/0x30
[ 3950.897873] debugfs_attr_write+0x54/0xa0
[ 3950.901870] vfs_write+0xc4/0x240
[ 3950.905173] ksys_write+0x70/0x108
[ 3950.908562] __arm64_sys_write+0x20/0x30
[ 3950.912471] invoke_syscall+0x4c/0x110
[ 3950.916207] el0_svc_common.constprop.0+0x44/0xe8
[ 3950.920893] do_el0_svc+0x20/0x30
[ 3950.924194] el0_svc+0x38/0x160
[ 3950.927324] el0t_64_sync_handler+0x98/0xe0
[ 3950.931491] el0t_64_sync+0x184/0x188
[ 3950.935140] Code: 14000006 7101029f 54000221 d50332bf (f9000015)
[ 3950.941210] ---[ end trace 0000000000000000 ]---
[ 3950.945807] Kernel panic - not syncing: Oops: Fatal exception
We need to fix it first.
Thanks.
Shuai
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] ACPI: APEI: Handle repeated SEA error interrupt storm scenarios
2025-11-04 1:32 ` Shuai Xue
@ 2026-02-27 12:12 ` hejunhao
2026-03-03 14:42 ` Shuai Xue
0 siblings, 1 reply; 12+ messages in thread
From: hejunhao @ 2026-02-27 12:12 UTC (permalink / raw)
To: Shuai Xue, Rafael J. Wysocki, Luck, Tony
Cc: Junhao He, bp, guohanjun, mchehab, jarkko, yazen.ghannam,
jane.chu, lenb, Jonathan.Cameron, linux-acpi, linux-arm-kernel,
linux-kernel, linux-edac, shiju.jose, tanxiaofei
On 2025/11/4 9:32, Shuai Xue wrote:
>
>
> On 2025/11/4 00:19, Rafael J. Wysocki wrote:
>> On Thu, Oct 30, 2025 at 8:13 AM Junhao He <hejunhao3@h-partners.com> wrote:
>>>
>>> [...]
>>> NOTICE: SpsrEl3 = 0x60001000, ELR_EL3 = 0xffffc6ab42671400
>>> NOTICE: skt[0x0]die[0x0]cluster[0x0]core[0x1]
>>> NOTICE: EsrEl3 = 0x92000410
>>> NOTICE: PA is valid: 0x1000093c00
>>> NOTICE: Hest Set GenericError Data
>>> NOTICE: SEA Handle
>>> NOTICE: SpsrEl3 = 0x60001000, ELR_EL3 = 0xffffc6ab42671400
>>> NOTICE: skt[0x0]die[0x0]cluster[0x0]core[0x1]
>>> NOTICE: EsrEl3 = 0x92000410
>>> NOTICE: PA is valid: 0x1000093c00
>>> NOTICE: Hest Set GenericError Data
>>> ...
>>> ... ---> SEA error interrupt storm happens
>>> ...
>>> NOTICE: SEA Handle
>>> NOTICE: SpsrEl3 = 0x60001000, ELR_EL3 = 0xffffc6ab42671400
>>> NOTICE: skt[0x0]die[0x0]cluster[0x0]core[0x1]
>>> NOTICE: EsrEl3 = 0x92000410
>>> NOTICE: PA is valid: 0x1000093c00
>>> NOTICE: Hest Set GenericError Data
>>> [ 1429.818080][ T9955] Memory failure: 0x1000093: already hardware poisoned
>>> [ 1429.825760][ C1] ghes_print_estatus: 1 callbacks suppressed
>>> [ 1429.825763][ C1] {59}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>>> [ 1429.843731][ C1] {59}[Hardware Error]: event severity: recoverable
>>> [ 1429.861800][ C1] {59}[Hardware Error]: Error 0, type: recoverable
>>> [ 1429.874658][ C1] {59}[Hardware Error]: section_type: ARM processor error
>>> [ 1429.887516][ C1] {59}[Hardware Error]: MIDR: 0x0000000000000000
>>> [ 1429.901159][ C1] {59}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081000100
>>> [ 1429.901166][ C1] {59}[Hardware Error]: error affinity level: 0
>>> [ 1429.914896][ C1] {59}[Hardware Error]: running state: 0x1
>>> [ 1429.914903][ C1] {59}[Hardware Error]: Power State Coordination Interface state: 0
>>> [ 1429.933319][ C1] {59}[Hardware Error]: Error info structure 0:
>>> [ 1429.946261][ C1] {59}[Hardware Error]: num errors: 1
>>> [ 1429.946269][ C1] {59}[Hardware Error]: error_type: 0, cache error
>>> [ 1429.970847][ C1] {59}[Hardware Error]: error_info: 0x0000000020400014
>>> [ 1429.970854][ C1] {59}[Hardware Error]: cache level: 1
>>> [ 1429.988406][ C1] {59}[Hardware Error]: the error has not been corrected
>>> [ 1430.013419][ C1] {59}[Hardware Error]: physical fault address: 0x0000001000093c00
>>> [ 1430.013425][ C1] {59}[Hardware Error]: Vendor specific error info has 48 bytes:
>>> [ 1430.025424][ C1] {59}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>>> [ 1430.053736][ C1] {59}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>>> [ 1430.066341][ C1] {59}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>> [ 1430.294255][T54990] Memory failure: 0x1000093: already hardware poisoned
>>> [ 1430.305518][T54990] 0x1000093: Sending SIGBUS to devmem:54990 due to hardware memory corruption
>>>
>>> Signed-off-by: Junhao He <hejunhao3@h-partners.com>
>>> ---
>>> drivers/acpi/apei/ghes.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>> index 005de10d80c3..eebda39bfc30 100644
>>> --- a/drivers/acpi/apei/ghes.c
>>> +++ b/drivers/acpi/apei/ghes.c
>>> @@ -1343,8 +1343,10 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
>>>  	ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
>>>  
>>>  	/* This error has been reported before, don't process it again. */
>>> -	if (ghes_estatus_cached(estatus))
>>> +	if (ghes_estatus_cached(estatus)) {
>>> +		rc = -ECANCELED;
>>>  		goto no_work;
>>> +	}
>>>  
>>>  	llist_add(&estatus_node->llnode, &ghes_estatus_llist);
>>>
>>> --
>>
>> This needs a response from the APEI reviewers as per MAINTAINERS, thanks!
>
> Hi, Rafael and Junhao,
>
> Sorry for the late response. I tried to reproduce the issue, and it
> seems that EINJ is broken on 6.18.0-rc1+.
>
> [ 3950.741186] CPU: 36 UID: 0 PID: 74112 Comm: einj_mem_uc Tainted: G E 6.18.0-rc1+ #227 PREEMPT(none)
> [ 3950.751749] Tainted: [E]=UNSIGNED_MODULE
> [ 3950.755655] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 1.91 07/29/2022
> [ 3950.763797] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 3950.770729] pc : acpi_os_write_memory+0x108/0x150
> [ 3950.775419] lr : acpi_os_write_memory+0x28/0x150
> [ 3950.780017] sp : ffff800093fbba40
> [ 3950.783319] x29: ffff800093fbba40 x28: 0000000000000000 x27: 0000000000000000
> [ 3950.790425] x26: 0000000000000002 x25: ffffffffffffffff x24: 000000403f20e400
> [ 3950.797530] x23: 0000000000000000 x22: 0000000000000008 x21: 000000000000ffff
> [ 3950.804635] x20: 0000000000000040 x19: 000000002f7d0018 x18: 0000000000000000
> [ 3950.811741] x17: 0000000000000000 x16: ffffae52d36ae5d0 x15: 000000001ba8e890
> [ 3950.818847] x14: 0000000000000000 x13: 0000000000000000 x12: 0000005fffffffff
> [ 3950.825952] x11: 0000000000000001 x10: ffff00400d761b90 x9 : ffffae52d365b198
> [ 3950.833058] x8 : 0000280000000000 x7 : 000000002f7d0018 x6 : ffffae52d5198548
> [ 3950.840164] x5 : 000000002f7d1000 x4 : 0000000000000018 x3 : ffff204016735060
> [ 3950.847269] x2 : 0000000000000040 x1 : 0000000000000000 x0 : ffff8000845bd018
> [ 3950.854376] Call trace:
> [ 3950.856814] acpi_os_write_memory+0x108/0x150 (P)
> [ 3950.861500] apei_write+0xb4/0xd0
> [ 3950.864806] apei_exec_write_register_value+0x88/0xc0
> [ 3950.869838] __apei_exec_run+0xac/0x120
> [ 3950.873659] __einj_error_inject+0x88/0x408 [einj]
> [ 3950.878434] einj_error_inject+0x168/0x1f0 [einj]
> [ 3950.883120] error_inject_set+0x48/0x60 [einj]
> [ 3950.887548] simple_attr_write_xsigned.constprop.0.isra.0+0x14c/0x1d0
> [ 3950.893964] simple_attr_write+0x1c/0x30
> [ 3950.897873] debugfs_attr_write+0x54/0xa0
> [ 3950.901870] vfs_write+0xc4/0x240
> [ 3950.905173] ksys_write+0x70/0x108
> [ 3950.908562] __arm64_sys_write+0x20/0x30
> [ 3950.912471] invoke_syscall+0x4c/0x110
> [ 3950.916207] el0_svc_common.constprop.0+0x44/0xe8
> [ 3950.920893] do_el0_svc+0x20/0x30
> [ 3950.924194] el0_svc+0x38/0x160
> [ 3950.927324] el0t_64_sync_handler+0x98/0xe0
> [ 3950.931491] el0t_64_sync+0x184/0x188
> [ 3950.935140] Code: 14000006 7101029f 54000221 d50332bf (f9000015)
> [ 3950.941210] ---[ end trace 0000000000000000 ]---
> [ 3950.945807] Kernel panic - not syncing: Oops: Fatal exception
>
> We need to fix it first.
Hi shuai xue,
Sorry for my late reply. Thank you for the review.
To clarify the issue:
This problem was introduced in v6.18-rc1 via a suspicious ARM64
memory mapping change [1]. I can reproduce the crash consistently
using the v6.18-rc1 kernel with this patch applied.
Crucially, the crash disappears when the change is reverted — error
injection completes successfully without any kernel panic or oops.
This confirms that the ARM64 memory mapping change is the root cause.
As noted in the original report, the change was reverted in v6.19-rc1, and
subsequent kernels (including v6.19-rc1 and later) are stable and do not
exhibit this problem.
reproduce logs:
[ 216.347073] Unable to handle kernel write to read-only memory at virtual address ffff800084825018
...
[ 216.475949] CPU: 75 UID: 0 PID: 11477 Comm: sh Kdump: loaded Not tainted 6.18.0-rc1+ #60 PREEMPT
[ 216.486561] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.91 07/29/2022
[ 216.587297] Call trace:
[ 216.589904] acpi_os_write_memory+0x188/0x1c8 (P)
[ 216.594763] apei_write+0xcc/0xe8
[ 216.598238] apei_exec_write_register_value+0x90/0xd0
[ 216.603437] __apei_exec_run+0xb0/0x128
[ 216.607420] __einj_error_inject+0xac/0x450
[ 216.611750] einj_error_inject+0x19c/0x220
[ 216.615988] error_inject_set+0x4c/0x68
[ 216.619962] simple_attr_write_xsigned.constprop.0.isra.0+0xe8/0x1b0
[ 216.626445] simple_attr_write+0x20/0x38
[ 216.630502] debugfs_attr_write+0x58/0xa8
[ 216.634643] vfs_write+0xdc/0x408
[ 216.638088] ksys_write+0x78/0x118
[ 216.641610] __arm64_sys_write+0x24/0x38
[ 216.645648] invoke_syscall+0x50/0x120
[ 216.649510] el0_svc_common.constprop.0+0xc8/0xf0
[ 216.654318] do_el0_svc+0x24/0x38
[ 216.657742] el0_svc+0x38/0x150
[ 216.660996] el0t_64_sync_handler+0xa0/0xe8
[ 216.665286] el0t_64_sync+0x1ac/0x1b0
[ 216.669054] Code: d65f03c0 710102ff 540001e1 d50332bf (f9000295)
[ 216.675244] ---[ end trace 0000000000000000 ]---
[1] https://lore.kernel.org/all/20251121224611.07efa95a@foz.lan/
Best regards,
Junhao.
* Re: [PATCH] ACPI: APEI: Handle repeated SEA error interrupts storm scenarios
2026-02-27 12:12 ` hejunhao
@ 2026-03-03 14:42 ` Shuai Xue
2026-03-24 10:04 ` hejunhao
0 siblings, 1 reply; 12+ messages in thread
From: Shuai Xue @ 2026-03-03 14:42 UTC (permalink / raw)
To: hejunhao, Rafael J. Wysocki, Luck, Tony
Cc: bp, guohanjun, mchehab, jarkko, yazen.ghannam, jane.chu, lenb,
Jonathan.Cameron, linux-acpi, linux-arm-kernel, linux-kernel,
linux-edac, shiju.jose, tanxiaofei
Hi, junhao,
On 2/27/26 8:12 PM, hejunhao wrote:
>
>
> On 2025/11/4 9:32, Shuai Xue wrote:
>>
>>
>> 在 2025/11/4 00:19, Rafael J. Wysocki 写道:
>>> On Thu, Oct 30, 2025 at 8:13 AM Junhao He <hejunhao3@h-partners.com> wrote:
>>>>
>>>> The do_sea() function defaults to using firmware-first mode, if supported.
>>>> It invoke acpi/apei/ghes ghes_notify_sea() to report and handling the SEA
>>>> error, The GHES uses a buffer to cache the most recent 4 kinds of SEA
>>>> errors. If the same kind SEA error continues to occur, GHES will skip to
>>>> reporting this SEA error and will not add it to the "ghes_estatus_llist"
>>>> list until the cache times out after 10 seconds, at which point the SEA
>>>> error will be reprocessed.
>>>>
>>>> The GHES invoke ghes_proc_in_irq() to handle the SEA error, which
>>>> ultimately executes memory_failure() to process the page with hardware
>>>> memory corruption. If the same SEA error appears multiple times
>>>> consecutively, it indicates that the previous handling was incomplete or
>>>> unable to resolve the fault. In such cases, it is more appropriate to
>>>> return a failure when encountering the same error again, and then proceed
>>>> to arm64_do_kernel_sea for further processing.
There is no such function in the arm64 tree. If apei_claim_sea() returns
an error, the actual fallback path in do_sea() is arm64_notify_die(),
which sends SIGBUS?
>>>>
>>>> When hardware memory corruption occurs, a memory error interrupt is
>>>> triggered. If the kernel accesses this erroneous data, it will trigger
>>>> the SEA error exception handler. All such handlers will call
>>>> memory_failure() to handle the faulty page.
>>>>
>>>> If a memory error interrupt occurs first, followed by an SEA error
>>>> interrupt, the faulty page is first marked as poisoned by the memory error
>>>> interrupt process, and then the SEA error interrupt handling process will
>>>> send a SIGBUS signal to the process accessing the poisoned page.
>>>>
>>>> However, if the SEA interrupt is reported first, the following exceptional
>>>> scenario occurs:
>>>>
>>>> When a user process directly maps and accesses a page with hardware
>>>> memory corruption via mmap (for example with devmem), the page containing
>>>> this address may still be a free buddy page in the kernel. At this
>>>> point, the page is marked as poisoned by memory_failure() during the SEA
>>>> claim. However, since the process did not obtain the page through the
>>>> kernel's MMU, the kernel cannot send a SIGBUS signal to it, and the
>>>> memory error interrupt handling path does not support sending SIGBUS
>>>> either. As a result, the process keeps accessing the faulty page,
>>>> repeatedly re-entering the SEA exception handler and causing an SEA
>>>> error interrupt storm.
In such a case, won't the user process accessing the poisoned page be
killed by memory_failure()?
// memory_failure():

	if (TestSetPageHWPoison(p)) {
		res = -EHWPOISON;
		if (flags & MF_ACTION_REQUIRED)
			res = kill_accessing_process(current, pfn, flags);
		if (flags & MF_COUNT_INCREASED)
			put_page(p);
		action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
		goto unlock_mutex;
	}
I think this problem has already been fixed by commit 2e6053fea379 ("mm/memory-failure:
fix infinite UCE for VM_PFNMAP pfn").
The root cause is that walk_page_range() skips VM_PFNMAP vmas by default when
no .test_walk callback is set, so kill_accessing_process() returns 0 for a
devmem-style mapping (remap_pfn_range, VM_PFNMAP), making the caller believe
the UCE was handled properly while the process was never actually killed.
Did you try the latest kernel version?
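[Editor's sketch] The VM_PFNMAP walk-skip described above can be reduced to one predicate. This is an illustrative model, not the kernel's walker: `walk_finds_poisoned_pfn` and its parameters are assumptions made for the sketch; only the VM_PFNMAP flag value is borrowed from the kernel.

```c
/* Without a .test_walk callback, the generic page walker skips
 * VM_PFNMAP vmas, so a devmem-style mapping (remap_pfn_range) is never
 * examined: the poisoned pfn is "not found", kill_accessing_process()
 * returns 0, and the caller believes the UCE was handled. */
#define VM_PFNMAP 0x00000400u /* flag value borrowed from the kernel */

/* Returns 1 if the walk would examine this vma and find the bad pfn. */
int walk_finds_poisoned_pfn(unsigned int vm_flags, int maps_bad_pfn,
			    int has_test_walk)
{
	if ((vm_flags & VM_PFNMAP) && !has_test_walk)
		return 0;            /* pfnmap vma silently skipped */
	return maps_bad_pfn;         /* with .test_walk, the vma is checked */
}
```

The commit cited above effectively supplies the `.test_walk` side of this predicate, so pfnmap vmas are no longer skipped.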
>>>>
>>>> Fix this by returning a failure when the same error is encountered again.
>>>>
>>>> [... devmem reproducer error log snipped; quoted in full earlier in the thread ...]
>>>>
>>>> Signed-off-by: Junhao He <hejunhao3@h-partners.com>
>>>> ---
>>>> drivers/acpi/apei/ghes.c | 4 +++-
>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>>> index 005de10d80c3..eebda39bfc30 100644
>>>> --- a/drivers/acpi/apei/ghes.c
>>>> +++ b/drivers/acpi/apei/ghes.c
>>>> @@ -1343,8 +1343,10 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
>>>>  	ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
>>>>  
>>>>  	/* This error has been reported before, don't process it again. */
>>>> -	if (ghes_estatus_cached(estatus))
>>>> +	if (ghes_estatus_cached(estatus)) {
>>>> +		rc = -ECANCELED;
>>>>  		goto no_work;
>>>> +	}
>>>>  
>>>>  	llist_add(&estatus_node->llnode, &ghes_estatus_llist);
>>>>
>>>> --
>>>
>>> This needs a response from the APEI reviewers as per MAINTAINERS, thanks!
>>
>> Hi, Rafael and Junhao,
>>
>> Sorry for the late response. I tried to reproduce the issue, and it
>> seems that EINJ is broken on 6.18.0-rc1+.
>>
>> [... EINJ crash trace snipped; quoted in full earlier in the thread ...]
>> We need to fix it first.
>
> Hi shuai xue,
>
> Sorry for my late reply. Thank you for the review.
> To clarify the issue:
> This problem was introduced in v6.18-rc1 via a suspicious ARM64
> memory mapping change [1]. I can reproduce the crash consistently
> using the v6.18-rc1 kernel with this patch applied.
>
> Crucially, the crash disappears when the change is reverted — error
> injection completes successfully without any kernel panic or oops.
> This confirms that the ARM64 memory mapping change is the root cause.
>
> As noted in the original report, the change was reverted in v6.19-rc1, and
> subsequent kernels (including v6.19-rc1 and later) are stable and do not
> exhibit this problem.
>
> reproduce logs:
> [... v6.18-rc1 reproduce log snipped; quoted in full earlier in the thread ...]
>
> [1] https://lore.kernel.org/all/20251121224611.07efa95a@foz.lan/
>
> Best regards,
> Junhao.
Thanks for clarifying the issue.
Thanks.
Shuai
* Re: [PATCH] ACPI: APEI: Handle repeated SEA error interrupts storm scenarios
2026-03-03 14:42 ` Shuai Xue
@ 2026-03-24 10:04 ` hejunhao
2026-03-25 2:12 ` Shuai Xue
0 siblings, 1 reply; 12+ messages in thread
From: hejunhao @ 2026-03-24 10:04 UTC (permalink / raw)
To: Shuai Xue, Rafael J. Wysocki, Luck, Tony
Cc: bp, guohanjun, mchehab, jarkko, yazen.ghannam, jane.chu, lenb,
Jonathan.Cameron, linux-acpi, linux-arm-kernel, linux-kernel,
linux-edac, shiju.jose, tanxiaofei, Junhao He, Linuxarm
Hi shuai xue,
On 2026/3/3 22:42, Shuai Xue wrote:
> Hi, junhao,
>
> On 2/27/26 8:12 PM, hejunhao wrote:
>>
>>
>> On 2025/11/4 9:32, Shuai Xue wrote:
>>>
>>>
>>> 在 2025/11/4 00:19, Rafael J. Wysocki 写道:
>>>> On Thu, Oct 30, 2025 at 8:13 AM Junhao He <hejunhao3@h-partners.com> wrote:
>>>>>
>>>>> The do_sea() function defaults to using firmware-first mode, if supported.
>>>>> It invoke acpi/apei/ghes ghes_notify_sea() to report and handling the SEA
>>>>> error, The GHES uses a buffer to cache the most recent 4 kinds of SEA
>>>>> errors. If the same kind SEA error continues to occur, GHES will skip to
>>>>> reporting this SEA error and will not add it to the "ghes_estatus_llist"
>>>>> list until the cache times out after 10 seconds, at which point the SEA
>>>>> error will be reprocessed.
>>>>>
>>>>> The GHES invoke ghes_proc_in_irq() to handle the SEA error, which
>>>>> ultimately executes memory_failure() to process the page with hardware
>>>>> memory corruption. If the same SEA error appears multiple times
>>>>> consecutively, it indicates that the previous handling was incomplete or
>>>>> unable to resolve the fault. In such cases, it is more appropriate to
>>>>> return a failure when encountering the same error again, and then proceed
>>>>> to arm64_do_kernel_sea for further processing.
>
> There is no such function in the arm64 tree. If apei_claim_sea() returns
Sorry for the mistake in the commit message. The function arm64_do_kernel_sea() should
be arm64_notify_die().
> an error, the actual fallback path in do_sea() is arm64_notify_die(),
> which sends SIGBUS?
>
If apei_claim_sea() returns an error, arm64_notify_die() will call
arm64_force_sig_fault() with inf->sig (SIGBUS), which in turn calls
force_sig_fault(SIGBUS, ...) to force the process to receive the SIGBUS
signal.
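[Editor's sketch] From userspace, the end of that fallback path is an ordinary SIGBUS delivery. The demonstration below shows only the signal mechanics, not the kernel path: `deliver_and_catch_sigbus` is an illustrative name, and raise() stands in for the kernel's force_sig_fault() (a process hit by a real poison would normally just be killed).

```c
#include <signal.h>

static volatile sig_atomic_t got_sigbus;

static void on_sigbus(int sig)
{
	(void)sig;
	got_sigbus = 1;              /* record that the signal arrived */
}

/* Returns 1 if a raised SIGBUS was caught by the installed handler. */
int deliver_and_catch_sigbus(void)
{
	got_sigbus = 0;
	signal(SIGBUS, on_sigbus);   /* install handler for demonstration */
	raise(SIGBUS);               /* synchronous delivery to this thread */
	signal(SIGBUS, SIG_DFL);     /* restore default disposition */
	return got_sigbus;
}
```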
>>>>>
>>>>> When hardware memory corruption occurs, a memory error interrupt is
>>>>> triggered. If the kernel accesses this erroneous data, it will trigger
>>>>> the SEA error exception handler. All such handlers will call
>>>>> memory_failure() to handle the faulty page.
>>>>>
>>>>> If a memory error interrupt occurs first, followed by an SEA error
>>>>> interrupt, the faulty page is first marked as poisoned by the memory error
>>>>> interrupt process, and then the SEA error interrupt handling process will
>>>>> send a SIGBUS signal to the process accessing the poisoned page.
>>>>>
>>>>> However, if the SEA interrupt is reported first, the following exceptional
>>>>> scenario occurs:
>>>>>
>>>>> When a user process directly maps and accesses a page with hardware
>>>>> memory corruption via mmap (for example with devmem), the page containing
>>>>> this address may still be a free buddy page in the kernel. At this
>>>>> point, the page is marked as poisoned by memory_failure() during the SEA
>>>>> claim. However, since the process did not obtain the page through the
>>>>> kernel's MMU, the kernel cannot send a SIGBUS signal to it, and the
>>>>> memory error interrupt handling path does not support sending SIGBUS
>>>>> either. As a result, the process keeps accessing the faulty page,
>>>>> repeatedly re-entering the SEA exception handler and causing an SEA
>>>>> error interrupt storm.
>
> In such a case, won't the user process accessing the poisoned page be
> killed by memory_failure()?
>
> // memory_failure():
>
> 	if (TestSetPageHWPoison(p)) {
> 		res = -EHWPOISON;
> 		if (flags & MF_ACTION_REQUIRED)
> 			res = kill_accessing_process(current, pfn, flags);
> 		if (flags & MF_COUNT_INCREASED)
> 			put_page(p);
> 		action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> 		goto unlock_mutex;
> 	}
>
> I think this problem has already been fixed by commit 2e6053fea379 ("mm/memory-failure:
> fix infinite UCE for VM_PFNMAP pfn").
>
> The root cause is that walk_page_range() skips VM_PFNMAP vmas by default when
> no .test_walk callback is set, so kill_accessing_process() returns 0 for a
> devmem-style mapping (remap_pfn_range, VM_PFNMAP), making the caller believe
> the UCE was handled properly while the process was never actually killed.
>
> Did you try the latest kernel version?
>
I retested this issue on kernel v7.0.0-rc4 with the following debug patch and was still able to reproduce it.
@@ -1365,8 +1365,11 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
 	ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
 
 	/* This error has been reported before, don't process it again. */
-	if (ghes_estatus_cached(estatus))
+	if (ghes_estatus_cached(estatus)) {
+		pr_info("This error has been reported before, don't process it again.\n");
 		goto no_work;
+	}
The test log (only some debug logs are retained here):
[2026/3/24 14:51:58.199] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32 0
[2026/3/24 14:51:58.369] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32
[2026/3/24 14:51:58.458] [ 130.558038][ C40] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
[2026/3/24 14:51:58.459] [ 130.572517][ C40] {1}[Hardware Error]: event severity: recoverable
[2026/3/24 14:51:58.459] [ 130.578861][ C40] {1}[Hardware Error]: Error 0, type: recoverable
[2026/3/24 14:51:58.459] [ 130.585203][ C40] {1}[Hardware Error]: section_type: ARM processor error
[2026/3/24 14:51:58.459] [ 130.592238][ C40] {1}[Hardware Error]: MIDR: 0x0000000000000000
[2026/3/24 14:51:58.459] [ 130.598492][ C40] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
[2026/3/24 14:51:58.459] [ 130.607871][ C40] {1}[Hardware Error]: error affinity level: 0
[2026/3/24 14:51:58.459] [ 130.614038][ C40] {1}[Hardware Error]: running state: 0x1
[2026/3/24 14:51:58.459] [ 130.619770][ C40] {1}[Hardware Error]: Power State Coordination Interface state: 0
[2026/3/24 14:51:58.459] [ 130.627673][ C40] {1}[Hardware Error]: Error info structure 0:
[2026/3/24 14:51:58.459] [ 130.633839][ C40] {1}[Hardware Error]: num errors: 1
[2026/3/24 14:51:58.459] [ 130.639137][ C40] {1}[Hardware Error]: error_type: 0, cache error
[2026/3/24 14:51:58.459] [ 130.645652][ C40] {1}[Hardware Error]: error_info: 0x0000000020400014
[2026/3/24 14:51:58.459] [ 130.652514][ C40] {1}[Hardware Error]: cache level: 1
[2026/3/24 14:51:58.551] [ 130.658073][ C40] {1}[Hardware Error]: the error has not been corrected
[2026/3/24 14:51:58.551] [ 130.665194][ C40] {1}[Hardware Error]: physical fault address: 0x0000001351811800
[2026/3/24 14:51:58.551] [ 130.673097][ C40] {1}[Hardware Error]: Vendor specific error info has 48 bytes:
[2026/3/24 14:51:58.551] [ 130.680744][ C40] {1}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
[2026/3/24 14:51:58.551] [ 130.690471][ C40] {1}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
[2026/3/24 14:51:58.552] [ 130.700198][ C40] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
[2026/3/24 14:51:58.552] [ 130.710083][ T9767] Memory failure: 0x1351811: recovery action for free buddy page: Recovered
[2026/3/24 14:51:58.638] [ 130.790952][ C40] This error has been reported before, don't process it again.
[2026/3/24 14:51:58.903] [ 131.046994][ C40] This error has been reported before, don't process it again.
[2026/3/24 14:51:58.991] [ 131.132360][ C40] This error has been reported before, don't process it again.
[2026/3/24 14:51:59.969] [ 132.071431][ C40] This error has been reported before, don't process it again.
[2026/3/24 14:52:00.860] [ 133.010255][ C40] This error has been reported before, don't process it again.
[2026/3/24 14:52:01.927] [ 134.034746][ C40] This error has been reported before, don't process it again.
[2026/3/24 14:52:02.906] [ 135.058973][ C40] This error has been reported before, don't process it again.
[2026/3/24 14:52:03.971] [ 136.083213][ C40] This error has been reported before, don't process it again.
[2026/3/24 14:52:04.860] [ 137.021956][ C40] This error has been reported before, don't process it again.
[2026/3/24 14:52:06.018] [ 138.131460][ C40] This error has been reported before, don't process it again.
[2026/3/24 14:52:06.905] [ 139.070280][ C40] This error has been reported before, don't process it again.
[2026/3/24 14:52:07.886] [ 140.009147][ C40] This error has been reported before, don't process it again.
[2026/3/24 14:52:08.596] [ 140.777368][ C40] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
[2026/3/24 14:52:08.683] [ 140.791921][ C40] {2}[Hardware Error]: event severity: recoverable
[2026/3/24 14:52:08.683] [ 140.798263][ C40] {2}[Hardware Error]: Error 0, type: recoverable
[2026/3/24 14:52:08.683] [ 140.804606][ C40] {2}[Hardware Error]: section_type: ARM processor error
[2026/3/24 14:52:08.683] [ 140.811641][ C40] {2}[Hardware Error]: MIDR: 0x0000000000000000
[2026/3/24 14:52:08.684] [ 140.817895][ C40] {2}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
[2026/3/24 14:52:08.684] [ 140.827274][ C40] {2}[Hardware Error]: error affinity level: 0
[2026/3/24 14:52:08.684] [ 140.833440][ C40] {2}[Hardware Error]: running state: 0x1
[2026/3/24 14:52:08.684] [ 140.839173][ C40] {2}[Hardware Error]: Power State Coordination Interface state: 0
[2026/3/24 14:52:08.684] [ 140.847076][ C40] {2}[Hardware Error]: Error info structure 0:
[2026/3/24 14:52:08.684] [ 140.853241][ C40] {2}[Hardware Error]: num errors: 1
[2026/3/24 14:52:08.684] [ 140.858540][ C40] {2}[Hardware Error]: error_type: 0, cache error
[2026/3/24 14:52:08.684] [ 140.865055][ C40] {2}[Hardware Error]: error_info: 0x0000000020400014
[2026/3/24 14:52:08.684] [ 140.871917][ C40] {2}[Hardware Error]: cache level: 1
[2026/3/24 14:52:08.684] [ 140.877475][ C40] {2}[Hardware Error]: the error has not been corrected
[2026/3/24 14:52:08.764] [ 140.884596][ C40] {2}[Hardware Error]: physical fault address: 0x0000001351811800
[2026/3/24 14:52:08.764] [ 140.892499][ C40] {2}[Hardware Error]: Vendor specific error info has 48 bytes:
[2026/3/24 14:52:08.766] [ 140.900145][ C40] {2}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
[2026/3/24 14:52:08.767] [ 140.909872][ C40] {2}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
[2026/3/24 14:52:08.767] [ 140.919598][ C40] {2}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
[2026/3/24 14:52:08.768] [ 140.929346][ T9767] Memory failure: 0x1351811: already hardware poisoned
[2026/3/24 14:52:08.768] [ 140.936072][ T9767] Memory failure: 0x1351811: Sending SIGBUS to busybox:9767 due to hardware memory corruption
After applying the patch:
@@ -1365,8 +1365,11 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
/* This error has been reported before, don't process it again. */
- if (ghes_estatus_cached(estatus))
+ if (ghes_estatus_cached(estatus)) {
+ pr_info("This error has been reported before, don't process it again.\n");
+ rc = -ECANCELED;
goto no_work;
+ }
[2026/3/24 16:45:40.084] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32 0
[2026/3/24 16:45:40.272] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32
[2026/3/24 16:45:40.362] [ 112.279324][ C40] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
[2026/3/24 16:45:40.362] [ 112.293797][ C40] {1}[Hardware Error]: event severity: recoverable
[2026/3/24 16:45:40.362] [ 112.300139][ C40] {1}[Hardware Error]: Error 0, type: recoverable
[2026/3/24 16:45:40.363] [ 112.306481][ C40] {1}[Hardware Error]: section_type: ARM processor error
[2026/3/24 16:45:40.363] [ 112.313516][ C40] {1}[Hardware Error]: MIDR: 0x0000000000000000
[2026/3/24 16:45:40.363] [ 112.319771][ C40] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
[2026/3/24 16:45:40.363] [ 112.329151][ C40] {1}[Hardware Error]: error affinity level: 0
[2026/3/24 16:45:40.363] [ 112.335317][ C40] {1}[Hardware Error]: running state: 0x1
[2026/3/24 16:45:40.363] [ 112.341049][ C40] {1}[Hardware Error]: Power State Coordination Interface state: 0
[2026/3/24 16:45:40.363] [ 112.348953][ C40] {1}[Hardware Error]: Error info structure 0:
[2026/3/24 16:45:40.363] [ 112.355119][ C40] {1}[Hardware Error]: num errors: 1
[2026/3/24 16:45:40.363] [ 112.360418][ C40] {1}[Hardware Error]: error_type: 0, cache error
[2026/3/24 16:45:40.363] [ 112.366932][ C40] {1}[Hardware Error]: error_info: 0x0000000020400014
[2026/3/24 16:45:40.363] [ 112.373795][ C40] {1}[Hardware Error]: cache level: 1
[2026/3/24 16:45:40.453] [ 112.379354][ C40] {1}[Hardware Error]: the error has not been corrected
[2026/3/24 16:45:40.453] [ 112.386475][ C40] {1}[Hardware Error]: physical fault address: 0x0000001351811800
[2026/3/24 16:45:40.453] [ 112.394378][ C40] {1}[Hardware Error]: Vendor specific error info has 48 bytes:
[2026/3/24 16:45:40.453] [ 112.402027][ C40] {1}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
[2026/3/24 16:45:40.453] [ 112.411754][ C40] {1}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
[2026/3/24 16:45:40.453] [ 112.421480][ C40] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
[2026/3/24 16:45:40.453] [ 112.431639][ T9769] Memory failure: 0x1351811: recovery action for free buddy page: Recovered
[2026/3/24 16:45:40.531] [ 112.512520][ C40] This error has been reported before, don't process it again.
[2026/3/24 16:45:40.757] Bus error (core dumped)
>>>>>
>>>>> Fix this by returning a failure when encountering the same error again.
>>>>>
>>>>> The following error logs are explained using the devmem process:
>>>>> NOTICE: SEA Handle
>>>>> NOTICE: SpsrEl3 = 0x60001000, ELR_EL3 = 0xffffc6ab42671400
>>>>> NOTICE: skt[0x0]die[0x0]cluster[0x0]core[0x1]
>>>>> NOTICE: EsrEl3 = 0x92000410
>>>>> NOTICE: PA is valid: 0x1000093c00
>>>>> NOTICE: Hest Set GenericError Data
>>>>> [ 1419.542401][ C1] {57}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>>>>> [ 1419.551435][ C1] {57}[Hardware Error]: event severity: recoverable
>>>>> [ 1419.557865][ C1] {57}[Hardware Error]: Error 0, type: recoverable
>>>>> [ 1419.564295][ C1] {57}[Hardware Error]: section_type: ARM processor error
>>>>> [ 1419.571421][ C1] {57}[Hardware Error]: MIDR: 0x0000000000000000
>>>>> [ 1419.571434][ C1] {57}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081000100
>>>>> [ 1419.586813][ C1] {57}[Hardware Error]: error affinity level: 0
>>>>> [ 1419.586821][ C1] {57}[Hardware Error]: running state: 0x1
>>>>> [ 1419.602714][ C1] {57}[Hardware Error]: Power State Coordination Interface state: 0
>>>>> [ 1419.602724][ C1] {57}[Hardware Error]: Error info structure 0:
>>>>> [ 1419.614797][ C1] {57}[Hardware Error]: num errors: 1
>>>>> [ 1419.614804][ C1] {57}[Hardware Error]: error_type: 0, cache error
>>>>> [ 1419.629226][ C1] {57}[Hardware Error]: error_info: 0x0000000020400014
>>>>> [ 1419.629234][ C1] {57}[Hardware Error]: cache level: 1
>>>>> [ 1419.642006][ C1] {57}[Hardware Error]: the error has not been corrected
>>>>> [ 1419.642013][ C1] {57}[Hardware Error]: physical fault address: 0x0000001000093c00
>>>>> [ 1419.654001][ C1] {57}[Hardware Error]: Vendor specific error info has 48 bytes:
>>>>> [ 1419.654014][ C1] {57}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>>>>> [ 1419.670685][ C1] {57}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>>>>> [ 1419.670692][ C1] {57}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>>>> [ 1419.783606][T54990] Memory failure: 0x1000093: recovery action for free buddy page: Recovered
>>>>> [ 1419.919580][ T9955] EDAC MC0: 1 UE Multi-bit ECC on unknown memory (node:0 card:1 module:71 bank:7 row:0 col:0 page:0x1000093 offset:0xc00 grain:1 - APEI location: node:0 card:257 module:71 bank:7 row:0 col:0)
>>>>> NOTICE: SEA Handle
>>>>> NOTICE: SpsrEl3 = 0x60001000, ELR_EL3 = 0xffffc6ab42671400
>>>>> NOTICE: skt[0x0]die[0x0]cluster[0x0]core[0x1]
>>>>> NOTICE: EsrEl3 = 0x92000410
>>>>> NOTICE: PA is valid: 0x1000093c00
>>>>> NOTICE: Hest Set GenericError Data
>>>>> NOTICE: SEA Handle
>>>>> NOTICE: SpsrEl3 = 0x60001000, ELR_EL3 = 0xffffc6ab42671400
>>>>> NOTICE: skt[0x0]die[0x0]cluster[0x0]core[0x1]
>>>>> NOTICE: EsrEl3 = 0x92000410
>>>>> NOTICE: PA is valid: 0x1000093c00
>>>>> NOTICE: Hest Set GenericError Data
>>>>> ...
>>>>> ... ---> SEA error interrupt storm happened
>>>>> ...
>>>>> NOTICE: SEA Handle
>>>>> NOTICE: SpsrEl3 = 0x60001000, ELR_EL3 = 0xffffc6ab42671400
>>>>> NOTICE: skt[0x0]die[0x0]cluster[0x0]core[0x1]
>>>>> NOTICE: EsrEl3 = 0x92000410
>>>>> NOTICE: PA is valid: 0x1000093c00
>>>>> NOTICE: Hest Set GenericError Data
>>>>> [ 1429.818080][ T9955] Memory failure: 0x1000093: already hardware poisoned
>>>>> [ 1429.825760][ C1] ghes_print_estatus: 1 callbacks suppressed
>>>>> [ 1429.825763][ C1] {59}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>>>>> [ 1429.843731][ C1] {59}[Hardware Error]: event severity: recoverable
>>>>> [ 1429.861800][ C1] {59}[Hardware Error]: Error 0, type: recoverable
>>>>> [ 1429.874658][ C1] {59}[Hardware Error]: section_type: ARM processor error
>>>>> [ 1429.887516][ C1] {59}[Hardware Error]: MIDR: 0x0000000000000000
>>>>> [ 1429.901159][ C1] {59}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081000100
>>>>> [ 1429.901166][ C1] {59}[Hardware Error]: error affinity level: 0
>>>>> [ 1429.914896][ C1] {59}[Hardware Error]: running state: 0x1
>>>>> [ 1429.914903][ C1] {59}[Hardware Error]: Power State Coordination Interface state: 0
>>>>> [ 1429.933319][ C1] {59}[Hardware Error]: Error info structure 0:
>>>>> [ 1429.946261][ C1] {59}[Hardware Error]: num errors: 1
>>>>> [ 1429.946269][ C1] {59}[Hardware Error]: error_type: 0, cache error
>>>>> [ 1429.970847][ C1] {59}[Hardware Error]: error_info: 0x0000000020400014
>>>>> [ 1429.970854][ C1] {59}[Hardware Error]: cache level: 1
>>>>> [ 1429.988406][ C1] {59}[Hardware Error]: the error has not been corrected
>>>>> [ 1430.013419][ C1] {59}[Hardware Error]: physical fault address: 0x0000001000093c00
>>>>> [ 1430.013425][ C1] {59}[Hardware Error]: Vendor specific error info has 48 bytes:
>>>>> [ 1430.025424][ C1] {59}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>>>>> [ 1430.053736][ C1] {59}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>>>>> [ 1430.066341][ C1] {59}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>>>> [ 1430.294255][T54990] Memory failure: 0x1000093: already hardware poisoned
>>>>> [ 1430.305518][T54990] 0x1000093: Sending SIGBUS to devmem:54990 due to hardware memory corruption
>>>>>
>>>>> Signed-off-by: Junhao He <hejunhao3@h-partners.com>
>>>>> ---
>>>>> drivers/acpi/apei/ghes.c | 4 +++-
>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>>>> index 005de10d80c3..eebda39bfc30 100644
>>>>> --- a/drivers/acpi/apei/ghes.c
>>>>> +++ b/drivers/acpi/apei/ghes.c
>>>>> @@ -1343,8 +1343,10 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
>>>>> ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
>>>>>
>>>>> /* This error has been reported before, don't process it again. */
>>>>> - if (ghes_estatus_cached(estatus))
>>>>> + if (ghes_estatus_cached(estatus)) {
>>>>> + rc = -ECANCELED;
>>>>> goto no_work;
>>>>> + }
>>>>>
>>>>> llist_add(&estatus_node->llnode, &ghes_estatus_llist);
>>>>>
>>>>> --
>>>>
>>>> This needs a response from the APEI reviewers as per MAINTAINERS, thanks!
>>>
>>> Hi, Rafael and Junhao,
>>>
>>> Sorry for the late response. I tried to reproduce the issue, and it seems
>>> that EINJ is broken on 6.18.0-rc1+.
>>>
>>> [ 3950.741186] CPU: 36 UID: 0 PID: 74112 Comm: einj_mem_uc Tainted: G E 6.18.0-rc1+ #227 PREEMPT(none)
>>> [ 3950.751749] Tainted: [E]=UNSIGNED_MODULE
>>> [ 3950.755655] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 1.91 07/29/2022
>>> [ 3950.763797] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [ 3950.770729] pc : acpi_os_write_memory+0x108/0x150
>>> [ 3950.775419] lr : acpi_os_write_memory+0x28/0x150
>>> [ 3950.780017] sp : ffff800093fbba40
>>> [ 3950.783319] x29: ffff800093fbba40 x28: 0000000000000000 x27: 0000000000000000
>>> [ 3950.790425] x26: 0000000000000002 x25: ffffffffffffffff x24: 000000403f20e400
>>> [ 3950.797530] x23: 0000000000000000 x22: 0000000000000008 x21: 000000000000ffff
>>> [ 3950.804635] x20: 0000000000000040 x19: 000000002f7d0018 x18: 0000000000000000
>>> [ 3950.811741] x17: 0000000000000000 x16: ffffae52d36ae5d0 x15: 000000001ba8e890
>>> [ 3950.818847] x14: 0000000000000000 x13: 0000000000000000 x12: 0000005fffffffff
>>> [ 3950.825952] x11: 0000000000000001 x10: ffff00400d761b90 x9 : ffffae52d365b198
>>> [ 3950.833058] x8 : 0000280000000000 x7 : 000000002f7d0018 x6 : ffffae52d5198548
>>> [ 3950.840164] x5 : 000000002f7d1000 x4 : 0000000000000018 x3 : ffff204016735060
>>> [ 3950.847269] x2 : 0000000000000040 x1 : 0000000000000000 x0 : ffff8000845bd018
>>> [ 3950.854376] Call trace:
>>> [ 3950.856814] acpi_os_write_memory+0x108/0x150 (P)
>>> [ 3950.861500] apei_write+0xb4/0xd0
>>> [ 3950.864806] apei_exec_write_register_value+0x88/0xc0
>>> [ 3950.869838] __apei_exec_run+0xac/0x120
>>> [ 3950.873659] __einj_error_inject+0x88/0x408 [einj]
>>> [ 3950.878434] einj_error_inject+0x168/0x1f0 [einj]
>>> [ 3950.883120] error_inject_set+0x48/0x60 [einj]
>>> [ 3950.887548] simple_attr_write_xsigned.constprop.0.isra.0+0x14c/0x1d0
>>> [ 3950.893964] simple_attr_write+0x1c/0x30
>>> [ 3950.897873] debugfs_attr_write+0x54/0xa0
>>> [ 3950.901870] vfs_write+0xc4/0x240
>>> [ 3950.905173] ksys_write+0x70/0x108
>>> [ 3950.908562] __arm64_sys_write+0x20/0x30
>>> [ 3950.912471] invoke_syscall+0x4c/0x110
>>> [ 3950.916207] el0_svc_common.constprop.0+0x44/0xe8
>>> [ 3950.920893] do_el0_svc+0x20/0x30
>>> [ 3950.924194] el0_svc+0x38/0x160
>>> [ 3950.927324] el0t_64_sync_handler+0x98/0xe0
>>> [ 3950.931491] el0t_64_sync+0x184/0x188
>>> [ 3950.935140] Code: 14000006 7101029f 54000221 d50332bf (f9000015)
>>> [ 3950.941210] ---[ end trace 0000000000000000 ]---
>>> [ 3950.945807] Kernel panic - not syncing: Oops: Fatal exception
>>>
>>> We need to fix it first.
>>
>> Hi shuai xue,
>>
>> Sorry for my late reply. Thank you for the review.
>> To clarify the issue:
>> This problem was introduced in v6.18-rc1 via a suspicious ARM64
>> memory mapping change [1]. I can reproduce the crash consistently
>> using the v6.18-rc1 kernel with this patch applied.
>>
>> Crucially, the crash disappears when the change is reverted — error
>> injection completes successfully without any kernel panic or oops.
>> This confirms that the ARM64 memory mapping change is the root cause.
>>
>> As noted in the original report, the change was reverted in v6.19-rc1, and
>> subsequent kernels (including v6.19-rc1 and later) are stable and do not
>> exhibit this problem.
>>
>> reproduce logs:
>> [ 216.347073] Unable to handle kernel write to read-only memory at virtual address ffff800084825018
>> ...
>> [ 216.475949] CPU: 75 UID: 0 PID: 11477 Comm: sh Kdump: loaded Not tainted 6.18.0-rc1+ #60 PREEMPT
>> [ 216.486561] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.91 07/29/2022
>> [ 216.587297] Call trace:
>> [ 216.589904] acpi_os_write_memory+0x188/0x1c8 (P)
>> [ 216.594763] apei_write+0xcc/0xe8
>> [ 216.598238] apei_exec_write_register_value+0x90/0xd0
>> [ 216.603437] __apei_exec_run+0xb0/0x128
>> [ 216.607420] __einj_error_inject+0xac/0x450
>> [ 216.611750] einj_error_inject+0x19c/0x220
>> [ 216.615988] error_inject_set+0x4c/0x68
>> [ 216.619962] simple_attr_write_xsigned.constprop.0.isra.0+0xe8/0x1b0
>> [ 216.626445] simple_attr_write+0x20/0x38
>> [ 216.630502] debugfs_attr_write+0x58/0xa8
>> [ 216.634643] vfs_write+0xdc/0x408
>> [ 216.638088] ksys_write+0x78/0x118
>> [ 216.641610] __arm64_sys_write+0x24/0x38
>> [ 216.645648] invoke_syscall+0x50/0x120
>> [ 216.649510] el0_svc_common.constprop.0+0xc8/0xf0
>> [ 216.654318] do_el0_svc+0x24/0x38
>> [ 216.657742] el0_svc+0x38/0x150
>> [ 216.660996] el0t_64_sync_handler+0xa0/0xe8
>> [ 216.665286] el0t_64_sync+0x1ac/0x1b0
>> [ 216.669054] Code: d65f03c0 710102ff 540001e1 d50332bf (f9000295)
>> [ 216.675244] ---[ end trace 0000000000000000 ]---
>>
>> [1] https://lore.kernel.org/all/20251121224611.07efa95a@foz.lan/
>>
>> Best regards,
>> Junhao.
>
> Thanks for clarifying the issue.
>
> Thanks.
> Shuai
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] ACPI: APEI: Handle repeated SEA error interrupts storm scenarios
2026-03-24 10:04 ` hejunhao
@ 2026-03-25 2:12 ` Shuai Xue
2026-03-25 9:24 ` hejunhao
0 siblings, 1 reply; 12+ messages in thread
From: Shuai Xue @ 2026-03-25 2:12 UTC (permalink / raw)
To: hejunhao, Rafael J. Wysocki, Luck, Tony
Cc: bp, guohanjun, mchehab, jarkko, yazen.ghannam, jane.chu, lenb,
Jonathan.Cameron, linux-acpi, linux-arm-kernel, linux-kernel,
linux-edac, shiju.jose, tanxiaofei, Linuxarm
Hi, junhao
On 3/24/26 6:04 PM, hejunhao wrote:
> Hi shuai xue,
>
>
> On 2026/3/3 22:42, Shuai Xue wrote:
>> Hi, junhao,
>>
>> On 2/27/26 8:12 PM, hejunhao wrote:
>>>
>>>
>>> On 2025/11/4 9:32, Shuai Xue wrote:
>>>>
>>>>
>>>>> On 2025/11/4 00:19, Rafael J. Wysocki wrote:
>>>>> On Thu, Oct 30, 2025 at 8:13 AM Junhao He <hejunhao3@h-partners.com> wrote:
>>>>>>
>>>>>> The do_sea() function defaults to using firmware-first mode, if supported.
>>>>>> It invokes the acpi/apei/ghes ghes_notify_sea() to report and handle the
>>>>>> SEA error. GHES uses a buffer to cache the most recent 4 kinds of SEA
>>>>>> errors. If the same kind of SEA error continues to occur, GHES will skip
>>>>>> reporting this SEA error and will not add it to the "ghes_estatus_llist"
>>>>>> list until the cache entry times out after 10 seconds, at which point the
>>>>>> SEA error will be reprocessed.
>>>>>>
>>>>>> GHES invokes ghes_proc_in_irq() to handle the SEA error, which
>>>>>> ultimately executes memory_failure() to process the page with hardware
>>>>>> memory corruption. If the same SEA error appears multiple times
>>>>>> consecutively, it indicates that the previous handling was incomplete or
>>>>>> unable to resolve the fault. In such cases, it is more appropriate to
>>>>>> return a failure when encountering the same error again, and then proceed
>>>>>> to arm64_do_kernel_sea for further processing.
>>
>> There is no such function in the arm64 tree. If apei_claim_sea() returns
>
> Sorry for the mistake in the commit message. The function arm64_do_kernel_sea() should
> be arm64_notify_die().
>
>> an error, the actual fallback path in do_sea() is arm64_notify_die(),
>> which sends SIGBUS?
>>
>
> If apei_claim_sea() returns an error, arm64_notify_die() will call arm64_force_sig_fault(inf->sig /* SIGBUS */, , , ),
> followed by force_sig_fault(SIGBUS, , ) to force the process to receive the SIGBUS signal.
So the process is expected to be killed by SIGBUS?
>
>>>>>>
>>>>>> When hardware memory corruption occurs, a memory error interrupt is
>>>>>> triggered. If the kernel accesses this erroneous data, it will trigger
>>>>>> the SEA error exception handler. All such handlers will call
>>>>>> memory_failure() to handle the faulty page.
>>>>>>
>>>>>> If a memory error interrupt occurs first, followed by an SEA error
>>>>>> interrupt, the faulty page is first marked as poisoned by the memory error
>>>>>> interrupt process, and then the SEA error interrupt handling process will
>>>>>> send a SIGBUS signal to the process accessing the poisoned page.
>>>>>>
>>>>>> However, if the SEA interrupt is reported first, the following exceptional
>>>>>> scenario occurs:
>>>>>>
>>>>>> When a user process directly requests and accesses a page with hardware
>>>>>> memory corruption via mmap (such as with devmem), the page containing this
>>>>>> address may still be in a free buddy state in the kernel. At this point,
>>>>>> the page is marked as "poisoned" during the SEA claim's memory_failure().
>>>>>> However, since the process does not map the page through the kernel's
>>>>>> MMU, the kernel cannot send a SIGBUS signal to the process. And the memory
>>>>>> error interrupt handling path does not support sending SIGBUS. As a
>>>>>> result, the process continues to access the faulty page, causing
>>>>>> repeated entries into the SEA exception handler, which leads to
>>>>>> an SEA error interrupt storm.
>>
>> In that case, will the user process accessing the poisoned page be killed
>> by memory_failure()?
>>
>> // memory_failure():
>>
>> if (TestSetPageHWPoison(p)) {
>> res = -EHWPOISON;
>> if (flags & MF_ACTION_REQUIRED)
>> res = kill_accessing_process(current, pfn, flags);
>> if (flags & MF_COUNT_INCREASED)
>> put_page(p);
>> action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
>> goto unlock_mutex;
>> }
>>
>> I think this problem has already been fixed by commit 2e6053fea379 ("mm/memory-failure:
>> fix infinite UCE for VM_PFNMAP pfn").
>>
>> The root cause is that walk_page_range() skips VM_PFNMAP vmas by default when
>> no .test_walk callback is set, so kill_accessing_process() returns 0 for a
>> devmem-style mapping (remap_pfn_range, VM_PFNMAP), making the caller believe
>> the UCE was handled properly while the process was never actually killed.
>>
>> Did you try the latest kernel version?
>>
>
> I retested this issue on kernel v7.0.0-rc4 with the following debug patch and was still able to reproduce it.
>
>
> @@ -1365,8 +1365,11 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
> ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
>
> /* This error has been reported before, don't process it again. */
> - if (ghes_estatus_cached(estatus))
> + if (ghes_estatus_cached(estatus)) {
> + pr_info("This error has been reported before, don't process it again.\n");
> goto no_work;
> + }
>
> The test log (only some debug logs are retained here):
>
> [2026/3/24 14:51:58.199] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32 0
> [2026/3/24 14:51:58.369] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32
> [2026/3/24 14:51:58.458] [ 130.558038][ C40] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
> [2026/3/24 14:51:58.459] [ 130.572517][ C40] {1}[Hardware Error]: event severity: recoverable
> [2026/3/24 14:51:58.459] [ 130.578861][ C40] {1}[Hardware Error]: Error 0, type: recoverable
> [2026/3/24 14:51:58.459] [ 130.585203][ C40] {1}[Hardware Error]: section_type: ARM processor error
> [2026/3/24 14:51:58.459] [ 130.592238][ C40] {1}[Hardware Error]: MIDR: 0x0000000000000000
> [2026/3/24 14:51:58.459] [ 130.598492][ C40] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
> [2026/3/24 14:51:58.459] [ 130.607871][ C40] {1}[Hardware Error]: error affinity level: 0
> [2026/3/24 14:51:58.459] [ 130.614038][ C40] {1}[Hardware Error]: running state: 0x1
> [2026/3/24 14:51:58.459] [ 130.619770][ C40] {1}[Hardware Error]: Power State Coordination Interface state: 0
> [2026/3/24 14:51:58.459] [ 130.627673][ C40] {1}[Hardware Error]: Error info structure 0:
> [2026/3/24 14:51:58.459] [ 130.633839][ C40] {1}[Hardware Error]: num errors: 1
> [2026/3/24 14:51:58.459] [ 130.639137][ C40] {1}[Hardware Error]: error_type: 0, cache error
> [2026/3/24 14:51:58.459] [ 130.645652][ C40] {1}[Hardware Error]: error_info: 0x0000000020400014
> [2026/3/24 14:51:58.459] [ 130.652514][ C40] {1}[Hardware Error]: cache level: 1
> [2026/3/24 14:51:58.551] [ 130.658073][ C40] {1}[Hardware Error]: the error has not been corrected
> [2026/3/24 14:51:58.551] [ 130.665194][ C40] {1}[Hardware Error]: physical fault address: 0x0000001351811800
> [2026/3/24 14:51:58.551] [ 130.673097][ C40] {1}[Hardware Error]: Vendor specific error info has 48 bytes:
> [2026/3/24 14:51:58.551] [ 130.680744][ C40] {1}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
> [2026/3/24 14:51:58.551] [ 130.690471][ C40] {1}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
> [2026/3/24 14:51:58.552] [ 130.700198][ C40] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> [2026/3/24 14:51:58.552] [ 130.710083][ T9767] Memory failure: 0x1351811: recovery action for free buddy page: Recovered
> [2026/3/24 14:51:58.638] [ 130.790952][ C40] This error has been reported before, don't process it again.
> [2026/3/24 14:51:58.903] [ 131.046994][ C40] This error has been reported before, don't process it again.
> [2026/3/24 14:51:58.991] [ 131.132360][ C40] This error has been reported before, don't process it again.
> [2026/3/24 14:51:59.969] [ 132.071431][ C40] This error has been reported before, don't process it again.
> [2026/3/24 14:52:00.860] [ 133.010255][ C40] This error has been reported before, don't process it again.
> [2026/3/24 14:52:01.927] [ 134.034746][ C40] This error has been reported before, don't process it again.
> [2026/3/24 14:52:02.906] [ 135.058973][ C40] This error has been reported before, don't process it again.
> [2026/3/24 14:52:03.971] [ 136.083213][ C40] This error has been reported before, don't process it again.
> [2026/3/24 14:52:04.860] [ 137.021956][ C40] This error has been reported before, don't process it again.
> [2026/3/24 14:52:06.018] [ 138.131460][ C40] This error has been reported before, don't process it again.
> [2026/3/24 14:52:06.905] [ 139.070280][ C40] This error has been reported before, don't process it again.
> [2026/3/24 14:52:07.886] [ 140.009147][ C40] This error has been reported before, don't process it again.
> [2026/3/24 14:52:08.596] [ 140.777368][ C40] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
> [2026/3/24 14:52:08.683] [ 140.791921][ C40] {2}[Hardware Error]: event severity: recoverable
> [2026/3/24 14:52:08.683] [ 140.798263][ C40] {2}[Hardware Error]: Error 0, type: recoverable
> [2026/3/24 14:52:08.683] [ 140.804606][ C40] {2}[Hardware Error]: section_type: ARM processor error
> [2026/3/24 14:52:08.683] [ 140.811641][ C40] {2}[Hardware Error]: MIDR: 0x0000000000000000
> [2026/3/24 14:52:08.684] [ 140.817895][ C40] {2}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
> [2026/3/24 14:52:08.684] [ 140.827274][ C40] {2}[Hardware Error]: error affinity level: 0
> [2026/3/24 14:52:08.684] [ 140.833440][ C40] {2}[Hardware Error]: running state: 0x1
> [2026/3/24 14:52:08.684] [ 140.839173][ C40] {2}[Hardware Error]: Power State Coordination Interface state: 0
> [2026/3/24 14:52:08.684] [ 140.847076][ C40] {2}[Hardware Error]: Error info structure 0:
> [2026/3/24 14:52:08.684] [ 140.853241][ C40] {2}[Hardware Error]: num errors: 1
> [2026/3/24 14:52:08.684] [ 140.858540][ C40] {2}[Hardware Error]: error_type: 0, cache error
> [2026/3/24 14:52:08.684] [ 140.865055][ C40] {2}[Hardware Error]: error_info: 0x0000000020400014
> [2026/3/24 14:52:08.684] [ 140.871917][ C40] {2}[Hardware Error]: cache level: 1
> [2026/3/24 14:52:08.684] [ 140.877475][ C40] {2}[Hardware Error]: the error has not been corrected
> [2026/3/24 14:52:08.764] [ 140.884596][ C40] {2}[Hardware Error]: physical fault address: 0x0000001351811800
> [2026/3/24 14:52:08.764] [ 140.892499][ C40] {2}[Hardware Error]: Vendor specific error info has 48 bytes:
> [2026/3/24 14:52:08.766] [ 140.900145][ C40] {2}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
> [2026/3/24 14:52:08.767] [ 140.909872][ C40] {2}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
> [2026/3/24 14:52:08.767] [ 140.919598][ C40] {2}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> [2026/3/24 14:52:08.768] [ 140.929346][ T9767] Memory failure: 0x1351811: already hardware poisoned
> [2026/3/24 14:52:08.768] [ 140.936072][ T9767] Memory failure: 0x1351811: Sending SIGBUS to busybox:9767 due to hardware memory corruption
Did you cut off some logs here?
The error log also indicates that the SIGBUS is delivered as expected.
>
>
> After applying the patch:
>
> @@ -1365,8 +1365,11 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
> ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
>
> /* This error has been reported before, don't process it again. */
> - if (ghes_estatus_cached(estatus))
> + if (ghes_estatus_cached(estatus)) {
> + pr_info("This error has been reported before, don't process it again.\n");
> + rc = -ECANCELED;
> goto no_work;
> + }
>
> [2026/3/24 16:45:40.084] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32 0
> [2026/3/24 16:45:40.272] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32
> [2026/3/24 16:45:40.362] [ 112.279324][ C40] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
> [2026/3/24 16:45:40.362] [ 112.293797][ C40] {1}[Hardware Error]: event severity: recoverable
> [2026/3/24 16:45:40.362] [ 112.300139][ C40] {1}[Hardware Error]: Error 0, type: recoverable
> [2026/3/24 16:45:40.363] [ 112.306481][ C40] {1}[Hardware Error]: section_type: ARM processor error
> [2026/3/24 16:45:40.363] [ 112.313516][ C40] {1}[Hardware Error]: MIDR: 0x0000000000000000
> [2026/3/24 16:45:40.363] [ 112.319771][ C40] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
> [2026/3/24 16:45:40.363] [ 112.329151][ C40] {1}[Hardware Error]: error affinity level: 0
> [2026/3/24 16:45:40.363] [ 112.335317][ C40] {1}[Hardware Error]: running state: 0x1
> [2026/3/24 16:45:40.363] [ 112.341049][ C40] {1}[Hardware Error]: Power State Coordination Interface state: 0
> [2026/3/24 16:45:40.363] [ 112.348953][ C40] {1}[Hardware Error]: Error info structure 0:
> [2026/3/24 16:45:40.363] [ 112.355119][ C40] {1}[Hardware Error]: num errors: 1
> [2026/3/24 16:45:40.363] [ 112.360418][ C40] {1}[Hardware Error]: error_type: 0, cache error
> [2026/3/24 16:45:40.363] [ 112.366932][ C40] {1}[Hardware Error]: error_info: 0x0000000020400014
> [2026/3/24 16:45:40.363] [ 112.373795][ C40] {1}[Hardware Error]: cache level: 1
> [2026/3/24 16:45:40.453] [ 112.379354][ C40] {1}[Hardware Error]: the error has not been corrected
> [2026/3/24 16:45:40.453] [ 112.386475][ C40] {1}[Hardware Error]: physical fault address: 0x0000001351811800
> [2026/3/24 16:45:40.453] [ 112.394378][ C40] {1}[Hardware Error]: Vendor specific error info has 48 bytes:
> [2026/3/24 16:45:40.453] [ 112.402027][ C40] {1}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
> [2026/3/24 16:45:40.453] [ 112.411754][ C40] {1}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
> [2026/3/24 16:45:40.453] [ 112.421480][ C40] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> [2026/3/24 16:45:40.453] [ 112.431639][ T9769] Memory failure: 0x1351811: recovery action for free buddy page: Recovered
> [2026/3/24 16:45:40.531] [ 112.512520][ C40] This error has been reported before, don't process it again.
> [2026/3/24 16:45:40.757] Bus error (core dumped)
>
Thanks.
Shuai
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] ACPI: APEI: Handle repeated SEA error interrupts storm scenarios
2026-03-25 2:12 ` Shuai Xue
@ 2026-03-25 9:24 ` hejunhao
2026-03-25 12:40 ` Shuai Xue
0 siblings, 1 reply; 12+ messages in thread
From: hejunhao @ 2026-03-25 9:24 UTC (permalink / raw)
To: Shuai Xue, Rafael J. Wysocki, Luck, Tony
Cc: bp, guohanjun, mchehab, jarkko, yazen.ghannam, jane.chu, lenb,
Jonathan.Cameron, linux-acpi, linux-arm-kernel, linux-kernel,
linux-edac, shiju.jose, tanxiaofei, Linuxarm, Junhao He
On 2026/3/25 10:12, Shuai Xue wrote:
> Hi, junhao
>
> On 3/24/26 6:04 PM, hejunhao wrote:
>> Hi shuai xue,
>>
>>
>> On 2026/3/3 22:42, Shuai Xue wrote:
>>> Hi, junhao,
>>>
>>> On 2/27/26 8:12 PM, hejunhao wrote:
>>>>
>>>>
>>>> On 2025/11/4 9:32, Shuai Xue wrote:
>>>>>
>>>>>
>>>>> 在 2025/11/4 00:19, Rafael J. Wysocki 写道:
>>>>>> On Thu, Oct 30, 2025 at 8:13 AM Junhao He <hejunhao3@h-partners.com> wrote:
>>>>>>>
>>>>>>> The do_sea() function defaults to using firmware-first mode, if supported.
>>>>>>> It invokes the ACPI APEI GHES ghes_notify_sea() to report and handle the
>>>>>>> SEA error. GHES uses a buffer to cache the four most recent kinds of SEA
>>>>>>> errors. If the same kind of SEA error keeps occurring, GHES skips
>>>>>>> reporting it and does not add it to the "ghes_estatus_llist" list until
>>>>>>> the cache entry times out after 10 seconds, at which point the SEA error
>>>>>>> is processed again.
>>>>>>>
>>>>>>> GHES invokes ghes_proc_in_irq() to handle the SEA error, which ultimately
>>>>>>> executes memory_failure() to process the page with hardware memory
>>>>>>> corruption. If the same SEA error appears multiple times consecutively,
>>>>>>> it indicates that the previous handling was incomplete or unable to
>>>>>>> resolve the fault. In such cases, it is more appropriate to return a
>>>>>>> failure when encountering the same error again, and then proceed to
>>>>>>> arm64_do_kernel_sea for further processing.
>>>
>>> There is no such function in the arm64 tree. If apei_claim_sea() returns
>>
>> Sorry for the mistake in the commit message. The function arm64_do_kernel_sea() should
>> be arm64_notify_die().
>>
>>> an error, the actual fallback path in do_sea() is arm64_notify_die(),
>>> which sends SIGBUS?
>>>
>>
>> If apei_claim_sea() returns an error, arm64_notify_die() will call arm64_force_sig_fault(inf->sig /* SIGBUS */, , , ),
>> followed by force_sig_fault(SIGBUS, , ) to force the process to receive the SIGBUS signal.
>
> So the process is expected to be killed by SIGBUS?
Yes. The devmem process is expected to terminate upon receiving the SIGBUS
signal; you can see this in the last line of the test log after the patch is
applied. For other processes, whether they terminate depends on whether they
catch the signal; the kernel is only responsible for sending it immediately.
>
>>
>>>>>>>
>>>>>>> When hardware memory corruption occurs, a memory error interrupt is
>>>>>>> triggered. If the kernel accesses this erroneous data, it will trigger
>>>>>>> the SEA error exception handler. All such handlers will call
>>>>>>> memory_failure() to handle the faulty page.
>>>>>>>
>>>>>>> If a memory error interrupt occurs first, followed by an SEA error
>>>>>>> interrupt, the faulty page is first marked as poisoned by the memory error
>>>>>>> interrupt process, and then the SEA error interrupt handling process will
>>>>>>> send a SIGBUS signal to the process accessing the poisoned page.
>>>>>>>
>>>>>>> However, if the SEA interrupt is reported first, the following exceptional
>>>>>>> scenario occurs:
>>>>>>>
>>>>>>> When a user process directly requests and accesses a page with hardware
>>>>>>> memory corruption via mmap (such as with devmem), the page containing
>>>>>>> this address may still be in a free buddy state in the kernel. At this
>>>>>>> point, the page is marked as "poisoned" when the SEA claim calls
>>>>>>> memory_failure(). However, since the process does not obtain the page
>>>>>>> through the kernel's MMU, the kernel cannot send a SIGBUS signal to the
>>>>>>> process, and the memory error interrupt handling path does not support
>>>>>>> sending SIGBUS either. As a result, the process keeps accessing the
>>>>>>> faulty page, repeatedly re-entering the SEA exception handler, which
>>>>>>> leads to an SEA error interrupt storm.
>>>
>>> In such a case, will the user process accessing the poisoned page be killed
>>> by memory_failure()?
>>>
>>> // memory_failure():
>>>
>>> if (TestSetPageHWPoison(p)) {
>>> res = -EHWPOISON;
>>> if (flags & MF_ACTION_REQUIRED)
>>> res = kill_accessing_process(current, pfn, flags);
>>> if (flags & MF_COUNT_INCREASED)
>>> put_page(p);
>>> action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
>>> goto unlock_mutex;
>>> }
>>>
>>> I think this problem has already been fixed by commit 2e6053fea379 ("mm/memory-failure:
>>> fix infinite UCE for VM_PFNMAP pfn").
>>>
>>> The root cause is that walk_page_range() skips VM_PFNMAP vmas by default when
>>> no .test_walk callback is set, so kill_accessing_process() returns 0 for a
>>> devmem-style mapping (remap_pfn_range, VM_PFNMAP), making the caller believe
>>> the UCE was handled properly while the process was never actually killed.
>>>
>>> Did you try the latest kernel version?
>>>
>>
>> I retested this issue on the kernel v7.0.0-rc4 with the following debug patch and was still able to reproduce it.
>>
>>
>> @@ -1365,8 +1365,11 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
>> ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
>>
>> /* This error has been reported before, don't process it again. */
>> - if (ghes_estatus_cached(estatus))
>> + if (ghes_estatus_cached(estatus)) {
>> + pr_info("This error has been reported before, don't process it again.\n");
>> goto no_work;
>> + }
>>
>> The test log (only some debug logs are retained here):
>>
>> [2026/3/24 14:51:58.199] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32 0
>> [2026/3/24 14:51:58.369] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32
>> [2026/3/24 14:51:58.458] [ 130.558038][ C40] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>> [2026/3/24 14:51:58.459] [ 130.572517][ C40] {1}[Hardware Error]: event severity: recoverable
>> [2026/3/24 14:51:58.459] [ 130.578861][ C40] {1}[Hardware Error]: Error 0, type: recoverable
>> [2026/3/24 14:51:58.459] [ 130.585203][ C40] {1}[Hardware Error]: section_type: ARM processor error
>> [2026/3/24 14:51:58.459] [ 130.592238][ C40] {1}[Hardware Error]: MIDR: 0x0000000000000000
>> [2026/3/24 14:51:58.459] [ 130.598492][ C40] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
>> [2026/3/24 14:51:58.459] [ 130.607871][ C40] {1}[Hardware Error]: error affinity level: 0
>> [2026/3/24 14:51:58.459] [ 130.614038][ C40] {1}[Hardware Error]: running state: 0x1
>> [2026/3/24 14:51:58.459] [ 130.619770][ C40] {1}[Hardware Error]: Power State Coordination Interface state: 0
>> [2026/3/24 14:51:58.459] [ 130.627673][ C40] {1}[Hardware Error]: Error info structure 0:
>> [2026/3/24 14:51:58.459] [ 130.633839][ C40] {1}[Hardware Error]: num errors: 1
>> [2026/3/24 14:51:58.459] [ 130.639137][ C40] {1}[Hardware Error]: error_type: 0, cache error
>> [2026/3/24 14:51:58.459] [ 130.645652][ C40] {1}[Hardware Error]: error_info: 0x0000000020400014
>> [2026/3/24 14:51:58.459] [ 130.652514][ C40] {1}[Hardware Error]: cache level: 1
>> [2026/3/24 14:51:58.551] [ 130.658073][ C40] {1}[Hardware Error]: the error has not been corrected
>> [2026/3/24 14:51:58.551] [ 130.665194][ C40] {1}[Hardware Error]: physical fault address: 0x0000001351811800
>> [2026/3/24 14:51:58.551] [ 130.673097][ C40] {1}[Hardware Error]: Vendor specific error info has 48 bytes:
>> [2026/3/24 14:51:58.551] [ 130.680744][ C40] {1}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>> [2026/3/24 14:51:58.551] [ 130.690471][ C40] {1}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>> [2026/3/24 14:51:58.552] [ 130.700198][ C40] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>> [2026/3/24 14:51:58.552] [ 130.710083][ T9767] Memory failure: 0x1351811: recovery action for free buddy page: Recovered
>> [2026/3/24 14:51:58.638] [ 130.790952][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 14:51:58.903] [ 131.046994][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 14:51:58.991] [ 131.132360][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 14:51:59.969] [ 132.071431][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 14:52:00.860] [ 133.010255][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 14:52:01.927] [ 134.034746][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 14:52:02.906] [ 135.058973][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 14:52:03.971] [ 136.083213][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 14:52:04.860] [ 137.021956][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 14:52:06.018] [ 138.131460][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 14:52:06.905] [ 139.070280][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 14:52:07.886] [ 140.009147][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 14:52:08.596] [ 140.777368][ C40] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>> [2026/3/24 14:52:08.683] [ 140.791921][ C40] {2}[Hardware Error]: event severity: recoverable
>> [2026/3/24 14:52:08.683] [ 140.798263][ C40] {2}[Hardware Error]: Error 0, type: recoverable
>> [2026/3/24 14:52:08.683] [ 140.804606][ C40] {2}[Hardware Error]: section_type: ARM processor error
>> [2026/3/24 14:52:08.683] [ 140.811641][ C40] {2}[Hardware Error]: MIDR: 0x0000000000000000
>> [2026/3/24 14:52:08.684] [ 140.817895][ C40] {2}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
>> [2026/3/24 14:52:08.684] [ 140.827274][ C40] {2}[Hardware Error]: error affinity level: 0
>> [2026/3/24 14:52:08.684] [ 140.833440][ C40] {2}[Hardware Error]: running state: 0x1
>> [2026/3/24 14:52:08.684] [ 140.839173][ C40] {2}[Hardware Error]: Power State Coordination Interface state: 0
>> [2026/3/24 14:52:08.684] [ 140.847076][ C40] {2}[Hardware Error]: Error info structure 0:
>> [2026/3/24 14:52:08.684] [ 140.853241][ C40] {2}[Hardware Error]: num errors: 1
>> [2026/3/24 14:52:08.684] [ 140.858540][ C40] {2}[Hardware Error]: error_type: 0, cache error
>> [2026/3/24 14:52:08.684] [ 140.865055][ C40] {2}[Hardware Error]: error_info: 0x0000000020400014
>> [2026/3/24 14:52:08.684] [ 140.871917][ C40] {2}[Hardware Error]: cache level: 1
>> [2026/3/24 14:52:08.684] [ 140.877475][ C40] {2}[Hardware Error]: the error has not been corrected
>> [2026/3/24 14:52:08.764] [ 140.884596][ C40] {2}[Hardware Error]: physical fault address: 0x0000001351811800
>> [2026/3/24 14:52:08.764] [ 140.892499][ C40] {2}[Hardware Error]: Vendor specific error info has 48 bytes:
>> [2026/3/24 14:52:08.766] [ 140.900145][ C40] {2}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>> [2026/3/24 14:52:08.767] [ 140.909872][ C40] {2}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>> [2026/3/24 14:52:08.767] [ 140.919598][ C40] {2}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>> [2026/3/24 14:52:08.768] [ 140.929346][ T9767] Memory failure: 0x1351811: already hardware poisoned
>> [2026/3/24 14:52:08.768] [ 140.936072][ T9767] Memory failure: 0x1351811: Sending SIGBUS to busybox:9767 due to hardware memory corruption
>
> Did you cut off some logs here?
I just removed some of the duplicate debug logs ("This error has been reported
before..."), which I had added myself.
> The error log also indicates that the SIGBUS is delivered as expected.
The first SEA error occurs at kernel time 130.558038. Only after 10 seconds can
the kernel re-enter the SEA processing flow and send the SIGBUS signal to the
process. This 10-second delay corresponds to the cache timeout threshold used
by ghes_estatus_cached().
Therefore, the purpose of this patch is to send the SIGBUS signal to the
process immediately, rather than waiting for the timeout to expire.
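For illustration, the suppression window can be modelled like this (a toy
sketch; the names, slot count, and linear lookup are hypothetical
simplifications, not the kernel's ghes_estatus cache implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CACHE_SLOTS	4	/* most recent kinds of errors kept */
#define EXPIRE_SEC	10	/* suppression window, as in GHES */

struct toy_cache_entry {
	char kind[64];
	long cached_at;		/* seconds; -1 means the slot is empty */
};

static struct toy_cache_entry toy_cache[CACHE_SLOTS] = {
	{ "", -1 }, { "", -1 }, { "", -1 }, { "", -1 },
};

/* Returns true if the error should be reported now, false if it is
 * suppressed because the same kind was reported within EXPIRE_SEC. */
static bool report_error(const char *kind, long now)
{
	int i, victim = 0;

	for (i = 0; i < CACHE_SLOTS; i++) {
		if (toy_cache[i].cached_at >= 0 &&
		    strcmp(toy_cache[i].kind, kind) == 0) {
			if (now - toy_cache[i].cached_at < EXPIRE_SEC)
				return false;		/* still cached */
			toy_cache[i].cached_at = now;	/* expired: refresh */
			return true;
		}
		if (toy_cache[i].cached_at < 0)
			victim = i;
	}
	strncpy(toy_cache[victim].kind, kind,
		sizeof(toy_cache[victim].kind) - 1);
	toy_cache[victim].cached_at = now;
	return true;
}
```

With the timestamps from the log above, the first report at ~130 s is
accepted, every retry before ~140 s is suppressed, and only then is the same
error processed again, which is exactly the 10-second SIGBUS delay.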
>
>>
>>
>> Apply the patch:
>>
>> @@ -1365,8 +1365,11 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
>> ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
>>
>> /* This error has been reported before, don't process it again. */
>> - if (ghes_estatus_cached(estatus))
>> + if (ghes_estatus_cached(estatus)) {
>> + pr_info("This error has been reported before, don't process it again.\n");
>> + rc = -ECANCELED;
>> goto no_work;
>> + }
>>
>> [2026/3/24 16:45:40.084] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32 0
>> [2026/3/24 16:45:40.272] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32
>> [2026/3/24 16:45:40.362] [ 112.279324][ C40] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>> [2026/3/24 16:45:40.362] [ 112.293797][ C40] {1}[Hardware Error]: event severity: recoverable
>> [2026/3/24 16:45:40.362] [ 112.300139][ C40] {1}[Hardware Error]: Error 0, type: recoverable
>> [2026/3/24 16:45:40.363] [ 112.306481][ C40] {1}[Hardware Error]: section_type: ARM processor error
>> [2026/3/24 16:45:40.363] [ 112.313516][ C40] {1}[Hardware Error]: MIDR: 0x0000000000000000
>> [2026/3/24 16:45:40.363] [ 112.319771][ C40] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
>> [2026/3/24 16:45:40.363] [ 112.329151][ C40] {1}[Hardware Error]: error affinity level: 0
>> [2026/3/24 16:45:40.363] [ 112.335317][ C40] {1}[Hardware Error]: running state: 0x1
>> [2026/3/24 16:45:40.363] [ 112.341049][ C40] {1}[Hardware Error]: Power State Coordination Interface state: 0
>> [2026/3/24 16:45:40.363] [ 112.348953][ C40] {1}[Hardware Error]: Error info structure 0:
>> [2026/3/24 16:45:40.363] [ 112.355119][ C40] {1}[Hardware Error]: num errors: 1
>> [2026/3/24 16:45:40.363] [ 112.360418][ C40] {1}[Hardware Error]: error_type: 0, cache error
>> [2026/3/24 16:45:40.363] [ 112.366932][ C40] {1}[Hardware Error]: error_info: 0x0000000020400014
>> [2026/3/24 16:45:40.363] [ 112.373795][ C40] {1}[Hardware Error]: cache level: 1
>> [2026/3/24 16:45:40.453] [ 112.379354][ C40] {1}[Hardware Error]: the error has not been corrected
>> [2026/3/24 16:45:40.453] [ 112.386475][ C40] {1}[Hardware Error]: physical fault address: 0x0000001351811800
>> [2026/3/24 16:45:40.453] [ 112.394378][ C40] {1}[Hardware Error]: Vendor specific error info has 48 bytes:
>> [2026/3/24 16:45:40.453] [ 112.402027][ C40] {1}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>> [2026/3/24 16:45:40.453] [ 112.411754][ C40] {1}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>> [2026/3/24 16:45:40.453] [ 112.421480][ C40] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>> [2026/3/24 16:45:40.453] [ 112.431639][ T9769] Memory failure: 0x1351811: recovery action for free buddy page: Recovered
>> [2026/3/24 16:45:40.531] [ 112.512520][ C40] This error has been reported before, don't process it again.
>> [2026/3/24 16:45:40.757] Bus error (core dumped)
>>
>
>
> Thanks.
> Shuai
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] ACPI: APEI: Handle repeated SEA error interrupts storm scenarios
2026-03-25 9:24 ` hejunhao
@ 2026-03-25 12:40 ` Shuai Xue
2026-03-26 13:26 ` hejunhao
0 siblings, 1 reply; 12+ messages in thread
From: Shuai Xue @ 2026-03-25 12:40 UTC (permalink / raw)
To: hejunhao, Rafael J. Wysocki, Luck, Tony
Cc: bp, guohanjun, mchehab, jarkko, yazen.ghannam, jane.chu, lenb,
Jonathan.Cameron, linux-acpi, linux-arm-kernel, linux-kernel,
linux-edac, shiju.jose, tanxiaofei, Linuxarm
On 3/25/26 5:24 PM, hejunhao wrote:
>
>
> On 2026/3/25 10:12, Shuai Xue wrote:
>> Hi, junhao
>>
>> On 3/24/26 6:04 PM, hejunhao wrote:
>>> Hi shuai xue,
>>>
>>>
>>> On 2026/3/3 22:42, Shuai Xue wrote:
>>>> Hi, junhao,
>>>>
>>>> On 2/27/26 8:12 PM, hejunhao wrote:
>>>>>
>>>>>
>>>>> On 2025/11/4 9:32, Shuai Xue wrote:
>>>>>>
>>>>>>
>>>>>> 在 2025/11/4 00:19, Rafael J. Wysocki 写道:
>>>>>>> On Thu, Oct 30, 2025 at 8:13 AM Junhao He <hejunhao3@h-partners.com> wrote:
>>>>>>>>
>>>>>>>> The do_sea() function defaults to using firmware-first mode, if supported.
>>>>>>>> It invokes the ACPI APEI GHES ghes_notify_sea() to report and handle the
>>>>>>>> SEA error. GHES uses a buffer to cache the four most recent kinds of SEA
>>>>>>>> errors. If the same kind of SEA error keeps occurring, GHES skips
>>>>>>>> reporting it and does not add it to the "ghes_estatus_llist" list until
>>>>>>>> the cache entry times out after 10 seconds, at which point the SEA error
>>>>>>>> is processed again.
>>>>>>>>
>>>>>>>> GHES invokes ghes_proc_in_irq() to handle the SEA error, which ultimately
>>>>>>>> executes memory_failure() to process the page with hardware memory
>>>>>>>> corruption. If the same SEA error appears multiple times consecutively,
>>>>>>>> it indicates that the previous handling was incomplete or unable to
>>>>>>>> resolve the fault. In such cases, it is more appropriate to return a
>>>>>>>> failure when encountering the same error again, and then proceed to
>>>>>>>> arm64_do_kernel_sea for further processing.
>>>>
>>>> There is no such function in the arm64 tree. If apei_claim_sea() returns
>>>
>>> Sorry for the mistake in the commit message. The function arm64_do_kernel_sea() should
>>> be arm64_notify_die().
>>>
>>>> an error, the actual fallback path in do_sea() is arm64_notify_die(),
>>>> which sends SIGBUS?
>>>>
>>>
>>> If apei_claim_sea() returns an error, arm64_notify_die() will call arm64_force_sig_fault(inf->sig /* SIGBUS */, , , ),
>>> followed by force_sig_fault(SIGBUS, , ) to force the process to receive the SIGBUS signal.
>>
>> So the process is expected to be killed by SIGBUS?
>
> Yes. The devmem process is expected to terminate upon receiving the SIGBUS
> signal; you can see this in the last line of the test log after the patch is
> applied. For other processes, whether they terminate depends on whether they
> catch the signal; the kernel is only responsible for sending it immediately.
>
>>
>>>
>>>>>>>>
>>>>>>>> When hardware memory corruption occurs, a memory error interrupt is
>>>>>>>> triggered. If the kernel accesses this erroneous data, it will trigger
>>>>>>>> the SEA error exception handler. All such handlers will call
>>>>>>>> memory_failure() to handle the faulty page.
>>>>>>>>
>>>>>>>> If a memory error interrupt occurs first, followed by an SEA error
>>>>>>>> interrupt, the faulty page is first marked as poisoned by the memory error
>>>>>>>> interrupt process, and then the SEA error interrupt handling process will
>>>>>>>> send a SIGBUS signal to the process accessing the poisoned page.
>>>>>>>>
>>>>>>>> However, if the SEA interrupt is reported first, the following exceptional
>>>>>>>> scenario occurs:
>>>>>>>>
>>>>>>>> When a user process directly requests and accesses a page with hardware
>>>>>>>> memory corruption via mmap (such as with devmem), the page containing
>>>>>>>> this address may still be in a free buddy state in the kernel. At this
>>>>>>>> point, the page is marked as "poisoned" when the SEA claim calls
>>>>>>>> memory_failure(). However, since the process does not obtain the page
>>>>>>>> through the kernel's MMU, the kernel cannot send a SIGBUS signal to the
>>>>>>>> process, and the memory error interrupt handling path does not support
>>>>>>>> sending SIGBUS either. As a result, the process keeps accessing the
>>>>>>>> faulty page, repeatedly re-entering the SEA exception handler, which
>>>>>>>> leads to an SEA error interrupt storm.
>>>>
>>>> In such a case, will the user process accessing the poisoned page be killed
>>>> by memory_failure()?
>>>>
>>>> // memory_failure():
>>>>
>>>> if (TestSetPageHWPoison(p)) {
>>>> res = -EHWPOISON;
>>>> if (flags & MF_ACTION_REQUIRED)
>>>> res = kill_accessing_process(current, pfn, flags);
>>>> if (flags & MF_COUNT_INCREASED)
>>>> put_page(p);
>>>> action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
>>>> goto unlock_mutex;
>>>> }
>>>>
>>>> I think this problem has already been fixed by commit 2e6053fea379 ("mm/memory-failure:
>>>> fix infinite UCE for VM_PFNMAP pfn").
>>>>
>>>> The root cause is that walk_page_range() skips VM_PFNMAP vmas by default when
>>>> no .test_walk callback is set, so kill_accessing_process() returns 0 for a
>>>> devmem-style mapping (remap_pfn_range, VM_PFNMAP), making the caller believe
>>>> the UCE was handled properly while the process was never actually killed.
>>>>
>>>> Did you try the latest kernel version?
>>>>
>>>
>>> I retested this issue on the kernel v7.0.0-rc4 with the following debug patch and was still able to reproduce it.
>>>
>>>
>>> @@ -1365,8 +1365,11 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
>>> ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
>>>
>>> /* This error has been reported before, don't process it again. */
>>> - if (ghes_estatus_cached(estatus))
>>> + if (ghes_estatus_cached(estatus)) {
>>> + pr_info("This error has been reported before, don't process it again.\n");
>>> goto no_work;
>>> + }
>>>
>>> The test log (only some debug logs are retained here):
>>>
>>> [2026/3/24 14:51:58.199] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32 0
>>> [2026/3/24 14:51:58.369] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32
>>> [2026/3/24 14:51:58.458] [ 130.558038][ C40] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>>> [2026/3/24 14:51:58.459] [ 130.572517][ C40] {1}[Hardware Error]: event severity: recoverable
>>> [2026/3/24 14:51:58.459] [ 130.578861][ C40] {1}[Hardware Error]: Error 0, type: recoverable
>>> [2026/3/24 14:51:58.459] [ 130.585203][ C40] {1}[Hardware Error]: section_type: ARM processor error
>>> [2026/3/24 14:51:58.459] [ 130.592238][ C40] {1}[Hardware Error]: MIDR: 0x0000000000000000
>>> [2026/3/24 14:51:58.459] [ 130.598492][ C40] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
>>> [2026/3/24 14:51:58.459] [ 130.607871][ C40] {1}[Hardware Error]: error affinity level: 0
>>> [2026/3/24 14:51:58.459] [ 130.614038][ C40] {1}[Hardware Error]: running state: 0x1
>>> [2026/3/24 14:51:58.459] [ 130.619770][ C40] {1}[Hardware Error]: Power State Coordination Interface state: 0
>>> [2026/3/24 14:51:58.459] [ 130.627673][ C40] {1}[Hardware Error]: Error info structure 0:
>>> [2026/3/24 14:51:58.459] [ 130.633839][ C40] {1}[Hardware Error]: num errors: 1
>>> [2026/3/24 14:51:58.459] [ 130.639137][ C40] {1}[Hardware Error]: error_type: 0, cache error
>>> [2026/3/24 14:51:58.459] [ 130.645652][ C40] {1}[Hardware Error]: error_info: 0x0000000020400014
>>> [2026/3/24 14:51:58.459] [ 130.652514][ C40] {1}[Hardware Error]: cache level: 1
>>> [2026/3/24 14:51:58.551] [ 130.658073][ C40] {1}[Hardware Error]: the error has not been corrected
>>> [2026/3/24 14:51:58.551] [ 130.665194][ C40] {1}[Hardware Error]: physical fault address: 0x0000001351811800
>>> [2026/3/24 14:51:58.551] [ 130.673097][ C40] {1}[Hardware Error]: Vendor specific error info has 48 bytes:
>>> [2026/3/24 14:51:58.551] [ 130.680744][ C40] {1}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>>> [2026/3/24 14:51:58.551] [ 130.690471][ C40] {1}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>>> [2026/3/24 14:51:58.552] [ 130.700198][ C40] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>> [2026/3/24 14:51:58.552] [ 130.710083][ T9767] Memory failure: 0x1351811: recovery action for free buddy page: Recovered
>>> [2026/3/24 14:51:58.638] [ 130.790952][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:51:58.903] [ 131.046994][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:51:58.991] [ 131.132360][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:51:59.969] [ 132.071431][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:00.860] [ 133.010255][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:01.927] [ 134.034746][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:02.906] [ 135.058973][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:03.971] [ 136.083213][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:04.860] [ 137.021956][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:06.018] [ 138.131460][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:06.905] [ 139.070280][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:07.886] [ 140.009147][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:08.596] [ 140.777368][ C40] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>>> [2026/3/24 14:52:08.683] [ 140.791921][ C40] {2}[Hardware Error]: event severity: recoverable
>>> [2026/3/24 14:52:08.683] [ 140.798263][ C40] {2}[Hardware Error]: Error 0, type: recoverable
>>> [2026/3/24 14:52:08.683] [ 140.804606][ C40] {2}[Hardware Error]: section_type: ARM processor error
>>> [2026/3/24 14:52:08.683] [ 140.811641][ C40] {2}[Hardware Error]: MIDR: 0x0000000000000000
>>> [2026/3/24 14:52:08.684] [ 140.817895][ C40] {2}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
>>> [2026/3/24 14:52:08.684] [ 140.827274][ C40] {2}[Hardware Error]: error affinity level: 0
>>> [2026/3/24 14:52:08.684] [ 140.833440][ C40] {2}[Hardware Error]: running state: 0x1
>>> [2026/3/24 14:52:08.684] [ 140.839173][ C40] {2}[Hardware Error]: Power State Coordination Interface state: 0
>>> [2026/3/24 14:52:08.684] [ 140.847076][ C40] {2}[Hardware Error]: Error info structure 0:
>>> [2026/3/24 14:52:08.684] [ 140.853241][ C40] {2}[Hardware Error]: num errors: 1
>>> [2026/3/24 14:52:08.684] [ 140.858540][ C40] {2}[Hardware Error]: error_type: 0, cache error
>>> [2026/3/24 14:52:08.684] [ 140.865055][ C40] {2}[Hardware Error]: error_info: 0x0000000020400014
>>> [2026/3/24 14:52:08.684] [ 140.871917][ C40] {2}[Hardware Error]: cache level: 1
>>> [2026/3/24 14:52:08.684] [ 140.877475][ C40] {2}[Hardware Error]: the error has not been corrected
>>> [2026/3/24 14:52:08.764] [ 140.884596][ C40] {2}[Hardware Error]: physical fault address: 0x0000001351811800
>>> [2026/3/24 14:52:08.764] [ 140.892499][ C40] {2}[Hardware Error]: Vendor specific error info has 48 bytes:
>>> [2026/3/24 14:52:08.766] [ 140.900145][ C40] {2}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>>> [2026/3/24 14:52:08.767] [ 140.909872][ C40] {2}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>>> [2026/3/24 14:52:08.767] [ 140.919598][ C40] {2}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>> [2026/3/24 14:52:08.768] [ 140.929346][ T9767] Memory failure: 0x1351811: already hardware poisoned
>>> [2026/3/24 14:52:08.768] [ 140.936072][ T9767] Memory failure: 0x1351811: Sending SIGBUS to busybox:9767 due to hardware memory corruption
>>
>> Did you cut off some logs here?
>
> I just removed some of the duplicate debug logs ("This error has been reported
> before..."), which I had added myself.
>> The error log also indicates that the SIGBUS is delivered as expected.
>
> The first SEA error occurs at kernel time 130.558038. Only after 10 seconds can
> the kernel re-enter the SEA processing flow and send the SIGBUS signal to the
> process. This 10-second delay corresponds to the cache timeout threshold used
> by ghes_estatus_cached().
> Therefore, the purpose of this patch is to send the SIGBUS signal to the
> process immediately, rather than waiting for the timeout to expire.
Hi, hejun,
Sorry, but I am still not convinced by the log you provided.
As I understand your commit message, there are two different cases being discussed:
Case 1: memory error interrupt first, then SEA
When hardware memory corruption occurs, a memory error interrupt is
triggered first. If the kernel later accesses the corrupted data, it may
then enter the SEA handler. In this case, the faulty page would already
have been marked poisoned by the memory error interrupt path, and the SEA
handling path would eventually send SIGBUS to the task accessing that page.
Case 2: SEA first, then memory error interrupt
Your commit message describes this as the problematic scenario:
A user process directly accesses hardware-corrupted memory through a
PFNMAP-style mapping such as devmem. The page may still be in the free
buddy state when SEA is handled first. In that case, memory_failure()
poisons the page during SEA handling, but the process is not killed
immediately. Since the task continues accessing the same corrupted
location, it keeps re-entering the SEA handler, leading to an SEA storm.
Later, the memory error interrupt path also cannot kill the task, so the
system remains stuck in this repeated SEA loop.
My concern is that your recent explanation and log seem to demonstrate
something different from what the commit message claims to fix.
From the log, what I can see is:

- the first SEA occurs,
- the page is marked poisoned as a free buddy page,
- repeated SEAs are suppressed by ghes_estatus_cached(),
- after the cache timeout expires, the SEA path runs again,
- then memory_failure() reports "already hardware poisoned" and SIGBUS is
  sent to the busybox devmem process.
This seems to show a delayed SIGBUS delivery caused by the GHES cache
timeout, rather than clearly demonstrating the SEA storm problem described
in the commit message.
So I think there is still a mismatch here:
If the patch is intended to fix the SEA storm described in case 2,
then I would expect evidence that the storm still exists on the latest
kernel and that this patch is what actually breaks that loop.
If instead the patch is intended to avoid the 10-second delay before
SIGBUS delivery, then that should be stated explicitly, because that is
a different problem statement from what the current commit message says.
Also, regarding the devmem/PFNMAP case: I previously pointed to commit
2e6053fea379 ("mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn"),
which was meant to address the failure to kill tasks accessing poisoned
VM_PFNMAP mappings.
So my main question is:
Does the SEA storm issue still exist on the latest kernel version, or is
the remaining issue only that SIGBUS is delayed by the GHES estatus cache
timeout?
I think the answer to that question is important before deciding whether
this patch is correct, and before finalizing the commit message.
Thanks,
Shuai
* Re: [PATCH] ACPI: APEI: Handle repeated SEA error interrupts storm scenarios
2026-03-25 12:40 ` Shuai Xue
@ 2026-03-26 13:26 ` hejunhao
2026-04-07 2:23 ` Shuai Xue
0 siblings, 1 reply; 12+ messages in thread
From: hejunhao @ 2026-03-26 13:26 UTC (permalink / raw)
To: Shuai Xue, Rafael J. Wysocki, Luck, Tony
Cc: bp, guohanjun, mchehab, jarkko, yazen.ghannam, jane.chu, lenb,
Jonathan.Cameron, linux-acpi, linux-arm-kernel, linux-kernel,
linux-edac, shiju.jose, tanxiaofei, Linuxarm, Junhao He
On 2026/3/25 20:40, Shuai Xue wrote:
>
>
> On 3/25/26 5:24 PM, hejunhao wrote:
>>
>>
>> On 2026/3/25 10:12, Shuai Xue wrote:
>>> Hi, junhao
>>>
>>> On 3/24/26 6:04 PM, hejunhao wrote:
>>>> Hi shuai xue,
>>>>
>>>>
>>>> On 2026/3/3 22:42, Shuai Xue wrote:
>>>>> Hi, junhao,
>>>>>
>>>>> On 2/27/26 8:12 PM, hejunhao wrote:
>>>>>>
>>>>>>
>>>>>> On 2025/11/4 9:32, Shuai Xue wrote:
>>>>>>>
>>>>>>>
>>>>>>> 在 2025/11/4 00:19, Rafael J. Wysocki 写道:
>>>>>>>> On Thu, Oct 30, 2025 at 8:13 AM Junhao He <hejunhao3@h-partners.com> wrote:
>>>>>>>>>
>>>>>>>>> The do_sea() function defaults to using firmware-first mode, if supported.
>>>>>>>>> It invoke acpi/apei/ghes ghes_notify_sea() to report and handling the SEA
>>>>>>>>> error, The GHES uses a buffer to cache the most recent 4 kinds of SEA
>>>>>>>>> errors. If the same kind SEA error continues to occur, GHES will skip to
>>>>>>>>> reporting this SEA error and will not add it to the "ghes_estatus_llist"
>>>>>>>>> list until the cache times out after 10 seconds, at which point the SEA
>>>>>>>>> error will be reprocessed.
>>>>>>>>>
>>>>>>>>> The GHES invoke ghes_proc_in_irq() to handle the SEA error, which
>>>>>>>>> ultimately executes memory_failure() to process the page with hardware
>>>>>>>>> memory corruption. If the same SEA error appears multiple times
>>>>>>>>> consecutively, it indicates that the previous handling was incomplete or
>>>>>>>>> unable to resolve the fault. In such cases, it is more appropriate to
>>>>>>>>> return a failure when encountering the same error again, and then proceed
>>>>>>>>> to arm64_do_kernel_sea for further processing.
>>>>>
>>>>> There is no such function in the arm64 tree. If apei_claim_sea() returns
>>>>
>>>> Sorry for the mistake in the commit message. The function arm64_do_kernel_sea() should
>>>> be arm64_notify_die().
>>>>
>>>>> an error, the actual fallback path in do_sea() is arm64_notify_die(),
>>>>> which sends SIGBUS?
>>>>>
>>>>
>>>> If apei_claim_sea() returns an error, arm64_notify_die() will call arm64_force_sig_fault(inf->sig /* SIGBUS */, , , ),
>>>> followed by force_sig_fault(SIGBUS, , ) to force the process to receive the SIGBUS signal.
>>>
>>> So the process is expected to killed by SIGBUS?
>>
>> Yes. The devmem process is expected to terminate upon receiving a SIGBUS signal, you can
>> see this at the last line of the test log after the patch is applied.
>> For other processes whether it terminates depends on whether it catches the signal; the kernel is
>> responsible for sending it immediately.
>>
>>>
>>>>
>>>>>>>>>
>>>>>>>>> When hardware memory corruption occurs, a memory error interrupt is
>>>>>>>>> triggered. If the kernel accesses this erroneous data, it will trigger
>>>>>>>>> the SEA error exception handler. All such handlers will call
>>>>>>>>> memory_failure() to handle the faulty page.
>>>>>>>>>
>>>>>>>>> If a memory error interrupt occurs first, followed by an SEA error
>>>>>>>>> interrupt, the faulty page is first marked as poisoned by the memory error
>>>>>>>>> interrupt process, and then the SEA error interrupt handling process will
>>>>>>>>> send a SIGBUS signal to the process accessing the poisoned page.
>>>>>>>>>
>>>>>>>>> However, if the SEA interrupt is reported first, the following exceptional
>>>>>>>>> scenario occurs:
>>>>>>>>>
>>>>>>>>> When a user process directly requests and accesses a page with hardware
>>>>>>>>> memory corruption via mmap (such as with devmem), the page containing this
>>>>>>>>> address may still be in a free buddy state in the kernel. At this point,
>>>>>>>>> the page is marked as "poisoned" during the SEA claim memory_failure().
>>>>>>>>> However, since the process does not request the page through the kernel's
>>>>>>>>> MMU, the kernel cannot send SIGBUS signal to the processes. And the memory
>>>>>>>>> error interrupt handling process not support send SIGBUS signal. As a
>>>>>>>>> result, these processes continues to access the faulty page, causing
>>>>>>>>> repeated entries into the SEA exception handler. At this time, it lead to
>>>>>>>>> an SEA error interrupt storm.
>>>>>
>>>>> In such case, the user process which accessing the poisoned page will be killed
>>>>> by memory_fauilre?
>>>>>
>>>>> // memory_failure():
>>>>>
>>>>> if (TestSetPageHWPoison(p)) {
>>>>> res = -EHWPOISON;
>>>>> if (flags & MF_ACTION_REQUIRED)
>>>>> res = kill_accessing_process(current, pfn, flags);
>>>>> if (flags & MF_COUNT_INCREASED)
>>>>> put_page(p);
>>>>> action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
>>>>> goto unlock_mutex;
>>>>> }
>>>>>
>>>>> I think this problem has already been fixed by commit 2e6053fea379 ("mm/memory-failure:
>>>>> fix infinite UCE for VM_PFNMAP pfn").
>>>>>
>>>>> The root cause is that walk_page_range() skips VM_PFNMAP vmas by default when
>>>>> no .test_walk callback is set, so kill_accessing_process() returns 0 for a
>>>>> devmem-style mapping (remap_pfn_range, VM_PFNMAP), making the caller believe
>>>>> the UCE was handled properly while the process was never actually killed.
>>>>>
>>>>> Did you try the lastest kernel version?
>>>>>
>>>>
>>>> I retested this issue on the kernel v7.0.0-rc4 with the following debug patch and was still able to reproduce it.
>>>>
>>>>
>>>> @@ -1365,8 +1365,11 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
>>>> ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
>>>>
>>>> /* This error has been reported before, don't process it again. */
>>>> - if (ghes_estatus_cached(estatus))
>>>> + if (ghes_estatus_cached(estatus)) {
>>>> + pr_info("This error has been reported before, don't process it again.\n");
>>>> goto no_work;
>>>> + }
>>>>
>>>> the test log Only some debug logs are retained here.
>>>>
>>>> [2026/3/24 14:51:58.199] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32 0
>>>> [2026/3/24 14:51:58.369] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32
>>>> [2026/3/24 14:51:58.458] [ 130.558038][ C40] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>>>> [2026/3/24 14:51:58.459] [ 130.572517][ C40] {1}[Hardware Error]: event severity: recoverable
>>>> [2026/3/24 14:51:58.459] [ 130.578861][ C40] {1}[Hardware Error]: Error 0, type: recoverable
>>>> [2026/3/24 14:51:58.459] [ 130.585203][ C40] {1}[Hardware Error]: section_type: ARM processor error
>>>> [2026/3/24 14:51:58.459] [ 130.592238][ C40] {1}[Hardware Error]: MIDR: 0x0000000000000000
>>>> [2026/3/24 14:51:58.459] [ 130.598492][ C40] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
>>>> [2026/3/24 14:51:58.459] [ 130.607871][ C40] {1}[Hardware Error]: error affinity level: 0
>>>> [2026/3/24 14:51:58.459] [ 130.614038][ C40] {1}[Hardware Error]: running state: 0x1
>>>> [2026/3/24 14:51:58.459] [ 130.619770][ C40] {1}[Hardware Error]: Power State Coordination Interface state: 0
>>>> [2026/3/24 14:51:58.459] [ 130.627673][ C40] {1}[Hardware Error]: Error info structure 0:
>>>> [2026/3/24 14:51:58.459] [ 130.633839][ C40] {1}[Hardware Error]: num errors: 1
>>>> [2026/3/24 14:51:58.459] [ 130.639137][ C40] {1}[Hardware Error]: error_type: 0, cache error
>>>> [2026/3/24 14:51:58.459] [ 130.645652][ C40] {1}[Hardware Error]: error_info: 0x0000000020400014
>>>> [2026/3/24 14:51:58.459] [ 130.652514][ C40] {1}[Hardware Error]: cache level: 1
>>>> [2026/3/24 14:51:58.551] [ 130.658073][ C40] {1}[Hardware Error]: the error has not been corrected
>>>> [2026/3/24 14:51:58.551] [ 130.665194][ C40] {1}[Hardware Error]: physical fault address: 0x0000001351811800
>>>> [2026/3/24 14:51:58.551] [ 130.673097][ C40] {1}[Hardware Error]: Vendor specific error info has 48 bytes:
>>>> [2026/3/24 14:51:58.551] [ 130.680744][ C40] {1}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>>>> [2026/3/24 14:51:58.551] [ 130.690471][ C40] {1}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>>>> [2026/3/24 14:51:58.552] [ 130.700198][ C40] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>>> [2026/3/24 14:51:58.552] [ 130.710083][ T9767] Memory failure: 0x1351811: recovery action for free buddy page: Recovered
>>>> [2026/3/24 14:51:58.638] [ 130.790952][ C40] This error has been reported before, don't process it again.
>>>> [2026/3/24 14:51:58.903] [ 131.046994][ C40] This error has been reported before, don't process it again.
>>>> [2026/3/24 14:51:58.991] [ 131.132360][ C40] This error has been reported before, don't process it again.
>>>> [2026/3/24 14:51:59.969] [ 132.071431][ C40] This error has been reported before, don't process it again.
>>>> [2026/3/24 14:52:00.860] [ 133.010255][ C40] This error has been reported before, don't process it again.
>>>> [2026/3/24 14:52:01.927] [ 134.034746][ C40] This error has been reported before, don't process it again.
>>>> [2026/3/24 14:52:02.906] [ 135.058973][ C40] This error has been reported before, don't process it again.
>>>> [2026/3/24 14:52:03.971] [ 136.083213][ C40] This error has been reported before, don't process it again.
>>>> [2026/3/24 14:52:04.860] [ 137.021956][ C40] This error has been reported before, don't process it again.
>>>> [2026/3/24 14:52:06.018] [ 138.131460][ C40] This error has been reported before, don't process it again.
>>>> [2026/3/24 14:52:06.905] [ 139.070280][ C40] This error has been reported before, don't process it again.
>>>> [2026/3/24 14:52:07.886] [ 140.009147][ C40] This error has been reported before, don't process it again.
>>>> [2026/3/24 14:52:08.596] [ 140.777368][ C40] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>>>> [2026/3/24 14:52:08.683] [ 140.791921][ C40] {2}[Hardware Error]: event severity: recoverable
>>>> [2026/3/24 14:52:08.683] [ 140.798263][ C40] {2}[Hardware Error]: Error 0, type: recoverable
>>>> [2026/3/24 14:52:08.683] [ 140.804606][ C40] {2}[Hardware Error]: section_type: ARM processor error
>>>> [2026/3/24 14:52:08.683] [ 140.811641][ C40] {2}[Hardware Error]: MIDR: 0x0000000000000000
>>>> [2026/3/24 14:52:08.684] [ 140.817895][ C40] {2}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
>>>> [2026/3/24 14:52:08.684] [ 140.827274][ C40] {2}[Hardware Error]: error affinity level: 0
>>>> [2026/3/24 14:52:08.684] [ 140.833440][ C40] {2}[Hardware Error]: running state: 0x1
>>>> [2026/3/24 14:52:08.684] [ 140.839173][ C40] {2}[Hardware Error]: Power State Coordination Interface state: 0
>>>> [2026/3/24 14:52:08.684] [ 140.847076][ C40] {2}[Hardware Error]: Error info structure 0:
>>>> [2026/3/24 14:52:08.684] [ 140.853241][ C40] {2}[Hardware Error]: num errors: 1
>>>> [2026/3/24 14:52:08.684] [ 140.858540][ C40] {2}[Hardware Error]: error_type: 0, cache error
>>>> [2026/3/24 14:52:08.684] [ 140.865055][ C40] {2}[Hardware Error]: error_info: 0x0000000020400014
>>>> [2026/3/24 14:52:08.684] [ 140.871917][ C40] {2}[Hardware Error]: cache level: 1
>>>> [2026/3/24 14:52:08.684] [ 140.877475][ C40] {2}[Hardware Error]: the error has not been corrected
>>>> [2026/3/24 14:52:08.764] [ 140.884596][ C40] {2}[Hardware Error]: physical fault address: 0x0000001351811800
>>>> [2026/3/24 14:52:08.764] [ 140.892499][ C40] {2}[Hardware Error]: Vendor specific error info has 48 bytes:
>>>> [2026/3/24 14:52:08.766] [ 140.900145][ C40] {2}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>>>> [2026/3/24 14:52:08.767] [ 140.909872][ C40] {2}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>>>> [2026/3/24 14:52:08.767] [ 140.919598][ C40] {2}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>>> [2026/3/24 14:52:08.768] [ 140.929346][ T9767] Memory failure: 0x1351811: already hardware poisoned
>>>> [2026/3/24 14:52:08.768] [ 140.936072][ T9767] Memory failure: 0x1351811: Sending SIGBUS to busybox:9767 due to hardware memory corruption
>>>
>>> Did you cut off some logs here?
>>
>> I just removed some duplicate debug logs: "This error has already been...", these were added by myself.
Hi, Shuai
Compared with the original commit message and the logs reproducing this issue
on kernel v7.0.0-rc4, perhaps you are asking whether the current log is missing
information such as 'NOTICE: SEA Handle'?
Those missing lines come from the firmware; to reduce serial output, the
firmware hides these debug prints. However, my own custom debug logs show that
the kernel's do_sea() path keeps running throughout the 10-second cache
timeout, even though only one debug log per second is retained.
This confirms that the issue is still present on the latest kernel v7.0.0-rc4.
>>> The error log also indicates that the SIGBUS is delivered as expected.
>>
>> An SError occurs at kernel time 130.558038. Then, after 10 seconds, the kernel
>> can re-enter the SEA processing flow and send the SIGBUS signal to the process.
>> This 10-second delay corresponds to the cache timeout threshold of the
>> ghes_estatus_cached() feature.
>> Therefore, the purpose of this patch is to send the SIGBUS signal to the process
>> immediately, rather than waiting for the timeout to expire.
>
> Hi, hejun,
>
> Sorry, but I am still not convinced by the log you provided.
>
> As I understand your commit message, there are two different cases being discussed:
>
> Case 1: memory error interrupt first, then SEA
>
> When hardware memory corruption occurs, a memory error interrupt is
> triggered first. If the kernel later accesses the corrupted data, it may
> then enter the SEA handler. In this case, the faulty page would already
> have been marked poisoned by the memory error interrupt path, and the SEA
> handling path would eventually send SIGBUS to the task accessing that page.
>
> Case 2: SEA first, then memory error interrupt
>
> Your commit message describes this as the problematic scenario:
>
> A user process directly accesses hardware-corrupted memory through a
> PFNMAP-style mapping such as devmem. The page may still be in the free
> buddy state when SEA is handled first. In that case, memory_failure()
> poisons the page during SEA handling, but the process is not killed
> immediately. Since the task continues accessing the same corrupted
> location, it keeps re-entering the SEA handler, leading to an SEA storm.
> Later, the memory error interrupt path also cannot kill the task, so the
> system remains stuck in this repeated SEA loop.
Yes.
>
> My concern is that your recent explanation and log seem to demonstrate
> something different from what the commit message claims to fix.
>
> From the log, what I can see is:
>
> the first SEA occurs,
> the page is marked poisoned as a free buddy page,
> repeated SEAs are suppressed by ghes_estatus_cached(),
> after the cache timeout expires, the SEA path runs again,
> then memory_failure() reports "already hardware poisoned" and SIGBUS is
> sent to the busybox devmem process.
> This seems to show a delayed SIGBUS delivery caused by the GHES cache
> timeout, rather than clearly demonstrating the SEA storm problem described
> in the commit message.
>
> So I think there is still a mismatch here:
>
> If the patch is intended to fix the SEA storm described in case 2,
> then I would expect evidence that the storm still exists on the latest
> kernel and that this patch is what actually breaks that loop.
> If instead the patch is intended to avoid the 10-second delay before
> SIGBUS delivery, then that should be stated explicitly, because that is
> a different problem statement from what the current commit message says.
> Also, regarding the devmem/PFNMAP case: I previously pointed to commit
> 2e6053fea379 ("mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn"),
> which was meant to address the failure to kill tasks accessing poisoned
> VM_PFNMAP mappings.
>
Commit 2e6053fea379 was already merged before kernel v7.0.0-rc4, so it cannot
be the fix for this issue. I reverted the patch on kernel v7.0.0-rc4 to
reproduce the issue. The debug logs show that the message 'This error has been
reported before...' persists for more than 10 seconds and the printing never
stops, so that commit fixes a different issue.
> So my main question is:
>
> Does the SEA storm issue still exist on the latest kernel version, or is
> the remaining issue only that SIGBUS is delayed by the GHES estatus cache
> timeout?
We should not treat them separately.
In case 2, the first SEA can only poison the page, and the task then re-enters
the SEA processing flow. Due to the reporting throttle in ghes_estatus_cached(),
the SEA path cannot invoke memory_failure() in time to kill the task, so the
task keeps accessing the same corrupted location and re-enters the SEA
processing flow in a loop, causing the SEA storm.
Perhaps I never clearly explained why the SEA storm occurred.
Best regards,
Junhao.
>
> I think the answer to that question is important before deciding whether
> this patch is correct, and before finalizing the commit message.
>
> Thanks,
> Shuai
>
> .
>
* Re: [PATCH] ACPI: APEI: Handle repeated SEA error interrupts storm scenarios
2026-03-26 13:26 ` hejunhao
@ 2026-04-07 2:23 ` Shuai Xue
2026-04-09 3:10 ` hejunhao
0 siblings, 1 reply; 12+ messages in thread
From: Shuai Xue @ 2026-04-07 2:23 UTC (permalink / raw)
To: hejunhao, Rafael J. Wysocki, Luck, Tony, linmiaohe@huawei.com
Cc: bp, guohanjun, mchehab, jarkko, yazen.ghannam, jane.chu, lenb,
Jonathan.Cameron, linux-acpi, linux-arm-kernel, linux-kernel,
linux-edac, shiju.jose, tanxiaofei, Linuxarm
On 3/26/26 9:26 PM, hejunhao wrote:
>
> On 2026/3/25 20:40, Shuai Xue wrote:
>>
>>
>> On 3/25/26 5:24 PM, hejunhao wrote:
>>>
>>>
>>> On 2026/3/25 10:12, Shuai Xue wrote:
>>>> Hi, junhao
>>>>
>>>> On 3/24/26 6:04 PM, hejunhao wrote:
>>>>> Hi shuai xue,
>>>>>
>>>>>
>>>>> On 2026/3/3 22:42, Shuai Xue wrote:
>>>>>> Hi, junhao,
>>>>>>
>>>>>> On 2/27/26 8:12 PM, hejunhao wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2025/11/4 9:32, Shuai Xue wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> 在 2025/11/4 00:19, Rafael J. Wysocki 写道:
>>>>>>>>> On Thu, Oct 30, 2025 at 8:13 AM Junhao He <hejunhao3@h-partners.com> wrote:
>>>>>>>>>>
>>>>>>>>>> The do_sea() function defaults to using firmware-first mode, if supported.
>>>>>>>>>> It invoke acpi/apei/ghes ghes_notify_sea() to report and handling the SEA
>>>>>>>>>> error, The GHES uses a buffer to cache the most recent 4 kinds of SEA
>>>>>>>>>> errors. If the same kind SEA error continues to occur, GHES will skip to
>>>>>>>>>> reporting this SEA error and will not add it to the "ghes_estatus_llist"
>>>>>>>>>> list until the cache times out after 10 seconds, at which point the SEA
>>>>>>>>>> error will be reprocessed.
>>>>>>>>>>
>>>>>>>>>> The GHES invoke ghes_proc_in_irq() to handle the SEA error, which
>>>>>>>>>> ultimately executes memory_failure() to process the page with hardware
>>>>>>>>>> memory corruption. If the same SEA error appears multiple times
>>>>>>>>>> consecutively, it indicates that the previous handling was incomplete or
>>>>>>>>>> unable to resolve the fault. In such cases, it is more appropriate to
>>>>>>>>>> return a failure when encountering the same error again, and then proceed
>>>>>>>>>> to arm64_do_kernel_sea for further processing.
>>>>>>
>>>>>> There is no such function in the arm64 tree. If apei_claim_sea() returns
>>>>>
>>>>> Sorry for the mistake in the commit message. The function arm64_do_kernel_sea() should
>>>>> be arm64_notify_die().
>>>>>
>>>>>> an error, the actual fallback path in do_sea() is arm64_notify_die(),
>>>>>> which sends SIGBUS?
>>>>>>
>>>>>
>>>>> If apei_claim_sea() returns an error, arm64_notify_die() will call arm64_force_sig_fault(inf->sig /* SIGBUS */, , , ),
>>>>> followed by force_sig_fault(SIGBUS, , ) to force the process to receive the SIGBUS signal.
>>>>
>>>> So the process is expected to killed by SIGBUS?
>>>
>>> Yes. The devmem process is expected to terminate upon receiving a SIGBUS signal, you can
>>> see this at the last line of the test log after the patch is applied.
>>> For other processes whether it terminates depends on whether it catches the signal; the kernel is
>>> responsible for sending it immediately.
>>>
>>>>
>>>>>
>>>>>>>>>>
>>>>>>>>>> When hardware memory corruption occurs, a memory error interrupt is
>>>>>>>>>> triggered. If the kernel accesses this erroneous data, it will trigger
>>>>>>>>>> the SEA error exception handler. All such handlers will call
>>>>>>>>>> memory_failure() to handle the faulty page.
>>>>>>>>>>
>>>>>>>>>> If a memory error interrupt occurs first, followed by an SEA error
>>>>>>>>>> interrupt, the faulty page is first marked as poisoned by the memory error
>>>>>>>>>> interrupt process, and then the SEA error interrupt handling process will
>>>>>>>>>> send a SIGBUS signal to the process accessing the poisoned page.
>>>>>>>>>>
>>>>>>>>>> However, if the SEA interrupt is reported first, the following exceptional
>>>>>>>>>> scenario occurs:
>>>>>>>>>>
>>>>>>>>>> When a user process directly requests and accesses a page with hardware
>>>>>>>>>> memory corruption via mmap (such as with devmem), the page containing this
>>>>>>>>>> address may still be in a free buddy state in the kernel. At this point,
>>>>>>>>>> the page is marked as "poisoned" during the SEA claim memory_failure().
>>>>>>>>>> However, since the process does not request the page through the kernel's
>>>>>>>>>> MMU, the kernel cannot send SIGBUS signal to the processes. And the memory
>>>>>>>>>> error interrupt handling process not support send SIGBUS signal. As a
>>>>>>>>>> result, these processes continues to access the faulty page, causing
>>>>>>>>>> repeated entries into the SEA exception handler. At this time, it lead to
>>>>>>>>>> an SEA error interrupt storm.
>>>>>>
>>>>>> In such case, the user process which accessing the poisoned page will be killed
>>>>>> by memory_fauilre?
>>>>>>
>>>>>> // memory_failure():
>>>>>>
>>>>>> if (TestSetPageHWPoison(p)) {
>>>>>> res = -EHWPOISON;
>>>>>> if (flags & MF_ACTION_REQUIRED)
>>>>>> res = kill_accessing_process(current, pfn, flags);
>>>>>> if (flags & MF_COUNT_INCREASED)
>>>>>> put_page(p);
>>>>>> action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
>>>>>> goto unlock_mutex;
>>>>>> }
>>>>>>
>>>>>> I think this problem has already been fixed by commit 2e6053fea379 ("mm/memory-failure:
>>>>>> fix infinite UCE for VM_PFNMAP pfn").
>>>>>>
>>>>>> The root cause is that walk_page_range() skips VM_PFNMAP vmas by default when
>>>>>> no .test_walk callback is set, so kill_accessing_process() returns 0 for a
>>>>>> devmem-style mapping (remap_pfn_range, VM_PFNMAP), making the caller believe
>>>>>> the UCE was handled properly while the process was never actually killed.
>>>>>>
>>>>>> Did you try the lastest kernel version?
>>>>>>
>>>>>
>>>>> I retested this issue on the kernel v7.0.0-rc4 with the following debug patch and was still able to reproduce it.
>>>>>
>>>>>
>>>>> [... debug patch and test log trimmed; quoted in full earlier in the thread ...]
>>>>
>>>> Did you cut off some logs here?
>>>
>>> I just removed some duplicate debug logs: "This error has already been...", these were added by myself.
>
> Hi, Shuai
Hi, Junhao,
Sorry for the late reply.
>
> Compared to the original commit message and the logs reproducing this issue
> on kernel v7.0.0-rc4, perhaps you are asking whether the current log is missing
> information such as 'NOTICE: SEA Handle'?
> These miss logs are from the firmware. To reduce serial output, the firmware has
> hidden these debug prints. However, using my own custom debug logs, I can
> still see that the kernel's do_sea() process is continuously running during the
> 10-second cache timeout. Although only one debug log is retained per second.
> This confirms that the issue is still present on the latest kernel v7.0.0-rc4.
>
>>>> The error log also indicates that the SIGBUS is delivered as expected.
>>>
>>> An SError occurs at kernel time 130.558038. Then, after 10 seconds, the kernel
>>> can re-enter the SEA processing flow and send the SIGBUS signal to the process.
>>> This 10-second delay corresponds to the cache timeout threshold of the
>>> ghes_estatus_cached() feature.
>>> Therefore, the purpose of this patch is to send the SIGBUS signal to the process
>>> immediately, rather than waiting for the timeout to expire.
>>
>> Hi, hejun,
>>
>> Sorry, but I am still not convinced by the log you provided.
>>
>> As I understand your commit message, there are two different cases being discussed:
>>
>> Case 1: memory error interrupt first, then SEA
>>
>> When hardware memory corruption occurs, a memory error interrupt is
>> triggered first. If the kernel later accesses the corrupted data, it may
>> then enter the SEA handler. In this case, the faulty page would already
>> have been marked poisoned by the memory error interrupt path, and the SEA
>> handling path would eventually send SIGBUS to the task accessing that page.
>>
>> Case 2: SEA first, then memory error interrupt
>>
>> Your commit message describes this as the problematic scenario:
>>
>> A user process directly accesses hardware-corrupted memory through a
>> PFNMAP-style mapping such as devmem. The page may still be in the free
>> buddy state when SEA is handled first. In that case, memory_failure()
>> poisons the page during SEA handling, but the process is not killed
>> immediately. Since the task continues accessing the same corrupted
>> location, it keeps re-entering the SEA handler, leading to an SEA storm.
>> Later, the memory error interrupt path also cannot kill the task, so the
>> system remains stuck in this repeated SEA loop.
> Yes.
>>
>> My concern is that your recent explanation and log seem to demonstrate
>> something different from what the commit message claims to fix.
>>
>> From the log, what I can see is:
>>
>> the first SEA occurs,
>> the page is marked poisoned as a free buddy page,
>> repeated SEAs are suppressed by ghes_estatus_cached(),
>> after the cache timeout expires, the SEA path runs again,
>> then memory_failure() reports "already hardware poisoned" and SIGBUS is
>> sent to the busybox devmem process.
>> This seems to show a delayed SIGBUS delivery caused by the GHES cache
>> timeout, rather than clearly demonstrating the SEA storm problem described
>> in the commit message.
>>
>> So I think there is still a mismatch here:
>>
>> If the patch is intended to fix the SEA storm described in case 2,
>> then I would expect evidence that the storm still exists on the latest
>> kernel and that this patch is what actually breaks that loop.
>> If instead the patch is intended to avoid the 10-second delay before
>> SIGBUS delivery, then that should be stated explicitly, because that is
>> a different problem statement from what the current commit message says.
>> Also, regarding the devmem/PFNMAP case: I previously pointed to commit
>> 2e6053fea379 ("mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn"),
>> which was meant to address the failure to kill tasks accessing poisoned
>> VM_PFNMAP mappings.
>>
>
> That patch was already merged before kernel v7.0.0-rc4, so it cannot be the fix for this issue.
>
> I reverted the patch on kernel v7.0.0-rc4 to reproduce the issue.
> The debug logs show that the message 'This error has already been...' persists
> for more than 10 seconds and the printing cannot be stopped, so that patch fixes a different issue.
Thanks for confirming.
>
>> So my main question is:
>>
>> Does the SEA storm issue still exist on the latest kernel version, or is
>> the remaining issue only that SIGBUS is delayed by the GHES estatus cache
>> timeout?
>
> We should not treat them separately.
Agreed. Please update the commit message to explain the causal chain explicitly:
- The first SEA poisons the free buddy page but does not kill the
accessing task, because memory_failure() takes the free-buddy recovery
path and never reaches kill_accessing_process().
- The task re-enters the SEA handler repeatedly, but
ghes_estatus_cached() suppresses all subsequent entries during the
10-second window, preventing ghes_do_proc() from being called and
blocking the MF_ACTION_REQUIRED-based SIGBUS delivery.
- This suppression is what sustains the SEA storm.
>
> In case 2, the first SEA can only poison the page, after which the task re-enters the
> SEA processing flow. Due to the reporting throttle in ghes_estatus_cached(), the SEA
> path cannot invoke memory_failure() in time to kill the task, so the task keeps
> accessing the same corrupted location and re-enters the SEA processing flow in a loop,
> causing the SEA storm...
> Perhaps I never clearly explained why the SEA storm occurred.
+cc Lin Miaohe for the memory_failure() discussion.
Regarding the memory_failure() path: since SEA is a synchronous
notification, is_hest_sync_notify() returns true, ghes_do_proc() sets sync
= true, and MF_ACTION_REQUIRED is passed into ghes_do_memory_failure().
This means that on the second and subsequent SEAs (after cache expiry),
memory_failure() would reach the already-poisoned branch and call
kill_accessing_process() to terminate the task:
if (TestSetPageHWPoison(p)) {
res = -EHWPOISON;
if (flags & MF_ACTION_REQUIRED)
res = kill_accessing_process(current, pfn, flags);
if (flags & MF_COUNT_INCREASED)
put_page(p);
action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
goto unlock_mutex;
}
The patch short-circuits this by terminating the task earlier, via
arm64_notify_die(), on every cache-suppressed SEA. I have no objection
to killing the process early in this way.
+cc Tony Luck for the ghes_notify_nmi path.
One concern is the impact on ghes_notify_nmi().
ghes_in_nmi_queue_one_entry() is shared between two callers:
ghes_notify_sea() → ghes_in_nmi_spool_from_list(&ghes_sea, ...)
ghes_notify_nmi() → ghes_in_nmi_spool_from_list(&ghes_nmi, ...)
For the NMI path, if ghes_estatus_cached() hits and
ghes_in_nmi_queue_one_entry() now returns -ECANCELED instead of 0,
ghes_in_nmi_spool_from_list() will not set ret = 0, and ghes_notify_nmi()
will return NMI_DONE instead of NMI_HANDLED. This tells the NMI handler
chain that no handler claimed the interrupt, which is semantically
incorrect — an active hardware error was observed, but deliberately
suppressed by the cache. NMI errors are asynchronous (sync = false,
MF_ACTION_REQUIRED not set), so there is no practical impact on the kill
path. However, returning NMI_DONE for a cache-suppressed NMI could cause
spurious warnings from the NMI dispatcher on some platforms. To avoid
this, I suggest scoping the -ECANCELED return to the synchronous (SEA)
case only. One approach is to pass a bool sync parameter down through
ghes_in_nmi_spool_from_list() and ghes_in_nmi_queue_one_entry(), returning
-ECANCELED on cache-hit only when sync is true. Alternatively, this
logic can be handled at the ghes_notify_sea() call site directly.
Thanks.
Shuai
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] ACPI: APEI: Handle repeated SEA error interrupts storm scenarios
2026-04-07 2:23 ` Shuai Xue
@ 2026-04-09 3:10 ` hejunhao
0 siblings, 0 replies; 12+ messages in thread
From: hejunhao @ 2026-04-09 3:10 UTC (permalink / raw)
To: Shuai Xue, Rafael J. Wysocki, Luck, Tony, linmiaohe@huawei.com
Cc: bp, guohanjun, mchehab, jarkko, yazen.ghannam, jane.chu, lenb,
Jonathan.Cameron, linux-acpi, linux-arm-kernel, linux-kernel,
linux-edac, shiju.jose, tanxiaofei, Linuxarm
On 2026/4/7 10:23, Shuai Xue wrote:
>
>
> On 3/26/26 9:26 PM, hejunhao wrote:
>>
>> On 2026/3/25 20:40, Shuai Xue wrote:
>>>
>>>
>>> On 3/25/26 5:24 PM, hejunhao wrote:
>>>>
>>>>
>>>> On 2026/3/25 10:12, Shuai Xue wrote:
>>>>> Hi, junhao
>>>>>
>>>>> On 3/24/26 6:04 PM, hejunhao wrote:
>>>>>> Hi shuai xue,
>>>>>>
>>>>>>
>>>>>> On 2026/3/3 22:42, Shuai Xue wrote:
>>>>>>> Hi, junhao,
>>>>>>>
>>>>>>> On 2/27/26 8:12 PM, hejunhao wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2025/11/4 9:32, Shuai Xue wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 在 2025/11/4 00:19, Rafael J. Wysocki 写道:
>>>>>>>>>> On Thu, Oct 30, 2025 at 8:13 AM Junhao He <hejunhao3@h-partners.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> The do_sea() function defaults to using firmware-first mode, if supported.
>>>>>>>>>>> It invoke acpi/apei/ghes ghes_notify_sea() to report and handling the SEA
>>>>>>>>>>> error, The GHES uses a buffer to cache the most recent 4 kinds of SEA
>>>>>>>>>>> errors. If the same kind SEA error continues to occur, GHES will skip to
>>>>>>>>>>> reporting this SEA error and will not add it to the "ghes_estatus_llist"
>>>>>>>>>>> list until the cache times out after 10 seconds, at which point the SEA
>>>>>>>>>>> error will be reprocessed.
>>>>>>>>>>>
>>>>>>>>>>> The GHES invoke ghes_proc_in_irq() to handle the SEA error, which
>>>>>>>>>>> ultimately executes memory_failure() to process the page with hardware
>>>>>>>>>>> memory corruption. If the same SEA error appears multiple times
>>>>>>>>>>> consecutively, it indicates that the previous handling was incomplete or
>>>>>>>>>>> unable to resolve the fault. In such cases, it is more appropriate to
>>>>>>>>>>> return a failure when encountering the same error again, and then proceed
>>>>>>>>>>> to arm64_do_kernel_sea for further processing.
>>>>>>>
>>>>>>> There is no such function in the arm64 tree. If apei_claim_sea() returns
>>>>>>
>>>>>> Sorry for the mistake in the commit message. The function arm64_do_kernel_sea() should
>>>>>> be arm64_notify_die().
>>>>>>
>>>>>>> an error, the actual fallback path in do_sea() is arm64_notify_die(),
>>>>>>> which sends SIGBUS?
>>>>>>>
>>>>>>
>>>>>> If apei_claim_sea() returns an error, arm64_notify_die() will call arm64_force_sig_fault(inf->sig /* SIGBUS */, , , ),
>>>>>> followed by force_sig_fault(SIGBUS, , ) to force the process to receive the SIGBUS signal.
>>>>>
>>>>> So the process is expected to be killed by SIGBUS?
>>>>
>>>> Yes. The devmem process is expected to terminate upon receiving a SIGBUS signal; you can
>>>> see this in the last line of the test log after the patch is applied.
>>>> For other processes, whether they terminate depends on whether they catch the signal; the
>>>> kernel is responsible for sending it immediately.
>>>>
>>>>>
>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> When hardware memory corruption occurs, a memory error interrupt is
>>>>>>>>>>> triggered. If the kernel accesses this erroneous data, it will trigger
>>>>>>>>>>> the SEA error exception handler. All such handlers will call
>>>>>>>>>>> memory_failure() to handle the faulty page.
>>>>>>>>>>>
>>>>>>>>>>> If a memory error interrupt occurs first, followed by an SEA error
>>>>>>>>>>> interrupt, the faulty page is first marked as poisoned by the memory error
>>>>>>>>>>> interrupt process, and then the SEA error interrupt handling process will
>>>>>>>>>>> send a SIGBUS signal to the process accessing the poisoned page.
>>>>>>>>>>>
>>>>>>>>>>> However, if the SEA interrupt is reported first, the following exceptional
>>>>>>>>>>> scenario occurs:
>>>>>>>>>>>
>>>>>>>>>>> When a user process directly requests and accesses a page with hardware
>>>>>>>>>>> memory corruption via mmap (such as with devmem), the page containing this
>>>>>>>>>>> address may still be in a free buddy state in the kernel. At this point,
>>>>>>>>>>> the page is marked as "poisoned" during the SEA claim memory_failure().
>>>>>>>>>>> However, since the process does not request the page through the kernel's
>>>>>>>>>>> MMU, the kernel cannot send SIGBUS signal to the processes. And the memory
>>>>>>>>>>> error interrupt handling process not support send SIGBUS signal. As a
>>>>>>>>>>> result, these processes continues to access the faulty page, causing
>>>>>>>>>>> repeated entries into the SEA exception handler. At this time, it lead to
>>>>>>>>>>> an SEA error interrupt storm.
>>>>>>>
>>>>>>> In such case, the user process which accessing the poisoned page will be killed
>>>>>>> by memory_fauilre?
>>>>>>>
>>>>>>> // memory_failure():
>>>>>>>
>>>>>>> if (TestSetPageHWPoison(p)) {
>>>>>>> res = -EHWPOISON;
>>>>>>> if (flags & MF_ACTION_REQUIRED)
>>>>>>> res = kill_accessing_process(current, pfn, flags);
>>>>>>> if (flags & MF_COUNT_INCREASED)
>>>>>>> put_page(p);
>>>>>>> action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
>>>>>>> goto unlock_mutex;
>>>>>>> }
>>>>>>>
>>>>>>> I think this problem has already been fixed by commit 2e6053fea379 ("mm/memory-failure:
>>>>>>> fix infinite UCE for VM_PFNMAP pfn").
>>>>>>>
>>>>>>> The root cause is that walk_page_range() skips VM_PFNMAP vmas by default when
>>>>>>> no .test_walk callback is set, so kill_accessing_process() returns 0 for a
>>>>>>> devmem-style mapping (remap_pfn_range, VM_PFNMAP), making the caller believe
>>>>>>> the UCE was handled properly while the process was never actually killed.
>>>>>>>
>>>>>>> Did you try the lastest kernel version?
>>>>>>>
>>>>>>
>>>>>> I retested this issue on the kernel v7.0.0-rc4 with the following debug patch and was still able to reproduce it.
>>>>>>
>>>>>>
>>>>>> @@ -1365,8 +1365,11 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
>>>>>> ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
>>>>>>
>>>>>> /* This error has been reported before, don't process it again. */
>>>>>> - if (ghes_estatus_cached(estatus))
>>>>>> + if (ghes_estatus_cached(estatus)) {
>>>>>> + pr_info("This error has been reported before, don't process it again.\n");
>>>>>> goto no_work;
>>>>>> + }
>>>>>>
>>>>>> the test log Only some debug logs are retained here.
>>>>>>
>>>>>> [2026/3/24 14:51:58.199] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32 0
>>>>>> [2026/3/24 14:51:58.369] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32
>>>>>> [2026/3/24 14:51:58.458] [ 130.558038][ C40] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>>>>>> [2026/3/24 14:51:58.459] [ 130.572517][ C40] {1}[Hardware Error]: event severity: recoverable
>>>>>> [2026/3/24 14:51:58.459] [ 130.578861][ C40] {1}[Hardware Error]: Error 0, type: recoverable
>>>>>> [2026/3/24 14:51:58.459] [ 130.585203][ C40] {1}[Hardware Error]: section_type: ARM processor error
>>>>>> [2026/3/24 14:51:58.459] [ 130.592238][ C40] {1}[Hardware Error]: MIDR: 0x0000000000000000
>>>>>> [2026/3/24 14:51:58.459] [ 130.598492][ C40] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
>>>>>> [2026/3/24 14:51:58.459] [ 130.607871][ C40] {1}[Hardware Error]: error affinity level: 0
>>>>>> [2026/3/24 14:51:58.459] [ 130.614038][ C40] {1}[Hardware Error]: running state: 0x1
>>>>>> [2026/3/24 14:51:58.459] [ 130.619770][ C40] {1}[Hardware Error]: Power State Coordination Interface state: 0
>>>>>> [2026/3/24 14:51:58.459] [ 130.627673][ C40] {1}[Hardware Error]: Error info structure 0:
>>>>>> [2026/3/24 14:51:58.459] [ 130.633839][ C40] {1}[Hardware Error]: num errors: 1
>>>>>> [2026/3/24 14:51:58.459] [ 130.639137][ C40] {1}[Hardware Error]: error_type: 0, cache error
>>>>>> [2026/3/24 14:51:58.459] [ 130.645652][ C40] {1}[Hardware Error]: error_info: 0x0000000020400014
>>>>>> [2026/3/24 14:51:58.459] [ 130.652514][ C40] {1}[Hardware Error]: cache level: 1
>>>>>> [2026/3/24 14:51:58.551] [ 130.658073][ C40] {1}[Hardware Error]: the error has not been corrected
>>>>>> [2026/3/24 14:51:58.551] [ 130.665194][ C40] {1}[Hardware Error]: physical fault address: 0x0000001351811800
>>>>>> [2026/3/24 14:51:58.551] [ 130.673097][ C40] {1}[Hardware Error]: Vendor specific error info has 48 bytes:
>>>>>> [2026/3/24 14:51:58.551] [ 130.680744][ C40] {1}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>>>>>> [2026/3/24 14:51:58.551] [ 130.690471][ C40] {1}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>>>>>> [2026/3/24 14:51:58.552] [ 130.700198][ C40] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>>>>> [2026/3/24 14:51:58.552] [ 130.710083][ T9767] Memory failure: 0x1351811: recovery action for free buddy page: Recovered
>>>>>> [2026/3/24 14:51:58.638] [ 130.790952][ C40] This error has been reported before, don't process it again.
>>>>>> [2026/3/24 14:51:58.903] [ 131.046994][ C40] This error has been reported before, don't process it again.
>>>>>> [2026/3/24 14:51:58.991] [ 131.132360][ C40] This error has been reported before, don't process it again.
>>>>>> [2026/3/24 14:51:59.969] [ 132.071431][ C40] This error has been reported before, don't process it again.
>>>>>> [2026/3/24 14:52:00.860] [ 133.010255][ C40] This error has been reported before, don't process it again.
>>>>>> [2026/3/24 14:52:01.927] [ 134.034746][ C40] This error has been reported before, don't process it again.
>>>>>> [2026/3/24 14:52:02.906] [ 135.058973][ C40] This error has been reported before, don't process it again.
>>>>>> [2026/3/24 14:52:03.971] [ 136.083213][ C40] This error has been reported before, don't process it again.
>>>>>> [2026/3/24 14:52:04.860] [ 137.021956][ C40] This error has been reported before, don't process it again.
>>>>>> [2026/3/24 14:52:06.018] [ 138.131460][ C40] This error has been reported before, don't process it again.
>>>>>> [2026/3/24 14:52:06.905] [ 139.070280][ C40] This error has been reported before, don't process it again.
>>>>>> [2026/3/24 14:52:07.886] [ 140.009147][ C40] This error has been reported before, don't process it again.
>>>>>> [2026/3/24 14:52:08.596] [ 140.777368][ C40] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>>>>>> [2026/3/24 14:52:08.683] [ 140.791921][ C40] {2}[Hardware Error]: event severity: recoverable
>>>>>> [2026/3/24 14:52:08.683] [ 140.798263][ C40] {2}[Hardware Error]: Error 0, type: recoverable
>>>>>> [2026/3/24 14:52:08.683] [ 140.804606][ C40] {2}[Hardware Error]: section_type: ARM processor error
>>>>>> [2026/3/24 14:52:08.683] [ 140.811641][ C40] {2}[Hardware Error]: MIDR: 0x0000000000000000
>>>>>> [2026/3/24 14:52:08.684] [ 140.817895][ C40] {2}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
>>>>>> [2026/3/24 14:52:08.684] [ 140.827274][ C40] {2}[Hardware Error]: error affinity level: 0
>>>>>> [2026/3/24 14:52:08.684] [ 140.833440][ C40] {2}[Hardware Error]: running state: 0x1
>>>>>> [2026/3/24 14:52:08.684] [ 140.839173][ C40] {2}[Hardware Error]: Power State Coordination Interface state: 0
>>>>>> [2026/3/24 14:52:08.684] [ 140.847076][ C40] {2}[Hardware Error]: Error info structure 0:
>>>>>> [2026/3/24 14:52:08.684] [ 140.853241][ C40] {2}[Hardware Error]: num errors: 1
>>>>>> [2026/3/24 14:52:08.684] [ 140.858540][ C40] {2}[Hardware Error]: error_type: 0, cache error
>>>>>> [2026/3/24 14:52:08.684] [ 140.865055][ C40] {2}[Hardware Error]: error_info: 0x0000000020400014
>>>>>> [2026/3/24 14:52:08.684] [ 140.871917][ C40] {2}[Hardware Error]: cache level: 1
>>>>>> [2026/3/24 14:52:08.684] [ 140.877475][ C40] {2}[Hardware Error]: the error has not been corrected
>>>>>> [2026/3/24 14:52:08.764] [ 140.884596][ C40] {2}[Hardware Error]: physical fault address: 0x0000001351811800
>>>>>> [2026/3/24 14:52:08.764] [ 140.892499][ C40] {2}[Hardware Error]: Vendor specific error info has 48 bytes:
>>>>>> [2026/3/24 14:52:08.766] [ 140.900145][ C40] {2}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>>>>>> [2026/3/24 14:52:08.767] [ 140.909872][ C40] {2}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>>>>>> [2026/3/24 14:52:08.767] [ 140.919598][ C40] {2}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>>>>> [2026/3/24 14:52:08.768] [ 140.929346][ T9767] Memory failure: 0x1351811: already hardware poisoned
>>>>>> [2026/3/24 14:52:08.768] [ 140.936072][ T9767] Memory failure: 0x1351811: Sending SIGBUS to busybox:9767 due to hardware memory corruption
>>>>>
>>>>> Did you cut off some logs here?
>>>>
>>>> I just removed some duplicate debug logs ("This error has already been..."); those lines were added by myself.
>>
>> Hi, Shuai
>
> Hi, Junhao,
>
> Sorry for the late reply.
>
>>
>> Compared to the original commit message and the logs reproducing this issue
>> on kernel v7.0.0-rc4, perhaps you are asking whether the current log is missing
>> information such as 'NOTICE: SEA Handle'?
>> These missing logs are from the firmware. To reduce serial output, the firmware
>> hides these debug prints. However, using my own custom debug logs, I can still
>> see that the kernel's do_sea() path keeps running during the 10-second cache
>> timeout, although only one debug log is retained per second.
>> This confirms that the issue is still present on the latest kernel v7.0.0-rc4.
>>
>>>>> The error log also indicates that the SIGBUS is delivered as expected.
>>>>
>>>> An SError occurs at kernel time 130.558038. Then, after 10 seconds, the kernel
>>>> can re-enter the SEA processing flow and send the SIGBUS signal to the process.
>>>> This 10-second delay corresponds to the cache timeout threshold of the
>>>> ghes_estatus_cached() feature.
>>>> Therefore, the purpose of this patch is to send the SIGBUS signal to the process
>>>> immediately, rather than waiting for the timeout to expire.
>>>
>>> Hi, hejun,
>>>
>>> Sorry, but I am still not convinced by the log you provided.
>>>
>>> As I understand your commit message, there are two different cases being discussed:
>>>
>>> Case 1: memory error interrupt first, then SEA
>>>
>>> When hardware memory corruption occurs, a memory error interrupt is
>>> triggered first. If the kernel later accesses the corrupted data, it may
>>> then enter the SEA handler. In this case, the faulty page would already
>>> have been marked poisoned by the memory error interrupt path, and the SEA
>>> handling path would eventually send SIGBUS to the task accessing that page.
>>>
>>> Case 2: SEA first, then memory error interrupt
>>>
>>> Your commit message describes this as the problematic scenario:
>>>
>>> A user process directly accesses hardware-corrupted memory through a
>>> PFNMAP-style mapping such as devmem. The page may still be in the free
>>> buddy state when SEA is handled first. In that case, memory_failure()
>>> poisons the page during SEA handling, but the process is not killed
>>> immediately. Since the task continues accessing the same corrupted
>>> location, it keeps re-entering the SEA handler, leading to an SEA storm.
>>> Later, the memory error interrupt path also cannot kill the task, so the
>>> system remains stuck in this repeated SEA loop.
>> Yes.
>>>
>>> My concern is that your recent explanation and log seem to demonstrate
>>> something different from what the commit message claims to fix.
>>>
>>> From the log, what I can see is:
>>>
>>> the first SEA occurs,
>>> the page is marked poisoned as a free buddy page,
>>> repeated SEAs are suppressed by ghes_estatus_cached(),
>>> after the cache timeout expires, the SEA path runs again,
>>> then memory_failure() reports "already hardware poisoned" and SIGBUS is
>>> sent to the busybox devmem process.
>>> This seems to show a delayed SIGBUS delivery caused by the GHES cache
>>> timeout, rather than clearly demonstrating the SEA storm problem described
>>> in the commit message.
>>>
>>> So I think there is still a mismatch here:
>>>
>>> If the patch is intended to fix the SEA storm described in case 2,
>>> then I would expect evidence that the storm still exists on the latest
>>> kernel and that this patch is what actually breaks that loop.
>>> If instead the patch is intended to avoid the 10-second delay before
>>> SIGBUS delivery, then that should be stated explicitly, because that is
>>> a different problem statement from what the current commit message says.
>>> Also, regarding the devmem/PFNMAP case: I previously pointed to commit
>>> 2e6053fea379 ("mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn"),
>>> which was meant to address the failure to kill tasks accessing poisoned
>>> VM_PFNMAP mappings.
>>>
>>
>> That patch was already merged before kernel v7.0.0-rc4, so it cannot be the fix for this issue.
>>
>> I reverted the patch on kernel v7.0.0-rc4 to reproduce the issue.
>> The debug logs show that the message 'This error has already been...' persists
>> for more than 10 seconds and the printing cannot be stopped, so that patch fixes a different issue.
>
> Thanks for confirming.
>
>>
>>> So my main question is:
>>>
>>> Does the SEA storm issue still exist on the latest kernel version, or is
>>> the remaining issue only that SIGBUS is delayed by the GHES estatus cache
>>> timeout?
>>
>> We should not treat them separately.
>
> Agreed. Please update the commit message to explain the causal chain explicitly:
Sure, will fix in the next version.
>
> - The first SEA poisons the free buddy page but does not kill the
> accessing task, because memory_failure() takes the free-buddy recovery
> path and never reaches kill_accessing_process().
>
> - The task re-enters the SEA handler repeatedly, but
> ghes_estatus_cached() suppresses all subsequent entries during the
> 10-second window, preventing ghes_do_proc() from being called and
> blocking the MF_ACTION_REQUIRED-based SIGBUS delivery.
>
> - This suppression is what sustains the SEA storm.
>
>>
>> In case 2, the first SEA can only poison the page, after which the task re-enters the
>> SEA processing flow. Due to the reporting throttle in ghes_estatus_cached(), the SEA
>> path cannot invoke memory_failure() in time to kill the task, so the task keeps
>> accessing the same corrupted location and re-enters the SEA processing flow in a loop,
>> causing the SEA storm...
>> Perhaps I never clearly explained why the SEA storm occurred.
>
> +cc Lin Miaohe for the memory_failure() discussion.
>
> Regarding the memory_failure() path: since SEA is a synchronous
> notification, is_hest_sync_notify() returns true, ghes_do_proc() sets sync
> = true, and MF_ACTION_REQUIRED is passed into ghes_do_memory_failure().
> This means that on the second and subsequent SEAs (after cache expiry),
> memory_failure() would reach the already-poisoned branch and call
> kill_accessing_process() to terminate the task:
>
>
> if (TestSetPageHWPoison(p)) {
> res = -EHWPOISON;
> if (flags & MF_ACTION_REQUIRED)
> res = kill_accessing_process(current, pfn, flags);
> if (flags & MF_COUNT_INCREASED)
> put_page(p);
> action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> goto unlock_mutex;
> }
>
> The patch short-circuits this by terminating the task earlier, via
> arm64_notify_die(), on every cache-suppressed SEA. I have no objection
> to killing the process early in this way.
>
> +cc Tony Luck for the ghes_notify_nmi path.
>
> One concern is the impact on ghes_notify_nmi().
>
> ghes_in_nmi_queue_one_entry() is shared between two callers:
>
> ghes_notify_sea() → ghes_in_nmi_spool_from_list(&ghes_sea, ...)
> ghes_notify_nmi() → ghes_in_nmi_spool_from_list(&ghes_nmi, ...)
Can we use fixmap_idx to distinguish between SEA and NMI? The basis for
differentiation is that the parameters passed to ghes_in_nmi_spool_from_list()
differ when these two exceptions are handled.
ghes_in_nmi_spool_from_list(&ghes_sea, FIX_APEI_GHES_SEA)
ghes_in_nmi_spool_from_list(&ghes_nmi, FIX_APEI_GHES_NMI)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 8acd2742bb27..5c0a7ecad7db 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1365,8 +1365,11 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
/* This error has been reported before, don't process it again. */
- if (ghes_estatus_cached(estatus))
+ if (ghes_estatus_cached(estatus)) {
+ if (fixmap_idx == FIX_APEI_GHES_SEA)
+ rc = -ECANCELED;
goto no_work;
+ }
llist_add(&estatus_node->llnode, &ghes_estatus_llist);
Best regards,
Junhao.
>
> For the NMI path, if ghes_estatus_cached() hits and
> ghes_in_nmi_queue_one_entry() now returns -ECANCELED instead of 0,
> ghes_in_nmi_spool_from_list() will not set ret = 0, and ghes_notify_nmi()
> will return NMI_DONE instead of NMI_HANDLED. This tells the NMI handler
> chain that no handler claimed the interrupt, which is semantically
> incorrect — an active hardware error was observed, but deliberately
> suppressed by the cache. NMI errors are asynchronous (sync = false,
> MF_ACTION_REQUIRED not set), so there is no practical impact on the kill
> path. However, returning NMI_DONE for a cache-suppressed NMI could cause
> spurious warnings from the NMI dispatcher on some platforms. To avoid
> this, I suggest scoping the -ECANCELED return to the synchronous (SEA)
> case only. One approach is to pass a bool sync parameter down through
> ghes_in_nmi_spool_from_list() and ghes_in_nmi_queue_one_entry(), returning
> -ECANCELED on cache-hit only when sync is true. Alternatively, this
> logic can be handled at the ghes_notify_sea() call site directly.
>
> Thanks.
> Shuai
> .
>
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-04-09 3:10 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-30 7:13 [PATCH] ACPI: APEI: Handle repeated SEA error interrupts storm scenarios Junhao He
2025-11-03 16:19 ` Rafael J. Wysocki
2025-11-04 1:32 ` Shuai Xue
2026-02-27 12:12 ` hejunhao
2026-03-03 14:42 ` Shuai Xue
2026-03-24 10:04 ` hejunhao
2026-03-25 2:12 ` Shuai Xue
2026-03-25 9:24 ` hejunhao
2026-03-25 12:40 ` Shuai Xue
2026-03-26 13:26 ` hejunhao
2026-04-07 2:23 ` Shuai Xue
2026-04-09 3:10 ` hejunhao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox