linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] ACPI: APEI: EINJ: Allow injection on legacy persistent memory
@ 2025-08-25 22:33 Jiaqi Yan
  2025-08-25 23:13 ` Luck, Tony
  0 siblings, 1 reply; 5+ messages in thread
From: Jiaqi Yan @ 2025-08-25 22:33 UTC (permalink / raw)
  To: tony.luck, rafael; +Cc: bp, guohanjun, mchehab, xueshuai, linux-acpi, Jiaqi Yan

Legacy persistent memory, e.g. Device DAX configured like the following

  440000000-303fffffff : Persistent Memory (legacy)
      440000000-47fffffff : dax1.0
      480000000-4bfffffff : dax2.0
      4c0000000-4ffffffff : dax3.0
      500000000-53fffffff : dax4.0
      ...

can support recover from Machine Check Exception due to memory failure.
Therefore there is need to test it by injecting memory error
using EINJ to legacy persistent memory.

However, current EINJ only check if physical address falls into the
IORES_DESC_PERSISTENT_MEMORY_LEGACY region. So attempt to inject
error to "Persistent Memory (legacy)" fails with -EINVAL.

Allow EINJ to inject at physical address belongin to
IORES_DESC_PERSISTENT_MEMORY_LEGACY.

Tested on a machine configured with Device DAX:
  memmap=4G!12G nd_e820.pmem=12G,4G,mode=fsdax memmap=176G!17G
  nd_e820.pmem=17G,1G,mode=devdax,align=1G memmap=176G!209G
  nd_e820.pmem=209G,1G,mode=devdax,align=1G memmap=176G!401G
  nd_e820.pmem=401G,1G,mode=devdax,align=1G memmap=176G!593G
  nd_e820.pmem=593G,1G,mode=devdax,align=1G

Injected error at 0x35238d2000, within Device DAX region and
allocated to a userspace test process. EINJ driver now issue the
injection request to firmware, and firmware logs shows injection
at 0x35238d2000 succeeded. The userspace test process then
accessed 0x35238d2000, caused a MCE, and killed by SIGBUS.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 drivers/acpi/apei/einj-core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
index 2561b045acc7b..e746fa66f92ff 100644
--- a/drivers/acpi/apei/einj-core.c
+++ b/drivers/acpi/apei/einj-core.c
@@ -712,6 +712,8 @@ int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3,
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
 				!= REGION_INTERSECTS) &&
+	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY_LEGACY)
+				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_SOFT_RESERVED)
 				!= REGION_INTERSECTS) &&
 	     !arch_is_platform_page(base_addr)))
-- 
2.51.0.268.g9569e192d0-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] ACPI: APEI: EINJ: Allow injection on legacy persistent memory
  2025-08-25 22:33 [PATCH v1] ACPI: APEI: EINJ: Allow injection on legacy persistent memory Jiaqi Yan
@ 2025-08-25 23:13 ` Luck, Tony
  2025-08-26  0:03   ` Jiaqi Yan
  0 siblings, 1 reply; 5+ messages in thread
From: Luck, Tony @ 2025-08-25 23:13 UTC (permalink / raw)
  To: Jiaqi Yan, Dan Williams
  Cc: rafael, bp, guohanjun, mchehab, xueshuai, linux-acpi

On Mon, Aug 25, 2025 at 10:33:48PM +0000, Jiaqi Yan wrote:
> Legacy persistent memory, e.g. Device DAX configured like the following
> 
>   440000000-303fffffff : Persistent Memory (legacy)
>       440000000-47fffffff : dax1.0
>       480000000-4bfffffff : dax2.0
>       4c0000000-4ffffffff : dax3.0
>       500000000-53fffffff : dax4.0
>       ...
> 
> can support recover from Machine Check Exception due to memory failure.
> Therefore there is need to test it by injecting memory error
> using EINJ to legacy persistent memory.
> 
> However, current EINJ only check if physical address falls into the
> IORES_DESC_PERSISTENT_MEMORY_LEGACY region. So attempt to inject
> error to "Persistent Memory (legacy)" fails with -EINVAL.
> 
> Allow EINJ to inject at physical address belongin to
> IORES_DESC_PERSISTENT_MEMORY_LEGACY.
> 
> Tested on a machine configured with Device DAX:
>   memmap=4G!12G nd_e820.pmem=12G,4G,mode=fsdax memmap=176G!17G
>   nd_e820.pmem=17G,1G,mode=devdax,align=1G memmap=176G!209G
>   nd_e820.pmem=209G,1G,mode=devdax,align=1G memmap=176G!401G
>   nd_e820.pmem=401G,1G,mode=devdax,align=1G memmap=176G!593G
>   nd_e820.pmem=593G,1G,mode=devdax,align=1G
> 
> Injected error at 0x35238d2000, within Device DAX region and
> allocated to a userspace test process. EINJ driver now issue the
> injection request to firmware, and firmware logs shows injection
> at 0x35238d2000 succeeded. The userspace test process then
> accessed 0x35238d2000, caused a MCE, and killed by SIGBUS.
> 
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> ---
>  drivers/acpi/apei/einj-core.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
> index 2561b045acc7b..e746fa66f92ff 100644
> --- a/drivers/acpi/apei/einj-core.c
> +++ b/drivers/acpi/apei/einj-core.c
> @@ -712,6 +712,8 @@ int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3,
>  				!= REGION_INTERSECTS) &&
>  	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
>  				!= REGION_INTERSECTS) &&
> +	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY_LEGACY)
> +				!= REGION_INTERSECTS) &&

I chatted offline with Dan Williams. He wondered whether this sanity
check should just be reduced to pass through any address except MMIO
and leave it to the BIOS to decide what is a legitimate injection
target.

>  	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_SOFT_RESERVED)
>  				!= REGION_INTERSECTS) &&
>  	     !arch_is_platform_page(base_addr)))
> -- 
> 2.51.0.268.g9569e192d0-goog

-Tony

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] ACPI: APEI: EINJ: Allow injection on legacy persistent memory
  2025-08-25 23:13 ` Luck, Tony
@ 2025-08-26  0:03   ` Jiaqi Yan
  2025-08-26  0:12     ` Luck, Tony
  0 siblings, 1 reply; 5+ messages in thread
From: Jiaqi Yan @ 2025-08-26  0:03 UTC (permalink / raw)
  To: Luck, Tony, Dan Williams, rafael
  Cc: bp, guohanjun, mchehab, xueshuai, linux-acpi

On Mon, Aug 25, 2025 at 4:13 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> On Mon, Aug 25, 2025 at 10:33:48PM +0000, Jiaqi Yan wrote:
> > Legacy persistent memory, e.g. Device DAX configured like the following
> >
> >   440000000-303fffffff : Persistent Memory (legacy)
> >       440000000-47fffffff : dax1.0
> >       480000000-4bfffffff : dax2.0
> >       4c0000000-4ffffffff : dax3.0
> >       500000000-53fffffff : dax4.0
> >       ...
> >
> > can support recover from Machine Check Exception due to memory failure.
> > Therefore there is need to test it by injecting memory error
> > using EINJ to legacy persistent memory.
> >
> > However, current EINJ only check if physical address falls into the
> > IORES_DESC_PERSISTENT_MEMORY_LEGACY region. So attempt to inject
> > error to "Persistent Memory (legacy)" fails with -EINVAL.
> >
> > Allow EINJ to inject at physical address belongin to
> > IORES_DESC_PERSISTENT_MEMORY_LEGACY.
> >
> > Tested on a machine configured with Device DAX:
> >   memmap=4G!12G nd_e820.pmem=12G,4G,mode=fsdax memmap=176G!17G
> >   nd_e820.pmem=17G,1G,mode=devdax,align=1G memmap=176G!209G
> >   nd_e820.pmem=209G,1G,mode=devdax,align=1G memmap=176G!401G
> >   nd_e820.pmem=401G,1G,mode=devdax,align=1G memmap=176G!593G
> >   nd_e820.pmem=593G,1G,mode=devdax,align=1G
> >
> > Injected error at 0x35238d2000, within Device DAX region and
> > allocated to a userspace test process. EINJ driver now issue the
> > injection request to firmware, and firmware logs shows injection
> > at 0x35238d2000 succeeded. The userspace test process then
> > accessed 0x35238d2000, caused a MCE, and killed by SIGBUS.
> >
> > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> > ---
> >  drivers/acpi/apei/einj-core.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
> > index 2561b045acc7b..e746fa66f92ff 100644
> > --- a/drivers/acpi/apei/einj-core.c
> > +++ b/drivers/acpi/apei/einj-core.c
> > @@ -712,6 +712,8 @@ int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3,
> >                               != REGION_INTERSECTS) &&
> >            (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
> >                               != REGION_INTERSECTS) &&
> > +          (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY_LEGACY)
> > +                             != REGION_INTERSECTS) &&
>
> I chatted offline with Dan Williams. He wondered whether this sanity
> check should just be reduced to pass through any address except MMIO
> and leave it to the BIOS to decide what is a legitimate injection
> target.

I guess that's fine, but wonder should the EINJ driver still exclude
IORES_DESC_ACPI_TABLES and IORES_DESC_ACPI_NV_STORAGE? to prevent
that, say, by coincidence or not an error "corrupted" EINJ table
itself, or "corrupted" other ACPI data.

>
> >            (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_SOFT_RESERVED)
> >                               != REGION_INTERSECTS) &&
> >            !arch_is_platform_page(base_addr)))
> > --
> > 2.51.0.268.g9569e192d0-goog
>
> -Tony

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH v1] ACPI: APEI: EINJ: Allow injection on legacy persistent memory
  2025-08-26  0:03   ` Jiaqi Yan
@ 2025-08-26  0:12     ` Luck, Tony
  2025-08-26  5:06       ` Jiaqi Yan
  0 siblings, 1 reply; 5+ messages in thread
From: Luck, Tony @ 2025-08-26  0:12 UTC (permalink / raw)
  To: Jiaqi Yan, Williams, Dan J, rafael@kernel.org
  Cc: bp@alien8.de, guohanjun@huawei.com, mchehab@kernel.org,
	xueshuai@linux.alibaba.com, linux-acpi@vger.kernel.org

> > I chatted offline with Dan Williams. He wondered whether this sanity
> > check should just be reduced to pass through any address except MMIO
> > and leave it to the BIOS to decide what is a legitimate injection
> > target.
>
> I guess that's fine, but wonder should the EINJ driver still exclude
> IORES_DESC_ACPI_TABLES and IORES_DESC_ACPI_NV_STORAGE? to prevent
> that, say, by coincidence or not an error "corrupted" EINJ table
> itself, or "corrupted" other ACPI data.

What if I want to find out what the system does with a corrupted ACPI table?

See discussion earlier this year about limiting error injection:

https://lore.kernel.org/all/aDt77p9GYCIRIOMa@agluck-desk3/

-Tony

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] ACPI: APEI: EINJ: Allow injection on legacy persistent memory
  2025-08-26  0:12     ` Luck, Tony
@ 2025-08-26  5:06       ` Jiaqi Yan
  0 siblings, 0 replies; 5+ messages in thread
From: Jiaqi Yan @ 2025-08-26  5:06 UTC (permalink / raw)
  To: Luck, Tony, Williams, Dan J, bp@alien8.de, rafael@kernel.org
  Cc: guohanjun@huawei.com, mchehab@kernel.org,
	xueshuai@linux.alibaba.com, linux-acpi@vger.kernel.org

On Mon, Aug 25, 2025 at 5:12 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> > > I chatted offline with Dan Williams. He wondered whether this sanity
> > > check should just be reduced to pass through any address except MMIO
> > > and leave it to the BIOS to decide what is a legitimate injection
> > > target.
> >
> > I guess that's fine, but wonder should the EINJ driver still exclude
> > IORES_DESC_ACPI_TABLES and IORES_DESC_ACPI_NV_STORAGE? to prevent
> > that, say, by coincidence or not an error "corrupted" EINJ table
> > itself, or "corrupted" other ACPI data.
>
> What if I want to find out what the system does with a corrupted ACPI table?

Yeah, it sounds like a valid use case for a smart user of EINJ. I
guess I just wasn't thinking "wild" enough ;).

If other APEI maintainers and reviewers agree, I will do it in V2;
IIUC basically the following small diff:

diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
index 2561b045acc7b..3be20177d630f 100644
--- a/drivers/acpi/apei/einj-core.c
+++ b/drivers/acpi/apei/einj-core.c
@@ -707,14 +707,7 @@ int einj_error_inject(u32 type, u32 flags, u64
param1, u64 param2, u64 param3,
        base_addr = param1 & param2;
        size = ~param2 + 1;

-       if (((param2 & PAGE_MASK) != PAGE_MASK) ||
-           ((region_intersects(base_addr, size,
IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
-                               != REGION_INTERSECTS) &&
-            (region_intersects(base_addr, size, IORESOURCE_MEM,
IORES_DESC_PERSISTENT_MEMORY)
-                               != REGION_INTERSECTS) &&
-            (region_intersects(base_addr, size, IORESOURCE_MEM,
IORES_DESC_SOFT_RESERVED)
-                               != REGION_INTERSECTS) &&
-            !arch_is_platform_page(base_addr)))
+       if (((param2 & PAGE_MASK) != PAGE_MASK))
                return -EINVAL;


>
> See discussion earlier this year about limiting error injection:
>
> https://lore.kernel.org/all/aDt77p9GYCIRIOMa@agluck-desk3/
>
> -Tony

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-08-26  5:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-25 22:33 [PATCH v1] ACPI: APEI: EINJ: Allow injection on legacy persistent memory Jiaqi Yan
2025-08-25 23:13 ` Luck, Tony
2025-08-26  0:03   ` Jiaqi Yan
2025-08-26  0:12     ` Luck, Tony
2025-08-26  5:06       ` Jiaqi Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).