[PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records

public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records
@ 2026-01-08 11:35 Mauro Carvalho Chehab
  2026-01-08 11:35 ` [PATCH v6 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-08 11:35 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ard Biesheuvel, Robert Moore
  Cc: Mauro Carvalho Chehab, acpica-devel, linux-acpi, linux-edac,
	linux-efi, linux-kernel, Ankit Agrawal, Borislav Petkov,
	Breno Leitao, Dan Williams, Dave Jiang, Hanjun Guo, Huang Yiwei,
	Ira Weiny, Jason Tian, Jonathan Cameron, Len Brown,
	Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli, Tony Luck

Rafael,

Current parsing logic at apei/ghes for ARM Processor Error
assumes that the record sizes are correct. Yet, a bad BIOS
might produce malformed GHES reports.

Worse than that, it may end exposing data from other memory
addresses, as the logic may end dumping large portions of
the memory.

Avoid that by checking the buffer sizes where needed.

---

v6:
 - No code changes, just a cosmetic change at patch 3 description
 - Added Jonathan's review on all patches

v5:
 - Changed the name of a var as requested by Jonathan

v4:
 - addressed Jonathan comments;
 - added two extra patches to prevent other OOM issues.

v3:
  - addressed Shuai feedback;
  - moved all ghes code to one patch;
  - fixed a typo and a bad indent;
  - cleanup the size check logic at ghes.c.

Mauro Carvalho Chehab (4):
  apei/ghes: ARM processor Error: don't go past allocated memory
  efi/cper: don't go past the ARM processor CPER record buffer
  apei/ghes: ensure that won't go past CPER allocated record
  efi/cper: don't dump the entire memory region

 drivers/acpi/apei/ghes.c        | 38 ++++++++++++++++++++++++++++-----
 drivers/firmware/efi/cper-arm.c | 12 +++++++----
 drivers/firmware/efi/cper.c     |  8 ++++++-
 drivers/ras/ras.c               |  6 +++++-
 include/acpi/ghes.h             |  1 +
 include/linux/cper.h            |  3 ++-
 6 files changed, 56 insertions(+), 12 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v6 1/4] apei/ghes: ARM processor Error: don't go past allocated memory
  2026-01-08 11:35 [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
@ 2026-01-08 11:35 ` Mauro Carvalho Chehab
  2026-03-17 17:14   ` Guenter Roeck
  2026-01-08 11:35 ` [PATCH v6 2/4] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 9+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-08 11:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mauro Carvalho Chehab, acpica-devel, linux-acpi, linux-edac,
	linux-efi, linux-kernel, Ankit Agrawal, Borislav Petkov,
	Breno Leitao, Hanjun Guo, Huang Yiwei, Jason Tian,
	Jonathan Cameron, Len Brown, Mauro Carvalho Chehab, Shuai Xue,
	Smita Koralahalli, Tony Luck

If the BIOS generates a very small ARM Processor Error, or
an incomplete one, the current logic will fail to deferrence

	err->section_length
and
	ctx_info->size

Add checks to avoid that. With such changes, such GHESv2
records won't cause OOPSes like this:

[    1.492129] Internal error: Oops: 0000000096000005 [#1]  SMP
[    1.495449] Modules linked in:
[    1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
[    1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
[    1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
[    1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[    1.497199] pc : log_arm_hw_error+0x5c/0x200
[    1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220

0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
70		err_info = (struct cper_arm_err_info *)(err + 1);
71		ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
72		ctx_err = (u8 *)ctx_info;
73
74		for (n = 0; n < err->context_info_num; n++) {
75			sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
76			ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
77			ctx_len += sz;
78		}
79

and similar ones while trying to access section_length on an
error dump with too small size.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 drivers/acpi/apei/ghes.c | 32 ++++++++++++++++++++++++++++----
 drivers/ras/ras.c        |  6 +++++-
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0dc767392a6c..fc3f8aed99d5 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -552,21 +552,45 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 {
 	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 	int flags = sync ? MF_ACTION_REQUIRED : 0;
+	int length = gdata->error_data_length;
 	char error_type[120];
 	bool queued = false;
 	int sec_sev, i;
 	char *p;
 
 	sec_sev = ghes_severity(gdata->error_severity);
-	log_arm_hw_error(err, sec_sev);
+	if (length >= sizeof(*err)) {
+		log_arm_hw_error(err, sec_sev);
+	} else {
+		pr_warn(FW_BUG "arm error length: %d\n", length);
+		pr_warn(FW_BUG "length is too small\n");
+		pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
+		return false;
+	}
+
 	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
 		return false;
 
 	p = (char *)(err + 1);
+	length -= sizeof(err);
+
 	for (i = 0; i < err->err_info_num; i++) {
-		struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
-		bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
-		bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
+		struct cper_arm_err_info *err_info;
+		bool is_cache, has_pa;
+
+		/* Ensure we have enough data for the error info header */
+		if (length < sizeof(*err_info))
+			break;
+
+		err_info = (struct cper_arm_err_info *)p;
+
+		/* Validate the claimed length before using it */
+		length -= err_info->length;
+		if (length < 0)
+			break;
+
+		is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
+		has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
 
 		/*
 		 * The field (err_info->error_info & BIT(26)) is fixed to set to
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 2a5b5a9fdcb3..03df3db62334 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -72,7 +72,11 @@ void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
 	ctx_err = (u8 *)ctx_info;
 
 	for (n = 0; n < err->context_info_num; n++) {
-		sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
+		sz = sizeof(struct cper_arm_ctx_info);
+
+		if (sz + (long)ctx_info - (long)err >= err->section_length)
+			sz += ctx_info->size;
+
 		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
 		ctx_len += sz;
 	}
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v6 2/4] efi/cper: don't go past the ARM processor CPER record buffer
  2026-01-08 11:35 [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
  2026-01-08 11:35 ` [PATCH v6 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
@ 2026-01-08 11:35 ` Mauro Carvalho Chehab
  2026-01-08 11:35 ` [PATCH v6 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-08 11:35 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ard Biesheuvel
  Cc: Mauro Carvalho Chehab, acpica-devel, linux-acpi, linux-edac,
	linux-efi, linux-kernel, Dan Williams, Dave Jiang, Ira Weiny,
	Jonathan Cameron, Smita Koralahalli

There's a logic inside ghes/cper to detect if the section_length
is too small, but it doesn't detect if it is too big.

Currently, if the firmware receives an ARM processor CPER record
stating that a section length is big, kernel will blindly trust
section_length, producing a very long dump. For instance, a 67
bytes record with ERR_INFO_NUM set 46198 and section length
set to 854918320 would dump a lot of data going a way past the
firmware memory-mapped area.

Fix it by adding a logic to prevent it to go past the buffer
if ERR_INFO_NUM is too big, making it report instead:

	[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
	[Hardware Error]: event severity: recoverable
	[Hardware Error]:  Error 0, type: recoverable
	[Hardware Error]:   section_type: ARM processor error
	[Hardware Error]:   MIDR: 0xff304b2f8476870a
	[Hardware Error]:   section length: 854918320, CPER size: 67
	[Hardware Error]:   section length is too big
	[Hardware Error]:   firmware-generated error record is incorrect
	[Hardware Error]:   ERR_INFO_NUM is 46198

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 drivers/firmware/efi/cper-arm.c | 12 ++++++++----
 drivers/firmware/efi/cper.c     |  3 ++-
 include/linux/cper.h            |  3 ++-
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c
index 76542a53e202..b21cb1232d82 100644
--- a/drivers/firmware/efi/cper-arm.c
+++ b/drivers/firmware/efi/cper-arm.c
@@ -226,7 +226,8 @@ static void cper_print_arm_err_info(const char *pfx, u32 type,
 }
 
 void cper_print_proc_arm(const char *pfx,
-			 const struct cper_sec_proc_arm *proc)
+			 const struct cper_sec_proc_arm *proc,
+			 u32 length)
 {
 	int i, len, max_ctx_type;
 	struct cper_arm_err_info *err_info;
@@ -238,9 +239,12 @@ void cper_print_proc_arm(const char *pfx,
 
 	len = proc->section_length - (sizeof(*proc) +
 		proc->err_info_num * (sizeof(*err_info)));
-	if (len < 0) {
-		printk("%ssection length: %d\n", pfx, proc->section_length);
-		printk("%ssection length is too small\n", pfx);
+
+	if (len < 0 || proc->section_length > length) {
+		printk("%ssection length: %d, CPER size: %d\n",
+		       pfx, proc->section_length, length);
+		printk("%ssection length is too %s\n", pfx,
+		       (len < 0) ? "small" : "big");
 		printk("%sfirmware-generated error record is incorrect\n", pfx);
 		printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
 		return;
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 0232bd040f61..88fc0293f876 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -659,7 +659,8 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
 
 		printk("%ssection_type: ARM processor error\n", newpfx);
 		if (gdata->error_data_length >= sizeof(*arm_err))
-			cper_print_proc_arm(newpfx, arm_err);
+			cper_print_proc_arm(newpfx, arm_err,
+					    gdata->error_data_length);
 		else
 			goto err_section_too_small;
 #endif
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 5b1236d8c65b..440b35e459e5 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -595,7 +595,8 @@ void cper_mem_err_pack(const struct cper_sec_mem_err *,
 const char *cper_mem_err_unpack(struct trace_seq *,
 				struct cper_mem_err_compact *);
 void cper_print_proc_arm(const char *pfx,
-			 const struct cper_sec_proc_arm *proc);
+			 const struct cper_sec_proc_arm *proc,
+			 u32 length);
 void cper_print_proc_ia(const char *pfx,
 			const struct cper_sec_proc_ia *proc);
 int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v6 3/4] apei/ghes: ensure that won't go past CPER allocated record
  2026-01-08 11:35 [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
  2026-01-08 11:35 ` [PATCH v6 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
  2026-01-08 11:35 ` [PATCH v6 2/4] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
@ 2026-01-08 11:35 ` Mauro Carvalho Chehab
  2026-01-08 11:35 ` [PATCH v6 4/4] efi/cper: don't dump the entire memory region Mauro Carvalho Chehab
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-08 11:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mauro Carvalho Chehab, acpica-devel, linux-acpi, linux-edac,
	linux-efi, linux-kernel, Ankit Agrawal, Borislav Petkov,
	Breno Leitao, Hanjun Guo, Jason Tian, Jonathan Cameron, Len Brown,
	Mauro Carvalho Chehab, Robert Moore, Shuai Xue, Smita Koralahalli,
	Tony Luck

The logic at ghes_new() prevents allocating too large records, by
checking if they're bigger than GHES_ESTATUS_MAX_SIZE (currently, 64KB).
Yet, the allocation is done with the actual number of pages from the
CPER bios table location, which can be smaller.

Yet, a bad firmware could send data with a different size, which might
be bigger than the allocated memory, causing an OOPS:

    Unable to handle kernel paging request at virtual address fff00000f9b40000
    Mem abort info:
      ESR = 0x0000000096000007
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x07: level 3 translation fault
    Data abort info:
      ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
      CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
    swapper pgtable: 4k pages, 52-bit VAs, pgdp=000000008ba16000
    [fff00000f9b40000] pgd=180000013ffff403, p4d=180000013fffe403, pud=180000013f85b403, pmd=180000013f68d403, pte=0000000000000000
    Internal error: Oops: 0000000096000007 [#1]  SMP
    Modules linked in:
    CPU: 0 UID: 0 PID: 303 Comm: kworker/0:1 Not tainted 6.19.0-rc1-00002-gda407d200220 #34 PREEMPT
    Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
    Workqueue: kacpi_notify acpi_os_execute_deferred
    pstate: 214020c5 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
    pc : hex_dump_to_buffer+0x30c/0x4a0
    lr : hex_dump_to_buffer+0x328/0x4a0
    sp : ffff800080e13880
    x29: ffff800080e13880 x28: ffffac9aba86f6a8 x27: 0000000000000083
    x26: fff00000f9b3fffc x25: 0000000000000004 x24: 0000000000000004
    x23: ffff800080e13905 x22: 0000000000000010 x21: 0000000000000083
    x20: 0000000000000001 x19: 0000000000000008 x18: 0000000000000010
    x17: 0000000000000001 x16: 00000007c7f20fec x15: 0000000000000020
    x14: 0000000000000008 x13: 0000000000081020 x12: 0000000000000008
    x11: ffff800080e13905 x10: ffff800080e13988 x9 : 0000000000000000
    x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000020
    x5 : 0000000000000030 x4 : 00000000fffffffe x3 : 0000000000000000
    x2 : ffffac9aba78c1c8 x1 : ffffac9aba76d0a8 x0 : 0000000000000008
    Call trace:
     hex_dump_to_buffer+0x30c/0x4a0 (P)
     print_hex_dump+0xac/0x170
     cper_estatus_print_section+0x90c/0x968
     cper_estatus_print+0xf0/0x158
     __ghes_print_estatus+0xa0/0x148
     ghes_proc+0x1bc/0x220
     ghes_notify_hed+0x5c/0xb8
     notifier_call_chain+0x78/0x148
     blocking_notifier_call_chain+0x4c/0x80
     acpi_hed_notify+0x28/0x40
     acpi_ev_notify_dispatch+0x50/0x80
     acpi_os_execute_deferred+0x24/0x48
     process_one_work+0x15c/0x3b0
     worker_thread+0x2d0/0x400
     kthread+0x148/0x228
     ret_from_fork+0x10/0x20
    Code: 6b14033f 540001ad a94707e2 f100029f (b8747b44)
    ---[ end trace 0000000000000000 ]---

Prevent that by taking the actual allocated are into account when
checking for CPER length.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 drivers/acpi/apei/ghes.c | 6 +++++-
 include/acpi/ghes.h      | 1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fc3f8aed99d5..77ea7a5b761f 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -29,6 +29,7 @@
 #include <linux/cper.h>
 #include <linux/cleanup.h>
 #include <linux/platform_device.h>
+#include <linux/minmax.h>
 #include <linux/mutex.h>
 #include <linux/ratelimit.h>
 #include <linux/vmalloc.h>
@@ -294,6 +295,7 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
 		error_block_length = GHES_ESTATUS_MAX_SIZE;
 	}
 	ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
+	ghes->estatus_length = error_block_length;
 	if (!ghes->estatus) {
 		rc = -ENOMEM;
 		goto err_unmap_status_addr;
@@ -365,13 +367,15 @@ static int __ghes_check_estatus(struct ghes *ghes,
 				struct acpi_hest_generic_status *estatus)
 {
 	u32 len = cper_estatus_len(estatus);
+	u32 max_len = min(ghes->generic->error_block_length,
+			  ghes->estatus_length);
 
 	if (len < sizeof(*estatus)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
 		return -EIO;
 	}
 
-	if (len > ghes->generic->error_block_length) {
+	if (!len || len > max_len) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
 		return -EIO;
 	}
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index ebd21b05fe6e..93db60da5934 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -21,6 +21,7 @@ struct ghes {
 		struct acpi_hest_generic_v2 *generic_v2;
 	};
 	struct acpi_hest_generic_status *estatus;
+	unsigned int estatus_length;
 	unsigned long flags;
 	union {
 		struct list_head list;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v6 4/4] efi/cper: don't dump the entire memory region
  2026-01-08 11:35 [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2026-01-08 11:35 ` [PATCH v6 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
@ 2026-01-08 11:35 ` Mauro Carvalho Chehab
  2026-01-08 12:08 ` [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Ard Biesheuvel
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-08 11:35 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ard Biesheuvel
  Cc: Mauro Carvalho Chehab, acpica-devel, linux-acpi, linux-edac,
	linux-efi, linux-kernel, Jonathan Cameron

The current logic at cper_print_fw_err() doesn't check if the
error record length is big enough to handle offset. On a bad firmware,
if the ofset is above the actual record, length -= offset will
underflow, making it dump the entire memory.

The end result can be:

- the logic taking a lot of time dumping large regions of memory;
- data disclosure due to the memory dumps;
- an OOPS, if it tries to dump an unmapped memory region.

Fix it by checking if the section length is too small before doing
a hex dump.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 drivers/firmware/efi/cper.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 88fc0293f876..0e938fc5ccb1 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -560,6 +560,11 @@ static void cper_print_fw_err(const char *pfx,
 	} else {
 		offset = sizeof(*fw_err);
 	}
+	if (offset > length) {
+		printk("%s""error section length is too small: offset=%d, length=%d\n",
+		       pfx, offset, length);
+		return;
+	}
 
 	buf += offset;
 	length -= offset;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records
  2026-01-08 11:35 [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
                   ` (3 preceding siblings ...)
  2026-01-08 11:35 ` [PATCH v6 4/4] efi/cper: don't dump the entire memory region Mauro Carvalho Chehab
@ 2026-01-08 12:08 ` Ard Biesheuvel
  2026-01-10  4:13 ` Hanjun Guo
  2026-01-14 15:54 ` Rafael J. Wysocki
  6 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2026-01-08 12:08 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Rafael J. Wysocki, Robert Moore, acpica-devel, linux-acpi,
	linux-edac, linux-efi, linux-kernel, Ankit Agrawal,
	Borislav Petkov, Breno Leitao, Dan Williams, Dave Jiang,
	Hanjun Guo, Huang Yiwei, Ira Weiny, Jason Tian, Jonathan Cameron,
	Len Brown, Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli,
	Tony Luck

On Thu, 8 Jan 2026 at 12:35, Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Rafael,
>
> Current parsing logic at apei/ghes for ARM Processor Error
> assumes that the record sizes are correct. Yet, a bad BIOS
> might produce malformed GHES reports.
>
> Worse than that, it may end exposing data from other memory
> addresses, as the logic may end dumping large portions of
> the memory.
>
> Avoid that by checking the buffer sizes where needed.
>
> ---
>
> v6:
>  - No code changes, just a cosmetic change at patch 3 description
>  - Added Jonathan's review on all patches
>
> v5:
>  - Changed the name of a var as requested by Jonathan
>
> v4:
>  - addressed Jonathan comments;
>  - added two extra patches to prevent other OOM issues.
>
> v3:
>   - addressed Shuai feedback;
>   - moved all ghes code to one patch;
>   - fixed a typo and a bad indent;
>   - cleanup the size check logic at ghes.c.
>
> Mauro Carvalho Chehab (4):
>   apei/ghes: ARM processor Error: don't go past allocated memory
>   efi/cper: don't go past the ARM processor CPER record buffer
>   apei/ghes: ensure that won't go past CPER allocated record
>   efi/cper: don't dump the entire memory region
>

I've skimmed over this and it all looks reasonable to me

Acked-by: Ard Biesheuvel <ardb@kernel.org>

Thanks for cleaning this up.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records
  2026-01-08 11:35 [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
                   ` (4 preceding siblings ...)
  2026-01-08 12:08 ` [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Ard Biesheuvel
@ 2026-01-10  4:13 ` Hanjun Guo
  2026-01-14 15:54 ` Rafael J. Wysocki
  6 siblings, 0 replies; 9+ messages in thread
From: Hanjun Guo @ 2026-01-10  4:13 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Rafael J. Wysocki, Ard Biesheuvel,
	Robert Moore
  Cc: acpica-devel, linux-acpi, linux-edac, linux-efi, linux-kernel,
	Ankit Agrawal, Borislav Petkov, Breno Leitao, Dan Williams,
	Dave Jiang, Huang Yiwei, Ira Weiny, Jason Tian, Jonathan Cameron,
	Len Brown, Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli,
	Tony Luck

On 2026/1/8 19:35, Mauro Carvalho Chehab wrote:
> Rafael,
> 
> Current parsing logic at apei/ghes for ARM Processor Error
> assumes that the record sizes are correct. Yet, a bad BIOS
> might produce malformed GHES reports.
> 
> Worse than that, it may end exposing data from other memory
> addresses, as the logic may end dumping large portions of
> the memory.
> 
> Avoid that by checking the buffer sizes where needed.
> 
> ---
> 
> v6:
>   - No code changes, just a cosmetic change at patch 3 description
>   - Added Jonathan's review on all patches
> 
> v5:
>   - Changed the name of a var as requested by Jonathan
> 
> v4:
>   - addressed Jonathan comments;
>   - added two extra patches to prevent other OOM issues.
> 
> v3:
>    - addressed Shuai feedback;
>    - moved all ghes code to one patch;
>    - fixed a typo and a bad indent;
>    - cleanup the size check logic at ghes.c.
> 
> Mauro Carvalho Chehab (4):
>    apei/ghes: ARM processor Error: don't go past allocated memory
>    efi/cper: don't go past the ARM processor CPER record buffer
>    apei/ghes: ensure that won't go past CPER allocated record
>    efi/cper: don't dump the entire memory region
> 
>   drivers/acpi/apei/ghes.c        | 38 ++++++++++++++++++++++++++++-----
>   drivers/firmware/efi/cper-arm.c | 12 +++++++----
>   drivers/firmware/efi/cper.c     |  8 ++++++-
>   drivers/ras/ras.c               |  6 +++++-
>   include/acpi/ghes.h             |  1 +
>   include/linux/cper.h            |  3 ++-
>   6 files changed, 56 insertions(+), 12 deletions(-)

Reviewed-by: Hanjun Guo <guohanjun@huawei.com>

Thanks
Hanjun

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records
  2026-01-08 11:35 [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
                   ` (5 preceding siblings ...)
  2026-01-10  4:13 ` Hanjun Guo
@ 2026-01-14 15:54 ` Rafael J. Wysocki
  6 siblings, 0 replies; 9+ messages in thread
From: Rafael J. Wysocki @ 2026-01-14 15:54 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Rafael J. Wysocki, Ard Biesheuvel, Robert Moore, acpica-devel,
	linux-acpi, linux-edac, linux-efi, linux-kernel, Ankit Agrawal,
	Borislav Petkov, Breno Leitao, Dan Williams, Dave Jiang,
	Hanjun Guo, Huang Yiwei, Ira Weiny, Jason Tian, Jonathan Cameron,
	Len Brown, Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli,
	Tony Luck

On Thu, Jan 8, 2026 at 12:35 PM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Rafael,
>
> Current parsing logic at apei/ghes for ARM Processor Error
> assumes that the record sizes are correct. Yet, a bad BIOS
> might produce malformed GHES reports.
>
> Worse than that, it may end exposing data from other memory
> addresses, as the logic may end dumping large portions of
> the memory.
>
> Avoid that by checking the buffer sizes where needed.
>
> ---
>
> v6:
>  - No code changes, just a cosmetic change at patch 3 description
>  - Added Jonathan's review on all patches
>
> v5:
>  - Changed the name of a var as requested by Jonathan
>
> v4:
>  - addressed Jonathan comments;
>  - added two extra patches to prevent other OOM issues.
>
> v3:
>   - addressed Shuai feedback;
>   - moved all ghes code to one patch;
>   - fixed a typo and a bad indent;
>   - cleanup the size check logic at ghes.c.
>
> Mauro Carvalho Chehab (4):
>   apei/ghes: ARM processor Error: don't go past allocated memory
>   efi/cper: don't go past the ARM processor CPER record buffer
>   apei/ghes: ensure that won't go past CPER allocated record
>   efi/cper: don't dump the entire memory region
>
>  drivers/acpi/apei/ghes.c        | 38 ++++++++++++++++++++++++++++-----
>  drivers/firmware/efi/cper-arm.c | 12 +++++++----
>  drivers/firmware/efi/cper.c     |  8 ++++++-
>  drivers/ras/ras.c               |  6 +++++-
>  include/acpi/ghes.h             |  1 +
>  include/linux/cper.h            |  3 ++-
>  6 files changed, 56 insertions(+), 12 deletions(-)
>
> --

Applied as 6.20 material, but I changed the spelling of EFI, APEI,
CPER, and GHES in the subjects/changelogs to all capitals.

Thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v6 1/4] apei/ghes: ARM processor Error: don't go past allocated memory
  2026-01-08 11:35 ` [PATCH v6 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
@ 2026-03-17 17:14   ` Guenter Roeck
  0 siblings, 0 replies; 9+ messages in thread
From: Guenter Roeck @ 2026-03-17 17:14 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Rafael J. Wysocki, acpica-devel, linux-acpi, linux-edac,
	linux-efi, linux-kernel, Ankit Agrawal, Borislav Petkov,
	Breno Leitao, Hanjun Guo, Huang Yiwei, Jason Tian,
	Jonathan Cameron, Len Brown, Mauro Carvalho Chehab, Shuai Xue,
	Smita Koralahalli, Tony Luck

Hi,

On Thu, Jan 08, 2026 at 12:35:03PM +0100, Mauro Carvalho Chehab wrote:
> If the BIOS generates a very small ARM Processor Error, or
> an incomplete one, the current logic will fail to deferrence
> 
> 	err->section_length
> and
> 	ctx_info->size
> 
> Add checks to avoid that. With such changes, such GHESv2
> records won't cause OOPSes like this:
> 
> [    1.492129] Internal error: Oops: 0000000096000005 [#1]  SMP
> [    1.495449] Modules linked in:
> [    1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
> [    1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
> [    1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
> [    1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [    1.497199] pc : log_arm_hw_error+0x5c/0x200
> [    1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220
> 
> 0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
> 70		err_info = (struct cper_arm_err_info *)(err + 1);
> 71		ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
> 72		ctx_err = (u8 *)ctx_info;
> 73
> 74		for (n = 0; n < err->context_info_num; n++) {
> 75			sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> 76			ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
> 77			ctx_len += sz;
> 78		}
> 79
> 
> and similar ones while trying to access section_length on an
> error dump with too small size.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

AI code review feedback inline. I don't claim to understand the code or
its purpose, but it does seem to have a point. Please have a look.

Thanks,
Guenter

> ---
>  drivers/acpi/apei/ghes.c | 32 ++++++++++++++++++++++++++++----
>  drivers/ras/ras.c        |  6 +++++-
>  2 files changed, 33 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 0dc767392a6c..fc3f8aed99d5 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -552,21 +552,45 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
>  {
>  	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
>  	int flags = sync ? MF_ACTION_REQUIRED : 0;
> +	int length = gdata->error_data_length;
>  	char error_type[120];
>  	bool queued = false;
>  	int sec_sev, i;
>  	char *p;
>  
>  	sec_sev = ghes_severity(gdata->error_severity);
> -	log_arm_hw_error(err, sec_sev);
> +	if (length >= sizeof(*err)) {
> +		log_arm_hw_error(err, sec_sev);
> +	} else {
> +		pr_warn(FW_BUG "arm error length: %d\n", length);
> +		pr_warn(FW_BUG "length is too small\n");
> +		pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
> +		return false;
> +	}
> +
>  	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
>  		return false;
>  
>  	p = (char *)(err + 1);
> +	length -= sizeof(err);
> +

Does this code subtract the size of the pointer instead of the size of the
structure? It looks like length will be larger than the remaining buffer,
which could allow out-of-bounds accesses in the loop below.

>  	for (i = 0; i < err->err_info_num; i++) {
> -		struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
> -		bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
> -		bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
> +		struct cper_arm_err_info *err_info;
> +		bool is_cache, has_pa;
> +
> +		/* Ensure we have enough data for the error info header */
> +		if (length < sizeof(*err_info))
> +			break;
> +
> +		err_info = (struct cper_arm_err_info *)p;
> +
> +		/* Validate the claimed length before using it */
> +		length -= err_info->length;
> +		if (length < 0)
> +			break;
> +
> +		is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
> +		has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
>  
>  		/*
>  		 * The field (err_info->error_info & BIT(26)) is fixed to set to
> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> index 2a5b5a9fdcb3..03df3db62334 100644
> --- a/drivers/ras/ras.c
> +++ b/drivers/ras/ras.c
> @@ -72,7 +72,11 @@ void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
>  	ctx_err = (u8 *)ctx_info;
>  
>  	for (n = 0; n < err->context_info_num; n++) {
> -		sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> +		sz = sizeof(struct cper_arm_ctx_info);
> +
> +		if (sz + (long)ctx_info - (long)err >= err->section_length)
> +			sz += ctx_info->size;
> +

Is this condition inverted? It looks like we add ctx_info->size to sz only if
the context info header is already out of bounds. This might cause us to read
ctx_info->size from unallocated memory.

Also, if the header is completely within bounds, sz does not include
ctx_info->size. Could this cause the loop to parse the register context array
as the next cper_arm_ctx_info header?

>  		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
>  		ctx_len += sz;
>  	}
> -- 
> 2.52.0
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-03-17 17:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-08 11:35 [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
2026-01-08 11:35 ` [PATCH v6 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
2026-03-17 17:14   ` Guenter Roeck
2026-01-08 11:35 ` [PATCH v6 2/4] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
2026-01-08 11:35 ` [PATCH v6 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
2026-01-08 11:35 ` [PATCH v6 4/4] efi/cper: don't dump the entire memory region Mauro Carvalho Chehab
2026-01-08 12:08 ` [PATCH v6 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Ard Biesheuvel
2026-01-10  4:13 ` Hanjun Guo
2026-01-14 15:54 ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox