[PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records
@ 2026-01-06 10:01 Mauro Carvalho Chehab
  2026-01-06 10:01 ` [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-06 10:01 UTC (permalink / raw)
  To: Rafael J. Wysocki, Robert Moore
  Cc: Mauro Carvalho Chehab, Ard Biesheuvel, Ankit Agrawal,
	Borislav Petkov, Breno Leitao, Dan Williams, Dave Jiang,
	Gregory Price, Hanjun Guo, Jason Tian, Jonathan Cameron,
	Jonathan Cameron, Len Brown, Mauro Carvalho Chehab, Shuai Xue,
	Smita Koralahalli, Tony Luck, acpica-devel, linux-acpi,
	linux-edac, linux-efi, linux-kernel, pengdonglin

Rafael,

Current parsing logic at apei/ghes for ARM Processor Error
assumes that the record sizes are correct. Yet, a bad BIOS
might produce malformed GHES reports.

Worse than that, it may end exposing data from other memory
addresses, as the logic may end dumping large portions of
the memory.

Avoid that by checking the buffer sizes where needed.

---

v4:
 - addressed Jonathan comments;
 - added two extra patches to prevent other OOM issues.

v3:
  - addressed Shuai feedback;
  - moved all ghes code to one patch;
  - fixed a typo and a bad indent;
  - cleanup the size check logic at ghes.c.


Mauro Carvalho Chehab (4):
  apei/ghes: ARM processor Error: don't go past allocated memory
  efi/cper: don't go past the ARM processor CPER record buffer
  apei/ghes: ensure that won't go past CPER allocated record
  efi/cper: don't dump the entire memory region

 drivers/acpi/apei/ghes.c        | 38 ++++++++++++++++++++++++++++-----
 drivers/firmware/efi/cper-arm.c | 12 +++++++----
 drivers/firmware/efi/cper.c     |  8 ++++++-
 drivers/ras/ras.c               |  6 +++++-
 include/acpi/ghes.h             |  1 +
 include/linux/cper.h            |  3 ++-
 6 files changed, 56 insertions(+), 12 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory
  2026-01-06 10:01 [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
@ 2026-01-06 10:01 ` Mauro Carvalho Chehab
  2026-01-06 15:57   ` Jonathan Cameron
  2026-01-06 10:01 ` [PATCH v4 2/4] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-06 10:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mauro Carvalho Chehab, Ankit Agrawal, Borislav Petkov,
	Breno Leitao, Hanjun Guo, Jason Tian, Jonathan Cameron, Len Brown,
	Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli, Tony Luck,
	linux-acpi, linux-edac, linux-kernel, pengdonglin

If the BIOS generates a very small ARM Processor Error, or
an incomplete one, the current logic will fail to deferrence

	err->section_length
and
	ctx_info->size

Add checks to avoid that. With such changes, such GHESv2
records won't cause OOPSes like this:

[    1.492129] Internal error: Oops: 0000000096000005 [#1]  SMP
[    1.495449] Modules linked in:
[    1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
[    1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
[    1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
[    1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[    1.497199] pc : log_arm_hw_error+0x5c/0x200
[    1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220

0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
70		err_info = (struct cper_arm_err_info *)(err + 1);
71		ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
72		ctx_err = (u8 *)ctx_info;
73
74		for (n = 0; n < err->context_info_num; n++) {
75			sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
76			ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
77			ctx_len += sz;
78		}
79

and similar ones while trying to access section_length on an
error dump with too small size.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 drivers/acpi/apei/ghes.c | 32 ++++++++++++++++++++++++++++----
 drivers/ras/ras.c        |  6 +++++-
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0dc767392a6c..fc3f8aed99d5 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -552,21 +552,45 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 {
 	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 	int flags = sync ? MF_ACTION_REQUIRED : 0;
+	int length = gdata->error_data_length;
 	char error_type[120];
 	bool queued = false;
 	int sec_sev, i;
 	char *p;
 
 	sec_sev = ghes_severity(gdata->error_severity);
-	log_arm_hw_error(err, sec_sev);
+	if (length >= sizeof(*err)) {
+		log_arm_hw_error(err, sec_sev);
+	} else {
+		pr_warn(FW_BUG "arm error length: %d\n", length);
+		pr_warn(FW_BUG "length is too small\n");
+		pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
+		return false;
+	}
+
 	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
 		return false;
 
 	p = (char *)(err + 1);
+	length -= sizeof(err);
+
 	for (i = 0; i < err->err_info_num; i++) {
-		struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
-		bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
-		bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
+		struct cper_arm_err_info *err_info;
+		bool is_cache, has_pa;
+
+		/* Ensure we have enough data for the error info header */
+		if (length < sizeof(*err_info))
+			break;
+
+		err_info = (struct cper_arm_err_info *)p;
+
+		/* Validate the claimed length before using it */
+		length -= err_info->length;
+		if (length < 0)
+			break;
+
+		is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
+		has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
 
 		/*
 		 * The field (err_info->error_info & BIT(26)) is fixed to set to
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 2a5b5a9fdcb3..03df3db62334 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -72,7 +72,11 @@ void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
 	ctx_err = (u8 *)ctx_info;
 
 	for (n = 0; n < err->context_info_num; n++) {
-		sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
+		sz = sizeof(struct cper_arm_ctx_info);
+
+		if (sz + (long)ctx_info - (long)err >= err->section_length)
+			sz += ctx_info->size;
+
 		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
 		ctx_len += sz;
 	}
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory
  2026-01-06 10:01 ` [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
@ 2026-01-06 15:57   ` Jonathan Cameron
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Cameron @ 2026-01-06 15:57 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Rafael J. Wysocki, Ankit Agrawal, Borislav Petkov, Breno Leitao,
	Hanjun Guo, Jason Tian, Len Brown, Mauro Carvalho Chehab,
	Shuai Xue, Smita Koralahalli, Tony Luck, linux-acpi, linux-edac,
	linux-kernel, pengdonglin

On Tue,  6 Jan 2026 11:01:35 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> If the BIOS generates a very small ARM Processor Error, or
> an incomplete one, the current logic will fail to deferrence
> 
> 	err->section_length
> and
> 	ctx_info->size
> 
> Add checks to avoid that. With such changes, such GHESv2
> records won't cause OOPSes like this:
> 
> [    1.492129] Internal error: Oops: 0000000096000005 [#1]  SMP
> [    1.495449] Modules linked in:
> [    1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
> [    1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
> [    1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
> [    1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [    1.497199] pc : log_arm_hw_error+0x5c/0x200
> [    1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220
> 
> 0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
> 70		err_info = (struct cper_arm_err_info *)(err + 1);
> 71		ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
> 72		ctx_err = (u8 *)ctx_info;
> 73
> 74		for (n = 0; n < err->context_info_num; n++) {
> 75			sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> 76			ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
> 77			ctx_len += sz;
> 78		}
> 79
> 
> and similar ones while trying to access section_length on an
> error dump with too small size.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Hi Mauro,

Resolved the stuff I pointed out in previous, so LGTM.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4 2/4] efi/cper: don't go past the ARM processor CPER record buffer
  2026-01-06 10:01 [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
  2026-01-06 10:01 ` [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
@ 2026-01-06 10:01 ` Mauro Carvalho Chehab
  2026-01-06 10:01 ` [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
  2026-01-06 10:01 ` [PATCH v4 4/4] efi/cper: don't dump the entire memory region Mauro Carvalho Chehab
  3 siblings, 0 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-06 10:01 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Mauro Carvalho Chehab, Dan Williams, Dave Jiang, Gregory Price,
	Jonathan Cameron, Smita Koralahalli, linux-efi, linux-kernel

There's a logic inside ghes/cper to detect if the section_length
is too small, but it doesn't detect if it is too big.

Currently, if the firmware receives an ARM processor CPER record
stating that a section length is big, kernel will blindly trust
section_length, producing a very long dump. For instance, a 67
bytes record with ERR_INFO_NUM set 46198 and section length
set to 854918320 would dump a lot of data going a way past the
firmware memory-mapped area.

Fix it by adding a logic to prevent it to go past the buffer
if ERR_INFO_NUM is too big, making it report instead:

	[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
	[Hardware Error]: event severity: recoverable
	[Hardware Error]:  Error 0, type: recoverable
	[Hardware Error]:   section_type: ARM processor error
	[Hardware Error]:   MIDR: 0xff304b2f8476870a
	[Hardware Error]:   section length: 854918320, CPER size: 67
	[Hardware Error]:   section length is too big
	[Hardware Error]:   firmware-generated error record is incorrect
	[Hardware Error]:   ERR_INFO_NUM is 46198

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 drivers/firmware/efi/cper-arm.c | 12 ++++++++----
 drivers/firmware/efi/cper.c     |  3 ++-
 include/linux/cper.h            |  3 ++-
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c
index 76542a53e202..b21cb1232d82 100644
--- a/drivers/firmware/efi/cper-arm.c
+++ b/drivers/firmware/efi/cper-arm.c
@@ -226,7 +226,8 @@ static void cper_print_arm_err_info(const char *pfx, u32 type,
 }
 
 void cper_print_proc_arm(const char *pfx,
-			 const struct cper_sec_proc_arm *proc)
+			 const struct cper_sec_proc_arm *proc,
+			 u32 length)
 {
 	int i, len, max_ctx_type;
 	struct cper_arm_err_info *err_info;
@@ -238,9 +239,12 @@ void cper_print_proc_arm(const char *pfx,
 
 	len = proc->section_length - (sizeof(*proc) +
 		proc->err_info_num * (sizeof(*err_info)));
-	if (len < 0) {
-		printk("%ssection length: %d\n", pfx, proc->section_length);
-		printk("%ssection length is too small\n", pfx);
+
+	if (len < 0 || proc->section_length > length) {
+		printk("%ssection length: %d, CPER size: %d\n",
+		       pfx, proc->section_length, length);
+		printk("%ssection length is too %s\n", pfx,
+		       (len < 0) ? "small" : "big");
 		printk("%sfirmware-generated error record is incorrect\n", pfx);
 		printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
 		return;
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 0232bd040f61..88fc0293f876 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -659,7 +659,8 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
 
 		printk("%ssection_type: ARM processor error\n", newpfx);
 		if (gdata->error_data_length >= sizeof(*arm_err))
-			cper_print_proc_arm(newpfx, arm_err);
+			cper_print_proc_arm(newpfx, arm_err,
+					    gdata->error_data_length);
 		else
 			goto err_section_too_small;
 #endif
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 5b1236d8c65b..440b35e459e5 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -595,7 +595,8 @@ void cper_mem_err_pack(const struct cper_sec_mem_err *,
 const char *cper_mem_err_unpack(struct trace_seq *,
 				struct cper_mem_err_compact *);
 void cper_print_proc_arm(const char *pfx,
-			 const struct cper_sec_proc_arm *proc);
+			 const struct cper_sec_proc_arm *proc,
+			 u32 length);
 void cper_print_proc_ia(const char *pfx,
 			const struct cper_sec_proc_ia *proc);
 int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record
  2026-01-06 10:01 [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
  2026-01-06 10:01 ` [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
  2026-01-06 10:01 ` [PATCH v4 2/4] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
@ 2026-01-06 10:01 ` Mauro Carvalho Chehab
  2026-01-06 16:07   ` Jonathan Cameron
  2026-01-06 10:01 ` [PATCH v4 4/4] efi/cper: don't dump the entire memory region Mauro Carvalho Chehab
  3 siblings, 1 reply; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-06 10:01 UTC (permalink / raw)
  To: Rafael J. Wysocki, Robert Moore
  Cc: Mauro Carvalho Chehab, Ankit Agrawal, Borislav Petkov,
	Breno Leitao, Hanjun Guo, Jason Tian, Jonathan Cameron, Len Brown,
	Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli, Tony Luck,
	acpica-devel, linux-acpi, linux-kernel

The logic at ghes_new() prevents allocating too large records, by
checking if they're bigger than GHES_ESTATUS_MAX_SIZE (currently, 64KB).
Yet, the allocation is done with the actual number of pages from the
CPER bios table location, which can be smaller.

Yet, a bad firmware could send data with a different size, which might
be bigger than the allocated memory, causing an OOPS:

[13095.899926] Unable to handle kernel paging request at virtual address fff00000f9b40000
[13095.899961] Mem abort info:
[13095.900017]   ESR = 0x0000000096000007
[13095.900088]   EC = 0x25: DABT (current EL), IL = 32 bits
[13095.900156]   SET = 0, FnV = 0
[13095.900181]   EA = 0, S1PTW = 0
[13095.900211]   FSC = 0x07: level 3 translation fault
[13095.900255] Data abort info:
[13095.900421]   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[13095.900486]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[13095.900525]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[13095.900713] swapper pgtable: 4k pages, 52-bit VAs, pgdp=000000008ba16000
[13095.900752] [fff00000f9b40000] pgd=180000013ffff403, p4d=180000013fffe403, pud=180000013f85b403, pmd=180000013f68d403, pte=0000000000000000
[13095.901312] Internal error: Oops: 0000000096000007 [#1]  SMP
[13095.901659] Modules linked in:
[13095.902201] CPU: 0 UID: 0 PID: 303 Comm: kworker/0:1 Not tainted 6.19.0-rc1-00002-gda407d200220 #34 PREEMPT
[13095.902461] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
[13095.902719] Workqueue: kacpi_notify acpi_os_execute_deferred
[13095.903778] pstate: 214020c5 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[13095.903892] pc : hex_dump_to_buffer+0x30c/0x4a0
[13095.904146] lr : hex_dump_to_buffer+0x328/0x4a0
[13095.904204] sp : ffff800080e13880
[13095.904291] x29: ffff800080e13880 x28: ffffac9aba86f6a8 x27: 0000000000000083
[13095.904704] x26: fff00000f9b3fffc x25: 0000000000000004 x24: 0000000000000004
[13095.905335] x23: ffff800080e13905 x22: 0000000000000010 x21: 0000000000000083
[13095.905483] x20: 0000000000000001 x19: 0000000000000008 x18: 0000000000000010
[13095.905617] x17: 0000000000000001 x16: 00000007c7f20fec x15: 0000000000000020
[13095.905850] x14: 0000000000000008 x13: 0000000000081020 x12: 0000000000000008
[13095.906175] x11: ffff800080e13905 x10: ffff800080e13988 x9 : 0000000000000000
[13095.906733] x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000020
[13095.907197] x5 : 0000000000000030 x4 : 00000000fffffffe x3 : 0000000000000000
[13095.907623] x2 : ffffac9aba78c1c8 x1 : ffffac9aba76d0a8 x0 : 0000000000000008
[13095.908284] Call trace:
[13095.908866]  hex_dump_to_buffer+0x30c/0x4a0 (P)
[13095.909135]  print_hex_dump+0xac/0x170
[13095.909179]  cper_estatus_print_section+0x90c/0x968
[13095.909336]  cper_estatus_print+0xf0/0x158
[13095.909348]  __ghes_print_estatus+0xa0/0x148
[13095.909656]  ghes_proc+0x1bc/0x220
[13095.909883]  ghes_notify_hed+0x5c/0xb8
[13095.909957]  notifier_call_chain+0x78/0x148
[13095.910180]  blocking_notifier_call_chain+0x4c/0x80
[13095.910246]  acpi_hed_notify+0x28/0x40
[13095.910558]  acpi_ev_notify_dispatch+0x50/0x80
[13095.910576]  acpi_os_execute_deferred+0x24/0x48
[13095.911161]  process_one_work+0x15c/0x3b0
[13095.911326]  worker_thread+0x2d0/0x400
[13095.911775]  kthread+0x148/0x228
[13095.912082]  ret_from_fork+0x10/0x20
[13095.912687] Code: 6b14033f 540001ad a94707e2 f100029f (b8747b44)
[13095.914085] ---[ end trace 0000000000000000 ]---

Prevent that by taking the actual allocated are into account when
checking for CPER length.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 drivers/acpi/apei/ghes.c | 6 +++++-
 include/acpi/ghes.h      | 1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fc3f8aed99d5..350f666b7783 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -29,6 +29,7 @@
 #include <linux/cper.h>
 #include <linux/cleanup.h>
 #include <linux/platform_device.h>
+#include <linux/minmax.h>
 #include <linux/mutex.h>
 #include <linux/ratelimit.h>
 #include <linux/vmalloc.h>
@@ -294,6 +295,7 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
 		error_block_length = GHES_ESTATUS_MAX_SIZE;
 	}
 	ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
+	ghes->error_block_length = error_block_length;
 	if (!ghes->estatus) {
 		rc = -ENOMEM;
 		goto err_unmap_status_addr;
@@ -365,13 +367,15 @@ static int __ghes_check_estatus(struct ghes *ghes,
 				struct acpi_hest_generic_status *estatus)
 {
 	u32 len = cper_estatus_len(estatus);
+	u32 max_len = min(ghes->generic->error_block_length,
+			  ghes->error_block_length);
 
 	if (len < sizeof(*estatus)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
 		return -EIO;
 	}
 
-	if (len > ghes->generic->error_block_length) {
+	if (!len || len > max_len) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
 		return -EIO;
 	}
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index ebd21b05fe6e..5866f50bac0c 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -27,6 +27,7 @@ struct ghes {
 		struct timer_list timer;
 		unsigned int irq;
 	};
+	unsigned int error_block_length;
 	struct device *dev;
 	struct list_head elist;
 };
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record
  2026-01-06 10:01 ` [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
@ 2026-01-06 16:07   ` Jonathan Cameron
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Cameron @ 2026-01-06 16:07 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Rafael J. Wysocki, Robert Moore, Ankit Agrawal, Borislav Petkov,
	Breno Leitao, Hanjun Guo, Jason Tian, Len Brown,
	Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli, Tony Luck,
	acpica-devel, linux-acpi, linux-kernel

On Tue,  6 Jan 2026 11:01:37 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The logic at ghes_new() prevents allocating too large records, by
> checking if they're bigger than GHES_ESTATUS_MAX_SIZE (currently, 64KB).
> Yet, the allocation is done with the actual number of pages from the
> CPER bios table location, which can be smaller.
> 
> Yet, a bad firmware could send data with a different size, which might
> be bigger than the allocated memory, causing an OOPS:
> 
> [13095.899926] Unable to handle kernel paging request at virtual address fff00000f9b40000
> [13095.899961] Mem abort info:
> [13095.900017]   ESR = 0x0000000096000007
> [13095.900088]   EC = 0x25: DABT (current EL), IL = 32 bits
> [13095.900156]   SET = 0, FnV = 0
> [13095.900181]   EA = 0, S1PTW = 0
> [13095.900211]   FSC = 0x07: level 3 translation fault
> [13095.900255] Data abort info:
> [13095.900421]   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
> [13095.900486]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [13095.900525]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [13095.900713] swapper pgtable: 4k pages, 52-bit VAs, pgdp=000000008ba16000
> [13095.900752] [fff00000f9b40000] pgd=180000013ffff403, p4d=180000013fffe403, pud=180000013f85b403, pmd=180000013f68d403, pte=0000000000000000
> [13095.901312] Internal error: Oops: 0000000096000007 [#1]  SMP
> [13095.901659] Modules linked in:
> [13095.902201] CPU: 0 UID: 0 PID: 303 Comm: kworker/0:1 Not tainted 6.19.0-rc1-00002-gda407d200220 #34 PREEMPT
> [13095.902461] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
> [13095.902719] Workqueue: kacpi_notify acpi_os_execute_deferred
> [13095.903778] pstate: 214020c5 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [13095.903892] pc : hex_dump_to_buffer+0x30c/0x4a0
> [13095.904146] lr : hex_dump_to_buffer+0x328/0x4a0
> [13095.904204] sp : ffff800080e13880
> [13095.904291] x29: ffff800080e13880 x28: ffffac9aba86f6a8 x27: 0000000000000083
> [13095.904704] x26: fff00000f9b3fffc x25: 0000000000000004 x24: 0000000000000004
> [13095.905335] x23: ffff800080e13905 x22: 0000000000000010 x21: 0000000000000083
> [13095.905483] x20: 0000000000000001 x19: 0000000000000008 x18: 0000000000000010
> [13095.905617] x17: 0000000000000001 x16: 00000007c7f20fec x15: 0000000000000020
> [13095.905850] x14: 0000000000000008 x13: 0000000000081020 x12: 0000000000000008
> [13095.906175] x11: ffff800080e13905 x10: ffff800080e13988 x9 : 0000000000000000
> [13095.906733] x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000020
> [13095.907197] x5 : 0000000000000030 x4 : 00000000fffffffe x3 : 0000000000000000
> [13095.907623] x2 : ffffac9aba78c1c8 x1 : ffffac9aba76d0a8 x0 : 0000000000000008
> [13095.908284] Call trace:
> [13095.908866]  hex_dump_to_buffer+0x30c/0x4a0 (P)
> [13095.909135]  print_hex_dump+0xac/0x170
> [13095.909179]  cper_estatus_print_section+0x90c/0x968
> [13095.909336]  cper_estatus_print+0xf0/0x158
> [13095.909348]  __ghes_print_estatus+0xa0/0x148
> [13095.909656]  ghes_proc+0x1bc/0x220
> [13095.909883]  ghes_notify_hed+0x5c/0xb8
> [13095.909957]  notifier_call_chain+0x78/0x148
> [13095.910180]  blocking_notifier_call_chain+0x4c/0x80
> [13095.910246]  acpi_hed_notify+0x28/0x40
> [13095.910558]  acpi_ev_notify_dispatch+0x50/0x80
> [13095.910576]  acpi_os_execute_deferred+0x24/0x48
> [13095.911161]  process_one_work+0x15c/0x3b0
> [13095.911326]  worker_thread+0x2d0/0x400
> [13095.911775]  kthread+0x148/0x228
> [13095.912082]  ret_from_fork+0x10/0x20
> [13095.912687] Code: 6b14033f 540001ad a94707e2 f100029f (b8747b44)
> [13095.914085] ---[ end trace 0000000000000000 ]---
> 
> Prevent that by taking the actual allocated are into account when
> checking for CPER length.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

A naming comment inline.   The bikeshed must be blue!

Might have been better to +CC these to linux-edac like earlier
ones. If nothing else that wouldn't have broken my filtering and
I'd have gotten patch 4 (didn't as don't subscribe to efi or lkml
lists). FWIW patch 4 looks fine to me.

> ---
>  drivers/acpi/apei/ghes.c | 6 +++++-
>  include/acpi/ghes.h      | 1 +
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index fc3f8aed99d5..350f666b7783 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -29,6 +29,7 @@
>  #include <linux/cper.h>
>  #include <linux/cleanup.h>
>  #include <linux/platform_device.h>
> +#include <linux/minmax.h>
>  #include <linux/mutex.h>
>  #include <linux/ratelimit.h>
>  #include <linux/vmalloc.h>
> @@ -294,6 +295,7 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
>  		error_block_length = GHES_ESTATUS_MAX_SIZE;
>  	}
>  	ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
> +	ghes->error_block_length = error_block_length;
Maybe it would be clearer to call it after what we care about which
is the length of estatus.  So
	ghes->estatus_length 
or something like that.  The reason I raise this is it feels a bit random
stashed in the ghes struct and there is no documentation to say what it
is the length of wrt to other elements of that structure.

>  	if (!ghes->estatus) {
>  		rc = -ENOMEM;
>  		goto err_unmap_status_addr;
> @@ -365,13 +367,15 @@ static int __ghes_check_estatus(struct ghes *ghes,
>  				struct acpi_hest_generic_status *estatus)
>  {
>  	u32 len = cper_estatus_len(estatus);
> +	u32 max_len = min(ghes->generic->error_block_length,
> +			  ghes->error_block_length);
>  
>  	if (len < sizeof(*estatus)) {
>  		pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
>  		return -EIO;
>  	}
>  
> -	if (len > ghes->generic->error_block_length) {
> +	if (!len || len > max_len) {
>  		pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
>  		return -EIO;
>  	}
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index ebd21b05fe6e..5866f50bac0c 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -27,6 +27,7 @@ struct ghes {
>  		struct timer_list timer;
>  		unsigned int irq;
>  	};
> +	unsigned int error_block_length;
Would it cause a big hole if this moved up near estatus?
If not I would be tempted to do that given it's effectively the length of that.

>  	struct device *dev;
>  	struct list_head elist;
>  };


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4 4/4] efi/cper: don't dump the entire memory region
  2026-01-06 10:01 [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2026-01-06 10:01 ` [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
@ 2026-01-06 10:01 ` Mauro Carvalho Chehab
  3 siblings, 0 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-06 10:01 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Mauro Carvalho Chehab, linux-efi, linux-kernel

The current logic at cper_print_fw_err() doesn't check if the
error record length is big enough to handle offset. On a bad firmware,
if the ofset is above the actual record, length -= offset will
underflow, making it dump the entire memory.

The end result can be:

- the logic taking a lot of time dumping large regions of memory;
- data disclosure due to the memory dumps;
- an OOPS, if it tries to dump an unmapped memory region.

Fix it by checking if the section length is too small before doing
a hex dump.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 drivers/firmware/efi/cper.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 88fc0293f876..0e938fc5ccb1 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -560,6 +560,11 @@ static void cper_print_fw_err(const char *pfx,
 	} else {
 		offset = sizeof(*fw_err);
 	}
+	if (offset > length) {
+		printk("%s""error section length is too small: offset=%d, length=%d\n",
+		       pfx, offset, length);
+		return;
+	}
 
 	buf += offset;
 	length -= offset;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-01-06 16:07 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-06 10:01 [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
2026-01-06 10:01 ` [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
2026-01-06 15:57   ` Jonathan Cameron
2026-01-06 10:01 ` [PATCH v4 2/4] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
2026-01-06 10:01 ` [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
2026-01-06 16:07   ` Jonathan Cameron
2026-01-06 10:01 ` [PATCH v4 4/4] efi/cper: don't dump the entire memory region Mauro Carvalho Chehab

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox