[PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records

public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records
@ 2026-01-06 10:01 Mauro Carvalho Chehab
  2026-01-06 10:01 ` [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
  2026-01-06 10:01 ` [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
  0 siblings, 2 replies; 5+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-06 10:01 UTC (permalink / raw)
  To: Rafael J. Wysocki, Robert Moore
  Cc: Mauro Carvalho Chehab, Ard Biesheuvel, Ankit Agrawal,
	Borislav Petkov, Breno Leitao, Dan Williams, Dave Jiang,
	Gregory Price, Hanjun Guo, Jason Tian, Jonathan Cameron,
	Jonathan Cameron, Len Brown, Mauro Carvalho Chehab, Shuai Xue,
	Smita Koralahalli, Tony Luck, acpica-devel, linux-acpi,
	linux-edac, linux-efi, linux-kernel, pengdonglin

Rafael,

Current parsing logic at apei/ghes for ARM Processor Error
assumes that the record sizes are correct. Yet, a bad BIOS
might produce malformed GHES reports.

Worse than that, it may end exposing data from other memory
addresses, as the logic may end dumping large portions of
the memory.

Avoid that by checking the buffer sizes where needed.

---

v4:
 - addressed Jonathan comments;
 - added two extra patches to prevent other OOM issues.

v3:
  - addressed Shuai feedback;
  - moved all ghes code to one patch;
  - fixed a typo and a bad indent;
  - cleanup the size check logic at ghes.c.


Mauro Carvalho Chehab (4):
  apei/ghes: ARM processor Error: don't go past allocated memory
  efi/cper: don't go past the ARM processor CPER record buffer
  apei/ghes: ensure that won't go past CPER allocated record
  efi/cper: don't dump the entire memory region

 drivers/acpi/apei/ghes.c        | 38 ++++++++++++++++++++++++++++-----
 drivers/firmware/efi/cper-arm.c | 12 +++++++----
 drivers/firmware/efi/cper.c     |  8 ++++++-
 drivers/ras/ras.c               |  6 +++++-
 include/acpi/ghes.h             |  1 +
 include/linux/cper.h            |  3 ++-
 6 files changed, 56 insertions(+), 12 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory
  2026-01-06 10:01 [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
@ 2026-01-06 10:01 ` Mauro Carvalho Chehab
  2026-01-06 15:57   ` Jonathan Cameron
  2026-01-06 10:01 ` [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
  1 sibling, 1 reply; 5+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-06 10:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mauro Carvalho Chehab, Ankit Agrawal, Borislav Petkov,
	Breno Leitao, Hanjun Guo, Jason Tian, Jonathan Cameron, Len Brown,
	Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli, Tony Luck,
	linux-acpi, linux-edac, linux-kernel, pengdonglin

If the BIOS generates a very small ARM Processor Error, or
an incomplete one, the current logic will fail to deferrence

	err->section_length
and
	ctx_info->size

Add checks to avoid that. With such changes, such GHESv2
records won't cause OOPSes like this:

[    1.492129] Internal error: Oops: 0000000096000005 [#1]  SMP
[    1.495449] Modules linked in:
[    1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
[    1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
[    1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
[    1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[    1.497199] pc : log_arm_hw_error+0x5c/0x200
[    1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220

0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
70		err_info = (struct cper_arm_err_info *)(err + 1);
71		ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
72		ctx_err = (u8 *)ctx_info;
73
74		for (n = 0; n < err->context_info_num; n++) {
75			sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
76			ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
77			ctx_len += sz;
78		}
79

and similar ones while trying to access section_length on an
error dump with too small size.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 drivers/acpi/apei/ghes.c | 32 ++++++++++++++++++++++++++++----
 drivers/ras/ras.c        |  6 +++++-
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0dc767392a6c..fc3f8aed99d5 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -552,21 +552,45 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 {
 	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 	int flags = sync ? MF_ACTION_REQUIRED : 0;
+	int length = gdata->error_data_length;
 	char error_type[120];
 	bool queued = false;
 	int sec_sev, i;
 	char *p;
 
 	sec_sev = ghes_severity(gdata->error_severity);
-	log_arm_hw_error(err, sec_sev);
+	if (length >= sizeof(*err)) {
+		log_arm_hw_error(err, sec_sev);
+	} else {
+		pr_warn(FW_BUG "arm error length: %d\n", length);
+		pr_warn(FW_BUG "length is too small\n");
+		pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
+		return false;
+	}
+
 	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
 		return false;
 
 	p = (char *)(err + 1);
+	length -= sizeof(err);
+
 	for (i = 0; i < err->err_info_num; i++) {
-		struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
-		bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
-		bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
+		struct cper_arm_err_info *err_info;
+		bool is_cache, has_pa;
+
+		/* Ensure we have enough data for the error info header */
+		if (length < sizeof(*err_info))
+			break;
+
+		err_info = (struct cper_arm_err_info *)p;
+
+		/* Validate the claimed length before using it */
+		length -= err_info->length;
+		if (length < 0)
+			break;
+
+		is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
+		has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
 
 		/*
 		 * The field (err_info->error_info & BIT(26)) is fixed to set to
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 2a5b5a9fdcb3..03df3db62334 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -72,7 +72,11 @@ void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
 	ctx_err = (u8 *)ctx_info;
 
 	for (n = 0; n < err->context_info_num; n++) {
-		sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
+		sz = sizeof(struct cper_arm_ctx_info);
+
+		if (sz + (long)ctx_info - (long)err >= err->section_length)
+			sz += ctx_info->size;
+
 		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
 		ctx_len += sz;
 	}
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record
  2026-01-06 10:01 [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
  2026-01-06 10:01 ` [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
@ 2026-01-06 10:01 ` Mauro Carvalho Chehab
  2026-01-06 16:07   ` Jonathan Cameron
  1 sibling, 1 reply; 5+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-06 10:01 UTC (permalink / raw)
  To: Rafael J. Wysocki, Robert Moore
  Cc: Mauro Carvalho Chehab, Ankit Agrawal, Borislav Petkov,
	Breno Leitao, Hanjun Guo, Jason Tian, Jonathan Cameron, Len Brown,
	Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli, Tony Luck,
	acpica-devel, linux-acpi, linux-kernel

The logic at ghes_new() prevents allocating too large records, by
checking if they're bigger than GHES_ESTATUS_MAX_SIZE (currently, 64KB).
Yet, the allocation is done with the actual number of pages from the
CPER bios table location, which can be smaller.

Yet, a bad firmware could send data with a different size, which might
be bigger than the allocated memory, causing an OOPS:

[13095.899926] Unable to handle kernel paging request at virtual address fff00000f9b40000
[13095.899961] Mem abort info:
[13095.900017]   ESR = 0x0000000096000007
[13095.900088]   EC = 0x25: DABT (current EL), IL = 32 bits
[13095.900156]   SET = 0, FnV = 0
[13095.900181]   EA = 0, S1PTW = 0
[13095.900211]   FSC = 0x07: level 3 translation fault
[13095.900255] Data abort info:
[13095.900421]   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[13095.900486]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[13095.900525]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[13095.900713] swapper pgtable: 4k pages, 52-bit VAs, pgdp=000000008ba16000
[13095.900752] [fff00000f9b40000] pgd=180000013ffff403, p4d=180000013fffe403, pud=180000013f85b403, pmd=180000013f68d403, pte=0000000000000000
[13095.901312] Internal error: Oops: 0000000096000007 [#1]  SMP
[13095.901659] Modules linked in:
[13095.902201] CPU: 0 UID: 0 PID: 303 Comm: kworker/0:1 Not tainted 6.19.0-rc1-00002-gda407d200220 #34 PREEMPT
[13095.902461] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
[13095.902719] Workqueue: kacpi_notify acpi_os_execute_deferred
[13095.903778] pstate: 214020c5 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[13095.903892] pc : hex_dump_to_buffer+0x30c/0x4a0
[13095.904146] lr : hex_dump_to_buffer+0x328/0x4a0
[13095.904204] sp : ffff800080e13880
[13095.904291] x29: ffff800080e13880 x28: ffffac9aba86f6a8 x27: 0000000000000083
[13095.904704] x26: fff00000f9b3fffc x25: 0000000000000004 x24: 0000000000000004
[13095.905335] x23: ffff800080e13905 x22: 0000000000000010 x21: 0000000000000083
[13095.905483] x20: 0000000000000001 x19: 0000000000000008 x18: 0000000000000010
[13095.905617] x17: 0000000000000001 x16: 00000007c7f20fec x15: 0000000000000020
[13095.905850] x14: 0000000000000008 x13: 0000000000081020 x12: 0000000000000008
[13095.906175] x11: ffff800080e13905 x10: ffff800080e13988 x9 : 0000000000000000
[13095.906733] x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000020
[13095.907197] x5 : 0000000000000030 x4 : 00000000fffffffe x3 : 0000000000000000
[13095.907623] x2 : ffffac9aba78c1c8 x1 : ffffac9aba76d0a8 x0 : 0000000000000008
[13095.908284] Call trace:
[13095.908866]  hex_dump_to_buffer+0x30c/0x4a0 (P)
[13095.909135]  print_hex_dump+0xac/0x170
[13095.909179]  cper_estatus_print_section+0x90c/0x968
[13095.909336]  cper_estatus_print+0xf0/0x158
[13095.909348]  __ghes_print_estatus+0xa0/0x148
[13095.909656]  ghes_proc+0x1bc/0x220
[13095.909883]  ghes_notify_hed+0x5c/0xb8
[13095.909957]  notifier_call_chain+0x78/0x148
[13095.910180]  blocking_notifier_call_chain+0x4c/0x80
[13095.910246]  acpi_hed_notify+0x28/0x40
[13095.910558]  acpi_ev_notify_dispatch+0x50/0x80
[13095.910576]  acpi_os_execute_deferred+0x24/0x48
[13095.911161]  process_one_work+0x15c/0x3b0
[13095.911326]  worker_thread+0x2d0/0x400
[13095.911775]  kthread+0x148/0x228
[13095.912082]  ret_from_fork+0x10/0x20
[13095.912687] Code: 6b14033f 540001ad a94707e2 f100029f (b8747b44)
[13095.914085] ---[ end trace 0000000000000000 ]---

Prevent that by taking the actual allocated are into account when
checking for CPER length.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 drivers/acpi/apei/ghes.c | 6 +++++-
 include/acpi/ghes.h      | 1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fc3f8aed99d5..350f666b7783 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -29,6 +29,7 @@
 #include <linux/cper.h>
 #include <linux/cleanup.h>
 #include <linux/platform_device.h>
+#include <linux/minmax.h>
 #include <linux/mutex.h>
 #include <linux/ratelimit.h>
 #include <linux/vmalloc.h>
@@ -294,6 +295,7 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
 		error_block_length = GHES_ESTATUS_MAX_SIZE;
 	}
 	ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
+	ghes->error_block_length = error_block_length;
 	if (!ghes->estatus) {
 		rc = -ENOMEM;
 		goto err_unmap_status_addr;
@@ -365,13 +367,15 @@ static int __ghes_check_estatus(struct ghes *ghes,
 				struct acpi_hest_generic_status *estatus)
 {
 	u32 len = cper_estatus_len(estatus);
+	u32 max_len = min(ghes->generic->error_block_length,
+			  ghes->error_block_length);
 
 	if (len < sizeof(*estatus)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
 		return -EIO;
 	}
 
-	if (len > ghes->generic->error_block_length) {
+	if (!len || len > max_len) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
 		return -EIO;
 	}
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index ebd21b05fe6e..5866f50bac0c 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -27,6 +27,7 @@ struct ghes {
 		struct timer_list timer;
 		unsigned int irq;
 	};
+	unsigned int error_block_length;
 	struct device *dev;
 	struct list_head elist;
 };
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory
  2026-01-06 10:01 ` [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
@ 2026-01-06 15:57   ` Jonathan Cameron
  0 siblings, 0 replies; 5+ messages in thread
From: Jonathan Cameron @ 2026-01-06 15:57 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Rafael J. Wysocki, Ankit Agrawal, Borislav Petkov, Breno Leitao,
	Hanjun Guo, Jason Tian, Len Brown, Mauro Carvalho Chehab,
	Shuai Xue, Smita Koralahalli, Tony Luck, linux-acpi, linux-edac,
	linux-kernel, pengdonglin

On Tue,  6 Jan 2026 11:01:35 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> If the BIOS generates a very small ARM Processor Error, or
> an incomplete one, the current logic will fail to deferrence
> 
> 	err->section_length
> and
> 	ctx_info->size
> 
> Add checks to avoid that. With such changes, such GHESv2
> records won't cause OOPSes like this:
> 
> [    1.492129] Internal error: Oops: 0000000096000005 [#1]  SMP
> [    1.495449] Modules linked in:
> [    1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
> [    1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
> [    1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
> [    1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [    1.497199] pc : log_arm_hw_error+0x5c/0x200
> [    1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220
> 
> 0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
> 70		err_info = (struct cper_arm_err_info *)(err + 1);
> 71		ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
> 72		ctx_err = (u8 *)ctx_info;
> 73
> 74		for (n = 0; n < err->context_info_num; n++) {
> 75			sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> 76			ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
> 77			ctx_len += sz;
> 78		}
> 79
> 
> and similar ones while trying to access section_length on an
> error dump with too small size.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Hi Mauro,

Resolved the stuff I pointed out in previous, so LGTM.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record
  2026-01-06 10:01 ` [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
@ 2026-01-06 16:07   ` Jonathan Cameron
  0 siblings, 0 replies; 5+ messages in thread
From: Jonathan Cameron @ 2026-01-06 16:07 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Rafael J. Wysocki, Robert Moore, Ankit Agrawal, Borislav Petkov,
	Breno Leitao, Hanjun Guo, Jason Tian, Len Brown,
	Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli, Tony Luck,
	acpica-devel, linux-acpi, linux-kernel

On Tue,  6 Jan 2026 11:01:37 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The logic at ghes_new() prevents allocating too large records, by
> checking if they're bigger than GHES_ESTATUS_MAX_SIZE (currently, 64KB).
> Yet, the allocation is done with the actual number of pages from the
> CPER bios table location, which can be smaller.
> 
> Yet, a bad firmware could send data with a different size, which might
> be bigger than the allocated memory, causing an OOPS:
> 
> [13095.899926] Unable to handle kernel paging request at virtual address fff00000f9b40000
> [13095.899961] Mem abort info:
> [13095.900017]   ESR = 0x0000000096000007
> [13095.900088]   EC = 0x25: DABT (current EL), IL = 32 bits
> [13095.900156]   SET = 0, FnV = 0
> [13095.900181]   EA = 0, S1PTW = 0
> [13095.900211]   FSC = 0x07: level 3 translation fault
> [13095.900255] Data abort info:
> [13095.900421]   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
> [13095.900486]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [13095.900525]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [13095.900713] swapper pgtable: 4k pages, 52-bit VAs, pgdp=000000008ba16000
> [13095.900752] [fff00000f9b40000] pgd=180000013ffff403, p4d=180000013fffe403, pud=180000013f85b403, pmd=180000013f68d403, pte=0000000000000000
> [13095.901312] Internal error: Oops: 0000000096000007 [#1]  SMP
> [13095.901659] Modules linked in:
> [13095.902201] CPU: 0 UID: 0 PID: 303 Comm: kworker/0:1 Not tainted 6.19.0-rc1-00002-gda407d200220 #34 PREEMPT
> [13095.902461] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
> [13095.902719] Workqueue: kacpi_notify acpi_os_execute_deferred
> [13095.903778] pstate: 214020c5 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [13095.903892] pc : hex_dump_to_buffer+0x30c/0x4a0
> [13095.904146] lr : hex_dump_to_buffer+0x328/0x4a0
> [13095.904204] sp : ffff800080e13880
> [13095.904291] x29: ffff800080e13880 x28: ffffac9aba86f6a8 x27: 0000000000000083
> [13095.904704] x26: fff00000f9b3fffc x25: 0000000000000004 x24: 0000000000000004
> [13095.905335] x23: ffff800080e13905 x22: 0000000000000010 x21: 0000000000000083
> [13095.905483] x20: 0000000000000001 x19: 0000000000000008 x18: 0000000000000010
> [13095.905617] x17: 0000000000000001 x16: 00000007c7f20fec x15: 0000000000000020
> [13095.905850] x14: 0000000000000008 x13: 0000000000081020 x12: 0000000000000008
> [13095.906175] x11: ffff800080e13905 x10: ffff800080e13988 x9 : 0000000000000000
> [13095.906733] x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000020
> [13095.907197] x5 : 0000000000000030 x4 : 00000000fffffffe x3 : 0000000000000000
> [13095.907623] x2 : ffffac9aba78c1c8 x1 : ffffac9aba76d0a8 x0 : 0000000000000008
> [13095.908284] Call trace:
> [13095.908866]  hex_dump_to_buffer+0x30c/0x4a0 (P)
> [13095.909135]  print_hex_dump+0xac/0x170
> [13095.909179]  cper_estatus_print_section+0x90c/0x968
> [13095.909336]  cper_estatus_print+0xf0/0x158
> [13095.909348]  __ghes_print_estatus+0xa0/0x148
> [13095.909656]  ghes_proc+0x1bc/0x220
> [13095.909883]  ghes_notify_hed+0x5c/0xb8
> [13095.909957]  notifier_call_chain+0x78/0x148
> [13095.910180]  blocking_notifier_call_chain+0x4c/0x80
> [13095.910246]  acpi_hed_notify+0x28/0x40
> [13095.910558]  acpi_ev_notify_dispatch+0x50/0x80
> [13095.910576]  acpi_os_execute_deferred+0x24/0x48
> [13095.911161]  process_one_work+0x15c/0x3b0
> [13095.911326]  worker_thread+0x2d0/0x400
> [13095.911775]  kthread+0x148/0x228
> [13095.912082]  ret_from_fork+0x10/0x20
> [13095.912687] Code: 6b14033f 540001ad a94707e2 f100029f (b8747b44)
> [13095.914085] ---[ end trace 0000000000000000 ]---
> 
> Prevent that by taking the actual allocated are into account when
> checking for CPER length.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

A naming comment inline.   The bikeshed must be blue!

Might have been better to +CC these to linux-edac like earlier
ones. If nothing else that wouldn't have broken my filtering and
I'd have gotten patch 4 (didn't as don't subscribe to efi or lkml
lists). FWIW patch 4 looks fine to me.

> ---
>  drivers/acpi/apei/ghes.c | 6 +++++-
>  include/acpi/ghes.h      | 1 +
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index fc3f8aed99d5..350f666b7783 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -29,6 +29,7 @@
>  #include <linux/cper.h>
>  #include <linux/cleanup.h>
>  #include <linux/platform_device.h>
> +#include <linux/minmax.h>
>  #include <linux/mutex.h>
>  #include <linux/ratelimit.h>
>  #include <linux/vmalloc.h>
> @@ -294,6 +295,7 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
>  		error_block_length = GHES_ESTATUS_MAX_SIZE;
>  	}
>  	ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
> +	ghes->error_block_length = error_block_length;
Maybe it would be clearer to call it after what we care about which
is the length of estatus.  So
	ghes->estatus_length 
or something like that.  The reason I raise this is it feels a bit random
stashed in the ghes struct and there is no documentation to say what it
is the length of wrt to other elements of that structure.

>  	if (!ghes->estatus) {
>  		rc = -ENOMEM;
>  		goto err_unmap_status_addr;
> @@ -365,13 +367,15 @@ static int __ghes_check_estatus(struct ghes *ghes,
>  				struct acpi_hest_generic_status *estatus)
>  {
>  	u32 len = cper_estatus_len(estatus);
> +	u32 max_len = min(ghes->generic->error_block_length,
> +			  ghes->error_block_length);
>  
>  	if (len < sizeof(*estatus)) {
>  		pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
>  		return -EIO;
>  	}
>  
> -	if (len > ghes->generic->error_block_length) {
> +	if (!len || len > max_len) {
>  		pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
>  		return -EIO;
>  	}
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index ebd21b05fe6e..5866f50bac0c 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -27,6 +27,7 @@ struct ghes {
>  		struct timer_list timer;
>  		unsigned int irq;
>  	};
> +	unsigned int error_block_length;
Would it cause a big hole if this moved up near estatus?
If not I would be tempted to do that given it's effectively the length of that.

>  	struct device *dev;
>  	struct list_head elist;
>  };


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-01-06 16:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-06 10:01 [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
2026-01-06 10:01 ` [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
2026-01-06 15:57   ` Jonathan Cameron
2026-01-06 10:01 ` [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
2026-01-06 16:07   ` Jonathan Cameron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox