linux-efi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 2/2] efi/cper: don't go past the ARM processor CPER record buffer
  2025-12-19 10:49 [PATCH v3 0/2] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
@ 2025-12-19 10:40 ` Mauro Carvalho Chehab
  2025-12-19 10:50   ` Mauro Carvalho Chehab
  2025-12-22 11:36   ` Jonathan Cameron
  2025-12-19 10:49 ` [PATCH v3 1/2] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
  1 sibling, 2 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2025-12-19 10:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mauro Carvalho Chehab, Ard Biesheuvel, Borislav Petkov,
	Dave Jiang, Fan Ni, Jonathan Cameron, Shuai Xue,
	Smita Koralahalli, linux-efi, linux-kernel

There's a logic inside ghes/cper to detect if the section_length
is too small, but it doesn't detect if it is too big.

Currently, if the firmware receives an ARM processor CPER record
stating that a section length is big, kernel will blindly trust
section_length, producing a very long dump. For instance, a 67
bytes record with ERR_INFO_NUM set 46198 and section length
set to 854918320 would dump a lot of data going a way past the
firmware memory-mapped area.

Fix it by adding a logic to prevent it to go past the buffer
if ERR_INFO_NUM is too big, making it report instead:

	[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
	[Hardware Error]: event severity: recoverable
	[Hardware Error]:  Error 0, type: recoverable
	[Hardware Error]:   section_type: ARM processor error
	[Hardware Error]:   MIDR: 0xff304b2f8476870a
	[Hardware Error]:   section length: 854918320, CPER size: 67
	[Hardware Error]:   section length is too big
	[Hardware Error]:   firmware-generated error record is incorrect
	[Hardware Error]:   ERR_INFO_NUM is 46198

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 drivers/firmware/efi/cper-arm.c | 12 ++++++++----
 drivers/firmware/efi/cper.c     |  3 ++-
 include/linux/cper.h            |  3 ++-
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c
index 76542a53e202..b21cb1232d82 100644
--- a/drivers/firmware/efi/cper-arm.c
+++ b/drivers/firmware/efi/cper-arm.c
@@ -226,7 +226,8 @@ static void cper_print_arm_err_info(const char *pfx, u32 type,
 }
 
 void cper_print_proc_arm(const char *pfx,
-			 const struct cper_sec_proc_arm *proc)
+			 const struct cper_sec_proc_arm *proc,
+			 u32 length)
 {
 	int i, len, max_ctx_type;
 	struct cper_arm_err_info *err_info;
@@ -238,9 +239,12 @@ void cper_print_proc_arm(const char *pfx,
 
 	len = proc->section_length - (sizeof(*proc) +
 		proc->err_info_num * (sizeof(*err_info)));
-	if (len < 0) {
-		printk("%ssection length: %d\n", pfx, proc->section_length);
-		printk("%ssection length is too small\n", pfx);
+
+	if (len < 0 || proc->section_length > length) {
+		printk("%ssection length: %d, CPER size: %d\n",
+		       pfx, proc->section_length, length);
+		printk("%ssection length is too %s\n", pfx,
+		       (len < 0) ? "small" : "big");
 		printk("%sfirmware-generated error record is incorrect\n", pfx);
 		printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
 		return;
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 0232bd040f61..88fc0293f876 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -659,7 +659,8 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
 
 		printk("%ssection_type: ARM processor error\n", newpfx);
 		if (gdata->error_data_length >= sizeof(*arm_err))
-			cper_print_proc_arm(newpfx, arm_err);
+			cper_print_proc_arm(newpfx, arm_err,
+					    gdata->error_data_length);
 		else
 			goto err_section_too_small;
 #endif
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 5b1236d8c65b..440b35e459e5 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -595,7 +595,8 @@ void cper_mem_err_pack(const struct cper_sec_mem_err *,
 const char *cper_mem_err_unpack(struct trace_seq *,
 				struct cper_mem_err_compact *);
 void cper_print_proc_arm(const char *pfx,
-			 const struct cper_sec_proc_arm *proc);
+			 const struct cper_sec_proc_arm *proc,
+			 u32 length);
 void cper_print_proc_ia(const char *pfx,
 			const struct cper_sec_proc_ia *proc);
 int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 0/2] apei/ghes: don't OOPS with bad ARM error CPER records
@ 2025-12-19 10:49 Mauro Carvalho Chehab
  2025-12-19 10:40 ` [PATCH v3 2/2] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
  2025-12-19 10:49 ` [PATCH v3 1/2] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
  0 siblings, 2 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2025-12-19 10:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mauro Carvalho Chehab, Shuai Xue, linux-efi, linux-acpi,
	linux-edac, linux-kernel

Current parsing logic at apei/ghes for ARM Processor Error
assumes that the record sizes are correct. Yet, a bad BIOS
might produce malformed GHES reports.

Worse than that, it may end exposing data from other memory
addresses, as the logic may end dumping large portions of
the memory.

Avoid that by checking the buffer sizes where needed.

---

v3:
  - addressed Shuai feedback;
  - moved all ghes code to one patch;
  - fixed a typo and a bad indent;
  - cleanup the size check logic at ghes.c.

Mauro Carvalho Chehab (2):
  apei/ghes: ARM processor Error: don't go past allocated memory
  efi/cper: don't go past the ARM processor CPER record buffer

 drivers/acpi/apei/ghes.c        | 33 +++++++++++++++++++++++++++++----
 drivers/firmware/efi/cper-arm.c | 12 ++++++++----
 drivers/firmware/efi/cper.c     |  3 ++-
 drivers/ras/ras.c               |  6 +++++-
 include/linux/cper.h            |  3 ++-
 5 files changed, 46 insertions(+), 11 deletions(-)

-- 
2.52.0



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v3 1/2] apei/ghes: ARM processor Error: don't go past allocated memory
  2025-12-19 10:49 [PATCH v3 0/2] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
  2025-12-19 10:40 ` [PATCH v3 2/2] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
@ 2025-12-19 10:49 ` Mauro Carvalho Chehab
  2025-12-22 11:38   ` Jonathan Cameron
  1 sibling, 1 reply; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2025-12-19 10:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mauro Carvalho Chehab, Ankit Agrawal, Borislav Petkov,
	Breno Leitao, Hanjun Guo, Ingo Molnar, Jason Tian,
	Jonathan Cameron, Len Brown, Mauro Carvalho Chehab, Shuai Xue,
	Smita Koralahalli, Tony Luck, linux-efi, linux-acpi, linux-edac,
	linux-kernel

If the BIOS generates a very small ARM Processor Error, or
an incomplete one, the current logic will fail to deferrence

	err->section_length
and
	ctx_info->size

Add checks to avoid that. With such changes, such GHESv2
records won't cause OOPSes like this:

[    1.492129] Internal error: Oops: 0000000096000005 [#1]  SMP
[    1.495449] Modules linked in:
[    1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
[    1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
[    1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
[    1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[    1.497199] pc : log_arm_hw_error+0x5c/0x200
[    1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220

0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
70		err_info = (struct cper_arm_err_info *)(err + 1);
71		ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
72		ctx_err = (u8 *)ctx_info;
73
74		for (n = 0; n < err->context_info_num; n++) {
75			sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
76			ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
77			ctx_len += sz;
78		}
79

and similar ones while trying to access section_length on an
error dump with too small size.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 drivers/acpi/apei/ghes.c | 33 +++++++++++++++++++++++++++++----
 drivers/ras/ras.c        |  6 +++++-
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0dc767392a6c..9bf4ec84f160 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -552,21 +552,46 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 {
 	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 	int flags = sync ? MF_ACTION_REQUIRED : 0;
+	int length = gdata->error_data_length;
 	char error_type[120];
 	bool queued = false;
 	int sec_sev, i;
 	char *p;
 
 	sec_sev = ghes_severity(gdata->error_severity);
-	log_arm_hw_error(err, sec_sev);
+	if (length >= sizeof(*err)) {
+		log_arm_hw_error(err, sec_sev);
+	} else {
+		pr_warn(FW_BUG "arm error length: %d\n", length);
+		pr_warn(FW_BUG "length is too small\n");
+		pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
+		return false;
+	}
+
 	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
 		return false;
 
 	p = (char *)(err + 1);
+	length -= sizeof(err);
+
 	for (i = 0; i < err->err_info_num; i++) {
-		struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
-		bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
-		bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
+		struct cper_arm_err_info *err_info;
+		bool is_cache, has_pa;
+
+		/* Ensure we have enough data for the error info header */
+		length -= sizeof(*err_info);
+		if (length < 0)
+			break;
+
+		err_info = (struct cper_arm_err_info *)p;
+
+		/* Validate the claimed length before using it */
+		length -= err_info->length;
+		if (length < 0)
+			break;
+
+		is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
+		has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
 
 		/*
 		 * The field (err_info->error_info & BIT(26)) is fixed to set to
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 2a5b5a9fdcb3..03df3db62334 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -72,7 +72,11 @@ void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
 	ctx_err = (u8 *)ctx_info;
 
 	for (n = 0; n < err->context_info_num; n++) {
-		sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
+		sz = sizeof(struct cper_arm_ctx_info);
+
+		if (sz + (long)ctx_info - (long)err >= err->section_length)
+			sz += ctx_info->size;
+
 		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
 		ctx_len += sz;
 	}
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 2/2] efi/cper: don't go past the ARM processor CPER record buffer
  2025-12-19 10:40 ` [PATCH v3 2/2] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
@ 2025-12-19 10:50   ` Mauro Carvalho Chehab
  2025-12-22 11:36   ` Jonathan Cameron
  1 sibling, 0 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2025-12-19 10:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mauro Carvalho Chehab, Ard Biesheuvel, Borislav Petkov,
	Dave Jiang, Fan Ni, Jonathan Cameron, Shuai Xue,
	Smita Koralahalli, linux-efi, linux-acpi, linux-edac,
	linux-kernel

There's a logic inside ghes/cper to detect if the section_length
is too small, but it doesn't detect if it is too big.

Currently, if the firmware receives an ARM processor CPER record
stating that a section length is big, kernel will blindly trust
section_length, producing a very long dump. For instance, a 67
bytes record with ERR_INFO_NUM set 46198 and section length
set to 854918320 would dump a lot of data going a way past the
firmware memory-mapped area.

Fix it by adding a logic to prevent it to go past the buffer
if ERR_INFO_NUM is too big, making it report instead:

	[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
	[Hardware Error]: event severity: recoverable
	[Hardware Error]:  Error 0, type: recoverable
	[Hardware Error]:   section_type: ARM processor error
	[Hardware Error]:   MIDR: 0xff304b2f8476870a
	[Hardware Error]:   section length: 854918320, CPER size: 67
	[Hardware Error]:   section length is too big
	[Hardware Error]:   firmware-generated error record is incorrect
	[Hardware Error]:   ERR_INFO_NUM is 46198

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 drivers/firmware/efi/cper-arm.c | 12 ++++++++----
 drivers/firmware/efi/cper.c     |  3 ++-
 include/linux/cper.h            |  3 ++-
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c
index 76542a53e202..b21cb1232d82 100644
--- a/drivers/firmware/efi/cper-arm.c
+++ b/drivers/firmware/efi/cper-arm.c
@@ -226,7 +226,8 @@ static void cper_print_arm_err_info(const char *pfx, u32 type,
 }
 
 void cper_print_proc_arm(const char *pfx,
-			 const struct cper_sec_proc_arm *proc)
+			 const struct cper_sec_proc_arm *proc,
+			 u32 length)
 {
 	int i, len, max_ctx_type;
 	struct cper_arm_err_info *err_info;
@@ -238,9 +239,12 @@ void cper_print_proc_arm(const char *pfx,
 
 	len = proc->section_length - (sizeof(*proc) +
 		proc->err_info_num * (sizeof(*err_info)));
-	if (len < 0) {
-		printk("%ssection length: %d\n", pfx, proc->section_length);
-		printk("%ssection length is too small\n", pfx);
+
+	if (len < 0 || proc->section_length > length) {
+		printk("%ssection length: %d, CPER size: %d\n",
+		       pfx, proc->section_length, length);
+		printk("%ssection length is too %s\n", pfx,
+		       (len < 0) ? "small" : "big");
 		printk("%sfirmware-generated error record is incorrect\n", pfx);
 		printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
 		return;
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 0232bd040f61..88fc0293f876 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -659,7 +659,8 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
 
 		printk("%ssection_type: ARM processor error\n", newpfx);
 		if (gdata->error_data_length >= sizeof(*arm_err))
-			cper_print_proc_arm(newpfx, arm_err);
+			cper_print_proc_arm(newpfx, arm_err,
+					    gdata->error_data_length);
 		else
 			goto err_section_too_small;
 #endif
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 5b1236d8c65b..440b35e459e5 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -595,7 +595,8 @@ void cper_mem_err_pack(const struct cper_sec_mem_err *,
 const char *cper_mem_err_unpack(struct trace_seq *,
 				struct cper_mem_err_compact *);
 void cper_print_proc_arm(const char *pfx,
-			 const struct cper_sec_proc_arm *proc);
+			 const struct cper_sec_proc_arm *proc,
+			 u32 length);
 void cper_print_proc_ia(const char *pfx,
 			const struct cper_sec_proc_ia *proc);
 int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 2/2] efi/cper: don't go past the ARM processor CPER record buffer
  2025-12-19 10:40 ` [PATCH v3 2/2] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
  2025-12-19 10:50   ` Mauro Carvalho Chehab
@ 2025-12-22 11:36   ` Jonathan Cameron
  1 sibling, 0 replies; 7+ messages in thread
From: Jonathan Cameron @ 2025-12-22 11:36 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Rafael J. Wysocki, Ard Biesheuvel, Borislav Petkov, Dave Jiang,
	Fan Ni, Shuai Xue, Smita Koralahalli, linux-efi, linux-acpi,
	linux-edac, linux-kernel

On Fri, 19 Dec 2025 11:50:00 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> There's a logic inside ghes/cper to detect if the section_length
> is too small, but it doesn't detect if it is too big.
> 
> Currently, if the firmware receives an ARM processor CPER record
> stating that a section length is big, kernel will blindly trust
> section_length, producing a very long dump. For instance, a 67
> bytes record with ERR_INFO_NUM set 46198 and section length
> set to 854918320 would dump a lot of data going a way past the
> firmware memory-mapped area.
> 
> Fix it by adding a logic to prevent it to go past the buffer
> if ERR_INFO_NUM is too big, making it report instead:
> 
> 	[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> 	[Hardware Error]: event severity: recoverable
> 	[Hardware Error]:  Error 0, type: recoverable
> 	[Hardware Error]:   section_type: ARM processor error
> 	[Hardware Error]:   MIDR: 0xff304b2f8476870a
> 	[Hardware Error]:   section length: 854918320, CPER size: 67
> 	[Hardware Error]:   section length is too big
> 	[Hardware Error]:   firmware-generated error record is incorrect
> 	[Hardware Error]:   ERR_INFO_NUM is 46198
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/2] apei/ghes: ARM processor Error: don't go past allocated memory
  2025-12-19 10:49 ` [PATCH v3 1/2] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
@ 2025-12-22 11:38   ` Jonathan Cameron
  2025-12-22 13:53     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Cameron @ 2025-12-22 11:38 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Rafael J. Wysocki, Ankit Agrawal, Borislav Petkov, Breno Leitao,
	Hanjun Guo, Ingo Molnar, Jason Tian, Len Brown,
	Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli, Tony Luck,
	linux-efi, linux-acpi, linux-edac, linux-kernel

On Fri, 19 Dec 2025 11:49:59 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> If the BIOS generates a very small ARM Processor Error, or
> an incomplete one, the current logic will fail to deferrence
> 
> 	err->section_length
> and
> 	ctx_info->size
> 
> Add checks to avoid that. With such changes, such GHESv2
> records won't cause OOPSes like this:
> 
> [    1.492129] Internal error: Oops: 0000000096000005 [#1]  SMP
> [    1.495449] Modules linked in:
> [    1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
> [    1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
> [    1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
> [    1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [    1.497199] pc : log_arm_hw_error+0x5c/0x200
> [    1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220
> 
> 0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
> 70		err_info = (struct cper_arm_err_info *)(err + 1);
> 71		ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
> 72		ctx_err = (u8 *)ctx_info;
> 73
> 74		for (n = 0; n < err->context_info_num; n++) {
> 75			sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> 76			ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
> 77			ctx_len += sz;
> 78		}
> 79
> 
> and similar ones while trying to access section_length on an
> error dump with too small size.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Hi Mauro,

This is fiddly stuff to read in the spec but I think you have a double
counting of the "ARM Processors Error Information Structure" size as
the length in that this time is the length of the structure itself,
not a following body.

Jonathan


> ---
>  drivers/acpi/apei/ghes.c | 33 +++++++++++++++++++++++++++++----
>  drivers/ras/ras.c        |  6 +++++-
>  2 files changed, 34 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 0dc767392a6c..9bf4ec84f160 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -552,21 +552,46 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
>  {
>  	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
>  	int flags = sync ? MF_ACTION_REQUIRED : 0;
> +	int length = gdata->error_data_length;
>  	char error_type[120];
>  	bool queued = false;
>  	int sec_sev, i;
>  	char *p;
>  
>  	sec_sev = ghes_severity(gdata->error_severity);
> -	log_arm_hw_error(err, sec_sev);
> +	if (length >= sizeof(*err)) {
> +		log_arm_hw_error(err, sec_sev);
> +	} else {
> +		pr_warn(FW_BUG "arm error length: %d\n", length);
> +		pr_warn(FW_BUG "length is too small\n");
> +		pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
> +		return false;
> +	}
> +
>  	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
>  		return false;
>  
>  	p = (char *)(err + 1);
> +	length -= sizeof(err);
Hacks off the bit of the section that is fixed size.
> +
>  	for (i = 0; i < err->err_info_num; i++) {
> -		struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
> -		bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
> -		bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
> +		struct cper_arm_err_info *err_info;
> +		bool is_cache, has_pa;
> +
> +		/* Ensure we have enough data for the error info header */
> +		length -= sizeof(*err_info);
hacks of length of one processor error information structure (fixed 32 bytes)

> +		if (length < 0)
> +			break;
> +
> +		err_info = (struct cper_arm_err_info *)p;
> +
> +		/* Validate the claimed length before using it */
> +		length -= err_info->length;

This one confuses me.  err_info->length is the same 32 bytes you removed above.

So I think this check is wrong.

 
> +		if (length < 0)
> +			break;
> +
> +		is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
> +		has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
>  
>  		/*
>  		 * The field (err_info->error_info & BIT(26)) is fixed to set to
> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> index 2a5b5a9fdcb3..03df3db62334 100644
> --- a/drivers/ras/ras.c
> +++ b/drivers/ras/ras.c
> @@ -72,7 +72,11 @@ void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
>  	ctx_err = (u8 *)ctx_info;
>  
>  	for (n = 0; n < err->context_info_num; n++) {
> -		sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> +		sz = sizeof(struct cper_arm_ctx_info);
> +
> +		if (sz + (long)ctx_info - (long)err >= err->section_length)
> +			sz += ctx_info->size;
> +
>  		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
>  		ctx_len += sz;
>  	}


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/2] apei/ghes: ARM processor Error: don't go past allocated memory
  2025-12-22 11:38   ` Jonathan Cameron
@ 2025-12-22 13:53     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2025-12-22 13:53 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mauro Carvalho Chehab, Rafael J. Wysocki, Ankit Agrawal,
	Borislav Petkov, Breno Leitao, Hanjun Guo, Ingo Molnar,
	Jason Tian, Len Brown, Mauro Carvalho Chehab, Shuai Xue,
	Smita Koralahalli, Tony Luck, linux-efi, linux-acpi, linux-edac,
	linux-kernel

On Mon, Dec 22, 2025 at 11:38:51AM +0000, Jonathan Cameron wrote:
> On Fri, 19 Dec 2025 11:49:59 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > If the BIOS generates a very small ARM Processor Error, or
> > an incomplete one, the current logic will fail to deferrence
> > 
> > 	err->section_length
> > and
> > 	ctx_info->size
> > 
> > Add checks to avoid that. With such changes, such GHESv2
> > records won't cause OOPSes like this:
> > 
> > [    1.492129] Internal error: Oops: 0000000096000005 [#1]  SMP
> > [    1.495449] Modules linked in:
> > [    1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
> > [    1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
> > [    1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
> > [    1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > [    1.497199] pc : log_arm_hw_error+0x5c/0x200
> > [    1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220
> > 
> > 0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
> > 70		err_info = (struct cper_arm_err_info *)(err + 1);
> > 71		ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
> > 72		ctx_err = (u8 *)ctx_info;
> > 73
> > 74		for (n = 0; n < err->context_info_num; n++) {
> > 75			sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> > 76			ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
> > 77			ctx_len += sz;
> > 78		}
> > 79
> > 
> > and similar ones while trying to access section_length on an
> > error dump with too small size.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Hi Mauro,
> 
> This is fiddly stuff to read in the spec but I think you have a double
> counting of the "ARM Processors Error Information Structure" size as
> the length in that this time is the length of the structure itself,
> not a following body.

True. The change below should fix it:

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d37540ef8c00..aacb8d66a3e1 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -582,8 +582,7 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 		bool is_cache, has_pa;
 
 		/* Ensure we have enough data for the error info header */
-		length -= sizeof(*err_info);
-		if (length < 0)
+		if (length < sizeof(*err_info))
 			break;
 
 		err_info = (struct cper_arm_err_info *)p;


I'll run some tests here after the change before submitting v4.

Thanks!

Mauro

> 
> Jonathan
> 
> 
> > ---
> >  drivers/acpi/apei/ghes.c | 33 +++++++++++++++++++++++++++++----
> >  drivers/ras/ras.c        |  6 +++++-
> >  2 files changed, 34 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> > index 0dc767392a6c..9bf4ec84f160 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -552,21 +552,46 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
> >  {
> >  	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
> >  	int flags = sync ? MF_ACTION_REQUIRED : 0;
> > +	int length = gdata->error_data_length;
> >  	char error_type[120];
> >  	bool queued = false;
> >  	int sec_sev, i;
> >  	char *p;
> >  
> >  	sec_sev = ghes_severity(gdata->error_severity);
> > -	log_arm_hw_error(err, sec_sev);
> > +	if (length >= sizeof(*err)) {
> > +		log_arm_hw_error(err, sec_sev);
> > +	} else {
> > +		pr_warn(FW_BUG "arm error length: %d\n", length);
> > +		pr_warn(FW_BUG "length is too small\n");
> > +		pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
> > +		return false;
> > +	}
> > +
> >  	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
> >  		return false;
> >  
> >  	p = (char *)(err + 1);
> > +	length -= sizeof(err);
> Hacks off the bit of the section that is fixed size.
> > +
> >  	for (i = 0; i < err->err_info_num; i++) {
> > -		struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
> > -		bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
> > -		bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
> > +		struct cper_arm_err_info *err_info;
> > +		bool is_cache, has_pa;
> > +
> > +		/* Ensure we have enough data for the error info header */
> > +		length -= sizeof(*err_info);
> hacks of length of one processor error information structure (fixed 32 bytes)
> 
> > +		if (length < 0)
> > +			break;
> > +
> > +		err_info = (struct cper_arm_err_info *)p;
> > +
> > +		/* Validate the claimed length before using it */
> > +		length -= err_info->length;
> 
> This one confuses me.  err_info->length is the same 32 bytes you removed above.
> 
> So I think this check is wrong.
> 
>  
> > +		if (length < 0)
> > +			break;
> > +
> > +		is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
> > +		has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
> >  
> >  		/*
> >  		 * The field (err_info->error_info & BIT(26)) is fixed to set to
> > diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> > index 2a5b5a9fdcb3..03df3db62334 100644
> > --- a/drivers/ras/ras.c
> > +++ b/drivers/ras/ras.c
> > @@ -72,7 +72,11 @@ void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
> >  	ctx_err = (u8 *)ctx_info;
> >  
> >  	for (n = 0; n < err->context_info_num; n++) {
> > -		sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> > +		sz = sizeof(struct cper_arm_ctx_info);
> > +
> > +		if (sz + (long)ctx_info - (long)err >= err->section_length)
> > +			sz += ctx_info->size;
> > +
> >  		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
> >  		ctx_len += sz;
> >  	}
> 

-- 
Thanks,
Mauro

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-12-22 13:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-19 10:49 [PATCH v3 0/2] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
2025-12-19 10:40 ` [PATCH v3 2/2] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
2025-12-19 10:50   ` Mauro Carvalho Chehab
2025-12-22 11:36   ` Jonathan Cameron
2025-12-19 10:49 ` [PATCH v3 1/2] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
2025-12-22 11:38   ` Jonathan Cameron
2025-12-22 13:53     ` Mauro Carvalho Chehab

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).