* [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory
2026-01-06 10:01 [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
@ 2026-01-06 10:01 ` Mauro Carvalho Chehab
2026-01-06 15:57 ` Jonathan Cameron
2026-01-06 10:01 ` [PATCH v4 2/4] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-06 10:01 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Mauro Carvalho Chehab, Ankit Agrawal, Borislav Petkov,
Breno Leitao, Hanjun Guo, Jason Tian, Jonathan Cameron, Len Brown,
Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli, Tony Luck,
linux-acpi, linux-edac, linux-kernel, pengdonglin
If the BIOS generates a very small ARM Processor Error, or
an incomplete one, the current logic will fail to deferrence
err->section_length
and
ctx_info->size
Add checks to avoid that. With such changes, such GHESv2
records won't cause OOPSes like this:
[ 1.492129] Internal error: Oops: 0000000096000005 [#1] SMP
[ 1.495449] Modules linked in:
[ 1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
[ 1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
[ 1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
[ 1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 1.497199] pc : log_arm_hw_error+0x5c/0x200
[ 1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220
0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
70 err_info = (struct cper_arm_err_info *)(err + 1);
71 ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
72 ctx_err = (u8 *)ctx_info;
73
74 for (n = 0; n < err->context_info_num; n++) {
75 sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
76 ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
77 ctx_len += sz;
78 }
79
and similar ones while trying to access section_length on an
error dump with too small size.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
drivers/acpi/apei/ghes.c | 32 ++++++++++++++++++++++++++++----
drivers/ras/ras.c | 6 +++++-
2 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0dc767392a6c..fc3f8aed99d5 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -552,21 +552,45 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
{
struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
int flags = sync ? MF_ACTION_REQUIRED : 0;
+ int length = gdata->error_data_length;
char error_type[120];
bool queued = false;
int sec_sev, i;
char *p;
sec_sev = ghes_severity(gdata->error_severity);
- log_arm_hw_error(err, sec_sev);
+ if (length >= sizeof(*err)) {
+ log_arm_hw_error(err, sec_sev);
+ } else {
+ pr_warn(FW_BUG "arm error length: %d\n", length);
+ pr_warn(FW_BUG "length is too small\n");
+ pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
+ return false;
+ }
+
if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
return false;
p = (char *)(err + 1);
+ length -= sizeof(err);
+
for (i = 0; i < err->err_info_num; i++) {
- struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
- bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
- bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
+ struct cper_arm_err_info *err_info;
+ bool is_cache, has_pa;
+
+ /* Ensure we have enough data for the error info header */
+ if (length < sizeof(*err_info))
+ break;
+
+ err_info = (struct cper_arm_err_info *)p;
+
+ /* Validate the claimed length before using it */
+ length -= err_info->length;
+ if (length < 0)
+ break;
+
+ is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
+ has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
/*
* The field (err_info->error_info & BIT(26)) is fixed to set to
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 2a5b5a9fdcb3..03df3db62334 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -72,7 +72,11 @@ void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
ctx_err = (u8 *)ctx_info;
for (n = 0; n < err->context_info_num; n++) {
- sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
+ sz = sizeof(struct cper_arm_ctx_info);
+
+ if (sz + (long)ctx_info - (long)err >= err->section_length)
+ sz += ctx_info->size;
+
ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
ctx_len += sz;
}
--
2.52.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory
2026-01-06 10:01 ` [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
@ 2026-01-06 15:57 ` Jonathan Cameron
0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Cameron @ 2026-01-06 15:57 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Rafael J. Wysocki, Ankit Agrawal, Borislav Petkov, Breno Leitao,
Hanjun Guo, Jason Tian, Len Brown, Mauro Carvalho Chehab,
Shuai Xue, Smita Koralahalli, Tony Luck, linux-acpi, linux-edac,
linux-kernel, pengdonglin
On Tue, 6 Jan 2026 11:01:35 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> If the BIOS generates a very small ARM Processor Error, or
> an incomplete one, the current logic will fail to deferrence
>
> err->section_length
> and
> ctx_info->size
>
> Add checks to avoid that. With such changes, such GHESv2
> records won't cause OOPSes like this:
>
> [ 1.492129] Internal error: Oops: 0000000096000005 [#1] SMP
> [ 1.495449] Modules linked in:
> [ 1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
> [ 1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
> [ 1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
> [ 1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [ 1.497199] pc : log_arm_hw_error+0x5c/0x200
> [ 1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220
>
> 0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
> 70 err_info = (struct cper_arm_err_info *)(err + 1);
> 71 ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
> 72 ctx_err = (u8 *)ctx_info;
> 73
> 74 for (n = 0; n < err->context_info_num; n++) {
> 75 sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> 76 ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
> 77 ctx_len += sz;
> 78 }
> 79
>
> and similar ones while trying to access section_length on an
> error dump with too small size.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Hi Mauro,
Resolved the stuff I pointed out in previous, so LGTM.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v4 2/4] efi/cper: don't go past the ARM processor CPER record buffer
2026-01-06 10:01 [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
2026-01-06 10:01 ` [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
@ 2026-01-06 10:01 ` Mauro Carvalho Chehab
2026-01-06 10:01 ` [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
2026-01-06 10:01 ` [PATCH v4 4/4] efi/cper: don't dump the entire memory region Mauro Carvalho Chehab
3 siblings, 0 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-06 10:01 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Mauro Carvalho Chehab, Dan Williams, Dave Jiang, Gregory Price,
Jonathan Cameron, Smita Koralahalli, linux-efi, linux-kernel
There's a logic inside ghes/cper to detect if the section_length
is too small, but it doesn't detect if it is too big.
Currently, if the firmware receives an ARM processor CPER record
stating that a section length is big, kernel will blindly trust
section_length, producing a very long dump. For instance, a 67
bytes record with ERR_INFO_NUM set 46198 and section length
set to 854918320 would dump a lot of data going a way past the
firmware memory-mapped area.
Fix it by adding a logic to prevent it to go past the buffer
if ERR_INFO_NUM is too big, making it report instead:
[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[Hardware Error]: event severity: recoverable
[Hardware Error]: Error 0, type: recoverable
[Hardware Error]: section_type: ARM processor error
[Hardware Error]: MIDR: 0xff304b2f8476870a
[Hardware Error]: section length: 854918320, CPER size: 67
[Hardware Error]: section length is too big
[Hardware Error]: firmware-generated error record is incorrect
[Hardware Error]: ERR_INFO_NUM is 46198
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
drivers/firmware/efi/cper-arm.c | 12 ++++++++----
drivers/firmware/efi/cper.c | 3 ++-
include/linux/cper.h | 3 ++-
3 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c
index 76542a53e202..b21cb1232d82 100644
--- a/drivers/firmware/efi/cper-arm.c
+++ b/drivers/firmware/efi/cper-arm.c
@@ -226,7 +226,8 @@ static void cper_print_arm_err_info(const char *pfx, u32 type,
}
void cper_print_proc_arm(const char *pfx,
- const struct cper_sec_proc_arm *proc)
+ const struct cper_sec_proc_arm *proc,
+ u32 length)
{
int i, len, max_ctx_type;
struct cper_arm_err_info *err_info;
@@ -238,9 +239,12 @@ void cper_print_proc_arm(const char *pfx,
len = proc->section_length - (sizeof(*proc) +
proc->err_info_num * (sizeof(*err_info)));
- if (len < 0) {
- printk("%ssection length: %d\n", pfx, proc->section_length);
- printk("%ssection length is too small\n", pfx);
+
+ if (len < 0 || proc->section_length > length) {
+ printk("%ssection length: %d, CPER size: %d\n",
+ pfx, proc->section_length, length);
+ printk("%ssection length is too %s\n", pfx,
+ (len < 0) ? "small" : "big");
printk("%sfirmware-generated error record is incorrect\n", pfx);
printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
return;
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 0232bd040f61..88fc0293f876 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -659,7 +659,8 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
printk("%ssection_type: ARM processor error\n", newpfx);
if (gdata->error_data_length >= sizeof(*arm_err))
- cper_print_proc_arm(newpfx, arm_err);
+ cper_print_proc_arm(newpfx, arm_err,
+ gdata->error_data_length);
else
goto err_section_too_small;
#endif
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 5b1236d8c65b..440b35e459e5 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -595,7 +595,8 @@ void cper_mem_err_pack(const struct cper_sec_mem_err *,
const char *cper_mem_err_unpack(struct trace_seq *,
struct cper_mem_err_compact *);
void cper_print_proc_arm(const char *pfx,
- const struct cper_sec_proc_arm *proc);
+ const struct cper_sec_proc_arm *proc,
+ u32 length);
void cper_print_proc_ia(const char *pfx,
const struct cper_sec_proc_ia *proc);
int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg);
--
2.52.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record
2026-01-06 10:01 [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
2026-01-06 10:01 ` [PATCH v4 1/4] apei/ghes: ARM processor Error: don't go past allocated memory Mauro Carvalho Chehab
2026-01-06 10:01 ` [PATCH v4 2/4] efi/cper: don't go past the ARM processor CPER record buffer Mauro Carvalho Chehab
@ 2026-01-06 10:01 ` Mauro Carvalho Chehab
2026-01-06 16:07 ` Jonathan Cameron
2026-01-06 10:01 ` [PATCH v4 4/4] efi/cper: don't dump the entire memory region Mauro Carvalho Chehab
3 siblings, 1 reply; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-06 10:01 UTC (permalink / raw)
To: Rafael J. Wysocki, Robert Moore
Cc: Mauro Carvalho Chehab, Ankit Agrawal, Borislav Petkov,
Breno Leitao, Hanjun Guo, Jason Tian, Jonathan Cameron, Len Brown,
Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli, Tony Luck,
acpica-devel, linux-acpi, linux-kernel
The logic at ghes_new() prevents allocating too large records, by
checking if they're bigger than GHES_ESTATUS_MAX_SIZE (currently, 64KB).
Yet, the allocation is done with the actual number of pages from the
CPER bios table location, which can be smaller.
Yet, a bad firmware could send data with a different size, which might
be bigger than the allocated memory, causing an OOPS:
[13095.899926] Unable to handle kernel paging request at virtual address fff00000f9b40000
[13095.899961] Mem abort info:
[13095.900017] ESR = 0x0000000096000007
[13095.900088] EC = 0x25: DABT (current EL), IL = 32 bits
[13095.900156] SET = 0, FnV = 0
[13095.900181] EA = 0, S1PTW = 0
[13095.900211] FSC = 0x07: level 3 translation fault
[13095.900255] Data abort info:
[13095.900421] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[13095.900486] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[13095.900525] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[13095.900713] swapper pgtable: 4k pages, 52-bit VAs, pgdp=000000008ba16000
[13095.900752] [fff00000f9b40000] pgd=180000013ffff403, p4d=180000013fffe403, pud=180000013f85b403, pmd=180000013f68d403, pte=0000000000000000
[13095.901312] Internal error: Oops: 0000000096000007 [#1] SMP
[13095.901659] Modules linked in:
[13095.902201] CPU: 0 UID: 0 PID: 303 Comm: kworker/0:1 Not tainted 6.19.0-rc1-00002-gda407d200220 #34 PREEMPT
[13095.902461] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
[13095.902719] Workqueue: kacpi_notify acpi_os_execute_deferred
[13095.903778] pstate: 214020c5 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[13095.903892] pc : hex_dump_to_buffer+0x30c/0x4a0
[13095.904146] lr : hex_dump_to_buffer+0x328/0x4a0
[13095.904204] sp : ffff800080e13880
[13095.904291] x29: ffff800080e13880 x28: ffffac9aba86f6a8 x27: 0000000000000083
[13095.904704] x26: fff00000f9b3fffc x25: 0000000000000004 x24: 0000000000000004
[13095.905335] x23: ffff800080e13905 x22: 0000000000000010 x21: 0000000000000083
[13095.905483] x20: 0000000000000001 x19: 0000000000000008 x18: 0000000000000010
[13095.905617] x17: 0000000000000001 x16: 00000007c7f20fec x15: 0000000000000020
[13095.905850] x14: 0000000000000008 x13: 0000000000081020 x12: 0000000000000008
[13095.906175] x11: ffff800080e13905 x10: ffff800080e13988 x9 : 0000000000000000
[13095.906733] x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000020
[13095.907197] x5 : 0000000000000030 x4 : 00000000fffffffe x3 : 0000000000000000
[13095.907623] x2 : ffffac9aba78c1c8 x1 : ffffac9aba76d0a8 x0 : 0000000000000008
[13095.908284] Call trace:
[13095.908866] hex_dump_to_buffer+0x30c/0x4a0 (P)
[13095.909135] print_hex_dump+0xac/0x170
[13095.909179] cper_estatus_print_section+0x90c/0x968
[13095.909336] cper_estatus_print+0xf0/0x158
[13095.909348] __ghes_print_estatus+0xa0/0x148
[13095.909656] ghes_proc+0x1bc/0x220
[13095.909883] ghes_notify_hed+0x5c/0xb8
[13095.909957] notifier_call_chain+0x78/0x148
[13095.910180] blocking_notifier_call_chain+0x4c/0x80
[13095.910246] acpi_hed_notify+0x28/0x40
[13095.910558] acpi_ev_notify_dispatch+0x50/0x80
[13095.910576] acpi_os_execute_deferred+0x24/0x48
[13095.911161] process_one_work+0x15c/0x3b0
[13095.911326] worker_thread+0x2d0/0x400
[13095.911775] kthread+0x148/0x228
[13095.912082] ret_from_fork+0x10/0x20
[13095.912687] Code: 6b14033f 540001ad a94707e2 f100029f (b8747b44)
[13095.914085] ---[ end trace 0000000000000000 ]---
Prevent that by taking the actual allocated are into account when
checking for CPER length.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
drivers/acpi/apei/ghes.c | 6 +++++-
include/acpi/ghes.h | 1 +
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fc3f8aed99d5..350f666b7783 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -29,6 +29,7 @@
#include <linux/cper.h>
#include <linux/cleanup.h>
#include <linux/platform_device.h>
+#include <linux/minmax.h>
#include <linux/mutex.h>
#include <linux/ratelimit.h>
#include <linux/vmalloc.h>
@@ -294,6 +295,7 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
error_block_length = GHES_ESTATUS_MAX_SIZE;
}
ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
+ ghes->error_block_length = error_block_length;
if (!ghes->estatus) {
rc = -ENOMEM;
goto err_unmap_status_addr;
@@ -365,13 +367,15 @@ static int __ghes_check_estatus(struct ghes *ghes,
struct acpi_hest_generic_status *estatus)
{
u32 len = cper_estatus_len(estatus);
+ u32 max_len = min(ghes->generic->error_block_length,
+ ghes->error_block_length);
if (len < sizeof(*estatus)) {
pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
return -EIO;
}
- if (len > ghes->generic->error_block_length) {
+ if (!len || len > max_len) {
pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
return -EIO;
}
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index ebd21b05fe6e..5866f50bac0c 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -27,6 +27,7 @@ struct ghes {
struct timer_list timer;
unsigned int irq;
};
+ unsigned int error_block_length;
struct device *dev;
struct list_head elist;
};
--
2.52.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record
2026-01-06 10:01 ` [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
@ 2026-01-06 16:07 ` Jonathan Cameron
0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Cameron @ 2026-01-06 16:07 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Rafael J. Wysocki, Robert Moore, Ankit Agrawal, Borislav Petkov,
Breno Leitao, Hanjun Guo, Jason Tian, Len Brown,
Mauro Carvalho Chehab, Shuai Xue, Smita Koralahalli, Tony Luck,
acpica-devel, linux-acpi, linux-kernel
On Tue, 6 Jan 2026 11:01:37 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> The logic at ghes_new() prevents allocating too large records, by
> checking if they're bigger than GHES_ESTATUS_MAX_SIZE (currently, 64KB).
> Yet, the allocation is done with the actual number of pages from the
> CPER bios table location, which can be smaller.
>
> Yet, a bad firmware could send data with a different size, which might
> be bigger than the allocated memory, causing an OOPS:
>
> [13095.899926] Unable to handle kernel paging request at virtual address fff00000f9b40000
> [13095.899961] Mem abort info:
> [13095.900017] ESR = 0x0000000096000007
> [13095.900088] EC = 0x25: DABT (current EL), IL = 32 bits
> [13095.900156] SET = 0, FnV = 0
> [13095.900181] EA = 0, S1PTW = 0
> [13095.900211] FSC = 0x07: level 3 translation fault
> [13095.900255] Data abort info:
> [13095.900421] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
> [13095.900486] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [13095.900525] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [13095.900713] swapper pgtable: 4k pages, 52-bit VAs, pgdp=000000008ba16000
> [13095.900752] [fff00000f9b40000] pgd=180000013ffff403, p4d=180000013fffe403, pud=180000013f85b403, pmd=180000013f68d403, pte=0000000000000000
> [13095.901312] Internal error: Oops: 0000000096000007 [#1] SMP
> [13095.901659] Modules linked in:
> [13095.902201] CPU: 0 UID: 0 PID: 303 Comm: kworker/0:1 Not tainted 6.19.0-rc1-00002-gda407d200220 #34 PREEMPT
> [13095.902461] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
> [13095.902719] Workqueue: kacpi_notify acpi_os_execute_deferred
> [13095.903778] pstate: 214020c5 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [13095.903892] pc : hex_dump_to_buffer+0x30c/0x4a0
> [13095.904146] lr : hex_dump_to_buffer+0x328/0x4a0
> [13095.904204] sp : ffff800080e13880
> [13095.904291] x29: ffff800080e13880 x28: ffffac9aba86f6a8 x27: 0000000000000083
> [13095.904704] x26: fff00000f9b3fffc x25: 0000000000000004 x24: 0000000000000004
> [13095.905335] x23: ffff800080e13905 x22: 0000000000000010 x21: 0000000000000083
> [13095.905483] x20: 0000000000000001 x19: 0000000000000008 x18: 0000000000000010
> [13095.905617] x17: 0000000000000001 x16: 00000007c7f20fec x15: 0000000000000020
> [13095.905850] x14: 0000000000000008 x13: 0000000000081020 x12: 0000000000000008
> [13095.906175] x11: ffff800080e13905 x10: ffff800080e13988 x9 : 0000000000000000
> [13095.906733] x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000020
> [13095.907197] x5 : 0000000000000030 x4 : 00000000fffffffe x3 : 0000000000000000
> [13095.907623] x2 : ffffac9aba78c1c8 x1 : ffffac9aba76d0a8 x0 : 0000000000000008
> [13095.908284] Call trace:
> [13095.908866] hex_dump_to_buffer+0x30c/0x4a0 (P)
> [13095.909135] print_hex_dump+0xac/0x170
> [13095.909179] cper_estatus_print_section+0x90c/0x968
> [13095.909336] cper_estatus_print+0xf0/0x158
> [13095.909348] __ghes_print_estatus+0xa0/0x148
> [13095.909656] ghes_proc+0x1bc/0x220
> [13095.909883] ghes_notify_hed+0x5c/0xb8
> [13095.909957] notifier_call_chain+0x78/0x148
> [13095.910180] blocking_notifier_call_chain+0x4c/0x80
> [13095.910246] acpi_hed_notify+0x28/0x40
> [13095.910558] acpi_ev_notify_dispatch+0x50/0x80
> [13095.910576] acpi_os_execute_deferred+0x24/0x48
> [13095.911161] process_one_work+0x15c/0x3b0
> [13095.911326] worker_thread+0x2d0/0x400
> [13095.911775] kthread+0x148/0x228
> [13095.912082] ret_from_fork+0x10/0x20
> [13095.912687] Code: 6b14033f 540001ad a94707e2 f100029f (b8747b44)
> [13095.914085] ---[ end trace 0000000000000000 ]---
>
> Prevent that by taking the actual allocated are into account when
> checking for CPER length.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
A naming comment inline. The bikeshed must be blue!
Might have been better to +CC these to linux-edac like earlier
ones. If nothing else that wouldn't have broken my filtering and
I'd have gotten patch 4 (didn't as don't subscribe to efi or lkml
lists). FWIW patch 4 looks fine to me.
> ---
> drivers/acpi/apei/ghes.c | 6 +++++-
> include/acpi/ghes.h | 1 +
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index fc3f8aed99d5..350f666b7783 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -29,6 +29,7 @@
> #include <linux/cper.h>
> #include <linux/cleanup.h>
> #include <linux/platform_device.h>
> +#include <linux/minmax.h>
> #include <linux/mutex.h>
> #include <linux/ratelimit.h>
> #include <linux/vmalloc.h>
> @@ -294,6 +295,7 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
> error_block_length = GHES_ESTATUS_MAX_SIZE;
> }
> ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
> + ghes->error_block_length = error_block_length;
Maybe it would be clearer to call it after what we care about which
is the length of estatus. So
ghes->estatus_length
or something like that. The reason I raise this is it feels a bit random
stashed in the ghes struct and there is no documentation to say what it
is the length of wrt to other elements of that structure.
> if (!ghes->estatus) {
> rc = -ENOMEM;
> goto err_unmap_status_addr;
> @@ -365,13 +367,15 @@ static int __ghes_check_estatus(struct ghes *ghes,
> struct acpi_hest_generic_status *estatus)
> {
> u32 len = cper_estatus_len(estatus);
> + u32 max_len = min(ghes->generic->error_block_length,
> + ghes->error_block_length);
>
> if (len < sizeof(*estatus)) {
> pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
> return -EIO;
> }
>
> - if (len > ghes->generic->error_block_length) {
> + if (!len || len > max_len) {
> pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
> return -EIO;
> }
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index ebd21b05fe6e..5866f50bac0c 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -27,6 +27,7 @@ struct ghes {
> struct timer_list timer;
> unsigned int irq;
> };
> + unsigned int error_block_length;
Would it cause a big hole if this moved up near estatus?
If not I would be tempted to do that given it's effectively the length of that.
> struct device *dev;
> struct list_head elist;
> };
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v4 4/4] efi/cper: don't dump the entire memory region
2026-01-06 10:01 [PATCH v4 0/4] apei/ghes: don't OOPS with bad ARM error CPER records Mauro Carvalho Chehab
` (2 preceding siblings ...)
2026-01-06 10:01 ` [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record Mauro Carvalho Chehab
@ 2026-01-06 10:01 ` Mauro Carvalho Chehab
3 siblings, 0 replies; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-06 10:01 UTC (permalink / raw)
To: Ard Biesheuvel; +Cc: Mauro Carvalho Chehab, linux-efi, linux-kernel
The current logic at cper_print_fw_err() doesn't check if the
error record length is big enough to handle offset. On a bad firmware,
if the ofset is above the actual record, length -= offset will
underflow, making it dump the entire memory.
The end result can be:
- the logic taking a lot of time dumping large regions of memory;
- data disclosure due to the memory dumps;
- an OOPS, if it tries to dump an unmapped memory region.
Fix it by checking if the section length is too small before doing
a hex dump.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
drivers/firmware/efi/cper.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 88fc0293f876..0e938fc5ccb1 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -560,6 +560,11 @@ static void cper_print_fw_err(const char *pfx,
} else {
offset = sizeof(*fw_err);
}
+ if (offset > length) {
+ printk("%s""error section length is too small: offset=%d, length=%d\n",
+ pfx, offset, length);
+ return;
+ }
buf += offset;
length -= offset;
--
2.52.0
^ permalink raw reply related [flat|nested] 7+ messages in thread