Re: [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record

From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Robert Moore <robert.moore@intel.com>,
	Ankit Agrawal <ankita@nvidia.com>,
	"Borislav Petkov" <bp@alien8.de>,
	Breno Leitao <leitao@debian.org>,
	Hanjun Guo <guohanjun@huawei.com>,
	Jason Tian <jason@os.amperecomputing.com>,
	"Len Brown" <lenb@kernel.org>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	"Shuai Xue" <xueshuai@linux.alibaba.com>,
	Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
	Tony Luck <tony.luck@intel.com>, <acpica-devel@lists.linux.dev>,
	<linux-acpi@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 3/4] apei/ghes: ensure that won't go past CPER allocated record
Date: Tue, 6 Jan 2026 16:07:27 +0000
Message-ID: <20260106160727.00005ee2@huawei.com>
In-Reply-To: <8731f124c82a48850648695530a5442d60034de1.1767693532.git.mchehab+huawei@kernel.org>

On Tue,  6 Jan 2026 11:01:37 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The logic at ghes_new() prevents allocating too large records, by
> checking if they're bigger than GHES_ESTATUS_MAX_SIZE (currently, 64KB).
> Yet, the allocation is done with the actual number of pages from the
> CPER bios table location, which can be smaller.
> 
> Yet, a bad firmware could send data with a different size, which might
> be bigger than the allocated memory, causing an OOPS:
> 
> [13095.899926] Unable to handle kernel paging request at virtual address fff00000f9b40000
> [13095.899961] Mem abort info:
> [13095.900017]   ESR = 0x0000000096000007
> [13095.900088]   EC = 0x25: DABT (current EL), IL = 32 bits
> [13095.900156]   SET = 0, FnV = 0
> [13095.900181]   EA = 0, S1PTW = 0
> [13095.900211]   FSC = 0x07: level 3 translation fault
> [13095.900255] Data abort info:
> [13095.900421]   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
> [13095.900486]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [13095.900525]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [13095.900713] swapper pgtable: 4k pages, 52-bit VAs, pgdp=000000008ba16000
> [13095.900752] [fff00000f9b40000] pgd=180000013ffff403, p4d=180000013fffe403, pud=180000013f85b403, pmd=180000013f68d403, pte=0000000000000000
> [13095.901312] Internal error: Oops: 0000000096000007 [#1]  SMP
> [13095.901659] Modules linked in:
> [13095.902201] CPU: 0 UID: 0 PID: 303 Comm: kworker/0:1 Not tainted 6.19.0-rc1-00002-gda407d200220 #34 PREEMPT
> [13095.902461] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
> [13095.902719] Workqueue: kacpi_notify acpi_os_execute_deferred
> [13095.903778] pstate: 214020c5 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [13095.903892] pc : hex_dump_to_buffer+0x30c/0x4a0
> [13095.904146] lr : hex_dump_to_buffer+0x328/0x4a0
> [13095.904204] sp : ffff800080e13880
> [13095.904291] x29: ffff800080e13880 x28: ffffac9aba86f6a8 x27: 0000000000000083
> [13095.904704] x26: fff00000f9b3fffc x25: 0000000000000004 x24: 0000000000000004
> [13095.905335] x23: ffff800080e13905 x22: 0000000000000010 x21: 0000000000000083
> [13095.905483] x20: 0000000000000001 x19: 0000000000000008 x18: 0000000000000010
> [13095.905617] x17: 0000000000000001 x16: 00000007c7f20fec x15: 0000000000000020
> [13095.905850] x14: 0000000000000008 x13: 0000000000081020 x12: 0000000000000008
> [13095.906175] x11: ffff800080e13905 x10: ffff800080e13988 x9 : 0000000000000000
> [13095.906733] x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000020
> [13095.907197] x5 : 0000000000000030 x4 : 00000000fffffffe x3 : 0000000000000000
> [13095.907623] x2 : ffffac9aba78c1c8 x1 : ffffac9aba76d0a8 x0 : 0000000000000008
> [13095.908284] Call trace:
> [13095.908866]  hex_dump_to_buffer+0x30c/0x4a0 (P)
> [13095.909135]  print_hex_dump+0xac/0x170
> [13095.909179]  cper_estatus_print_section+0x90c/0x968
> [13095.909336]  cper_estatus_print+0xf0/0x158
> [13095.909348]  __ghes_print_estatus+0xa0/0x148
> [13095.909656]  ghes_proc+0x1bc/0x220
> [13095.909883]  ghes_notify_hed+0x5c/0xb8
> [13095.909957]  notifier_call_chain+0x78/0x148
> [13095.910180]  blocking_notifier_call_chain+0x4c/0x80
> [13095.910246]  acpi_hed_notify+0x28/0x40
> [13095.910558]  acpi_ev_notify_dispatch+0x50/0x80
> [13095.910576]  acpi_os_execute_deferred+0x24/0x48
> [13095.911161]  process_one_work+0x15c/0x3b0
> [13095.911326]  worker_thread+0x2d0/0x400
> [13095.911775]  kthread+0x148/0x228
> [13095.912082]  ret_from_fork+0x10/0x20
> [13095.912687] Code: 6b14033f 540001ad a94707e2 f100029f (b8747b44)
> [13095.914085] ---[ end trace 0000000000000000 ]---
> 
> Prevent that by taking the actual allocated are into account when
> checking for CPER length.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

A naming comment inline. The bikeshed must be blue!

It might have been better to +CC these to linux-edac, like the earlier
ones. If nothing else, that wouldn't have broken my filtering and
I'd have gotten patch 4 (I didn't, as I don't subscribe to the efi or
lkml lists). FWIW, patch 4 looks fine to me.

> ---
>  drivers/acpi/apei/ghes.c | 6 +++++-
>  include/acpi/ghes.h      | 1 +
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index fc3f8aed99d5..350f666b7783 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -29,6 +29,7 @@
>  #include <linux/cper.h>
>  #include <linux/cleanup.h>
>  #include <linux/platform_device.h>
> +#include <linux/minmax.h>
>  #include <linux/mutex.h>
>  #include <linux/ratelimit.h>
>  #include <linux/vmalloc.h>
> @@ -294,6 +295,7 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
>  		error_block_length = GHES_ESTATUS_MAX_SIZE;
>  	}
>  	ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
> +	ghes->error_block_length = error_block_length;
Maybe it would be clearer to name it after what we care about, which
is the length of estatus. So
	ghes->estatus_length
or something like that. The reason I raise this is that it feels a bit
random stashed in the ghes struct, and there is no documentation to say
what it is the length of with respect to the other elements of that
structure.

>  	if (!ghes->estatus) {
>  		rc = -ENOMEM;
>  		goto err_unmap_status_addr;
> @@ -365,13 +367,15 @@ static int __ghes_check_estatus(struct ghes *ghes,
>  				struct acpi_hest_generic_status *estatus)
>  {
>  	u32 len = cper_estatus_len(estatus);
> +	u32 max_len = min(ghes->generic->error_block_length,
> +			  ghes->error_block_length);
>  
>  	if (len < sizeof(*estatus)) {
>  		pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
>  		return -EIO;
>  	}
>  
> -	if (len > ghes->generic->error_block_length) {
> +	if (!len || len > max_len) {
>  		pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
>  		return -EIO;
>  	}
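
For illustration only, here is a minimal user-space sketch of the check
this hunk is aiming for. It is not kernel code: struct mock_ghes, the
MOCK_* constants and the helper names are hypothetical stand-ins, the
header size is an assumed value, and the field uses the estatus_length
name suggested above rather than the name in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions, illustration only. */
#define MOCK_ESTATUS_MAX_SIZE 65536u	/* the 64KB cap from the commit message */
#define MOCK_ESTATUS_HDR_SIZE 20u	/* assumed size of the status header */

struct mock_ghes {
	uint32_t fw_block_length;	/* length advertised by the firmware table */
	uint32_t estatus_length;	/* bytes actually allocated for estatus */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Mimic the clamp in ghes_new(): never allocate more than the cap. */
static void mock_ghes_init(struct mock_ghes *g, uint32_t fw_block_length)
{
	g->fw_block_length = fw_block_length;
	g->estatus_length = min_u32(fw_block_length, MOCK_ESTATUS_MAX_SIZE);
}

/* Return 0 if a record of len bytes fits in the buffer, -1 otherwise. */
static int mock_check_len(const struct mock_ghes *g, uint32_t len)
{
	uint32_t max_len = min_u32(g->fw_block_length, g->estatus_length);

	if (len < MOCK_ESTATUS_HDR_SIZE)
		return -1;	/* truncated record */
	if (len > max_len)
		return -1;	/* would read past the allocation */
	return 0;
}
```

With a firmware-advertised length of 128KB the mock allocation is capped
at 64KB, so a 100000 byte record passes a check against the firmware
length alone but is rejected once the allocated length is taken into
account.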
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index ebd21b05fe6e..5866f50bac0c 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -27,6 +27,7 @@ struct ghes {
>  		struct timer_list timer;
>  		unsigned int irq;
>  	};
> +	unsigned int error_block_length;
Would it cause a big hole if this moved up near estatus?
If not, I would be tempted to do that, given it's effectively the length
of that buffer.

>  	struct device *dev;
>  	struct list_head elist;
>  };
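
As a rough way to reason about the hole question, the sketch below
compares the two placements with offsetof() in user space. Everything in
it is a hypothetical mock: the struct names, the simplified union (the
timer member is left out) and the estatus_length field name are
assumptions, so the numbers only hint at what the real struct ghes would
do. Running pahole on the built kernel object is the reliable check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's list_head. */
struct mock_list_head {
	struct mock_list_head *next, *prev;
};

/* Length stored right after the buffer pointer it describes. */
struct mock_ghes_near {
	void *generic;
	void *estatus;
	unsigned int estatus_length;	/* allocated size of estatus, in bytes */
	unsigned long flags;
	union {
		struct mock_list_head list;
		unsigned int irq;
	};
	void *dev;
	struct mock_list_head elist;
};

/* Length stored after the union, where the patch puts it. */
struct mock_ghes_late {
	void *generic;
	void *estatus;
	unsigned long flags;
	union {
		struct mock_list_head list;
		unsigned int irq;
	};
	unsigned int estatus_length;	/* allocated size of estatus, in bytes */
	void *dev;
	struct mock_list_head elist;
};

/* Padding bytes between estatus_length and the member that follows it. */
static size_t hole_after_near(void)
{
	return offsetof(struct mock_ghes_near, flags) -
	       offsetof(struct mock_ghes_near, estatus_length) -
	       sizeof(unsigned int);
}

static size_t hole_after_late(void)
{
	return offsetof(struct mock_ghes_late, dev) -
	       offsetof(struct mock_ghes_late, estatus_length) -
	       sizeof(unsigned int);
}
```

On an LP64 build both helpers should report the same padding after the
unsigned int, since the next member is 8-byte aligned in either
position. For this mock, that suggests moving the field next to estatus
would not grow the structure.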
