From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Ankit Agrawal <ankita@nvidia.com>, Borislav Petkov <bp@alien8.de>,
Breno Leitao <leitao@debian.org>,
Hanjun Guo <guohanjun@huawei.com>, Ingo Molnar <mingo@kernel.org>,
Jason Tian <jason@os.amperecomputing.com>,
"Len Brown" <lenb@kernel.org>,
Mauro Carvalho Chehab <mchehab@kernel.org>,
Shuai Xue <xueshuai@linux.alibaba.com>,
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
Tony Luck <tony.luck@intel.com>, <linux-efi@vger.kernel.org>,
<linux-acpi@vger.kernel.org>, <linux-edac@vger.kernel.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 1/2] apei/ghes: ARM processor Error: don't go past allocated memory
Date: Mon, 22 Dec 2025 11:38:51 +0000
Message-ID: <20251222113851.000048f6@huawei.com>
In-Reply-To: <e80bc4eba43d0211713fe66958ec0c582d9bfda7.1766140788.git.mchehab+huawei@kernel.org>
On Fri, 19 Dec 2025 11:49:59 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> If the BIOS generates a very small ARM Processor Error, or
> an incomplete one, the current logic will fail to dereference
>
> err->section_length
> and
> ctx_info->size
>
> Add checks to avoid that. With such changes, such GHESv2
> records won't cause OOPSes like this:
>
> [ 1.492129] Internal error: Oops: 0000000096000005 [#1] SMP
> [ 1.495449] Modules linked in:
> [ 1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
> [ 1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
> [ 1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
> [ 1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [ 1.497199] pc : log_arm_hw_error+0x5c/0x200
> [ 1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220
>
> 0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
> 70 err_info = (struct cper_arm_err_info *)(err + 1);
> 71 ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
> 72 ctx_err = (u8 *)ctx_info;
> 73
> 74 for (n = 0; n < err->context_info_num; n++) {
> 75 sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> 76 ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
> 77 ctx_len += sz;
> 78 }
> 79
>
> and similar ones while trying to access section_length on an
> error dump with too small size.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Hi Mauro,
This is fiddly stuff to read in the spec, but I think you are double
counting the size of the "ARM Processor Error Information Structure".
The length field in that structure is the length of the structure
itself, not of a following body.
Jonathan
> ---
> drivers/acpi/apei/ghes.c | 33 +++++++++++++++++++++++++++++----
> drivers/ras/ras.c | 6 +++++-
> 2 files changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 0dc767392a6c..9bf4ec84f160 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -552,21 +552,46 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
> {
> struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
> int flags = sync ? MF_ACTION_REQUIRED : 0;
> + int length = gdata->error_data_length;
> char error_type[120];
> bool queued = false;
> int sec_sev, i;
> char *p;
>
> sec_sev = ghes_severity(gdata->error_severity);
> - log_arm_hw_error(err, sec_sev);
> + if (length >= sizeof(*err)) {
> + log_arm_hw_error(err, sec_sev);
> + } else {
> + pr_warn(FW_BUG "arm error length: %d\n", length);
> + pr_warn(FW_BUG "length is too small\n");
> + pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
> + return false;
> + }
> +
> if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
> return false;
>
> p = (char *)(err + 1);
> + length -= sizeof(err);
Hacks off the fixed-size part of the section.
> +
> for (i = 0; i < err->err_info_num; i++) {
> - struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
> - bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
> - bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
> + struct cper_arm_err_info *err_info;
> + bool is_cache, has_pa;
> +
> + /* Ensure we have enough data for the error info header */
> + length -= sizeof(*err_info);
Hacks off the length of one processor error information structure (fixed 32 bytes).
> + if (length < 0)
> + break;
> +
> + err_info = (struct cper_arm_err_info *)p;
> +
> + /* Validate the claimed length before using it */
> + length -= err_info->length;
This one confuses me. err_info->length is the same 32 bytes you removed above.
So I think this check is wrong.
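To illustrate what I mean, here is a rough standalone sketch, not a
drop-in for ghes.c. struct err_info and count_valid_err_info() are
made-up stand-ins for struct cper_arm_err_info and the loop body, and
the layout is simplified so that it is naturally 32 bytes without
packing. Each entry is charged against the remaining length once,
using its own length field:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct cper_arm_err_info. */
struct err_info {
	uint8_t  version;
	uint8_t  length;		/* length of this structure itself */
	uint16_t validation_bits;
	uint8_t  type;
	uint8_t  pad[3];
	uint64_t rest[3];
};

static_assert(sizeof(struct err_info) == 32, "entry is a fixed 32 bytes");

/*
 * Count how many of err_info_num entries fit in the `remaining` bytes
 * that follow the fixed-size section header.
 */
int count_valid_err_info(const uint8_t *p, long remaining, int err_info_num)
{
	int i;

	for (i = 0; i < err_info_num; i++) {
		struct err_info info;

		/* The header must fit before any field is read. */
		if (remaining < (long)sizeof(info))
			break;
		memcpy(&info, p, sizeof(info));

		/* Claimed length must cover the structure and stay in bounds. */
		if (info.length < sizeof(info) || info.length > remaining)
			break;

		/* Consume the entry exactly once. */
		p += info.length;
		remaining -= info.length;
	}

	return i;
}
```

With two 32-byte entries and 64 bytes available, both are accepted.
Subtracting sizeof(*err_info) and then err_info->length as well would
need 128 bytes for the same record.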
> + if (length < 0)
> + break;
> +
> + is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
> + has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
>
> /*
> * The field (err_info->error_info & BIT(26)) is fixed to set to
> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> index 2a5b5a9fdcb3..03df3db62334 100644
> --- a/drivers/ras/ras.c
> +++ b/drivers/ras/ras.c
> @@ -72,7 +72,11 @@ void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
> ctx_err = (u8 *)ctx_info;
>
> for (n = 0; n < err->context_info_num; n++) {
> - sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> + sz = sizeof(struct cper_arm_ctx_info);
> +
> + if (sz + (long)ctx_info - (long)err >= err->section_length)
> + sz += ctx_info->size;
> +
> ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
> ctx_len += sz;
> }
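For the context walk I would expect the same pattern: check that the
header fits before reading ctx_info->size, then check the claimed size
against what is left. A rough standalone sketch under those
assumptions (struct ctx_info and ctx_len_bounded() are made-up names,
not the ras.c code, and `remaining` is assumed to be the number of
bytes left in the section from the first context structure):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct cper_arm_ctx_info (8-byte header). */
struct ctx_info {
	uint16_t version;
	uint16_t type;
	uint32_t size;		/* size of the register array that follows */
};

/*
 * Sum the sizes of the context records that lie fully inside the
 * `remaining` bytes, stopping at the first one that does not fit.
 */
size_t ctx_len_bounded(const uint8_t *p, size_t remaining,
		       unsigned int context_info_num)
{
	size_t ctx_len = 0;
	unsigned int n;

	for (n = 0; n < context_info_num; n++) {
		struct ctx_info info;
		size_t sz;

		/* Do not read size unless the whole header is in bounds. */
		if (remaining < sizeof(info))
			break;
		memcpy(&info, p, sizeof(info));

		/* Compare without adding first, so the sum cannot wrap. */
		if (info.size > remaining - sizeof(info))
			break;

		sz = sizeof(info) + info.size;
		p += sz;
		remaining -= sz;
		ctx_len += sz;
	}

	return ctx_len;
}
```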