From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DDCA23AB460; Wed, 17 Jun 2026 14:10:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781705459; cv=none; b=Up6oRw/S16LvPTutH0dCEQqthmmh1Ubvu1OKwl6NASBzYKUI9IXQH4Fyzu7ncihkIA9eurHEp+jw4hFlCq0OF1WxV2xpcpRKBltAZJY9aggEFXhdGPcwQkQREzJpUYcNhwNTcqdha70CrQIVPhTyU1ZruBLZx64l6lwqjigaatQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781705459; c=relaxed/simple; bh=tzs0QnWcRKujWGxzUNNfuq3zOMNv4/IKWRBoX6CFwXc=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=lGApK2+vSUeNWqc9IHZp0Q6AnWT6MIYMAkI2mGt6aMqqx/pWPZH7fuLyVik+zZVIuqhMQLLKc3t8sDTyXNHhlSMX/JNH6d69tk7NB/i81UWTHPffXxHefThYOyCe4P37OWvrrWNIlffJFvBDr+oAaJA5Mnad/+K4yHMRB5ZeInI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=eirwE3Vi; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="eirwE3Vi" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 75CAF1F000E9; Wed, 17 Jun 2026 14:10:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1781705457; bh=7JVuUmQYzVTqj6TbcDcOl4vf5etSCNQftFAAWwJC/L4=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=eirwE3ViBtj9p4lN4KNGUonYxrzN2NQIvaruFYaqO6bugiF/t1roaZ4ncKdW1SEnv Xg2IOXopWEppnZDRUA00WtP6KEfkHf4u3iMfxeaJ5NmeS3VOuHhtgKPweUfPGQsPqE 4ooxZwQ0OyKq8MLPbQ1tGUtZ07tYWSjGcUaUfdvnsLZc/xLx0+ZID75FCMbPNRcZZQ EdkXItB88iR3zuoJWhEpiMrSCBoJJCqIqO6TVdt9lMAJ2VF9vBYPeg/73i2AcjCp4W rB1sRH+qU3B49/+Ymc0rjVpjmX8ApqufrJbUF7Q2qgLFZxlgSxDQpsLHUgs1YunLr2 0zL7mIhCZtTIg== From: sashiko-bot@kernel.org Subject: Re: [PATCH v6 10/10] RAS: add firmware-first CPER provider Reply-To: sashiko-reviews@lists.linux.dev To: "Ahmed Tiba" Cc: conor+dt@kernel.org, robh@kernel.org, linux-cxl@vger.kernel.org, devicetree@vger.kernel.org In-Reply-To: <20260617-topics-ahmtib01-ras_ffh_arm_internal_review-v6-10-91f725174aa0@arm.com> References: <20260617-topics-ahmtib01-ras_ffh_arm_internal_review-v6-0-91f725174aa0@arm.com> <20260617-topics-ahmtib01-ras_ffh_arm_internal_review-v6-10-91f725174aa0@arm.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Wed, 17 Jun 2026 14:10:56 +0000 Message-Id: <20260617141057.75CAF1F000E9@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-cxl@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 5 potential issue(= s) to consider: New issues: - [High] Unbounded memory leak of the global GHES estatus pool backing memo= ry on probe failure or repeated module bind/unbind. - [High] Unnecessary `spinlock_irqsave` entirely defeats the threaded IRQ h= andler and causes high interrupt latency. - [High] The threaded IRQ handler unconditionally returns `IRQ_HANDLED` eve= n when no error block is present, bypassing the kernel's spurious interrupt= detector. - [Medium] Missing Kconfig dependency on `OF_RESERVED_MEM` causes guarantee= d probe failure. Pre-existing issues: - [Critical] Integer overflow in `data_length` validation allows massive ou= t-of-bounds kernel memory read when `raw_data_length` is non-zero. -- commit d0750a54f0673f163ace66f8e9921ee0f1ba6bd4 Author: Ahmed Tiba RAS: add firmware-first CPER provider Add a firmware-first CPER provider that reuses the shared GHES helpers, wire it into the RAS Kconfig/Makefile and document it in the admin guide. > diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig > index fc4f4bb94a4c6..3c1c63b2fefc3 100644 > --- a/drivers/ras/Kconfig > +++ b/drivers/ras/Kconfig > @@ -34,6 +34,17 @@ if RAS > source "arch/x86/ras/Kconfig" > source "drivers/ras/amd/atl/Kconfig" > =20 > +config RAS_CPER_ESOURCE > + bool "Firmware-first CPER error source block provider" > + select GHES_CPER_HELPERS [Severity: Medium] Will this fail to probe on configurations without OF_RESERVED_MEM? The driver calls of_reserved_mem_region_to_resource() unconditionally, but if CONFIG_OF_RESERVED_MEM is disabled, that function is an inline stub that always returns -ENOSYS. Should there be a "depends on OF_RESERVED_MEM" here so it doesn't build broken configurations? > + help > + Enable support for firmware-first Common Platform Error Record [ ... ] > diff --git a/drivers/ras/cper-esource.c b/drivers/ras/cper-esource.c > new file mode 100644 > index 0000000000000..cc9f5f522400d > --- /dev/null > +++ b/drivers/ras/cper-esource.c [ ... ] > +static int cper_esource_validate_status(struct cper_esource *ctx) > +{ > + size_t estatus_len; > + > + if (!ctx->estatus->block_status) > + return -ENOENT; > + > + if (cper_estatus_check_header(ctx->estatus)) > + return -EINVAL; > + > + if (ctx->estatus->raw_data_length && > + (ctx->estatus->raw_data_offset > ctx->status_len || > + ctx->estatus->raw_data_length > > + ctx->status_len - ctx->estatus->raw_data_offset)) > + return -EINVAL; [Severity: Critical] This isn't a bug introduced by this patch, but does the validation logic have an integer overflow vulnerability? When raw_data_length is non-zero, this function relies on cper_estatus_check_header() to validate data_length. In drivers/firmware/efi/cper.c:cper_estatus_check_header(): if (estatus->raw_data_length && estatus->raw_data_offset < sizeof(*estatus) + estatus->data_len= gth) return -EINVAL; If firmware provides a massive data_length (e.g., 0xFFFFFFF0), the addition wraps around to a very small integer, bypassing the check. The subsequent cper_estatus_check() will then iterate over this unvalidated length, allowi= ng a massive out-of-bounds kernel memory read past the ctx->estatus buffer. Should this driver independently verify that sizeof(*estatus) + data_length <=3D ctx->status_len to protect against this? > + > + estatus_len =3D cper_esource_estatus_len(ctx->estatus); [ ... ] > +static void cper_esource_process(struct cper_esource *ctx) > +{ > + int rc; > + int sev; > + > + guard(spinlock_irqsave)(&ctx->lock); [Severity: High] Does this unnecessarily disable local interrupts? The cper_esource_irq() handler is requested as a threaded IRQ with IRQF_ONESHOT, which naturally serializes execution. Disabling interrupts across this entire function defeats the purpose of delegating work to a threaded context and introduces high interrupt latency, especially since it contains memory copies and potentially blocking reporting helpers. > + > + memcpy(ctx->estatus, ctx->status, ctx->status_len); [ ... ] > +static irqreturn_t cper_esource_irq(int irq, void *data) > +{ > + struct cper_esource *ctx =3D data; > + > + cper_esource_process(ctx); > + > + return IRQ_HANDLED; > +} [Severity: High] If cper_esource_process() returns early without doing any work, shouldn't this propagate an IRQ_NONE return? If the firmware hasn't set block_status, cper_esource_process() bails out early without clearing device state or acknowledging the interrupt. By always returning IRQ_HANDLED here, the spurious interrupt detector is bypassed. If this is a level-triggered interrupt, won't this result in an infinite loop of executing the threaded handler and cause a hard CPU lockup? > + > +static int cper_esource_init_ack(struct cper_esource *ctx) [ ... ] > +static int cper_esource_probe(struct platform_device *pdev) > +{ > + struct device *dev =3D &pdev->dev; > + struct cper_esource *ctx; > + size_t size; > + int source_id; > + int rc; [ ... ] > + rc =3D cper_esource_init_ack(ctx); > + if (rc) > + return rc; > + > + rc =3D cper_esource_init_pool(); > + if (rc) > + return rc; [Severity: High] Does this cause an unbounded memory leak if the probe sequence fails or the module is repeatedly bound/unbound? cper_esource_init_pool() calls ghes_estatus_pool_init(1). In drivers/acpi/apei/ghes_cper.c:ghes_estatus_pool_init(): addr =3D (unsigned long)vmalloc(len); ... rc =3D gen_pool_add(ghes_estatus_pool, addr, len, -1); This dynamically allocates and adds memory to the global pool on every call, even if the pool already exists. If the driver is unbound and rebound, or if platform_get_irq() later returns -EPROBE_DEFER, the allocated memory is permanently leaked. > + > + ctx->estatus =3D devm_kzalloc(dev, ctx->status_len, GFP_KERNEL); --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260617-topics-ahm= tib01-ras_ffh_arm_internal_review-v6-0-91f725174aa0@arm.com?part=3D10