* [PATCH 0/2] cxl: Fix endpoint access issues with CXL MCE notifier handler
@ 2026-06-16 0:40 Dave Jiang
2026-06-16 0:40 ` [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() Dave Jiang
2026-06-16 0:40 ` [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown Dave Jiang
0 siblings, 2 replies; 5+ messages in thread
From: Dave Jiang @ 2026-06-16 0:40 UTC (permalink / raw)
To: linux-cxl; +Cc: djbw, dave, jic23, alison.schofield, vishal.l.verma, flavien
Fix the issues reported by Flavien to the Linux security mailing list.
First patch addresses accessing possible invalid pointers for cxlmd and
cxmd->endpoint.
The second patch adddresses the endpoint lifetime that is shorter than
mds and needs appropriate guardrails to access.
Dave Jiang (2):
cxl/mce: Validate memdev and endpoint before dereference in
cxl_handle_mce()
cxl/mce: Serialize the MCE handler against endpoint teardown
drivers/cxl/core/mce.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
--
2.54.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce()
2026-06-16 0:40 [PATCH 0/2] cxl: Fix endpoint access issues with CXL MCE notifier handler Dave Jiang
@ 2026-06-16 0:40 ` Dave Jiang
2026-06-16 0:54 ` sashiko-bot
2026-06-16 0:40 ` [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown Dave Jiang
1 sibling, 1 reply; 5+ messages in thread
From: Dave Jiang @ 2026-06-16 0:40 UTC (permalink / raw)
To: linux-cxl
Cc: djbw, dave, jic23, alison.schofield, vishal.l.verma, flavien,
stable
cxlmd and endpoint are both used in cxl_handle_mce() without proper
validation, which can lead to NULL pointer dereference or invalid pointer
dereference. The notifier is registered in cxl_memdev_state_create()
when the CXL PCI driver first binds, before the memdev is published and
before it is attached to a CXL topology.
Add checks to cxlmd and endpoint to ensure they are valid before usage.
Reported-by: Flavien Solt <flavien@nus.edu.sg>
Fixes: 516e5bd0b6bf ("cxl: Add mce notifier to emit aliased address for extended linear cache")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/mce.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
index ff8d078c6ca1..47566015eb00 100644
--- a/drivers/cxl/core/mce.c
+++ b/drivers/cxl/core/mce.c
@@ -13,7 +13,7 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
mce_notifier);
struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
- struct cxl_port *endpoint = cxlmd->endpoint;
+ struct cxl_port *endpoint;
struct mce *mce = data;
u64 spa, spa_alias;
unsigned long pfn;
@@ -21,7 +21,11 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
if (!mce || !mce_usable_address(mce))
return NOTIFY_DONE;
- if (!endpoint)
+ if (!cxlmd)
+ return NOTIFY_DONE;
+
+ endpoint = cxlmd->endpoint;
+ if (IS_ERR_OR_NULL(endpoint))
return NOTIFY_DONE;
spa = mce->addr & MCI_ADDR_PHYSADDR;
--
2.54.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown
2026-06-16 0:40 [PATCH 0/2] cxl: Fix endpoint access issues with CXL MCE notifier handler Dave Jiang
2026-06-16 0:40 ` [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() Dave Jiang
@ 2026-06-16 0:40 ` Dave Jiang
2026-06-16 1:03 ` sashiko-bot
1 sibling, 1 reply; 5+ messages in thread
From: Dave Jiang @ 2026-06-16 0:40 UTC (permalink / raw)
To: linux-cxl
Cc: djbw, dave, jic23, alison.schofield, vishal.l.verma, flavien,
stable
CXL endpoint has a shorter lifetime than CXL memdev state (mds) and
the MCE notifier is part of the mds. The MCE handler needs to take
a reference on the endpoint in order to keep it alive while operating
on it. Take the cxlmd lock to verify the endpoint is still valid and
take a reference on it before accessing it.
Reported-by: Flavien Solt <flavien@nus.edu.sg>
Fixes: 516e5bd0b6bf ("cxl: Add mce notifier to emit aliased address for extended linear cache")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/mce.c | 27 +++++++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
index 47566015eb00..e684e411921b 100644
--- a/drivers/cxl/core/mce.c
+++ b/drivers/cxl/core/mce.c
@@ -7,13 +7,27 @@
#include <cxlmem.h>
#include "mce.h"
+static struct device *cxlmd_get_endpoint_dev(struct cxl_memdev *cxlmd)
+{
+ struct cxl_port *endpoint;
+
+ if (!cxlmd)
+ return NULL;
+
+ guard(device)(&cxlmd->dev);
+ endpoint = cxlmd->endpoint;
+ if (IS_ERR_OR_NULL(endpoint))
+ return NULL;
+
+ return get_device(&endpoint->dev);
+}
+
static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
void *data)
{
struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
mce_notifier);
struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
- struct cxl_port *endpoint;
struct mce *mce = data;
u64 spa, spa_alias;
unsigned long pfn;
@@ -24,8 +38,13 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
if (!cxlmd)
return NOTIFY_DONE;
- endpoint = cxlmd->endpoint;
- if (IS_ERR_OR_NULL(endpoint))
+ /*
+ * With the cxlmd device lock held, check the cxlmd->endpoint pointer
+ * and then take a reference of the device in order to keep it alive
+ * while accessing it.
+ */
+ struct device *dev __free(put_device) = cxlmd_get_endpoint_dev(cxlmd);
+ if (!dev)
return NOTIFY_DONE;
spa = mce->addr & MCI_ADDR_PHYSADDR;
@@ -34,7 +53,7 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
if (!pfn_valid(pfn))
return NOTIFY_DONE;
- spa_alias = cxl_port_get_spa_cache_alias(endpoint, spa);
+ spa_alias = cxl_port_get_spa_cache_alias(to_cxl_port(dev), spa);
if (spa_alias == ~0ULL)
return NOTIFY_DONE;
--
2.54.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce()
2026-06-16 0:40 ` [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() Dave Jiang
@ 2026-06-16 0:54 ` sashiko-bot
0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-06-16 0:54 UTC (permalink / raw)
To: Dave Jiang; +Cc: linux-cxl
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] The validation added for `cxlmd` fails to account for device teardown, where `cxlds->cxlmd` becomes a dangling pointer, leading to an incomplete fix and Use-After-Free.
--
commit d4d366d587ddef82637af37fbf8cd28754d28024
Author: Dave Jiang <dave.jiang@intel.com>
cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce()
cxlmd and endpoint are both used in cxl_handle_mce() without proper
validation, which can lead to NULL pointer dereference or invalid pointer
dereference. The notifier is registered in cxl_memdev_state_create()
when the CXL PCI driver first binds.
> diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
> index ff8d078c6ca1e..47566015eb006 100644
> --- a/drivers/cxl/core/mce.c
> +++ b/drivers/cxl/core/mce.c
[ ... ]
> @@ -21,7 +21,11 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
> if (!mce || !mce_usable_address(mce))
> return NOTIFY_DONE;
>
> - if (!endpoint)
> + if (!cxlmd)
> + return NOTIFY_DONE;
[Severity: High]
This isn't a bug introduced by this patch, but does this check adequately
protect against a use-after-free during device teardown?
The MCE notifier is registered via devm_cxl_register_mce_notifier() early in
the PCI probe sequence. Later, cxl_memdev_autoremove() registers the
cxl_memdev_unregister() teardown action via devm_add_action_or_reset().
Because devm actions execute in LIFO order, cxl_memdev_unregister() runs
before the MCE notifier is unregistered.
During cxl_memdev_unregister(), put_device(&cxlmd->dev) is called, which can
free the cxlmd structure. However, neither cxl_memdev_unregister() nor
cxl_memdev_shutdown() clears the cxlds->cxlmd pointer.
This leaves mds->cxlds.cxlmd as a dangling pointer. If an MCE occurs during
this teardown window, cxl_handle_mce() will find a non-NULL cxlmd and pass it
to cxlmd_get_endpoint_dev(), which calls guard(device)(&cxlmd->dev), resulting
in a use-after-free.
> +
> + endpoint = cxlmd->endpoint;
> + if (IS_ERR_OR_NULL(endpoint))
> return NOTIFY_DONE;
>
> spa = mce->addr & MCI_ADDR_PHYSADDR;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616004007.4186004-1-dave.jiang@intel.com?part=1
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown
2026-06-16 0:40 ` [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown Dave Jiang
@ 2026-06-16 1:03 ` sashiko-bot
0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-06-16 1:03 UTC (permalink / raw)
To: Dave Jiang; +Cc: linux-cxl
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [Critical] The MCE notifier accesses a freed `cxlmd` structure and attempts to lock its mutex if an MCE occurs during device teardown, leading to a Use-After-Free and kernel crash.
--
commit a9bb7894eaa94a0fe2d0d4c6dd69b86cf2d1fe0f
Author: Dave Jiang <dave.jiang@intel.com>
cxl/mce: Serialize the MCE handler against endpoint teardown
This commit modifies the MCE handler to take the cxlmd lock before verifying
the endpoint pointer. It then takes a reference on the endpoint device to
ensure it remains alive during access.
> diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
> index 47566015eb006..e684e411921b9 100644
> --- a/drivers/cxl/core/mce.c
> +++ b/drivers/cxl/core/mce.c
> @@ -7,13 +7,27 @@
> #include <cxlmem.h>
> #include "mce.h"
>
> +static struct device *cxlmd_get_endpoint_dev(struct cxl_memdev *cxlmd)
> +{
> + struct cxl_port *endpoint;
> +
> + if (!cxlmd)
> + return NULL;
> +
> + guard(device)(&cxlmd->dev);
[Severity: Critical]
This is a pre-existing issue, but does accessing cxlmd here lead to a
use-after-free if a Machine Check Exception occurs during device teardown?
During device teardown or driver unbind, devm actions execute in reverse
order. The MCE notifier is registered before the cxl_memdev structure
is allocated, which means cxl_memdev is unregistered and its memory freed
before the MCE notifier is deactivated.
While cxl_memdev_shutdown() clears cxlmd->cxlds, it does not clear
mds->cxlds.cxlmd. This leaves mds->cxlds.cxlmd as a dangling pointer.
If a Machine Check Exception happens during this window, cxl_handle_mce()
will read the freed cxlmd pointer and pass it here. Attempting to lock a
mutex inside freed memory will cause memory corruption or a crash.
> + endpoint = cxlmd->endpoint;
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616004007.4186004-1-dave.jiang@intel.com?part=2
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-16 1:03 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-16 0:40 [PATCH 0/2] cxl: Fix endpoint access issues with CXL MCE notifier handler Dave Jiang
2026-06-16 0:40 ` [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() Dave Jiang
2026-06-16 0:54 ` sashiko-bot
2026-06-16 0:40 ` [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown Dave Jiang
2026-06-16 1:03 ` sashiko-bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox