All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] cxl: Fix endpoint access issues with CXL MCE notifier handler
@ 2026-06-16  0:40 Dave Jiang
  2026-06-16  0:40 ` [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() Dave Jiang
  2026-06-16  0:40 ` [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown Dave Jiang
  0 siblings, 2 replies; 5+ messages in thread
From: Dave Jiang @ 2026-06-16  0:40 UTC (permalink / raw)
  To: linux-cxl; +Cc: djbw, dave, jic23, alison.schofield, vishal.l.verma, flavien

Fix the issues reported by Flavien to the Linux security mailing list.

First patch addresses accessing possible invalid pointers for cxlmd and
cxmd->endpoint.

The second patch adddresses the endpoint lifetime that is shorter than
mds and needs appropriate guardrails to access.


Dave Jiang (2):
  cxl/mce: Validate memdev and endpoint before dereference in
    cxl_handle_mce()
  cxl/mce: Serialize the MCE handler against endpoint teardown

 drivers/cxl/core/mce.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)


base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
-- 
2.54.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce()
  2026-06-16  0:40 [PATCH 0/2] cxl: Fix endpoint access issues with CXL MCE notifier handler Dave Jiang
@ 2026-06-16  0:40 ` Dave Jiang
  2026-06-16  0:54   ` sashiko-bot
  2026-06-16  0:40 ` [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown Dave Jiang
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Jiang @ 2026-06-16  0:40 UTC (permalink / raw)
  To: linux-cxl
  Cc: djbw, dave, jic23, alison.schofield, vishal.l.verma, flavien,
	stable

cxlmd and endpoint are both used in cxl_handle_mce() without proper
validation, which can lead to NULL pointer dereference or invalid pointer
dereference. The notifier is registered in cxl_memdev_state_create()
when the CXL PCI driver first binds, before the memdev is published and
before it is attached to a CXL topology.

Add checks to cxlmd and endpoint to ensure they are valid before usage.

Reported-by: Flavien Solt <flavien@nus.edu.sg>
Fixes: 516e5bd0b6bf ("cxl: Add mce notifier to emit aliased address for extended linear cache")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/cxl/core/mce.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
index ff8d078c6ca1..47566015eb00 100644
--- a/drivers/cxl/core/mce.c
+++ b/drivers/cxl/core/mce.c
@@ -13,7 +13,7 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
 	struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
 						    mce_notifier);
 	struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
-	struct cxl_port *endpoint = cxlmd->endpoint;
+	struct cxl_port *endpoint;
 	struct mce *mce = data;
 	u64 spa, spa_alias;
 	unsigned long pfn;
@@ -21,7 +21,11 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
 	if (!mce || !mce_usable_address(mce))
 		return NOTIFY_DONE;
 
-	if (!endpoint)
+	if (!cxlmd)
+		return NOTIFY_DONE;
+
+	endpoint = cxlmd->endpoint;
+	if (IS_ERR_OR_NULL(endpoint))
 		return NOTIFY_DONE;
 
 	spa = mce->addr & MCI_ADDR_PHYSADDR;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown
  2026-06-16  0:40 [PATCH 0/2] cxl: Fix endpoint access issues with CXL MCE notifier handler Dave Jiang
  2026-06-16  0:40 ` [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() Dave Jiang
@ 2026-06-16  0:40 ` Dave Jiang
  2026-06-16  1:03   ` sashiko-bot
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Jiang @ 2026-06-16  0:40 UTC (permalink / raw)
  To: linux-cxl
  Cc: djbw, dave, jic23, alison.schofield, vishal.l.verma, flavien,
	stable

CXL endpoint has a shorter lifetime than CXL memdev state (mds) and
the MCE notifier is part of the mds. The MCE handler needs to take
a reference on the endpoint in order to keep it alive while operating
on it. Take the cxlmd lock to verify the endpoint is still valid and
take a reference on it before accessing it.

Reported-by: Flavien Solt <flavien@nus.edu.sg>
Fixes: 516e5bd0b6bf ("cxl: Add mce notifier to emit aliased address for extended linear cache")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/cxl/core/mce.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
index 47566015eb00..e684e411921b 100644
--- a/drivers/cxl/core/mce.c
+++ b/drivers/cxl/core/mce.c
@@ -7,13 +7,27 @@
 #include <cxlmem.h>
 #include "mce.h"
 
+static struct device *cxlmd_get_endpoint_dev(struct cxl_memdev *cxlmd)
+{
+	struct cxl_port *endpoint;
+
+	if (!cxlmd)
+		return NULL;
+
+	guard(device)(&cxlmd->dev);
+	endpoint = cxlmd->endpoint;
+	if (IS_ERR_OR_NULL(endpoint))
+		return NULL;
+
+	return get_device(&endpoint->dev);
+}
+
 static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
 			  void *data)
 {
 	struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
 						    mce_notifier);
 	struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
-	struct cxl_port *endpoint;
 	struct mce *mce = data;
 	u64 spa, spa_alias;
 	unsigned long pfn;
@@ -24,8 +38,13 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
 	if (!cxlmd)
 		return NOTIFY_DONE;
 
-	endpoint = cxlmd->endpoint;
-	if (IS_ERR_OR_NULL(endpoint))
+	/*
+	 * With the cxlmd device lock held, check the cxlmd->endpoint pointer
+	 * and then take a reference of the device in order to keep it alive
+	 * while accessing it.
+	 */
+	struct device *dev __free(put_device) = cxlmd_get_endpoint_dev(cxlmd);
+	if (!dev)
 		return NOTIFY_DONE;
 
 	spa = mce->addr & MCI_ADDR_PHYSADDR;
@@ -34,7 +53,7 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
 	if (!pfn_valid(pfn))
 		return NOTIFY_DONE;
 
-	spa_alias = cxl_port_get_spa_cache_alias(endpoint, spa);
+	spa_alias = cxl_port_get_spa_cache_alias(to_cxl_port(dev), spa);
 	if (spa_alias == ~0ULL)
 		return NOTIFY_DONE;
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce()
  2026-06-16  0:40 ` [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() Dave Jiang
@ 2026-06-16  0:54   ` sashiko-bot
  0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-06-16  0:54 UTC (permalink / raw)
  To: Dave Jiang; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] The validation added for `cxlmd` fails to account for device teardown, where `cxlds->cxlmd` becomes a dangling pointer, leading to an incomplete fix and Use-After-Free.
--

commit d4d366d587ddef82637af37fbf8cd28754d28024
Author: Dave Jiang <dave.jiang@intel.com>

cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce()

cxlmd and endpoint are both used in cxl_handle_mce() without proper
validation, which can lead to NULL pointer dereference or invalid pointer
dereference. The notifier is registered in cxl_memdev_state_create()
when the CXL PCI driver first binds.

> diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
> index ff8d078c6ca1e..47566015eb006 100644
> --- a/drivers/cxl/core/mce.c
> +++ b/drivers/cxl/core/mce.c
[ ... ]
> @@ -21,7 +21,11 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
>  	if (!mce || !mce_usable_address(mce))
>  		return NOTIFY_DONE;
>  
> -	if (!endpoint)
> +	if (!cxlmd)
> +		return NOTIFY_DONE;

[Severity: High]
This isn't a bug introduced by this patch, but does this check adequately
protect against a use-after-free during device teardown?

The MCE notifier is registered via devm_cxl_register_mce_notifier() early in
the PCI probe sequence. Later, cxl_memdev_autoremove() registers the
cxl_memdev_unregister() teardown action via devm_add_action_or_reset().
Because devm actions execute in LIFO order, cxl_memdev_unregister() runs
before the MCE notifier is unregistered.

During cxl_memdev_unregister(), put_device(&cxlmd->dev) is called, which can
free the cxlmd structure. However, neither cxl_memdev_unregister() nor
cxl_memdev_shutdown() clears the cxlds->cxlmd pointer.

This leaves mds->cxlds.cxlmd as a dangling pointer. If an MCE occurs during
this teardown window, cxl_handle_mce() will find a non-NULL cxlmd and pass it
to cxlmd_get_endpoint_dev(), which calls guard(device)(&cxlmd->dev), resulting
in a use-after-free.

> +
> +	endpoint = cxlmd->endpoint;
> +	if (IS_ERR_OR_NULL(endpoint))
>  		return NOTIFY_DONE;
>  
>  	spa = mce->addr & MCI_ADDR_PHYSADDR;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260616004007.4186004-1-dave.jiang@intel.com?part=1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown
  2026-06-16  0:40 ` [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown Dave Jiang
@ 2026-06-16  1:03   ` sashiko-bot
  0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-06-16  1:03 UTC (permalink / raw)
  To: Dave Jiang; +Cc: linux-cxl

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] The MCE notifier accesses a freed `cxlmd` structure and attempts to lock its mutex if an MCE occurs during device teardown, leading to a Use-After-Free and kernel crash.
--

commit a9bb7894eaa94a0fe2d0d4c6dd69b86cf2d1fe0f
Author: Dave Jiang <dave.jiang@intel.com>

cxl/mce: Serialize the MCE handler against endpoint teardown

This commit modifies the MCE handler to take the cxlmd lock before verifying
the endpoint pointer. It then takes a reference on the endpoint device to
ensure it remains alive during access.

> diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
> index 47566015eb006..e684e411921b9 100644
> --- a/drivers/cxl/core/mce.c
> +++ b/drivers/cxl/core/mce.c
> @@ -7,13 +7,27 @@
>  #include <cxlmem.h>
>  #include "mce.h"
>  
> +static struct device *cxlmd_get_endpoint_dev(struct cxl_memdev *cxlmd)
> +{
> +	struct cxl_port *endpoint;
> +
> +	if (!cxlmd)
> +		return NULL;
> +
> +	guard(device)(&cxlmd->dev);

[Severity: Critical]
This is a pre-existing issue, but does accessing cxlmd here lead to a
use-after-free if a Machine Check Exception occurs during device teardown?

During device teardown or driver unbind, devm actions execute in reverse
order. The MCE notifier is registered before the cxl_memdev structure
is allocated, which means cxl_memdev is unregistered and its memory freed
before the MCE notifier is deactivated.

While cxl_memdev_shutdown() clears cxlmd->cxlds, it does not clear
mds->cxlds.cxlmd. This leaves mds->cxlds.cxlmd as a dangling pointer.

If a Machine Check Exception happens during this window, cxl_handle_mce()
will read the freed cxlmd pointer and pass it here. Attempting to lock a
mutex inside freed memory will cause memory corruption or a crash.

> +	endpoint = cxlmd->endpoint;

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260616004007.4186004-1-dave.jiang@intel.com?part=2

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-16  1:03 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-16  0:40 [PATCH 0/2] cxl: Fix endpoint access issues with CXL MCE notifier handler Dave Jiang
2026-06-16  0:40 ` [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() Dave Jiang
2026-06-16  0:54   ` sashiko-bot
2026-06-16  0:40 ` [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown Dave Jiang
2026-06-16  1:03   ` sashiko-bot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.