* [PATCH 0/2] cxl: Fix endpoint access issues with CXL MCE notifier handler
@ 2026-06-16 0:40 Dave Jiang
2026-06-16 0:40 ` [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() Dave Jiang
2026-06-16 0:40 ` [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown Dave Jiang
0 siblings, 2 replies; 5+ messages in thread
From: Dave Jiang @ 2026-06-16 0:40 UTC (permalink / raw)
To: linux-cxl; +Cc: djbw, dave, jic23, alison.schofield, vishal.l.verma, flavien
Fix the issues reported by Flavien to the Linux security mailing list.
First patch addresses accessing possible invalid pointers for cxlmd and
cxmd->endpoint.
The second patch adddresses the endpoint lifetime that is shorter than
mds and needs appropriate guardrails to access.
Dave Jiang (2):
cxl/mce: Validate memdev and endpoint before dereference in
cxl_handle_mce()
cxl/mce: Serialize the MCE handler against endpoint teardown
drivers/cxl/core/mce.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
--
2.54.0
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() 2026-06-16 0:40 [PATCH 0/2] cxl: Fix endpoint access issues with CXL MCE notifier handler Dave Jiang @ 2026-06-16 0:40 ` Dave Jiang 2026-06-16 0:54 ` sashiko-bot 2026-06-16 0:40 ` [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown Dave Jiang 1 sibling, 1 reply; 5+ messages in thread From: Dave Jiang @ 2026-06-16 0:40 UTC (permalink / raw) To: linux-cxl Cc: djbw, dave, jic23, alison.schofield, vishal.l.verma, flavien, stable cxlmd and endpoint are both used in cxl_handle_mce() without proper validation, which can lead to NULL pointer dereference or invalid pointer dereference. The notifier is registered in cxl_memdev_state_create() when the CXL PCI driver first binds, before the memdev is published and before it is attached to a CXL topology. Add checks to cxlmd and endpoint to ensure they are valid before usage. Reported-by: Flavien Solt <flavien@nus.edu.sg> Fixes: 516e5bd0b6bf ("cxl: Add mce notifier to emit aliased address for extended linear cache") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Dave Jiang <dave.jiang@intel.com> --- drivers/cxl/core/mce.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c index ff8d078c6ca1..47566015eb00 100644 --- a/drivers/cxl/core/mce.c +++ b/drivers/cxl/core/mce.c @@ -13,7 +13,7 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val, struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state, mce_notifier); struct cxl_memdev *cxlmd = mds->cxlds.cxlmd; - struct cxl_port *endpoint = cxlmd->endpoint; + struct cxl_port *endpoint; struct mce *mce = data; u64 spa, spa_alias; unsigned long pfn; @@ -21,7 +21,11 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val, if (!mce || !mce_usable_address(mce)) return NOTIFY_DONE; - if (!endpoint) + if (!cxlmd) + return NOTIFY_DONE; + + endpoint = cxlmd->endpoint; + if (IS_ERR_OR_NULL(endpoint)) return NOTIFY_DONE; spa = mce->addr & MCI_ADDR_PHYSADDR; -- 2.54.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() 2026-06-16 0:40 ` [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() Dave Jiang @ 2026-06-16 0:54 ` sashiko-bot 0 siblings, 0 replies; 5+ messages in thread From: sashiko-bot @ 2026-06-16 0:54 UTC (permalink / raw) To: Dave Jiang; +Cc: linux-cxl Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: Pre-existing issues: - [High] The validation added for `cxlmd` fails to account for device teardown, where `cxlds->cxlmd` becomes a dangling pointer, leading to an incomplete fix and Use-After-Free. -- commit d4d366d587ddef82637af37fbf8cd28754d28024 Author: Dave Jiang <dave.jiang@intel.com> cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() cxlmd and endpoint are both used in cxl_handle_mce() without proper validation, which can lead to NULL pointer dereference or invalid pointer dereference. The notifier is registered in cxl_memdev_state_create() when the CXL PCI driver first binds. > diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c > index ff8d078c6ca1e..47566015eb006 100644 > --- a/drivers/cxl/core/mce.c > +++ b/drivers/cxl/core/mce.c [ ... ] > @@ -21,7 +21,11 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val, > if (!mce || !mce_usable_address(mce)) > return NOTIFY_DONE; > > - if (!endpoint) > + if (!cxlmd) > + return NOTIFY_DONE; [Severity: High] This isn't a bug introduced by this patch, but does this check adequately protect against a use-after-free during device teardown? The MCE notifier is registered via devm_cxl_register_mce_notifier() early in the PCI probe sequence. Later, cxl_memdev_autoremove() registers the cxl_memdev_unregister() teardown action via devm_add_action_or_reset(). Because devm actions execute in LIFO order, cxl_memdev_unregister() runs before the MCE notifier is unregistered. During cxl_memdev_unregister(), put_device(&cxlmd->dev) is called, which can free the cxlmd structure. However, neither cxl_memdev_unregister() nor cxl_memdev_shutdown() clears the cxlds->cxlmd pointer. This leaves mds->cxlds.cxlmd as a dangling pointer. If an MCE occurs during this teardown window, cxl_handle_mce() will find a non-NULL cxlmd and pass it to cxlmd_get_endpoint_dev(), which calls guard(device)(&cxlmd->dev), resulting in a use-after-free. > + > + endpoint = cxlmd->endpoint; > + if (IS_ERR_OR_NULL(endpoint)) > return NOTIFY_DONE; > > spa = mce->addr & MCI_ADDR_PHYSADDR; -- Sashiko AI review · https://sashiko.dev/#/patchset/20260616004007.4186004-1-dave.jiang@intel.com?part=1 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown 2026-06-16 0:40 [PATCH 0/2] cxl: Fix endpoint access issues with CXL MCE notifier handler Dave Jiang 2026-06-16 0:40 ` [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() Dave Jiang @ 2026-06-16 0:40 ` Dave Jiang 2026-06-16 1:03 ` sashiko-bot 1 sibling, 1 reply; 5+ messages in thread From: Dave Jiang @ 2026-06-16 0:40 UTC (permalink / raw) To: linux-cxl Cc: djbw, dave, jic23, alison.schofield, vishal.l.verma, flavien, stable CXL endpoint has a shorter lifetime than CXL memdev state (mds) and the MCE notifier is part of the mds. The MCE handler needs to take a reference on the endpoint in order to keep it alive while operating on it. Take the cxlmd lock to verify the endpoint is still valid and take a reference on it before accessing it. Reported-by: Flavien Solt <flavien@nus.edu.sg> Fixes: 516e5bd0b6bf ("cxl: Add mce notifier to emit aliased address for extended linear cache") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Dave Jiang <dave.jiang@intel.com> --- drivers/cxl/core/mce.c | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c index 47566015eb00..e684e411921b 100644 --- a/drivers/cxl/core/mce.c +++ b/drivers/cxl/core/mce.c @@ -7,13 +7,27 @@ #include <cxlmem.h> #include "mce.h" +static struct device *cxlmd_get_endpoint_dev(struct cxl_memdev *cxlmd) +{ + struct cxl_port *endpoint; + + if (!cxlmd) + return NULL; + + guard(device)(&cxlmd->dev); + endpoint = cxlmd->endpoint; + if (IS_ERR_OR_NULL(endpoint)) + return NULL; + + return get_device(&endpoint->dev); +} + static int cxl_handle_mce(struct notifier_block *nb, unsigned long val, void *data) { struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state, mce_notifier); struct cxl_memdev *cxlmd = mds->cxlds.cxlmd; - struct cxl_port *endpoint; struct mce *mce = data; u64 spa, spa_alias; unsigned long pfn; @@ -24,8 +38,13 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val, if (!cxlmd) return NOTIFY_DONE; - endpoint = cxlmd->endpoint; - if (IS_ERR_OR_NULL(endpoint)) + /* + * With the cxlmd device lock held, check the cxlmd->endpoint pointer + * and then take a reference of the device in order to keep it alive + * while accessing it. + */ + struct device *dev __free(put_device) = cxlmd_get_endpoint_dev(cxlmd); + if (!dev) return NOTIFY_DONE; spa = mce->addr & MCI_ADDR_PHYSADDR; @@ -34,7 +53,7 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val, if (!pfn_valid(pfn)) return NOTIFY_DONE; - spa_alias = cxl_port_get_spa_cache_alias(endpoint, spa); + spa_alias = cxl_port_get_spa_cache_alias(to_cxl_port(dev), spa); if (spa_alias == ~0ULL) return NOTIFY_DONE; -- 2.54.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown 2026-06-16 0:40 ` [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown Dave Jiang @ 2026-06-16 1:03 ` sashiko-bot 0 siblings, 0 replies; 5+ messages in thread From: sashiko-bot @ 2026-06-16 1:03 UTC (permalink / raw) To: Dave Jiang; +Cc: linux-cxl Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: Pre-existing issues: - [Critical] The MCE notifier accesses a freed `cxlmd` structure and attempts to lock its mutex if an MCE occurs during device teardown, leading to a Use-After-Free and kernel crash. -- commit a9bb7894eaa94a0fe2d0d4c6dd69b86cf2d1fe0f Author: Dave Jiang <dave.jiang@intel.com> cxl/mce: Serialize the MCE handler against endpoint teardown This commit modifies the MCE handler to take the cxlmd lock before verifying the endpoint pointer. It then takes a reference on the endpoint device to ensure it remains alive during access. > diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c > index 47566015eb006..e684e411921b9 100644 > --- a/drivers/cxl/core/mce.c > +++ b/drivers/cxl/core/mce.c > @@ -7,13 +7,27 @@ > #include <cxlmem.h> > #include "mce.h" > > +static struct device *cxlmd_get_endpoint_dev(struct cxl_memdev *cxlmd) > +{ > + struct cxl_port *endpoint; > + > + if (!cxlmd) > + return NULL; > + > + guard(device)(&cxlmd->dev); [Severity: Critical] This is a pre-existing issue, but does accessing cxlmd here lead to a use-after-free if a Machine Check Exception occurs during device teardown? During device teardown or driver unbind, devm actions execute in reverse order. The MCE notifier is registered before the cxl_memdev structure is allocated, which means cxl_memdev is unregistered and its memory freed before the MCE notifier is deactivated. While cxl_memdev_shutdown() clears cxlmd->cxlds, it does not clear mds->cxlds.cxlmd. This leaves mds->cxlds.cxlmd as a dangling pointer. If a Machine Check Exception happens during this window, cxl_handle_mce() will read the freed cxlmd pointer and pass it here. Attempting to lock a mutex inside freed memory will cause memory corruption or a crash. > + endpoint = cxlmd->endpoint; [ ... ] -- Sashiko AI review · https://sashiko.dev/#/patchset/20260616004007.4186004-1-dave.jiang@intel.com?part=2 ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-16 1:03 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-16 0:40 [PATCH 0/2] cxl: Fix endpoint access issues with CXL MCE notifier handler Dave Jiang 2026-06-16 0:40 ` [PATCH 1/2] cxl/mce: Validate memdev and endpoint before dereference in cxl_handle_mce() Dave Jiang 2026-06-16 0:54 ` sashiko-bot 2026-06-16 0:40 ` [PATCH 2/2] cxl/mce: Serialize the MCE handler against endpoint teardown Dave Jiang 2026-06-16 1:03 ` sashiko-bot
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.