From: Dan Williams <dan.j.williams@intel.com>
To: Robert Richter <rrichter@amd.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
Dave Jiang <dave.jiang@intel.com>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Ben Widawsky <bwidawsk@kernel.org>,
Dan Williams <dan.j.williams@intel.com>
Cc: <linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
Terry Bowman <terry.bowman@amd.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>
Subject: RE: [PATCH v12 01/20] cxl/port: Fix release of RCD endpoints
Date: Thu, 26 Oct 2023 20:46:33 -0700 [thread overview]
Message-ID: <653b3299c1a33_244c8f29449@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <20231018171713.1883517-2-rrichter@amd.com>
Robert Richter wrote:
> Binding and unbindind RCD endpoints (e.g. mem0 device) caused the
> corresponding endpoint not being released. Reason for that is the
> wrong port discovered for RCD endpoints. See cxl_mem_probe() for
> proper endpoint parent detection. Fix delete_endpoint() respectively.
>
> Fixes: 0a19bfc8de93 ("cxl/port: Add RCD endpoint port enumeration")
> Signed-off-by: Robert Richter <rrichter@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
This patch causes cxl-topology.sh to crash with use-after-free
signatures. I notice that this path should just be using
endpoint->dev.parent and is missing reference counting on the parent.
I will replace this path with the following:
-- >8 --
Subject: cxl/port: Fix delete_endpoint() vs parent unregistration race
From: Dan Williams <dan.j.williams@intel.com>
The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of
ports (struct cxl_port objects) between an endpoint and the root of a
CXL topology. Each port including the endpoint port is attached to the
cxl_port driver.
Given that setup it follows that when either any port in that lineage
goes through a cxl_port ->remove() event, or the memdev goes through a
cxl_mem ->remove() event. The hierarchy below the removed port, or the
entire hierarchy if the memdev is removed needs to come down.
The delete_endpoint() callback is careful to check whether it is being
called to tear down the hierarchy, or if it is only being called to
teardown the memdev because an ancestor port is going through
->remove().
That care needs to take the device_lock() of the endpoint's parent.
Which requires 2 bugs to be fixed:
1/ A reference on the parent is needed to prevent use-after-free
scenarios like this signature:
BUG: spinlock bad magic on CPU#0, kworker/u56:0/11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
Workqueue: cxl_port detach_memdev [cxl_core]
RIP: 0010:spin_bug+0x65/0xa0
Call Trace:
do_raw_spin_lock+0x69/0xa0
__mutex_lock+0x695/0xb80
delete_endpoint+0xad/0x150 [cxl_core]
devres_release_all+0xb8/0x110
device_unbind_cleanup+0xe/0x70
device_release_driver_internal+0x1d2/0x210
detach_memdev+0x15/0x20 [cxl_core]
process_one_work+0x1e3/0x4c0
worker_thread+0x1dd/0x3d0
2/ In the case of RCH topologies, the parent device that needs to be
locked is not always @port->dev as returned by cxl_mem_find_port(), use
endpoint->dev.parent instead.
Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver")
Cc: <stable@vger.kernel.org>
Reported-by: Robert Richter <rrichter@amd.com>
Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/port.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 5ba606c6e03f..6bbaca5ac60d 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1227,13 +1227,7 @@ static void delete_endpoint(void *data)
{
struct cxl_memdev *cxlmd = data;
struct cxl_port *endpoint = cxlmd->endpoint;
- struct cxl_port *parent_port;
- struct device *parent;
-
- parent_port = cxl_mem_find_port(cxlmd, NULL);
- if (!parent_port)
- goto out;
- parent = &parent_port->dev;
+ struct device *parent = endpoint->dev.parent;
device_lock(parent);
if (parent->driver && !endpoint->dead) {
@@ -1243,15 +1237,16 @@ static void delete_endpoint(void *data)
}
cxlmd->endpoint = NULL;
device_unlock(parent);
- put_device(parent);
-out:
put_device(&endpoint->dev);
+ put_device(parent);
}
int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
{
+ struct device *parent = endpoint->dev.parent;
struct device *dev = &cxlmd->dev;
+ get_device(parent);
get_device(&endpoint->dev);
cxlmd->endpoint = endpoint;
cxlmd->depth = endpoint->depth;
next prev parent reply other threads:[~2023-10-27 3:46 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-18 17:16 [PATCH v12 00/20] cxl/pci: Add support for RCH RAS error handling Robert Richter
2023-10-18 17:16 ` [PATCH v12 01/20] cxl/port: Fix release of RCD endpoints Robert Richter
2023-10-27 3:46 ` Dan Williams [this message]
2023-10-27 22:55 ` Robert Richter
2023-10-28 0:32 ` Dan Williams
2023-10-28 1:39 ` Dan Williams
2023-10-29 16:17 ` Robert Richter
2023-10-18 17:16 ` [PATCH v12 02/20] cxl/core/regs: Rename @dev to @host in struct cxl_register_map Robert Richter
2023-10-27 20:04 ` Dan Williams
2023-10-18 17:16 ` [PATCH v12 03/20] cxl/port: Fix @host confusion in cxl_dport_setup_regs() Robert Richter
2023-10-18 17:16 ` [PATCH v12 04/20] cxl/port: Rename @comp_map to @reg_map in struct cxl_register_map Robert Richter
2023-10-18 17:16 ` [PATCH v12 05/20] cxl/port: Pre-initialize component register mappings Robert Richter
2023-10-18 17:16 ` [PATCH v12 06/20] cxl/pci: Store the endpoint's Component Register mappings in struct cxl_dev_state Robert Richter
2023-10-18 17:17 ` [PATCH v12 07/20] cxl/hdm: Use stored Component Register mappings to map HDM decoder capability Robert Richter
2023-10-27 21:51 ` Dan Williams
2023-10-18 17:17 ` [PATCH v12 08/20] cxl/pci: Remove Component Register base address from struct cxl_dev_state Robert Richter
2023-10-27 21:54 ` Dan Williams
2023-10-18 17:17 ` [PATCH v12 09/20] cxl/port: Remove Component Register base address from struct cxl_port Robert Richter
2023-10-18 17:17 ` [PATCH v12 10/20] cxl/pci: Introduce config option PCIEAER_CXL Robert Richter
2023-10-19 14:30 ` Jonathan Cameron
2023-10-20 22:36 ` Robert Richter
2023-10-27 22:02 ` Dan Williams
2023-10-18 17:17 ` [PATCH v12 11/20] cxl/pci: Add RCH downstream port AER register discovery Robert Richter
2023-10-27 22:12 ` Dan Williams
2023-10-18 17:17 ` [PATCH v12 12/20] PCI/AER: Refactor cper_print_aer() for use by CXL driver module Robert Richter
2023-10-18 17:17 ` [PATCH v12 13/20] cxl/pci: Update CXL error logging to use RAS register address Robert Richter
2023-10-18 17:17 ` [PATCH v12 14/20] cxl/pci: Map RCH downstream AER registers for logging protocol errors Robert Richter
2023-10-27 22:16 ` Dan Williams
2023-10-28 3:23 ` Dan Williams
2023-10-18 17:17 ` [PATCH v12 15/20] cxl/pci: Add RCH downstream port error logging Robert Richter
2023-10-18 17:17 ` [PATCH v12 16/20] cxl/pci: Disable root port interrupts in RCH mode Robert Richter
2023-10-18 17:17 ` [PATCH v12 17/20] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler Robert Richter
2023-10-18 17:17 ` [PATCH v12 18/20] PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling Robert Richter
2023-10-18 17:17 ` [PATCH v12 19/20] cxl/core/regs: Rename phys_addr in cxl_map_component_regs() Robert Richter
2023-10-18 17:17 ` [PATCH v12 20/20] cxl/core/regs: Rework cxl_map_pmu_regs() to use map->dev for devm Robert Richter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=653b3299c1a33_244c8f29449@dwillia2-xfh.jf.intel.com.notmuch \
--to=dan.j.williams@intel.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=bwidawsk@kernel.org \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=rrichter@amd.com \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox