From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Dave Jiang <dave.jiang@intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Bjorn Helgaas <bhelgaas@google.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Sasha Levin <sashal@kernel.org>,
dave@stgolabs.net, jonathan.cameron@huawei.com,
alison.schofield@intel.com, vishal.l.verma@intel.com,
ira.weiny@intel.com, rrichter@amd.com, terry.bowman@amd.com,
ming4.li@intel.com, linux-cxl@vger.kernel.org
Subject: [PATCH AUTOSEL 6.8 19/24] cxl: Add post-reset warning if reset results in loss of previously committed HDM decoders
Date: Wed, 5 Jun 2024 07:50:29 -0400 [thread overview]
Message-ID: <20240605115101.2962372-19-sashal@kernel.org> (raw)
In-Reply-To: <20240605115101.2962372-1-sashal@kernel.org>
From: Dave Jiang <dave.jiang@intel.com>
[ Upstream commit 934edcd436dca0447e0d3691a908394ba16d06c3 ]
Secondary Bus Reset (SBR) is equivalent to a device being hot removed and
inserted again. Doing a SBR on a CXL type 3 device is problematic if the
exported device memory is part of system memory that cannot be offlined.
The event is equivalent to violently ripping out that range of memory from
the kernel. While the hardware requires the "Unmask SBR" bit set in the
Port Control Extensions register and the kernel currently does not unmask
it, user can unmask this bit via setpci or similar tool.
The driver does not have a way to detect whether a reset coming from the
PCI subsystem is a Function Level Reset (FLR) or SBR. The only way to
detect is to note if a decoder is marked as enabled in software but the
decoder control register indicates it's not committed.
Add a helper function to find discrepancy between the decoder software
state versus the hardware register state.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240502165851.1948523-6-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/cxl/core/pci.c | 29 +++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 2 ++
drivers/cxl/pci.c | 22 ++++++++++++++++++++++
3 files changed, 53 insertions(+)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index e9e6c81ce034a..e71cfd490d829 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1034,3 +1034,32 @@ long cxl_pci_get_latency(struct pci_dev *pdev)
return cxl_flit_size(pdev) * MEGA / bw;
}
+
+static int __cxl_endpoint_decoder_reset_detected(struct device *dev, void *data)
+{
+ struct cxl_port *port = data;
+ struct cxl_decoder *cxld;
+ struct cxl_hdm *cxlhdm;
+ void __iomem *hdm;
+ u32 ctrl;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ cxld = to_cxl_decoder(dev);
+ if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
+ return 0;
+
+ cxlhdm = dev_get_drvdata(&port->dev);
+ hdm = cxlhdm->regs.hdm_decoder;
+ ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(cxld->id));
+
+ return !FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl);
+}
+
+bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port)
+{
+ return device_for_each_child(&port->dev, port,
+ __cxl_endpoint_decoder_reset_detected);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_reset_detected, CXL);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index de477eb7f5d54..2624c775c5c93 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -888,6 +888,8 @@ void cxl_coordinates_combine(struct access_coordinate *out,
struct access_coordinate *c1,
struct access_coordinate *c2);
+bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
+
/*
* Unit test builds overrides this to __weak, find the 'strong' version
* of these symbols in tools/testing/cxl/.
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 2ff361e756d66..659f9d46b154c 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -957,11 +957,33 @@ static void cxl_error_resume(struct pci_dev *pdev)
dev->driver ? "successful" : "failed");
}
+static void cxl_reset_done(struct pci_dev *pdev)
+{
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+ struct cxl_memdev *cxlmd = cxlds->cxlmd;
+ struct device *dev = &pdev->dev;
+
+ /*
+ * FLR does not expect to touch the HDM decoders and related
+ * registers. SBR, however, will wipe all device configurations.
+ * Issue a warning if there was an active decoder before the reset
+ * that no longer exists.
+ */
+ guard(device)(&cxlmd->dev);
+ if (cxlmd->endpoint &&
+ cxl_endpoint_decoder_reset_detected(cxlmd->endpoint)) {
+ dev_crit(dev, "SBR happened without memory regions removal.\n");
+ dev_crit(dev, "System may be unstable if regions hosted system memory.\n");
+ add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+ }
+}
+
static const struct pci_error_handlers cxl_error_handlers = {
.error_detected = cxl_error_detected,
.slot_reset = cxl_slot_reset,
.resume = cxl_error_resume,
.cor_error_detected = cxl_cor_error_detected,
+ .reset_done = cxl_reset_done,
};
static struct pci_driver cxl_pci_driver = {
--
2.43.0
next prev parent reply other threads:[~2024-06-05 11:51 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-05 11:50 [PATCH AUTOSEL 6.8 01/24] usb: gadget: uvc: configfs: ensure guid to be valid before set Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 02/24] f2fs: remove clear SB_INLINECRYPT flag in default_options Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 03/24] usb: typec: ucsi_glink: rework quirks implementation Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 04/24] usb: misc: uss720: check for incompatible versions of the Belkin F5U002 Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 05/24] Avoid hw_desc array overrun in dw-axi-dmac Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 06/24] usb: dwc3: pci: Don't set "linux,phy_charger_detect" property on Lenovo Yoga Tab2 1380 Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 07/24] usb: typec: ucsi_glink: drop special handling for CCI_BUSY Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 08/24] udf: udftime: prevent overflow in udf_disk_stamp_to_time() Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 09/24] PCI/PM: Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 10/24] f2fs: don't set RO when shutting down f2fs Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 11/24] MIPS: Octeon: Add PCIe link status check Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 12/24] serial: imx: Introduce timeout when waiting on transmitter empty Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 13/24] serial: exar: adding missing CTI and Exar PCI ids Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 14/24] usb: gadget: function: Remove usage of the deprecated ida_simple_xx() API Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 15/24] usb: dwc3: core: Access XHCI address space temporarily to read port info Sasha Levin
2024-06-05 11:54 ` Johan Hovold
2024-07-22 12:56 ` Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 16/24] f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode() Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 17/24] tty: add the option to have a tty reject a new ldisc Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 18/24] greybus: Fix use-after-free bug in gb_interface_release due to race condition Sasha Levin
2024-06-05 11:50 ` Sasha Levin [this message]
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 20/24] vfio/pci: Collect hot-reset devices to local buffer Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 21/24] cpufreq: amd-pstate: fix memory leak on CPU EPP exit Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 22/24] ACPI: EC: Install address space handler at the namespace root Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 23/24] PCI: Do not wait for disconnected devices when resuming Sasha Levin
2024-06-05 11:50 ` [PATCH AUTOSEL 6.8 24/24] OPP: Fix required_opp_tables for multiple genpds using same table Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240605115101.2962372-19-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=Jonathan.Cameron@huawei.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ming4.li@intel.com \
--cc=rrichter@amd.com \
--cc=stable@vger.kernel.org \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox