* [PATCH 0/3 v2] PCI: Add Secondary Bus Reset (SBR) support for CXL
@ 2024-03-25 23:58 Dave Jiang
2024-03-25 23:58 ` [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset Dave Jiang
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Dave Jiang @ 2024-03-25 23:58 UTC (permalink / raw)
To: linux-cxl, linux-pci
Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield,
Jonathan.Cameron, dave, bhelgaas, lukas
Hi Bjorn,
Please consider this series for kernel 6.10. The series attempt to
add secondary bus reset (SBR) support to CXL. By default, SBR for CXL is
masked. Per CXL specification r3.1 8.1.5.2, the Port Control Extensions
register bit 0 (Unmask SBR) in the host bridge controls the masking of CXL SBR.
"When 0, SBR bit in Bridge Control register of this Port has no effect. When 1,
the Port shall generate hot reset when SBR in Bridge Control gets set to 1.
Default value of this bit is 0. When the Port is operating in PCIe mode or RCD
mode, this field has no effect on SBR functionality and Port shall follow PCIe
Base Specification."
v2:
- Use pci_upstream_bridge() instead of dev->bus->self. (Lukas)
- Rename is_cxl_device() to pci_is_cxl(). (Lukas)
- Return -ENOTTY on error. (Lukas)
Patch 1:
Add check to PCI bus_reset path for CXL device and return with error if "Unmask
SBR" bit is set to 0. This allows user to realize that SBR is masked for this
CXL device. However, if the user sets the "Unmask SBR" bit via a tool such as
setpci, then the bus_reset will proceed.
Patch2:
Add a new PCI reset method "cxl_bus_force" in order to allow the user an
intetional way to perform SBR. The code will set the "Unmask SBR" bit to
1 and proceed with bus_reset. The original value of the bit will be restored
after the reset operation.
Patch3:
CXL driver change that provides a ->reset_done() callback. It compares the
hardware decoder settings with the existing software configuration and emit
warning if they differ. The difference indicates that decoders were programmed
before the reset and are now cleared after reset. There may be onlined system
memory backed by CXL memory device that are now violently ripped away from
kernel mapping.
Patch series stemmed from this [1] patch. With comments [2] from Bjorn.
[1]: https://lore.kernel.org/linux-cxl/20240215232307.2793530-1-dave.jiang@intel.com/
[2]: https://lore.kernel.org/linux-cxl/20240220203956.GA1502351@bhelgaas/
^ permalink raw reply [flat|nested] 18+ messages in thread* [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-03-25 23:58 [PATCH 0/3 v2] PCI: Add Secondary Bus Reset (SBR) support for CXL Dave Jiang @ 2024-03-25 23:58 ` Dave Jiang 2024-03-27 21:26 ` Bjorn Helgaas 2024-03-28 1:43 ` Dan Williams 2024-03-25 23:58 ` [PATCH v2 2/3] PCI: Create new reset method to force SBR for CXL Dave Jiang 2024-03-25 23:58 ` [PATCH v2 3/3] cxl: Add post reset warning if reset is detected as Secondary Bus Reset (SBR) Dave Jiang 2 siblings, 2 replies; 18+ messages in thread From: Dave Jiang @ 2024-03-25 23:58 UTC (permalink / raw) To: linux-cxl, linux-pci Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas Per CXL spec r3.1 8.1.5.2, secondary bus reset is masked unless the "Unmask SBR" bit is set. Add a check to the PCI secondary bus reset path to fail the CXL SBR request if the "Unmask SBR" bit is clear in the CXL Port Control Extensions register by returning -ENOTTY. With the current behavior, the bus_reset would appear to have executed successfully, however the operation is actually masked if the "Unmask SBR" bit is set with the default value. The intention is to inform the user that SBR for the CXL device is masked and will not go through. The expectation is that if a user overrides the "Unmask SBR" via a user tool such as setpci then they can trigger a bus reset successfully. Link: https://lore.kernel.org/linux-cxl/20240220203956.GA1502351@bhelgaas/ Signed-off-by: Dave Jiang <dave.jiang@intel.com> --- v2: - Rename is_cxl_device() to pci_is_cxl(). (Lukas) - Inverse xmas tree var declaration for is_cxl_port_sbr_masked(). (Lukas) - Return -ENOTTY on error of reset method. (Lukas) - Use pci_upstream_bridge() instead of dev->bus->self. (Lukas) - Additional explanation in commit log on behavior. (Lukas) --- drivers/cxl/cxlpci.h | 2 -- drivers/pci/pci.c | 45 +++++++++++++++++++++++++++++++++++ include/uapi/linux/pci_regs.h | 7 ++++++ 3 files changed, 52 insertions(+), 2 deletions(-) diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h index 93992a1c8eec..55be4dccbed0 100644 --- a/drivers/cxl/cxlpci.h +++ b/drivers/cxl/cxlpci.h @@ -13,10 +13,8 @@ * "DVSEC" redundancies removed. When obvious, abbreviations may be used. */ #define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20) -#define PCI_DVSEC_VENDOR_ID_CXL 0x1E98 /* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */ -#define CXL_DVSEC_PCIE_DEVICE 0 #define CXL_DVSEC_CAP_OFFSET 0xA #define CXL_DVSEC_MEM_CAPABLE BIT(2) #define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index e5f243dd4288..259e5d6538bb 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4927,10 +4927,55 @@ static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe) return pci_reset_hotplug_slot(dev->slot->hotplug, probe); } +static bool pci_is_cxl(struct pci_dev *dev) +{ + return pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL, + CXL_DVSEC_PCIE_DEVICE); +} + +static bool is_cxl_port_sbr_masked(struct pci_dev *dev) +{ + int dvsec; + u16 reg; + int rc; + + /* + * No DVSEC found, must not be CXL port. + */ + dvsec = pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL, + CXL_DVSEC_PCIE_PORT); + if (!dvsec) + return false; + + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_PORT_CONTROL, ®); + if (rc) + return true; + + /* + * CXL spec r3.1 8.1.5.2 + * When 0, SBR bit in Bridge Control register of this Port has no effect. + * When 1, the Port shall generate hot reset when SBR bit in Bridge + * Control gets set to 1. + */ + if (reg & CXL_DVSEC_PORT_CONTROL_UNMASK_SBR) + return false; + + return true; +} + static int pci_reset_bus_function(struct pci_dev *dev, bool probe) { + struct pci_dev *bridge = pci_upstream_bridge(dev); int rc; + /* If it's a CXL port and the SBR control is masked, fail the SBR */ + if (pci_is_cxl(dev) && bridge && is_cxl_port_sbr_masked(bridge)) { + if (probe) + return 0; + + return -ENOTTY; + } + rc = pci_dev_reset_slot_function(dev, probe); if (rc != -ENOTTY) return rc; diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index a39193213ff2..5f2c66987299 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -1148,4 +1148,11 @@ #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL 0x00ff0000 #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX 0xff000000 +/* Compute Express Link (CXL) */ +#define PCI_DVSEC_VENDOR_ID_CXL 0x1e98 +#define CXL_DVSEC_PCIE_DEVICE 0 +#define CXL_DVSEC_PCIE_PORT 3 +#define CXL_DVSEC_PORT_CONTROL 0x0c +#define CXL_DVSEC_PORT_CONTROL_UNMASK_SBR 0x00000001 + #endif /* LINUX_PCI_REGS_H */ -- 2.44.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-03-25 23:58 ` [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset Dave Jiang @ 2024-03-27 21:26 ` Bjorn Helgaas 2024-03-27 23:57 ` Dave Jiang 2024-03-28 1:43 ` Dan Williams 1 sibling, 1 reply; 18+ messages in thread From: Bjorn Helgaas @ 2024-03-27 21:26 UTC (permalink / raw) To: Dave Jiang Cc: linux-cxl, linux-pci, dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas On Mon, Mar 25, 2024 at 04:58:01PM -0700, Dave Jiang wrote: > Per CXL spec r3.1 8.1.5.2, secondary bus reset is masked unless the > "Unmask SBR" bit is set. Add a check to the PCI secondary bus reset > path to fail the CXL SBR request if the "Unmask SBR" bit is clear in > the CXL Port Control Extensions register by returning -ENOTTY. s/secondary bus reset/Secondary Bus Reset (SBR)/ to show that this is defined by PCIe spec and introduce the initialism. > With the current behavior, the bus_reset would appear to have executed > successfully, however the operation is actually masked if the "Unmask > SBR" bit is set with the default value. The intention is to inform the > user that SBR for the CXL device is masked and will not go through. The default value of Unmask SBR isn't really relevant here. > The expectation is that if a user overrides the "Unmask SBR" via a > user tool such as setpci then they can trigger a bus reset successfully. ... if a user *sets* Unmask SBR ... to be more specific about what value is required. > Link: https://lore.kernel.org/linux-cxl/20240220203956.GA1502351@bhelgaas/ > Signed-off-by: Dave Jiang <dave.jiang@intel.com> > --- > v2: > - Rename is_cxl_device() to pci_is_cxl(). (Lukas) > - Inverse xmas tree var declaration for is_cxl_port_sbr_masked(). (Lukas) > - Return -ENOTTY on error of reset method. (Lukas) > - Use pci_upstream_bridge() instead of dev->bus->self. (Lukas) > - Additional explanation in commit log on behavior. (Lukas) > --- > drivers/cxl/cxlpci.h | 2 -- > drivers/pci/pci.c | 45 +++++++++++++++++++++++++++++++++++ > include/uapi/linux/pci_regs.h | 7 ++++++ > 3 files changed, 52 insertions(+), 2 deletions(-) > > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h > index 93992a1c8eec..55be4dccbed0 100644 > --- a/drivers/cxl/cxlpci.h > +++ b/drivers/cxl/cxlpci.h > @@ -13,10 +13,8 @@ > * "DVSEC" redundancies removed. When obvious, abbreviations may be used. > */ > #define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20) > -#define PCI_DVSEC_VENDOR_ID_CXL 0x1E98 > > /* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */ > -#define CXL_DVSEC_PCIE_DEVICE 0 > #define CXL_DVSEC_CAP_OFFSET 0xA > #define CXL_DVSEC_MEM_CAPABLE BIT(2) > #define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4) > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index e5f243dd4288..259e5d6538bb 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4927,10 +4927,55 @@ static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe) > return pci_reset_hotplug_slot(dev->slot->hotplug, probe); > } > > +static bool pci_is_cxl(struct pci_dev *dev) > +{ > + return pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL, > + CXL_DVSEC_PCIE_DEVICE); > +} > + > +static bool is_cxl_port_sbr_masked(struct pci_dev *dev) s/is_cxl_port_sbr_masked/cxl_sbr_masked/ or similar > +{ > + int dvsec; u16 > + u16 reg; > + int rc; > + > + /* > + * No DVSEC found, must not be CXL port. > + */ > + dvsec = pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL, > + CXL_DVSEC_PCIE_PORT); > + if (!dvsec) > + return false; > + > + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_PORT_CONTROL, ®); > + if (rc) > + return true; > + > + /* > + * CXL spec r3.1 8.1.5.2 > + * When 0, SBR bit in Bridge Control register of this Port has no effect. > + * When 1, the Port shall generate hot reset when SBR bit in Bridge > + * Control gets set to 1. > + */ > + if (reg & CXL_DVSEC_PORT_CONTROL_UNMASK_SBR) > + return false; > + > + return true; > +} > + > static int pci_reset_bus_function(struct pci_dev *dev, bool probe) > { > + struct pci_dev *bridge = pci_upstream_bridge(dev); > int rc; > > + /* If it's a CXL port and the SBR control is masked, fail the SBR */ > + if (pci_is_cxl(dev) && bridge && is_cxl_port_sbr_masked(bridge)) { This sounds like solely a property of the bridge, so why do we care what "dev" is? I assume SBR is asserted in the standard PCIe way, and the only question is whether the bridge itself masks it. Would this be enough? if (bridge && is_cxl_port_sbr_masked(bridge)) > + if (probe) > + return 0; > + > + return -ENOTTY; > + } > + > rc = pci_dev_reset_slot_function(dev, probe); > if (rc != -ENOTTY) > return rc; > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h > index a39193213ff2..5f2c66987299 100644 > --- a/include/uapi/linux/pci_regs.h > +++ b/include/uapi/linux/pci_regs.h > @@ -1148,4 +1148,11 @@ > #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL 0x00ff0000 > #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX 0xff000000 > > +/* Compute Express Link (CXL) */ > +#define PCI_DVSEC_VENDOR_ID_CXL 0x1e98 I see that you're just moving this #define from drivers/cxl/cxlpci.h, but I'm scratching my head a bit. As used here, this is to match the DVSEC Vendor ID (PCIe r6.0, sec 7.9.6.2). IIUC, this should be just a PCI SIG-defined "Vendor ID", e.g., "PCI_VENDOR_ID_CXL", that doesn't need the "DVSEC" qualifier in the name, and would normally be defined in include/linux/pci_ids.h. But I don't see CXL in pci_ids.h, and I don't see either "CXL" or the value "1e98" in the PCI SIG list at https://pcisig.com/membership/member-companies. > +#define CXL_DVSEC_PCIE_DEVICE 0 I think this is the "DVSEC ID" (PCIe r6.0, sec 7.9.6.3), right? And the "0" value comes from CXL r3.1, sec 8.1.3? Sec 8.1.3 says "Software may use the presence of this DVSEC to differentiate between a CXL device and a PCIe device. As such, a standard PCIe device must not expose this DVSEC." I think the "PCIE" in the name here may end up being confusing since the presence of this DVSEC means the device is a CXL device, *not* a standard PCIe device. > +#define CXL_DVSEC_PCIE_PORT 3 And "3" comes from CXL r3.1, sec 8.1.5? Same here; I'm not sure "PCIE" should be in the name, or maybe it should be at a different place, e.g., "PCI_DVSEC_CXL_PORT" or something, since this DVSEC controls CXL-specific things. I kind of think a "PCI_DVSEC_" prefix might be appropriate since PCI is where the DVSEC concept is defined, and then the more specific details can follow > +#define CXL_DVSEC_PORT_CONTROL 0x0c > +#define CXL_DVSEC_PORT_CONTROL_UNMASK_SBR 0x00000001 s/CONTROL/CTL/ (or CTRL, though CTL is more common in this file) Bjorn ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-03-27 21:26 ` Bjorn Helgaas @ 2024-03-27 23:57 ` Dave Jiang 2024-03-28 17:38 ` Bjorn Helgaas 0 siblings, 1 reply; 18+ messages in thread From: Dave Jiang @ 2024-03-27 23:57 UTC (permalink / raw) To: Bjorn Helgaas Cc: linux-cxl, linux-pci, dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas On 3/27/24 2:26 PM, Bjorn Helgaas wrote: > On Mon, Mar 25, 2024 at 04:58:01PM -0700, Dave Jiang wrote: >> Per CXL spec r3.1 8.1.5.2, secondary bus reset is masked unless the >> "Unmask SBR" bit is set. Add a check to the PCI secondary bus reset >> path to fail the CXL SBR request if the "Unmask SBR" bit is clear in >> the CXL Port Control Extensions register by returning -ENOTTY. > > s/secondary bus reset/Secondary Bus Reset (SBR)/ > to show that this is defined by PCIe spec and introduce the > initialism. ok will fix > >> With the current behavior, the bus_reset would appear to have executed >> successfully, however the operation is actually masked if the "Unmask >> SBR" bit is set with the default value. The intention is to inform the >> user that SBR for the CXL device is masked and will not go through. > > The default value of Unmask SBR isn't really relevant here. Changing to: When the "Unmask SBR" bit is set to 0 (default), the bus_reset would appear to have executed successfully. However the operation is actually masked. The intention is to inform the user that SBR for the CXL device is masked and will not go through. > >> The expectation is that if a user overrides the "Unmask SBR" via a >> user tool such as setpci then they can trigger a bus reset successfully. > > ... if a user *sets* Unmask SBR ... > to be more specific about what value is required. > If a user overrides the "Unmask SBR" via a user tool such as setpci and sets the value to 1, then the bus reset will execute successfully. >> Link: https://lore.kernel.org/linux-cxl/20240220203956.GA1502351@bhelgaas/ >> Signed-off-by: Dave Jiang <dave.jiang@intel.com> >> --- >> v2: >> - Rename is_cxl_device() to pci_is_cxl(). (Lukas) >> - Inverse xmas tree var declaration for is_cxl_port_sbr_masked(). (Lukas) >> - Return -ENOTTY on error of reset method. (Lukas) >> - Use pci_upstream_bridge() instead of dev->bus->self. (Lukas) >> - Additional explanation in commit log on behavior. (Lukas) >> --- >> drivers/cxl/cxlpci.h | 2 -- >> drivers/pci/pci.c | 45 +++++++++++++++++++++++++++++++++++ >> include/uapi/linux/pci_regs.h | 7 ++++++ >> 3 files changed, 52 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h >> index 93992a1c8eec..55be4dccbed0 100644 >> --- a/drivers/cxl/cxlpci.h >> +++ b/drivers/cxl/cxlpci.h >> @@ -13,10 +13,8 @@ >> * "DVSEC" redundancies removed. When obvious, abbreviations may be used. >> */ >> #define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20) >> -#define PCI_DVSEC_VENDOR_ID_CXL 0x1E98 >> >> /* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */ >> -#define CXL_DVSEC_PCIE_DEVICE 0 >> #define CXL_DVSEC_CAP_OFFSET 0xA >> #define CXL_DVSEC_MEM_CAPABLE BIT(2) >> #define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4) >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> index e5f243dd4288..259e5d6538bb 100644 >> --- a/drivers/pci/pci.c >> +++ b/drivers/pci/pci.c >> @@ -4927,10 +4927,55 @@ static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe) >> return pci_reset_hotplug_slot(dev->slot->hotplug, probe); >> } >> >> +static bool pci_is_cxl(struct pci_dev *dev) >> +{ >> + return pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL, >> + CXL_DVSEC_PCIE_DEVICE); >> +} >> + >> +static bool is_cxl_port_sbr_masked(struct pci_dev *dev) > > s/is_cxl_port_sbr_masked/cxl_sbr_masked/ or similar will update > >> +{ >> + int dvsec; > > u16 > ok >> + u16 reg; >> + int rc; >> + >> + /* >> + * No DVSEC found, must not be CXL port. >> + */ >> + dvsec = pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL, >> + CXL_DVSEC_PCIE_PORT); >> + if (!dvsec) >> + return false; >> + >> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_PORT_CONTROL, ®); >> + if (rc) >> + return true; >> + >> + /* >> + * CXL spec r3.1 8.1.5.2 >> + * When 0, SBR bit in Bridge Control register of this Port has no effect. >> + * When 1, the Port shall generate hot reset when SBR bit in Bridge >> + * Control gets set to 1. >> + */ >> + if (reg & CXL_DVSEC_PORT_CONTROL_UNMASK_SBR) >> + return false; >> + >> + return true; >> +} >> + >> static int pci_reset_bus_function(struct pci_dev *dev, bool probe) >> { >> + struct pci_dev *bridge = pci_upstream_bridge(dev); >> int rc; >> >> + /* If it's a CXL port and the SBR control is masked, fail the SBR */ >> + if (pci_is_cxl(dev) && bridge && is_cxl_port_sbr_masked(bridge)) { > > This sounds like solely a property of the bridge, so why do we care > what "dev" is? I assume SBR is asserted in the standard PCIe way, and > the only question is whether the bridge itself masks it. Would this > be enough? > > if (bridge && is_cxl_port_sbr_masked(bridge)) Yes that should work. I'll drop pci_is_cxl() and related bits > >> + if (probe) >> + return 0; >> + >> + return -ENOTTY; >> + } >> + >> rc = pci_dev_reset_slot_function(dev, probe); >> if (rc != -ENOTTY) >> return rc; >> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h >> index a39193213ff2..5f2c66987299 100644 >> --- a/include/uapi/linux/pci_regs.h >> +++ b/include/uapi/linux/pci_regs.h >> @@ -1148,4 +1148,11 @@ >> #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL 0x00ff0000 >> #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX 0xff000000 >> >> +/* Compute Express Link (CXL) */ >> +#define PCI_DVSEC_VENDOR_ID_CXL 0x1e98 > > I see that you're just moving this #define from drivers/cxl/cxlpci.h, > but I'm scratching my head a bit. As used here, this is to match the > DVSEC Vendor ID (PCIe r6.0, sec 7.9.6.2). > > IIUC, this should be just a PCI SIG-defined "Vendor ID", e.g., > "PCI_VENDOR_ID_CXL", that doesn't need the "DVSEC" qualifier in the > name, and would normally be defined in include/linux/pci_ids.h. > > But I don't see CXL in pci_ids.h, and I don't see either "CXL" or the > value "1e98" in the PCI SIG list at > https://pcisig.com/membership/member-companies. > I'll create a new patch and move to include/linux/pci_ids.h first for this define and change to PCI_VENDOR_ID_CXL. The value is defined in CXL spec r3.1 sec 8.1.1. diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index a0c75e467df3..7dfbf6d96b3d 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2607,6 +2607,8 @@ #define PCI_VENDOR_ID_ALIBABA 0x1ded +#define PCI_VENDOR_ID_CXL 0x1e98 + #define PCI_VENDOR_ID_TEHUTI 0x1fc9 #define PCI_DEVICE_ID_TEHUTI_3009 0x3009 #define PCI_DEVICE_ID_TEHUTI_3010 0x3010 >> +#define CXL_DVSEC_PCIE_DEVICE 0 > > I think this is the "DVSEC ID" (PCIe r6.0, sec 7.9.6.3), right? And > the "0" value comes from CXL r3.1, sec 8.1.3? I'll leave this define alone for now since it's not needed anymore with dropping the CXL device check. > > Sec 8.1.3 says "Software may use the presence of this DVSEC to > differentiate between a CXL device and a PCIe device. As such, a > standard PCIe device must not expose this DVSEC." > > I think the "PCIE" in the name here may end up being confusing since > the presence of this DVSEC means the device is a CXL device, *not* a > standard PCIe device. > >> +#define CXL_DVSEC_PCIE_PORT 3 > > And "3" comes from CXL r3.1, sec 8.1.5? > > Same here; I'm not sure "PCIE" should be in the name, or maybe it > should be at a different place, e.g., "PCI_DVSEC_CXL_PORT" or > something, since this DVSEC controls CXL-specific things. > > I kind of think a "PCI_DVSEC_" prefix might be appropriate since > PCI is where the DVSEC concept is defined, and then the more specific > details can follow Will rename to PCI_DVSEC_CXL_PORT* > >> +#define CXL_DVSEC_PORT_CONTROL 0x0c >> +#define CXL_DVSEC_PORT_CONTROL_UNMASK_SBR 0x00000001 > > s/CONTROL/CTL/ (or CTRL, though CTL is more common in this file) ok > > Bjorn > ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-03-27 23:57 ` Dave Jiang @ 2024-03-28 17:38 ` Bjorn Helgaas 2024-03-28 19:03 ` Dan Williams 0 siblings, 1 reply; 18+ messages in thread From: Bjorn Helgaas @ 2024-03-28 17:38 UTC (permalink / raw) To: Dave Jiang Cc: linux-cxl, linux-pci, dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas On Wed, Mar 27, 2024 at 04:57:40PM -0700, Dave Jiang wrote: > On 3/27/24 2:26 PM, Bjorn Helgaas wrote: > > On Mon, Mar 25, 2024 at 04:58:01PM -0700, Dave Jiang wrote: > >> With the current behavior, the bus_reset would appear to have executed > >> successfully, however the operation is actually masked if the "Unmask > >> SBR" bit is set with the default value. The intention is to inform the > >> user that SBR for the CXL device is masked and will not go through. > > > > The default value of Unmask SBR isn't really relevant here. > > Changing to: > When the "Unmask SBR" bit is set to 0 (default), the bus_reset would > appear to have executed successfully. However the operation is actually > masked. The intention is to inform the user that SBR for the CXL device > is masked and will not go through. > > > > >> The expectation is that if a user overrides the "Unmask SBR" via a > >> user tool such as setpci then they can trigger a bus reset successfully. > > > > ... if a user *sets* Unmask SBR ... > > to be more specific about what value is required. > > > > If a user overrides the "Unmask SBR" via a user tool such as setpci and > sets the value to 1, then the bus reset will execute successfully. I'm not super enamored with the "if a user overrides" language because a driver may change that bit in the future, and then the suggestion of a "user" will be misleading. It doesn't matter *how* it gets set to 1; it only matters that SBR doesn't work when "Unmask SBR" is 0 and it does work when "Unmask SBR" is 1. > >> +/* Compute Express Link (CXL) */ > >> +#define PCI_DVSEC_VENDOR_ID_CXL 0x1e98 > > > > I see that you're just moving this #define from drivers/cxl/cxlpci.h, > > but I'm scratching my head a bit. As used here, this is to match the > > DVSEC Vendor ID (PCIe r6.0, sec 7.9.6.2). > > > > IIUC, this should be just a PCI SIG-defined "Vendor ID", e.g., > > "PCI_VENDOR_ID_CXL", that doesn't need the "DVSEC" qualifier in the > > name, and would normally be defined in include/linux/pci_ids.h. > > > > But I don't see CXL in pci_ids.h, and I don't see either "CXL" or the > > value "1e98" in the PCI SIG list at > > https://pcisig.com/membership/member-companies. > > > I'll create a new patch and move to include/linux/pci_ids.h first > for this define and change to PCI_VENDOR_ID_CXL. The value is > defined in CXL spec r3.1 sec 8.1.1. I saw the CXL mentions of 0x1e98, but IMO that's not an authoritative source; no vendor is allowed to just squat on a Vendor ID value simply by mentioning it in their own internal specs. That would obviously lead to madness. The footnote in CXL r3.1, sec 3.1.2, about how the 1E98h value is only for use in DVSEC etc., is really weird. IIUC, the PCI SIG controls the Vendor ID namespace, so I'm still really confused about why it is not reserved there. I emailed the PCI SIG, but the footnote suggests to me that there some kind of history here that I don't know. Anyway, I think all we can do here is to put the definition in include/linux/pci_ids.h as you did and hope 0x1e98 is never allocated to another vendor. > diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h > index a0c75e467df3..7dfbf6d96b3d 100644 > --- a/include/linux/pci_ids.h > +++ b/include/linux/pci_ids.h > @@ -2607,6 +2607,8 @@ > > #define PCI_VENDOR_ID_ALIBABA 0x1ded > > +#define PCI_VENDOR_ID_CXL 0x1e98 > + > #define PCI_VENDOR_ID_TEHUTI 0x1fc9 > #define PCI_DEVICE_ID_TEHUTI_3009 0x3009 > #define PCI_DEVICE_ID_TEHUTI_3010 0x3010 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-03-28 17:38 ` Bjorn Helgaas @ 2024-03-28 19:03 ` Dan Williams 2024-03-28 19:14 ` Bjorn Helgaas 2024-04-02 17:23 ` Bjorn Helgaas 0 siblings, 2 replies; 18+ messages in thread From: Dan Williams @ 2024-03-28 19:03 UTC (permalink / raw) To: Bjorn Helgaas, Dave Jiang Cc: linux-cxl, linux-pci, dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas Bjorn Helgaas wrote: > On Wed, Mar 27, 2024 at 04:57:40PM -0700, Dave Jiang wrote: > > On 3/27/24 2:26 PM, Bjorn Helgaas wrote: > > > On Mon, Mar 25, 2024 at 04:58:01PM -0700, Dave Jiang wrote: > > > >> With the current behavior, the bus_reset would appear to have executed > > >> successfully, however the operation is actually masked if the "Unmask > > >> SBR" bit is set with the default value. The intention is to inform the > > >> user that SBR for the CXL device is masked and will not go through. > > > > > > The default value of Unmask SBR isn't really relevant here. > > > > Changing to: > > When the "Unmask SBR" bit is set to 0 (default), the bus_reset would > > appear to have executed successfully. However the operation is actually > > masked. The intention is to inform the user that SBR for the CXL device > > is masked and will not go through. > > > > > > > >> The expectation is that if a user overrides the "Unmask SBR" via a > > >> user tool such as setpci then they can trigger a bus reset successfully. > > > > > > ... if a user *sets* Unmask SBR ... > > > to be more specific about what value is required. > > > > > > > If a user overrides the "Unmask SBR" via a user tool such as setpci and > > sets the value to 1, then the bus reset will execute successfully. > > I'm not super enamored with the "if a user overrides" language because > a driver may change that bit in the future, and then the suggestion of > a "user" will be misleading. > > It doesn't matter *how* it gets set to 1; it only matters that SBR > doesn't work when "Unmask SBR" is 0 and it does work when "Unmask SBR" > is 1. > > > >> +/* Compute Express Link (CXL) */ > > >> +#define PCI_DVSEC_VENDOR_ID_CXL 0x1e98 > > > > > > I see that you're just moving this #define from drivers/cxl/cxlpci.h, > > > but I'm scratching my head a bit. As used here, this is to match the > > > DVSEC Vendor ID (PCIe r6.0, sec 7.9.6.2). > > > > > > IIUC, this should be just a PCI SIG-defined "Vendor ID", e.g., > > > "PCI_VENDOR_ID_CXL", that doesn't need the "DVSEC" qualifier in the > > > name, and would normally be defined in include/linux/pci_ids.h. > > > > > > But I don't see CXL in pci_ids.h, and I don't see either "CXL" or the > > > value "1e98" in the PCI SIG list at > > > https://pcisig.com/membership/member-companies. > > > > > I'll create a new patch and move to include/linux/pci_ids.h first > > for this define and change to PCI_VENDOR_ID_CXL. The value is > > defined in CXL spec r3.1 sec 8.1.1. > > I saw the CXL mentions of 0x1e98, but IMO that's not an authoritative > source; no vendor is allowed to just squat on a Vendor ID value simply > by mentioning it in their own internal specs. That would obviously > lead to madness. > > The footnote in CXL r3.1, sec 3.1.2, about how the 1E98h value is only > for use in DVSEC etc., is really weird. > > IIUC, the PCI SIG controls the Vendor ID namespace, so I'm still > really confused about why it is not reserved there. I emailed the PCI > SIG, but the footnote suggests to me that there some kind of history > here that I don't know. > > Anyway, I think all we can do here is to put the definition in > include/linux/pci_ids.h as you did and hope 0x1e98 is never allocated > to another vendor. Oh, true, I think this should be PCI_DVSEC_VENDOR_ID_CXL, because afaics it is still possible that 0x1e98 be used as a non-DVSEC vendor-id in some future device. In other words I think the CXL specification usage of 0x1e98 is scoped as "DVSEC Vendor ID", not "Vendor ID". However that would mean that a future 0x1e98 device could not publish DVSECs without colliding with CXL DVSECs. I note this footnote in section 3.1.2 about the 0x1e98 value (all caps emphasis is from the spec, not me): --- NOTICE TO USERS: THE UNIQUE VALUE THAT IS PROVIDED IN THIS CXL SPECIFICATION IS FOR USE IN VENDOR DEFINED MESSAGE FIELDS, DESIGNATED VENDOR SPECIFIC EXTENDED CAPABILITIES, AND ALTERNATE PROTOCOL NEGOTIATION ONLY... ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-03-28 19:03 ` Dan Williams @ 2024-03-28 19:14 ` Bjorn Helgaas 2024-04-02 17:23 ` Bjorn Helgaas 1 sibling, 0 replies; 18+ messages in thread From: Bjorn Helgaas @ 2024-03-28 19:14 UTC (permalink / raw) To: Dan Williams Cc: Dave Jiang, linux-cxl, linux-pci, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas On Thu, Mar 28, 2024 at 12:03:17PM -0700, Dan Williams wrote: > Bjorn Helgaas wrote: > > On Wed, Mar 27, 2024 at 04:57:40PM -0700, Dave Jiang wrote: > > > On 3/27/24 2:26 PM, Bjorn Helgaas wrote: > > > > On Mon, Mar 25, 2024 at 04:58:01PM -0700, Dave Jiang wrote: > > > >> +/* Compute Express Link (CXL) */ > > > >> +#define PCI_DVSEC_VENDOR_ID_CXL 0x1e98 > > > > > > > > I see that you're just moving this #define from drivers/cxl/cxlpci.h, > > > > but I'm scratching my head a bit. As used here, this is to match the > > > > DVSEC Vendor ID (PCIe r6.0, sec 7.9.6.2). > > > > > > > > IIUC, this should be just a PCI SIG-defined "Vendor ID", e.g., > > > > "PCI_VENDOR_ID_CXL", that doesn't need the "DVSEC" qualifier in the > > > > name, and would normally be defined in include/linux/pci_ids.h. > > > > > > > > But I don't see CXL in pci_ids.h, and I don't see either "CXL" or the > > > > value "1e98" in the PCI SIG list at > > > > https://pcisig.com/membership/member-companies. > > > > > > > I'll create a new patch and move to include/linux/pci_ids.h first > > > for this define and change to PCI_VENDOR_ID_CXL. The value is > > > defined in CXL spec r3.1 sec 8.1.1. > > > > I saw the CXL mentions of 0x1e98, but IMO that's not an authoritative > > source; no vendor is allowed to just squat on a Vendor ID value simply > > by mentioning it in their own internal specs. That would obviously > > lead to madness. > > > > The footnote in CXL r3.1, sec 3.1.2, about how the 1E98h value is only > > for use in DVSEC etc., is really weird. > > > > IIUC, the PCI SIG controls the Vendor ID namespace, so I'm still > > really confused about why it is not reserved there. I emailed the PCI > > SIG, but the footnote suggests to me that there some kind of history > > here that I don't know. > > > > Anyway, I think all we can do here is to put the definition in > > include/linux/pci_ids.h as you did and hope 0x1e98 is never allocated > > to another vendor. > > Oh, true, I think this should be PCI_DVSEC_VENDOR_ID_CXL, because afaics > it is still possible that 0x1e98 be used as a non-DVSEC vendor-id in > some future device. What a disaster that would be. > In other words I think the CXL specification usage of 0x1e98 is scoped > as "DVSEC Vendor ID", not "Vendor ID". > > However that would mean that a future 0x1e98 device could not publish > DVSECs without colliding with CXL DVSECs. > > I note this footnote in section 3.1.2 about the 0x1e98 value (all caps > emphasis is from the spec, not me): > > --- > NOTICE TO USERS: THE UNIQUE VALUE THAT IS PROVIDED IN THIS CXL SPECIFICATION IS > FOR USE IN VENDOR DEFINED MESSAGE FIELDS, DESIGNATED VENDOR SPECIFIC EXTENDED > CAPABILITIES, AND ALTERNATE PROTOCOL NEGOTIATION ONLY... Right, that's the one I thought was really weird. No sane person would put crap like that in a spec, which is why I thought there must be some "interesting" history behind it. Sigh. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-03-28 19:03 ` Dan Williams 2024-03-28 19:14 ` Bjorn Helgaas @ 2024-04-02 17:23 ` Bjorn Helgaas 2024-04-02 17:46 ` Dan Williams 1 sibling, 1 reply; 18+ messages in thread From: Bjorn Helgaas @ 2024-04-02 17:23 UTC (permalink / raw) To: Dan Williams Cc: Dave Jiang, linux-cxl, linux-pci, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas On Thu, Mar 28, 2024 at 12:03:17PM -0700, Dan Williams wrote: > Bjorn Helgaas wrote: > > On Wed, Mar 27, 2024 at 04:57:40PM -0700, Dave Jiang wrote: > > > On 3/27/24 2:26 PM, Bjorn Helgaas wrote: > > > > On Mon, Mar 25, 2024 at 04:58:01PM -0700, Dave Jiang wrote: > ... > > > > IIUC, this should be just a PCI SIG-defined "Vendor ID", e.g., > > > > "PCI_VENDOR_ID_CXL", that doesn't need the "DVSEC" qualifier in the > > > > name, and would normally be defined in include/linux/pci_ids.h. > > > > > > > > But I don't see CXL in pci_ids.h, and I don't see either "CXL" or the > > > > value "1e98" in the PCI SIG list at > > > > https://pcisig.com/membership/member-companies. > > > > > > > I'll create a new patch and move to include/linux/pci_ids.h first > > > for this define and change to PCI_VENDOR_ID_CXL. The value is > > > defined in CXL spec r3.1 sec 8.1.1. > > > > I saw the CXL mentions of 0x1e98, but IMO that's not an authoritative > > source; no vendor is allowed to just squat on a Vendor ID value simply > > by mentioning it in their own internal specs. That would obviously > > lead to madness. > > > > The footnote in CXL r3.1, sec 3.1.2, about how the 1E98h value is only > > for use in DVSEC etc., is really weird. > > > > IIUC, the PCI SIG controls the Vendor ID namespace, so I'm still > > really confused about why it is not reserved there. I emailed the PCI > > SIG, but the footnote suggests to me that there some kind of history > > here that I don't know. > > > > Anyway, I think all we can do here is to put the definition in > > include/linux/pci_ids.h as you did and hope 0x1e98 is never allocated > > to another vendor. > > Oh, true, I think this should be PCI_DVSEC_VENDOR_ID_CXL, because afaics > it is still possible that 0x1e98 be used as a non-DVSEC vendor-id in > some future device. > > In other words I think the CXL specification usage of 0x1e98 is scoped > as "DVSEC Vendor ID", not "Vendor ID". > > However that would mean that a future 0x1e98 device could not publish > DVSECs without colliding with CXL DVSECs. FWIW, I pinged administration@pcisig.com and got the response that "1E98h is not a VID in our system, but 1E98 has already been reserved by CXL." I wish there were a clearer public statement of this reservation, but I interpret the response to mean that CXL is not a "Vendor", maybe due to some strict definition of "Vendor," but that PCI-SIG will not assign 0x1e98 to any other vendor. So IMO we should add "#define PCI_VENDOR_ID_CXL 0x1e98" so that if we ever *do* see such an assignment, we'll be more likely to flag it as an issue. Bjorn ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-04-02 17:23 ` Bjorn Helgaas @ 2024-04-02 17:46 ` Dan Williams 2024-04-03 14:44 ` Jonathan Cameron 0 siblings, 1 reply; 18+ messages in thread From: Dan Williams @ 2024-04-02 17:46 UTC (permalink / raw) To: Bjorn Helgaas, Dan Williams Cc: Dave Jiang, linux-cxl, linux-pci, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas Bjorn Helgaas wrote: [..] > FWIW, I pinged administration@pcisig.com and got the response that > "1E98h is not a VID in our system, but 1E98 has already been reserved > by CXL." > > I wish there were a clearer public statement of this reservation, but > I interpret the response to mean that CXL is not a "Vendor", maybe due > to some strict definition of "Vendor," but that PCI-SIG will not > assign 0x1e98 to any other vendor. > > So IMO we should add "#define PCI_VENDOR_ID_CXL 0x1e98" so that if we > ever *do* see such an assignment, we'll be more likely to flag it as > an issue. Agree. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-04-02 17:46 ` Dan Williams @ 2024-04-03 14:44 ` Jonathan Cameron 2024-04-03 20:36 ` Dan Williams 2024-04-04 9:02 ` Lukas Wunner 0 siblings, 2 replies; 18+ messages in thread From: Jonathan Cameron @ 2024-04-03 14:44 UTC (permalink / raw) To: Dan Williams Cc: Bjorn Helgaas, Dave Jiang, linux-cxl, linux-pci, ira.weiny, vishal.l.verma, alison.schofield, dave, bhelgaas, lukas On Tue, 2 Apr 2024 10:46:08 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > Bjorn Helgaas wrote: > [..] > > FWIW, I pinged administration@pcisig.com and got the response that > > "1E98h is not a VID in our system, but 1E98 has already been reserved > > by CXL." > > > > I wish there were a clearer public statement of this reservation, but > > I interpret the response to mean that CXL is not a "Vendor", maybe due > > to some strict definition of "Vendor," but that PCI-SIG will not > > assign 0x1e98 to any other vendor. > > > > So IMO we should add "#define PCI_VENDOR_ID_CXL 0x1e98" so that if we > > ever *do* see such an assignment, we'll be more likely to flag it as > > an issue. > > Agree. Sorry for late entry on this discussion and I'll be careful what I say on the history. As you've guessed it was "entertaining" and for FWIW that text occurs in other consortium specs (some predate CXL). It's reserved with agreement from the PCI SIG for a strictly defined set of purposes that does not correspond to those allowed for a normal ID granted to a vendor member. As you say CXL isn't a vendor (don't ask how DMTF got a vendor ID - 0x1AB4). Hence the naming gymnastics and vague answers to avoid any chance of lawyers getting involved :( Jonathan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-04-03 14:44 ` Jonathan Cameron @ 2024-04-03 20:36 ` Dan Williams 2024-04-04 9:02 ` Lukas Wunner 1 sibling, 0 replies; 18+ messages in thread From: Dan Williams @ 2024-04-03 20:36 UTC (permalink / raw) To: Jonathan Cameron, Dan Williams Cc: Bjorn Helgaas, Dave Jiang, linux-cxl, linux-pci, ira.weiny, vishal.l.verma, alison.schofield, dave, bhelgaas, lukas Jonathan Cameron wrote: > On Tue, 2 Apr 2024 10:46:08 -0700 > Dan Williams <dan.j.williams@intel.com> wrote: > > > Bjorn Helgaas wrote: > > [..] > > > FWIW, I pinged administration@pcisig.com and got the response that > > > "1E98h is not a VID in our system, but 1E98 has already been reserved > > > by CXL." > > > > > > I wish there were a clearer public statement of this reservation, but > > > I interpret the response to mean that CXL is not a "Vendor", maybe due > > > to some strict definition of "Vendor," but that PCI-SIG will not > > > assign 0x1e98 to any other vendor. > > > > > > So IMO we should add "#define PCI_VENDOR_ID_CXL 0x1e98" so that if we > > > ever *do* see such an assignment, we'll be more likely to flag it as > > > an issue. > > > > Agree. > > Sorry for late entry on this discussion and I'll be careful what I say > on the history. > > As you've guessed it was "entertaining" and for FWIW that text occurs > in other consortium specs (some predate CXL). > > It's reserved with agreement from the PCI SIG for a strictly defined set > of purposes that does not correspond to those allowed for a normal ID > granted to a vendor member. As you say CXL isn't a vendor (don't ask > how DMTF got a vendor ID - 0x1AB4). > > Hence the naming gymnastics and vague answers to avoid any chance of > lawyers getting involved :( Linux has practical reasons for renaming PCI_DVSEC_VENDOR_ID_CXL to PCI_VENDOR_ID_CXL. By this change Linux is asserting that not only does it expect to find 0x1e98 exclusively used for CXL DVSECs, but it expects to never read "0x1e98" from offset 0 in config space. If someone has reason to believe that assertion is invalid they can speak up here, but otherwise the naming solely reflects Linux's expectation of where it will be used. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-04-03 14:44 ` Jonathan Cameron 2024-04-03 20:36 ` Dan Williams @ 2024-04-04 9:02 ` Lukas Wunner 2024-04-04 13:52 ` Jonathan Cameron 1 sibling, 1 reply; 18+ messages in thread From: Lukas Wunner @ 2024-04-04 9:02 UTC (permalink / raw) To: Jonathan Cameron Cc: Dan Williams, Bjorn Helgaas, Dave Jiang, linux-cxl, linux-pci, ira.weiny, vishal.l.verma, alison.schofield, dave, bhelgaas On Wed, Apr 03, 2024 at 03:44:41PM +0100, Jonathan Cameron wrote: > > Bjorn Helgaas wrote: > > > FWIW, I pinged administration@pcisig.com and got the response that > > > "1E98h is not a VID in our system, but 1E98 has already been reserved > > > by CXL." > > > > > > I wish there were a clearer public statement of this reservation, but > > > I interpret the response to mean that CXL is not a "Vendor", maybe due > > > to some strict definition of "Vendor," but that PCI-SIG will not > > > assign 0x1e98 to any other vendor. > > > > > > So IMO we should add "#define PCI_VENDOR_ID_CXL 0x1e98" so that if we > > > ever *do* see such an assignment, we'll be more likely to flag it as > > > an issue. > > Sorry for late entry on this discussion and I'll be careful what I say > on the history. > > As you've guessed it was "entertaining" and for FWIW that text occurs > in other consortium specs (some predate CXL). > > It's reserved with agreement from the PCI SIG for a strictly defined set > of purposes that does not correspond to those allowed for a normal ID > granted to a vendor member. As you say CXL isn't a vendor (don't ask > how DMTF got a vendor ID - 0x1AB4). > > Hence the naming gymnastics and vague answers to avoid any chance of > lawyers getting involved :( Hm, I'm wondering if avoiding the term "vendor" with something like #define PCI_CONSORTIUM_ID_CXL 0x1e98 would assuage the angst of a legal misstep? ;) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-04-04 9:02 ` Lukas Wunner @ 2024-04-04 13:52 ` Jonathan Cameron 0 siblings, 0 replies; 18+ messages in thread From: Jonathan Cameron @ 2024-04-04 13:52 UTC (permalink / raw) To: Lukas Wunner Cc: Dan Williams, Bjorn Helgaas, Dave Jiang, linux-cxl, linux-pci, ira.weiny, vishal.l.verma, alison.schofield, dave, bhelgaas On Thu, 4 Apr 2024 11:02:10 +0200 Lukas Wunner <lukas@wunner.de> wrote: > On Wed, Apr 03, 2024 at 03:44:41PM +0100, Jonathan Cameron wrote: > > > Bjorn Helgaas wrote: > > > > FWIW, I pinged administration@pcisig.com and got the response that > > > > "1E98h is not a VID in our system, but 1E98 has already been reserved > > > > by CXL." > > > > > > > > I wish there were a clearer public statement of this reservation, but > > > > I interpret the response to mean that CXL is not a "Vendor", maybe due > > > > to some strict definition of "Vendor," but that PCI-SIG will not > > > > assign 0x1e98 to any other vendor. > > > > > > > > So IMO we should add "#define PCI_VENDOR_ID_CXL 0x1e98" so that if we > > > > ever *do* see such an assignment, we'll be more likely to flag it as > > > > an issue. > > > > Sorry for late entry on this discussion and I'll be careful what I say > > on the history. > > > > As you've guessed it was "entertaining" and for FWIW that text occurs > > in other consortium specs (some predate CXL). > > > > It's reserved with agreement from the PCI SIG for a strictly defined set > > of purposes that does not correspond to those allowed for a normal ID > > granted to a vendor member. As you say CXL isn't a vendor (don't ask > > how DMTF got a vendor ID - 0x1AB4). > > > > Hence the naming gymnastics and vague answers to avoid any chance of > > lawyers getting involved :( > > Hm, I'm wondering if avoiding the term "vendor" with something like > > #define PCI_CONSORTIUM_ID_CXL 0x1e98 > > would assuage the angst of a legal misstep? ;) Works for me, but I really hope we don't have to care :( Jonathan ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset 2024-03-25 23:58 ` [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset Dave Jiang 2024-03-27 21:26 ` Bjorn Helgaas @ 2024-03-28 1:43 ` Dan Williams 1 sibling, 0 replies; 18+ messages in thread From: Dan Williams @ 2024-03-28 1:43 UTC (permalink / raw) To: Dave Jiang, linux-cxl, linux-pci Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas Dave Jiang wrote: > Per CXL spec r3.1 8.1.5.2, secondary bus reset is masked unless the > "Unmask SBR" bit is set. Add a check to the PCI secondary bus reset > path to fail the CXL SBR request if the "Unmask SBR" bit is clear in > the CXL Port Control Extensions register by returning -ENOTTY. > > With the current behavior, the bus_reset would appear to have executed > successfully, however the operation is actually masked if the "Unmask > SBR" bit is set with the default value. The intention is to inform the > user that SBR for the CXL device is masked and will not go through. > > The expectation is that if a user overrides the "Unmask SBR" via a > user tool such as setpci then they can trigger a bus reset successfully. > > Link: https://lore.kernel.org/linux-cxl/20240220203956.GA1502351@bhelgaas/ > Signed-off-by: Dave Jiang <dave.jiang@intel.com> > --- [..] > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index e5f243dd4288..259e5d6538bb 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4927,10 +4927,55 @@ static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe) > return pci_reset_hotplug_slot(dev->slot->hotplug, probe); > } > > +static bool pci_is_cxl(struct pci_dev *dev) > +{ > + return pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL, > + CXL_DVSEC_PCIE_DEVICE); > +} > + > +static bool is_cxl_port_sbr_masked(struct pci_dev *dev) > +{ > + int dvsec; > + u16 reg; > + int rc; > + > + /* > + * No DVSEC found, must not be CXL port. > + */ This comment is unfortunately not correct. Per CXL 9.12.3 "Enumerating CXL RPs and DSPs", the CXL_DVSEC_PCIE_PORT disappears when no endpoint is connected. So the comment should be: /* * No DVSEC found, either is not a CXL port, or not connected in which * case mask state is a nop (CXL 3.1 9.12.3 "Enumerating CXL RPs and DSPs") */ > + dvsec = pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL, > + CXL_DVSEC_PCIE_PORT); > + if (!dvsec) > + return false; > + > + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_PORT_CONTROL, ®); > + if (rc) > + return true; If the link gets disconnected after the above check but before reading the register the value returned *should* be 0xffff which should naturally indicate that reset is not masked. ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 2/3] PCI: Create new reset method to force SBR for CXL 2024-03-25 23:58 [PATCH 0/3 v2] PCI: Add Secondary Bus Reset (SBR) support for CXL Dave Jiang 2024-03-25 23:58 ` [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset Dave Jiang @ 2024-03-25 23:58 ` Dave Jiang 2024-03-28 1:53 ` Dan Williams 2024-03-25 23:58 ` [PATCH v2 3/3] cxl: Add post reset warning if reset is detected as Secondary Bus Reset (SBR) Dave Jiang 2 siblings, 1 reply; 18+ messages in thread From: Dave Jiang @ 2024-03-25 23:58 UTC (permalink / raw) To: linux-cxl, linux-pci Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas CXL spec r3.1 8.1.5.2 By default Secondary Bus Reset (SBR) is masked for CXL ports. Introduce a new PCI reset method "cxl_bus_force" to force SBR on CXL ports by setting the unmask SBR bit in the CXL DVSEC port control register before performing the bus reset and restore the original value of the bit post reset. The new reset method allows the user to intentionally perform SBR on a CXL device without needing to set the "Unmask SBR" bit via a user tool. Signed-off-by: Dave Jiang <dave.jiang@intel.com> --- v2: - Use pci_upstream_bridge() instead of dev->bus->self. - Return -ENOTTY as error for reset function --- drivers/pci/pci.c | 52 +++++++++++++++++++++++++++++++++++++++++++-- include/linux/pci.h | 2 +- 2 files changed, 51 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 259e5d6538bb..cbcad8f0880d 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4933,6 +4933,12 @@ static bool pci_is_cxl(struct pci_dev *dev) CXL_DVSEC_PCIE_DEVICE); } +static int cxl_port_dvsec(struct pci_dev *dev) +{ + return pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL, + CXL_DVSEC_PCIE_PORT); +} + static bool is_cxl_port_sbr_masked(struct pci_dev *dev) { int dvsec; @@ -4942,8 +4948,7 @@ static bool is_cxl_port_sbr_masked(struct pci_dev *dev) /* * No DVSEC found, must not be CXL port. */ - dvsec = pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL, - CXL_DVSEC_PCIE_PORT); + dvsec = cxl_port_dvsec(dev); if (!dvsec) return false; @@ -4982,6 +4987,48 @@ static int pci_reset_bus_function(struct pci_dev *dev, bool probe) return pci_parent_bus_reset(dev, probe); } +static int cxl_reset_bus_function(struct pci_dev *dev, bool probe) +{ + struct pci_dev *bridge; + int dvsec; + int rc; + u16 reg, val; + + if (!pci_is_cxl(dev)) + return -ENOTTY; + + bridge = pci_upstream_bridge(dev); + if (!bridge) + return -ENOTTY; + + dvsec = cxl_port_dvsec(bridge); + if (!dvsec) + return -ENOTTY; + + if (probe) + return 0; + + rc = pci_read_config_word(bridge, dvsec + CXL_DVSEC_PORT_CONTROL, + ®); + if (rc) + return -ENOTTY; + + if (!(reg & CXL_DVSEC_PORT_CONTROL_UNMASK_SBR)) { + val = reg | CXL_DVSEC_PORT_CONTROL_UNMASK_SBR; + pci_write_config_word(bridge, + dvsec + CXL_DVSEC_PORT_CONTROL, val); + } else { + val = reg; + } + + rc = pci_reset_bus_function(dev, probe); + + if (reg != val) + pci_write_config_word(bridge, dvsec + CXL_DVSEC_PORT_CONTROL, reg); + + return rc; +} + void pci_dev_lock(struct pci_dev *dev) { /* block PM suspend, driver probe, etc. */ @@ -5066,6 +5113,7 @@ static const struct pci_reset_fn_method pci_reset_fn_methods[] = { { pci_af_flr, .name = "af_flr" }, { pci_pm_reset, .name = "pm" }, { pci_reset_bus_function, .name = "bus" }, + { cxl_reset_bus_function, .name = "cxl_bus_force" }, }; static ssize_t reset_method_show(struct device *dev, diff --git a/include/linux/pci.h b/include/linux/pci.h index 16493426a04f..235f37715a43 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -51,7 +51,7 @@ PCI_STATUS_PARITY) /* Number of reset methods used in pci_reset_fn_methods array in pci.c */ -#define PCI_NUM_RESET_METHODS 7 +#define PCI_NUM_RESET_METHODS 8 #define PCI_RESET_PROBE true #define PCI_RESET_DO_RESET false -- 2.44.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* RE: [PATCH v2 2/3] PCI: Create new reset method to force SBR for CXL 2024-03-25 23:58 ` [PATCH v2 2/3] PCI: Create new reset method to force SBR for CXL Dave Jiang @ 2024-03-28 1:53 ` Dan Williams 0 siblings, 0 replies; 18+ messages in thread From: Dan Williams @ 2024-03-28 1:53 UTC (permalink / raw) To: Dave Jiang, linux-cxl, linux-pci Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas Dave Jiang wrote: > CXL spec r3.1 8.1.5.2 > By default Secondary Bus Reset (SBR) is masked for CXL ports. Introduce a > new PCI reset method "cxl_bus_force" to force SBR on CXL ports by setting > the unmask SBR bit in the CXL DVSEC port control register before performing > the bus reset and restore the original value of the bit post reset. The > new reset method allows the user to intentionally perform SBR on a CXL > device without needing to set the "Unmask SBR" bit via a user tool. > > Signed-off-by: Dave Jiang <dave.jiang@intel.com> > --- > v2: > - Use pci_upstream_bridge() instead of dev->bus->self. > - Return -ENOTTY as error for reset function > --- > drivers/pci/pci.c | 52 +++++++++++++++++++++++++++++++++++++++++++-- > include/linux/pci.h | 2 +- > 2 files changed, 51 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 259e5d6538bb..cbcad8f0880d 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4933,6 +4933,12 @@ static bool pci_is_cxl(struct pci_dev *dev) > CXL_DVSEC_PCIE_DEVICE); > } > > +static int cxl_port_dvsec(struct pci_dev *dev) > +{ > + return pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL, > + CXL_DVSEC_PCIE_PORT); > +} > + > static bool is_cxl_port_sbr_masked(struct pci_dev *dev) > { > int dvsec; > @@ -4942,8 +4948,7 @@ static bool is_cxl_port_sbr_masked(struct pci_dev *dev) > /* > * No DVSEC found, must not be CXL port. > */ > - dvsec = pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL, > - CXL_DVSEC_PCIE_PORT); Once applied, those 2 lines had a very short life in mainline. Perhaps just define cxl_port_dvsec() in patch1? > + dvsec = cxl_port_dvsec(dev); > if (!dvsec) > return false; > > @@ -4982,6 +4987,48 @@ static int pci_reset_bus_function(struct pci_dev *dev, bool probe) > return pci_parent_bus_reset(dev, probe); > } > > +static int cxl_reset_bus_function(struct pci_dev *dev, bool probe) > +{ > + struct pci_dev *bridge; > + int dvsec; > + int rc; > + u16 reg, val; > + > + if (!pci_is_cxl(dev)) > + return -ENOTTY; > + > + bridge = pci_upstream_bridge(dev); > + if (!bridge) > + return -ENOTTY; > + > + dvsec = cxl_port_dvsec(bridge); > + if (!dvsec) > + return -ENOTTY; > + > + if (probe) > + return 0; > + > + rc = pci_read_config_word(bridge, dvsec + CXL_DVSEC_PORT_CONTROL, > + ®); > + if (rc) > + return -ENOTTY; > + > + if (!(reg & CXL_DVSEC_PORT_CONTROL_UNMASK_SBR)) { > + val = reg | CXL_DVSEC_PORT_CONTROL_UNMASK_SBR; > + pci_write_config_word(bridge, > + dvsec + CXL_DVSEC_PORT_CONTROL, val); > + } else { > + val = reg; > + } > + > + rc = pci_reset_bus_function(dev, probe); > + > + if (reg != val) > + pci_write_config_word(bridge, dvsec + CXL_DVSEC_PORT_CONTROL, reg); Doesn't this whole sequence need to be wrapped in pci_cfg_access_lock()? Otherwise userspace can get confused if it races to access CXL_DVSEC_PCIE_PORT while the link is down, or if it races to write Unmask SBR and messes up the saved value. I took a quick look and did not see this lock taken from reset_method_store(). > + > + return rc; > +} > + > void pci_dev_lock(struct pci_dev *dev) > { > /* block PM suspend, driver probe, etc. */ > @@ -5066,6 +5113,7 @@ static const struct pci_reset_fn_method pci_reset_fn_methods[] = { > { pci_af_flr, .name = "af_flr" }, > { pci_pm_reset, .name = "pm" }, > { pci_reset_bus_function, .name = "bus" }, > + { cxl_reset_bus_function, .name = "cxl_bus_force" }, Why include "_force" in the name? "cxl_bus" already implies "do what is needed to bus reset this CXL link". ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 3/3] cxl: Add post reset warning if reset is detected as Secondary Bus Reset (SBR) 2024-03-25 23:58 [PATCH 0/3 v2] PCI: Add Secondary Bus Reset (SBR) support for CXL Dave Jiang 2024-03-25 23:58 ` [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset Dave Jiang 2024-03-25 23:58 ` [PATCH v2 2/3] PCI: Create new reset method to force SBR for CXL Dave Jiang @ 2024-03-25 23:58 ` Dave Jiang 2024-03-28 2:03 ` Dan Williams 2 siblings, 1 reply; 18+ messages in thread From: Dave Jiang @ 2024-03-25 23:58 UTC (permalink / raw) To: linux-cxl, linux-pci Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas SBR is equivalent to a device been hot removed and inserted again. Doing a SBR on a CXL type 3 device is problematic if the exported device memory is part of system memory that cannot be offlined. The event is equivalent to violently ripping out that range of memory from the kernel. While the hardware requires the "Unmask SBR" bit set in the Port Control Extensions register and the kernel currently does not unmask it, user can unmask this bit via setpci or similar tool. The driver does not have a way to detect whether a reset coming from the PCI subsystem is a Function Level Reset (FLR) or SBR. The only way to detect is to note if a decoder is marked as enabled in software but the decoder control register indicates it's not committed. A helper function is added to find discrepancy between the decoder software state versus the hardware register state. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> --- v2: - Fix typo in commit log --- drivers/cxl/core/port.c | 30 ++++++++++++++++++++++++++++++ drivers/cxl/cxl.h | 2 ++ drivers/cxl/pci.c | 19 +++++++++++++++++++ 3 files changed, 51 insertions(+) diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 2b0cab556072..e33d047f992b 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -2222,6 +2222,36 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, } EXPORT_SYMBOL_NS_GPL(cxl_endpoint_get_perf_coordinates, CXL); +static int decoder_hw_mismatch(struct device *dev, void *data) +{ + struct cxl_endpoint_decoder *cxled; + struct cxl_port *port = data; + struct cxl_decoder *cxld; + struct cxl_hdm *cxlhdm; + void __iomem *hdm; + u32 ctrl; + + if (!is_endpoint_decoder(dev)) + return 0; + + cxled = to_cxl_endpoint_decoder(dev); + if ((cxled->cxld.flags & CXL_DECODER_F_ENABLE) == 0) + return 0; + + cxld = &cxled->cxld; + cxlhdm = dev_get_drvdata(&port->dev); + hdm = cxlhdm->regs.hdm_decoder; + ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(cxld->id)); + + return !FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl); +} + +bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port) +{ + return device_for_each_child(&port->dev, port, decoder_hw_mismatch); +} +EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_reset_detected, CXL); + /* for user tooling to ensure port disable work has completed */ static ssize_t flush_store(const struct bus_type *bus, const char *buf, size_t count) { diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 534e25e2f0a4..e3c237c50b59 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -895,6 +895,8 @@ void cxl_coordinates_combine(struct access_coordinate *out, struct access_coordinate *c1, struct access_coordinate *c2); +bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port); + /* * Unit test builds overrides this to __weak, find the 'strong' version * of these symbols in tools/testing/cxl/. diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 13450e75f5eb..78457542aeec 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -957,11 +957,30 @@ static void cxl_error_resume(struct pci_dev *pdev) dev->driver ? "successful" : "failed"); } +static void cxl_reset_done(struct pci_dev *pdev) +{ + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev); + struct cxl_memdev *cxlmd = cxlds->cxlmd; + struct device *dev = &pdev->dev; + + /* + * FLR does not expect to touch the HDM decoders and related registers. + * SBR however will wipe all device configurations. + * Issue warning if there was active decoder before reset that no + * longer exists. + */ + if (cxl_endpoint_decoder_reset_detected(cxlmd->endpoint)) { + dev_warn(dev, "SBR happened without memory regions removal.\n"); + dev_warn(dev, "System may be unstable if regions hosted system memory.\n"); + } +} + static const struct pci_error_handlers cxl_error_handlers = { .error_detected = cxl_error_detected, .slot_reset = cxl_slot_reset, .resume = cxl_error_resume, .cor_error_detected = cxl_cor_error_detected, + .reset_done = cxl_reset_done, }; static struct pci_driver cxl_pci_driver = { -- 2.44.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* RE: [PATCH v2 3/3] cxl: Add post reset warning if reset is detected as Secondary Bus Reset (SBR) 2024-03-25 23:58 ` [PATCH v2 3/3] cxl: Add post reset warning if reset is detected as Secondary Bus Reset (SBR) Dave Jiang @ 2024-03-28 2:03 ` Dan Williams 0 siblings, 0 replies; 18+ messages in thread From: Dan Williams @ 2024-03-28 2:03 UTC (permalink / raw) To: Dave Jiang, linux-cxl, linux-pci Cc: dan.j.williams, ira.weiny, vishal.l.verma, alison.schofield, Jonathan.Cameron, dave, bhelgaas, lukas Dave Jiang wrote: > SBR is equivalent to a device been hot removed and inserted again. Doing a > SBR on a CXL type 3 device is problematic if the exported device memory is > part of system memory that cannot be offlined. The event is equivalent to > violently ripping out that range of memory from the kernel. While the > hardware requires the "Unmask SBR" bit set in the Port Control Extensions > register and the kernel currently does not unmask it, user can unmask > this bit via setpci or similar tool. > > The driver does not have a way to detect whether a reset coming from the > PCI subsystem is a Function Level Reset (FLR) or SBR. The only way to > detect is to note if a decoder is marked as enabled in software but the > decoder control register indicates it's not committed. > > A helper function is added to find discrepancy between the decoder > software state versus the hardware register state. > > Suggested-by: Dan Williams <dan.j.williams@intel.com> > Signed-off-by: Dave Jiang <dave.jiang@intel.com> > --- > v2: > - Fix typo in commit log > --- > drivers/cxl/core/port.c | 30 ++++++++++++++++++++++++++++++ > drivers/cxl/cxl.h | 2 ++ > drivers/cxl/pci.c | 19 +++++++++++++++++++ > 3 files changed, 51 insertions(+) > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index 2b0cab556072..e33d047f992b 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -2222,6 +2222,36 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port, > } > EXPORT_SYMBOL_NS_GPL(cxl_endpoint_get_perf_coordinates, CXL); > > +static int decoder_hw_mismatch(struct device *dev, void *data) I would just call this __cxl_endpoint_decoder_reset_detected(), which is clearer. > +{ > + struct cxl_endpoint_decoder *cxled; > + struct cxl_port *port = data; > + struct cxl_decoder *cxld; > + struct cxl_hdm *cxlhdm; > + void __iomem *hdm; > + u32 ctrl; > + > + if (!is_endpoint_decoder(dev)) > + return 0; > + > + cxled = to_cxl_endpoint_decoder(dev); > + if ((cxled->cxld.flags & CXL_DECODER_F_ENABLE) == 0) > + return 0; > + > + cxld = &cxled->cxld; > + cxlhdm = dev_get_drvdata(&port->dev); > + hdm = cxlhdm->regs.hdm_decoder; > + ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(cxld->id)); There is no register access in cxl/core/port.c that is all handled in drivers/cxl/core/pci.c. That helps with cxl_test that knows it needs to intercept drivers/cxl/core/pci.c exports. > + > + return !FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl); > +} > + > +bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port) > +{ > + return device_for_each_child(&port->dev, port, decoder_hw_mismatch); > +} > +EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_reset_detected, CXL); > + > /* for user tooling to ensure port disable work has completed */ > static ssize_t flush_store(const struct bus_type *bus, const char *buf, size_t count) > { > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h > index 534e25e2f0a4..e3c237c50b59 100644 > --- a/drivers/cxl/cxl.h > +++ b/drivers/cxl/cxl.h > @@ -895,6 +895,8 @@ void cxl_coordinates_combine(struct access_coordinate *out, > struct access_coordinate *c1, > struct access_coordinate *c2); > > +bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port); > + > /* > * Unit test builds overrides this to __weak, find the 'strong' version > * of these symbols in tools/testing/cxl/. > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 13450e75f5eb..78457542aeec 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -957,11 +957,30 @@ static void cxl_error_resume(struct pci_dev *pdev) > dev->driver ? "successful" : "failed"); > } > > +static void cxl_reset_done(struct pci_dev *pdev) > +{ > + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev); > + struct cxl_memdev *cxlmd = cxlds->cxlmd; > + struct device *dev = &pdev->dev; > + > + /* > + * FLR does not expect to touch the HDM decoders and related registers. > + * SBR however will wipe all device configurations. > + * Issue warning if there was active decoder before reset that no > + * longer exists. > + */ > + if (cxl_endpoint_decoder_reset_detected(cxlmd->endpoint)) { > + dev_warn(dev, "SBR happened without memory regions removal.\n"); > + dev_warn(dev, "System may be unstable if regions hosted system memory.\n"); This should taint the kernel so that any future crash shows that someone had force reset an active CXL memory configuration. > + } > +} > + > static const struct pci_error_handlers cxl_error_handlers = { > .error_detected = cxl_error_detected, > .slot_reset = cxl_slot_reset, > .resume = cxl_error_resume, > .cor_error_detected = cxl_cor_error_detected, > + .reset_done = cxl_reset_done, > }; > > static struct pci_driver cxl_pci_driver = { > -- > 2.44.0 > ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-04-04 13:52 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-03-25 23:58 [PATCH 0/3 v2] PCI: Add Secondary Bus Reset (SBR) support for CXL Dave Jiang 2024-03-25 23:58 ` [PATCH v2 1/3] PCI: Add check for CXL Secondary Bus Reset Dave Jiang 2024-03-27 21:26 ` Bjorn Helgaas 2024-03-27 23:57 ` Dave Jiang 2024-03-28 17:38 ` Bjorn Helgaas 2024-03-28 19:03 ` Dan Williams 2024-03-28 19:14 ` Bjorn Helgaas 2024-04-02 17:23 ` Bjorn Helgaas 2024-04-02 17:46 ` Dan Williams 2024-04-03 14:44 ` Jonathan Cameron 2024-04-03 20:36 ` Dan Williams 2024-04-04 9:02 ` Lukas Wunner 2024-04-04 13:52 ` Jonathan Cameron 2024-03-28 1:43 ` Dan Williams 2024-03-25 23:58 ` [PATCH v2 2/3] PCI: Create new reset method to force SBR for CXL Dave Jiang 2024-03-28 1:53 ` Dan Williams 2024-03-25 23:58 ` [PATCH v2 3/3] cxl: Add post reset warning if reset is detected as Secondary Bus Reset (SBR) Dave Jiang 2024-03-28 2:03 ` Dan Williams
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox