* [PATCH v3 01/15] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver'
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-13 21:54 ` [PATCH v3 02/15] PCI/AER: Rename AER driver's interfaces to also indicate CXL PCIe port support Terry Bowman
` (13 subsequent siblings)
14 siblings, 0 replies; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
CXL.io provides PCIe like protocol error implementation, but CXL.io and
PCIe have different handling requirements.
The PCIe AER service driver may attempt recovering PCIe devices with
uncorrectable errors while recovery is not used for CXL.io. Recovery is not
used in the CXL.io case because of the potential for corruption on what can
be system memory.
Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
Create handlers for correctable and uncorrectable CXL.io error
handling.
The CXL error handlers will be used in future patches adding CXL PCIe
port protocol error handling.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
---
include/linux/pci.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 573b4c4c2be6..106ac83e3a7b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -886,6 +886,14 @@ struct pci_error_handlers {
void (*cor_error_detected)(struct pci_dev *dev);
};
+/* CXL bus error event callbacks */
+struct cxl_error_handlers {
+ /* CXL bus error detected on this device */
+ bool (*error_detected)(struct pci_dev *dev);
+
+ /* Allow device driver to record more details of a correctable error */
+ void (*cor_error_detected)(struct pci_dev *dev);
+};
struct module;
@@ -956,6 +964,7 @@ struct pci_driver {
int (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
u32 (*sriov_get_vf_total_msix)(struct pci_dev *pf);
const struct pci_error_handlers *err_handler;
+ const struct cxl_error_handlers *cxl_err_handler;
const struct attribute_group **groups;
const struct attribute_group **dev_groups;
struct device_driver driver;
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH v3 02/15] PCI/AER: Rename AER driver's interfaces to also indicate CXL PCIe port support
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
2024-11-13 21:54 ` [PATCH v3 01/15] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver' Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-13 21:54 ` [PATCH v3 03/15] cxl/pci: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port() Terry Bowman
` (12 subsequent siblings)
14 siblings, 0 replies; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
The AER service driver already includes support for CXL restricted host
(RCH) downstream port error handling. The current implementation is based
on CXL1.1 using a root complex event collector.
Rename function interfaces and parameters where necessary to include
virtual hierarchy (VH) mode CXL PCIe port error handling alongside the RCH
handling.[1] The CXL PCIe port error handling will be added in a future
patch.
Limit changes to renaming variable and function names. No functional
changes are added.
[1] CXL 3.1 Spec, 9.12.2 CXL Virtual Hierarchy
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
---
drivers/pci/pcie/aer.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 13b8586924ea..fe6edf26279e 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1029,7 +1029,7 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
return 0;
}
-static void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
+static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
{
/*
* Internal errors of an RCEC indicate an AER error in an
@@ -1052,30 +1052,30 @@ static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
return *handles_cxl;
}
-static bool handles_cxl_errors(struct pci_dev *rcec)
+static bool handles_cxl_errors(struct pci_dev *dev)
{
bool handles_cxl = false;
- if (pci_pcie_type(rcec) == PCI_EXP_TYPE_RC_EC &&
- pcie_aer_is_native(rcec))
- pcie_walk_rcec(rcec, handles_cxl_error_iter, &handles_cxl);
+ if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
+ pcie_aer_is_native(dev))
+ pcie_walk_rcec(dev, handles_cxl_error_iter, &handles_cxl);
return handles_cxl;
}
-static void cxl_rch_enable_rcec(struct pci_dev *rcec)
+static void cxl_enable_internal_errors(struct pci_dev *dev)
{
- if (!handles_cxl_errors(rcec))
+ if (!handles_cxl_errors(dev))
return;
- pci_aer_unmask_internal_errors(rcec);
- pci_info(rcec, "CXL: Internal errors unmasked");
+ pci_aer_unmask_internal_errors(dev);
+ pci_info(dev, "CXL: Internal errors unmasked");
}
#else
-static inline void cxl_rch_enable_rcec(struct pci_dev *dev) { }
-static inline void cxl_rch_handle_error(struct pci_dev *dev,
- struct aer_err_info *info) { }
+static inline void cxl_enable_internal_errors(struct pci_dev *dev) { }
+static inline void cxl_handle_error(struct pci_dev *dev,
+ struct aer_err_info *info) { }
#endif
/**
@@ -1113,7 +1113,7 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
{
- cxl_rch_handle_error(dev, info);
+ cxl_handle_error(dev, info);
pci_aer_handle_error(dev, info);
pci_dev_put(dev);
}
@@ -1491,7 +1491,7 @@ static int aer_probe(struct pcie_device *dev)
return status;
}
- cxl_rch_enable_rcec(port);
+ cxl_enable_internal_errors(port);
aer_enable_rootport(rpc);
pci_info(port, "enabled with IRQ %d\n", dev->irq);
return 0;
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH v3 03/15] cxl/pci: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port()
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
2024-11-13 21:54 ` [PATCH v3 01/15] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver' Terry Bowman
2024-11-13 21:54 ` [PATCH v3 02/15] PCI/AER: Rename AER driver's interfaces to also indicate CXL PCIe port support Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-14 15:45 ` Lukas Wunner
2024-11-13 21:54 ` [PATCH v3 04/15] PCI/AER: Modify AER driver logging to report CXL or PCIe bus error type Terry Bowman
` (11 subsequent siblings)
14 siblings, 1 reply; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
CXL and AER drivers need the ability to identify CXL devices and CXL port
devices.
First, add set_pcie_cxl() with logic checking for CXL Flexbus DVSEC
presence. The CXL Flexbus DVSEC presence is used because it is required
for all the CXL PCIe devices.[1]
Add boolean 'struct pci_dev::is_cxl' with the purpose to cache the CXL
Flexbus presence.
Add pcie_is_cxl() as a macro to return 'struct pci_dev::is_cxl',
Add pcie_is_cxl_port() to check if a device is a CXL root port, CXL
upstream switch port, or CXL downstream switch port. Also, verify the
CXL extensions DVSEC for port is present.[1]
[1] CXL 3.1 Spec, 8.1.1 PCIe Designated Vendor-Specific Extended
Capability (DVSEC) ID Assignment, Table 8-2
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
---
drivers/pci/pci.c | 14 ++++++++++++++
drivers/pci/probe.c | 10 ++++++++++
include/linux/pci.h | 4 ++++
include/uapi/linux/pci_regs.h | 3 ++-
4 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 225a6cd2e9ca..6db6c171df54 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5038,6 +5038,20 @@ static u16 cxl_port_dvsec(struct pci_dev *dev)
PCI_DVSEC_CXL_PORT);
}
+bool pcie_is_cxl_port(struct pci_dev *dev)
+{
+ if (!pcie_is_cxl(dev))
+ return false;
+
+ if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
+ (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
+ (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM))
+ return false;
+
+ return cxl_port_dvsec(dev);
+}
+EXPORT_SYMBOL_GPL(pcie_is_cxl_port);
+
static bool cxl_sbr_masked(struct pci_dev *dev)
{
u16 dvsec, reg;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index f1615805f5b0..277e3fc8e1a7 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1631,6 +1631,14 @@ static void set_pcie_thunderbolt(struct pci_dev *dev)
dev->is_thunderbolt = 1;
}
+static void set_pcie_cxl(struct pci_dev *dev)
+{
+ u16 dvsec = pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
+ PCI_DVSEC_CXL_FLEXBUS);
+ if (dvsec)
+ dev->is_cxl = 1;
+}
+
static void set_pcie_untrusted(struct pci_dev *dev)
{
struct pci_dev *parent;
@@ -1945,6 +1953,8 @@ int pci_setup_device(struct pci_dev *dev)
/* Need to have dev->cfg_size ready */
set_pcie_thunderbolt(dev);
+ set_pcie_cxl(dev);
+
set_pcie_untrusted(dev);
/* "Unknown power state" */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 106ac83e3a7b..d3b1af9fb273 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -443,6 +443,7 @@ struct pci_dev {
unsigned int is_hotplug_bridge:1;
unsigned int shpc_managed:1; /* SHPC owned by shpchp */
unsigned int is_thunderbolt:1; /* Thunderbolt controller */
+ unsigned int is_cxl:1; /* CXL alternate protocol */
/*
* Devices marked being untrusted are the ones that can potentially
* execute DMA attacks and similar. They are typically connected
@@ -743,6 +744,9 @@ static inline bool pci_is_vga(struct pci_dev *pdev)
return false;
}
+#define pcie_is_cxl(dev) (dev->is_cxl)
+bool pcie_is_cxl_port(struct pci_dev *dev);
+
#define for_each_pci_bridge(dev, bus) \
list_for_each_entry(dev, &bus->devices, bus_list) \
if (!pci_is_bridge(dev)) {} else
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 12323b3334a9..5df6c74963c5 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1186,9 +1186,10 @@
#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL 0x00ff0000
#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX 0xff000000
-/* Compute Express Link (CXL r3.1, sec 8.1.5) */
+/* Compute Express Link (CXL r3.1, sec 8.1) */
#define PCI_DVSEC_CXL_PORT 3
#define PCI_DVSEC_CXL_PORT_CTL 0x0c
#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
+#define PCI_DVSEC_CXL_FLEXBUS 7
#endif /* LINUX_PCI_REGS_H */
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* Re: [PATCH v3 03/15] cxl/pci: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port()
2024-11-13 21:54 ` [PATCH v3 03/15] cxl/pci: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port() Terry Bowman
@ 2024-11-14 15:45 ` Lukas Wunner
2024-11-14 16:45 ` Bowman, Terry
0 siblings, 1 reply; 48+ messages in thread
From: Lukas Wunner @ 2024-11-14 15:45 UTC (permalink / raw)
To: Terry Bowman
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On Wed, Nov 13, 2024 at 03:54:17PM -0600, Terry Bowman wrote:
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5038,6 +5038,20 @@ static u16 cxl_port_dvsec(struct pci_dev *dev)
> PCI_DVSEC_CXL_PORT);
> }
>
> +bool pcie_is_cxl_port(struct pci_dev *dev)
> +{
> + if (!pcie_is_cxl(dev))
> + return false;
> +
> + if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
> + (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
> + (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM))
> + return false;
> +
> + return cxl_port_dvsec(dev);
> +}
> +EXPORT_SYMBOL_GPL(pcie_is_cxl_port);
This doesn't need to be exported because the only caller introduced
in this series is in drivers/pci/pcie/aer.c (in patch 05/15), which
is dependent on CONFIG_PCIEAER, which is bool not tristate.
The "!pcie_is_cxl(dev)" check at the top of the function is identical
to the return value "cxl_port_dvsec(dev)". This looks redundant.
However one cannot call pci_pcie_type() without first checking
pci_is_pcie(). So I'm wondering if the "!pcie_is_cxl(dev)" check
is actually erroneous and supposed to be "!pci_is_pcie(dev)"?
That would make more sense to me.
Alternatively, just return true instead of "cxl_port_dvsec(dev)".
That would probably be the simplest solution here.
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -443,6 +443,7 @@ struct pci_dev {
> unsigned int is_hotplug_bridge:1;
> unsigned int shpc_managed:1; /* SHPC owned by shpchp */
> unsigned int is_thunderbolt:1; /* Thunderbolt controller */
> + unsigned int is_cxl:1; /* CXL alternate protocol */
I suspect the audience consists mostly of CXL-unaware PCI developers,
so spelling out Compute Express Link here (and omitting "alternate
protocol" if it doesn't fit) might be more appropriate.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 03/15] cxl/pci: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port()
2024-11-14 15:45 ` Lukas Wunner
@ 2024-11-14 16:45 ` Bowman, Terry
2024-11-14 16:52 ` Lukas Wunner
0 siblings, 1 reply; 48+ messages in thread
From: Bowman, Terry @ 2024-11-14 16:45 UTC (permalink / raw)
To: Lukas Wunner
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
Hi Lukas,
I added comments below.
On 11/14/2024 9:45 AM, Lukas Wunner wrote:
> On Wed, Nov 13, 2024 at 03:54:17PM -0600, Terry Bowman wrote:
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -5038,6 +5038,20 @@ static u16 cxl_port_dvsec(struct pci_dev *dev)
>> PCI_DVSEC_CXL_PORT);
>> }
>>
>> +bool pcie_is_cxl_port(struct pci_dev *dev)
>> +{
>> + if (!pcie_is_cxl(dev))
>> + return false;
>> +
>> + if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
>> + (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
>> + (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM))
>> + return false;
>> +
>> + return cxl_port_dvsec(dev);
>> +}
>> +EXPORT_SYMBOL_GPL(pcie_is_cxl_port);
> This doesn't need to be exported because the only caller introduced
> in this series is in drivers/pci/pcie/aer.c (in patch 05/15), which
> is dependent on CONFIG_PCIEAER, which is bool not tristate.
Ok.
> The "!pcie_is_cxl(dev)" check at the top of the function is identical
> to the return value "cxl_port_dvsec(dev)". This looks redundant.
> However one cannot call pci_pcie_type() without first checking
> pci_is_pcie(). So I'm wondering if the "!pcie_is_cxl(dev)" check
> is actually erroneous and supposed to be "!pci_is_pcie(dev)"?
> That would make more sense to me.
I see pcie_is_cxl(dev) is different than cxl_port_dvsec(dev).They check different DVSECs.[1] CXL flexbus DVSEC presence is cached in pci_dev::is_cxl and returned by pcie_is_cxl().
This is used for indicating CXL device.
cxl_port_dvsec(dev) returns boolean based on presence of CXL port DVSEC to
indicate a CXL port device.
I don't believe they are redundant if you consider you can have a CXL device that
is not a CXL port device.
[1] - CXL 3.1, 8.1.1 Specification, PCIe Designated Vendor-Specific Extended Capability (DVSEC) ID Assignment
> Alternatively, just return true instead of "cxl_port_dvsec(dev)".
> That would probably be the simplest solution here.
>
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -443,6 +443,7 @@ struct pci_dev {
>> unsigned int is_hotplug_bridge:1;
>> unsigned int shpc_managed:1; /* SHPC owned by shpchp */
>> unsigned int is_thunderbolt:1; /* Thunderbolt controller */
>> + unsigned int is_cxl:1; /* CXL alternate protocol */
> I suspect the audience consists mostly of CXL-unaware PCI developers,
> so spelling out Compute Express Link here (and omitting "alternate
> protocol" if it doesn't fit) might be more appropriate.
>
> Thanks,
>
> Lukas
Ok.
Regards,
Terry
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 03/15] cxl/pci: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port()
2024-11-14 16:45 ` Bowman, Terry
@ 2024-11-14 16:52 ` Lukas Wunner
2024-11-14 17:07 ` Bowman, Terry
0 siblings, 1 reply; 48+ messages in thread
From: Lukas Wunner @ 2024-11-14 16:52 UTC (permalink / raw)
To: Bowman, Terry
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On Thu, Nov 14, 2024 at 10:45:39AM -0600, Bowman, Terry wrote:
> On 11/14/2024 9:45 AM, Lukas Wunner wrote:
> > On Wed, Nov 13, 2024 at 03:54:17PM -0600, Terry Bowman wrote:
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -5038,6 +5038,20 @@ static u16 cxl_port_dvsec(struct pci_dev *dev)
> > > PCI_DVSEC_CXL_PORT);
> > > }
> > >
> > > +bool pcie_is_cxl_port(struct pci_dev *dev)
> > > +{
> > > + if (!pcie_is_cxl(dev))
> > > + return false;
> > > +
> > > + if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
> > > + (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
> > > + (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM))
> > > + return false;
> > > +
> > > + return cxl_port_dvsec(dev);
> > > +}
> > > +EXPORT_SYMBOL_GPL(pcie_is_cxl_port);
> >
> > The "!pcie_is_cxl(dev)" check at the top of the function is identical
> > to the return value "cxl_port_dvsec(dev)". This looks redundant.
> > However one cannot call pci_pcie_type() without first checking
> > pci_is_pcie(). So I'm wondering if the "!pcie_is_cxl(dev)" check
> > is actually erroneous and supposed to be "!pci_is_pcie(dev)"?
> > That would make more sense to me.
>
> I see pcie_is_cxl(dev) is different than cxl_port_dvsec(dev).
> They check different DVSECs.
Ah, sorry, I missed that.
> CXL flexbus DVSEC presence is cached in pci_dev::is_cxl and returned by
> pcie_is_cxl(). This is used for indicating CXL device.
>
> cxl_port_dvsec(dev) returns boolean based on presence of CXL port DVSEC to
> indicate a CXL port device.
>
> I don't believe they are redundant if you consider you can have a CXL
> device that
> is not a CXL port device.
Can you have a CXL port that is not a CXL device?
If not, it would seem to me that checking for Flexbus DVSEC presence
*is* redundant. Or do you anticipate broken devices which lack the
Flexbus DVSEC and that you explicitly want to exclude?
Thanks,
Lukas
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 03/15] cxl/pci: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port()
2024-11-14 16:52 ` Lukas Wunner
@ 2024-11-14 17:07 ` Bowman, Terry
2024-11-15 8:47 ` Lukas Wunner
0 siblings, 1 reply; 48+ messages in thread
From: Bowman, Terry @ 2024-11-14 17:07 UTC (permalink / raw)
To: Lukas Wunner
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On 11/14/2024 10:52 AM, Lukas Wunner wrote:
> On Thu, Nov 14, 2024 at 10:45:39AM -0600, Bowman, Terry wrote:
>> On 11/14/2024 9:45 AM, Lukas Wunner wrote:
>>> On Wed, Nov 13, 2024 at 03:54:17PM -0600, Terry Bowman wrote:
>>>> --- a/drivers/pci/pci.c
>>>> +++ b/drivers/pci/pci.c
>>>> @@ -5038,6 +5038,20 @@ static u16 cxl_port_dvsec(struct pci_dev *dev)
>>>> PCI_DVSEC_CXL_PORT);
>>>> }
>>>>
>>>> +bool pcie_is_cxl_port(struct pci_dev *dev)
>>>> +{
>>>> + if (!pcie_is_cxl(dev))
>>>> + return false;
>>>> +
>>>> + if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
>>>> + (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
>>>> + (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM))
>>>> + return false;
>>>> +
>>>> + return cxl_port_dvsec(dev);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(pcie_is_cxl_port);
>>> The "!pcie_is_cxl(dev)" check at the top of the function is identical
>>> to the return value "cxl_port_dvsec(dev)". This looks redundant.
>>> However one cannot call pci_pcie_type() without first checking
>>> pci_is_pcie(). So I'm wondering if the "!pcie_is_cxl(dev)" check
>>> is actually erroneous and supposed to be "!pci_is_pcie(dev)"?
>>> That would make more sense to me.
>> I see pcie_is_cxl(dev) is different than cxl_port_dvsec(dev).
>> They check different DVSECs.
> Ah, sorry, I missed that.
>
>> CXL flexbus DVSEC presence is cached in pci_dev::is_cxl and returned by
>> pcie_is_cxl(). This is used for indicating CXL device.
>>
>> cxl_port_dvsec(dev) returns boolean based on presence of CXL port DVSEC to
>> indicate a CXL port device.
>>
>> I don't believe they are redundant if you consider you can have a CXL
>> device that
>> is not a CXL port device.
> Can you have a CXL port that is not a CXL device?
>
> If not, it would seem to me that checking for Flexbus DVSEC presence
> *is* redundant. Or do you anticipate broken devices which lack the
> Flexbus DVSEC and that you explicitly want to exclude?
>
> Thanks,
>
> Lukas
No, the CXL port device is always a CXL device per spec.
This was added to short-circuit the function by returning immediately if the device
is _not_ a CXL device. Otherwise for PCIe Port devices, the CXL Port DVSEC will be searched.
I was trying to avoid the unnecessary CXL port DVSEC search unless the other criteria
are met. And I expect most cases will not be a CXL device.
I will remove the "if (!pcie_is_cxl(dev))" block as you suggested.
Regards,
Terry
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 03/15] cxl/pci: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port()
2024-11-14 17:07 ` Bowman, Terry
@ 2024-11-15 8:47 ` Lukas Wunner
2024-11-15 13:54 ` Bowman, Terry
0 siblings, 1 reply; 48+ messages in thread
From: Lukas Wunner @ 2024-11-15 8:47 UTC (permalink / raw)
To: Bowman, Terry
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On Thu, Nov 14, 2024 at 11:07:26AM -0600, Bowman, Terry wrote:
> > Can you have a CXL port that is not a CXL device?
> >
> > If not, it would seem to me that checking for Flexbus DVSEC presence
> > *is* redundant. Or do you anticipate broken devices which lack the
> > Flexbus DVSEC and that you explicitly want to exclude?
>
> No, the CXL port device is always a CXL device per spec.
>
> This was added to short-circuit the function by returning immediately
> if the device is _not_ a CXL device. Otherwise for PCIe Port devices,
> the CXL Port DVSEC will be searched. I was trying to avoid the unnecessary
> CXL port DVSEC search unless the other criteria are met.
> And I expect most cases will not be a CXL device.
>
> I will remove the "if (!pcie_is_cxl(dev))" block as you suggested.
Ah, this is meant as a speed-up. Actually that makes sense,
so feel free to keep it.
If you do remove it, I think you'll have to move the cxl_port_dvsec()
invocation up in the function, in front of the pci_pcie_type() checks.
The latter require that one first checks that the device is PCIe.
That's done implicitly by cxl_port_dvsec() because it returns 0 in
the non-PCIe case. (Due to the "if (dev->cfg_size <= PCI_CFG_SPACE_SIZE)"
check in pci_find_next_ext_capability().)
Another idea would be to put a "if (!pcie_is_cxl(dev)) return 0;" speed-up
in cxl_port_dvsec() so that the other caller benefits from it as well.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 03/15] cxl/pci: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port()
2024-11-15 8:47 ` Lukas Wunner
@ 2024-11-15 13:54 ` Bowman, Terry
2024-11-17 17:02 ` Lukas Wunner
0 siblings, 1 reply; 48+ messages in thread
From: Bowman, Terry @ 2024-11-15 13:54 UTC (permalink / raw)
To: Lukas Wunner
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On 11/15/2024 2:47 AM, Lukas Wunner wrote:
> On Thu, Nov 14, 2024 at 11:07:26AM -0600, Bowman, Terry wrote:
>>> Can you have a CXL port that is not a CXL device?
>>>
>>> If not, it would seem to me that checking for Flexbus DVSEC presence
>>> *is* redundant. Or do you anticipate broken devices which lack the
>>> Flexbus DVSEC and that you explicitly want to exclude?
>> No, the CXL port device is always a CXL device per spec.
>>
>> This was added to short-circuit the function by returning immediately
>> if the device is _not_ a CXL device. Otherwise for PCIe Port devices,
>> the CXL Port DVSEC will be searched. I was trying to avoid the unnecessary
>> CXL port DVSEC search unless the other criteria are met.
>> And I expect most cases will not be a CXL device.
>>
>> I will remove the "if (!pcie_is_cxl(dev))" block as you suggested.
> Ah, this is meant as a speed-up. Actually that makes sense,
> so feel free to keep it.
>
> If you do remove it, I think you'll have to move the cxl_port_dvsec()
> invocation up in the function, in front of the pci_pcie_type() checks.
> The latter require that one first checks that the device is PCIe.
> That's done implicitly by cxl_port_dvsec() because it returns 0 in
> the non-PCIe case. (Due to the "if (dev->cfg_size <= PCI_CFG_SPACE_SIZE)"
> check in pci_find_next_ext_capability().)
>
> Another idea would be to put a "if (!pcie_is_cxl(dev)) return 0;" speed-up
> in cxl_port_dvsec() so that the other caller benefits from it as well.
>
> Thanks,
>
> Lukas
Ok, I'll look at adding the same pcie_is_cxl() call and check in cxl_port_devsec().
Regards,
Terry
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 03/15] cxl/pci: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port()
2024-11-15 13:54 ` Bowman, Terry
@ 2024-11-17 17:02 ` Lukas Wunner
2024-11-19 12:20 ` Bowman, Terry
0 siblings, 1 reply; 48+ messages in thread
From: Lukas Wunner @ 2024-11-17 17:02 UTC (permalink / raw)
To: Bowman, Terry
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On Fri, Nov 15, 2024 at 07:54:37AM -0600, Bowman, Terry wrote:
> On 11/15/2024 2:47 AM, Lukas Wunner wrote:
> > On Thu, Nov 14, 2024 at 11:07:26AM -0600, Bowman, Terry wrote:
> > > I will remove the "if (!pcie_is_cxl(dev))" block as you suggested.
> >
> > Ah, this is meant as a speed-up. Actually that makes sense,
> > so feel free to keep it.
> >
> > If you do remove it, I think you'll have to move the cxl_port_dvsec()
> > invocation up in the function, in front of the pci_pcie_type() checks.
> > The latter require that one first checks that the device is PCIe.
> > That's done implicitly by cxl_port_dvsec() because it returns 0 in
> > the non-PCIe case. (Due to the "if (dev->cfg_size <= PCI_CFG_SPACE_SIZE)"
> > check in pci_find_next_ext_capability().)
> >
> > Another idea would be to put a "if (!pcie_is_cxl(dev)) return 0;" speed-up
> > in cxl_port_dvsec() so that the other caller benefits from it as well.
>
> Ok, I'll look at adding the same pcie_is_cxl() call and check in
> cxl_port_devsec().
If you put "if (!pcie_is_cxl(dev)) return 0;" in cxl_port_devsec()
and move the call to cxl_port_devsec() in pcie_is_cxl_port() up in front
of the pci_pcie_type() checks, I think you won't need an additional
"!pcie_is_cxl(dev)" check in pcie_is_cxl_port().
Thanks,
Lukas
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 03/15] cxl/pci: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port()
2024-11-17 17:02 ` Lukas Wunner
@ 2024-11-19 12:20 ` Bowman, Terry
0 siblings, 0 replies; 48+ messages in thread
From: Bowman, Terry @ 2024-11-19 12:20 UTC (permalink / raw)
To: Lukas Wunner
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On 11/17/2024 11:02 AM, Lukas Wunner wrote:
> On Fri, Nov 15, 2024 at 07:54:37AM -0600, Bowman, Terry wrote:
>> On 11/15/2024 2:47 AM, Lukas Wunner wrote:
>>> On Thu, Nov 14, 2024 at 11:07:26AM -0600, Bowman, Terry wrote:
>>>> I will remove the "if (!pcie_is_cxl(dev))" block as you suggested.
>>> Ah, this is meant as a speed-up. Actually that makes sense,
>>> so feel free to keep it.
>>>
>>> If you do remove it, I think you'll have to move the cxl_port_dvsec()
>>> invocation up in the function, in front of the pci_pcie_type() checks.
>>> The latter require that one first checks that the device is PCIe.
>>> That's done implicitly by cxl_port_dvsec() because it returns 0 in
>>> the non-PCIe case. (Due to the "if (dev->cfg_size <= PCI_CFG_SPACE_SIZE)"
>>> check in pci_find_next_ext_capability().)
>>>
>>> Another idea would be to put a "if (!pcie_is_cxl(dev)) return 0;" speed-up
>>> in cxl_port_dvsec() so that the other caller benefits from it as well.
>> Ok, I'll look at adding the same pcie_is_cxl() call and check in
>> cxl_port_devsec().
> If you put "if (!pcie_is_cxl(dev)) return 0;" in cxl_port_devsec()
> and move the call to cxl_port_devsec() in pcie_is_cxl_port() up in front
> of the pci_pcie_type() checks, I think you won't need an additional
> "!pcie_is_cxl(dev)" check in pcie_is_cxl_port().
>
> Thanks,
>
> Lukas
Ok. I'll make that change.
Regards,
Terry
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v3 04/15] PCI/AER: Modify AER driver logging to report CXL or PCIe bus error type
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
` (2 preceding siblings ...)
2024-11-13 21:54 ` [PATCH v3 03/15] cxl/pci: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port() Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-13 21:54 ` [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver Terry Bowman
` (10 subsequent siblings)
14 siblings, 0 replies; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
The AER driver and aer_event tracing currently log 'PCIe Bus Type'
for all errors.
Update the driver and aer_event tracing to log 'CXL Bus Type' for CXL
device errors.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
---
drivers/pci/pcie/aer.c | 14 ++++++++------
include/ras/ras_event.h | 9 ++++++---
2 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index fe6edf26279e..53e9a11f6c0f 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -699,13 +699,14 @@ static void __aer_print_error(struct pci_dev *dev,
void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
{
+ const char *bus_type = pcie_is_cxl(dev) ? "CXL" : "PCIe";
int layer, agent;
int id = pci_dev_id(dev);
const char *level;
if (!info->status) {
- pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
- aer_error_severity_string[info->severity]);
+ pci_err(dev, "%s Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
+ bus_type, aer_error_severity_string[info->severity]);
goto out;
}
@@ -714,8 +715,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
- pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
- aer_error_severity_string[info->severity],
+ pci_printk(level, dev, "%s Bus Error: severity=%s, type=%s, (%s)\n",
+ bus_type, aer_error_severity_string[info->severity],
aer_error_layer[layer], aer_agent_string[agent]);
pci_printk(level, dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
@@ -730,7 +731,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
if (info->id && info->error_dev_num > 1 && info->id == id)
pci_err(dev, " Error of this Agent is reported first\n");
- trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask),
+ trace_aer_event(dev_name(&dev->dev), bus_type, (info->status & ~info->mask),
info->severity, info->tlp_header_valid, &info->tlp);
}
@@ -764,6 +765,7 @@ EXPORT_SYMBOL_GPL(cper_severity_to_aer);
void pci_print_aer(struct pci_dev *dev, int aer_severity,
struct aer_capability_regs *aer)
{
+ const char *bus_type = pcie_is_cxl(dev) ? "CXL" : "PCIe";
int layer, agent, tlp_header_valid = 0;
u32 status, mask;
struct aer_err_info info;
@@ -798,7 +800,7 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
if (tlp_header_valid)
__print_tlp_header(dev, &aer->header_log);
- trace_aer_event(dev_name(&dev->dev), (status & ~mask),
+ trace_aer_event(dev_name(&dev->dev), bus_type, (status & ~mask),
aer_severity, tlp_header_valid, &aer->header_log);
}
EXPORT_SYMBOL_NS_GPL(pci_print_aer, CXL);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index e5f7ee0864e7..1bf8e7050ba8 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -297,15 +297,17 @@ TRACE_EVENT(non_standard_event,
TRACE_EVENT(aer_event,
TP_PROTO(const char *dev_name,
+ const char *bus_type,
const u32 status,
const u8 severity,
const u8 tlp_header_valid,
struct pcie_tlp_log *tlp),
- TP_ARGS(dev_name, status, severity, tlp_header_valid, tlp),
+ TP_ARGS(dev_name, bus_type, status, severity, tlp_header_valid, tlp),
TP_STRUCT__entry(
__string( dev_name, dev_name )
+ __string( bus_type, bus_type )
__field( u32, status )
__field( u8, severity )
__field( u8, tlp_header_valid)
@@ -314,6 +316,7 @@ TRACE_EVENT(aer_event,
TP_fast_assign(
__assign_str(dev_name);
+ __assign_str(bus_type);
__entry->status = status;
__entry->severity = severity;
__entry->tlp_header_valid = tlp_header_valid;
@@ -325,8 +328,8 @@ TRACE_EVENT(aer_event,
}
),
- TP_printk("%s PCIe Bus Error: severity=%s, %s, TLP Header=%s\n",
- __get_str(dev_name),
+ TP_printk("%s %s Bus Error: severity=%s, %s, TLP Header=%s\n",
+ __get_str(dev_name), __get_str(bus_type),
__entry->severity == AER_CORRECTABLE ? "Corrected" :
__entry->severity == AER_FATAL ?
"Fatal" : "Uncorrected, non-fatal",
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
` (3 preceding siblings ...)
2024-11-13 21:54 ` [PATCH v3 04/15] PCI/AER: Modify AER driver logging to report CXL or PCIe bus error type Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-14 16:44 ` Lukas Wunner
2024-11-27 17:03 ` Jonathan Cameron
2024-11-13 21:54 ` [PATCH v3 06/15] PCI/AER: Change AER driver to read UCE fatal status for all CXL PCIe port devices Terry Bowman
` (9 subsequent siblings)
14 siblings, 2 replies; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
The AER service driver supports handling downstream port protocol errors in
restricted CXL host (RCH) mode also known as CXL1.1. It needs the same
functionality for CXL PCIe ports operating in virtual hierarchy (VH)
mode.[1]
CXL and PCIe protocol error handling have different requirements that
necessitate a separate handling path. The AER service driver may try to
recover PCIe uncorrectable non-fatal errors (UCE). The same recovery is not
suitable for CXL PCIe port devices because of potential for system memory
corruption. Instead, CXL protocol error handling must use a kernel panic
in the case of a fatal or non-fatal UCE. The AER driver's PCIe error
handling does not panic the kernel in response to a UCE.
Introduce a separate path for CXL protocol error handling in the AER
service driver. This will allow CXL protocol errors to use CXL specific
handling instead of PCIe handling. Add the CXL specific changes without
affecting or adding functionality in the PCIe handling.
Make this update alongside the existing downstream port RCH error handling
logic, extending support to CXL PCIe ports in VH mode.
is_internal_error() is currently limited by CONFIG_PCIEAER_CXL kernel
config. Update is_internal_error()'s function declaration such that it is
always available regardless if CONFIG_PCIEAER_CXL kernel config is enabled
or disabled.
The uncorrectable error (UCE) handling will be added in a future patch.
[1] CXL 3.1 Spec, 12.2.2 CXL Root Ports, Downstream Switch Ports, and
Upstream Switch Ports
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/pci/pcie/aer.c | 59 ++++++++++++++++++++++++++++--------------
1 file changed, 39 insertions(+), 20 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 53e9a11f6c0f..1d3e5b929661 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -941,8 +941,15 @@ static bool find_source_device(struct pci_dev *parent,
return true;
}
-#ifdef CONFIG_PCIEAER_CXL
+static bool is_internal_error(struct aer_err_info *info)
+{
+ if (info->severity == AER_CORRECTABLE)
+ return info->status & PCI_ERR_COR_INTERNAL;
+ return info->status & PCI_ERR_UNC_INTN;
+}
+
+#ifdef CONFIG_PCIEAER_CXL
/**
* pci_aer_unmask_internal_errors - unmask internal errors
* @dev: pointer to the pcie_dev data structure
@@ -994,14 +1001,6 @@ static bool cxl_error_is_native(struct pci_dev *dev)
return (pcie_ports_native || host->native_aer);
}
-static bool is_internal_error(struct aer_err_info *info)
-{
- if (info->severity == AER_CORRECTABLE)
- return info->status & PCI_ERR_COR_INTERNAL;
-
- return info->status & PCI_ERR_UNC_INTN;
-}
-
static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
{
struct aer_err_info *info = (struct aer_err_info *)data;
@@ -1033,14 +1032,23 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
{
- /*
- * Internal errors of an RCEC indicate an AER error in an
- * RCH's downstream port. Check and handle them in the CXL.mem
- * device driver.
- */
- if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
- is_internal_error(info))
+ if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
+
+ if (info->severity == AER_CORRECTABLE) {
+ struct pci_driver *pdrv = dev->driver;
+ int aer = dev->aer_cap;
+
+ if (aer)
+ pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS,
+ info->status);
+
+ if (pdrv && pdrv->cxl_err_handler &&
+ pdrv->cxl_err_handler->cor_error_detected)
+ pdrv->cxl_err_handler->cor_error_detected(dev);
+
+ pcie_clear_device_status(dev);
+ }
}
static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
@@ -1058,9 +1066,13 @@ static bool handles_cxl_errors(struct pci_dev *dev)
{
bool handles_cxl = false;
- if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
- pcie_aer_is_native(dev))
+ if (!pcie_aer_is_native(dev))
+ return false;
+
+ if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
pcie_walk_rcec(dev, handles_cxl_error_iter, &handles_cxl);
+ else
+ handles_cxl = pcie_is_cxl_port(dev);
return handles_cxl;
}
@@ -1078,6 +1090,10 @@ static void cxl_enable_internal_errors(struct pci_dev *dev)
static inline void cxl_enable_internal_errors(struct pci_dev *dev) { }
static inline void cxl_handle_error(struct pci_dev *dev,
struct aer_err_info *info) { }
+static bool handles_cxl_errors(struct pci_dev *dev)
+{
+ return false;
+}
#endif
/**
@@ -1115,8 +1131,11 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
{
- cxl_handle_error(dev, info);
- pci_aer_handle_error(dev, info);
+ if (is_internal_error(info) && handles_cxl_errors(dev))
+ cxl_handle_error(dev, info);
+ else
+ pci_aer_handle_error(dev, info);
+
pci_dev_put(dev);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* Re: [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver
2024-11-13 21:54 ` [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver Terry Bowman
@ 2024-11-14 16:44 ` Lukas Wunner
2024-11-14 18:41 ` Bowman, Terry
2024-11-27 17:03 ` Jonathan Cameron
1 sibling, 1 reply; 48+ messages in thread
From: Lukas Wunner @ 2024-11-14 16:44 UTC (permalink / raw)
To: Terry Bowman
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On Wed, Nov 13, 2024 at 03:54:19PM -0600, Terry Bowman wrote:
> @@ -1115,8 +1131,11 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>
> static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
> {
> - cxl_handle_error(dev, info);
> - pci_aer_handle_error(dev, info);
> + if (is_internal_error(info) && handles_cxl_errors(dev))
> + cxl_handle_error(dev, info);
> + else
> + pci_aer_handle_error(dev, info);
> +
> pci_dev_put(dev);
> }
If you just do this at the top of cxl_handle_error()...
if (!is_internal_error(info))
return;
...you avoid the need to move is_internal_error() around and the
patch becomes simpler and easier to review.
> The AER service driver supports handling downstream port protocol errors in
> restricted CXL host (RCH) mode also known as CXL1.1. It needs the same
> functionality for CXL PCIe ports operating in virtual hierarchy (VH)
> mode.[1]
This is somewhat minor but by convention, patches in the PCI subsystem
adhere to spec language and capitalization, e.g. "Downstream Port"
instead of "downstream port". Makes it easier to connect the commit
message or code comments to the spec. So maybe you want to consider
that if/when respinning.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver
2024-11-14 16:44 ` Lukas Wunner
@ 2024-11-14 18:41 ` Bowman, Terry
2024-11-15 8:51 ` Lukas Wunner
2024-11-15 14:49 ` Li Ming
0 siblings, 2 replies; 48+ messages in thread
From: Bowman, Terry @ 2024-11-14 18:41 UTC (permalink / raw)
To: Lukas Wunner
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
Hi Lukas,
I added comments below.
On 11/14/2024 10:44 AM, Lukas Wunner wrote:
> On Wed, Nov 13, 2024 at 03:54:19PM -0600, Terry Bowman wrote:
>> @@ -1115,8 +1131,11 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>>
>> static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
>> {
>> - cxl_handle_error(dev, info);
>> - pci_aer_handle_error(dev, info);
>> + if (is_internal_error(info) && handles_cxl_errors(dev))
>> + cxl_handle_error(dev, info);
>> + else
>> + pci_aer_handle_error(dev, info);
>> +
>> pci_dev_put(dev);
>> }
> If you just do this at the top of cxl_handle_error()...
>
> if (!is_internal_error(info))
> return;
>
> ...you avoid the need to move is_internal_error() around and the
> patch becomes simpler and easier to review.
If is_internal_error()==0, then pci_aer_handle_error() should be called to process the PCIe error. Your suggestion would require returning a value from cxl_handle_error(). And then more "if" logic would be required for the cxl_handle_error() return value. Should both is_internal_error() and handles_cxl_errors()be moved into cxl_handle_error()? Would give this:
static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
{
- cxl_handle_error(dev, info);
- pci_aer_handle_error(dev, info);
+ if (!cxl_handle_error(dev, info))
+ pci_aer_handle_error(dev, info);
+
pci_dev_put(dev);
}
>
>> The AER service driver supports handling downstream port protocol errors in
>> restricted CXL host (RCH) mode also known as CXL1.1. It needs the same
>> functionality for CXL PCIe ports operating in virtual hierarchy (VH)
>> mode.[1]
> This is somewhat minor but by convention, patches in the PCI subsystem
> adhere to spec language and capitalization, e.g. "Downstream Port"
> instead of "downstream port". Makes it easier to connect the commit
> message or code comments to the spec. So maybe you want to consider
> that if/when respinning.
>
> Thanks,
>
> Lukas
Thanks for pointing that out. I'll update as you suggest.
Regards,
Terry
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver
2024-11-14 18:41 ` Bowman, Terry
@ 2024-11-15 8:51 ` Lukas Wunner
2024-11-15 13:56 ` Bowman, Terry
2024-11-15 14:49 ` Li Ming
1 sibling, 1 reply; 48+ messages in thread
From: Lukas Wunner @ 2024-11-15 8:51 UTC (permalink / raw)
To: Bowman, Terry
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On Thu, Nov 14, 2024 at 12:41:13PM -0600, Bowman, Terry wrote:
> On 11/14/2024 10:44 AM, Lukas Wunner wrote:
> > On Wed, Nov 13, 2024 at 03:54:19PM -0600, Terry Bowman wrote:
> > > @@ -1115,8 +1131,11 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> > >
> > > static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
> > > {
> > > - cxl_handle_error(dev, info);
> > > - pci_aer_handle_error(dev, info);
> > > + if (is_internal_error(info) && handles_cxl_errors(dev))
> > > + cxl_handle_error(dev, info);
> > > + else
> > > + pci_aer_handle_error(dev, info);
> > > +
> > > pci_dev_put(dev);
> > > }
> >
> > If you just do this at the top of cxl_handle_error()...
> >
> > if (!is_internal_error(info))
> > return;
> >
> > ...you avoid the need to move is_internal_error() around and the
> > patch becomes simpler and easier to review.
>
> If is_internal_error()==0, then pci_aer_handle_error() should be called
> to process the PCIe error.
You're absolutely right, I missed that, sorry for the noise.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver
2024-11-15 8:51 ` Lukas Wunner
@ 2024-11-15 13:56 ` Bowman, Terry
0 siblings, 0 replies; 48+ messages in thread
From: Bowman, Terry @ 2024-11-15 13:56 UTC (permalink / raw)
To: Lukas Wunner
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On 11/15/2024 2:51 AM, Lukas Wunner wrote:
> On Thu, Nov 14, 2024 at 12:41:13PM -0600, Bowman, Terry wrote:
>> On 11/14/2024 10:44 AM, Lukas Wunner wrote:
>>> On Wed, Nov 13, 2024 at 03:54:19PM -0600, Terry Bowman wrote:
>>>> @@ -1115,8 +1131,11 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>>>>
>>>> static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
>>>> {
>>>> - cxl_handle_error(dev, info);
>>>> - pci_aer_handle_error(dev, info);
>>>> + if (is_internal_error(info) && handles_cxl_errors(dev))
>>>> + cxl_handle_error(dev, info);
>>>> + else
>>>> + pci_aer_handle_error(dev, info);
>>>> +
>>>> pci_dev_put(dev);
>>>> }
>>> If you just do this at the top of cxl_handle_error()...
>>>
>>> if (!is_internal_error(info))
>>> return;
>>>
>>> ...you avoid the need to move is_internal_error() around and the
>>> patch becomes simpler and easier to review.
>> If is_internal_error()==0, then pci_aer_handle_error() should be called
>> to process the PCIe error.
> You're absolutely right, I missed that, sorry for the noise.
>
> Thanks,
>
> Lukas
Thanks for taking the time to review.
Regards,
Terry
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver
2024-11-14 18:41 ` Bowman, Terry
2024-11-15 8:51 ` Lukas Wunner
@ 2024-11-15 14:49 ` Li Ming
2024-11-15 19:46 ` Bowman, Terry
1 sibling, 1 reply; 48+ messages in thread
From: Li Ming @ 2024-11-15 14:49 UTC (permalink / raw)
To: Bowman, Terry, Lukas Wunner
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On 2024/11/15 2:41, Bowman, Terry wrote:
> Hi Lukas,
>
> I added comments below.
>
> On 11/14/2024 10:44 AM, Lukas Wunner wrote:
>> On Wed, Nov 13, 2024 at 03:54:19PM -0600, Terry Bowman wrote:
>>> @@ -1115,8 +1131,11 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>>>
>>> static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
>>> {
>>> - cxl_handle_error(dev, info);
>>> - pci_aer_handle_error(dev, info);
>>> + if (is_internal_error(info) && handles_cxl_errors(dev))
>>> + cxl_handle_error(dev, info);
>>> + else
>>> + pci_aer_handle_error(dev, info);
>>> +
>>> pci_dev_put(dev);
>>> }
>> If you just do this at the top of cxl_handle_error()...
>>
>> if (!is_internal_error(info))
>> return;
>>
>> ...you avoid the need to move is_internal_error() around and the
>> patch becomes simpler and easier to review.
>
> If is_internal_error()==0, then pci_aer_handle_error() should be called to process the PCIe error. Your suggestion would require returning a value from cxl_handle_error(). And then more "if" logic would be required for the cxl_handle_error() return value. Should both is_internal_error() and handles_cxl_errors()be moved into cxl_handle_error()? Would give this:
>
> static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
> {
> - cxl_handle_error(dev, info);
> - pci_aer_handle_error(dev, info);
> + if (!cxl_handle_error(dev, info))
> + pci_aer_handle_error(dev, info);
> +
> pci_dev_put(dev);
> }
>
I think is_internal_error() can be moved into handles_cxl_errors().
handles_cxl_errors() helps to check if the error is a CXL error, avoid
this detail which CXL error is an internal error in
handle_error_source(). Like this:
static void handle_error_source(struct pci_dev *dev, struct
aer_err_info *info)
{
- cxl_handle_error(dev, info);
- pci_aer_handle_error(dev, info);
+ if (handles_cxl_errors(dev, info))
+ cxl_handle_error(dev, info);
+ else
+ pci_aer_handle_error(dev, info);
+
pci_dev_put(dev);
}
Ming
>>
>>> The AER service driver supports handling downstream port protocol errors in
>>> restricted CXL host (RCH) mode also known as CXL1.1. It needs the same
>>> functionality for CXL PCIe ports operating in virtual hierarchy (VH)
>>> mode.[1]
>> This is somewhat minor but by convention, patches in the PCI subsystem
>> adhere to spec language and capitalization, e.g. "Downstream Port"
>> instead of "downstream port". Makes it easier to connect the commit
>> message or code comments to the spec. So maybe you want to consider
>> that if/when respinning.
>>
>> Thanks,
>>
>> Lukas
> Thanks for pointing that out. I'll update as you suggest.
>
> Regards,
> Terry
>
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver
2024-11-15 14:49 ` Li Ming
@ 2024-11-15 19:46 ` Bowman, Terry
2024-11-17 7:38 ` Li Ming
0 siblings, 1 reply; 48+ messages in thread
From: Bowman, Terry @ 2024-11-15 19:46 UTC (permalink / raw)
To: Li Ming, Lukas Wunner
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On 11/15/2024 8:49 AM, Li Ming wrote:
>
> On 2024/11/15 2:41, Bowman, Terry wrote:
>> Hi Lukas,
>>
>> I added comments below.
>>
>> On 11/14/2024 10:44 AM, Lukas Wunner wrote:
>>> On Wed, Nov 13, 2024 at 03:54:19PM -0600, Terry Bowman wrote:
>>>> @@ -1115,8 +1131,11 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>>>>
>>>> static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
>>>> {
>>>> - cxl_handle_error(dev, info);
>>>> - pci_aer_handle_error(dev, info);
>>>> + if (is_internal_error(info) && handles_cxl_errors(dev))
>>>> + cxl_handle_error(dev, info);
>>>> + else
>>>> + pci_aer_handle_error(dev, info);
>>>> +
>>>> pci_dev_put(dev);
>>>> }
>>> If you just do this at the top of cxl_handle_error()...
>>>
>>> if (!is_internal_error(info))
>>> return;
>>>
>>> ...you avoid the need to move is_internal_error() around and the
>>> patch becomes simpler and easier to review.
>> If is_internal_error()==0, then pci_aer_handle_error() should be called to process the PCIe error. Your suggestion would require returning a value from cxl_handle_error(). And then more "if" logic would be required for the cxl_handle_error() return value. Should both is_internal_error() and handles_cxl_errors()be moved into cxl_handle_error()? Would give this:
>>
>> static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
>> {
>> - cxl_handle_error(dev, info);
>> - pci_aer_handle_error(dev, info);
>> + if (!cxl_handle_error(dev, info))
>> + pci_aer_handle_error(dev, info);
>> +
>> pci_dev_put(dev);
>> }
>>
We could do that. And with that change it might need handles_cxl_errors() renamed
to something more correct, like handle_cxl_error()?
Regards,
Terry
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver
2024-11-15 19:46 ` Bowman, Terry
@ 2024-11-17 7:38 ` Li Ming
0 siblings, 0 replies; 48+ messages in thread
From: Li Ming @ 2024-11-17 7:38 UTC (permalink / raw)
To: Bowman, Terry, Lukas Wunner
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On 2024/11/16 3:46, Bowman, Terry wrote:
>
>
> On 11/15/2024 8:49 AM, Li Ming wrote:
>>
>> On 2024/11/15 2:41, Bowman, Terry wrote:
>>> Hi Lukas,
>>>
>>> I added comments below.
>>>
>>> On 11/14/2024 10:44 AM, Lukas Wunner wrote:
>>>> On Wed, Nov 13, 2024 at 03:54:19PM -0600, Terry Bowman wrote:
>>>>> @@ -1115,8 +1131,11 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>>>>>
>>>>> static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
>>>>> {
>>>>> - cxl_handle_error(dev, info);
>>>>> - pci_aer_handle_error(dev, info);
>>>>> + if (is_internal_error(info) && handles_cxl_errors(dev))
>>>>> + cxl_handle_error(dev, info);
>>>>> + else
>>>>> + pci_aer_handle_error(dev, info);
>>>>> +
>>>>> pci_dev_put(dev);
>>>>> }
>>>> If you just do this at the top of cxl_handle_error()...
>>>>
>>>> if (!is_internal_error(info))
>>>> return;
>>>>
>>>> ...you avoid the need to move is_internal_error() around and the
>>>> patch becomes simpler and easier to review.
>>> If is_internal_error()==0, then pci_aer_handle_error() should be called to process the PCIe error. Your suggestion would require returning a value from cxl_handle_error(). And then more "if" logic would be required for the cxl_handle_error() return value. Should both is_internal_error() and handles_cxl_errors()be moved into cxl_handle_error()? Would give this:
>>>
>>> static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
>>> {
>>> - cxl_handle_error(dev, info);
>>> - pci_aer_handle_error(dev, info);
>>> + if (!cxl_handle_error(dev, info))
>>> + pci_aer_handle_error(dev, info);
>>> +
>>> pci_dev_put(dev);
>>> }
>>>
>
> We could do that. And with that change it might need handles_cxl_errors() renamed
> to something more correct, like handle_cxl_error()?
Yes, the name you mentioned is better.
Ming
>
> Regards,
> Terry
>
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver
2024-11-13 21:54 ` [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver Terry Bowman
2024-11-14 16:44 ` Lukas Wunner
@ 2024-11-27 17:03 ` Jonathan Cameron
2024-11-27 20:29 ` Bowman, Terry
1 sibling, 1 reply; 48+ messages in thread
From: Jonathan Cameron @ 2024-11-27 17:03 UTC (permalink / raw)
To: Terry Bowman
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
dave.jiang, alison.schofield, vishal.l.verma, dan.j.williams,
bhelgaas, mahesh, ira.weiny, oohall, Benjamin.Cheatham, rrichter,
nathan.fontenot, Smita.KoralahalliChannabasappa, lukas
On Wed, 13 Nov 2024 15:54:19 -0600
Terry Bowman <terry.bowman@amd.com> wrote:
> The AER service driver supports handling downstream port protocol errors in
> restricted CXL host (RCH) mode also known as CXL1.1. It needs the same
> functionality for CXL PCIe ports operating in virtual hierarchy (VH)
> mode.[1]
>
> CXL and PCIe protocol error handling have different requirements that
> necessitate a separate handling path. The AER service driver may try to
> recover PCIe uncorrectable non-fatal errors (UCE). The same recovery is not
> suitable for CXL PCIe port devices because of potential for system memory
> corruption. Instead, CXL protocol error handling must use a kernel panic
> in the case of a fatal or non-fatal UCE. The AER driver's PCIe error
> handling does not panic the kernel in response to a UCE.
>
> Introduce a separate path for CXL protocol error handling in the AER
> service driver. This will allow CXL protocol errors to use CXL specific
> handling instead of PCIe handling. Add the CXL specific changes without
> affecting or adding functionality in the PCIe handling.
>
> Make this update alongside the existing downstream port RCH error handling
> logic, extending support to CXL PCIe ports in VH mode.
>
> is_internal_error() is currently limited by CONFIG_PCIEAER_CXL kernel
> config. Update is_internal_error()'s function declaration such that it is
> always available regardless if CONFIG_PCIEAER_CXL kernel config is enabled
> or disabled.
>
> The uncorrectable error (UCE) handling will be added in a future patch.
>
> [1] CXL 3.1 Spec, 12.2.2 CXL Root Ports, Downstream Switch Ports, and
> Upstream Switch Ports
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
I took another look and so a question inline.
Jonathan
> static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
> {
> struct aer_err_info *info = (struct aer_err_info *)data;
> @@ -1033,14 +1032,23 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
>
> static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> {
> - /*
> - * Internal errors of an RCEC indicate an AER error in an
> - * RCH's downstream port. Check and handle them in the CXL.mem
> - * device driver.
> - */
> - if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> - is_internal_error(info))
> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
> pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
> +
> + if (info->severity == AER_CORRECTABLE) {
> + struct pci_driver *pdrv = dev->driver;
> + int aer = dev->aer_cap;
> +
> + if (aer)
How do we get here with no aer?
On a PCIe device AER is optional, but not I think on a CXL device
(I can't find the text but there is a change log entry that says
to clarify that it is required for CXL devices)
Maybe the optionality is why the PCIe code has this check.
Anyhow, I don't really mind keeping it, was just curious.
> + pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS,
> + info->status);
> +
> + if (pdrv && pdrv->cxl_err_handler &&
> + pdrv->cxl_err_handler->cor_error_detected)
> + pdrv->cxl_err_handler->cor_error_detected(dev);
> +
> + pcie_clear_device_status(dev);
> + }
> }
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver
2024-11-27 17:03 ` Jonathan Cameron
@ 2024-11-27 20:29 ` Bowman, Terry
0 siblings, 0 replies; 48+ messages in thread
From: Bowman, Terry @ 2024-11-27 20:29 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
dave.jiang, alison.schofield, vishal.l.verma, dan.j.williams,
bhelgaas, mahesh, ira.weiny, oohall, Benjamin.Cheatham, rrichter,
nathan.fontenot, Smita.KoralahalliChannabasappa, lukas
On 11/27/2024 11:03 AM, Jonathan Cameron wrote:
> On Wed, 13 Nov 2024 15:54:19 -0600
> Terry Bowman <terry.bowman@amd.com> wrote:
>
>> The AER service driver supports handling downstream port protocol errors in
>> restricted CXL host (RCH) mode also known as CXL1.1. It needs the same
>> functionality for CXL PCIe ports operating in virtual hierarchy (VH)
>> mode.[1]
>>
>> CXL and PCIe protocol error handling have different requirements that
>> necessitate a separate handling path. The AER service driver may try to
>> recover PCIe uncorrectable non-fatal errors (UCE). The same recovery is not
>> suitable for CXL PCIe port devices because of potential for system memory
>> corruption. Instead, CXL protocol error handling must use a kernel panic
>> in the case of a fatal or non-fatal UCE. The AER driver's PCIe error
>> handling does not panic the kernel in response to a UCE.
>>
>> Introduce a separate path for CXL protocol error handling in the AER
>> service driver. This will allow CXL protocol errors to use CXL specific
>> handling instead of PCIe handling. Add the CXL specific changes without
>> affecting or adding functionality in the PCIe handling.
>>
>> Make this update alongside the existing downstream port RCH error handling
>> logic, extending support to CXL PCIe ports in VH mode.
>>
>> is_internal_error() is currently limited by CONFIG_PCIEAER_CXL kernel
>> config. Update is_internal_error()'s function declaration such that it is
>> always available regardless if CONFIG_PCIEAER_CXL kernel config is enabled
>> or disabled.
>>
>> The uncorrectable error (UCE) handling will be added in a future patch.
>>
>> [1] CXL 3.1 Spec, 12.2.2 CXL Root Ports, Downstream Switch Ports, and
>> Upstream Switch Ports
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> I took another look and so a question inline.
>
> Jonathan
>
>> static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
>> {
>> struct aer_err_info *info = (struct aer_err_info *)data;
>> @@ -1033,14 +1032,23 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
>>
>> static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>> {
>> - /*
>> - * Internal errors of an RCEC indicate an AER error in an
>> - * RCH's downstream port. Check and handle them in the CXL.mem
>> - * device driver.
>> - */
>> - if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
>> - is_internal_error(info))
>> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
>> pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
>> +
>> + if (info->severity == AER_CORRECTABLE) {
>> + struct pci_driver *pdrv = dev->driver;
>> + int aer = dev->aer_cap;
>> +
>> + if (aer)
> How do we get here with no aer?
>
> On a PCIe device AER is optional, but not I think on a CXL device
> (I can't find the text but there is a change log entry that says
> to clarify that it is required for CXL devices)
>
> Maybe the optionality is why the PCIe code has this check.
>
> Anyhow, I don't really mind keeping it, was just curious.
Hi Jonathan,
I agree the check can be removed because AER is required for all CXL devices.[1]
[1] - CXL specification v3.1 - Section 3.1.4 'Optional PCIe Features Required for CXL'
Regards,
Terry
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v3 06/15] PCI/AER: Change AER driver to read UCE fatal status for all CXL PCIe port devices
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
` (4 preceding siblings ...)
2024-11-13 21:54 ` [PATCH v3 05/15] PCI/AER: Add CXL PCIe port correctable error support in AER service driver Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-15 9:35 ` Lukas Wunner
2024-11-13 21:54 ` [PATCH v3 07/15] PCI/AER: Add CXL PCIe port uncorrectable error recovery in AER service driver Terry Bowman
` (8 subsequent siblings)
14 siblings, 1 reply; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
The AER service driver's aer_get_device_error_info() function doesn't read
uncorrectable (UCE) fatal error status from PCIe upstream port devices,
including CXL upstream switch ports. As a result, fatal errors are not
logged or handled as needed for CXL PCIe upstream switch port devices.
Update the aer_get_device_error_info() function to read the UCE fatal
status for all CXL PCIe port devices. Make the change to not affect
non-CXL PCIe devices.
The fatal error status will be used in future patches implementing
CXL PCIe port uncorrectable error handling and logging.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
drivers/pci/pcie/aer.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 1d3e5b929661..bb34e205e082 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1250,7 +1250,8 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
} else if (type == PCI_EXP_TYPE_ROOT_PORT ||
type == PCI_EXP_TYPE_RC_EC ||
type == PCI_EXP_TYPE_DOWNSTREAM ||
- info->severity == AER_NONFATAL) {
+ info->severity == AER_NONFATAL ||
+ (pcie_is_cxl(dev) && type == PCI_EXP_TYPE_UPSTREAM)) {
/* Link is still healthy for IO reads */
pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* Re: [PATCH v3 06/15] PCI/AER: Change AER driver to read UCE fatal status for all CXL PCIe port devices
2024-11-13 21:54 ` [PATCH v3 06/15] PCI/AER: Change AER driver to read UCE fatal status for all CXL PCIe port devices Terry Bowman
@ 2024-11-15 9:35 ` Lukas Wunner
2024-11-21 20:24 ` Bowman, Terry
0 siblings, 1 reply; 48+ messages in thread
From: Lukas Wunner @ 2024-11-15 9:35 UTC (permalink / raw)
To: Terry Bowman
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa, Shuai Xue, Keith Busch
On Wed, Nov 13, 2024 at 03:54:20PM -0600, Terry Bowman wrote:
> The AER service driver's aer_get_device_error_info() function doesn't read
> uncorrectable (UCE) fatal error status from PCIe upstream port devices,
> including CXL upstream switch ports. As a result, fatal errors are not
> logged or handled as needed for CXL PCIe upstream switch port devices.
>
> Update the aer_get_device_error_info() function to read the UCE fatal
> status for all CXL PCIe port devices. Make the change to not affect
> non-CXL PCIe devices.
>
> The fatal error status will be used in future patches implementing
> CXL PCIe port uncorrectable error handling and logging.
[...]
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1250,7 +1250,8 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
> } else if (type == PCI_EXP_TYPE_ROOT_PORT ||
> type == PCI_EXP_TYPE_RC_EC ||
> type == PCI_EXP_TYPE_DOWNSTREAM ||
> - info->severity == AER_NONFATAL) {
> + info->severity == AER_NONFATAL ||
> + (pcie_is_cxl(dev) && type == PCI_EXP_TYPE_UPSTREAM)) {
>
> /* Link is still healthy for IO reads */
> pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,
Just a heads-up, there's another patch pending by Shuai Xue (+cc)
which touches the same code lines. It re-enables error reporting
for PCIe Upstream Ports (as well as Endpoints) under certain
conditions:
https://lore.kernel.org/all/20241112135419.59491-3-xueshuai@linux.alibaba.com/
That was originally disabled by Keith Busch (+cc) with commit
9d938ea53b26 ("PCI/AER: Don't read upstream ports below fatal errors").
There's some merge conflict potential here if your series goes into
the cxl tree and Shuai's patch into the pci tree in the next cycle.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 06/15] PCI/AER: Change AER driver to read UCE fatal status for all CXL PCIe port devices
2024-11-15 9:35 ` Lukas Wunner
@ 2024-11-21 20:24 ` Bowman, Terry
2024-11-27 17:05 ` Jonathan Cameron
0 siblings, 1 reply; 48+ messages in thread
From: Bowman, Terry @ 2024-11-21 20:24 UTC (permalink / raw)
To: Lukas Wunner
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa, Shuai Xue, Keith Busch
On 11/15/2024 3:35 AM, Lukas Wunner wrote:
> On Wed, Nov 13, 2024 at 03:54:20PM -0600, Terry Bowman wrote:
>> The AER service driver's aer_get_device_error_info() function doesn't read
>> uncorrectable (UCE) fatal error status from PCIe upstream port devices,
>> including CXL upstream switch ports. As a result, fatal errors are not
>> logged or handled as needed for CXL PCIe upstream switch port devices.
>>
>> Update the aer_get_device_error_info() function to read the UCE fatal
>> status for all CXL PCIe port devices. Make the change to not affect
>> non-CXL PCIe devices.
>>
>> The fatal error status will be used in future patches implementing
>> CXL PCIe port uncorrectable error handling and logging.
> [...]
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -1250,7 +1250,8 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
>> } else if (type == PCI_EXP_TYPE_ROOT_PORT ||
>> type == PCI_EXP_TYPE_RC_EC ||
>> type == PCI_EXP_TYPE_DOWNSTREAM ||
>> - info->severity == AER_NONFATAL) {
>> + info->severity == AER_NONFATAL ||
>> + (pcie_is_cxl(dev) && type == PCI_EXP_TYPE_UPSTREAM)) {
>>
>> /* Link is still healthy for IO reads */
>> pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,
> Just a heads-up, there's another patch pending by Shuai Xue (+cc)
> which touches the same code lines. It re-enables error reporting
> for PCIe Upstream Ports (as well as Endpoints) under certain
> conditions:
>
> https://lore.kernel.org/all/20241112135419.59491-3-xueshuai@linux.alibaba.com/
>
> That was originally disabled by Keith Busch (+cc) with commit
> 9d938ea53b26 ("PCI/AER: Don't read upstream ports below fatal errors").
>
> There's some merge conflict potential here if your series goes into
> the cxl tree and Shuai's patch into the pci tree in the next cycle.
>
> Thanks,
>
> Lukas
Thanks Lukas I took a look at the patchset and reached out to Shuai (you're CC'd). Sorry, I thought
I responded here earlier.
Regards,
Terry
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 06/15] PCI/AER: Change AER driver to read UCE fatal status for all CXL PCIe port devices
2024-11-21 20:24 ` Bowman, Terry
@ 2024-11-27 17:05 ` Jonathan Cameron
2024-11-27 20:53 ` Bowman, Terry
0 siblings, 1 reply; 48+ messages in thread
From: Jonathan Cameron @ 2024-11-27 17:05 UTC (permalink / raw)
To: Bowman, Terry
Cc: Lukas Wunner, linux-cxl, linux-kernel, linux-pci, nifan.cxl,
ming4.li, dave, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa, Shuai Xue, Keith Busch
On Thu, 21 Nov 2024 14:24:17 -0600
"Bowman, Terry" <terry.bowman@amd.com> wrote:
> On 11/15/2024 3:35 AM, Lukas Wunner wrote:
> > On Wed, Nov 13, 2024 at 03:54:20PM -0600, Terry Bowman wrote:
> >> The AER service driver's aer_get_device_error_info() function doesn't read
> >> uncorrectable (UCE) fatal error status from PCIe upstream port devices,
> >> including CXL upstream switch ports. As a result, fatal errors are not
> >> logged or handled as needed for CXL PCIe upstream switch port devices.
> >>
> >> Update the aer_get_device_error_info() function to read the UCE fatal
> >> status for all CXL PCIe port devices. Make the change to not affect
> >> non-CXL PCIe devices.
> >>
> >> The fatal error status will be used in future patches implementing
> >> CXL PCIe port uncorrectable error handling and logging.
> > [...]
> >> --- a/drivers/pci/pcie/aer.c
> >> +++ b/drivers/pci/pcie/aer.c
> >> @@ -1250,7 +1250,8 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
> >> } else if (type == PCI_EXP_TYPE_ROOT_PORT ||
> >> type == PCI_EXP_TYPE_RC_EC ||
> >> type == PCI_EXP_TYPE_DOWNSTREAM ||
> >> - info->severity == AER_NONFATAL) {
> >> + info->severity == AER_NONFATAL ||
> >> + (pcie_is_cxl(dev) && type == PCI_EXP_TYPE_UPSTREAM)) {
> >>
> >> /* Link is still healthy for IO reads */
> >> pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,
> > Just a heads-up, there's another patch pending by Shuai Xue (+cc)
> > which touches the same code lines. It re-enables error reporting
> > for PCIe Upstream Ports (as well as Endpoints) under certain
> > conditions:
> >
> > https://lore.kernel.org/all/20241112135419.59491-3-xueshuai@linux.alibaba.com/
> >
> > That was originally disabled by Keith Busch (+cc) with commit
> > 9d938ea53b26 ("PCI/AER: Don't read upstream ports below fatal errors").
> >
> > There's some merge conflict potential here if your series goes into
> > the cxl tree and Shuai's patch into the pci tree in the next cycle.
> >
> > Thanks,
> >
> > Lukas
> Thanks Lukas I took a look at the patchset and reached out to Shuai (you're CC'd). Sorry, I thought
> I responded here earlier.
I'm guessing we might not need this change if we can base querying on the
link being good. If the error is on the CXL protocol side, the link should
still be fine I think?
Jonathan
>
> Regards,
> Terry
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 06/15] PCI/AER: Change AER driver to read UCE fatal status for all CXL PCIe port devices
2024-11-27 17:05 ` Jonathan Cameron
@ 2024-11-27 20:53 ` Bowman, Terry
0 siblings, 0 replies; 48+ messages in thread
From: Bowman, Terry @ 2024-11-27 20:53 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Lukas Wunner, linux-cxl, linux-kernel, linux-pci, nifan.cxl,
ming4.li, dave, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa, Shuai Xue, Keith Busch
On 11/27/2024 11:05 AM, Jonathan Cameron wrote:
> On Thu, 21 Nov 2024 14:24:17 -0600
> "Bowman, Terry" <terry.bowman@amd.com> wrote:
>
>> On 11/15/2024 3:35 AM, Lukas Wunner wrote:
>>> On Wed, Nov 13, 2024 at 03:54:20PM -0600, Terry Bowman wrote:
>>>> The AER service driver's aer_get_device_error_info() function doesn't read
>>>> uncorrectable (UCE) fatal error status from PCIe upstream port devices,
>>>> including CXL upstream switch ports. As a result, fatal errors are not
>>>> logged or handled as needed for CXL PCIe upstream switch port devices.
>>>>
>>>> Update the aer_get_device_error_info() function to read the UCE fatal
>>>> status for all CXL PCIe port devices. Make the change to not affect
>>>> non-CXL PCIe devices.
>>>>
>>>> The fatal error status will be used in future patches implementing
>>>> CXL PCIe port uncorrectable error handling and logging.
>>> [...]
>>>> --- a/drivers/pci/pcie/aer.c
>>>> +++ b/drivers/pci/pcie/aer.c
>>>> @@ -1250,7 +1250,8 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
>>>> } else if (type == PCI_EXP_TYPE_ROOT_PORT ||
>>>> type == PCI_EXP_TYPE_RC_EC ||
>>>> type == PCI_EXP_TYPE_DOWNSTREAM ||
>>>> - info->severity == AER_NONFATAL) {
>>>> + info->severity == AER_NONFATAL ||
>>>> + (pcie_is_cxl(dev) && type == PCI_EXP_TYPE_UPSTREAM)) {
>>>>
>>>> /* Link is still healthy for IO reads */
>>>> pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,
>>> Just a heads-up, there's another patch pending by Shuai Xue (+cc)
>>> which touches the same code lines. It re-enables error reporting
>>> for PCIe Upstream Ports (as well as Endpoints) under certain
>>> conditions:
>>>
>>> https://lore.kernel.org/all/20241112135419.59491-3-xueshuai@linux.alibaba.com/
>>>
>>> That was originally disabled by Keith Busch (+cc) with commit
>>> 9d938ea53b26 ("PCI/AER: Don't read upstream ports below fatal errors").
>>>
>>> There's some merge conflict potential here if your series goes into
>>> the cxl tree and Shuai's patch into the pci tree in the next cycle.
>>>
>>> Thanks,
>>>
>>> Lukas
>> Thanks Lukas I took a look at the patchset and reached out to Shuai (you're CC'd). Sorry, I thought
>> I responded here earlier.
> I'm guessing we might not need this change if we can base querying on the
> link being good. If the error is on the CXL protocol side, the link should
> still be fine I think?
>
> Jonathan
Hi Jonathan,
Shuai is determining upstream link viability using a call to pciehp_check_link_active() in dpc.c. But, link viability is not determined dynamically for call to aer_get_device_error_info() in his patchset. I suppose we could add this for CXL devices and continue to isolate the new logic from PCIe devices. Your thoughts?
Link to the brief discussion with Shuai is here: https://lore.kernel.org/linux-pci/11282df5-9126-4b5b-82ae-5f1ef3b8aaf5@linux.alibaba.com/ Regards, Terry
>> Regards,
>> Terry
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v3 07/15] PCI/AER: Add CXL PCIe port uncorrectable error recovery in AER service driver
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
` (5 preceding siblings ...)
2024-11-13 21:54 ` [PATCH v3 06/15] PCI/AER: Change AER driver to read UCE fatal status for all CXL PCIe port devices Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-18 10:37 ` Lukas Wunner
2024-11-13 21:54 ` [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers Terry Bowman
` (7 subsequent siblings)
14 siblings, 1 reply; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
Existing recovery procedure for PCIe uncorrectable errors (UCE) does not
apply to CXL devices. Recovery can not be used for CXL devices because of
the potential for corruption on what can be system memory. Also, current
PCIe UCE recovery does not begin at the bridge but begins at the bridge's
first downstream device. This will miss handling CXL protocol errors in a
CXL root port. A separate CXL recovery is needed because of the different
handling requirements
Add a new function, cxl_do_recovery() using the following.
Add cxl_walk_bridge() to iterate the detected error's sub-topology.
cxl_walk_bridge() is similar to pci_walk_bridge() but the CXL flavor
will begin iteration at the bridge rather than beginning at the
bridge's first downstream child.
Add cxl_report_error_detected() as an analog to report_error_detected().
It will call pci_driver::cxl_err_handlers for each iterated downstream
child. The pci_driver::cxl_err_handlers UCE handler returns a boolean
indicating if there was a UCE error detected during handling.
cxl_do_recovery() uses the status from cxl_report_error_detected() to
determine how to proceed. Non-fatal CXL UCE errors will be treated as
fatal. If a UCE was present during handling then cxl_do_recovery()
will kernel panic.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
drivers/pci/pci.h | 3 +++
drivers/pci/pcie/aer.c | 5 +++-
drivers/pci/pcie/err.c | 54 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 61 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 14d00ce45bfa..5a67e41919d8 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -658,6 +658,9 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_channel_state_t state,
pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev));
+/* CXL error reporting and handling */
+void cxl_do_recovery(struct pci_dev *dev);
+
bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
int pcie_retrain_link(struct pci_dev *pdev, bool use_lt);
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index bb34e205e082..87fddd514030 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1048,7 +1048,10 @@ static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
pdrv->cxl_err_handler->cor_error_detected(dev);
pcie_clear_device_status(dev);
- }
+ } else if (info->severity == AER_NONFATAL)
+ cxl_do_recovery(dev);
+ else if (info->severity == AER_FATAL)
+ cxl_do_recovery(dev);
}
static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 31090770fffc..3785f4ca5103 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -276,3 +276,57 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
return status;
}
+
+static void cxl_walk_bridge(struct pci_dev *bridge,
+ int (*cb)(struct pci_dev *, void *),
+ void *userdata)
+{
+ bool *status = userdata;
+
+ cb(bridge, status);
+ if (bridge->subordinate && !*status)
+ pci_walk_bus(bridge->subordinate, cb, status);
+}
+
+static int cxl_report_error_detected(struct pci_dev *dev, void *data)
+{
+ struct pci_driver *pdrv = dev->driver;
+ bool *status = data;
+
+ device_lock(&dev->dev);
+ if (pdrv && pdrv->cxl_err_handler &&
+ pdrv->cxl_err_handler->error_detected) {
+ const struct cxl_error_handlers *cxl_err_handler =
+ pdrv->cxl_err_handler;
+ *status |= cxl_err_handler->error_detected(dev);
+ }
+ device_unlock(&dev->dev);
+ return *status;
+}
+
+void cxl_do_recovery(struct pci_dev *dev)
+{
+ struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
+ int type = pci_pcie_type(dev);
+ struct pci_dev *bridge;
+ int status;
+
+ if (type == PCI_EXP_TYPE_ROOT_PORT ||
+ type == PCI_EXP_TYPE_DOWNSTREAM ||
+ type == PCI_EXP_TYPE_UPSTREAM ||
+ type == PCI_EXP_TYPE_ENDPOINT)
+ bridge = dev;
+ else
+ bridge = pci_upstream_bridge(dev);
+
+ cxl_walk_bridge(bridge, cxl_report_error_detected, &status);
+ if (status)
+ panic("CXL cachemem error. Invoking panic");
+
+ if (host->native_aer || pcie_ports_native) {
+ pcie_clear_device_status(dev);
+ pci_aer_clear_nonfatal_status(dev);
+ }
+
+ pci_info(bridge, "No uncorrectable error found. Continuing.\n");
+}
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* Re: [PATCH v3 07/15] PCI/AER: Add CXL PCIe port uncorrectable error recovery in AER service driver
2024-11-13 21:54 ` [PATCH v3 07/15] PCI/AER: Add CXL PCIe port uncorrectable error recovery in AER service driver Terry Bowman
@ 2024-11-18 10:37 ` Lukas Wunner
2024-11-19 12:23 ` Bowman, Terry
0 siblings, 1 reply; 48+ messages in thread
From: Lukas Wunner @ 2024-11-18 10:37 UTC (permalink / raw)
To: Terry Bowman
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On Wed, Nov 13, 2024 at 03:54:21PM -0600, Terry Bowman wrote:
> Non-fatal CXL UCE errors will be treated as fatal.
Hm, I wonder why?
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1048,7 +1048,10 @@ static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> pdrv->cxl_err_handler->cor_error_detected(dev);
>
> pcie_clear_device_status(dev);
> - }
> + } else if (info->severity == AER_NONFATAL)
> + cxl_do_recovery(dev);
> + else if (info->severity == AER_FATAL)
> + cxl_do_recovery(dev);
> }
Nit: Maybe use curly braces and collapse both if-block into one.
> + cxl_walk_bridge(bridge, cxl_report_error_detected, &status);
> + if (status)
> + panic("CXL cachemem error. Invoking panic");
Nit: This will be prefixed by "Kernel panic - not syncing: ",
so another "Invoking panic" message seems somewhat redundant.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 07/15] PCI/AER: Add CXL PCIe port uncorrectable error recovery in AER service driver
2024-11-18 10:37 ` Lukas Wunner
@ 2024-11-19 12:23 ` Bowman, Terry
0 siblings, 0 replies; 48+ messages in thread
From: Bowman, Terry @ 2024-11-19 12:23 UTC (permalink / raw)
To: Lukas Wunner
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On 11/18/2024 4:37 AM, Lukas Wunner wrote:
> On Wed, Nov 13, 2024 at 03:54:21PM -0600, Terry Bowman wrote:
>> Non-fatal CXL UCE errors will be treated as fatal.
> Hm, I wonder why?
>
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -1048,7 +1048,10 @@ static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>> pdrv->cxl_err_handler->cor_error_detected(dev);
>>
>> pcie_clear_device_status(dev);
>> - }
>> + } else if (info->severity == AER_NONFATAL)
>> + cxl_do_recovery(dev);
>> + else if (info->severity == AER_FATAL)
>> + cxl_do_recovery(dev);
>> }
> Nit: Maybe use curly braces and collapse both if-block into one.
I'll make the change.
>> + cxl_walk_bridge(bridge, cxl_report_error_detected, &status);
>> + if (status)
>> + panic("CXL cachemem error. Invoking panic");
> Nit: This will be prefixed by "Kernel panic - not syncing: ",
> so another "Invoking panic" message seems somewhat redundant.
>
> Thanks,
>
> Lukas
Ok, good idea.
Regards,
Terry
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
` (6 preceding siblings ...)
2024-11-13 21:54 ` [PATCH v3 07/15] PCI/AER: Add CXL PCIe port uncorrectable error recovery in AER service driver Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-15 15:28 ` Li Ming
` (2 more replies)
2024-11-13 21:54 ` [PATCH v3 09/15] cxl/pci: Map CXL PCIe upstream " Terry Bowman
` (6 subsequent siblings)
14 siblings, 3 replies; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
The CXL mem driver (cxl_mem) currently maps and caches a pointer to RAS
registers for the endpoint's root port. The same needs to be done for
each of the CXL downstream switch ports and CXL root ports found between
the endpoint and CXL host bridge.
Introduce cxl_init_ep_ports_aer() to be called for each port in the
sub-topology between the endpoint and the CXL host bridge. This function
will determine if there are CXL downstream switch ports or CXL root ports
associated with this port. The same check will be added in the future for
upstream switch ports.
Move the RAS register map logic from cxl_dport_map_ras() into
cxl_dport_init_ras_reporting(). This eliminates the need for the helper
function, cxl_dport_map_ras().
cxl_init_ep_ports_aer() calls cxl_dport_init_ras_reporting() to map
the RAS registers for CXL downstream switch ports and CXL root ports.
cxl_dport_init_ras_reporting() must check for previously mapped registers
before mapping. This is necessary because endpoints under a CXL switch
may share CXL downstream switch ports or CXL root ports. Ensure the port
registers are only mapped once.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
drivers/cxl/core/pci.c | 36 +++++++++++++++---------------------
drivers/cxl/cxl.h | 6 ++----
drivers/cxl/mem.c | 28 ++++++++++++++++++++++++++--
3 files changed, 43 insertions(+), 27 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 5b46bc46aaa9..1c6761278579 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -749,18 +749,6 @@ static void cxl_dport_map_rch_aer(struct cxl_dport *dport)
}
}
-static void cxl_dport_map_ras(struct cxl_dport *dport)
-{
- struct cxl_register_map *map = &dport->reg_map;
- struct device *dev = dport->dport_dev;
-
- if (!map->component_map.ras.valid)
- dev_dbg(dev, "RAS registers not found\n");
- else if (cxl_map_component_regs(map, &dport->regs.component,
- BIT(CXL_CM_CAP_CAP_ID_RAS)))
- dev_dbg(dev, "Failed to map RAS capability.\n");
-}
-
static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
{
void __iomem *aer_base = dport->regs.dport_aer;
@@ -790,20 +778,26 @@ static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
* @dport: the cxl_dport that needs to be initialized
* @host: host device for devm operations
*/
-void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host)
+void cxl_dport_init_ras_reporting(struct cxl_dport *dport)
{
- dport->reg_map.host = host;
- cxl_dport_map_ras(dport);
-
- if (dport->rch) {
- struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport->dport_dev);
-
- if (!host_bridge->native_aer)
- return;
+ struct device *dport_dev = dport->dport_dev;
+ struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport_dev);
+ if (dport->rch && host_bridge->native_aer) {
cxl_dport_map_rch_aer(dport);
cxl_disable_rch_root_ints(dport);
}
+
+ /* dport may have more than 1 downstream EP. Check if already mapped. */
+ if (dport->regs.ras)
+ return;
+
+ dport->reg_map.host = dport_dev;
+ if (cxl_map_component_regs(&dport->reg_map, &dport->regs.component,
+ BIT(CXL_CM_CAP_CAP_ID_RAS))) {
+ dev_err(dport_dev, "Failed to map RAS capability.\n");
+ return;
+ }
}
EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, CXL);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 5406e3ab3d4a..51acca3415b4 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -763,11 +763,9 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
resource_size_t rcrb);
#ifdef CONFIG_PCIEAER_CXL
-void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport);
-void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
+void cxl_dport_init_ras_reporting(struct cxl_dport *dport);
#else
-static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
- struct device *host) { }
+static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport) { }
#endif
struct cxl_decoder *to_cxl_decoder(struct device *dev);
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index a9fd5cd5a0d2..090f0b74526f 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -45,6 +45,31 @@ static int cxl_mem_dpa_show(struct seq_file *file, void *data)
return 0;
}
+static bool dev_is_cxl_pci(struct device *dev, u32 pcie_type)
+{
+ struct pci_dev *pdev;
+
+ if (!dev || !dev_is_pci(dev))
+ return false;
+
+ pdev = to_pci_dev(dev);
+
+ return (pci_pcie_type(pdev) == pcie_type);
+}
+
+static void cxl_init_ep_ports_aer(struct cxl_ep *ep)
+{
+ struct cxl_dport *dport = ep->dport;
+
+ if (dport) {
+ struct device *dport_dev = dport->dport_dev;
+
+ if (dev_is_cxl_pci(dport_dev, PCI_EXP_TYPE_DOWNSTREAM) ||
+ dev_is_cxl_pci(dport_dev, PCI_EXP_TYPE_ROOT_PORT))
+ cxl_dport_init_ras_reporting(dport);
+ }
+}
+
static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
struct cxl_dport *parent_dport)
{
@@ -62,6 +87,7 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
ep = cxl_ep_load(iter, cxlmd);
ep->next = down;
+ cxl_init_ep_ports_aer(ep);
}
/* Note: endpoint port component registers are derived from @cxlds */
@@ -166,8 +192,6 @@ static int cxl_mem_probe(struct device *dev)
else
endpoint_parent = &parent_port->dev;
- cxl_dport_init_ras_reporting(dport, dev);
-
scoped_guard(device, endpoint_parent) {
if (!endpoint_parent->driver) {
dev_err(dev, "CXL port topology %s not enabled\n",
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* Re: [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers
2024-11-13 21:54 ` [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers Terry Bowman
@ 2024-11-15 15:28 ` Li Ming
2024-11-15 19:33 ` Bowman, Terry
2024-11-16 14:49 ` kernel test robot
2024-11-17 7:45 ` Li Ming
2 siblings, 1 reply; 48+ messages in thread
From: Li Ming @ 2024-11-15 15:28 UTC (permalink / raw)
To: Terry Bowman, linux-cxl, linux-kernel, linux-pci, nifan.cxl,
ming4.li, dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, dan.j.williams, bhelgaas, mahesh, ira.weiny,
oohall, Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa, lukas
On 2024/11/14 5:54, Terry Bowman wrote:
> The CXL mem driver (cxl_mem) currently maps and caches a pointer to RAS
> registers for the endpoint's root port. The same needs to be done for
> each of the CXL downstream switch ports and CXL root ports found between
> the endpoint and CXL host bridge.
>
> Introduce cxl_init_ep_ports_aer() to be called for each port in the
> sub-topology between the endpoint and the CXL host bridge. This function
> will determine if there are CXL downstream switch ports or CXL root ports
> associated with this port. The same check will be added in the future for
> upstream switch ports.
>
> Move the RAS register map logic from cxl_dport_map_ras() into
> cxl_dport_init_ras_reporting(). This eliminates the need for the helper
> function, cxl_dport_map_ras().
>
> cxl_init_ep_ports_aer() calls cxl_dport_init_ras_reporting() to map
> the RAS registers for CXL downstream switch ports and CXL root ports.
>
> cxl_dport_init_ras_reporting() must check for previously mapped registers
> before mapping. This is necessary because endpoints under a CXL switch
> may share CXL downstream switch ports or CXL root ports. Ensure the port
> registers are only mapped once.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
[snip]
> +static bool dev_is_cxl_pci(struct device *dev, u32 pcie_type)
> +{
> + struct pci_dev *pdev;
> +
> + if (!dev || !dev_is_pci(dev))
> + return false;
> +
> + pdev = to_pci_dev(dev);
> +
> + return (pci_pcie_type(pdev) == pcie_type);
> +}
> +
> +static void cxl_init_ep_ports_aer(struct cxl_ep *ep)
> +{
> + struct cxl_dport *dport = ep->dport;
> +
> + if (dport) {
> + struct device *dport_dev = dport->dport_dev;
> +
> + if (dev_is_cxl_pci(dport_dev, PCI_EXP_TYPE_DOWNSTREAM) ||
> + dev_is_cxl_pci(dport_dev, PCI_EXP_TYPE_ROOT_PORT))
> + cxl_dport_init_ras_reporting(dport);
I think cxl_dport_init_ras_reporting() is needed for both VH case and
RCH case. My understanding is that dport_dev could not be a DSP nor a RP
in RCH case.
Ming
> + }
> +}
> +
> static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
> struct cxl_dport *parent_dport)
> {
> @@ -62,6 +87,7 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
>
> ep = cxl_ep_load(iter, cxlmd);
> ep->next = down;
> + cxl_init_ep_ports_aer(ep);
> }
>
> /* Note: endpoint port component registers are derived from @cxlds */
> @@ -166,8 +192,6 @@ static int cxl_mem_probe(struct device *dev)
> else
> endpoint_parent = &parent_port->dev;
>
> - cxl_dport_init_ras_reporting(dport, dev);
> -
> scoped_guard(device, endpoint_parent) {
> if (!endpoint_parent->driver) {
> dev_err(dev, "CXL port topology %s not enabled\n",
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers
2024-11-15 15:28 ` Li Ming
@ 2024-11-15 19:33 ` Bowman, Terry
0 siblings, 0 replies; 48+ messages in thread
From: Bowman, Terry @ 2024-11-15 19:33 UTC (permalink / raw)
To: Li Ming, linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li,
dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, dan.j.williams, bhelgaas, mahesh, ira.weiny,
oohall, Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa, lukas
On 11/15/2024 9:28 AM, Li Ming wrote:
>
> On 2024/11/14 5:54, Terry Bowman wrote:
>> The CXL mem driver (cxl_mem) currently maps and caches a pointer to RAS
>> registers for the endpoint's root port. The same needs to be done for
>> each of the CXL downstream switch ports and CXL root ports found between
>> the endpoint and CXL host bridge.
>>
>> Introduce cxl_init_ep_ports_aer() to be called for each port in the
>> sub-topology between the endpoint and the CXL host bridge. This function
>> will determine if there are CXL downstream switch ports or CXL root ports
>> associated with this port. The same check will be added in the future for
>> upstream switch ports.
>>
>> Move the RAS register map logic from cxl_dport_map_ras() into
>> cxl_dport_init_ras_reporting(). This eliminates the need for the helper
>> function, cxl_dport_map_ras().
>>
>> cxl_init_ep_ports_aer() calls cxl_dport_init_ras_reporting() to map
>> the RAS registers for CXL downstream switch ports and CXL root ports.
>>
>> cxl_dport_init_ras_reporting() must check for previously mapped registers
>> before mapping. This is necessary because endpoints under a CXL switch
>> may share CXL downstream switch ports or CXL root ports. Ensure the port
>> registers are only mapped once.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> [snip]
>
>> +static bool dev_is_cxl_pci(struct device *dev, u32 pcie_type)
>> +{
>> + struct pci_dev *pdev;
>> +
>> + if (!dev || !dev_is_pci(dev))
>> + return false;
>> +
>> + pdev = to_pci_dev(dev);
>> +
>> + return (pci_pcie_type(pdev) == pcie_type);
>> +}
>> +
>> +static void cxl_init_ep_ports_aer(struct cxl_ep *ep)
>> +{
>> + struct cxl_dport *dport = ep->dport;
>> +
>> + if (dport) {
>> + struct device *dport_dev = dport->dport_dev;
>> +
>> + if (dev_is_cxl_pci(dport_dev, PCI_EXP_TYPE_DOWNSTREAM) ||
>> + dev_is_cxl_pci(dport_dev, PCI_EXP_TYPE_ROOT_PORT))
>> + cxl_dport_init_ras_reporting(dport);
> I think cxl_dport_init_ras_reporting() is needed for both VH case and
> RCH case. My understanding is that dport_dev could not be a DSP nor a RP
> in RCH case.
>
> Ming
Good point. I'll look into that. Thanks Ming!
Regards,
Terry
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers
2024-11-13 21:54 ` [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers Terry Bowman
2024-11-15 15:28 ` Li Ming
@ 2024-11-16 14:49 ` kernel test robot
2024-11-17 7:45 ` Li Ming
2 siblings, 0 replies; 48+ messages in thread
From: kernel test robot @ 2024-11-16 14:49 UTC (permalink / raw)
To: Terry Bowman, linux-cxl, linux-kernel, linux-pci, nifan.cxl,
ming4.li, dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, dan.j.williams, bhelgaas, mahesh, ira.weiny,
oohall, Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa, lukas
Cc: oe-kbuild-all
Hi Terry,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 2d5404caa8c7bb5c4e0435f94b28834ae5456623]
url: https://github.com/intel-lab-lkp/linux/commits/Terry-Bowman/PCI-AER-Introduce-struct-cxl_err_handlers-and-add-to-struct-pci_driver/20241114-060000
base: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
patch link: https://lore.kernel.org/r/20241113215429.3177981-9-terry.bowman%40amd.com
patch subject: [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers
config: i386-randconfig-141-20241116 (https://download.01.org/0day-ci/archive/20241116/202411161334.rczGLGKY-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241116/202411161334.rczGLGKY-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411161334.rczGLGKY-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/cxl/core/pci.c:782: warning: Excess function parameter 'host' description in 'cxl_dport_init_ras_reporting'
vim +782 drivers/cxl/core/pci.c
d1a9def33d7043 Terry Bowman 2023-10-18 775
577a67662ff529 Li Ming 2024-08-30 776 /**
577a67662ff529 Li Ming 2024-08-30 777 * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
577a67662ff529 Li Ming 2024-08-30 778 * @dport: the cxl_dport that needs to be initialized
577a67662ff529 Li Ming 2024-08-30 779 * @host: host device for devm operations
577a67662ff529 Li Ming 2024-08-30 780 */
23f51024741fc0 Terry Bowman 2024-11-13 781 void cxl_dport_init_ras_reporting(struct cxl_dport *dport)
f05fd10d138d8b Robert Richter 2023-10-27 @782 {
23f51024741fc0 Terry Bowman 2024-11-13 783 struct device *dport_dev = dport->dport_dev;
23f51024741fc0 Terry Bowman 2024-11-13 784 struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport_dev);
f05fd10d138d8b Robert Richter 2023-10-27 785
23f51024741fc0 Terry Bowman 2024-11-13 786 if (dport->rch && host_bridge->native_aer) {
23f51024741fc0 Terry Bowman 2024-11-13 787 cxl_dport_map_rch_aer(dport);
23f51024741fc0 Terry Bowman 2024-11-13 788 cxl_disable_rch_root_ints(dport);
23f51024741fc0 Terry Bowman 2024-11-13 789 }
6c5f3aacb2963d Terry Bowman 2023-10-18 790
23f51024741fc0 Terry Bowman 2024-11-13 791 /* dport may have more than 1 downstream EP. Check if already mapped. */
23f51024741fc0 Terry Bowman 2024-11-13 792 if (dport->regs.ras)
c8706cc15a5814 Li Ming 2024-08-30 793 return;
d1a9def33d7043 Terry Bowman 2023-10-18 794
23f51024741fc0 Terry Bowman 2024-11-13 795 dport->reg_map.host = dport_dev;
23f51024741fc0 Terry Bowman 2024-11-13 796 if (cxl_map_component_regs(&dport->reg_map, &dport->regs.component,
23f51024741fc0 Terry Bowman 2024-11-13 797 BIT(CXL_CM_CAP_CAP_ID_RAS))) {
23f51024741fc0 Terry Bowman 2024-11-13 798 dev_err(dport_dev, "Failed to map RAS capability.\n");
23f51024741fc0 Terry Bowman 2024-11-13 799 return;
f05fd10d138d8b Robert Richter 2023-10-27 800 }
c8706cc15a5814 Li Ming 2024-08-30 801 }
577a67662ff529 Li Ming 2024-08-30 802 EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, CXL);
f05fd10d138d8b Robert Richter 2023-10-27 803
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers
2024-11-13 21:54 ` [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers Terry Bowman
2024-11-15 15:28 ` Li Ming
2024-11-16 14:49 ` kernel test robot
@ 2024-11-17 7:45 ` Li Ming
2024-11-18 2:21 ` Li Ming
2 siblings, 1 reply; 48+ messages in thread
From: Li Ming @ 2024-11-17 7:45 UTC (permalink / raw)
To: Terry Bowman, linux-cxl, linux-kernel, linux-pci, nifan.cxl,
ming4.li, dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, dan.j.williams, bhelgaas, mahesh, ira.weiny,
oohall, Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa, lukas
On 2024/11/14 5:54, Terry Bowman wrote:
> The CXL mem driver (cxl_mem) currently maps and caches a pointer to RAS
> registers for the endpoint's root port. The same needs to be done for
> each of the CXL downstream switch ports and CXL root ports found between
> the endpoint and CXL host bridge.
>
> Introduce cxl_init_ep_ports_aer() to be called for each port in the
> sub-topology between the endpoint and the CXL host bridge. This function
> will determine if there are CXL downstream switch ports or CXL root ports
> associated with this port. The same check will be added in the future for
> upstream switch ports.
>
> Move the RAS register map logic from cxl_dport_map_ras() into
> cxl_dport_init_ras_reporting(). This eliminates the need for the helper
> function, cxl_dport_map_ras().
>
> cxl_init_ep_ports_aer() calls cxl_dport_init_ras_reporting() to map
> the RAS registers for CXL downstream switch ports and CXL root ports.
>
> cxl_dport_init_ras_reporting() must check for previously mapped registers
> before mapping. This is necessary because endpoints under a CXL switch
> may share CXL downstream switch ports or CXL root ports. Ensure the port
> registers are only mapped once.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
[snip]
> static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
> struct cxl_dport *parent_dport)
> {
> @@ -62,6 +87,7 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
>
> ep = cxl_ep_load(iter, cxlmd);
> ep->next = down;
> + cxl_init_ep_ports_aer(ep);
In RCH case, seems like another issue is here, I believe that a RCD will
be added to a CXL root directly rather than a CXL host bridge, it means
that no chance to call cxl_init_ep_ports_aer() for a RCD, because this
loop is only for a EP attaching to a CXL non-root port.
Please correct me if I'm wrong.
Ming
> }
>
> /* Note: endpoint port component registers are derived from @cxlds */
> @@ -166,8 +192,6 @@ static int cxl_mem_probe(struct device *dev)
> else
> endpoint_parent = &parent_port->dev;
>
> - cxl_dport_init_ras_reporting(dport, dev);
> -
> scoped_guard(device, endpoint_parent) {
> if (!endpoint_parent->driver) {
> dev_err(dev, "CXL port topology %s not enabled\n",
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers
2024-11-17 7:45 ` Li Ming
@ 2024-11-18 2:21 ` Li Ming
2024-11-19 12:28 ` Bowman, Terry
0 siblings, 1 reply; 48+ messages in thread
From: Li Ming @ 2024-11-18 2:21 UTC (permalink / raw)
To: Terry Bowman, linux-cxl, linux-kernel, linux-pci, nifan.cxl,
ming4.li, dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, dan.j.williams, bhelgaas, mahesh, ira.weiny,
oohall, Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa, lukas
On 2024/11/17 15:45, Li Ming wrote:
>
>
> On 2024/11/14 5:54, Terry Bowman wrote:
>> The CXL mem driver (cxl_mem) currently maps and caches a pointer to RAS
>> registers for the endpoint's root port. The same needs to be done for
>> each of the CXL downstream switch ports and CXL root ports found between
>> the endpoint and CXL host bridge.
>>
>> Introduce cxl_init_ep_ports_aer() to be called for each port in the
>> sub-topology between the endpoint and the CXL host bridge. This function
>> will determine if there are CXL downstream switch ports or CXL root ports
>> associated with this port. The same check will be added in the future for
>> upstream switch ports.
>>
>> Move the RAS register map logic from cxl_dport_map_ras() into
>> cxl_dport_init_ras_reporting(). This eliminates the need for the helper
>> function, cxl_dport_map_ras().
>>
>> cxl_init_ep_ports_aer() calls cxl_dport_init_ras_reporting() to map
>> the RAS registers for CXL downstream switch ports and CXL root ports.
>>
>> cxl_dport_init_ras_reporting() must check for previously mapped registers
>> before mapping. This is necessary because endpoints under a CXL switch
>> may share CXL downstream switch ports or CXL root ports. Ensure the port
>> registers are only mapped once.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
> [snip]
>
>> static int devm_cxl_add_endpoint(struct device *host, struct
>> cxl_memdev *cxlmd,
>> struct cxl_dport *parent_dport)
>> {
>> @@ -62,6 +87,7 @@ static int devm_cxl_add_endpoint(struct device
>> *host, struct cxl_memdev *cxlmd,
>> ep = cxl_ep_load(iter, cxlmd);
>> ep->next = down;
>> + cxl_init_ep_ports_aer(ep);
>
> In RCH case, seems like another issue is here, I believe that a RCD will
> be added to a CXL root directly rather than a CXL host bridge, it means
> that no chance to call cxl_init_ep_ports_aer() for a RCD, because this
> loop is only for a EP attaching to a CXL non-root port.
>
> Please correct me if I'm wrong.
>
I think above explaination is not clear, what I meant is the hierachy
in RCH case should be this:
cxl_port(root) <--> cxl_dport(host bridge) <--> cxl_port(RCD endpoint)
RCD endpoint's parent port is a cxl root port, so that the
cxl_init_ep_ports_aer() cannot be called in that case.
Ming
> Ming
>
>> }
>> /* Note: endpoint port component registers are derived from
>> @cxlds */
>> @@ -166,8 +192,6 @@ static int cxl_mem_probe(struct device *dev)
>> else
>> endpoint_parent = &parent_port->dev;
>> - cxl_dport_init_ras_reporting(dport, dev);
>> -
>> scoped_guard(device, endpoint_parent) {
>> if (!endpoint_parent->driver) {
>> dev_err(dev, "CXL port topology %s not enabled\n",
>
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers
2024-11-18 2:21 ` Li Ming
@ 2024-11-19 12:28 ` Bowman, Terry
0 siblings, 0 replies; 48+ messages in thread
From: Bowman, Terry @ 2024-11-19 12:28 UTC (permalink / raw)
To: Li Ming, linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li,
dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, dan.j.williams, bhelgaas, mahesh, ira.weiny,
oohall, Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa, lukas
On 11/17/2024 8:21 PM, Li Ming wrote:
>
> On 2024/11/17 15:45, Li Ming wrote:
>>
>> On 2024/11/14 5:54, Terry Bowman wrote:
>>> The CXL mem driver (cxl_mem) currently maps and caches a pointer to RAS
>>> registers for the endpoint's root port. The same needs to be done for
>>> each of the CXL downstream switch ports and CXL root ports found between
>>> the endpoint and CXL host bridge.
>>>
>>> Introduce cxl_init_ep_ports_aer() to be called for each port in the
>>> sub-topology between the endpoint and the CXL host bridge. This function
>>> will determine if there are CXL downstream switch ports or CXL root ports
>>> associated with this port. The same check will be added in the future for
>>> upstream switch ports.
>>>
>>> Move the RAS register map logic from cxl_dport_map_ras() into
>>> cxl_dport_init_ras_reporting(). This eliminates the need for the helper
>>> function, cxl_dport_map_ras().
>>>
>>> cxl_init_ep_ports_aer() calls cxl_dport_init_ras_reporting() to map
>>> the RAS registers for CXL downstream switch ports and CXL root ports.
>>>
>>> cxl_dport_init_ras_reporting() must check for previously mapped registers
>>> before mapping. This is necessary because endpoints under a CXL switch
>>> may share CXL downstream switch ports or CXL root ports. Ensure the port
>>> registers are only mapped once.
>>>
>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> [snip]
>>
>>> static int devm_cxl_add_endpoint(struct device *host, struct
>>> cxl_memdev *cxlmd,
>>> struct cxl_dport *parent_dport)
>>> {
>>> @@ -62,6 +87,7 @@ static int devm_cxl_add_endpoint(struct device
>>> *host, struct cxl_memdev *cxlmd,
>>> ep = cxl_ep_load(iter, cxlmd);
>>> ep->next = down;
>>> + cxl_init_ep_ports_aer(ep);
>> In RCH case, seems like another issue is here, I believe that a RCD will
>> be added to a CXL root directly rather than a CXL host bridge, it means
>> that no chance to call cxl_init_ep_ports_aer() for a RCD, because this
>> loop is only for a EP attaching to a CXL non-root port.
>>
>> Please correct me if I'm wrong.
>>
> I think above explaination is not clear, what I meant is the hierachy
> in RCH case should be this:
>
> cxl_port(root) <--> cxl_dport(host bridge) <--> cxl_port(RCD endpoint)
>
> RCD endpoint's parent port is a cxl root port, so that the
> cxl_init_ep_ports_aer() cannot be called in that case.
>
> Ming
You make a good point. I will leave the original cxl_dport_init_ras_reporting()
but renamed. And will add a check for if RCH mode before calling it.
Regards,
Terry
>> Ming
>>
>>> }
>>> /* Note: endpoint port component registers are derived from
>>> @cxlds */
>>> @@ -166,8 +192,6 @@ static int cxl_mem_probe(struct device *dev)
>>> else
>>> endpoint_parent = &parent_port->dev;
>>> - cxl_dport_init_ras_reporting(dport, dev);
>>> -
>>> scoped_guard(device, endpoint_parent) {
>>> if (!endpoint_parent->driver) {
>>> dev_err(dev, "CXL port topology %s not enabled\n",
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v3 09/15] cxl/pci: Map CXL PCIe upstream switch port RAS registers
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
` (7 preceding siblings ...)
2024-11-13 21:54 ` [PATCH v3 08/15] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-13 21:54 ` [PATCH v3 10/15] cxl/pci: Update RAS handler interfaces to also support CXL PCIe ports Terry Bowman
` (5 subsequent siblings)
14 siblings, 0 replies; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
Add logic to map CXL PCIe upstream switch port (USP) RAS registers.
Introduce 'struct cxl_regs' member into 'struct cxl_port' to cache a
pointer to the upstream port's mapped RAS registers.
Also, introduce cxl_uport_init_ras_reporting() to perform the USP RAS
register mapping. This is similar to the existing
cxl_dport_init_ras_reporting() but for USP devices.
The upstream port may have multiple downstream endpoints. Before
mapping AER registers check if the registers are already mapped.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
drivers/cxl/core/pci.c | 15 +++++++++++++++
drivers/cxl/cxl.h | 4 ++++
drivers/cxl/mem.c | 8 ++++++++
3 files changed, 27 insertions(+)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 1c6761278579..a9f891e9de8a 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -773,6 +773,21 @@ static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
}
+void cxl_uport_init_ras_reporting(struct cxl_port *port)
+{
+ /* uport may have more than 1 downstream EP. Check if already mapped. */
+ if (port->uport_regs.ras)
+ return;
+
+ port->reg_map.host = &port->dev;
+ if (cxl_map_component_regs(&port->reg_map, &port->uport_regs,
+ BIT(CXL_CM_CAP_CAP_ID_RAS))) {
+ dev_err(&port->dev, "Failed to map RAS capability.\n");
+ return;
+ }
+}
+EXPORT_SYMBOL_NS_GPL(cxl_uport_init_ras_reporting, CXL);
+
/**
* cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
* @dport: the cxl_dport that needs to be initialized
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 51acca3415b4..0cf8d2cfcd8b 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -592,6 +592,7 @@ struct cxl_dax_region {
* @parent_dport: dport that points to this port in the parent
* @decoder_ida: allocator for decoder ids
* @reg_map: component and ras register mapping parameters
+ * @uport_regs: mapped component registers
* @nr_dports: number of entries in @dports
* @hdm_end: track last allocated HDM decoder instance for allocation ordering
* @commit_end: cursor to track highest committed decoder for commit ordering
@@ -612,6 +613,7 @@ struct cxl_port {
struct cxl_dport *parent_dport;
struct ida decoder_ida;
struct cxl_register_map reg_map;
+ struct cxl_component_regs uport_regs;
int nr_dports;
int hdm_end;
int commit_end;
@@ -764,8 +766,10 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
#ifdef CONFIG_PCIEAER_CXL
void cxl_dport_init_ras_reporting(struct cxl_dport *dport);
+void cxl_uport_init_ras_reporting(struct cxl_port *port);
#else
static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport) { }
+static inline void cxl_uport_init_ras_reporting(struct cxl_port *port) { }
#endif
struct cxl_decoder *to_cxl_decoder(struct device *dev);
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 090f0b74526f..160d0b3c3c28 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -60,6 +60,7 @@ static bool dev_is_cxl_pci(struct device *dev, u32 pcie_type)
static void cxl_init_ep_ports_aer(struct cxl_ep *ep)
{
struct cxl_dport *dport = ep->dport;
+ struct cxl_port *port = ep->next;
if (dport) {
struct device *dport_dev = dport->dport_dev;
@@ -68,6 +69,13 @@ static void cxl_init_ep_ports_aer(struct cxl_ep *ep)
dev_is_cxl_pci(dport_dev, PCI_EXP_TYPE_ROOT_PORT))
cxl_dport_init_ras_reporting(dport);
}
+
+ if (port) {
+ struct device *uport_dev = port->uport_dev;
+
+ if (dev_is_cxl_pci(uport_dev, PCI_EXP_TYPE_UPSTREAM))
+ cxl_uport_init_ras_reporting(port);
+ }
}
static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH v3 10/15] cxl/pci: Update RAS handler interfaces to also support CXL PCIe ports
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
` (8 preceding siblings ...)
2024-11-13 21:54 ` [PATCH v3 09/15] cxl/pci: Map CXL PCIe upstream " Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-13 21:54 ` [PATCH v3 11/15] cxl/pci: Change find_cxl_port() to non-static Terry Bowman
` (4 subsequent siblings)
14 siblings, 0 replies; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
CXL PCIe port protocol error handling support will be added to the
CXL drivers in the future. In preparation, rename the existing
interfaces to support handling all CXL PCIe port protocol errors.
The driver's RAS support functions currently rely on a 'struct
cxl_dev_state' type parameter, which is not available for CXL port
devices. However, since the same CXL RAS capability structure is
needed across most CXL components and devices, a common handling
approach should be adopted.
To accommodate this, update the __cxl_handle_cor_ras() and
__cxl_handle_ras() functions to use a `struct device` instead of
`struct cxl_dev_state`.
No functional changes are introduced.
[1] CXL 3.1 Spec, 8.2.4 CXL.cache and CXL.mem Registers
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
drivers/cxl/core/pci.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index a9f891e9de8a..62d29b5fd8f3 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -650,7 +650,7 @@ void read_cdat_data(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL);
-static void __cxl_handle_cor_ras(struct cxl_dev_state *cxlds,
+static void __cxl_handle_cor_ras(struct device *dev,
void __iomem *ras_base)
{
void __iomem *addr;
@@ -663,13 +663,13 @@ static void __cxl_handle_cor_ras(struct cxl_dev_state *cxlds,
status = readl(addr);
if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
- trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
+ trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);
}
}
static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds)
{
- return __cxl_handle_cor_ras(cxlds, cxlds->regs.ras);
+ return __cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->regs.ras);
}
/* CXL spec rev3.0 8.2.4.16.1 */
@@ -693,8 +693,7 @@ static void header_log_copy(void __iomem *ras_base, u32 *log)
* Log the state of the RAS status registers and prepare them to log the
* next error status. Return 1 if reset needed.
*/
-static bool __cxl_handle_ras(struct cxl_dev_state *cxlds,
- void __iomem *ras_base)
+static bool __cxl_handle_ras(struct device *dev, void __iomem *ras_base)
{
u32 hl[CXL_HEADERLOG_SIZE_U32];
void __iomem *addr;
@@ -721,7 +720,7 @@ static bool __cxl_handle_ras(struct cxl_dev_state *cxlds,
}
header_log_copy(ras_base, hl);
- trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe, hl);
+ trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);
writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
return true;
@@ -729,7 +728,7 @@ static bool __cxl_handle_ras(struct cxl_dev_state *cxlds,
static bool cxl_handle_endpoint_ras(struct cxl_dev_state *cxlds)
{
- return __cxl_handle_ras(cxlds, cxlds->regs.ras);
+ return __cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->regs.ras);
}
#ifdef CONFIG_PCIEAER_CXL
@@ -819,13 +818,13 @@ EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, CXL);
static void cxl_handle_rdport_cor_ras(struct cxl_dev_state *cxlds,
struct cxl_dport *dport)
{
- return __cxl_handle_cor_ras(cxlds, dport->regs.ras);
+ return __cxl_handle_cor_ras(&cxlds->cxlmd->dev, dport->regs.ras);
}
static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds,
struct cxl_dport *dport)
{
- return __cxl_handle_ras(cxlds, dport->regs.ras);
+ return __cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras);
}
/*
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH v3 11/15] cxl/pci: Change find_cxl_port() to non-static
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
` (9 preceding siblings ...)
2024-11-13 21:54 ` [PATCH v3 10/15] cxl/pci: Update RAS handler interfaces to also support CXL PCIe ports Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-13 21:54 ` [PATCH v3 12/15] cxl/pci: Add error handler for CXL PCIe port RAS errors Terry Bowman
` (3 subsequent siblings)
14 siblings, 0 replies; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
CXL PCIe port protocol error support will be added in the future. This
requires searching for a CXL PCIe port device in the CXL topology as
provided by find_cxl_port(). But, find_cxl_port() is defined static
and as a result is not callable outside of this source file.
Update the find_cxl_port() declaration to be non-static.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/cxl/core/core.h | 3 +++
drivers/cxl/core/port.c | 4 ++--
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 0c62b4069ba0..d81e5ee25f58 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -110,4 +110,7 @@ bool cxl_need_node_perf_attrs_update(int nid);
int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
struct access_coordinate *c);
+struct cxl_port *find_cxl_port(struct device *dport_dev,
+ struct cxl_dport **dport);
+
#endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index af92c67bc954..1c4daf9fd2f3 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1342,8 +1342,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
return NULL;
}
-static struct cxl_port *find_cxl_port(struct device *dport_dev,
- struct cxl_dport **dport)
+struct cxl_port *find_cxl_port(struct device *dport_dev,
+ struct cxl_dport **dport)
{
struct cxl_find_port_ctx ctx = {
.dport_dev = dport_dev,
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH v3 12/15] cxl/pci: Add error handler for CXL PCIe port RAS errors
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
` (10 preceding siblings ...)
2024-11-13 21:54 ` [PATCH v3 11/15] cxl/pci: Change find_cxl_port() to non-static Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-13 21:54 ` [PATCH v3 13/15] cxl/pci: Add trace logging " Terry Bowman
` (2 subsequent siblings)
14 siblings, 0 replies; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
Introduce correctable and uncorrectable CXL PCIe port protocol error
handlers.
The handlers will be called with a 'struct pci_dev' parameter
indicating the CXL port device requiring handling. The CXL PCIe port
device's underlying 'struct device' will match the port device in the
CXL topology.
Use the PCIe port's device object to find the matching upstream,
downstream, or root port in the CXL topology. The matching device will
contain a reference to the RAS register block used to handle and log
the error.
Invoke the existing __cxl_handle_ras() or __cxl_handle_cor_ras() passing
a reference to the RAS registers as a parameter. These functions will use
the register reference to clear the device's RAS status.
Future patches will assign the error handlers and add trace logging.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
drivers/cxl/core/pci.c | 64 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 64 insertions(+)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 62d29b5fd8f3..2c5cfc506f74 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -772,6 +772,70 @@ static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND);
}
+static int match_uport(struct device *dev, const void *data)
+{
+ struct device *uport_dev = (struct device *)data;
+ struct cxl_port *port;
+
+ if (!is_cxl_port(dev))
+ return 0;
+
+ port = to_cxl_port(dev);
+
+ return port->uport_dev == uport_dev;
+}
+
+static void __iomem *cxl_pci_port_ras(struct pci_dev *pdev)
+{
+ void __iomem *ras_base;
+ struct cxl_port *port;
+
+ if (!pdev)
+ return NULL;
+
+ if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) ||
+ (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM)) {
+ struct cxl_dport *dport;
+
+ port = find_cxl_port(&pdev->dev, &dport);
+ ras_base = dport ? dport->regs.ras : NULL;
+ if (port)
+ put_device(&port->dev);
+ return ras_base;
+ } else if (pci_pcie_type(pdev) == PCI_EXP_TYPE_UPSTREAM) {
+ struct device *port_dev;
+
+ port_dev = bus_find_device(&cxl_bus_type, NULL, &pdev->dev,
+ match_uport);
+ if (!port_dev)
+ return NULL;
+
+ port = to_cxl_port(port_dev);
+ ras_base = port ? port->uport_regs.ras : NULL;
+ put_device(port_dev);
+ return ras_base;
+ }
+
+ return NULL;
+}
+
+static void cxl_port_cor_error_detected(struct pci_dev *pdev)
+{
+ void __iomem *ras_base = cxl_pci_port_ras(pdev);
+
+ __cxl_handle_cor_ras(&pdev->dev, ras_base);
+}
+
+static bool cxl_port_error_detected(struct pci_dev *pdev)
+{
+ void __iomem *ras_base = cxl_pci_port_ras(pdev);
+ bool ue;
+
+ ue = __cxl_handle_ras(&pdev->dev, ras_base);
+
+ return ue;
+}
+
void cxl_uport_init_ras_reporting(struct cxl_port *port)
{
/* uport may have more than 1 downstream EP. Check if already mapped. */
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH v3 13/15] cxl/pci: Add trace logging for CXL PCIe port RAS errors
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
` (11 preceding siblings ...)
2024-11-13 21:54 ` [PATCH v3 12/15] cxl/pci: Add error handler for CXL PCIe port RAS errors Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-13 21:54 ` [PATCH v3 14/15] cxl/pci: Add support to assign and clear pci_driver::cxl_err_handlers Terry Bowman
2024-11-13 21:54 ` [PATCH v3 15/15] PCI/AER: Enable internal errors for CXL upstream and downstream switch ports Terry Bowman
14 siblings, 0 replies; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
The CXL drivers use kernel trace functions for logging endpoint and
RCH downstream port RAS errors. Similar functionality is
required for CXL root ports, CXL downstream switch ports, and CXL
upstream switch ports.
Introduce trace logging functions for both RAS correctable and
uncorrectable errors specific to CXL PCIe ports. Additionally, update
the PCIe port error handlers to invoke these new trace functions.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
drivers/cxl/core/pci.c | 16 ++++++++++----
drivers/cxl/core/trace.h | 47 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 59 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 2c5cfc506f74..794a601fdbf9 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -661,10 +661,14 @@ static void __cxl_handle_cor_ras(struct device *dev,
addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
status = readl(addr);
- if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
- writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
+ if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK))
+ return;
+ writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
+
+ if (is_cxl_memdev(dev))
trace_cxl_aer_correctable_error(to_cxl_memdev(dev), status);
- }
+ else
+ trace_cxl_port_aer_correctable_error(dev, status);
}
static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds)
@@ -720,7 +724,11 @@ static bool __cxl_handle_ras(struct device *dev, void __iomem *ras_base)
}
header_log_copy(ras_base, hl);
- trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);
+ if (is_cxl_memdev(dev))
+ trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl);
+ else
+ trace_cxl_port_aer_uncorrectable_error(dev, status, fe, hl);
+
writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr);
return true;
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 8389a94adb1a..681e415ac8f5 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -48,6 +48,34 @@
{ CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" } \
)
+TRACE_EVENT(cxl_port_aer_uncorrectable_error,
+ TP_PROTO(struct device *dev, u32 status, u32 fe, u32 *hl),
+ TP_ARGS(dev, status, fe, hl),
+ TP_STRUCT__entry(
+ __string(devname, dev_name(dev))
+ __string(host, dev_name(dev->parent))
+ __field(u32, status)
+ __field(u32, first_error)
+ __array(u32, header_log, CXL_HEADERLOG_SIZE_U32)
+ ),
+ TP_fast_assign(
+ __assign_str(devname);
+ __assign_str(host);
+ __entry->status = status;
+ __entry->first_error = fe;
+ /*
+ * Embed the 512B headerlog data for user app retrieval and
+ * parsing, but no need to print this in the trace buffer.
+ */
+ memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);
+ ),
+ TP_printk("device=%s host=%s status: '%s' first_error: '%s'",
+ __get_str(devname), __get_str(host),
+ show_uc_errs(__entry->status),
+ show_uc_errs(__entry->first_error)
+ )
+);
+
TRACE_EVENT(cxl_aer_uncorrectable_error,
TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 *hl),
TP_ARGS(cxlmd, status, fe, hl),
@@ -96,6 +124,25 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
{ CXL_RAS_CE_PHYS_LAYER_ERR, "Received Error From Physical Layer" } \
)
+TRACE_EVENT(cxl_port_aer_correctable_error,
+ TP_PROTO(struct device *dev, u32 status),
+ TP_ARGS(dev, status),
+ TP_STRUCT__entry(
+ __string(devname, dev_name(dev))
+ __string(host, dev_name(dev->parent))
+ __field(u32, status)
+ ),
+ TP_fast_assign(
+ __assign_str(devname);
+ __assign_str(host);
+ __entry->status = status;
+ ),
+ TP_printk("device=%s host=%s status='%s'",
+ __get_str(devname), __get_str(host),
+ show_ce_errs(__entry->status)
+ )
+);
+
TRACE_EVENT(cxl_aer_correctable_error,
TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),
TP_ARGS(cxlmd, status),
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH v3 14/15] cxl/pci: Add support to assign and clear pci_driver::cxl_err_handlers
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
` (12 preceding siblings ...)
2024-11-13 21:54 ` [PATCH v3 13/15] cxl/pci: Add trace logging " Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-13 21:54 ` [PATCH v3 15/15] PCI/AER: Enable internal errors for CXL upstream and downstream switch ports Terry Bowman
14 siblings, 0 replies; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
pci_driver::cxl_err_handlers are not currently assigned handler callbacks.
The handlers can't be set in the pci_driver static definition because the
CXL PCIe port devices are bound to the portdrv driver which is not CXL
driver aware.
Add cxl_assign_port_error_handlers() in the cxl_core module. This
function will assign the default handlers for a CXL PCIe port device.
When the CXL port (cxl_port or cxl_dport) is destroyed the CXL PCIe port
device's pci_driver::cxl_err_handlers must be set to NULL to prevent future
use. Create cxl_clear_port_error_handlers() and register it to be called
when the CXL port device (cxl_port or cxl_dport) is destroyed.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
drivers/cxl/core/pci.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 794a601fdbf9..af2ff6936a09 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -844,8 +844,36 @@ static bool cxl_port_error_detected(struct pci_dev *pdev)
return ue;
}
+static const struct cxl_error_handlers cxl_port_error_handlers = {
+ .error_detected = cxl_port_error_detected,
+ .cor_error_detected = cxl_port_cor_error_detected,
+};
+
+static void cxl_assign_port_error_handlers(struct pci_dev *pdev)
+{
+ struct pci_driver *pdrv = pdev->driver;
+
+ if (!pdrv)
+ return;
+
+ pdrv->cxl_err_handler = &cxl_port_error_handlers;
+}
+
+static void cxl_clear_port_error_handlers(void *data)
+{
+ struct pci_dev *pdev = data;
+ struct pci_driver *pdrv = pdev->driver;
+
+ if (!pdrv)
+ return;
+
+ pdrv->cxl_err_handler = NULL;
+}
+
void cxl_uport_init_ras_reporting(struct cxl_port *port)
{
+ struct pci_dev *pdev = to_pci_dev(port->uport_dev);
+
/* uport may have more than 1 downstream EP. Check if already mapped. */
if (port->uport_regs.ras)
return;
@@ -856,6 +884,9 @@ void cxl_uport_init_ras_reporting(struct cxl_port *port)
dev_err(&port->dev, "Failed to map RAS capability.\n");
return;
}
+
+ cxl_assign_port_error_handlers(pdev);
+ devm_add_action_or_reset(port->uport_dev, cxl_clear_port_error_handlers, pdev);
}
EXPORT_SYMBOL_NS_GPL(cxl_uport_init_ras_reporting, CXL);
@@ -868,6 +899,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport)
{
struct device *dport_dev = dport->dport_dev;
struct pci_host_bridge *host_bridge = to_pci_host_bridge(dport_dev);
+ struct pci_dev *pdev = to_pci_dev(dport_dev);
if (dport->rch && host_bridge->native_aer) {
cxl_dport_map_rch_aer(dport);
@@ -884,6 +916,9 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport)
dev_err(dport_dev, "Failed to map RAS capability.\n");
return;
}
+
+ cxl_assign_port_error_handlers(pdev);
+ devm_add_action_or_reset(dport_dev, cxl_clear_port_error_handlers, pdev);
}
EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, CXL);
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH v3 15/15] PCI/AER: Enable internal errors for CXL upstream and downstream switch ports
2024-11-13 21:54 [PATCH v3 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
` (13 preceding siblings ...)
2024-11-13 21:54 ` [PATCH v3 14/15] cxl/pci: Add support to assign and clear pci_driver::cxl_err_handlers Terry Bowman
@ 2024-11-13 21:54 ` Terry Bowman
2024-11-18 11:54 ` Lukas Wunner
14 siblings, 1 reply; 48+ messages in thread
From: Terry Bowman @ 2024-11-13 21:54 UTC (permalink / raw)
To: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot, terry.bowman,
Smita.KoralahalliChannabasappa, lukas
The AER service driver enables uncorrectable internal errors (UIE) and
correctable internal errors (CIE) for CXL root ports and CXL RCEC's. The
UIE and CIE are used in reporting CXL protocol errors. The same UIE/CIE
enablement is needed for CXL PCIe upstream and downstream ports inorder to
notify the associated root port and OS.[1]
Export the AER service driver's pci_aer_unmask_internal_errors() function
to CXL namsespace.
Remove the function's dependency on the CONFIG_PCIEAER_CXL kernel config
because it is now an exported function.
Call pci_aer_unmask_internal_errors() during RAS initialization in:
cxl_uport_init_ras_reporting() and cxl_dport_init_ras_reporting().
[1] PCIe Base Spec r6.2-1.0, 6.2.3.2.2 Masking Individual Errors
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
drivers/cxl/core/pci.c | 2 ++
drivers/pci/pcie/aer.c | 5 +++--
include/linux/aer.h | 1 +
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index af2ff6936a09..4ede038a7148 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -887,6 +887,7 @@ void cxl_uport_init_ras_reporting(struct cxl_port *port)
cxl_assign_port_error_handlers(pdev);
devm_add_action_or_reset(port->uport_dev, cxl_clear_port_error_handlers, pdev);
+ pci_aer_unmask_internal_errors(pdev);
}
EXPORT_SYMBOL_NS_GPL(cxl_uport_init_ras_reporting, CXL);
@@ -919,6 +920,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport)
cxl_assign_port_error_handlers(pdev);
devm_add_action_or_reset(dport_dev, cxl_clear_port_error_handlers, pdev);
+ pci_aer_unmask_internal_errors(pdev);
}
EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, CXL);
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 87fddd514030..1028814379e4 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -949,7 +949,6 @@ static bool is_internal_error(struct aer_err_info *info)
return info->status & PCI_ERR_UNC_INTN;
}
-#ifdef CONFIG_PCIEAER_CXL
/**
* pci_aer_unmask_internal_errors - unmask internal errors
* @dev: pointer to the pcie_dev data structure
@@ -960,7 +959,7 @@ static bool is_internal_error(struct aer_err_info *info)
* Note: AER must be enabled and supported by the device which must be
* checked in advance, e.g. with pcie_aer_is_native().
*/
-static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
+void pci_aer_unmask_internal_errors(struct pci_dev *dev)
{
int aer = dev->aer_cap;
u32 mask;
@@ -973,7 +972,9 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
mask &= ~PCI_ERR_COR_INTERNAL;
pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
}
+EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, CXL);
+#ifdef CONFIG_PCIEAER_CXL
static bool is_cxl_mem_dev(struct pci_dev *dev)
{
/*
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 4b97f38f3fcf..093293f9f12b 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -55,5 +55,6 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
int cper_severity_to_aer(int cper_severity);
void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
int severity, struct aer_capability_regs *aer_regs);
+void pci_aer_unmask_internal_errors(struct pci_dev *dev);
#endif //_AER_H_
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread* Re: [PATCH v3 15/15] PCI/AER: Enable internal errors for CXL upstream and downstream switch ports
2024-11-13 21:54 ` [PATCH v3 15/15] PCI/AER: Enable internal errors for CXL upstream and downstream switch ports Terry Bowman
@ 2024-11-18 11:54 ` Lukas Wunner
2024-11-21 22:25 ` Bowman, Terry
0 siblings, 1 reply; 48+ messages in thread
From: Lukas Wunner @ 2024-11-18 11:54 UTC (permalink / raw)
To: Terry Bowman
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On Wed, Nov 13, 2024 at 03:54:29PM -0600, Terry Bowman wrote:
> Export the AER service driver's pci_aer_unmask_internal_errors() function
> to CXL namsespace.
^^^^^^^^^^
namespace
> Remove the function's dependency on the CONFIG_PCIEAER_CXL kernel config
> because it is now an exported function.
[...]
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -949,7 +949,6 @@ static bool is_internal_error(struct aer_err_info *info)
> return info->status & PCI_ERR_UNC_INTN;
> }
>
> -#ifdef CONFIG_PCIEAER_CXL
> /**
> * pci_aer_unmask_internal_errors - unmask internal errors
> * @dev: pointer to the pcie_dev data structure
> @@ -960,7 +959,7 @@ static bool is_internal_error(struct aer_err_info *info)
> * Note: AER must be enabled and supported by the device which must be
> * checked in advance, e.g. with pcie_aer_is_native().
> */
> -static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> +void pci_aer_unmask_internal_errors(struct pci_dev *dev)
Hm, it seems the reason why you're moving pci_aer_unmask_internal_errors()
outside of "ifdef CONFIG_PCIEAER_CXL" is that drivers/cxl/core/pci.c
is conditional on CONFIG_CXL_BUS, whereas CONFIG_PCIEAER_CXL depends
on CONFIG_CXL_PCI.
In other words, you need this to avoid build breakage if CONFIG_CXL_BUS
is enabled but CONFIG_CXL_PCI is not.
I'm wondering (as a CXL ignoramus) why that can happen in the first
place, i.e. why is drivers/cxl/core/pci.c compiled at all if
CONFIG_CXL_PCI is disabled?
Thanks,
Lukas
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 15/15] PCI/AER: Enable internal errors for CXL upstream and downstream switch ports
2024-11-18 11:54 ` Lukas Wunner
@ 2024-11-21 22:25 ` Bowman, Terry
2024-11-21 22:32 ` Lukas Wunner
0 siblings, 1 reply; 48+ messages in thread
From: Bowman, Terry @ 2024-11-21 22:25 UTC (permalink / raw)
To: Lukas Wunner
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On 11/18/2024 5:54 AM, Lukas Wunner wrote:
> On Wed, Nov 13, 2024 at 03:54:29PM -0600, Terry Bowman wrote:
>> Export the AER service driver's pci_aer_unmask_internal_errors() function
>> to CXL namsespace.
> ^^^^^^^^^^
> namespace
Yup, thanks.
>> Remove the function's dependency on the CONFIG_PCIEAER_CXL kernel config
>> because it is now an exported function.
> [...]
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -949,7 +949,6 @@ static bool is_internal_error(struct aer_err_info *info)
>> return info->status & PCI_ERR_UNC_INTN;
>> }
>>
>> -#ifdef CONFIG_PCIEAER_CXL
>> /**
>> * pci_aer_unmask_internal_errors - unmask internal errors
>> * @dev: pointer to the pcie_dev data structure
>> @@ -960,7 +959,7 @@ static bool is_internal_error(struct aer_err_info *info)
>> * Note: AER must be enabled and supported by the device which must be
>> * checked in advance, e.g. with pcie_aer_is_native().
>> */
>> -static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
>> +void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> Hm, it seems the reason why you're moving pci_aer_unmask_internal_errors()
> outside of "ifdef CONFIG_PCIEAER_CXL" is that drivers/cxl/core/pci.c
> is conditional on CONFIG_CXL_BUS, whereas CONFIG_PCIEAER_CXL depends
> on CONFIG_CXL_PCI.
>
> In other words, you need this to avoid build breakage if CONFIG_CXL_BUS
> is enabled but CONFIG_CXL_PCI is not.
>
> I'm wondering (as a CXL ignoramus) why that can happen in the first
> place, i.e. why is drivers/cxl/core/pci.c compiled at all if
> CONFIG_CXL_PCI is disabled?
>
> Thanks,
>
> Lukas
I moved the function out of the 'ifdef' block because it would be used in
another subsystem. Bjorn requested in earlier review that functions used across
subsystems should not use ifdef.
The drivers/cxl/Makefile file shows CONFIG_CXL_PCI gates cxl_pci.c build with:
obj-$(CONFIG_CXL_PCI) += cxl_pci.o
BTW, CONFIG_CXL_PCI was added in the commit (68cdd3d2af69) below.
commit 68cdd3d2af6964dae2f8d9b53ee94f740dcbda35
Author: Ben Widawsky <bwidawsk@kernel.org>
Date: Sun Jan 23 16:28:44 2022 -0800
cxl: Rename CXL_MEM to CXL_PCI
The cxl_mem module was renamed cxl_pci in commit 21e9f76733a8 ("cxl:
Rename mem to pci"). In preparation for adding an ancillary driver for
cxl_memdev devices (registered on the cxl bus by cxl_pci), go ahead and
rename CONFIG_CXL_MEM to CONFIG_CXL_PCI. Free up the CXL_MEM name for
that new driver to manage CXL.mem endpoint operations.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164298412409.3018233.12407355692407890752.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Regards,
Terry
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v3 15/15] PCI/AER: Enable internal errors for CXL upstream and downstream switch ports
2024-11-21 22:25 ` Bowman, Terry
@ 2024-11-21 22:32 ` Lukas Wunner
0 siblings, 0 replies; 48+ messages in thread
From: Lukas Wunner @ 2024-11-21 22:32 UTC (permalink / raw)
To: Bowman, Terry
Cc: linux-cxl, linux-kernel, linux-pci, nifan.cxl, ming4.li, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
dan.j.williams, bhelgaas, mahesh, ira.weiny, oohall,
Benjamin.Cheatham, rrichter, nathan.fontenot,
Smita.KoralahalliChannabasappa
On Thu, Nov 21, 2024 at 04:25:31PM -0600, Bowman, Terry wrote:
> On 11/18/2024 5:54 AM, Lukas Wunner wrote:
> > Hm, it seems the reason why you're moving pci_aer_unmask_internal_errors()
> > outside of "ifdef CONFIG_PCIEAER_CXL" is that drivers/cxl/core/pci.c
> > is conditional on CONFIG_CXL_BUS, whereas CONFIG_PCIEAER_CXL depends
> > on CONFIG_CXL_PCI.
> >
> > In other words, you need this to avoid build breakage if CONFIG_CXL_BUS
> > is enabled but CONFIG_CXL_PCI is not.
> >
> > I'm wondering (as a CXL ignoramus) why that can happen in the first
> > place, i.e. why is drivers/cxl/core/pci.c compiled at all if
> > CONFIG_CXL_PCI is disabled?
[...]
> The drivers/cxl/Makefile file shows CONFIG_CXL_PCI gates cxl_pci.c build with:
> obj-$(CONFIG_CXL_PCI) += cxl_pci.o
I wasn't referring to drivers/cxl/pci.c, but drivers/cxl/core/pci.c.
That's gated by CONFIG_CXL_BUS, not CONFIG_CXL_PCI, which seems weird.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 48+ messages in thread