Linux PCI subsystem development
* [PATCH 0/2] PCI, AER, CXL: Fix appropriate _OSC check for CXL RAS Cap
@ 2023-07-19 19:23 Smita Koralahalli
  2023-07-19 19:23 ` [PATCH 1/2] PCI, AER: Export and make pcie_aer_is_native() global Smita Koralahalli
  2023-07-19 19:23 ` [PATCH 2/2] cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers Smita Koralahalli
  0 siblings, 2 replies; 11+ messages in thread
From: Smita Koralahalli @ 2023-07-19 19:23 UTC
  To: linux-pci, linux-kernel, linux-cxl
  Cc: Bjorn Helgaas, oohall, Lukas Wunner, Kuppuswamy Sathyanarayanan,
	Mahesh J Salgaonkar, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams, Jonathan Cameron, Yazen Ghannam,
	Terry Bowman, Robert Richter, Smita Koralahalli

This series corrects which _OSC grant is checked before handling the CXL
RAS capability registers.

The first patch exports pcie_aer_is_native() and moves its declaration to a
common header so that the cxl/pci module can use it.

The second patch switches the check to the AER _OSC.
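
The header move in the first patch relies on the usual kernel pattern of
pairing a real declaration with a static inline stub, so callers build the
same way whether or not AER support is configured. Below is a minimal
userspace sketch of that pattern. DEMO_CONFIG_PCIEAER, struct demo_pci_dev
and demo_aer_is_native() are hypothetical stand-ins rather than kernel
definitions, and the sketch leaves out the pcie_ports_native override.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pci_dev; only the fields used here. */
struct demo_pci_dev {
	unsigned short aer_cap;	/* offset of the AER capability, 0 if absent */
	bool host_native_aer;	/* stand-in for host->native_aer */
};

/* Define DEMO_CONFIG_PCIEAER to model a build with CONFIG_PCIEAER=y. */
#ifdef DEMO_CONFIG_PCIEAER
static int demo_aer_is_native(const struct demo_pci_dev *dev)
{
	if (!dev->aer_cap)
		return 0;
	return dev->host_native_aer;
}
#else
/* Stub: without AER support the OS never reports native AER ownership. */
static inline int demo_aer_is_native(const struct demo_pci_dev *dev)
{
	(void)dev;
	return 0;
}
#endif

/* A caller that compiles unchanged under either configuration. */
static void demo_check(void)
{
	struct demo_pci_dev d = { .aer_cap = 0x100, .host_native_aer = true };

#ifdef DEMO_CONFIG_PCIEAER
	assert(demo_aer_is_native(&d) == 1);
#else
	assert(demo_aer_is_native(&d) == 0);
#endif
}
```

The point of the stub is that a caller such as cxl/pci needs no #ifdef of
its own around the call.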

Smita Koralahalli (2):
  PCI, AER: Export and make pcie_aer_is_native() global
  cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS
    registers

 drivers/cxl/pci.c          | 7 +++----
 drivers/pci/pcie/aer.c     | 1 +
 drivers/pci/pcie/portdrv.h | 2 --
 include/linux/aer.h        | 2 ++
 4 files changed, 6 insertions(+), 6 deletions(-)

-- 
2.17.1



* [PATCH 1/2] PCI, AER: Export and make pcie_aer_is_native() global
  2023-07-19 19:23 [PATCH 0/2] PCI, AER, CXL: Fix appropriate _OSC check for CXL RAS Cap Smita Koralahalli
@ 2023-07-19 19:23 ` Smita Koralahalli
  2023-07-19 20:36   ` Bjorn Helgaas
  2023-07-19 20:40   ` Sathyanarayanan Kuppuswamy
  2023-07-19 19:23 ` [PATCH 2/2] cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers Smita Koralahalli
  1 sibling, 2 replies; 11+ messages in thread
From: Smita Koralahalli @ 2023-07-19 19:23 UTC
  To: linux-pci, linux-kernel, linux-cxl
  Cc: Bjorn Helgaas, oohall, Lukas Wunner, Kuppuswamy Sathyanarayanan,
	Mahesh J Salgaonkar, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams, Jonathan Cameron, Yazen Ghannam,
	Terry Bowman, Robert Richter, Smita Koralahalli

Export pcie_aer_is_native() and move its declaration to a common header
file so that the cxl/pci module can reuse it.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/pci/pcie/aer.c     | 1 +
 drivers/pci/pcie/portdrv.h | 2 --
 include/linux/aer.h        | 2 ++
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index f6c24ded134c..87d90dbda023 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -229,6 +229,7 @@ int pcie_aer_is_native(struct pci_dev *dev)
 
 	return pcie_ports_native || host->native_aer;
 }
+EXPORT_SYMBOL_GPL(pcie_aer_is_native);
 
 int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 58a2b1a1cae4..1f3803bde7ee 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -29,10 +29,8 @@ extern bool pcie_ports_dpc_native;
 
 #ifdef CONFIG_PCIEAER
 int pcie_aer_init(void);
-int pcie_aer_is_native(struct pci_dev *dev);
 #else
 static inline int pcie_aer_init(void) { return 0; }
-static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
 #endif
 
 #ifdef CONFIG_HOTPLUG_PCI_PCIE
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 3a3ab05e13fd..94ce49a5f8d5 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -45,6 +45,7 @@ struct aer_capability_regs {
 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
+int pcie_aer_is_native(struct pci_dev *dev);
 #else
 static inline int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
@@ -58,6 +59,7 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
 	return -EINVAL;
 }
+static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
 #endif
 
 void cper_print_aer(struct pci_dev *dev, int aer_severity,
-- 
2.17.1



* [PATCH 2/2] cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers
  2023-07-19 19:23 [PATCH 0/2] PCI, AER, CXL: Fix appropriate _OSC check for CXL RAS Cap Smita Koralahalli
  2023-07-19 19:23 ` [PATCH 1/2] PCI, AER: Export and make pcie_aer_is_native() global Smita Koralahalli
@ 2023-07-19 19:23 ` Smita Koralahalli
  2023-07-19 20:39   ` Sathyanarayanan Kuppuswamy
  1 sibling, 1 reply; 11+ messages in thread
From: Smita Koralahalli @ 2023-07-19 19:23 UTC
  To: linux-pci, linux-kernel, linux-cxl
  Cc: Bjorn Helgaas, oohall, Lukas Wunner, Kuppuswamy Sathyanarayanan,
	Mahesh J Salgaonkar, Alison Schofield, Vishal Verma, Ira Weiny,
	Ben Widawsky, Dan Williams, Jonathan Cameron, Yazen Ghannam,
	Terry Bowman, Robert Richter, Smita Koralahalli

According to Section 9.17.2, Table 9-26 of the CXL Specification [1], the
owner of AER should also own CXL Protocol Error Management, as there is no
explicit control of CXL Protocol errors. Handling of the CXL RAS Cap
registers, which report Protocol errors, should therefore check the AER
_OSC rather than the CXL Memory Error Reporting Control _OSC.

The CXL Memory Error Reporting Control _OSC specifically covers Memory
Error Logging and Signaling Enhancements. These kinds of errors are
reported through a device's mailbox and can be managed independently from
CXL Protocol Errors.
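
The ownership split described above can be written down as a small model.
This is only a sketch: struct demo_osc_grants and both helpers are
hypothetical names for illustration, with the two fields named after the
native_aer and native_cxl_error host bridge flags that appear in the diffs
of this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two _OSC grants discussed above. */
struct demo_osc_grants {
	bool native_aer;	/* OS was granted PCIe AER control */
	bool native_cxl_error;	/* OS was granted CXL Memory Error Reporting control */
};

/* Protocol errors (CXL RAS capability) follow AER ownership. */
static bool demo_os_owns_protocol_errors(const struct demo_osc_grants *g)
{
	return g->native_aer;
}

/* Memory errors reported through the device mailbox follow the CXL grant. */
static bool demo_os_owns_memory_errors(const struct demo_osc_grants *g)
{
	return g->native_cxl_error;
}

/* Firmware keeps memory error reporting, OS owns AER: the two are independent. */
static void demo_check(void)
{
	struct demo_osc_grants g = { .native_aer = true, .native_cxl_error = false };

	assert(demo_os_owns_protocol_errors(&g));
	assert(!demo_os_owns_memory_errors(&g));
}
```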

[1] Compute Express Link (CXL) Specification, Revision 3.1, Aug 1 2022.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/cxl/pci.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 1cb1494c28fe..44a21ab7add5 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -529,7 +529,6 @@ static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
 
 static int cxl_pci_ras_unmask(struct pci_dev *pdev)
 {
-	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
 	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
 	void __iomem *addr;
 	u32 orig_val, val, mask;
@@ -541,9 +540,9 @@ static int cxl_pci_ras_unmask(struct pci_dev *pdev)
 		return 0;
 	}
 
-	/* BIOS has CXL error control */
-	if (!host_bridge->native_cxl_error)
-		return -ENXIO;
+	/* BIOS has PCIe AER error control */
+	if (!pcie_aer_is_native(pdev))
+		return 0;
 
 	rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap);
 	if (rc)
-- 
2.17.1



* Re: [PATCH 1/2] PCI, AER: Export and make pcie_aer_is_native() global
  2023-07-19 19:23 ` [PATCH 1/2] PCI, AER: Export and make pcie_aer_is_native() global Smita Koralahalli
@ 2023-07-19 20:36   ` Bjorn Helgaas
  2023-07-19 22:06     ` Smita Koralahalli
  2023-07-19 20:40   ` Sathyanarayanan Kuppuswamy
  1 sibling, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2023-07-19 20:36 UTC
  To: Smita Koralahalli
  Cc: linux-pci, linux-kernel, linux-cxl, Bjorn Helgaas, oohall,
	Lukas Wunner, Kuppuswamy Sathyanarayanan, Mahesh J Salgaonkar,
	Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
	Robert Richter

On Wed, Jul 19, 2023 at 07:23:12PM +0000, Smita Koralahalli wrote:
> Export and move the declaration of pcie_aer_is_native() to a common header
> file to be reused by cxl/pci module.

Run "git log --oneline drivers/pci/pcie/aer.c" and format your subject
line to match.

"Exporting" pretty much means making it global, so "Export
pcie_aer_is_native()" is probably enough.

> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

With the above,

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  drivers/pci/pcie/aer.c     | 1 +
>  drivers/pci/pcie/portdrv.h | 2 --
>  include/linux/aer.h        | 2 ++
>  3 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f6c24ded134c..87d90dbda023 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -229,6 +229,7 @@ int pcie_aer_is_native(struct pci_dev *dev)
>  
>  	return pcie_ports_native || host->native_aer;
>  }
> +EXPORT_SYMBOL_GPL(pcie_aer_is_native);
>  
>  int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>  {
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index 58a2b1a1cae4..1f3803bde7ee 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -29,10 +29,8 @@ extern bool pcie_ports_dpc_native;
>  
>  #ifdef CONFIG_PCIEAER
>  int pcie_aer_init(void);
> -int pcie_aer_is_native(struct pci_dev *dev);
>  #else
>  static inline int pcie_aer_init(void) { return 0; }
> -static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>  #endif
>  
>  #ifdef CONFIG_HOTPLUG_PCI_PCIE
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 3a3ab05e13fd..94ce49a5f8d5 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -45,6 +45,7 @@ struct aer_capability_regs {
>  int pci_enable_pcie_error_reporting(struct pci_dev *dev);
>  int pci_disable_pcie_error_reporting(struct pci_dev *dev);
>  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> +int pcie_aer_is_native(struct pci_dev *dev);
>  #else
>  static inline int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>  {
> @@ -58,6 +59,7 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>  {
>  	return -EINVAL;
>  }
> +static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>  #endif
>  
>  void cper_print_aer(struct pci_dev *dev, int aer_severity,
> -- 
> 2.17.1
> 


* Re: [PATCH 2/2] cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers
  2023-07-19 19:23 ` [PATCH 2/2] cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers Smita Koralahalli
@ 2023-07-19 20:39   ` Sathyanarayanan Kuppuswamy
  2023-07-19 22:30     ` Smita Koralahalli
  0 siblings, 1 reply; 11+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2023-07-19 20:39 UTC
  To: Smita Koralahalli, linux-pci, linux-kernel, linux-cxl
  Cc: Bjorn Helgaas, oohall, Lukas Wunner, Mahesh J Salgaonkar,
	Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
	Robert Richter



On 7/19/23 12:23 PM, Smita Koralahalli wrote:
> According to Section 9.17.2, Table 9-26 of CXL Specification [1], owner
> of AER should also own CXL Protocol Error Management as there is no
> explicit control of CXL Protocol error. And the CXL RAS Cap registers
> reported on Protocol errors should check for AER _OSC rather than CXL
> Memory Error Reporting Control _OSC.
> 
> The CXL Memory Error Reporting Control _OSC specifically highlights
> handling Memory Error Logging and Signaling Enhancements. These kinds of
> errors are reported through a device's mailbox and can be managed
> independently from CXL Protocol Errors.

Does this fix an issue? If so, please describe it in the commit log.

Since you are removing an earlier change, maybe it needs a Fixes: tag?
> 
> [1] Compute Express Link (CXL) Specification, Revision 3.1, Aug 1 2022.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
>  drivers/cxl/pci.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 1cb1494c28fe..44a21ab7add5 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -529,7 +529,6 @@ static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
>  
>  static int cxl_pci_ras_unmask(struct pci_dev *pdev)
>  {
> -	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
>  	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>  	void __iomem *addr;
>  	u32 orig_val, val, mask;
> @@ -541,9 +540,9 @@ static int cxl_pci_ras_unmask(struct pci_dev *pdev)
>  		return 0;
>  	}
>  
> -	/* BIOS has CXL error control */
> -	if (!host_bridge->native_cxl_error)
> -		return -ENXIO;
> +	/* BIOS has PCIe AER error control */
> +	if (!pcie_aer_is_native(pdev))
> +		return 0;
>  
>  	rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap);
>  	if (rc)

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


* Re: [PATCH 1/2] PCI, AER: Export and make pcie_aer_is_native() global
  2023-07-19 19:23 ` [PATCH 1/2] PCI, AER: Export and make pcie_aer_is_native() global Smita Koralahalli
  2023-07-19 20:36   ` Bjorn Helgaas
@ 2023-07-19 20:40   ` Sathyanarayanan Kuppuswamy
  1 sibling, 0 replies; 11+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2023-07-19 20:40 UTC
  To: Smita Koralahalli, linux-pci, linux-kernel, linux-cxl
  Cc: Bjorn Helgaas, oohall, Lukas Wunner, Mahesh J Salgaonkar,
	Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
	Robert Richter



On 7/19/23 12:23 PM, Smita Koralahalli wrote:
> Export and move the declaration of pcie_aer_is_native() to a common header
> file to be reused by cxl/pci module.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

> ---
>  drivers/pci/pcie/aer.c     | 1 +
>  drivers/pci/pcie/portdrv.h | 2 --
>  include/linux/aer.h        | 2 ++
>  3 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f6c24ded134c..87d90dbda023 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -229,6 +229,7 @@ int pcie_aer_is_native(struct pci_dev *dev)
>  
>  	return pcie_ports_native || host->native_aer;
>  }
> +EXPORT_SYMBOL_GPL(pcie_aer_is_native);
>  
>  int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>  {
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index 58a2b1a1cae4..1f3803bde7ee 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -29,10 +29,8 @@ extern bool pcie_ports_dpc_native;
>  
>  #ifdef CONFIG_PCIEAER
>  int pcie_aer_init(void);
> -int pcie_aer_is_native(struct pci_dev *dev);
>  #else
>  static inline int pcie_aer_init(void) { return 0; }
> -static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>  #endif
>  
>  #ifdef CONFIG_HOTPLUG_PCI_PCIE
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 3a3ab05e13fd..94ce49a5f8d5 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -45,6 +45,7 @@ struct aer_capability_regs {
>  int pci_enable_pcie_error_reporting(struct pci_dev *dev);
>  int pci_disable_pcie_error_reporting(struct pci_dev *dev);
>  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> +int pcie_aer_is_native(struct pci_dev *dev);
>  #else
>  static inline int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>  {
> @@ -58,6 +59,7 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>  {
>  	return -EINVAL;
>  }
> +static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>  #endif
>  
>  void cper_print_aer(struct pci_dev *dev, int aer_severity,

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


* Re: [PATCH 1/2] PCI, AER: Export and make pcie_aer_is_native() global
  2023-07-19 20:36   ` Bjorn Helgaas
@ 2023-07-19 22:06     ` Smita Koralahalli
  0 siblings, 0 replies; 11+ messages in thread
From: Smita Koralahalli @ 2023-07-19 22:06 UTC
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, linux-cxl, Bjorn Helgaas, oohall,
	Lukas Wunner, Kuppuswamy Sathyanarayanan, Mahesh J Salgaonkar,
	Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
	Robert Richter

On 7/19/2023 1:36 PM, Bjorn Helgaas wrote:
> On Wed, Jul 19, 2023 at 07:23:12PM +0000, Smita Koralahalli wrote:
>> Export and move the declaration of pcie_aer_is_native() to a common header
>> file to be reused by cxl/pci module.
> 
> Run "git log --oneline drivers/pci/pcie/aer.c" and format your subject
> line to match.
> 
> "Exporting" pretty much means making it global, so "Export
> pcie_aer_is_native()" is probably enough.
> 
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> 
> With the above,

Will make the above two changes in v2. Thanks for the review!

Thanks,
Smita

> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> 
>> ---
>>   drivers/pci/pcie/aer.c     | 1 +
>>   drivers/pci/pcie/portdrv.h | 2 --
>>   include/linux/aer.h        | 2 ++
>>   3 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index f6c24ded134c..87d90dbda023 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -229,6 +229,7 @@ int pcie_aer_is_native(struct pci_dev *dev)
>>   
>>   	return pcie_ports_native || host->native_aer;
>>   }
>> +EXPORT_SYMBOL_GPL(pcie_aer_is_native);
>>   
>>   int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>>   {
>> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
>> index 58a2b1a1cae4..1f3803bde7ee 100644
>> --- a/drivers/pci/pcie/portdrv.h
>> +++ b/drivers/pci/pcie/portdrv.h
>> @@ -29,10 +29,8 @@ extern bool pcie_ports_dpc_native;
>>   
>>   #ifdef CONFIG_PCIEAER
>>   int pcie_aer_init(void);
>> -int pcie_aer_is_native(struct pci_dev *dev);
>>   #else
>>   static inline int pcie_aer_init(void) { return 0; }
>> -static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>>   #endif
>>   
>>   #ifdef CONFIG_HOTPLUG_PCI_PCIE
>> diff --git a/include/linux/aer.h b/include/linux/aer.h
>> index 3a3ab05e13fd..94ce49a5f8d5 100644
>> --- a/include/linux/aer.h
>> +++ b/include/linux/aer.h
>> @@ -45,6 +45,7 @@ struct aer_capability_regs {
>>   int pci_enable_pcie_error_reporting(struct pci_dev *dev);
>>   int pci_disable_pcie_error_reporting(struct pci_dev *dev);
>>   int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>> +int pcie_aer_is_native(struct pci_dev *dev);
>>   #else
>>   static inline int pci_enable_pcie_error_reporting(struct pci_dev *dev)
>>   {
>> @@ -58,6 +59,7 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>   {
>>   	return -EINVAL;
>>   }
>> +static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>>   #endif
>>   
>>   void cper_print_aer(struct pci_dev *dev, int aer_severity,
>> -- 
>> 2.17.1
>>



* Re: [PATCH 2/2] cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers
  2023-07-19 20:39   ` Sathyanarayanan Kuppuswamy
@ 2023-07-19 22:30     ` Smita Koralahalli
  2023-07-20 13:07       ` Robert Richter
  0 siblings, 1 reply; 11+ messages in thread
From: Smita Koralahalli @ 2023-07-19 22:30 UTC
  To: Sathyanarayanan Kuppuswamy, linux-pci, linux-kernel, linux-cxl
  Cc: Bjorn Helgaas, oohall, Lukas Wunner, Mahesh J Salgaonkar,
	Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
	Robert Richter

On 7/19/2023 1:39 PM, Sathyanarayanan Kuppuswamy wrote:
> 
> 
> On 7/19/23 12:23 PM, Smita Koralahalli wrote:
>> According to Section 9.17.2, Table 9-26 of CXL Specification [1], owner
>> of AER should also own CXL Protocol Error Management as there is no
>> explicit control of CXL Protocol error. And the CXL RAS Cap registers
>> reported on Protocol errors should check for AER _OSC rather than CXL
>> Memory Error Reporting Control _OSC.
>>
>> The CXL Memory Error Reporting Control _OSC specifically highlights
>> handling Memory Error Logging and Signaling Enhancements. These kinds of
>> errors are reported through a device's mailbox and can be managed
>> independently from CXL Protocol Errors.
> 
> Does it fix any issue? If yes, please include that in the commit log.

Yes, this fix makes Protocol Error handling independent of
Component/Memory Error handling.

We observed that the OS was not able to handle protocol errors (i.e., it
was unable to reference the CXL device node) even with native AER support,
because Memory/Component Error handling was under FW control.

Since the RAS registers are tied to protocol errors, I see no reason why
memory error reporting being under FW or OS control should be a roadblock
to the OS handling the RAS registers or accessing the CXL device node.
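
To make the behaviour change concrete, here is a simplified userspace
sketch of the gate before and after the patch. struct demo_host and both
functions are hypothetical. The real new check calls
pcie_aer_is_native(pdev), which is reduced here to the host bridge's
native_aer flag, and the out-parameter only exists to show whether the
unmask step would be reached.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two host bridge flags involved. */
struct demo_host {
	bool native_aer;
	bool native_cxl_error;
};

/* Old gate: keyed on the CXL memory error grant, and failing with -ENXIO. */
static int demo_ras_unmask_old(const struct demo_host *h, bool *unmasked)
{
	*unmasked = false;
	if (!h->native_cxl_error)
		return -ENXIO;
	*unmasked = true;
	return 0;
}

/* New gate: keyed on AER ownership, and skipping quietly when FW owns AER. */
static int demo_ras_unmask_new(const struct demo_host *h, bool *unmasked)
{
	*unmasked = false;
	if (!h->native_aer)
		return 0;
	*unmasked = true;
	return 0;
}

/* The reported case: native AER, memory error reporting under FW control. */
static void demo_check(void)
{
	struct demo_host h = { .native_aer = true, .native_cxl_error = false };
	bool unmasked;

	assert(demo_ras_unmask_old(&h, &unmasked) == -ENXIO && !unmasked);
	assert(demo_ras_unmask_new(&h, &unmasked) == 0 && unmasked);
}
```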

> 
> Since you are removing some change, maybe it needs Fixes: tag?

Missed this. Thanks!

Fixes: 248529edc86f ("cxl: add RAS status unmasking for CXL")

Will include in v2.

Thanks,
Smita

>>
>> [1] Compute Express Link (CXL) Specification, Revision 3.1, Aug 1 2022.
>>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>>   drivers/cxl/pci.c | 7 +++----
>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 1cb1494c28fe..44a21ab7add5 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -529,7 +529,6 @@ static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
>>   
>>   static int cxl_pci_ras_unmask(struct pci_dev *pdev)
>>   {
>> -	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
>>   	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>>   	void __iomem *addr;
>>   	u32 orig_val, val, mask;
>> @@ -541,9 +540,9 @@ static int cxl_pci_ras_unmask(struct pci_dev *pdev)
>>   		return 0;
>>   	}
>>   
>> -	/* BIOS has CXL error control */
>> -	if (!host_bridge->native_cxl_error)
>> -		return -ENXIO;
>> +	/* BIOS has PCIe AER error control */
>> +	if (!pcie_aer_is_native(pdev))
>> +		return 0;
>>   
>>   	rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap);
>>   	if (rc)
> 



* Re: [PATCH 2/2] cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers
  2023-07-19 22:30     ` Smita Koralahalli
@ 2023-07-20 13:07       ` Robert Richter
  2023-07-20 18:31         ` Smita Koralahalli
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Richter @ 2023-07-20 13:07 UTC
  To: Smita Koralahalli
  Cc: Sathyanarayanan Kuppuswamy, linux-pci, linux-kernel, linux-cxl,
	Bjorn Helgaas, oohall, Lukas Wunner, Mahesh J Salgaonkar,
	Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman

Smita,

On 19.07.23 15:30:25, Smita Koralahalli wrote:
> On 7/19/2023 1:39 PM, Sathyanarayanan Kuppuswamy wrote:
> > 
> > 
> > On 7/19/23 12:23 PM, Smita Koralahalli wrote:
> > > According to Section 9.17.2, Table 9-26 of CXL Specification [1], owner
> > > of AER should also own CXL Protocol Error Management as there is no
> > > explicit control of CXL Protocol error. And the CXL RAS Cap registers
> > > reported on Protocol errors should check for AER _OSC rather than CXL
> > > Memory Error Reporting Control _OSC.
> > > 
> > > The CXL Memory Error Reporting Control _OSC specifically highlights
> > > handling Memory Error Logging and Signaling Enhancements. These kinds of
> > > errors are reported through a device's mailbox and can be managed
> > > independently from CXL Protocol Errors.
> > 
> > Does it fix any issue? If yes, please include that in the commit log.
> 
> Yes, this fix actually makes Protocol Error handling independent of
> Component/Memory Error handling.
> 
> We observed that OS was not able to handle the protocol errors ("i.e unable
> to reference to the cxl device node") with native AER support. The reason
> being Memory/Component Error handling was under FW control.
> 
> Since the RAS registers are tied to protocol errors, I think there is no
> reason that memory error reporting being in fw control or os control should
> be a roadblock in handling RAS registers or accessing the cxl device node by
> OS.
> 
> > 
> > Since you are removing some change, maybe it needs Fixes: tag?
> 
> Missed this. Thanks!
> 
> Fixes: 248529edc86f ("cxl: add RAS status unmasking for CXL")

The fix must be isolated to this patch (for automated backports), so
you need to remove its dependency on the first patch. Swap them
and ... see below.

> 
> Will include in v2.
> 
> Thanks,
> Smita
> 
> > > 
> > > [1] Compute Express Link (CXL) Specification, Revision 3.1, Aug 1 2022.
> > > 
> > > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> > > ---
> > >   drivers/cxl/pci.c | 7 +++----
> > >   1 file changed, 3 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > index 1cb1494c28fe..44a21ab7add5 100644
> > > --- a/drivers/cxl/pci.c
> > > +++ b/drivers/cxl/pci.c
> > > @@ -529,7 +529,6 @@ static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
> > >   static int cxl_pci_ras_unmask(struct pci_dev *pdev)
> > >   {
> > > -	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
> > >   	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> > >   	void __iomem *addr;
> > >   	u32 orig_val, val, mask;
> > > @@ -541,9 +540,9 @@ static int cxl_pci_ras_unmask(struct pci_dev *pdev)
> > >   		return 0;
> > >   	}
> > > -	/* BIOS has CXL error control */
> > > -	if (!host_bridge->native_cxl_error)

For the fix, you could replace that with:

	if (!host_bridge->native_aer) ...

> > > -		return -ENXIO;
> > > +	/* BIOS has PCIe AER error control */
> > > +	if (!pcie_aer_is_native(pdev))
> > > +		return 0;

... and replace it with this function here in the patch where
pcie_aer_is_native() is exported (or in a 3rd patch).

-Robert

> > >   	rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap);
> > >   	if (rc)
> > 
> 


* Re: [PATCH 2/2] cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers
  2023-07-20 13:07       ` Robert Richter
@ 2023-07-20 18:31         ` Smita Koralahalli
  2023-07-21 13:49           ` Robert Richter
  0 siblings, 1 reply; 11+ messages in thread
From: Smita Koralahalli @ 2023-07-20 18:31 UTC
  To: Robert Richter
  Cc: Sathyanarayanan Kuppuswamy, linux-pci, linux-kernel, linux-cxl,
	Bjorn Helgaas, oohall, Lukas Wunner, Mahesh J Salgaonkar,
	Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman

On 7/20/2023 6:07 AM, Robert Richter wrote:
> Smita,
> 
> On 19.07.23 15:30:25, Smita Koralahalli wrote:
>> On 7/19/2023 1:39 PM, Sathyanarayanan Kuppuswamy wrote:
>>>
>>>
>>> On 7/19/23 12:23 PM, Smita Koralahalli wrote:
>>>> According to Section 9.17.2, Table 9-26 of CXL Specification [1], owner
>>>> of AER should also own CXL Protocol Error Management as there is no
>>>> explicit control of CXL Protocol error. And the CXL RAS Cap registers
>>>> reported on Protocol errors should check for AER _OSC rather than CXL
>>>> Memory Error Reporting Control _OSC.
>>>>
>>>> The CXL Memory Error Reporting Control _OSC specifically highlights
>>>> handling Memory Error Logging and Signaling Enhancements. These kinds of
>>>> errors are reported through a device's mailbox and can be managed
>>>> independently from CXL Protocol Errors.
>>>
>>> Does it fix any issue? If yes, please include that in the commit log.
>>
>> Yes, this fix actually makes Protocol Error handling independent of
>> Component/Memory Error handling.
>>
>> We observed that OS was not able to handle the protocol errors ("i.e unable
>> to reference to the cxl device node") with native AER support. The reason
>> being Memory/Component Error handling was under FW control.
>>
>> Since the RAS registers are tied to protocol errors, I think there is no
>> reason that memory error reporting being in fw control or os control should
>> be a roadblock in handling RAS registers or accessing the cxl device node by
>> OS.
>>
>>>
>>> Since you are removing some change, maybe it needs Fixes: tag?
>>
>> Missed this. Thanks!
>>
>> Fixes: 248529edc86f ("cxl: add RAS status unmasking for CXL")
> 
> the fix must be isolated to this patch (for automated backports) and
> you need to remove the dependency to the first patch then. So swap
> them and ... see below.
> 
>>
>> Will include in v2.
>>
>> Thanks,
>> Smita
>>
>>>>
>>>> [1] Compute Express Link (CXL) Specification, Revision 3.1, Aug 1 2022.
>>>>
>>>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>>>> ---
>>>>    drivers/cxl/pci.c | 7 +++----
>>>>    1 file changed, 3 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>>>> index 1cb1494c28fe..44a21ab7add5 100644
>>>> --- a/drivers/cxl/pci.c
>>>> +++ b/drivers/cxl/pci.c
>>>> @@ -529,7 +529,6 @@ static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
>>>>    static int cxl_pci_ras_unmask(struct pci_dev *pdev)
>>>>    {
>>>> -	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
>>>>    	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>>>>    	void __iomem *addr;
>>>>    	u32 orig_val, val, mask;
>>>> @@ -541,9 +540,9 @@ static int cxl_pci_ras_unmask(struct pci_dev *pdev)
>>>>    		return 0;
>>>>    	}
>>>> -	/* BIOS has CXL error control */
>>>> -	if (!host_bridge->native_cxl_error)
> 
> For the fix, you could replace that with:
> 
> 	if (!host_bridge->native_aer) ...

Yeah, I tried something like:
+	if (!pdev->aer_cap ||
+	    !(pcie_ports_native || host_bridge->native_aer))
+		return 0;

But then pcie_ports_native would need to be exported as well. So would it
be better to keep the check as !host_bridge->native_aer and return zero in
the first patch, add the EXPORT in the second, and replace
host_bridge->native_aer with pcie_aer_is_native() in the third?
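
For reference, an open-coded equivalent of !pcie_aer_is_native() can be
checked exhaustively in userspace. The sketch below assumes that
pcie_aer_is_native() returns 0 when dev->aer_cap is unset before evaluating
"pcie_ports_native || host->native_aer"; only the final return statement is
visible in the diff context of this thread. All demo_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Models pcie_aer_is_native() under the assumption stated above. */
static bool demo_is_native(bool aer_cap, bool ports_native, bool native_aer)
{
	if (!aer_cap)
		return false;
	return ports_native || native_aer;
}

/* Open-coded "not native" test; by De Morgan the two terms are joined by ||. */
static bool demo_not_native(bool aer_cap, bool ports_native, bool native_aer)
{
	return !aer_cap || !(ports_native || native_aer);
}

/* Returns how many of the 8 input combinations make the two forms disagree. */
static int demo_count_mismatches(void)
{
	int mismatches = 0;

	for (int i = 0; i < 8; i++) {
		bool a = i & 1, p = i & 2, n = i & 4;

		if (demo_not_native(a, p, n) != !demo_is_native(a, p, n))
			mismatches++;
	}
	return mismatches;
}

static void demo_check(void)
{
	assert(demo_count_mismatches() == 0);
}
```

Joining the two terms with && instead would only skip the unmask when the
device has no AER capability and AER is not native, which is a different
condition.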

Thanks,
Smita

> 
>>>> -		return -ENXIO;
>>>> +	/* BIOS has PCIe AER error control */
>>>> +	if (!pcie_aer_is_native(pdev))
>>>> +		return 0;
> 
> ... and replace it with this function here in the patch where
> pcie_aer_is_native() is exported (or in a 3rd patch).
> 
> -Robert
> 
>>>>    	rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap);
>>>>    	if (rc)
>>>
>>



* Re: [PATCH 2/2] cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers
  2023-07-20 18:31         ` Smita Koralahalli
@ 2023-07-21 13:49           ` Robert Richter
  0 siblings, 0 replies; 11+ messages in thread
From: Robert Richter @ 2023-07-21 13:49 UTC
  To: Smita Koralahalli
  Cc: Sathyanarayanan Kuppuswamy, linux-pci, linux-kernel, linux-cxl,
	Bjorn Helgaas, oohall, Lukas Wunner, Mahesh J Salgaonkar,
	Alison Schofield, Vishal Verma, Ira Weiny, Ben Widawsky,
	Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman

On 20.07.23 11:31:15, Smita Koralahalli wrote:
> On 7/20/2023 6:07 AM, Robert Richter wrote:
> > Smita,
> > 
> > On 19.07.23 15:30:25, Smita Koralahalli wrote:
> > > On 7/19/2023 1:39 PM, Sathyanarayanan Kuppuswamy wrote:
> > > > 
> > > > 
> > > > On 7/19/23 12:23 PM, Smita Koralahalli wrote:
> > > > > According to Section 9.17.2, Table 9-26 of CXL Specification [1], owner
> > > > > of AER should also own CXL Protocol Error Management as there is no
> > > > > explicit control of CXL Protocol error. And the CXL RAS Cap registers
> > > > > reported on Protocol errors should check for AER _OSC rather than CXL
> > > > > Memory Error Reporting Control _OSC.
> > > > > 
> > > > > The CXL Memory Error Reporting Control _OSC specifically highlights
> > > > > handling Memory Error Logging and Signaling Enhancements. These kinds of
> > > > > errors are reported through a device's mailbox and can be managed
> > > > > independently from CXL Protocol Errors.
> > > > 
> > > > Does it fix any issue? If yes, please include that in the commit log.
> > > 
> > > Yes, this fix actually makes Protocol Error handling independent of
> > > Component/Memory Error handling.
> > > 
> > > > We observed that the OS was not able to handle protocol errors (i.e., it
> > > > was unable to reference the cxl device node) with native AER support,
> > > > because Memory/Component Error handling was under FW control.
> > > > 
> > > > Since the RAS registers are tied to protocol errors, I see no reason why
> > > > FW or OS control of memory error reporting should block the OS from
> > > > handling the RAS registers or accessing the cxl device node.
> > > 
> > > > 
> > > > Since you are removing some change, maybe it needs Fixes: tag?
> > > 
> > > Missed this. Thanks!
> > > 
> > > Fixes: 248529edc86f ("cxl: add RAS status unmasking for CXL")
> > 
> > > The fix must be isolated to this patch (for automated backports), so
> > > you need to remove its dependency on the first patch. Swap
> > > them and ... see below.
> > 
> > > 
> > > Will include in v2.
> > > 
> > > Thanks,
> > > Smita
> > > 
> > > > > 
> > > > > [1] Compute Express Link (CXL) Specification, Revision 3.1, Aug 1 2022.
> > > > > 
> > > > > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> > > > > ---
> > > > >    drivers/cxl/pci.c | 7 +++----
> > > > >    1 file changed, 3 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > > index 1cb1494c28fe..44a21ab7add5 100644
> > > > > --- a/drivers/cxl/pci.c
> > > > > +++ b/drivers/cxl/pci.c
> > > > > @@ -529,7 +529,6 @@ static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
> > > > >    static int cxl_pci_ras_unmask(struct pci_dev *pdev)
> > > > >    {
> > > > > -	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
> > > > >    	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> > > > >    	void __iomem *addr;
> > > > >    	u32 orig_val, val, mask;
> > > > > @@ -541,9 +540,9 @@ static int cxl_pci_ras_unmask(struct pci_dev *pdev)
> > > > >    		return 0;
> > > > >    	}
> > > > > -	/* BIOS has CXL error control */
> > > > > -	if (!host_bridge->native_cxl_error)
> > 
> > For the fix, you could replace that with:
> > 
> > 	if (!host_bridge->native_aer) ...
> 
> Yeah I tried something like:
> +	if (!pdev->aer_cap &&
> +	    !(pcie_ports_native || host_bridge->native_aer))
> +		return 0;
> 
> But then pcie_ports_native would need to be exported as well. So would it
> be better to just check !host_bridge->native_aer and return zero in the
> first patch, do the EXPORT in the second, and replace
> host_bridge->native_aer with pcie_aer_is_native() in the third?

Looks good.

Thanks,

-Robert

> 
> Thanks,
> Smita
> 
> > 
> > > > > -		return -ENXIO;
> > > > > +	/* BIOS has PCIe AER error control */
> > > > > +	if (!pcie_aer_is_native(pdev))
> > > > > +		return 0;
> > 
> > ... and replace it with this function here in the patch where
> > pcie_aer_is_native() is exported (or in a 3rd patch).
> > 
> > -Robert
> > 
> > > > >    	rc = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap);
> > > > >    	if (rc)
> > > > 
> > > 
> 
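
The three-patch ordering agreed on above can be illustrated with a small
userspace model. This is only a sketch, not kernel code: the struct, the
global, and every model_* name are hypothetical stand-ins, and the helper's
semantics (AER capability present, and either pcie_ports_native or
host_bridge->native_aer set) are an assumption based on the open-coded check
quoted in the discussion.

```c
#include <stdbool.h>

/* Simplified stand-in for the state discussed in the thread (hypothetical). */
struct model_dev {
	bool has_aer_cap;       /* models pdev->aer_cap being non-zero */
	bool bridge_native_aer; /* models host_bridge->native_aer */
};

/* Models the pcie_ports_native override; a plain global in this sketch. */
static bool model_ports_native;

/* Assumed semantics of pcie_aer_is_native(): device has AER and OS owns it. */
static bool model_aer_is_native(const struct model_dev *dev)
{
	if (!dev->has_aer_cap)
		return false;
	return model_ports_native || dev->bridge_native_aer;
}

/* Interim check for the isolated fix patch: consults native_aer only. */
static bool model_skip_unmask_interim(const struct model_dev *dev)
{
	return !dev->bridge_native_aer;
}

/* Final check once the helper is exported and usable from cxl/pci. */
static bool model_skip_unmask_final(const struct model_dev *dev)
{
	return !model_aer_is_native(dev);
}
```

Under these assumptions the interim and final checks agree in the common
case and differ only when the ports-native override is set or the device
has no AER capability, which is why the interim check can serve as a
self-contained fix before the helper is exported.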


Thread overview: 11+ messages
2023-07-19 19:23 [PATCH 0/2] PCI, AER, CXL: Fix appropriate _OSC check for CXL RAS Cap Smita Koralahalli
2023-07-19 19:23 ` [PATCH 1/2] PCI, AER: Export and make pcie_aer_is_native() global Smita Koralahalli
2023-07-19 20:36   ` Bjorn Helgaas
2023-07-19 22:06     ` Smita Koralahalli
2023-07-19 20:40   ` Sathyanarayanan Kuppuswamy
2023-07-19 19:23 ` [PATCH 2/2] cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers Smita Koralahalli
2023-07-19 20:39   ` Sathyanarayanan Kuppuswamy
2023-07-19 22:30     ` Smita Koralahalli
2023-07-20 13:07       ` Robert Richter
2023-07-20 18:31         ` Smita Koralahalli
2023-07-21 13:49           ` Robert Richter
