public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Bowman, Terry" <kibowman@amd.com>
To: Fan Ni <nifan.cxl@gmail.com>, Terry Bowman <terry.bowman@amd.com>
Cc: ming4.li@intel.com, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	dave@stgolabs.net, jonathan.cameron@huawei.com,
	dave.jiang@intel.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, dan.j.williams@intel.com,
	bhelgaas@google.com, mahesh@linux.ibm.com, oohall@gmail.com,
	Benjamin.Cheatham@amd.com, rrichter@amd.com,
	nathan.fontenot@amd.com, smita.koralahallichannabasappa@amd.com
Subject: Re: [PATCH 0/15] Enable CXL PCIe port protocol error handling and logging
Date: Thu, 17 Oct 2024 12:27:04 -0500	[thread overview]
Message-ID: <8c34b676-e71a-42e8-96fe-485ffeaa8328@amd.com> (raw)
In-Reply-To: <ZxE8jn9M7twa4v2u@fan>

[-- Attachment #1: Type: text/plain, Size: 1700 bytes --]

Hi Fan,

On 10/17/2024 11:34 AM, Fan Ni wrote:
> On Tue, Oct 08, 2024 at 05:16:42PM -0500, Terry Bowman wrote:
>> This is a continuation of the CXL port error handling RFC from earlier.[1]
>> The RFC resulted in the decision to add CXL PCIe port error handling to
>> the existing RCH downstream port handling. This patchset adds the CXL PCIe
>> port handling and logging.
>>
>> The first 7 patches update the existing AER service driver to support CXL
>> PCIe port protocol error handling and reporting. This includes AER service
>> driver changes for adding correctable and uncorrectable error support, CXL
>> specific recovery handling, and addition of CXL driver callback handlers.
>>
>> The following 8 patches address CXL driver support for CXL PCIe port
>> protocol errors. This includes the following changes to the CXL drivers:
>> mapping CXL port and downstream port RAS registers, interface updates for
>> common RCH and VH, adding port specific error handlers, and protocol error
>> logging.
>>
>> [1] - https://lore.kernel.org/linux-cxl/20240617200411.1426554
>> -1-terry.bowman@amd.com/
>>
>> Testing:
>>
>> Below are test results for this patchset. This is using Qemu with a root
>> port (0c:00.0), upstream switch port (0d:00.0),and downstream switch port
>> (0e:00.0).
>>
>> This was tested using aer-inject updated to support CE and UCE internal
>> error injection. CXL RAS was set using a test patch (not upstreamed).
> 
> Hi Terry,
> Can you share the aer-inject repo for the testing or the test patch?
> 
> Fan

Sure, but, its easiest to attach the patch here.

Origin was https://github.com/jderrick/aer-inject.git
Base is 81701cbb30e35a1a76c3876f55692f91bdb9751b

Regards,
Terry

[-- Attachment #2: 0001-aer-inject-Add-internal-error-injection.patch --]
[-- Type: text/plain, Size: 2994 bytes --]

From ca9277866b506723f46f3acd7b264ffa80c37276 Mon Sep 17 00:00:00 2001
From: Terry Bowman <terry.bowman@amd.com>
Date: Thu, 17 Oct 2024 12:12:58 -0500
Subject: [PATCH] aer-inject: Add internal error injection

Add corrected (CE) and uncorrected (UCE) AER internal error injection
support.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 aer.h   | 2 ++
 aer.lex | 2 ++
 aer.y   | 8 ++++----
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/aer.h b/aer.h
index a0ad152..e55a731 100644
--- a/aer.h
+++ b/aer.h
@@ -30,11 +30,13 @@ struct aer_error_inj
 #define  PCI_ERR_UNC_MALF_TLP	0x00040000	/* Malformed TLP */
 #define  PCI_ERR_UNC_ECRC	0x00080000	/* ECRC Error Status */
 #define  PCI_ERR_UNC_UNSUP	0x00100000	/* Unsupported Request */
+#define  PCI_ERR_UNC_INTERNAL   0x00400000      /* Internal error */
 #define  PCI_ERR_COR_RCVR	0x00000001	/* Receiver Error Status */
 #define  PCI_ERR_COR_BAD_TLP	0x00000040	/* Bad TLP Status */
 #define  PCI_ERR_COR_BAD_DLLP	0x00000080	/* Bad DLLP Status */
 #define  PCI_ERR_COR_REP_ROLL	0x00000100	/* REPLAY_NUM Rollover */
 #define  PCI_ERR_COR_REP_TIMER	0x00001000	/* Replay Timer Timeout */
+#define  PCI_ERR_COR_CINTERNAL	0x00004000	/* Internal error */
 
 extern void init_aer(struct aer_error_inj *err);
 extern void submit_aer(struct aer_error_inj *err);
diff --git a/aer.lex b/aer.lex
index 6121e4e..4fadd0e 100644
--- a/aer.lex
+++ b/aer.lex
@@ -82,11 +82,13 @@ static struct key {
 	KEYVAL(MALF_TLP, PCI_ERR_UNC_MALF_TLP),
 	KEYVAL(ECRC, PCI_ERR_UNC_ECRC),
 	KEYVAL(UNSUP, PCI_ERR_UNC_UNSUP),
+	KEYVAL(INTERNAL, PCI_ERR_UNC_INTERNAL),
 	KEYVAL(RCVR, PCI_ERR_COR_RCVR),
 	KEYVAL(BAD_TLP, PCI_ERR_COR_BAD_TLP),
 	KEYVAL(BAD_DLLP, PCI_ERR_COR_BAD_DLLP),
 	KEYVAL(REP_ROLL, PCI_ERR_COR_REP_ROLL),
 	KEYVAL(REP_TIMER, PCI_ERR_COR_REP_TIMER),
+	KEYVAL(CINTERNAL, PCI_ERR_COR_CINTERNAL),
 };
 
 static int cmp_key(const void *av, const void *bv)
diff --git a/aer.y b/aer.y
index e5ecc7d..500dc97 100644
--- a/aer.y
+++ b/aer.y
@@ -34,8 +34,8 @@ static void init(void);
 
 %token AER DOMAIN BUS DEV FN PCI_ID UNCOR_STATUS COR_STATUS HEADER_LOG
 %token <num> TRAIN DLP POISON_TLP FCP COMP_TIME COMP_ABORT UNX_COMP RX_OVER
-%token <num> MALF_TLP ECRC UNSUP
-%token <num> RCVR BAD_TLP BAD_DLLP REP_ROLL REP_TIMER
+%token <num> MALF_TLP ECRC UNSUP INTERNAL
+%token <num> RCVR BAD_TLP BAD_DLLP REP_ROLL REP_TIMER CINTERNAL
 %token <num> SYMBOL NUMBER
 %token <str> PCI_ID_STR
 
@@ -77,14 +77,14 @@ uncor_status_list: /* empty */			{ $$ = 0; }
 	;
 
 uncor_status: TRAIN | DLP | POISON_TLP | FCP | COMP_TIME | COMP_ABORT
-	| UNX_COMP | RX_OVER | MALF_TLP | ECRC | UNSUP | NUMBER
+	| UNX_COMP | RX_OVER | MALF_TLP | ECRC | UNSUP | INTERNAL | NUMBER
 	;
 
 cor_status_list: /* empty */			{ $$ = 0; }
 	| cor_status_list cor_status		{ $$ = $1 | $2; }
 	;
 
-cor_status: RCVR | BAD_TLP | BAD_DLLP | REP_ROLL | REP_TIMER | NUMBER
+cor_status: RCVR | BAD_TLP | BAD_DLLP | REP_ROLL | REP_TIMER | CINTERNAL | NUMBER
 	;
 
 %% 
-- 
2.34.1


  reply	other threads:[~2024-10-17 17:27 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-08 22:16 [PATCH 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
2024-10-08 22:16 ` [PATCH 01/15] cxl/aer/pci: Add CXL PCIe port error handler callbacks in AER service driver Terry Bowman
2024-10-22  1:53   ` Dan Williams
2024-10-22 13:50     ` Terry Bowman
2024-10-22 17:09       ` Dan Williams
2024-10-22 18:40         ` Terry Bowman
2024-10-22 23:43           ` Dan Williams
2024-10-24 15:20             ` Bowman, Terry
2024-10-24 19:10               ` Dan Williams
2024-10-08 22:16 ` [PATCH 02/15] cxl/aer/pci: Update is_internal_error() to be callable w/o CONFIG_PCIEAER_CXL Terry Bowman
2024-10-16 16:11   ` Jonathan Cameron
2024-10-22  2:17   ` Dan Williams
2024-10-22 13:54     ` Terry Bowman
2024-10-08 22:16 ` [PATCH 03/15] cxl/aer/pci: Refactor AER driver's existing interfaces to support CXL PCIe ports Terry Bowman
2024-10-10 19:11   ` Bjorn Helgaas
2024-10-14 17:27     ` Terry Bowman
2024-10-08 22:16 ` [PATCH 04/15] cxl/aer/pci: Add CXL PCIe port correctable error support in AER service driver Terry Bowman
2024-10-16 16:22   ` Jonathan Cameron
2024-10-16 17:18     ` Terry Bowman
2024-10-16 17:29       ` Jonathan Cameron
2024-10-08 22:16 ` [PATCH 05/15] cxl/aer/pci: Update AER driver to read UCE fatal status for all CXL PCIe port devices Terry Bowman
2024-10-16 16:28   ` Jonathan Cameron
2024-10-08 22:16 ` [PATCH 06/15] cxl/aer/pci: Introduce PCI_ERS_RESULT_PANIC to pci_ers_result type Terry Bowman
2024-10-16 16:30   ` Jonathan Cameron
2024-10-16 17:31     ` Terry Bowman
2024-10-17 13:31       ` Jonathan Cameron
2024-10-17 14:50         ` Bowman, Terry
2024-10-08 22:16 ` [PATCH 07/15] cxl/aer/pci: Add CXL PCIe port uncorrectable error recovery in AER service driver Terry Bowman
2024-10-16 16:54   ` Jonathan Cameron
2024-10-16 18:07     ` Terry Bowman
2024-10-17 13:43       ` Jonathan Cameron
2024-10-17 16:21         ` Bowman, Terry
2024-10-17 17:08           ` Jonathan Cameron
2024-10-08 22:16 ` [PATCH 08/15] cxl/pci: Change find_cxl_ports() to be non-static Terry Bowman
2024-10-08 22:16 ` [PATCH 09/15] cxl/pci: Map CXL PCIe downstream port RAS registers Terry Bowman
2024-10-16 17:14   ` Jonathan Cameron
2024-10-16 18:16     ` Terry Bowman
2024-10-17 13:50       ` Jonathan Cameron
2024-10-17 16:26         ` Bowman, Terry
2024-10-08 22:16 ` [PATCH 10/15] cxl/pci: Map CXL PCIe upstream " Terry Bowman
2024-10-08 22:16 ` [PATCH 11/15] cxl/pci: Update RAS handler interfaces to support CXL PCIe ports Terry Bowman
2024-10-08 22:16 ` [PATCH 12/15] cxl/pci: Add error handler for CXL PCIe port RAS errors Terry Bowman
2024-10-17 13:57   ` Jonathan Cameron
2024-10-17 16:42     ` Bowman, Terry
2024-10-08 22:16 ` [PATCH 13/15] cxl/pci: Add trace logging " Terry Bowman
2024-10-17 14:04   ` Jonathan Cameron
2024-10-08 22:16 ` [PATCH 14/15] cxl/aer/pci: Export pci_aer_unmask_internal_errors() Terry Bowman
2024-10-16 17:22   ` Jonathan Cameron
2024-10-08 22:16 ` [PATCH 15/15] cxl/pci: Enable internal CE/UCE interrupts for CXL PCIe port devices Terry Bowman
2024-10-16 17:21   ` Jonathan Cameron
2024-10-16 17:24     ` Terry Bowman
2024-10-10 19:07 ` [PATCH 0/15] Enable CXL PCIe port protocol error handling and logging Bjorn Helgaas
2024-10-14 17:22   ` Terry Bowman
2024-10-14 17:29     ` Bjorn Helgaas
2024-10-14 17:33       ` Terry Bowman
2024-10-17 16:34 ` Fan Ni
2024-10-17 17:27   ` Bowman, Terry [this message]
2024-10-21 22:19     ` Fan Ni
2024-10-18 23:22 ` Bjorn Helgaas
2024-10-21 19:22   ` Terry Bowman
2024-10-22  1:43 ` Dan Williams
2024-10-22 13:29   ` Terry Bowman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8c34b676-e71a-42e8-96fe-485ffeaa8328@amd.com \
    --to=kibowman@amd.com \
    --cc=Benjamin.Cheatham@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=jonathan.cameron@huawei.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mahesh@linux.ibm.com \
    --cc=ming4.li@intel.com \
    --cc=nathan.fontenot@amd.com \
    --cc=nifan.cxl@gmail.com \
    --cc=oohall@gmail.com \
    --cc=rrichter@amd.com \
    --cc=smita.koralahallichannabasappa@amd.com \
    --cc=terry.bowman@amd.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox