From: Fan Ni <nifan.cxl@gmail.com>
To: "Bowman, Terry" <kibowman@amd.com>
Cc: Fan Ni <nifan.cxl@gmail.com>, Terry Bowman <terry.bowman@amd.com>,
ming4.li@intel.com, linux-cxl@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
dave@stgolabs.net, jonathan.cameron@huawei.com,
dave.jiang@intel.com, alison.schofield@intel.com,
vishal.l.verma@intel.com, dan.j.williams@intel.com,
bhelgaas@google.com, mahesh@linux.ibm.com, oohall@gmail.com,
Benjamin.Cheatham@amd.com, rrichter@amd.com,
nathan.fontenot@amd.com, smita.koralahallichannabasappa@amd.com
Subject: Re: [PATCH 0/15] Enable CXL PCIe port protocol error handling and logging
Date: Mon, 21 Oct 2024 15:19:38 -0700
Message-ID: <ZxbTepcs8eJqckFN@fan>
In-Reply-To: <8c34b676-e71a-42e8-96fe-485ffeaa8328@amd.com>
On Thu, Oct 17, 2024 at 12:27:04PM -0500, Bowman, Terry wrote:
> Hi Fan,
>
> On 10/17/2024 11:34 AM, Fan Ni wrote:
> > On Tue, Oct 08, 2024 at 05:16:42PM -0500, Terry Bowman wrote:
> > > This is a continuation of the CXL port error handling RFC from earlier.[1]
> > > The RFC resulted in the decision to add CXL PCIe port error handling to
> > > the existing RCH downstream port handling. This patchset adds the CXL PCIe
> > > port handling and logging.
> > >
> > > The first 7 patches update the existing AER service driver to support CXL
> > > PCIe port protocol error handling and reporting. This includes AER service
> > > driver changes for adding correctable and uncorrectable error support, CXL
> > > specific recovery handling, and addition of CXL driver callback handlers.
> > >
> > > The following 8 patches address CXL driver support for CXL PCIe port
> > > protocol errors. This includes the following changes to the CXL drivers:
> > > mapping CXL port and downstream port RAS registers, interface updates for
> > > common RCH and VH, adding port specific error handlers, and protocol error
> > > logging.
> > >
> > > [1] - https://lore.kernel.org/linux-cxl/20240617200411.1426554-1-terry.bowman@amd.com/
> > >
> > > Testing:
> > >
> > > Below are test results for this patchset. This is using QEMU with a root
> > > port (0c:00.0), upstream switch port (0d:00.0), and downstream switch port
> > > (0e:00.0).
> > >
> > > This was tested using aer-inject updated to support CE and UCE internal
> > > error injection. CXL RAS was set using a test patch (not upstreamed).
> >
> > Hi Terry,
> > Can you share the aer-inject repo for the testing or the test patch?
Hi Terry,
Could you tell me which code base you used for this patch set?
I hit a lot of issues when trying to apply it on top of the "fixes" or
"next" branches.
Fan
> >
> > Fan
>
> Sure, but it's easiest to attach the patch here.
>
> Origin was https://github.com/jderrick/aer-inject.git
> Base is 81701cbb30e35a1a76c3876f55692f91bdb9751b
>
> Regards,
> Terry
> From ca9277866b506723f46f3acd7b264ffa80c37276 Mon Sep 17 00:00:00 2001
> From: Terry Bowman <terry.bowman@amd.com>
> Date: Thu, 17 Oct 2024 12:12:58 -0500
> Subject: [PATCH] aer-inject: Add internal error injection
>
> Add corrected (CE) and uncorrected (UCE) AER internal error injection
> support.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
> aer.h | 2 ++
> aer.lex | 2 ++
> aer.y | 8 ++++----
> 3 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/aer.h b/aer.h
> index a0ad152..e55a731 100644
> --- a/aer.h
> +++ b/aer.h
> @@ -30,11 +30,13 @@ struct aer_error_inj
> #define PCI_ERR_UNC_MALF_TLP 0x00040000 /* Malformed TLP */
> #define PCI_ERR_UNC_ECRC 0x00080000 /* ECRC Error Status */
> #define PCI_ERR_UNC_UNSUP 0x00100000 /* Unsupported Request */
> +#define PCI_ERR_UNC_INTERNAL 0x00400000 /* Internal error */
> #define PCI_ERR_COR_RCVR 0x00000001 /* Receiver Error Status */
> #define PCI_ERR_COR_BAD_TLP 0x00000040 /* Bad TLP Status */
> #define PCI_ERR_COR_BAD_DLLP 0x00000080 /* Bad DLLP Status */
> #define PCI_ERR_COR_REP_ROLL 0x00000100 /* REPLAY_NUM Rollover */
> #define PCI_ERR_COR_REP_TIMER 0x00001000 /* Replay Timer Timeout */
> +#define PCI_ERR_COR_CINTERNAL 0x00004000 /* Internal error */
>
> extern void init_aer(struct aer_error_inj *err);
> extern void submit_aer(struct aer_error_inj *err);
> diff --git a/aer.lex b/aer.lex
> index 6121e4e..4fadd0e 100644
> --- a/aer.lex
> +++ b/aer.lex
> @@ -82,11 +82,13 @@ static struct key {
> KEYVAL(MALF_TLP, PCI_ERR_UNC_MALF_TLP),
> KEYVAL(ECRC, PCI_ERR_UNC_ECRC),
> KEYVAL(UNSUP, PCI_ERR_UNC_UNSUP),
> + KEYVAL(INTERNAL, PCI_ERR_UNC_INTERNAL),
> KEYVAL(RCVR, PCI_ERR_COR_RCVR),
> KEYVAL(BAD_TLP, PCI_ERR_COR_BAD_TLP),
> KEYVAL(BAD_DLLP, PCI_ERR_COR_BAD_DLLP),
> KEYVAL(REP_ROLL, PCI_ERR_COR_REP_ROLL),
> KEYVAL(REP_TIMER, PCI_ERR_COR_REP_TIMER),
> + KEYVAL(CINTERNAL, PCI_ERR_COR_CINTERNAL),
> };
>
> static int cmp_key(const void *av, const void *bv)
> diff --git a/aer.y b/aer.y
> index e5ecc7d..500dc97 100644
> --- a/aer.y
> +++ b/aer.y
> @@ -34,8 +34,8 @@ static void init(void);
>
> %token AER DOMAIN BUS DEV FN PCI_ID UNCOR_STATUS COR_STATUS HEADER_LOG
> %token <num> TRAIN DLP POISON_TLP FCP COMP_TIME COMP_ABORT UNX_COMP RX_OVER
> -%token <num> MALF_TLP ECRC UNSUP
> -%token <num> RCVR BAD_TLP BAD_DLLP REP_ROLL REP_TIMER
> +%token <num> MALF_TLP ECRC UNSUP INTERNAL
> +%token <num> RCVR BAD_TLP BAD_DLLP REP_ROLL REP_TIMER CINTERNAL
> %token <num> SYMBOL NUMBER
> %token <str> PCI_ID_STR
>
> @@ -77,14 +77,14 @@ uncor_status_list: /* empty */ { $$ = 0; }
> ;
>
> uncor_status: TRAIN | DLP | POISON_TLP | FCP | COMP_TIME | COMP_ABORT
> - | UNX_COMP | RX_OVER | MALF_TLP | ECRC | UNSUP | NUMBER
> + | UNX_COMP | RX_OVER | MALF_TLP | ECRC | UNSUP | INTERNAL | NUMBER
> ;
>
> cor_status_list: /* empty */ { $$ = 0; }
> | cor_status_list cor_status { $$ = $1 | $2; }
> ;
>
> -cor_status: RCVR | BAD_TLP | BAD_DLLP | REP_ROLL | REP_TIMER | NUMBER
> +cor_status: RCVR | BAD_TLP | BAD_DLLP | REP_ROLL | REP_TIMER | CINTERNAL | NUMBER
> ;
>
> %%
> --
> 2.34.1
>
--
Fan Ni
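
For readers following the quoted aer-inject patch: the grammar rules
"cor_status_list cor_status { $$ = $1 | $2; }" and the KEYVAL table mean
that each status keyword in an input file resolves to one bit, and the bits
are ORed into a single status mask. The sketch below illustrates that idea
in isolation. It is not code from aer-inject: the helper names lookup(),
status_mask(), uncor_mask() and cor_mask() are hypothetical, the lookup is
a plain linear search, and only a few keywords are included. The bit values
are copied from the aer.h hunk in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit values copied from the aer.h hunk quoted in the patch above. */
#define PCI_ERR_UNC_UNSUP     0x00100000u /* Unsupported Request */
#define PCI_ERR_UNC_INTERNAL  0x00400000u /* Internal error (added by the patch) */
#define PCI_ERR_COR_BAD_TLP   0x00000040u /* Bad TLP Status */
#define PCI_ERR_COR_CINTERNAL 0x00004000u /* Internal error (added by the patch) */

/* Keyword-to-bit pair, similar in spirit to the KEYVAL() table in aer.lex. */
struct key {
    const char *name;
    uint32_t val;
};

static const struct key uncor_keys[] = {
    { "UNSUP",    PCI_ERR_UNC_UNSUP },
    { "INTERNAL", PCI_ERR_UNC_INTERNAL },
};

static const struct key cor_keys[] = {
    { "BAD_TLP",   PCI_ERR_COR_BAD_TLP },
    { "CINTERNAL", PCI_ERR_COR_CINTERNAL },
};

/* Hypothetical helper: return the bit for a keyword, or 0 if unknown. */
static uint32_t lookup(const struct key *keys, size_t nkeys, const char *name)
{
    for (size_t i = 0; i < nkeys; i++)
        if (strcmp(keys[i].name, name) == 0)
            return keys[i].val;
    return 0;
}

/* OR the bits of all listed keywords, mirroring "$$ = $1 | $2" in aer.y. */
static uint32_t status_mask(const struct key *keys, size_t nkeys,
                            const char *const *names, size_t nnames)
{
    uint32_t mask = 0; /* an empty list yields 0, like the "empty" rule */

    for (size_t i = 0; i < nnames; i++)
        mask |= lookup(keys, nkeys, names[i]);
    return mask;
}

/* Mask for a list of uncorrectable-status keywords. */
uint32_t uncor_mask(const char *const *names, size_t n)
{
    return status_mask(uncor_keys, sizeof(uncor_keys) / sizeof(uncor_keys[0]),
                       names, n);
}

/* Mask for a list of correctable-status keywords. */
uint32_t cor_mask(const char *const *names, size_t n)
{
    return status_mask(cor_keys, sizeof(cor_keys) / sizeof(cor_keys[0]),
                       names, n);
}
```

Under these assumptions, an input line such as "UNCOR_STATUS INTERNAL" would
correspond to uncor_mask() over { "INTERNAL" }, and "COR_STATUS CINTERNAL"
to cor_mask() over { "CINTERNAL" }. The two separate keyword names are what
let the lexer tell the uncorrectable and corrected internal errors apart.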
Thread overview: 62+ messages
2024-10-08 22:16 [PATCH 0/15] Enable CXL PCIe port protocol error handling and logging Terry Bowman
2024-10-08 22:16 ` [PATCH 01/15] cxl/aer/pci: Add CXL PCIe port error handler callbacks in AER service driver Terry Bowman
2024-10-22 1:53 ` Dan Williams
2024-10-22 13:50 ` Terry Bowman
2024-10-22 17:09 ` Dan Williams
2024-10-22 18:40 ` Terry Bowman
2024-10-22 23:43 ` Dan Williams
2024-10-24 15:20 ` Bowman, Terry
2024-10-24 19:10 ` Dan Williams
2024-10-08 22:16 ` [PATCH 02/15] cxl/aer/pci: Update is_internal_error() to be callable w/o CONFIG_PCIEAER_CXL Terry Bowman
2024-10-16 16:11 ` Jonathan Cameron
2024-10-22 2:17 ` Dan Williams
2024-10-22 13:54 ` Terry Bowman
2024-10-08 22:16 ` [PATCH 03/15] cxl/aer/pci: Refactor AER driver's existing interfaces to support CXL PCIe ports Terry Bowman
2024-10-10 19:11 ` Bjorn Helgaas
2024-10-14 17:27 ` Terry Bowman
2024-10-08 22:16 ` [PATCH 04/15] cxl/aer/pci: Add CXL PCIe port correctable error support in AER service driver Terry Bowman
2024-10-16 16:22 ` Jonathan Cameron
2024-10-16 17:18 ` Terry Bowman
2024-10-16 17:29 ` Jonathan Cameron
2024-10-08 22:16 ` [PATCH 05/15] cxl/aer/pci: Update AER driver to read UCE fatal status for all CXL PCIe port devices Terry Bowman
2024-10-16 16:28 ` Jonathan Cameron
2024-10-08 22:16 ` [PATCH 06/15] cxl/aer/pci: Introduce PCI_ERS_RESULT_PANIC to pci_ers_result type Terry Bowman
2024-10-16 16:30 ` Jonathan Cameron
2024-10-16 17:31 ` Terry Bowman
2024-10-17 13:31 ` Jonathan Cameron
2024-10-17 14:50 ` Bowman, Terry
2024-10-08 22:16 ` [PATCH 07/15] cxl/aer/pci: Add CXL PCIe port uncorrectable error recovery in AER service driver Terry Bowman
2024-10-16 16:54 ` Jonathan Cameron
2024-10-16 18:07 ` Terry Bowman
2024-10-17 13:43 ` Jonathan Cameron
2024-10-17 16:21 ` Bowman, Terry
2024-10-17 17:08 ` Jonathan Cameron
2024-10-08 22:16 ` [PATCH 08/15] cxl/pci: Change find_cxl_ports() to be non-static Terry Bowman
2024-10-08 22:16 ` [PATCH 09/15] cxl/pci: Map CXL PCIe downstream port RAS registers Terry Bowman
2024-10-16 17:14 ` Jonathan Cameron
2024-10-16 18:16 ` Terry Bowman
2024-10-17 13:50 ` Jonathan Cameron
2024-10-17 16:26 ` Bowman, Terry
2024-10-08 22:16 ` [PATCH 10/15] cxl/pci: Map CXL PCIe upstream " Terry Bowman
2024-10-08 22:16 ` [PATCH 11/15] cxl/pci: Update RAS handler interfaces to support CXL PCIe ports Terry Bowman
2024-10-08 22:16 ` [PATCH 12/15] cxl/pci: Add error handler for CXL PCIe port RAS errors Terry Bowman
2024-10-17 13:57 ` Jonathan Cameron
2024-10-17 16:42 ` Bowman, Terry
2024-10-08 22:16 ` [PATCH 13/15] cxl/pci: Add trace logging " Terry Bowman
2024-10-17 14:04 ` Jonathan Cameron
2024-10-08 22:16 ` [PATCH 14/15] cxl/aer/pci: Export pci_aer_unmask_internal_errors() Terry Bowman
2024-10-16 17:22 ` Jonathan Cameron
2024-10-08 22:16 ` [PATCH 15/15] cxl/pci: Enable internal CE/UCE interrupts for CXL PCIe port devices Terry Bowman
2024-10-16 17:21 ` Jonathan Cameron
2024-10-16 17:24 ` Terry Bowman
2024-10-10 19:07 ` [PATCH 0/15] Enable CXL PCIe port protocol error handling and logging Bjorn Helgaas
2024-10-14 17:22 ` Terry Bowman
2024-10-14 17:29 ` Bjorn Helgaas
2024-10-14 17:33 ` Terry Bowman
2024-10-17 16:34 ` Fan Ni
2024-10-17 17:27 ` Bowman, Terry
2024-10-21 22:19 ` Fan Ni [this message]
2024-10-18 23:22 ` Bjorn Helgaas
2024-10-21 19:22 ` Terry Bowman
2024-10-22 1:43 ` Dan Williams
2024-10-22 13:29 ` Terry Bowman