From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Terry Bowman <terry.bowman@amd.com>
Cc: <linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-pci@vger.kernel.org>, <nifan.cxl@gmail.com>,
<dave@stgolabs.net>, <dave.jiang@intel.com>,
<alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
<dan.j.williams@intel.com>, <bhelgaas@google.com>,
<mahesh@linux.ibm.com>, <ira.weiny@intel.com>, <oohall@gmail.com>,
<Benjamin.Cheatham@amd.com>, <rrichter@amd.com>,
<nathan.fontenot@amd.com>,
<Smita.KoralahalliChannabasappa@amd.com>, <lukas@wunner.de>,
<ming.li@zohomail.com>, <PradeepVineshReddy.Kodamati@amd.com>
Subject: Re: [PATCH v8 04/16] cxl/aer: AER service driver forwards CXL error to CXL driver
Date: Wed, 23 Apr 2025 16:04:43 +0100 [thread overview]
Message-ID: <20250423160443.00006ee0@huawei.com> (raw)
In-Reply-To: <20250327014717.2988633-5-terry.bowman@amd.com>
On Wed, 26 Mar 2025 20:47:05 -0500
Terry Bowman <terry.bowman@amd.com> wrote:
> The AER service driver includes a CXL-specific kfifo, intended to forward
> CXL errors to the CXL driver. However, the forwarding functionality is
> currently unimplemented. Update the AER driver to enable error forwarding
> to the CXL driver.
>
> Modify the AER service driver's handle_error_source(), which is called from
> process_aer_err_devices(), to distinguish between PCIe and CXL errors.
>
> Rename and update is_internal_error() to is_cxl_error(). Ensuring it
> checks both the 'struct aer_info::is_cxl' flag and the AER internal error
> masks.
>
> If the error is a standard PCIe error then continue calling pcie_aer_handle_error()
> as done in the current AER driver.
>
> If the error is a CXL-related error then forward it to the CXL driver for
> handling using the kfifo mechanism.
>
> Introduce a new function forward_cxl_error(), which constructs a CXL
> protocol error context using cxl_create_prot_err_info(). This context is
> then passed to the CXL driver via kfifo using a 'struct work_struct'.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Hi Terry,
Finally got back to this. I'm not following how some of the reference
counting in here is working. It might be fine but there is a lot
taking then dropping device references - some of which are taken again later.
> @@ -1082,10 +1094,44 @@ static void cxl_rch_enable_rcec(struct pci_dev *rcec)
> pci_info(rcec, "CXL: Internal errors unmasked");
> }
>
> +static void forward_cxl_error(struct pci_dev *_pdev, struct aer_err_info *info)
> +{
> + int severity = info->severity;
So far this variable isn't really justified. Maybe it makes sense later in the
series?
> + struct cxl_prot_err_work_data wd;
> + struct cxl_prot_error_info *err_info = &wd.err_info;
Similarly. Why not just use this directly in the call below?
> + struct pci_dev *pdev __free(pci_dev_put) = pci_dev_get(_pdev);
Can you talk me through the reference counting? You take one pci device reference
here....
> +
> + if (!cxl_create_prot_err_info) {
> + pci_err(pdev, "Failed. CXL-AER interface not initialized.");
> + return;
> + }
> +
> + if (cxl_create_prot_err_info(pdev, severity, err_info)) {
...but the implementation of this also takes once internally. Can we skip that
internal one and document that it is always take by the caller?
> + pci_err(pdev, "Failed to create CXL protocol error information");
> + return;
> + }
> +
> + struct device *cxl_dev __free(put_device) = get_device(err_info->dev);
Also this one. A reference was acquired and dropped in cxl_create_prot_err_info()
followed by retaking it here. How do we know it is still about by this call
and once we pull it off the kfifo later?
> +
> + if (!kfifo_put(&cxl_prot_err_fifo, wd)) {
> + pr_err_ratelimited("CXL kfifo overflow\n");
> + return;
> + }
> +
> + schedule_work(cxl_prot_err_work);
> +}
> +
Thanks,
Jonathan
next prev parent reply other threads:[~2025-04-23 15:04 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-27 1:47 [PATCH v8 00/16] Enable CXL PCIe port protocol error handling and logging Terry Bowman
2025-03-27 1:47 ` [PATCH v8 01/16] PCI/CXL: Introduce PCIe helper function pcie_is_cxl() Terry Bowman
2025-03-27 15:11 ` Ira Weiny
2025-03-27 15:30 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 02/16] PCI/AER: Modify AER driver logging to report CXL or PCIe bus error type Terry Bowman
2025-03-27 16:48 ` Bjorn Helgaas
2025-03-27 17:15 ` Bowman, Terry
2025-03-27 17:49 ` Bjorn Helgaas
2025-03-27 16:58 ` Ira Weiny
2025-03-27 17:17 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 03/16] CXL/AER: Introduce Kfifo for forwarding CXL errors Terry Bowman
2025-03-27 17:08 ` Bjorn Helgaas
2025-03-27 18:12 ` Bowman, Terry
2025-03-28 17:02 ` Bjorn Helgaas
2025-03-28 17:36 ` Bowman, Terry
2025-03-28 17:01 ` Ira Weiny
2025-04-07 13:43 ` Bowman, Terry
2025-04-04 16:53 ` Jonathan Cameron
2025-04-23 14:33 ` Jonathan Cameron
2025-04-23 15:04 ` Jonathan Cameron
2025-04-23 22:12 ` Gregory Price
2025-03-27 1:47 ` [PATCH v8 04/16] cxl/aer: AER service driver forwards CXL error to CXL driver Terry Bowman
2025-03-27 17:13 ` Bjorn Helgaas
2025-04-07 14:00 ` Bowman, Terry
2025-04-23 15:04 ` Jonathan Cameron [this message]
2025-04-24 14:17 ` Bowman, Terry
2025-04-25 13:18 ` Jonathan Cameron
2025-04-25 21:03 ` Bowman, Terry
2025-05-15 21:52 ` Bowman, Terry
2025-05-20 11:04 ` Jonathan Cameron
2025-05-20 13:21 ` Bowman, Terry
2025-05-21 18:34 ` Jonathan Cameron
2025-05-21 23:30 ` Bowman, Terry
2025-04-23 22:21 ` Gregory Price
2025-03-27 1:47 ` [PATCH v8 05/16] PCI/AER: CXL driver dequeues CXL error forwarded from AER service driver Terry Bowman
2025-03-27 4:43 ` kernel test robot
2025-04-23 16:28 ` Jonathan Cameron
2025-04-24 15:03 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 06/16] CXL/PCI: Introduce CXL uncorrectable protocol error 'recovery' Terry Bowman
2025-03-27 3:37 ` kernel test robot
2025-03-27 4:19 ` kernel test robot
2025-04-23 16:35 ` Jonathan Cameron
2025-04-24 14:22 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 07/16] cxl/pci: Move existing CXL RAS initialization to CXL's cxl_port driver Terry Bowman
2025-04-17 10:18 ` Jonathan Cameron
2025-04-24 14:25 ` Bowman, Terry
2025-05-12 14:47 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 08/16] cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
2025-03-27 1:47 ` [PATCH v8 09/16] cxl/pci: Update RAS handler interfaces to also support CXL PCIe Ports Terry Bowman
2025-03-27 1:47 ` [PATCH v8 10/16] cxl/pci: Add log message if RAS registers are not mapped Terry Bowman
2025-04-23 16:41 ` Jonathan Cameron
2025-04-24 14:30 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 11/16] cxl/pci: Unifi CXL trace logging for CXL Endpoints and CXL Ports Terry Bowman
2025-04-23 16:44 ` Jonathan Cameron
2025-05-07 16:28 ` Shiju Jose
2025-05-07 18:30 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 12/16] cxl/pci: Assign CXL Port protocol error handlers Terry Bowman
2025-04-23 16:47 ` Jonathan Cameron
2025-03-27 1:47 ` [PATCH v8 13/16] cxl/pci: Assign CXL Endpoint " Terry Bowman
2025-03-27 19:46 ` kernel test robot
2025-04-23 16:49 ` Jonathan Cameron
2025-03-27 1:47 ` [PATCH v8 14/16] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
2025-04-17 17:22 ` Jonathan Cameron
2025-03-27 1:47 ` [PATCH v8 15/16] CXL/PCI: Enable CXL protocol errors during CXL Port probe Terry Bowman
2025-04-04 17:05 ` Jonathan Cameron
2025-04-07 14:34 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 16/16] CXL/PCI: Disable CXL protocol errors during CXL Port cleanup Terry Bowman
2025-03-28 1:18 ` kernel test robot
2025-04-04 17:04 ` Jonathan Cameron
2025-04-07 14:25 ` Bowman, Terry
2025-04-17 10:13 ` Jonathan Cameron
2025-04-24 16:37 ` Bowman, Terry
2025-03-27 17:16 ` [PATCH v8 00/16] Enable CXL PCIe port protocol error handling and logging Bjorn Helgaas
2025-03-27 22:04 ` Bowman, Terry
2025-05-06 23:06 ` Gregory Price
2025-05-07 18:28 ` Bowman, Terry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250423160443.00006ee0@huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=Benjamin.Cheatham@amd.com \
--cc=PradeepVineshReddy.Kodamati@amd.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=mahesh@linux.ibm.com \
--cc=ming.li@zohomail.com \
--cc=nathan.fontenot@amd.com \
--cc=nifan.cxl@gmail.com \
--cc=oohall@gmail.com \
--cc=rrichter@amd.com \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.