Linux CXL
From: Dan Williams <dan.j.williams@intel.com>
To: Terry Bowman <Terry.Bowman@amd.com>, Li Ming <ming4.li@intel.com>,
	<dan.j.williams@intel.com>, <rrichter@amd.com>
Cc: <linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/6] PCI/RCEC: Introduce pcie_walk_rcec_all()
Date: Mon, 15 Apr 2024 21:39:07 -0700
Message-ID: <661e00eb808e4_4d56129429@dwillia2-mobl3.amr.corp.intel.com.notmuch>
In-Reply-To: <d69c2157-a0da-4d8c-8684-d42afd285191@amd.com>

Terry Bowman wrote:
> Hi Li,
> 
> I added comments below.
> 
> On 3/13/24 03:35, Li Ming wrote:
> > PCIe RCEC core only provides pcie_walk_rcec() to walk all RCiEP devices
> > associated with an RCEC, but the CXL subsystem needs a helper function
> > that can walk all devices in the RCEC associated bus range, not only
> > RCiEPs, for the RAS error case below.
> > 
> > CXL r3.1 section 12.2.2 mentions that the CXL.cachemem protocol errors
> > detected by a CXL root port could be logged in the RCEC AER Extended
> > Capability. The recommended solution from CXL r3.1 section 9.18.1.5
> > is:
> > 
> > 	"Probe all CXL Downstream Ports and determine whether they have
> > 	logged an error in the CXL.io or CXL.cachemem status registers."
> > 
> > The CXL RAS error handler can use the new helper function,
> > pcie_walk_rcec_all(), to locate all CXL root ports or CXL devices in
> > the RCEC associated bus range.
> 
> The RCEC-root port relation you mention is new to me. Typically, not in 
> all cases, RCH-RCD has a RCEC. And a VH mode system has a root port 
> instead. The RCH RCEC and VH root port are both bound to the PCIe port 
> bus driver that supports handling and logging AER. This allows the PCIe 
> port bus driver to handle AER in a RCEC and root port AER using the same 
> procedure and accesses to the AER capability registers. 
> 
> This is oversimplified but are you looking to handle root port AER error 
> in the RCEC from the below diagram? 
> 
> RCEC <--> CXL root port (bridge) <--> Endpoint
> 
> > 
> > Signed-off-by: Li Ming <ming4.li@intel.com>
> > ---
> >  drivers/pci/pci.h       |  6 ++++++
> >  drivers/pci/pcie/rcec.c | 44 +++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 48 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 5ecbcf041179..a068f2d7dd28 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -444,6 +444,9 @@ void pcie_link_rcec(struct pci_dev *rcec);
> >  void pcie_walk_rcec(struct pci_dev *rcec,
> >  		    int (*cb)(struct pci_dev *, void *),
> >  		    void *userdata);
> > +void pcie_walk_rcec_all(struct pci_dev *rcec,
> > +			int (*cb)(struct pci_dev *, void *),
> > +			void *userdata);
> >  #else
> >  static inline void pci_rcec_init(struct pci_dev *dev) { }
> >  static inline void pci_rcec_exit(struct pci_dev *dev) { }
> > @@ -451,6 +454,9 @@ static inline void pcie_link_rcec(struct pci_dev *rcec) { }
> >  static inline void pcie_walk_rcec(struct pci_dev *rcec,
> >  				  int (*cb)(struct pci_dev *, void *),
> >  				  void *userdata) { }
> > +static inline void pcie_walk_rcec_all(struct pci_dev *rcec,
> > +				      int (*cb)(struct pci_dev *, void *),
> > +				      void *userdata) { }
> >  #endif
> >  
> >  #ifdef CONFIG_PCI_ATS
> > diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
> > index d0bcd141ac9c..189de280660c 100644
> > --- a/drivers/pci/pcie/rcec.c
> > +++ b/drivers/pci/pcie/rcec.c
> > @@ -65,6 +65,15 @@ static int walk_rcec_helper(struct pci_dev *dev, void *data)
> >  	return 0;
> >  }
> >  
> > +static int walk_rcec_all_helper(struct pci_dev *dev, void *data)
> > +{
> > +	struct walk_rcec_data *rcec_data = data;
> > +
> > +	rcec_data->user_callback(dev, rcec_data->user_data);
> > +
> > +	return 0;
> > +}
> > +
> >  static void walk_rcec(int (*cb)(struct pci_dev *dev, void *data),
> >  		      void *userdata)
> >  {
> > @@ -83,7 +92,7 @@ static void walk_rcec(int (*cb)(struct pci_dev *dev, void *data),
> >  	nextbusn = rcec->rcec_ea->nextbusn;
> >  	lastbusn = rcec->rcec_ea->lastbusn;
> >  
> > -	/* All RCiEP devices are on the same bus as the RCEC */
> > +	/* All devices are on the same bus as the RCEC */
> 
> RCiEPs are not guaranteed to be on same bus as RCEC. Details for associated 
> next and last busses:
> 
> "This register does not indicate association between an Event Collector and 
> any Function on the same Bus Number as the Event Collector itself, however 
> it is permitted for the Association Bus Range to include the Bus Number of 
> the Root Complex Event Collector."[1]
> 
> [1] PCI Spec 6.0 - RCEC Associated Bus Numbers Register (Offset 08h)

Hi Terry,

This patchset is responding to the implications of the implementation
note in CXL r3.1 section 9.18.1.5, RCEC Downstream Port Association
Structure (RDPAS). That note says that CXL.io and CXL.cachemem errors in
Root Ports may indeed be signaled to an RCEC. Do you expect acting on
that implementation note to cause any issues on platforms that do not
follow that CXL spec behavior?

My expectation is that it may just cause extra polling for errors, but
not cause any harm.
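
To make that expectation concrete, here is a rough user-space model of
the "walk everything in the associated bus range and poll" idea. It is a
sketch only, not kernel code: struct mock_dev, mock_walk_range_all() and
mock_poll_ras() are hypothetical stand-ins made up for illustration, and
the single ras_status word is an assumed simplification of the CXL.io and
CXL.cachemem status registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCI function; this is not struct pci_dev. */
struct mock_dev {
	uint8_t bus;		/* bus number the function lives on */
	int is_rciep;		/* 1 for an RCiEP, 0 for e.g. a CXL root port */
	uint32_t ras_status;	/* assumed model: nonzero means an error is logged */
};

/* Counters filled in by the polling callback. */
struct poll_result {
	int polled;		/* devices visited */
	int errors_found;	/* devices that had a logged error */
};

typedef int (*walk_cb)(struct mock_dev *dev, void *userdata);

/*
 * Visit every device whose bus number lies in [nextbusn, lastbusn],
 * without filtering on device type. This mirrors the intent of the
 * proposed walk_rcec_all_helper(), which calls the user callback for
 * every device rather than only for RCiEPs.
 */
void mock_walk_range_all(struct mock_dev *devs, size_t ndevs,
			 uint8_t nextbusn, uint8_t lastbusn,
			 walk_cb cb, void *userdata)
{
	assert(cb != NULL);

	for (size_t i = 0; i < ndevs; i++) {
		if (devs[i].bus < nextbusn || devs[i].bus > lastbusn)
			continue;
		cb(&devs[i], userdata);
	}
}

/*
 * Poll one device. A device with nothing logged only costs the read;
 * no state is changed for it.
 */
int mock_poll_ras(struct mock_dev *dev, void *userdata)
{
	struct poll_result *res = userdata;

	res->polled++;
	if (dev->ras_status) {
		res->errors_found++;
		dev->ras_status = 0;	/* model handling and clearing the error */
	}
	return 0;
}
```

In this model, calling mock_walk_range_all() with mock_poll_ras() on a
platform where no root port ever logs into the RCEC simply bumps the
polled count for each device in range and touches nothing else, which is
the "extra polling, no harm" outcome described above. Whether real
hardware reads are equally side-effect free is the open question.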
