public inbox for linux-kernel@vger.kernel.org
From: Terry Bowman <Terry.Bowman@amd.com>
To: Dan Williams <dan.j.williams@intel.com>,
	ira.weiny@intel.com, dave@stgolabs.net, dave.jiang@intel.com,
	alison.schofield@intel.com, ming4.li@intel.com,
	vishal.l.verma@intel.com, jim.harris@samsung.com,
	ilpo.jarvinen@linux.intel.com, ardb@kernel.org,
	sathyanarayanan.kuppuswamy@linux.intel.com,
	linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
	Yazen.Ghannam@amd.com, Robert.Richter@amd.com
Subject: Re: [RFC PATCH 0/9] Add RAS support for CXL root ports, CXL downstream switch ports, and CXL upstream switch ports
Date: Tue, 25 Jun 2024 09:29:45 -0500
Message-ID: <6ef2bf5d-b78b-490f-b64f-30dec3197df5@amd.com>
In-Reply-To: <6679dc345fd4c_5639294a5@dwillia2-xfh.jf.intel.com.notmuch>



On 6/24/24 15:51, Dan Williams wrote:
> Terry Bowman wrote:
>> Hi Dan,
>>
>> I added responses below.
>>
>> On 6/21/24 14:04, Dan Williams wrote:
>>> Terry Bowman wrote:
>>>> This patchset provides RAS logging for CXL root ports, CXL downstream
>>>> switch ports, and CXL upstream switch ports. This includes changes to
>>>> use a portdrv notifier chain to communicate CXL AER/RAS errors to a
>>>> cxl_pci callback.
>>>>
>>>> The first 3 patches prepare for and add an atomic notifier chain to the
>>>> portdrv driver. The portdrv's notifier chain reports the port device's
>>>> AER internal errors to the registered callback(s). The preparation changes
>>>> include a portdrv update to call the uncorrectable handler for PCIe root
>>>> ports and PCIe downstream switch ports. Also, the AER correctable error
>>>> (CE) status is made available to the AER CE handler.
>>>>
>>>> The next 4 patches are in preparation for adding an atomic notification
>>>> callback in the cxl_pci driver. This is for receiving AER internal error
>>>> events from the portdrv notifier chain. Preparation includes adding RAS
>>>> register block mapping, adding trace functions for logging, and
>>>> refactoring cxl_pci RAS functions for reuse.
>>>>
>>>> The final 2 patches enable the AER internal error interrupts.
>>> [..] 
>>>>
>>>> Solutions Considered (1-4):
>>>>   Below are solutions that were considered. Solution #4 is
>>>>   implemented in this patchset. 
>>> [..]
>>>>  2.) Update the AER driver to call cxl_pci driver's error handler before
>>>>  calling pci_aer_handle_error()
>>>>
>>>>  This is similar to the existing RCH port error approach in aer.c.
>>>>  In this solution the AER driver searches for a downstream CXL endpoint
>>>>  to 'handle' detected CXL port protocol errors.
>>>>
>>>>  This is a good solution to consider if the one presented in this patchset
>>>>  is not acceptable. I was initially reluctant to take this approach because it
>>>>  adds more CXL coupling to the AER driver. But, I think this solution
>>>>  would technically work. I believe Ming was working towards this
>>>>  solution.
>>>
>>> I feel like the coupling is warranted because these things *are* PCIe
>>> and CXL ports, but it means solving the interrupt distribution problem.
>>>
>>
>> I understand the service driver interrupt issue, but it is not clear how it
>> applies to the CXL port error handling. Can you help me understand how the
>> interrupt issue affects CXL port AER UIE/CIE handling in the AER driver?
> 
> Just the case of the AER MSI/-X vector being multiplexed with other CXL
> functionality on the same device. If the CXL interrupt vector is to be
> enabled later then it means MSI/-X vector enabling needs to be dynamic.
> 
> ...but yeah, not a problem now as we are only talking about PCIe AER
> events and not multiplexing yet. I.e. that problem can be solved later.
> 
>>
>>
>>>>   3.) Refactor portdrv
>>>>   The portdrv refactoring solution is to change the portdrv service drivers
>>>>   into PCIe auxiliary drivers. With this change the facility drivers can be
>>>>   associated with a PCIe driver instead of being bound to the portdrv driver.
>>>>
>>>>   In this case the CXL port functionality would be added either as a CXL
>>>>   auxiliary driver or as a CXL specific port driver
>>>>   (PCI_CLASS_BRIDGE_PCI_NORMAL).
>>>>
>>>>   This solution has challenges in the interrupt allocation by separate
>>>>   auxiliary drivers and in binding of a specific driver. Binding is
>>>>   currently based on PCIe class and would require extending the binding
>>>>   logic to support multiple drivers for the same class.
>>>>
>>>>   Jonathan Cameron is working towards this solution by initially solving
>>>>   for the PMU service driver.[1] It is using the auxiliary bus to associate
>>>>   what were service drivers with the portdrv driver. Using a CXL auxiliary
>>>>   driver for handling CXL port RAS errors would result in RAS logic being
>>>>   called from both the cxl_pci and CXL auxiliary drivers. This may need a
>>>>   library driver.
>>>
>>> I don't think auxiliary bus is a fundamental step forward from pcie
>>> portdrv, it's just a s/pcie_port_bus_type/auxiliary_bus_type/ rename,
>>> but with all the same problems around how to distribute interrupt
>>> services to different interested parties.
>>>
>>> So I think notifiers are interesting from the perspective of a software
>>> hack to enable interrupt distribution. However, given that dynamic MSI-X
>>> support is within reach I am interested in exploring that path and
>>> mandating that archs that want to handle CXL protocol errors natively
>>> need to enable dynamic MSI-X. Otherwise, those platforms should disclaim
>>> native protocol error handling support via CXL _OSC.
>>>
>>> In other words, I expect native dynamic MSI-X support is more
>>> maintainable in the sense of keeping all the code in one notification
>>> domain.
>>>
>>>>   4.) Using a portdrv notifier chain/callback for CIE/UIE
>>>>   (Implemented in this patchset)
>>>>
>>>>   This solution uses a portdrv atomic chain notifier and a cxl_pci
>>>>   callback to handle and log CXL port RAS errors.
>>>
>>> Oh, I will need to look at the cxl_pci tie-in for this. I was
>>> expecting cxl_pci to get involved only in the RCH case because the port
>>> and the endpoint are one and the same object. In the VH case I would only
>>> expect cxl_pci to get involved for its own observed protocol errors, not
>>> those reported upstream from that endpoint.
>>>
>>
>> The CXL port error handling needs a place to live, and there are few options
>> at the moment. Where do you want the CXL port error handlers to reside?
> 
> I need to go understand exactly why cxl_pci is involved in this current
> proposal, but I was thinking it is probably more natural for cxl_port to
> have error handlers.

Ok. I agree, cxl_port is a better location for the handlers.

Regards,
Terry
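
For readers following the thread: the notifier-chain pattern behind solution #4 (a port driver reporting internal errors to registered callbacks) can be illustrated with a minimal userspace sketch. This is not the patchset's code and it does not use the kernel notifier API. Every `sketch_*` name, event code, and return value below is hypothetical and chosen only for illustration, and the sketch ignores the locking an atomic chain would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical event codes: correctable (CIE) and uncorrectable (UIE) internal errors. */
enum sketch_event { SKETCH_CIE = 1, SKETCH_UIE = 2 };

/* Hypothetical return codes; the kernel's notifier API defines its own. */
enum sketch_ret { SKETCH_DONE = 0, SKETCH_OK = 1, SKETCH_STOP = 2 };

struct sketch_notifier {
	int (*call)(struct sketch_notifier *nb, unsigned long event, void *data);
	struct sketch_notifier *next;
};

struct sketch_chain {
	struct sketch_notifier *head;
};

/* Append a callback to the end of the chain. */
static void sketch_chain_register(struct sketch_chain *chain, struct sketch_notifier *nb)
{
	struct sketch_notifier **p = &chain->head;

	assert(nb->call != NULL);
	while (*p)
		p = &(*p)->next;
	nb->next = NULL;
	*p = nb;
}

/* Call each registered callback in order until one returns SKETCH_STOP. */
static int sketch_chain_call(struct sketch_chain *chain, unsigned long event, void *data)
{
	int ret = SKETCH_DONE;

	for (struct sketch_notifier *nb = chain->head; nb; nb = nb->next) {
		ret = nb->call(nb, event, data);
		if (ret == SKETCH_STOP)
			break;
	}
	return ret;
}

/* Error counters, standing in for RAS logging state. */
static unsigned int sketch_cie_count, sketch_uie_count;

/* Stand-in for the CXL-side callback: logs and counts the reported error. */
static int sketch_cxl_port_err(struct sketch_notifier *nb, unsigned long event, void *data)
{
	const char *port = data;

	(void)nb;
	switch (event) {
	case SKETCH_CIE:
		sketch_cie_count++;
		printf("%s: correctable internal error\n", port);
		return SKETCH_OK;
	case SKETCH_UIE:
		sketch_uie_count++;
		printf("%s: uncorrectable internal error\n", port);
		return SKETCH_OK;
	default:
		return SKETCH_DONE;
	}
}

/* Register the listener, report one CIE and one UIE, and return the last result. */
int sketch_demo(void)
{
	struct sketch_chain chain = { .head = NULL };
	struct sketch_notifier nb = { .call = sketch_cxl_port_err, .next = NULL };
	char port[] = "port0";

	sketch_chain_register(&chain, &nb);
	sketch_chain_call(&chain, SKETCH_CIE, port);
	return sketch_chain_call(&chain, SKETCH_UIE, port);
}
```

Calling `sketch_demo()` from a `main()` exercises the whole path. The point of the sketch is the decoupling: the reporting side only knows about the chain, so the listener could live in cxl_pci or cxl_port without changing the reporter.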
