netdev.vger.kernel.org archive mirror
From: <dan.j.williams@intel.com>
To: Alejandro Lucero Palau <alucerop@amd.com>,
	Dave Jiang <dave.jiang@intel.com>,
	<alejandro.lucero-palau@amd.com>, <linux-cxl@vger.kernel.org>,
	<netdev@vger.kernel.org>, <dan.j.williams@intel.com>,
	<edward.cree@amd.com>, <davem@davemloft.net>, <kuba@kernel.org>,
	<pabeni@redhat.com>, <edumazet@google.com>
Subject: Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
Date: Mon, 28 Jul 2025 10:45:11 -0700
Message-ID: <6887b72724173_11968100cb@dwillia2-mobl4.notmuch>
In-Reply-To: <a548d317-887f-4b95-a911-4178eee92f0f@amd.com>

Alejandro Lucero Palau wrote:
[..]
> > Can you please explain how the accelerator driver init path is
> > different in this instance that it requires cxl_mem driver to defer
> > probing? Currently with a type3, the cxl_acpi driver will setup the
> > CXL root, hostbridges and PCI root ports. At that point the memdev
> > driver will enumerate the rest of the ports and attempt to establish
> > the hierarchy. However if cxl_acpi is not done, the mem probe will
> > fail. But, the cxl_acpi probe will trigger a re-probe sequence at
> > the end when it is done. At that point, the mem probe should
> > discover all the necessary ports if things are correct. If the
> > accelerator init path is different, can we introduce some
> > documentation to explain the difference?

The biggest difference is that devm_cxl_add_memdev() is "hopeful" in the
cxl_pci case. I.e. cxl_pci_probe() does not fail if the memory device it
registered never passes cxl_mem_probe().

Accelerators are different. They want to know that the CXL side of the
house is up and running before enabling driver features that depend on
it. They also want to safely tear down driver functionality if CXL
capabilities disappear.

cxl_pci does not know or care if or when cxl_mem::probe() succeeds or
cxl_mem::remove() is invoked.
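
As a rough illustration of the two policies described above, here is a
small userspace model. It is not kernel code: struct model_memdev,
struct model_accel, hopeful_probe(), strict_probe() and
strict_cxl_removed() are all hypothetical names invented for this
sketch, and the only thing modelled is whether the probe outcome
depends on cxl_mem having bound.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a registered memdev; not a kernel type. */
struct model_memdev {
	bool cxl_mem_bound;	/* did the cxl_mem driver's probe succeed? */
};

/* Hypothetical accelerator driver state. */
struct model_accel {
	bool cxl_features_enabled;
};

/*
 * "Hopeful" policy, as described for cxl_pci: registering the memdev is
 * enough, and the probe result does not depend on cxl_mem binding.
 */
static int hopeful_probe(const struct model_memdev *md)
{
	(void)md;	/* outcome of the cxl_mem probe is deliberately ignored */
	return 0;
}

/*
 * "Strict" policy, as an accelerator might want: CXL-dependent features
 * are enabled only when the CXL side is known to be up. Returns whether
 * those features were enabled.
 */
static bool strict_probe(struct model_accel *accel,
			 const struct model_memdev *md)
{
	accel->cxl_features_enabled = md->cxl_mem_bound;
	return accel->cxl_features_enabled;
}

/* Teardown hook for the case where CXL capabilities later disappear. */
static void strict_cxl_removed(struct model_accel *accel)
{
	accel->cxl_features_enabled = false;
}
```

In this model the hopeful caller never observes cxl_mem_bound, while
the strict caller both gates its features on it and has a hook for
removal.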

> > Also, it seems as long as port topology is not found, it will always
> > go to deferred probing. At what point do we conclude that things may
> > be missing/broken and we need to fail?

Right, at some point the driver needs to give up on CXL ever arriving.

 
> Hi Dave,
> 
> 
> The patch commit comes from Dan's original one, so I'm afraid I can not 
> explain it better myself.
> 
> 
> I added this patch again after Dan suggesting with cxl_acquire_endpoint 
> the initialization by a Type2 can obtain some protection against cxl_mem 
> or cxl_acpi being removed. I added later protection or handling against 
> this by the sfc driver after initialization. So this is the main reason 
> for this patch at least to me.
> 
> 
> Regarding the goal from the original patch, being honest, I can not see 
> the cxl_acpi problem, although I'm not saying it does not exist. But it 
> is quite confusing to me and as I said in another patch regarding probe 
> deferral, supporting that option would add complexity to the current sfc 
> driver probing. If there exists another workaround for avoiding it, that 
> would be the way I prefer to follow.

The problem is how to handle the "CXL device in PCIe-only mode" case.
Even with a CXL endpoint directly attached to a CXL host there is no
guarantee that the device trains the link in CXL mode. So in addition to
the software-dynamic problems of module loading and asynchronous driver
bind/unbind, there is this hardware-dynamic problem.

I am losing my nerve with the cxl_acquire_endpoint() approach. Now that
I see how this driver tried to use it and the questions it generated, it
pushes too much complexity to leaf drivers. In the end, I want to
(inspired by faux_device) get to the point where the caller can assume
that successful devm_cxl_add_memdev() means that CXL is operational and
any non-interleaved CXL regions have finished auto-assembly/creation.

To get there this needs Terry's patches that set pdev->is_cxl on all
ancestor devices in order to make a determination that the hardware-CXL
link is up before going to flush software CXL-link establishment.
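
The ancestor check described above could look roughly like the
following userspace model. The source only says that is_cxl would be
set on all ancestor devices; the struct layout, the boolean type of the
flag, and the helper name ancestors_all_cxl() are assumptions made for
this sketch and are not taken from Terry's patches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PCI device node; only what the sketch needs. */
struct model_pdev {
	struct model_pdev *parent;	/* upstream device, NULL at the root */
	bool is_cxl;			/* assumed: link trained in CXL mode */
};

/*
 * Return true only if every ancestor of @pdev reports is_cxl. A single
 * PCIe-only hop upstream means the hardware CXL link is not up end to
 * end, so there is no point waiting for software CXL enumeration.
 */
static bool ancestors_all_cxl(const struct model_pdev *pdev)
{
	const struct model_pdev *p;

	for (p = pdev->parent; p; p = p->parent)
		if (!p->is_cxl)
			return false;
	return true;
}
```

Note that this model inspects only the ancestors, not the endpoint
itself, and is vacuously true for a device with no parent; whether the
real check should include the endpoint is left open here.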

> Adding documentation about all this would definitely help, even without 
> the Type2 case.

I would ask that you help Terry get the protocol error handling series
in shape, as part of the dependency here is to make sure that there is a
capable error model for CXL link events.

Meanwhile, I am going to rework devm_cxl_add_memdev() to make it report
when CXL port arrival is deferred, permanently failed, or successful.
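
One possible shape for such a three-way result, sketched as a userspace
model: the enum, its member names, and model_arrival_to_errno() are
hypothetical, and the reworked devm_cxl_add_memdev() may report these
states in a different way entirely. EPROBE_DEFER is a kernel-internal
errno (517) that userspace <errno.h> does not provide, so it is defined
locally; the choice of -ENXIO for permanent failure is an assumption.

```c
#include <errno.h>

/* Kernel-internal value, absent from userspace <errno.h>. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical outcome of waiting for the CXL port hierarchy. */
enum model_cxl_arrival {
	MODEL_CXL_READY,	/* ports enumerated, CXL is operational */
	MODEL_CXL_DEFERRED,	/* not there yet, retrying may help */
	MODEL_CXL_FAILED,	/* CXL will never arrive for this device */
};

/* Map the three states onto a probe-style return code. */
static int model_arrival_to_errno(enum model_cxl_arrival state)
{
	switch (state) {
	case MODEL_CXL_READY:
		return 0;
	case MODEL_CXL_DEFERRED:
		return -EPROBE_DEFER;
	case MODEL_CXL_FAILED:
		return -ENXIO;	/* assumed error code for "gave up" */
	}
	return -EINVAL;		/* unknown state */
}
```

Separating "deferred" from "failed" is what lets a leaf driver stop
waiting: only the deferred state is worth a retry, while the failed
state is the point at which the driver gives up on CXL.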

Thread overview: 113+ messages
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
2025-06-24 14:13 ` [PATCH v17 01/22] cxl: Add type2 " alejandro.lucero-palau
2025-06-25 14:06   ` Jonathan Cameron
2025-06-30 14:38     ` Alejandro Lucero Palau
2025-07-25 21:46   ` dan.j.williams
2025-08-05 10:45     ` Alejandro Lucero Palau
2025-08-05 15:14       ` Dave Jiang
2025-06-24 14:13 ` [PATCH v17 02/22] sfc: add cxl support alejandro.lucero-palau
2025-06-25 16:37   ` Jonathan Cameron
2025-06-30 14:52     ` Alejandro Lucero Palau
2025-06-30 14:55       ` Alejandro Lucero Palau
2025-06-30 16:07         ` Jonathan Cameron
2025-07-25 22:16   ` dan.j.williams
2025-08-06  8:37     ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 03/22] cxl: Move pci generic code alejandro.lucero-palau
2025-07-25 22:41   ` dan.j.williams
2025-08-06  8:46     ` Alejandro Lucero Palau
2025-08-06  9:31       ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 04/22] cxl: allow Type2 drivers to map cxl component regs alejandro.lucero-palau
2025-06-27  8:27   ` Jonathan Cameron
2025-07-25 22:55   ` dan.j.williams
2025-07-28 16:23     ` Dave Jiang
2025-08-06  9:43       ` Alejandro Lucero Palau
2025-08-06  9:41     ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 05/22] sfc: setup cxl component regs and set media ready alejandro.lucero-palau
2025-06-27  8:39   ` Jonathan Cameron
2025-06-30 15:57     ` Alejandro Lucero Palau
2025-08-08 13:11       ` Alejandro Lucero Palau
2025-06-27  8:45   ` Jonathan Cameron
2025-08-08 13:14     ` Alejandro Lucero Palau
2025-07-25 23:04   ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
2025-06-27  8:42   ` Jonathan Cameron
2025-06-27 16:43     ` Dave Jiang
2025-07-01 15:23     ` Alejandro Lucero Palau
2025-06-27  8:43   ` Jonathan Cameron
2025-07-01 15:25     ` Alejandro Lucero Palau
2025-07-26  0:54   ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 07/22] sfc: initialize dpa alejandro.lucero-palau
2025-07-26  0:55   ` dan.j.williams
2025-08-08 16:59     ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 08/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
2025-07-26  1:05   ` dan.j.williams
2025-08-08 17:01     ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 09/22] sfc: create type2 cxl memdev alejandro.lucero-palau
2025-06-27  8:51   ` Jonathan Cameron
2025-07-01 15:30     ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 10/22] cx/memdev: Indicate probe deferral alejandro.lucero-palau
2025-06-27  8:59   ` Jonathan Cameron
2025-06-27  9:42   ` Jonathan Cameron
2025-07-01 15:30     ` Alejandro Lucero Palau
2025-06-27 18:17   ` Dave Jiang
2025-06-30 16:20     ` Jonathan Cameron
2025-07-01 16:07       ` Alejandro Lucero Palau
2025-07-01 16:25         ` Dave Jiang
2025-07-01 16:44           ` Jonathan Cameron
2025-07-01 16:02     ` Alejandro Lucero Palau
2025-07-28 17:45       ` dan.j.williams [this message]
2025-07-30  3:46         ` dan.j.williams
2025-08-09 11:24         ` Alejandro Lucero Palau
2025-07-16 22:52   ` Dave Jiang
2025-06-24 14:13 ` [PATCH v17 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
2025-06-27 22:42   ` Dave Jiang
2025-07-04 14:45     ` Alejandro Lucero Palau
2025-08-05 16:14   ` dan.j.williams
2025-08-11 12:04     ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 12/22] sfc: get endpoint decoder alejandro.lucero-palau
2025-06-27  9:10   ` Jonathan Cameron
2025-07-04 14:51     ` Alejandro Lucero Palau
2025-07-28 16:30   ` dan.j.williams
2025-08-11 14:24     ` Alejandro Lucero Palau
2025-09-02  7:11       ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
2025-06-27  9:06   ` Jonathan Cameron
2025-07-04 15:18     ` Alejandro Lucero Palau
2025-06-27 20:46   ` Dave Jiang
2025-07-04 15:21     ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 14/22] sfc: get endpoint decoder alejandro.lucero-palau
2025-06-27  9:11   ` Jonathan Cameron
2025-07-07 11:24     ` Alejandro Lucero Palau
2025-07-16 23:48   ` Dave Jiang
2025-06-24 14:13 ` [PATCH v17 15/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
2025-09-03 17:20   ` Davidlohr Bueso
2025-06-24 14:13 ` [PATCH v17 16/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
2025-06-27  9:13   ` Jonathan Cameron
2025-06-27 23:05     ` Dave Jiang
2025-06-30 16:20       ` Jonathan Cameron
2025-06-30 16:34         ` Dave Jiang
2025-06-24 14:13 ` [PATCH v17 17/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
2025-06-24 14:13 ` [PATCH v17 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
2025-06-27  9:32   ` Jonathan Cameron
2025-07-07 11:31     ` Alejandro Lucero Palau
2025-08-05 16:33   ` dan.j.williams
2025-08-11 14:45     ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 19/22] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
2025-06-27  9:33   ` Jonathan Cameron
2025-09-03 17:24   ` Davidlohr Bueso
2025-06-24 14:13 ` [PATCH v17 20/22] sfc: create cxl region alejandro.lucero-palau
2025-06-27  9:38   ` Jonathan Cameron
2025-07-07 11:37     ` Alejandro Lucero Palau
2025-07-28 16:20   ` dan.j.williams
2025-08-11 14:38     ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 21/22] cxl: Add function for obtaining region range alejandro.lucero-palau
2025-06-24 14:13 ` [PATCH v17 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
2025-06-27  9:46   ` Jonathan Cameron
2025-07-07 12:06     ` Alejandro Lucero Palau
2025-08-27 17:26   ` ALOK TIWARI
2025-07-25 20:51 ` [PATCH v17 00/22] Type2 device basic support dan.j.williams
2025-07-25 21:11   ` dan.j.williams
2025-08-27 16:48 ` PJ Waskiewicz
2025-08-28  8:02   ` Alejandro Lucero Palau
2025-09-04 17:48     ` PJ Waskiewicz
2025-09-05 23:23     ` PJ Waskiewicz
