From: Alejandro Lucero Palau <alucerop@amd.com>
To: Zhi Wang <zhiw@nvidia.com>, alejandro.lucero-palau@amd.com
Cc: linux-cxl@vger.kernel.org, netdev@vger.kernel.org,
dan.j.williams@intel.com, martin.habets@xilinx.com,
edward.cree@amd.com, davem@davemloft.net, kuba@kernel.org,
pabeni@redhat.com, edumazet@google.com
Subject: Re: [PATCH v3 10/20] cxl: indicate probe deferral
Date: Mon, 16 Sep 2024 11:08:17 +0100 [thread overview]
Message-ID: <f43a3b62-2f36-6165-c142-b0bfc6fc32e6@amd.com> (raw)
In-Reply-To: <20240912121908.000054dc.zhiw@nvidia.com>
On 9/12/24 10:19, Zhi Wang wrote:
> On Sat, 7 Sep 2024 09:18:26 +0100
> <alejandro.lucero-palau@amd.com> wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
> Hi Alejandro:
>
> When working with V2, I noticed that if CONFIG_CXL_MEM=m and cxl_mem.ko
> is not loaded, loading the type-2 driver would fail on
> cxl_acquire_endpoint(). Not sure if you met the same problem.
I think I have some problems with kernel build depending on if CXL code
is configured as modules, and even if CXL is not configured at all, what
it was raised by the kernel build robot.
I'll work on this for v4.
Thanks!
> Now we are waiting for it to be loaded, it seems not ideal with the
> problem.
>
> Thanks,
> Zhi.
>
>> The first stop for a CXL accelerator driver that wants to establish
>> new CXL.mem regions is to register a 'struct cxl_memdev. That kicks
>> off cxl_mem_probe() to enumerate all 'struct cxl_port' instances in
>> the topology up to the root.
>>
>> If the root driver has not attached yet the expectation is that the
>> driver waits until that link is established. The common cxl_pci_driver
>> has reason to keep the 'struct cxl_memdev' device attached to the bus
>> until the root driver attaches. An accelerator may want to instead
>> defer probing until CXL resources can be acquired.
>>
>> Use the @endpoint attribute of a 'struct cxl_memdev' to convey when
>> accelerator driver probing should be deferred vs failed. Provide that
>> indication via a new cxl_acquire_endpoint() API that can retrieve the
>> probe status of the memdev.
>>
>> Based on
>> https://lore.kernel.org/linux-cxl/168592155270.1948938.11536845108449547920.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>> drivers/cxl/core/memdev.c | 67
>> +++++++++++++++++++++++++++++++++++++++ drivers/cxl/core/port.c |
>> 2 +- drivers/cxl/mem.c | 4 ++-
>> include/linux/cxl/cxl.h | 2 ++
>> 4 files changed, 73 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
>> index 5f8418620b70..d4406cf3ed32 100644
>> --- a/drivers/cxl/core/memdev.c
>> +++ b/drivers/cxl/core/memdev.c
>> @@ -5,6 +5,7 @@
>> #include <linux/io-64-nonatomic-lo-hi.h>
>> #include <linux/firmware.h>
>> #include <linux/device.h>
>> +#include <linux/delay.h>
>> #include <linux/slab.h>
>> #include <linux/idr.h>
>> #include <linux/pci.h>
>> @@ -23,6 +24,8 @@ static DECLARE_RWSEM(cxl_memdev_rwsem);
>> static int cxl_mem_major;
>> static DEFINE_IDA(cxl_memdev_ida);
>>
>> +static unsigned short endpoint_ready_timeout = HZ;
>> +
>> static void cxl_memdev_release(struct device *dev)
>> {
>> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>> @@ -1163,6 +1166,70 @@ struct cxl_memdev *devm_cxl_add_memdev(struct
>> device *host, }
>> EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, CXL);
>>
>> +/*
>> + * Try to get a locked reference on a memdev's CXL port topology
>> + * connection. Be careful to observe when cxl_mem_probe() has
>> deposited
>> + * a probe deferral awaiting the arrival of the CXL root driver.
>> + */
>> +struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd)
>> +{
>> + struct cxl_port *endpoint;
>> + unsigned long timeout;
>> + int rc = -ENXIO;
>> +
>> + /*
>> + * A memdev creation triggers ports creation through the
>> kernel
>> + * device object model. An endpoint port could not be
>> created yet
>> + * but coming. Wait here for a gentle space of time for
>> ensuring
>> + * and endpoint port not there is due to some error and not
>> because
>> + * the race described.
>> + *
>> + * Note this is a similar case this function is implemented
>> for, but
>> + * instead of the race with the root port, this is against
>> its own
>> + * endpoint port.
>> + */
>> + timeout = jiffies + endpoint_ready_timeout;
>> + do {
>> + device_lock(&cxlmd->dev);
>> + endpoint = cxlmd->endpoint;
>> + if (endpoint)
>> + break;
>> + device_unlock(&cxlmd->dev);
>> + if (msleep_interruptible(100)) {
>> + device_lock(&cxlmd->dev);
>> + break;
>> + }
>> + } while (!time_after(jiffies, timeout));
>> +
>> + if (!endpoint)
>> + goto err;
>> +
>> + if (IS_ERR(endpoint)) {
>> + rc = PTR_ERR(endpoint);
>> + goto err;
>> + }
>> +
>> + device_lock(&endpoint->dev);
>> + if (!endpoint->dev.driver)
>> + goto err_endpoint;
>> +
>> + return endpoint;
>> +
>> +err_endpoint:
>> + device_unlock(&endpoint->dev);
>> +err:
>> + device_unlock(&cxlmd->dev);
>> + return ERR_PTR(rc);
>> +}
>> +EXPORT_SYMBOL_NS(cxl_acquire_endpoint, CXL);
>> +
>> +void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port
>> *endpoint) +{
>> + device_unlock(&endpoint->dev);
>> + device_unlock(&cxlmd->dev);
>> +}
>> +EXPORT_SYMBOL_NS(cxl_release_endpoint, CXL);
>> +
>> static void sanitize_teardown_notifier(void *data)
>> {
>> struct cxl_memdev_state *mds = data;
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index 39b20ddd0296..ca2c993faa9c 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -1554,7 +1554,7 @@ static int add_port_attach_ep(struct cxl_memdev
>> *cxlmd, */
>> dev_dbg(&cxlmd->dev, "%s is a root dport\n",
>> dev_name(dport_dev));
>> - return -ENXIO;
>> + return -EPROBE_DEFER;
>> }
>>
>> parent_port = find_cxl_port(dparent, &parent_dport);
>> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
>> index 5c7ad230bccb..56fd7a100c2f 100644
>> --- a/drivers/cxl/mem.c
>> +++ b/drivers/cxl/mem.c
>> @@ -145,8 +145,10 @@ static int cxl_mem_probe(struct device *dev)
>> return rc;
>>
>> rc = devm_cxl_enumerate_ports(cxlmd);
>> - if (rc)
>> + if (rc) {
>> + cxlmd->endpoint = ERR_PTR(rc);
>> return rc;
>> + }
>>
>> parent_port = cxl_mem_find_port(cxlmd, &dport);
>> if (!parent_port) {
>> diff --git a/include/linux/cxl/cxl.h b/include/linux/cxl/cxl.h
>> index fc0859f841dc..7e4580fb8659 100644
>> --- a/include/linux/cxl/cxl.h
>> +++ b/include/linux/cxl/cxl.h
>> @@ -57,4 +57,6 @@ int cxl_release_resource(struct cxl_dev_state
>> *cxlds, enum cxl_resource type); void cxl_set_media_ready(struct
>> cxl_dev_state *cxlds); struct cxl_memdev *devm_cxl_add_memdev(struct
>> device *host, struct cxl_dev_state *cxlds);
>> +struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd);
>> +void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port
>> *endpoint); #endif
next prev parent reply other threads:[~2024-09-16 10:09 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-07 8:18 [PATCH v3 00/20] cxl: add Type2 device support alejandro.lucero-palau
2024-09-07 8:18 ` [PATCH v3 01/20] cxl: add type2 device basic support alejandro.lucero-palau
2024-09-07 20:26 ` kernel test robot
2024-09-10 6:12 ` Li, Ming4
2024-09-10 7:25 ` Alejandro Lucero Palau
2024-09-12 8:57 ` Zhi Wang
2024-09-16 9:52 ` Alejandro Lucero Palau
2024-09-12 9:35 ` Zhi Wang
2024-09-16 10:03 ` Alejandro Lucero Palau
2024-09-13 16:41 ` Jonathan Cameron
2024-09-16 12:03 ` Alejandro Lucero Palau
2024-09-16 12:24 ` Jonathan Cameron
2024-09-07 8:18 ` [PATCH v3 02/20] cxl: add capabilities field to cxl_dev_state and cxl_port alejandro.lucero-palau
2024-09-07 18:08 ` kernel test robot
2024-09-11 22:17 ` Dave Jiang
2024-09-16 8:36 ` Alejandro Lucero Palau
2024-09-16 16:07 ` Dave Jiang
2024-09-13 17:25 ` Jonathan Cameron
2024-09-16 12:13 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 03/20] cxl/pci: add check for validating capabilities alejandro.lucero-palau
2024-09-10 3:26 ` Li, Ming4
2024-09-10 6:24 ` Li, Ming4
2024-09-10 7:31 ` Alejandro Lucero Palau
2024-09-11 23:06 ` Dave Jiang
2024-09-16 8:56 ` Alejandro Lucero Palau
2024-09-16 16:11 ` Dave Jiang
2024-09-13 17:28 ` Jonathan Cameron
2024-09-16 12:17 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 04/20] cxl: move pci generic code alejandro.lucero-palau
2024-09-11 23:55 ` Dave Jiang
2024-09-16 9:46 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 05/20] cxl: add function for type2 cxl regs setup alejandro.lucero-palau
2024-09-10 6:00 ` Li, Ming4
2024-09-10 7:24 ` Alejandro Lucero Palau
2024-09-12 9:08 ` Zhi Wang
2024-09-13 17:32 ` Jonathan Cameron
2024-09-16 12:23 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 06/20] cxl: add functions for resource request/release by a driver alejandro.lucero-palau
2024-09-10 6:15 ` Li, Ming4
2024-09-16 8:15 ` Alejandro Lucero Palau
2024-09-13 17:35 ` Jonathan Cameron
2024-09-16 12:33 ` Alejandro Lucero Palau
2024-09-16 13:21 ` Jonathan Cameron
2024-09-07 8:18 ` [PATCH v3 07/20] cxl: harden resource_contains checks to handle zero size resources alejandro.lucero-palau
2024-09-13 17:36 ` Jonathan Cameron
2024-09-16 12:36 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 08/20] cxl: add function for setting media ready by a driver alejandro.lucero-palau
2024-09-07 8:18 ` [PATCH v3 09/20] cxl: support type2 memdev creation alejandro.lucero-palau
2024-09-12 18:19 ` Dave Jiang
2024-09-16 12:38 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 10/20] cxl: indicate probe deferral alejandro.lucero-palau
2024-09-10 6:37 ` Li, Ming4
2024-09-16 8:24 ` Alejandro Lucero Palau
2024-09-17 3:31 ` Li, Ming4
2024-09-17 9:16 ` Alejandro Lucero Palau
2024-09-12 9:19 ` Zhi Wang
2024-09-16 10:08 ` Alejandro Lucero Palau [this message]
2024-09-13 17:43 ` Jonathan Cameron
2024-09-16 13:24 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 11/20] cxl: define a driver interface for HPA free space enumaration alejandro.lucero-palau
2024-09-13 17:52 ` Jonathan Cameron
2024-09-16 14:09 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 12/20] efx: use acquire_endpoint when looking for free HPA alejandro.lucero-palau
2024-09-07 19:33 ` kernel test robot
2024-09-12 23:09 ` Dave Jiang
2024-09-16 10:29 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 13/20] cxl: define a driver interface for DPA allocation alejandro.lucero-palau
2024-09-13 17:59 ` Jonathan Cameron
2024-09-16 14:26 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 14/20] cxl: make region type based on endpoint type alejandro.lucero-palau
2024-09-07 8:18 ` [PATCH v3 15/20] cxl/region: factor out interleave ways setup alejandro.lucero-palau
2024-09-07 8:18 ` [PATCH v3 16/20] cxl/region: factor out interleave granularity setup alejandro.lucero-palau
2024-09-07 8:18 ` [PATCH v3 17/20] cxl: allow region creation by type2 drivers alejandro.lucero-palau
2024-09-13 18:08 ` Jonathan Cameron
2024-09-16 16:31 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 18/20] cxl: preclude device memory to be used for dax alejandro.lucero-palau
2024-09-13 17:26 ` Dave Jiang
2024-09-16 14:32 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 19/20] cxl: add function for obtaining params from a region alejandro.lucero-palau
2024-09-13 17:48 ` Dave Jiang
2024-09-16 16:22 ` Alejandro Lucero Palau
2024-09-07 8:18 ` [PATCH v3 20/20] efx: support pio mapping based on cxl alejandro.lucero-palau
2024-09-13 17:45 ` Edward Cree
2024-09-16 16:12 ` Alejandro Lucero Palau
2024-09-13 17:52 ` Dave Jiang
2024-09-16 16:23 ` Alejandro Lucero Palau
2024-09-13 18:10 ` Jonathan Cameron
2024-09-16 16:23 ` Alejandro Lucero Palau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f43a3b62-2f36-6165-c142-b0bfc6fc32e6@amd.com \
--to=alucerop@amd.com \
--cc=alejandro.lucero-palau@amd.com \
--cc=dan.j.williams@intel.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=edward.cree@amd.com \
--cc=kuba@kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=martin.habets@xilinx.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=zhiw@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox