From: Ben Cheatham <benjamin.cheatham@amd.com>
To: Alejandro Lucero Palau <alucerop@amd.com>,
<alejandro.lucero-palau@amd.com>
Cc: <linux-cxl@vger.kernel.org>, <netdev@vger.kernel.org>,
<dan.j.williams@intel.com>, <martin.habets@xilinx.com>,
<edward.cree@amd.com>, <davem@davemloft.net>, <kuba@kernel.org>,
<pabeni@redhat.com>, <edumazet@google.com>
Subject: Re: [PATCH v4 22/26] cxl: allow region creation by type2 drivers
Date: Fri, 18 Oct 2024 11:40:38 -0500
Message-ID: <54dd9faf-0078-4f3f-b31e-a500bcff64ba@amd.com>
In-Reply-To: <22262215-54de-1a36-056b-5854ff05ccc1@amd.com>
On 10/18/24 3:51 AM, Alejandro Lucero Palau wrote:
>
> On 10/17/24 22:49, Ben Cheatham wrote:
>> On 10/17/24 11:52 AM, alejandro.lucero-palau@amd.com wrote:
>>> From: Alejandro Lucero <alucerop@amd.com>
>>>
>>> Creating a CXL region requires userspace intervention through the cxl
>>> sysfs files. Type2 support should allow accelerator drivers to create
>>> such cxl region from kernel code.
>>>
>>> Adding that functionality and integrating it with current support for
>>> memory expanders.
>>>
>>> Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/
>>>
>> So I ran into an issue at this point when using v3 as a base for my own testing. The problem is that
>> you are doing manual region management while not explicitly preventing auto region discovery when
>> devm_cxl_add_memdev() is called (patch 14/26 in this series). This caused some resource allocation
>> conflicts which then caused both the auto region and the manual region set up to fail. To make it more
>> concrete, here's the flow I encountered (I tried something new here, let me know if the ascii
>> is all mangled):
>>
>> devm_cxl_add_memdev() is called
>> │
>> ├───► cxl_mem probes new memdev
>> │     │
>> │     ├─► cxl_mem probe adds new endpoint port
>> │     │
>> │     └─► cxl_mem probe finishes
>> ├───────────────────────────────────────────────► Manual region set up starts (finding free space, etc.)
>> ├───► cxl_port probes the new endpoint port       │
>> │     │                                           │
>> │     ├─► cxl_port probe sets up new endpoint     ├─► create_new_region() is called
>> │     │                                           │
>> │     ├─► cxl_port calls discover_region()        │
>> │     │                                           │
>> │     ├─► discover_region() creates new auto      ├─► create_new_region() creates
>> │     │   discovered region                       │   new manual region
>> │◄────────────────────────────────────────────────┘
>> │
>> └─► Region creation fails due to resource contention/race (DPA resource, RAM resource, etc.)
>>
>> The timeline is a little off here I think, but it should be close enough to illustrate the point.
>
>
> Interesting.
>
>
> I'm aware of that code path when endpoint port is probed, but it is not a problem with my testing because the decoder is not enabled at the time of discover_region.
>
>
> I've tested this with two different emulated devices, one a dumb qemu type2 device with a driver doing nothing but cxl initialization, and another being our network device with CXL support and using RTL emulation, and in both cases the decoder is not enabled at that point, which makes sense since, AFAIK, it is at region creation/attachment when the decoder is committed/enabled. So my obvious question is how are you testing this functionality? It seems as if you could have been creating more than one region somehow, or maybe something I'm just missing about this.
>
I think the reason you aren't seeing this is that QEMU doesn't have regions programmed by firmware. In my setup
the decoders are coming up pre-programmed and enabled by firmware, so it is hitting the path during endpoint probe.
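
To make the difference between the two setups concrete, here is a small user-space sketch of the checks discover_region() makes before it goes on to build an auto region. The types and names (model_decoder, would_auto_discover, skip_devmem) are made up for illustration and are not the kernel definitions. It also assumes a firmware-committed decoder is in the auto state by the time the endpoint port is probed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel enums; not the real definitions. */
enum model_state { MODEL_STATE_MANUAL, MODEL_STATE_AUTO };
enum model_type { MODEL_TYPE_CLASSMEM, MODEL_TYPE_DEVMEM };

struct model_decoder {
	bool enabled;            /* models CXL_DECODER_F_ENABLE being set */
	enum model_state state;  /* models cxled->state */
	enum model_type type;    /* models cxlmd->cxlds->type */
};

/*
 * Returns true when endpoint probe would go on to assemble an auto
 * region for this decoder. skip_devmem models the extra type check
 * from the diff quoted in this thread.
 */
static bool would_auto_discover(const struct model_decoder *d, bool skip_devmem)
{
	assert(d != NULL);

	if (!d->enabled)
		return false;    /* decoder not committed yet: nothing to discover */
	if (d->state != MODEL_STATE_AUTO)
		return false;
	if (skip_devmem && d->type == MODEL_TYPE_DEVMEM)
		return false;    /* leave type 2 devices to their driver */
	return true;
}
```

With a decoder that is not enabled (the QEMU case) the function returns false at the first check, so the race never shows up. With an enabled decoder in the auto state on a type 2 device (my setup, under the assumption above) it returns true unless skip_devmem is set.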
Thanks,
Ben
>
>> The easy solution here is to not allow auto region discovery for CXL type 2 devices, like so:
>>
>> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
>> index 22a9ba89cf5a..07b991e2c05b 100644
>> --- a/drivers/cxl/port.c
>> +++ b/drivers/cxl/port.c
>> @@ -34,6 +34,7 @@ static void schedule_detach(void *cxlmd)
>>  static int discover_region(struct device *dev, void *root)
>>  {
>>  	struct cxl_endpoint_decoder *cxled;
>> +	struct cxl_memdev *cxlmd;
>>  	int rc;
>>
>>  	dev_err(dev, "%s:%d: Enter\n", __func__, __LINE__);
>> @@ -45,7 +46,9 @@ static int discover_region(struct device *dev, void *root)
>>  	if ((cxled->cxld.flags & CXL_DECODER_F_ENABLE) == 0)
>>  		return 0;
>>
>> -	if (cxled->state != CXL_DECODER_STATE_AUTO)
>> +	cxlmd = cxled_to_memdev(cxled);
>> +	if (cxled->state != CXL_DECODER_STATE_AUTO ||
>> +	    cxlmd->cxlds->type == CXL_DEVTYPE_DEVMEM)
>>  		return 0;
>>
>> I think there's a better way to go about this, more to say about it in patch 24/26. I've
>> dropped this here just in case you don't like my ideas there ;).
>>
>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>> ---
>>> drivers/cxl/core/region.c | 147 ++++++++++++++++++++++++++++++++++----
>>> drivers/cxl/cxlmem.h | 2 +
>>> include/linux/cxl/cxl.h | 4 ++
>>> 3 files changed, 138 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index d08a2a848ac9..04c270a29e96 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -2253,6 +2253,18 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
>>> return rc;
>>> }
>>> +int cxl_accel_region_detach(struct cxl_endpoint_decoder *cxled)
>>> +{
>>> + int rc;
>>> +
>>> + down_write(&cxl_region_rwsem);
>>> + cxled->mode = CXL_DECODER_DEAD;
>>> + rc = cxl_region_detach(cxled);
>>> + up_write(&cxl_region_rwsem);
>>> + return rc;
>>> +}
>>> +EXPORT_SYMBOL_NS_GPL(cxl_accel_region_detach, CXL);
>>> +
>>> void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
>>> {
>>> down_write(&cxl_region_rwsem);
>>> @@ -2781,6 +2793,14 @@ cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
>>> return to_cxl_region(region_dev);
>>> }
>>> +static void drop_region(struct cxl_region *cxlr)
>>> +{
>>> + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
>>> + struct cxl_port *port = cxlrd_to_port(cxlrd);
>>> +
>>> + devm_release_action(port->uport_dev, unregister_region, cxlr);
>>> +}
>>> +
>>> static ssize_t delete_region_store(struct device *dev,
>>> struct device_attribute *attr,
>>> const char *buf, size_t len)
>>> @@ -3386,17 +3406,18 @@ static int match_region_by_range(struct device *dev, void *data)
>>> return rc;
>>> }
>>> -/* Establish an empty region covering the given HPA range */
>>> -static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>>> - struct cxl_endpoint_decoder *cxled)
>>> +static void construct_region_end(void)
>>> +{
>>> + up_write(&cxl_region_rwsem);
>>> +}
>>> +
>>> +static struct cxl_region *construct_region_begin(struct cxl_root_decoder *cxlrd,
>>> + struct cxl_endpoint_decoder *cxled)
>>> {
>>> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>>> - struct cxl_port *port = cxlrd_to_port(cxlrd);
>>> - struct range *hpa = &cxled->cxld.hpa_range;
>>> struct cxl_region_params *p;
>>> struct cxl_region *cxlr;
>>> - struct resource *res;
>>> - int rc;
>>> + int err;
>>> do {
>>> cxlr = __create_region(cxlrd, cxled->mode,
>>> @@ -3405,8 +3426,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>>> } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
>>> if (IS_ERR(cxlr)) {
>>> - dev_err(cxlmd->dev.parent,
>>> - "%s:%s: %s failed assign region: %ld\n",
>>> + dev_err(cxlmd->dev.parent, "%s:%s: %s failed assign region: %ld\n",
>>> dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
>>> __func__, PTR_ERR(cxlr));
>>> return cxlr;
>>> @@ -3416,13 +3436,33 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>>> p = &cxlr->params;
>>> if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
>>> dev_err(cxlmd->dev.parent,
>>> - "%s:%s: %s autodiscovery interrupted\n",
>>> + "%s:%s: %s region setup interrupted\n",
>>> dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
>>> __func__);
>>> - rc = -EBUSY;
>>> - goto err;
>>> + err = -EBUSY;
>>> + construct_region_end();
>>> + drop_region(cxlr);
>>> + return ERR_PTR(err);
>>> }
>>> + return cxlr;
>>> +}
>>> +
>>> +/* Establish an empty region covering the given HPA range */
>>> +static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>>> + struct cxl_endpoint_decoder *cxled)
>>> +{
>>> + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>>> + struct range *hpa = &cxled->cxld.hpa_range;
>>> + struct cxl_region_params *p;
>>> + struct cxl_region *cxlr;
>>> + struct resource *res;
>>> + int rc;
>>> +
>>> + cxlr = construct_region_begin(cxlrd, cxled);
>>> + if (IS_ERR(cxlr))
>>> + return cxlr;
>>> +
>>> set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
>>> res = kmalloc(sizeof(*res), GFP_KERNEL);
>>> @@ -3445,6 +3485,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>>> __func__, dev_name(&cxlr->dev));
>>> }
>>> + p = &cxlr->params;
>>> p->res = res;
>>> p->interleave_ways = cxled->cxld.interleave_ways;
>>> p->interleave_granularity = cxled->cxld.interleave_granularity;
>>> @@ -3462,15 +3503,91 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>>> /* ...to match put_device() in cxl_add_to_region() */
>>> get_device(&cxlr->dev);
>>> up_write(&cxl_region_rwsem);
>>> -
>>> + construct_region_end();
>>> return cxlr;
>>> err:
>>> - up_write(&cxl_region_rwsem);
>>> - devm_release_action(port->uport_dev, unregister_region, cxlr);
>>> + construct_region_end();
>>> + drop_region(cxlr);
>>> + return ERR_PTR(rc);
>>> +}
>>> +
>>> +static struct cxl_region *
>>> +__construct_new_region(struct cxl_root_decoder *cxlrd,
>>> + struct cxl_endpoint_decoder *cxled)
>>> +{
>>> + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
>>> + struct cxl_region_params *p;
>>> + struct cxl_region *cxlr;
>>> + int rc;
>>> +
>>> + cxlr = construct_region_begin(cxlrd, cxled);
>>> + if (IS_ERR(cxlr))
>>> + return cxlr;
>>> +
>>> + rc = set_interleave_ways(cxlr, 1);
>>> + if (rc)
>>> + goto err;
>>> +
>>> + rc = set_interleave_granularity(cxlr, cxld->interleave_granularity);
>>> + if (rc)
>>> + goto err;
>>> +
>>> + rc = alloc_hpa(cxlr, resource_size(cxled->dpa_res));
>>> + if (rc)
>>> + goto err;
>>> +
>>> + down_read(&cxl_dpa_rwsem);
>>> + rc = cxl_region_attach(cxlr, cxled, 0);
>>> + up_read(&cxl_dpa_rwsem);
>>> +
>>> + if (rc)
>>> + goto err;
>>> +
>>> + rc = cxl_region_decode_commit(cxlr);
>>> + if (rc)
>>> + goto err;
>>> +
>>> + p = &cxlr->params;
>>> + p->state = CXL_CONFIG_COMMIT;
>>> +
>>> + construct_region_end();
>>> + return cxlr;
>>> +err:
>>> + construct_region_end();
>>> + drop_region(cxlr);
>>> return ERR_PTR(rc);
>>> }
>>> +/**
>>> + * cxl_create_region - Establish a region given an endpoint decoder
>>> + * @cxlrd: root decoder to allocate HPA
>>> + * @cxled: endpoint decoder with reserved DPA capacity
>>> + *
>>> + * Returns a fully formed region in the commit state and attached to the
>>> + * cxl_region driver.
>>> + */
>>> +struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
>>> + struct cxl_endpoint_decoder *cxled)
>>> +{
>>> + struct cxl_region *cxlr;
>>> +
>>> + mutex_lock(&cxlrd->range_lock);
>>> + cxlr = __construct_new_region(cxlrd, cxled);
>>> + mutex_unlock(&cxlrd->range_lock);
>>> +
>>> + if (IS_ERR(cxlr))
>>> + return cxlr;
>>> +
>>> + if (device_attach(&cxlr->dev) <= 0) {
>>> + dev_err(&cxlr->dev, "failed to create region\n");
>>> + drop_region(cxlr);
>>> + return ERR_PTR(-ENODEV);
>>> + }
>>> + return cxlr;
>>> +}
>>> +EXPORT_SYMBOL_NS_GPL(cxl_create_region, CXL);
>>> +
>>> int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
>>> {
>>> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>>> index 68d28eab3696..0f5c71909fd1 100644
>>> --- a/drivers/cxl/cxlmem.h
>>> +++ b/drivers/cxl/cxlmem.h
>>> @@ -875,4 +875,6 @@ struct cxl_hdm {
>>> struct seq_file;
>>> struct dentry *cxl_debugfs_create_dir(const char *dir);
>>> void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds);
>>> +struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
>>> + struct cxl_endpoint_decoder *cxled);
>>> #endif /* __CXL_MEM_H__ */
>>> diff --git a/include/linux/cxl/cxl.h b/include/linux/cxl/cxl.h
>>> index 45b6badb8048..c544339c2baf 100644
>>> --- a/include/linux/cxl/cxl.h
>>> +++ b/include/linux/cxl/cxl.h
>>> @@ -72,4 +72,8 @@ struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
>>> resource_size_t min,
>>> resource_size_t max);
>>> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
>>> +struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
>>> + struct cxl_endpoint_decoder *cxled);
>>> +
>>> +int cxl_accel_region_detach(struct cxl_endpoint_decoder *cxled);
>>> #endif