From: Alejandro Lucero Palau <alucerop@amd.com>
To: Dan Williams <djbw@kernel.org>,
Dan Williams <dan.j.williams@intel.com>,
dave.jiang@intel.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
alejandro.lucero-palau@amd.com
Subject: Re: [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region
Date: Tue, 14 Apr 2026 16:41:46 +0100 [thread overview]
Message-ID: <b592580e-da0f-41fe-9ac5-d8dcec8fc6d8@amd.com> (raw)
In-Reply-To: <69dac05fd8b20_fdcb410011@djbw-dev.notmuch>
On 4/11/26 22:42, Dan Williams wrote:
> Alejandro Lucero Palau wrote:
>> On 4/3/26 22:00, Dan Williams wrote:
>>> To date, platform firmware maps accelerator memory and accelerator drivers
>>> simply want an address range that they can map themselves. This typically
>>> results in a single region being auto-assembled upon registration of a
>>> memory device. Use the @attach mechanism of devm_cxl_add_memdev()
>>> parameter to retrieve that region while also adhering to CXL subsystem
>>> locking and lifetime rules.
>>
>> I doubt this approach is safer than the type2 v25 case.
> I believe it is safer, see evidence below...
It is safer for the case of user space unbinding a memdev device from
cxl_mem racing with the type2 initialization regarding the linked
autodiscovered region to the memdev. But, if I am not misunderstanding
something here, the problem remains ... (below explained).
BTW, I know about unbinding for the virtualization case, with a pci
device being bound to UIO/VFIO instead of to the host vendor-specific
pci driver, but I can not see the point of user space doing this for a
cxl memdev. I know the unbinding is standard functionality but I wonder
if this should be avoided under certain scenarios like this one.
Otherwise, if there exists a reason for allowing this, I would like to know.
>
>> Apparently you are reducing the potential concurrency issue if someone
>> removes cxl_acpi, but I think it is as strong as the v25 approach.
>> With the rwsem lock, the region will exist or it will not. If the
>> removal happens after the driver gets the region, the ioremap can
>> race, with your approach and with the v25 one.
> Yes, ioremap can race and in the attach case when the ioremap is
> invalidated the sfc driver gets unloaded. In v25 it leads to
> use-after-free issues.
I do not follow this. I can not see in the code a link between the
autoremove functionality and that triggering the sfc driver being unload.
And the new function added, cxl_memdev_attach_region(), supposedly being
called by a type2 driver after getting the memdev, does only ensure the
region will be there for updating the attach struct owned by the type2
driver, but as soon as the endpoint device lock is release, that could
not be the case, I mean the region not there anymore but the attach
struct having still the region data.
Also, if the cxl_mem unbinding happens after the type2 driver has
invoked ioremap, the driver will be using something it should not. Won't
it? I do not think that is correct and the type2 driver should somehow
get a notification for stopping using such a range, some sort of detach
function the cxl core can invoke. That is something I unsuccessfully
tried to add in the past:
https://lore.kernel.org/linux-cxl/20250624141355.269056-19-alejandro.lucero-palau@amd.com/
>
>> Also, you are keen to use your attach function for doing the same thing
>> I have, which needs to be justified.
> No, you need to justify usage and export of cxl_core objects without
> taking care of the current lifetime rules. Did you see these reports
> from sashiko about v25? These issues are the motivation to pull
> responsibility away from client drivers.
>
> https://sashiko.dev/#/patchset/20260330143827.1278677-1-alejandro.lucero-palau%40amd.com?part=6
> https://sashiko.dev/#/patchset/20260330143827.1278677-1-alejandro.lucero-palau%40amd.com?part=7
>
> Now, I do not think the "attach" appraoch is the final end state. This
> needs to get simpler still, but it can not get simpler with an
> increasing attack surface.
What I mean is the protection you are adding is doable inside the type2
series without the attach option for cxl memdev.
There is a justification for the protection avoiding racing with
unbinding from user space, but there is no justification for the attach
itself, at least for the sfc case.
>
>> I can not see a reason for that attach functionality with the sfc
>> case. You did not reply to that comment:
>>
>>
>> https://lore.kernel.org/linux-cxl/20260330143827.1278677-1-alejandro.lucero-palau@amd.com/T/#m21968fd0ddf02df3641194d1450cb2bd392def26
> Yes, did not have bandwidth to reply to that beyond sending out the
> attach approach to better contain CXL core complexity.
>
>> And although I have not tested it, I think this is buggy. Removal of
>> cxl_acpi will trigger unregister_region and a subsequent driver exit
>> will invoke, inside the autoremove call linked to the endpoint device,
>> devm_release_action on something not existing anymore.
> If there is a bug here I would like to see that test case because as far
> as I can see it would be shared with the generic expander case.
You are introducing the cxl_endpoint_region_autoremove() linked to the
endpoint device. If cxl_acpi is removed, that will lead to the region
unregistered first, then the delete_endpoint() trying to unregister the
region with devm_release_action() again. That will make a kernel warn at
least.
>> It is the same problem with v25 type2 and potential cxl_acpi removal.
>> Maybe we should avoid linking a type2-related region to the cxl root
>> port ...
>>
>>
>> And IMO, hiding what we are doing to the user/driver is not convenient.
>> Although current type2 is going to rely on a committed decoder, that
>> will not be the case if we look beyond impending needs.
> Yes, start with the simple committed decoder case. Later, follow-on with
> auto-create support. That would have no need to go back and teach
> accelerator drivers how to create regions in the simple case the same
> helper would "just work".
>
>> A function with a name describing this situation can add clarity to
>> the different possibilities to support as a follow up of this initial
>> work. About naming, attaching a region to a memdev is confusing ... as
>> it is already "attached" or linked, and the reason you give for having
>> that explicit attachment not being clear from a safety point of view.
> I would be happy if the only detail left to discuss was naming.
>
> In this case "attach" means "attach to CXL the topology". A PCI driver
> can create a cxl_memdev, but if that cxl_memdev finds no active CXL
> topology then that "attachment" fails.
next prev parent reply other threads:[~2026-04-14 15:42 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-03 21:00 [RFC PATCH 0/4] cxl: Region attach for accelerators Dan Williams
2026-04-03 21:00 ` [RFC PATCH 1/4] cxl/mem: Add support to cleanly continue after attach Dan Williams
2026-04-09 22:02 ` Cheatham, Benjamin
2026-04-11 22:33 ` Dan Williams
2026-04-03 21:00 ` [RFC PATCH 2/4] cxl/region: Move region lock error code to -EBUSY Dan Williams
2026-04-03 21:00 ` [RFC PATCH 3/4] cxl/region: Block region delete for locked regions Dan Williams
2026-04-03 21:00 ` [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region Dan Williams
2026-04-07 10:25 ` Alejandro Lucero Palau
2026-04-11 21:42 ` Dan Williams
2026-04-14 15:41 ` Alejandro Lucero Palau [this message]
2026-04-09 22:02 ` Cheatham, Benjamin
2026-04-11 23:02 ` Dan Williams
2026-04-12 8:57 ` Lukas Wunner
2026-04-13 14:25 ` Cheatham, Benjamin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b592580e-da0f-41fe-9ac5-d8dcec8fc6d8@amd.com \
--to=alucerop@amd.com \
--cc=alejandro.lucero-palau@amd.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=djbw@kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox