public inbox for linux-cxl@vger.kernel.org
 help / color / mirror / Atom feed
From: Alejandro Lucero Palau <alucerop@amd.com>
To: Dan Williams <djbw@kernel.org>,
	Dan Williams <dan.j.williams@intel.com>,
	dave.jiang@intel.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
	alejandro.lucero-palau@amd.com
Subject: Re: [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region
Date: Tue, 14 Apr 2026 16:41:46 +0100	[thread overview]
Message-ID: <b592580e-da0f-41fe-9ac5-d8dcec8fc6d8@amd.com> (raw)
In-Reply-To: <69dac05fd8b20_fdcb410011@djbw-dev.notmuch>


On 4/11/26 22:42, Dan Williams wrote:
> Alejandro Lucero Palau wrote:
>> On 4/3/26 22:00, Dan Williams wrote:
>>> To date, platform firmware maps accelerator memory and accelerator drivers
>>> simply want an address range that they can map themselves. This typically
>>> results in a single region being auto-assembled upon registration of a
>>> memory device.  Use the @attach mechanism of devm_cxl_add_memdev()
>>> parameter to retrieve that region while also adhering to CXL subsystem
>>> locking and lifetime rules.
>>
>> I doubt this approach is safer than the type2 v25 case.
> I believe it is safer, see evidence below...


It is safer for the case of user space unbinding a memdev device from 
cxl_mem racing with the type2 initialization regarding the linked 
autodiscovered region to the memdev. But, if I am not misunderstanding 
something here, the problem remains ... (below explained).


BTW, I know about unbinding for the virtualization case, with a pci 
device being bound to UIO/VFIO instead of to the host vendor-specific 
pci driver, but I can not see the point of user space doing this for a 
cxl memdev. I know the unbinding is standard functionality but I wonder 
if this should be avoided under certain scenarios like this one. 
Otherwise, if there exists a reason for allowing this, I would like to know.


>
>> Apparently you are reducing the potential concurrency issue if someone
>> removes cxl_acpi, but I think it is as strong as the v25 approach.
>> With the rwsem lock, the region will exist or it will not. If the
>> removal happens after the driver gets the region, the ioremap can
>> race, with your approach and with the v25 one.
> Yes, ioremap can race and in the attach case when the ioremap is
> invalidated the sfc driver gets unloaded. In v25 it leads to
> use-after-free issues.


I do not follow this. I can not see in the code a link between the 
autoremove functionality and that triggering the sfc driver being unload.


And the new function added, cxl_memdev_attach_region(), supposedly being 
called by a type2 driver after getting the memdev, does only ensure the 
region will be there for updating the attach struct owned by the type2 
driver, but as soon as the endpoint device lock is release, that could 
not be the case, I mean the region not there anymore but the attach 
struct having still the region data.


Also, if the cxl_mem unbinding happens after the type2 driver has 
invoked ioremap, the driver will be using something it should not. Won't 
it? I do not think that is correct and the type2 driver should somehow 
get a notification for stopping using such a range, some sort of detach 
function the cxl core can invoke. That is something I unsuccessfully 
tried to add in the past:


https://lore.kernel.org/linux-cxl/20250624141355.269056-19-alejandro.lucero-palau@amd.com/



>
>> Also, you are keen to use your attach function for doing the same thing
>> I have, which needs to be justified.
> No, you need to justify usage and export of cxl_core objects without
> taking care of the current lifetime rules. Did you see these reports
> from sashiko about v25? These issues are the motivation to pull
> responsibility away from client drivers.
>
> https://sashiko.dev/#/patchset/20260330143827.1278677-1-alejandro.lucero-palau%40amd.com?part=6
> https://sashiko.dev/#/patchset/20260330143827.1278677-1-alejandro.lucero-palau%40amd.com?part=7
>
> Now, I do not think the "attach" appraoch is the final end state. This
> needs to get simpler still, but it can not get simpler with an
> increasing attack surface.


What I mean is the protection you are adding is doable inside the type2 
series without the attach option for cxl memdev.

There is a justification for the protection avoiding racing with 
unbinding from user space, but there is no justification for the attach 
itself, at least for the sfc case.


>
>> I can not see a reason for that attach functionality with the sfc
>> case. You did not reply to that comment:
>>
>>
>> https://lore.kernel.org/linux-cxl/20260330143827.1278677-1-alejandro.lucero-palau@amd.com/T/#m21968fd0ddf02df3641194d1450cb2bd392def26
> Yes, did not have bandwidth to reply to that beyond sending out the
> attach approach to better contain CXL core complexity.
>
>> And although I have not tested it, I think this is buggy. Removal of
>> cxl_acpi will trigger unregister_region and a subsequent driver exit
>> will invoke, inside the autoremove call linked to the endpoint device,
>> devm_release_action on something not existing anymore.
> If there is a bug here I would like to see that test case because as far
> as I can see it would be shared with the generic expander case.


You are introducing the cxl_endpoint_region_autoremove() linked to the 
endpoint device. If cxl_acpi is removed, that will lead to the region 
unregistered first, then the delete_endpoint() trying to unregister the 
region with devm_release_action() again. That will make a kernel warn at 
least.


>> It is the same problem with v25 type2 and potential cxl_acpi removal.
>> Maybe we should avoid linking a type2-related region to the cxl root
>> port ...
>>
>>
>> And IMO, hiding what we are doing to the user/driver is not convenient.
>> Although current type2 is going to rely on a committed decoder, that
>> will not be the case if we look beyond impending needs.
> Yes, start with the simple committed decoder case. Later, follow-on with
> auto-create support. That would have no need to go back and teach
> accelerator drivers how to create regions in the simple case the same
> helper would "just work".
>
>> A function with a name describing this situation can add clarity to
>> the different possibilities to support as a follow up of this initial
>> work. About naming, attaching a region to a memdev is confusing ... as
>> it is already "attached" or linked, and the reason you give for having
>> that explicit attachment not being clear from a safety point of view.
> I would be happy if the only detail left to discuss was naming.
>
> In this case "attach" means "attach to CXL the topology". A PCI driver
> can create a cxl_memdev, but if that cxl_memdev finds no active CXL
> topology then that "attachment" fails.

  reply	other threads:[~2026-04-14 15:42 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-03 21:00 [RFC PATCH 0/4] cxl: Region attach for accelerators Dan Williams
2026-04-03 21:00 ` [RFC PATCH 1/4] cxl/mem: Add support to cleanly continue after attach Dan Williams
2026-04-09 22:02   ` Cheatham, Benjamin
2026-04-11 22:33     ` Dan Williams
2026-04-03 21:00 ` [RFC PATCH 2/4] cxl/region: Move region lock error code to -EBUSY Dan Williams
2026-04-03 21:00 ` [RFC PATCH 3/4] cxl/region: Block region delete for locked regions Dan Williams
2026-04-03 21:00 ` [RFC PATCH 4/4] cxl/region: Introduce cxl_memdev_attach_region Dan Williams
2026-04-07 10:25   ` Alejandro Lucero Palau
2026-04-11 21:42     ` Dan Williams
2026-04-14 15:41       ` Alejandro Lucero Palau [this message]
2026-04-09 22:02   ` Cheatham, Benjamin
2026-04-11 23:02     ` Dan Williams
2026-04-12  8:57       ` Lukas Wunner
2026-04-13 14:25       ` Cheatham, Benjamin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b592580e-da0f-41fe-9ac5-d8dcec8fc6d8@amd.com \
    --to=alucerop@amd.com \
    --cc=alejandro.lucero-palau@amd.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=djbw@kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox