From: Dave Jiang <dave.jiang@intel.com>
To: Gregory Price <gourry@gourry.net>,
Dan Williams <dan.j.williams@intel.com>
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
Smita.KoralahalliChannabasappa@amd.com,
alison.schofield@intel.com, terry.bowman@amd.com,
alejandro.lucero-palau@amd.com, linux-pci@vger.kernel.org,
Jonathan.Cameron@huawei.com,
Ben Cheatham <benjamin.cheatham@amd.com>,
Alejandro Lucero <alucerop@amd.com>
Subject: Re: [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
Date: Tue, 10 Feb 2026 08:05:21 -0700 [thread overview]
Message-ID: <a6b2ba96-ce22-426b-8277-8121ed77fa55@intel.com> (raw)
In-Reply-To: <aYq_zpNAgNgJY_P8@gourry-fedora-PF4VCD3F>
On 2/9/26 10:19 PM, Gregory Price wrote:
> On Mon, Dec 15, 2025 at 04:56:16PM -0800, Dan Williams wrote:
>> @@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
>> {
>> int rc;
>>
>> + /*
>> + * If @attach is provided fail if the driver is not attached upon
>> + * return. Note that failure here could be the result of a race to
>> + * teardown the CXL port topology. I.e. cxl_mem_probe() could have
>> + * succeeded and then cxl_mem unbound before the lock is acquired.
>> + */
>> + guard(device)(&cxlmd->dev);
>> + if (cxlmd->attach && !cxlmd->dev.driver) {
>> + cxl_memdev_unregister(cxlmd);
>> + return ERR_PTR(-ENXIO);
>> + }
>> +
>> rc = devm_add_action_or_reset(cxlmd->cxlds->dev, cxl_memdev_unregister,
>> cxlmd);
>> if (rc)
>
>
> This caused a deadlock when trying to unbind cxl_pci and bind a custom
> driver.
>
> The followwing analysis was produced by Chris Mason's kreview scripts,
> but it did resolve my deadlock, and the analysis seems pretty straight
> forward.
The suggested resolution looks reasonable. Can you post a fix patch? Will need to apply it against 7.0-rc. It's too late to get it into the 7.0 merge PR.
DJ
>
> Although it was likely caused by a qemu quirk I was dealing with at the
> same time (see the note about topology failing to enumerate).
>
> ~Gregory
>
> ---
>
> cxl_memdev_autoremove() takes device_lock(&cxlmd->dev) via guard(device)
> and then calls cxl_memdev_unregister() when the attach callback was
> provided but cxl_mem_probe() failed to bind.
>
> cxl_memdev_unregister() calls cdev_device_del() → device_del() →
> bus_remove_device() → device_release_driver() which also takes
> device_lock(), deadlocking the calling thread.
>
> This path is reached when a driver uses the @attach parameter to
> devm_cxl_add_memdev() and the CXL topology fails to enumerate (e.g.
> DVSEC range registers decode outside platform-defined CXL ranges,
> causing the endpoint port probe to fail).
>
> Fix by using scoped_guard() and breaking out of the guard scope before
> calling cxl_memdev_unregister(), so device_lock() is released first.
>
> It suggested:
> ---
>
> static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
> {
> int rc;
>
> /*
> * If @attach is provided fail if the driver is not attached upon
> * return. Note that failure here could be the result of a race to
> * teardown the CXL port topology. I.e. cxl_mem_probe() could have
> * succeeded and then cxl_mem unbound before the lock is acquired.
> *
> * Check under device_lock but unregister outside of it:
> * cxl_memdev_unregister() calls cdev_device_del() → device_del()
> * → device_release_driver() which also takes device_lock().
> */
> scoped_guard(device, &cxlmd->dev) {
> if (cxlmd->attach && !cxlmd->dev.driver) {
> /* Drop lock before unregister to avoid deadlock */
> break;
> }
>
> rc = devm_add_action_or_reset(cxlmd->cxlds->dev,
> cxl_memdev_unregister, cxlmd);
> if (rc)
> return ERR_PTR(rc);
>
> return cxlmd;
> }
>
> /* Reached only when attach failed — lock is released */
> cxl_memdev_unregister(cxlmd);
> return ERR_PTR(-ENXIO);
> }
>
prev parent reply other threads:[~2026-02-10 15:05 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-16 0:56 [PATCH v2 0/6] cxl: Initialization reworks to support Soft Reserve Recovery and Accelerator Memory Dan Williams
2025-12-16 0:56 ` [PATCH v2 1/6] cxl/mem: Fix devm_cxl_memdev_edac_release() confusion Dan Williams
2025-12-16 0:56 ` [PATCH v2 2/6] cxl/mem: Arrange for always-synchronous memdev attach Dan Williams
2025-12-16 0:56 ` [PATCH v2 3/6] cxl/port: Arrange for always synchronous endpoint attach Dan Williams
2025-12-16 0:56 ` [PATCH v2 4/6] cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup Dan Williams
2025-12-17 15:48 ` Jonathan Cameron
2025-12-16 0:56 ` [PATCH v2 5/6] cxl/mem: Drop @host argument to devm_cxl_add_memdev() Dan Williams
2025-12-16 0:56 ` [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation Dan Williams
2025-12-17 15:57 ` Jonathan Cameron
2025-12-17 16:27 ` dan.j.williams
2025-12-18 14:46 ` Jonathan Cameron
2025-12-18 19:50 ` dan.j.williams
2025-12-19 10:26 ` Jonathan Cameron
2025-12-20 7:31 ` Alejandro Lucero Palau
2026-02-10 5:19 ` Gregory Price
2026-02-10 15:05 ` Dave Jiang [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a6b2ba96-ce22-426b-8277-8121ed77fa55@intel.com \
--to=dave.jiang@intel.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alejandro.lucero-palau@amd.com \
--cc=alison.schofield@intel.com \
--cc=alucerop@amd.com \
--cc=benjamin.cheatham@amd.com \
--cc=dan.j.williams@intel.com \
--cc=gourry@gourry.net \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=terry.bowman@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox