From: Gregory Price <gourry@gourry.net>
To: Robert Richter <rrichter@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
ira.weiny@intel.com, Dave Jiang <dave.jiang@intel.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
stable@vger.kernel.org,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Alison Schofield <alison.schofield@intel.com>,
Zijun Hu <quic_zijuhu@quicinc.com>,
Vishal Verma <vishal.l.verma@intel.com>,
linux-cxl@vger.kernel.org
Subject: Re: [PATCH v2 0/6] cxl: Initialization and shutdown fixes
Date: Wed, 23 Oct 2024 12:00:23 -0400 [thread overview]
Message-ID: <Zxkdl7H37gg5iXbO@PC2K9PVX.TheFacebook.com> (raw)
In-Reply-To: <Zxj2J6h8v788Vhxh@rric.localdomain>
On Wed, Oct 23, 2024 at 03:12:07PM +0200, Robert Richter wrote:
> On 22.10.24 18:43:15, Dan Williams wrote:
> > Changes since v1 [1]:
> > - Fix some misspellings missed by checkpatch in changelogs (Jonathan)
> > - Add comments explaining the order of objects in drivers/cxl/Makefile
> > (Jonathan)
> > - Rename attach_device => cxl_rescan_attach (Jonathan)
> > - Fixup Zijun's email (Zijun)
> >
> > [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@dwillia2-xfh.jf.intel.com
> >
> > ---
> >
> > Original cover:
> >
> > Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
> > delayed arrival of the CXL "root" infrastructure [1] prompted questions
> > of how the existing mechanism for retrying cxl_mem_probe() could be
> > failing.
>
> I found a similar issue with the region creation.
>
> A region is created with the first endpoint found and immediately
> added as device which triggers cxl_region_probe(). Now, in
> interleaving setups the region state comes into commit state only
> after the last endpoint was probed. So the probe must be repeated
> until all endpoints were enumerated. I ended up with this change:
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index a07b62254596..c78704e435e5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
> }
>
> if (p->state < CXL_CONFIG_COMMIT) {
> - dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
> - rc = -ENXIO;
> + rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
> + "region config state: %d\n", p->state);
> goto out;
> }
>
I have not experienced any out of order operations since applying Dan's
v1 of this patch set, do you still see this after applying the existing
set?
Probably this is indicative of needing another SOFTDEP / ordering issue.
~Gregory
> --
> 2.39.5
>
> I don't see an init order issue here as the mem module is always up
> before the regions are probed.
>
> -Robert
>
> >
> > The critical missing piece in the debug was that Gregory's setup had
> > almost all CXL modules built-in to the kernel.
> >
> > On the way to that discovery several other bugs and init-order corner
> > cases were discovered.
> >
> > The main fix is to make sure the drivers/cxl/Makefile object order
> > supports root CXL ports being fully initialized upon cxl_acpi_probe()
> > exit. The modular case has some similar potential holes that are fixed
> > with MODULE_SOFTDEP() and other fix ups. Finally, an attempt to update
> > cxl_test to reproduce the original report resulted in the discovery of a
> > separate long standing use after free bug in cxl_region_detach().
> >
> > [2]: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
> >
> > ---
> >
> > Dan Williams (6):
> > cxl/port: Fix CXL port initialization order when the subsystem is built-in
> > cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
> > cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
> > cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
> > cxl/port: Prevent out-of-order decoder allocation
> > cxl/test: Improve init-order fidelity relative to real-world systems
> >
> >
> > drivers/base/core.c | 35 +++++++
> > drivers/cxl/Kconfig | 1
> > drivers/cxl/Makefile | 20 +++-
> > drivers/cxl/acpi.c | 7 +
> > drivers/cxl/core/hdm.c | 50 +++++++++--
> > drivers/cxl/core/port.c | 13 ++-
> > drivers/cxl/core/region.c | 91 ++++++++++---------
> > drivers/cxl/cxl.h | 3 -
> > include/linux/device.h | 3 +
> > tools/testing/cxl/test/cxl.c | 200 +++++++++++++++++++++++-------------------
> > tools/testing/cxl/test/mem.c | 1
> > 11 files changed, 269 insertions(+), 155 deletions(-)
> >
> > base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
next prev parent reply other threads:[~2024-10-23 16:00 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-23 1:43 [PATCH v2 0/6] cxl: Initialization and shutdown fixes Dan Williams
2024-10-23 1:43 ` [PATCH v2 1/6] cxl/port: Fix CXL port initialization order when the subsystem is built-in Dan Williams
2024-10-24 9:42 ` Jonathan Cameron
2024-10-24 16:19 ` Dan Williams
2024-10-24 16:39 ` Jonathan Cameron
2024-10-24 10:36 ` Alejandro Lucero Palau
2024-10-24 16:32 ` Dan Williams
2024-10-25 8:43 ` Alejandro Lucero Palau
2024-10-25 15:19 ` Dan Williams
2024-10-24 14:14 ` Ira Weiny
2024-10-25 19:32 ` [PATCH v3 " Dan Williams
2024-10-23 1:43 ` [PATCH v2 2/6] cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices() Dan Williams
2024-10-23 15:57 ` Gregory Price
2024-10-24 9:43 ` Jonathan Cameron
2024-10-24 14:29 ` Ira Weiny
2024-10-23 1:43 ` [PATCH v2 3/6] cxl/acpi: Ensure ports ready at cxl_acpi_probe() return Dan Williams
2024-10-23 15:58 ` Gregory Price
2024-10-24 9:44 ` Jonathan Cameron
2024-10-24 14:34 ` Ira Weiny
2024-10-23 1:43 ` [PATCH v2 4/6] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown Dan Williams
2024-10-24 15:55 ` Ira Weiny
2024-10-23 1:43 ` [PATCH v2 5/6] cxl/port: Prevent out-of-order decoder allocation Dan Williams
2024-10-24 12:10 ` Jonathan Cameron
2024-10-24 16:20 ` Ira Weiny
2024-10-23 1:44 ` [PATCH v2 6/6] cxl/test: Improve init-order fidelity relative to real-world systems Dan Williams
2024-10-24 12:17 ` Jonathan Cameron
2024-10-24 16:32 ` Ira Weiny
2024-10-23 13:12 ` [PATCH v2 0/6] cxl: Initialization and shutdown fixes Robert Richter
2024-10-23 16:00 ` Gregory Price [this message]
2024-10-23 20:34 ` Dan Williams
2024-10-24 11:56 ` Robert Richter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zxkdl7H37gg5iXbO@PC2K9PVX.TheFacebook.com \
--to=gourry@gourry.net \
--cc=alison.schofield@intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=gregkh@linuxfoundation.org \
--cc=ira.weiny@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=quic_zijuhu@quicinc.com \
--cc=rrichter@amd.com \
--cc=stable@vger.kernel.org \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox