From: Dan Williams <dan.j.williams@intel.com>
To: Robert Richter <rrichter@amd.com>,
Dan Williams <dan.j.williams@intel.com>
Cc: <ira.weiny@intel.com>, Dave Jiang <dave.jiang@intel.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
<stable@vger.kernel.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Alison Schofield <alison.schofield@intel.com>,
Gregory Price <gourry@gourry.net>,
Zijun Hu <quic_zijuhu@quicinc.com>,
Vishal Verma <vishal.l.verma@intel.com>,
<linux-cxl@vger.kernel.org>
Subject: Re: [PATCH v2 0/6] cxl: Initialization and shutdown fixes
Date: Wed, 23 Oct 2024 13:34:36 -0700
Message-ID: <67195ddc7888d_4bc22941c@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <Zxj2J6h8v788Vhxh@rric.localdomain>
Robert Richter wrote:
> On 22.10.24 18:43:15, Dan Williams wrote:
> > Changes since v1 [1]:
> > - Fix some misspellings missed by checkpatch in changelogs (Jonathan)
> > - Add comments explaining the order of objects in drivers/cxl/Makefile
> > (Jonathan)
> > - Rename attach_device => cxl_rescan_attach (Jonathan)
> > - Fixup Zijun's email (Zijun)
> >
> > [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@dwillia2-xfh.jf.intel.com
> >
> > ---
> >
> > Original cover:
> >
> > Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
> > delayed arrival of the CXL "root" infrastructure [1] prompted questions
> > of how the existing mechanism for retrying cxl_mem_probe() could be
> > failing.
>
> I found a similar issue with the region creation.
>
> A region is created with the first endpoint found and immediately
> added as device which triggers cxl_region_probe(). Now, in
> interleaving setups the region state comes into commit state only
> after the last endpoint was probed. So the probe must be repeated
> until all endpoints were enumerated. I ended up with this change:
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index a07b62254596..c78704e435e5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
> }
>
> if (p->state < CXL_CONFIG_COMMIT) {
> - dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
> - rc = -ENXIO;
> + rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
> + "region config state: %d\n", p->state);
I would argue EPROBE_DEFER is not appropriate because there is no
guarantee that the other members of the region show up, and if they do
they will re-trigger probe. So "probe must be repeated until all
endpoints were enumerated" is the case either way. I.e., either the
arrival of more endpoints triggers a re-probe, or EPROBE_DEFER triggers
redundant probing *and* still results in probe attempts as endpoints
arrive.

So a dev_dbg() plus -ENXIO return on uncommitted region state is
expected.
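
To make the probe-count argument concrete, here is a minimal userspace
sketch. It is not kernel code: simulate(), sim_region_probe(),
SIM_EPROBE_DEFER, and the deferred_retries parameter are all hypothetical
names for illustration, and the model simply assumes that each endpoint
arrival triggers one probe and that a deferred device sees a fixed number
of extra retries before the next endpoint arrives.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: stand-in for the kernel-internal EPROBE_DEFER value,
 * which is not exported through userspace <errno.h>. */
#define SIM_EPROBE_DEFER 517

enum probe_policy { POLICY_ENXIO, POLICY_DEFER };

struct sim_result {
	int probes;	/* total probe invocations */
	int bound;	/* 1 if the region driver eventually bound */
};

/* Hypothetical model of the uncommitted-state check in cxl_region_probe(). */
static int sim_region_probe(int attached, int targets, enum probe_policy policy)
{
	if (attached < targets)
		return policy == POLICY_DEFER ? -SIM_EPROBE_DEFER : -ENXIO;
	return 0;
}

/*
 * Each endpoint arrival triggers one probe under either policy. With
 * POLICY_DEFER, every failed probe is additionally retried
 * 'deferred_retries' times before the next endpoint arrives; those
 * retries cannot succeed because no new endpoint has attached.
 */
struct sim_result simulate(int targets, int arrivals, int deferred_retries,
			   enum probe_policy policy)
{
	struct sim_result res = { 0, 0 };
	int attached = 0;

	assert(targets > 0 && arrivals >= 0 && deferred_retries >= 0);

	for (int i = 0; i < arrivals; i++) {
		int rc;

		attached++;
		res.probes++;
		rc = sim_region_probe(attached, targets, policy);
		if (rc == 0) {
			res.bound = 1;
			break;
		}
		if (rc == -SIM_EPROBE_DEFER) {
			for (int r = 0; r < deferred_retries; r++) {
				res.probes++;
				rc = sim_region_probe(attached, targets, policy);
				assert(rc == -SIM_EPROBE_DEFER);
			}
		}
	}
	return res;
}
```

Under these assumptions, a 4-way region with all 4 endpoints arriving and
2 deferred retries per gap binds after 4 probes with -ENXIO versus 10 with
the deferral policy. If only 3 endpoints ever arrive, neither policy binds,
but the deferral policy still spends 9 probes against 3.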
> goto out;
> }
>
> --
> 2.39.5
>
> I don't see an init order issue here as the mem module is always up
> before the regions are probed.
Right, cxl_endpoint_port_probe() triggers region discovery and
cxl_endpoint_port_probe() currently only triggers after cxl_mem has
registered an endpoint port.

The failure this set addresses is unwanted cxl_mem_probe() failures.