From: Gregory Price <gourry@gourry.net>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org, linux-cxl@vger.kernel.org,
linux-kernel@vger.kernel.org, lukas@wunner.de,
dan.j.williams@intel.com, bhelgaas@google.com, dave@stgolabs.net,
dave.jiang@intel.com, vishal.l.verma@intel.com,
Jonathan.Cameron@huawei.com
Subject: Re: [PATCH] PCI/DOE: Poll DOE Busy bit for up to 1 second in pci_doe_send_req
Date: Fri, 11 Oct 2024 09:59:12 -0400
Message-ID: <ZwkvMIqC2DjLZJrg@PC2K9PVX.TheFacebook.com>
In-Reply-To: <20241010221628.GA580128@bhelgaas>
On Thu, Oct 10, 2024 at 05:16:28PM -0500, Bjorn Helgaas wrote:
> On Fri, Oct 04, 2024 at 12:28:28PM -0400, Gregory Price wrote:
> > During initial device probe, the PCI DOE busy bit for some CXL
> > devices may be left set for a longer period than expected by the
> > current driver logic. Despite local comments stating DOE Busy is
> > unlikely to be detected, it is commonly seen, specifically during
> > boot when CXL devices are being probed.
> >
> > This was observed on a single socket AMD platform with 2 CXL memory
> > expanders attached to the single socket. It was not the case that
> > concurrent accesses were being made, as validated by monitoring
> > mailbox commands on the device side.
> >
> > This behavior has been observed with multiple CXL memory expanders
> > from different vendors - so it appears unrelated to the model.
> >
> > In all observed tests, only a small period of the retry window is
> > actually used - typically only a handful of loop iterations.
> >
> > Polling on the PCI DOE Busy Bit for (at most) one PCI DOE timeout
> > interval (1 second) resolves this issue cleanly.
> >
> > Per PCIe r6.2 sec 6.30.3, the DOE Busy Bit being cleared does not
> > raise an interrupt, so polling is the best option in this scenario.
> >
> > Subsequent code in doe_statemachine_work and abort paths also waits
> > for up to 1 PCI DOE timeout interval, so this order of (potential)
> > additional delay is presumed acceptable.
>
> I provisionally applied this to pci/doe for v6.13 with Lukas and
> Jonathan's reviewed-by.
>
> Can we include a sample of any dmesg logging or other errors users
> would see because of this problem? I'll update the commit log with
> any of this information to help users connect an issue with this fix.
>

The only indication in dmesg you will see is a line like

[ 24.542625] endpoint6: DOE failed -EBUSY

produced by cxl_cdat_get_length or cxl_cdat_read_table.

Do you want an updated patch with the nits fixed?

> > Suggested-by: Lukas Wunner <lukas@wunner.de>
> > Signed-off-by: Gregory Price <gourry@gourry.net>
> > ---
> > drivers/pci/doe.c | 14 +++++++++++++-
> > 1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
> > index 652d63df9d22..27ba5d281384 100644
> > --- a/drivers/pci/doe.c
> > +++ b/drivers/pci/doe.c
> > @@ -149,14 +149,26 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
> >  	size_t length, remainder;
> >  	u32 val;
> >  	int i;
> > +	unsigned long timeout_jiffies;
> >
> >  	/*
> >  	 * Check the DOE busy bit is not set. If it is set, this could indicate
> >  	 * someone other than Linux (e.g. firmware) is using the mailbox. Note
> >  	 * it is expected that firmware and OS will negotiate access rights via
> >  	 * an, as yet to be defined, method.
> > +	 *
> > +	 * Wait up to one PCI_DOE_TIMEOUT period to allow the prior command to
> > +	 * finish. Otherwise, simply error out as unable to field the request.
> > +	 *
> > +	 * PCIe r6.2 sec 6.30.3 states no interrupt is raised when the DOE Busy
> > +	 * bit is cleared, so polling here is our best option for the moment.
> >  	 */
> > -	pci_read_config_dword(pdev, offset + PCI_DOE_STATUS, &val);
> > +	timeout_jiffies = jiffies + PCI_DOE_TIMEOUT;
> > +	do {
> > +		pci_read_config_dword(pdev, offset + PCI_DOE_STATUS, &val);
> > +	} while (FIELD_GET(PCI_DOE_STATUS_BUSY, val) &&
> > +		 !time_after(jiffies, timeout_jiffies));
> > +
> >  	if (FIELD_GET(PCI_DOE_STATUS_BUSY, val))
> >  		return -EBUSY;
> >
> > --
> > 2.43.0
> >
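
For anyone following along without a kernel tree handy, the bounded
polling in the hunk above can be sketched in userspace roughly as below.
This is only an illustration under stated assumptions, not the driver
code: wait_not_busy(), status_read_fn, fake_status(), now_ms() and the
DOE_* macros are hypothetical stand-ins, and a millisecond monotonic
clock replaces jiffies. The shape is the same as the patch: read the
status at least once, keep reading while Busy is set and the deadline
has not passed, then report -EBUSY if Busy is still set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the kernel definitions (illustration only). */
#define DOE_STATUS_BUSY  (1u << 0)  /* DOE Busy is bit 0 of DOE Status */
#define DOE_TIMEOUT_MS   1000       /* one DOE timeout interval, 1 second */

/* Hypothetical callback returning the current status register value. */
typedef uint32_t (*status_read_fn)(void *ctx);

/* Monotonic time in milliseconds; plays the role of jiffies here. */
static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll until Busy clears or timeout_ms has elapsed. As with the do/while
 * in the patch, the status is always read at least once.
 * Returns 0 if Busy cleared, -EBUSY if it was still set at the deadline.
 */
static int wait_not_busy(status_read_fn read_status, void *ctx,
			 int64_t timeout_ms)
{
	int64_t deadline;
	uint32_t val;

	assert(read_status != NULL);

	deadline = now_ms() + timeout_ms;
	do {
		val = read_status(ctx);
	} while ((val & DOE_STATUS_BUSY) && now_ms() <= deadline);

	return (val & DOE_STATUS_BUSY) ? -EBUSY : 0;
}

/* Simulated device: reports Busy for the next *ctx reads, then idle. */
static uint32_t fake_status(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return DOE_STATUS_BUSY;
	}
	return 0;
}
```

With a counter of 3, wait_not_busy(fake_status, &n, DOE_TIMEOUT_MS)
returns 0 on the fourth read, which matches the "handful of loop
iterations" case described in the commit message. A device that never
clears Busy gets -EBUSY once the timeout has passed.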
Thread overview: 7+ messages
2024-10-04 16:28 [PATCH] PCI/DOE: Poll DOE Busy bit for up to 1 second in pci_doe_send_req Gregory Price
2024-10-10 10:38 ` Lukas Wunner
2024-10-10 16:23 ` Jonathan Cameron
2024-10-10 22:16 ` Bjorn Helgaas
2024-10-11 13:59 ` Gregory Price [this message]
2024-10-13 11:08 ` Lukas Wunner
2024-10-13 15:58 ` Bjorn Helgaas