From: Bjorn Helgaas <helgaas@kernel.org>
To: Gregory Price <gourry@gourry.net>
Cc: linux-pci@vger.kernel.org, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, lukas@wunner.de,
	dan.j.williams@intel.com, bhelgaas@google.com, dave@stgolabs.net,
	dave.jiang@intel.com, vishal.l.verma@intel.com,
	Jonathan.Cameron@huawei.com
Subject: Re: [PATCH] PCI/DOE: Poll DOE Busy bit for up to 1 second in pci_doe_send_req
Date: Sun, 13 Oct 2024 10:58:34 -0500
Message-ID: <20241013155834.GA607803@bhelgaas>
In-Reply-To: <ZwkvMIqC2DjLZJrg@PC2K9PVX.TheFacebook.com>

On Fri, Oct 11, 2024 at 09:59:12AM -0400, Gregory Price wrote:
> On Thu, Oct 10, 2024 at 05:16:28PM -0500, Bjorn Helgaas wrote:
> > On Fri, Oct 04, 2024 at 12:28:28PM -0400, Gregory Price wrote:
> > > During initial device probe, the PCI DOE busy bit for some CXL
> > > devices may be left set for a longer period than expected by the
> > > current driver logic. Despite local comments stating DOE Busy is
> > > unlikely to be detected, it is commonly seen during boot, when CXL
> > > devices are being probed.
> > > 
> > > This was observed on a single socket AMD platform with 2 CXL memory
> > > expanders attached to the single socket. It was not the case that
> > > concurrent accesses were being made, as validated by monitoring
> > > mailbox commands on the device side.
> > > 
> > > This behavior has been observed with multiple CXL memory expanders
> > > from different vendors - so it appears unrelated to the model.
> > > 
> > > In all observed tests, only a small period of the retry window is
> > > actually used - typically only a handful of loop iterations.
> > > 
> > > Polling on the PCI DOE Busy Bit for at most one PCI DOE timeout
> > > interval (1 second) resolves this issue cleanly.
> > > 
> > > Per PCIe r6.2 sec 6.30.3, the DOE Busy Bit being cleared does not
> > > raise an interrupt, so polling is the best option in this scenario.
> > > 
> > > Subsequent code in doe_statemachine_work and abort paths also waits
> > > for up to 1 PCI DOE timeout interval, so this order of (potential)
> > > additional delay is presumed acceptable.
> > 
> > I provisionally applied this to pci/doe for v6.13 with Lukas and
> > Jonathan's reviewed-by.  
> > 
> > Can we include a sample of any dmesg logging or other errors users
> > would see because of this problem?  I'll update the commit log with
> > any of this information to help users connect an issue with this fix.
> >
> 
> The only indication in dmesg you will see is a line like
> 
> [   24.542625] endpoint6: DOE failed -EBUSY
> 
> produced by cxl_cdat_get_length or cxl_cdat_read_table
> 
> 
> Do you want an updated patch with the nits fixed?

No need, I fixed the nits and added the dmesg line to the commit log.
Thank you!

Bjorn
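
For readers following the thread, the behavior described in the quoted
commit message (poll DOE Busy for at most one DOE timeout interval, then
give up with -EBUSY) can be illustrated with a minimal userspace sketch.
This is not the applied patch. struct fake_doe, fake_read_status(),
fake_sleep_ms(), doe_wait_not_busy() and the 10 ms poll interval used in
the note below are hypothetical, chosen only for illustration, and
DOE_STATUS_BUSY is a local restatement of the Busy bit so the sketch
builds outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local stand-in for the DOE Status register Busy bit (assumption). */
#define DOE_STATUS_BUSY 0x1u

/* Hypothetical fake device: Busy reads as set for a fixed number of reads. */
struct fake_doe {
	unsigned int busy_reads_left;	/* reads that still report Busy */
	unsigned int slept_ms;		/* total simulated sleep time */
};

/* Return the simulated DOE Status register value. */
static uint32_t fake_read_status(struct fake_doe *doe)
{
	if (doe->busy_reads_left > 0) {
		doe->busy_reads_left--;
		return DOE_STATUS_BUSY;
	}
	return 0;
}

/* Stand-in for a sleep call: records the delay instead of sleeping. */
static void fake_sleep_ms(struct fake_doe *doe, unsigned int ms)
{
	doe->slept_ms += ms;
}

/*
 * Poll the Busy bit until it clears or timeout_ms has been waited.
 * Returns 0 once Busy is clear, or -EBUSY if it is still set after
 * the full timeout.
 */
static int doe_wait_not_busy(struct fake_doe *doe, unsigned int timeout_ms,
			     unsigned int interval_ms)
{
	unsigned int waited_ms = 0;

	assert(interval_ms > 0);

	for (;;) {
		if (!(fake_read_status(doe) & DOE_STATUS_BUSY))
			return 0;
		if (waited_ms >= timeout_ms)
			return -EBUSY;
		fake_sleep_ms(doe, interval_ms);
		waited_ms += interval_ms;
	}
}
```

With a fake device that reports Busy for three reads,
doe_wait_not_busy(&dev, 1000, 10) returns 0 after 30 ms of simulated
sleep, which matches the "handful of loop iterations" described in the
commit message. A device that never clears Busy yields -EBUSY after the
full 1000 ms, the same error code as in the dmesg line quoted above.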
