public inbox for linux-bluetooth@vger.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: "Devegowda, Chandrashekar" <chandrashekar.devegowda@intel.com>,
	Luiz Augusto von Dentz <luiz.dentz@gmail.com>,
	"linux-bluetooth@vger.kernel.org"
	<linux-bluetooth@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"Srivatsa, Ravishankar" <ravishankar.srivatsa@intel.com>,
	"Tumkur Narayan, Chethan" <chethan.tumkur.narayan@intel.com>,
	"K, Kiran" <kiran.k@intel.com>,
	"Ben Ami, Golan" <golan.ben.ami@intel.com>
Subject: Re: [PATCH v1] Bluetooth: btintel_pcie: Support function level reset
Date: Wed, 29 Oct 2025 11:26:34 -0500	[thread overview]
Message-ID: <20251029162634.GA1565820@bhelgaas> (raw)
In-Reply-To: <44428a7c1e1d4565556335d6fb28919053da5d4d.camel@sipsolutions.net>

On Wed, Oct 29, 2025 at 11:38:31AM +0100, Johannes Berg wrote:
> On Tue, 2025-10-28 at 16:06 -0500, Bjorn Helgaas wrote:
> > > Obviously, we know the state of the device, but ... it _does_ require
> > > PCI removal *and* rescan, because the device completely falls off the
> > > bus and needs to be rediscovered. The drivers also fundamentally have to
> > > be unbound from it, since all state of the device (including BAR setup)
> > > is lost. I'm fairly certain that if you were to query even the device
> > > IDs after the reset, you'd see 0xFFFFFFFF, but in truth I don't fully
> > > understand how this works at the PCIe bus level.
> > 
> > It might be different for other buses, but the PCI core really doesn't
> > do anything to the device during removal or rescan.  It does turn off
> > power management interrupts from the device and the like, but I'm
> > pretty sure it doesn't reset the device or do anything to make it
> > start responding to PCI transactions again.
> 
> OK, fair. I have hit various weird issues with cached config space in
> the past (such as [1], which we never resolved), so I guess I'm possibly
> being ultra careful here in what I'm assuming or not.
> 
> [1] https://lore.kernel.org/linux-pci/20230605203519.bc4232207449.Idbaa55b93f780838af44ebccb84c36f60716df04@changeid/
> 
> > Obviously remove and rescan reinitializes the *driver* because the PCI
> > core calls the driver .remove() method, reads the Vendor and Device
> > IDs, reads and if necessary programs the BARs, and calls the driver
> > .probe() method.
> 
> Right. So I guess in effect you're saying that device_reprobe() ought to
> be sufficient.
> 
> For WiFi, this code goes back to another issue we had (somewhat
> intentionally) where under certain circumstances during initialisation
> the firmware can do a product reset without the driver being aware, and
> then the driver just has to detect it by PCIe transactions failing with
> 0xFFFF'FFFF being read all the time. It's going to be hard to test this
> case now, but we can still test the product reset.

I think there are also some cases (maybe mostly embedded arm and
arm64?) where PCIe failures cause synchronous aborts or SError 
interrupts instead of just returning 0xFFFF'FFFF data.  Maybe nothing
you can do about that other than adding delay after initialization or
something.

> For BT detecting the error and initiation product reset, it does seem
> that doing (only) device_reprobe() for both functions is actually
> sufficient. I believe the FLR code in BT is broken though, so we're
> going to (re-)check all of this.
> 
> > I think it's really the PLDR that's making the difference here, not
> > the remove and rescan.  I guess you could experimentally read some
> > config registers after the PLDR and before the remove/rescan.
> 
> Yeah, just need to find real hardware with all the BIOS integration to
> run it ;-) (Most of our testing uses VMs.)
> 
> > Since you know the other device is dead already, I don't have a
> > problem with resetting the shared parts of the device, so you do need
> > some way to poke the other driver to reinit.  But I think using the
> > PCI core remove/rescan to do that makes it more complicated than
> > necessary and distracts from what's really happening.
> 
> Fair. I think the easiest way to achieve this is still device_reprobe()
> on the other device - eventually even if we tell the other driver then
> it still will simply call device_reprobe() on itself, so there's no big
> difference.

I hadn't heard of device_reprobe() until you pointed it out, but this
usage doesn't really seem to fit the intended purpose since the
probing critera haven't changed:

 * This function detaches the attached driver (if any) for the given
 * device and restarts the driver probing process.  It is intended
 * to use if probing criteria changed during a devices lifetime and
 * driver attachment should change accordingly.

But I definitely like it better than remove/rescan.  I suppose either
way results in udev remove/add events that are really spurious.  And
maybe PM artifacts like what you tripped over at [1] (although there
may still be a harmful PCI core race there).

Bjorn

      reply	other threads:[~2025-10-29 16:26 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-14 10:16 [PATCH v1] Bluetooth: btintel_pcie: Support function level reset Chandrashekar Devegowda
2025-03-14  7:32 ` [v1] " bluez.test.bot
2025-03-14 19:40 ` [PATCH v1] " Bjorn Helgaas
2025-03-18 14:55   ` Luiz Augusto von Dentz
2025-03-18 15:47     ` Bjorn Helgaas
2025-05-25 10:30       ` K, Kiran
2025-10-23  9:42       ` Devegowda, Chandrashekar
2025-10-23 20:36         ` Bjorn Helgaas
2025-10-27 10:08           ` Johannes Berg
2025-10-28 21:06             ` Bjorn Helgaas
2025-10-29 10:38               ` Johannes Berg
2025-10-29 16:26                 ` Bjorn Helgaas [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251029162634.GA1565820@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=chandrashekar.devegowda@intel.com \
    --cc=chethan.tumkur.narayan@intel.com \
    --cc=golan.ben.ami@intel.com \
    --cc=johannes@sipsolutions.net \
    --cc=kiran.k@intel.com \
    --cc=linux-bluetooth@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=luiz.dentz@gmail.com \
    --cc=ravishankar.srivatsa@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox