linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Yihang Li <liyihang9@huawei.com>,
	cassel@kernel.org, James.Bottomley@hansenpartnership.com,
	martin.petersen@oracle.com, john.g.garry@oracle.com,
	yanaijie@huawei.com, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org, linuxarm@huawei.com,
	chenxiang66@hisilicon.com, prime.zeng@huawei.com,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [bug report] scsi: SATA devices missing after FLR is triggered during HBA suspended
Date: Mon, 1 Jul 2024 15:39:03 -0500	[thread overview]
Message-ID: <20240701203903.GA16142@bhelgaas> (raw)
In-Reply-To: <a6e86954-4ab7-4bb0-b78d-56f44556318e@kernel.org>

[+cc Alex]

On Thu, Jun 27, 2024 at 09:56:02AM +0900, Damien Le Moal wrote:
> On 6/27/24 00:15, Bjorn Helgaas wrote:
> >>> Yes, I am talking about the PCI "Function Level Reset"
> >>>
> >>>> FLR and disk/controller suspend execution timing are unrelated.
> >>>> FLR can be triggered at any time through sysfs. So please give
> >>>> details here. Why is FLR done when the system is being
> >>>> suspended ?
> >>>
> >>> Yes, it is because FLR can be triggered at any time that we are
> >>> testing the reliability of executing FLR commands after
> >>> disk/controller suspended.
> >>
> >> "can be triggered" ? FLR is not a random asynchronous event. It
> >> is an action that is *issued* by a user with sys admin rights.
> >> And such users can do a lot of things that can break a machine...
> >>
> >> I fail to see the point of doing a function reset while the
> >> device is suspended. But granted, I guess the device should
> >> comeback up in such case, though I would like to hear what the
> >> PCI guys have to say about this.
> >>
> >> Bjorn,
> >>
> >> Is reseting a suspended PCI device something that should be/is
> >> supported ?
> > 
> > I doubt it.  The PCI core should be preserving all the generic PCI
> > state across suspend/resume.  The driver should only need to
> > save/restore device-specific things the PCI core doesn't know about.
> > 
> > A reset will clear out most state, and the driver doesn't know the
> > reset happened, so it will expect most device state to have been
> > preserved.
> 
> That is what I suspected. However, checking the code, reset_store() in
> pci-sysfs.c does:
> 
> 	pm_runtime_get_sync(dev);
> 	result = pci_reset_function(pdev);
> 	pm_runtime_put(dev);
> 
> and pm_runtime_get_sync() calls __pm_runtime_resume() which will
> resume a suspended device.
> 
> So while I still think it is not a good idea to reset a suspended
> device, things should still work as execpected and not cause any
> problem with the device state, right ?

The reset will clear almost all state, including both the generic PCI
part that pci_reset_function() saves/restores *and* any
device-specific state the PCI core doesn't know about.

That device-specific state isn't saved and restored anywhere in the
sysfs reset path, and the driver doesn't know this reset happened, so
I think all bets are off and we shouldn't expect the driver to work
afterwards.

A user-space reset might make sense if there's no driver bound to the
device, but I don't think it does if there is a driver (except maybe a
trivial stub driver that doesn't actually operate the device).

Bjorn

  parent reply	other threads:[~2024-07-01 20:39 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20240618132900.2731301-1-liyihang9@huawei.com>
     [not found] ` <0c5e14eb-5560-48cb-9086-6ad9c3970427@kernel.org>
     [not found]   ` <f27d6fa7-3088-0e60-043e-e71232066b12@huawei.com>
2024-06-24  0:10     ` [bug report] scsi: SATA devices missing after FLR is triggered during HBA suspended Damien Le Moal
2024-06-24 12:10       ` Yihang Li
2024-07-01  3:03         ` Damien Le Moal
2024-07-02 11:20           ` Yihang Li
2024-06-26 15:15       ` Bjorn Helgaas
2024-06-27  0:56         ` Damien Le Moal
2024-06-27  8:19           ` Yihang Li
2024-07-01 20:39           ` Bjorn Helgaas [this message]
2024-07-02  2:38             ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240701203903.GA16142@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=cassel@kernel.org \
    --cc=chenxiang66@hisilicon.com \
    --cc=dlemoal@kernel.org \
    --cc=john.g.garry@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=liyihang9@huawei.com \
    --cc=martin.petersen@oracle.com \
    --cc=prime.zeng@huawei.com \
    --cc=yanaijie@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).