From: Bjorn Helgaas <helgaas@kernel.org>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Yihang Li <liyihang9@huawei.com>,
cassel@kernel.org, James.Bottomley@hansenpartnership.com,
martin.petersen@oracle.com, john.g.garry@oracle.com,
yanaijie@huawei.com, linux-kernel@vger.kernel.org,
linux-scsi@vger.kernel.org, linuxarm@huawei.com,
chenxiang66@hisilicon.com, prime.zeng@huawei.com,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [bug report] scsi: SATA devices missing after FLR is triggered during HBA suspended
Date: Mon, 1 Jul 2024 15:39:03 -0500 [thread overview]
Message-ID: <20240701203903.GA16142@bhelgaas> (raw)
In-Reply-To: <a6e86954-4ab7-4bb0-b78d-56f44556318e@kernel.org>
[+cc Alex]
On Thu, Jun 27, 2024 at 09:56:02AM +0900, Damien Le Moal wrote:
> On 6/27/24 00:15, Bjorn Helgaas wrote:
> >>> Yes, I am talking about the PCI "Function Level Reset"
> >>>
> >>>> FLR and disk/controller suspend execution timing are unrelated.
> >>>> FLR can be triggered at any time through sysfs. So please give
> >>>> details here. Why is FLR done when the system is being
> >>>> suspended ?
> >>>
> >>> Yes, it is because FLR can be triggered at any time that we are
> >>> testing the reliability of executing FLR commands after
> >>> disk/controller suspended.
> >>
> >> "can be triggered" ? FLR is not a random asynchronous event. It
> >> is an action that is *issued* by a user with sys admin rights.
> >> And such users can do a lot of things that can break a machine...
> >>
> >> I fail to see the point of doing a function reset while the
> >> device is suspended. But granted, I guess the device should
> >> comeback up in such case, though I would like to hear what the
> >> PCI guys have to say about this.
> >>
> >> Bjorn,
> >>
> >> Is reseting a suspended PCI device something that should be/is
> >> supported ?
> >
> > I doubt it. The PCI core should be preserving all the generic PCI
> > state across suspend/resume. The driver should only need to
> > save/restore device-specific things the PCI core doesn't know about.
> >
> > A reset will clear out most state, and the driver doesn't know the
> > reset happened, so it will expect most device state to have been
> > preserved.
>
> That is what I suspected. However, checking the code, reset_store() in
> pci-sysfs.c does:
>
> pm_runtime_get_sync(dev);
> result = pci_reset_function(pdev);
> pm_runtime_put(dev);
>
> and pm_runtime_get_sync() calls __pm_runtime_resume() which will
> resume a suspended device.
>
> So while I still think it is not a good idea to reset a suspended
> device, things should still work as execpected and not cause any
> problem with the device state, right ?
The reset will clear almost all state, including both the generic PCI
part that pci_reset_function() saves/restores *and* any
device-specific state the PCI core doesn't know about.
That device-specific state isn't saved and restored anywhere in the
sysfs reset path, and the driver doesn't know this reset happened, so
I think all bets are off and we shouldn't expect the driver to work
afterwards.
A user-space reset might make sense if there's no driver bound to the
device, but I don't think it does if there is a driver (except maybe a
trivial stub driver that doesn't actually operate the device).
Bjorn
next prev parent reply other threads:[~2024-07-01 20:39 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-18 13:29 [bug report] scsi: SATA devices missing after FLR is triggered during HBA suspended Yihang Li
2024-06-18 23:11 ` Damien Le Moal
2024-06-22 3:31 ` Yihang Li
2024-06-22 11:25 ` Niklas Cassel
2024-06-24 0:10 ` Damien Le Moal
2024-06-24 12:10 ` Yihang Li
2024-07-01 3:03 ` Damien Le Moal
2024-07-02 11:20 ` Yihang Li
2024-06-26 15:15 ` Bjorn Helgaas
2024-06-27 0:56 ` Damien Le Moal
2024-06-27 8:19 ` Yihang Li
2024-07-01 20:39 ` Bjorn Helgaas [this message]
2024-07-02 2:38 ` Damien Le Moal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240701203903.GA16142@bhelgaas \
--to=helgaas@kernel.org \
--cc=James.Bottomley@hansenpartnership.com \
--cc=alex.williamson@redhat.com \
--cc=bhelgaas@google.com \
--cc=cassel@kernel.org \
--cc=chenxiang66@hisilicon.com \
--cc=dlemoal@kernel.org \
--cc=john.g.garry@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=linuxarm@huawei.com \
--cc=liyihang9@huawei.com \
--cc=martin.petersen@oracle.com \
--cc=prime.zeng@huawei.com \
--cc=yanaijie@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.