From: Niklas Schnelle <schnelle@linux.ibm.com>
To: "Oliver O'Halloran" <oohall@gmail.com>
Cc: linux-s390@vger.kernel.org, Pierre Morel <pmorel@linux.ibm.com>,
Matthew Rosato <mjrosato@linux.ibm.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
Linas Vepstas <linasvepstas@gmail.com>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH 0/5] s390/pci: automatic error recovery
Date: Wed, 08 Sep 2021 10:09:21 +0200
Message-ID: <4a250dadbf8124980b4912389745c9546e2ec431.camel@linux.ibm.com>
In-Reply-To: <CAOSf1CH2T-R44qx1mGpJQ8WgD0upxG8sQNud_5L3SHYZJm9LRA@mail.gmail.com>
On Wed, 2021-09-08 at 11:37 +1000, Oliver O'Halloran wrote:
> On Tue, Sep 7, 2021 at 10:21 PM Niklas Schnelle <schnelle@linux.ibm.com> wrote:
> > On Tue, 2021-09-07 at 10:45 +0200, Niklas Schnelle wrote:
> > > On Tue, 2021-09-07 at 12:04 +1000, Oliver O'Halloran wrote:
> > > > On Mon, Sep 6, 2021 at 7:49 PM Niklas Schnelle <schnelle@linux.ibm.com> wrote:
> > > > > I already sent patch 3 separately, which resulted in the discussion below,
> > > > > but without a final conclusion.
> > > > >
> > > > > https://lore.kernel.org/lkml/20210720150145.640727-1-schnelle@linux.ibm.com/
> > > > >
> > > > > I believe that, even though there were some doubts about the use of
> > > > > pci_dev_is_added() by arch code, the existing uses as well as the use in the
> > > > > final patch of this series warrant this export.
> > > >
> > > > The use of pci_dev_is_added() in arch/powerpc was because in the past
> > > > pci_bus_add_device() could be called before pci_device_add(). That was
> > > > fixed a while ago, so it should be safe to remove those calls now.
> > >
> > > Hmm, ok, that confirms Bjorn's suspicion and explains how it came to be.
> > > I can certainly send a patch for that. This would then leave only the
> > > existing use in s390, which I added to prevent a deadlock, as
> > > explained here:
> > > https://lore.kernel.org/lkml/87d15d5eead35c9eaa667958d057cf4a81a8bf13.camel@linux.ibm.com/
> > >
> > > Plus there is the need to use it in the recovery code of this series. I
> > > think in the EEH code the need for a similar check is alleviated by the
> > > checks at the beginning of
> > > arch/powerpc/kernel/eeh_driver.c:eeh_handle_normal_event(), especially
> > > eeh_slot_presence_check(), which checks presence via the hotplug slot.
> > > I guess we could use our own state tracking in a similar way, but it felt
> > > like pci_dev_is_added() is the more logical choice.
>
> The slot check is mainly there to prevent attempts to "recover"
> devices that have been surprise-removed (i.e. NVMe hot-unplug). The
> actual recovery process operates off the eeh_pe tree which is frozen
> in place when an error is detected. If a pci_dev is added or removed
> it's not really a problem since those are only ever looked at when
> notifying drivers which is done with the rescan_remove lock held.
Thanks for the explanation.
> That
> said, I wouldn't really encourage anyone to follow the EEH model since
> it's pretty byzantine.
>
> > Looking into this again, I think we actually can't easily track this
> > state ourselves outside struct pci_dev. The reason for this is that
> > when e.g. arch/s390/pci/pci_sysfs.c:recover_store() removes the struct
> > pci_dev and scans it again the new struct pci_dev re-uses the same
> > struct zpci_dev because from a platform point of view the PCI device
> > was never removed but only disabled and re-enabled. Thus we can only
> > distinguish the stale struct pci_dev by looking at things stored in
> > struct pci_dev itself.
>
> IMO the real problem is removing and re-adding the pci_dev. I think
> it's something that's done largely because the PCI core doesn't really
> provide any better mechanism for getting a device back into a
> known-good state so it's abused to implement error recovery. This is
> something that's always annoyed me since it conflates recovery with
> hotplug. After a hot-(un)plug we might have a different device or no
> device. In the recovery case we expect to start and end with the same
> device. Why not apply the same logic to the pci_dev?
For us there are two cases. First, the existing
/sys/bus/pci/devices/<dev>/recover attribute. This does the pci_dev
remove and re-add that you mention, and thus we end up with a new pci_dev
afterwards. I agree that this is kind of a dumb way to recover, which
(too?) closely resembles unplug/re-plug.
Second, the automatic error recovery added in this series. Here we
only attempt recovery if we have a bound driver that supports the error
callbacks, thus always keeping the same pci_dev. If there is no such
driver, we give up on automatic recovery and are back at the situation
without this series.
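To make the difference between the two cases concrete, here is a minimal
userspace model. It is only a sketch: model_zdev, model_pdev and model_drv
are hypothetical stand-ins, not the kernel's struct zpci_dev or struct
pci_dev, and the "generation" counter is an assumption used to show that
the recover attribute yields a new pci_dev while the zpci_dev persists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct pci_dev / zpci_dev. */
struct model_drv {
	bool has_err_handler;		/* models drv->err_handler != NULL */
};

struct model_pdev {
	unsigned int generation;	/* bumped whenever a new pci_dev is created */
	struct model_drv *driver;
};

struct model_zdev {
	/* The platform object survives; only the pci_dev in it is replaced. */
	struct model_pdev pdev;
};

enum model_result { SAME_PDEV, NEW_PDEV, GAVE_UP };

/* Case 1: the recover attribute removes and rescans the function. */
static enum model_result model_recover_attr(struct model_zdev *zdev)
{
	struct model_drv *drv;

	assert(zdev != NULL);
	drv = zdev->pdev.driver;
	zdev->pdev.driver = NULL;	/* remove: driver unbound, pci_dev freed */
	zdev->pdev.generation++;	/* rescan: a new pci_dev for the same zdev */
	zdev->pdev.driver = drv;	/* the driver (if any) probes again */
	return NEW_PDEV;
}

/* Case 2: automatic recovery needs a bound driver with error callbacks. */
static enum model_result model_auto_recover(struct model_zdev *zdev)
{
	assert(zdev != NULL);
	if (zdev->pdev.driver == NULL || !zdev->pdev.driver->has_err_handler)
		return GAVE_UP;		/* same situation as without this series */
	/* reset the function and run the driver's callbacks on the same pci_dev */
	return SAME_PDEV;
}
```

In this model only model_recover_attr() changes the generation, which is
why a stale pci_dev can only be told apart by state kept in the pci_dev
itself.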
>
> Something I was tinkering with before I left IBM was re-working the
> way EEH handles recovering devices that don't have a driver with error
> handling callbacks to something like:
>
> 1. unbind the driver
> 2. pci_save_state()
> 3. do the reset
> 4. pci_restore_state()
> 5. re-bind the driver
>
> That would allow keeping the pci_dev around and let me delete a pile
> of confusing code which handles binding the eeh_dev to the new
> pci_dev.
This sounds like an interesting future approach for us too. Thankfully,
our binding of the zpci_dev to the new pci_dev is pretty simple by now.
The main trouble with removing and re-adding a pci_dev is then that
upper layers, like block devices, are also torn down and re-created,
which only happens if we have a driver bound.
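A rough sketch of the unbind/save/reset/restore/re-bind flow, combined
with the idea of stashing a copy of the device state before drivers
attach, might look like the userspace model below. Everything in it is
hypothetical: struct model_dev, the 64-byte config space, and modelling
the post-reset defaults as all zeros are assumptions for illustration,
and the comments only indicate which real step (e.g. pci_save_state())
each line stands for.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_CFG_SIZE 64	/* hypothetical; real config space is 256 or 4096 bytes */

struct model_dev {
	unsigned char cfg[MODEL_CFG_SIZE];	/* live config space */
	unsigned char golden[MODEL_CFG_SIZE];	/* stashed before any driver attached */
	bool golden_valid;
	bool driver_bound;
};

/* Stash a known-good copy, then let the driver attach. */
static void model_attach(struct model_dev *d)
{
	memcpy(d->golden, d->cfg, sizeof(d->cfg));
	d->golden_valid = true;
	d->driver_bound = true;
}

/* Reads from an inaccessible function typically return all ones. */
static bool model_all_ones(const unsigned char *buf)
{
	for (size_t i = 0; i < MODEL_CFG_SIZE; i++)
		if (buf[i] != 0xff)
			return false;
	return true;
}

static void model_recover(struct model_dev *d)
{
	unsigned char saved[MODEL_CFG_SIZE];
	bool was_bound = d->driver_bound;

	assert(d->golden_valid);
	d->driver_bound = false;			/* 1. unbind the driver */
	memcpy(saved, d->cfg, sizeof(saved));		/* 2. models pci_save_state() */
	memset(d->cfg, 0, sizeof(d->cfg));		/* 3. reset (defaults modelled as zero) */
	/* 4. models pci_restore_state(), falling back to the stashed copy
	 *    if the save only captured all-ones reads from a broken device. */
	memcpy(d->cfg, model_all_ones(saved) ? d->golden : saved, sizeof(d->cfg));
	d->driver_bound = was_bound;			/* 5. re-bind the driver */
}
```

The fallback in step 4 is one possible way to handle a device that is
too broken to save its config space; the pci_dev is kept throughout.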
> The obvious problem with that approach is the assumption the
> device is functional enough to allow saving the config space, but I
> don't think that's a deal breaker. We could stash a copy of the device
> state before we allow drivers to attach and use that to restore the
> device after the reset. The end result would be the same known-good
> state that we'd get after a re-scan.
>
> > That said, I think for the recovery case we might be able to drop the
> > pci_dev_is_added() and rely on pdev->driver != NULL which we check
> > anyway and that should catch any PCI device that was already removed.
>
> Would that work if there was an error on a device without a driver
> bound?
For the automatic recovery flow introduced by this series, we only
recover if such a driver is bound anyway, so that is already a
requirement. Luckily, all physical PCI devices we support on our
platform have drivers that implement the error callbacks.
> If you're just trying to stop races between recovery and device
> removal then pci_dev_is_added() is probably the right tool for the
> job. Trying to substitute it with a proxy seems like a bad idea.
Yes, I believe that at least for the existing recover attribute, which
does not require a bound driver, we still need pci_dev_is_added().
For the automatic recovery flow, I think it would be okay to rely on the
fact that removed devices don't have a driver bound, since the recovery
requires a bound driver anyway. But yes, an explicit pci_dev_is_added()
check as in this patch does feel cleaner.
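The order of checks discussed above can be sketched as a small guard
function. This is a single-threaded userspace model under stated
assumptions: guard_pdev and guard_driver are hypothetical, is_added only
stands in for pci_dev_is_added(), and real kernel code would have to
serialize these checks against device removal (for example with the
device lock held), which the model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct guard_err_handler { int unused; };

struct guard_driver {
	const struct guard_err_handler *err_handler;
};

/* Hypothetical stand-in for the relevant bits of struct pci_dev. */
struct guard_pdev {
	bool is_added;			/* models pci_dev_is_added() */
	struct guard_driver *driver;	/* NULL once the driver is unbound */
};

enum guard_result { GUARD_RECOVER, GUARD_REMOVED, GUARD_NO_DRIVER, GUARD_NO_HANDLER };

/* Decide whether automatic recovery may proceed. No locking here; the
 * kernel would need to hold a lock so removal cannot race with this. */
static enum guard_result guard_check(const struct guard_pdev *pdev)
{
	assert(pdev != NULL);
	if (!pdev->is_added)
		return GUARD_REMOVED;		/* the explicit check */
	if (pdev->driver == NULL)
		return GUARD_NO_DRIVER;		/* the proxy that would also catch removal */
	if (pdev->driver->err_handler == NULL)
		return GUARD_NO_HANDLER;	/* driver lacks error callbacks */
	return GUARD_RECOVER;
}
```

Checking is_added first makes the removal case explicit instead of
relying on the driver pointer as a proxy for it.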