From: Alex Williamson <alex.williamson@redhat.com>
To: Linas Vepstas <linasvepstas@gmail.com>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>,
Jonathan Corbet <corbet@lwn.net>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
linux-doc@vger.kernel.org,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>
Subject: Re: [PATCH] pci-error-recover: doc cleanup
Date: Fri, 9 Dec 2016 09:11:23 -0700
Message-ID: <20161209091123.110c351a@t450s.home>
In-Reply-To: <CAHrUA36r3o3ziEdMz-8=w5XTymsMQZYRXXrCt=H+1F3M4+6RnQ@mail.gmail.com>
On Fri, 9 Dec 2016 14:44:25 +0800
Linas Vepstas <linasvepstas@gmail.com> wrote:
> On Fri, Dec 9, 2016 at 2:37 PM, Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> >
> >
> > On 12/09/2016 02:24 PM, Linas Vepstas wrote:
> >> I suppose I'm confused, but I recall that link resets are non-fatal.
> >> Fatal errors typically require that the PCI adapter be completely
> >> reset, any adapter firmware to be reloaded from scratch, the device
> >> driver has to kill all device state and start from scratch. It's huge.
> >> If the fatal error is on a PCI device that is under a block device
> >> holding a file system, then (usually) there is no way to recover,
> >> because the block layer (and file system) cannot deal with a block
> >> device that disappeared and then reappeared some few seconds later.
> >> (maybe some future zfs or lvm or btrfs might be able to deal with
> >> this, but not today)
> >>
> >> By contrast, link resets are far more gentle: the device driver might
> >> have to discard some half-full FIFOs, or cancel some in-flight
> >> commands, but can otherwise gracefully recover without telling the
> >> higher layers that there were any problems.
> >>
> >> --linas
> >>
> >
> > I am a little confused too; I'm not even sure we are talking about the same
> > *fatal error*. I am talking about the fatal error defined in the PCI Express spec,
> > section 6.2.2.2.1:
> >
> > Fatal errors are uncorrectable error conditions which render the
> > particular Link and related hardware unreliable. For Fatal errors, a
> > reset of the components on the Link may be required to return to
> > reliable operation. Platform handling of Fatal errors, and any efforts
> > to limit the effects of these errors, is platform implementation specific.
> >
> > Link reset means setting the *secondary bus reset* bit in the PCI bridge config
> > space; it resets the link and the device simultaneously, and is the strongest
> > kind of reset as far as I know.
>
> OK, well, it's been far too many years, and I don't have the PCI spec
> at my fingertips.
> Isn't there a link reset that can be performed without forcing a device reset?
>
> The intent was that some PCI link errors are due to vibration,
> ground-bounce, humidity, etc. and that these errors can be detected
> and do not corrupt the device state or the device driver state. Since
> they are not associated with data corruption (or rather, the
> corruption is local to the link), these can be recovered by resetting
> just the link, without resetting the whole adapter. They may require
> resetting some device-driver state, but not all of it.
>
> However, this was all decided before the PCI-E spec was written, so
> maybe the newer PCI-E specs now say something different.
Perhaps you're thinking of link retraining? That sort of error would
be considered correctable, not fatal. Fatal errors are uncorrected
errors and a bigger hammer is needed to deal with them, such as a link
reset. Thanks,
Alex
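The distinction drawn above can be sketched in C. This is a minimal simulation under stated assumptions, not kernel code. The names err_severity, recovery_action, classify(), handle_error(), secondary_bus_reset() and sbr_asserted_count are hypothetical, and a 256-byte array stands in for the bridge's configuration space. Only the Bridge Control register offset (0x3E) and its Secondary Bus Reset bit (0x40) come from the PCI bridge header layout; they match PCI_BRIDGE_CONTROL and PCI_BRIDGE_CTL_BUS_RESET in Linux's pci_regs.h. Correctable errors are only logged, non-fatal errors are left to the driver, and fatal errors trigger a secondary bus reset, i.e. the "link reset" described in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Type 1 (bridge) header: Bridge Control register and its Secondary Bus
 * Reset bit (same values as PCI_BRIDGE_CONTROL / PCI_BRIDGE_CTL_BUS_RESET). */
#define BRIDGE_CONTROL       0x3E
#define BRIDGE_CTL_BUS_RESET 0x40

/* Hypothetical names for the PCIe spec's three error classes. */
enum err_severity { ERR_CORRECTABLE, ERR_NONFATAL, ERR_FATAL };

/* Hypothetical recovery actions used only by this sketch. */
enum recovery_action { ACT_LOG_ONLY, ACT_DRIVER_RECOVERY, ACT_LINK_RESET };

/* Simulated config space of one bridge; no real hardware is touched. */
static uint8_t cfg[256];
/* Simulation only: counts how often "hardware" saw the reset bit set. */
static int sbr_asserted_count;

uint16_t cfg_read_word(unsigned off)
{
    assert(off + 1 < sizeof cfg);
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8)); /* little-endian */
}

void cfg_write_word(unsigned off, uint16_t val)
{
    assert(off + 1 < sizeof cfg);
    cfg[off] = (uint8_t)(val & 0xFF);
    cfg[off + 1] = (uint8_t)(val >> 8);
    /* Stand-in for the bridge noticing that reset was asserted. */
    if (off == BRIDGE_CONTROL && (val & BRIDGE_CTL_BUS_RESET))
        sbr_asserted_count++;
}

/* Pulse Secondary Bus Reset while preserving the other Bridge Control bits.
 * Real code must hold the bit for at least 1 ms and then wait before
 * accessing devices below the bridge; those delays are omitted here. */
void secondary_bus_reset(void)
{
    uint16_t ctrl = cfg_read_word(BRIDGE_CONTROL);
    cfg_write_word(BRIDGE_CONTROL, (uint16_t)(ctrl | BRIDGE_CTL_BUS_RESET));
    /* ... hold time would go here ... */
    cfg_write_word(BRIDGE_CONTROL, (uint16_t)(ctrl & ~BRIDGE_CTL_BUS_RESET));
}

/* Simplified mapping from error class to recovery action. */
enum recovery_action classify(enum err_severity sev)
{
    switch (sev) {
    case ERR_CORRECTABLE:
        return ACT_LOG_ONLY;        /* hardware already recovered, e.g. by retry */
    case ERR_NONFATAL:
        return ACT_DRIVER_RECOVERY; /* transaction lost, link still usable */
    case ERR_FATAL:
        return ACT_LINK_RESET;      /* link unreliable: reset it */
    }
    return ACT_LINK_RESET;          /* unknown class: be conservative */
}

enum recovery_action handle_error(enum err_severity sev)
{
    enum recovery_action act = classify(sev);
    if (act == ACT_LINK_RESET)
        secondary_bus_reset();
    return act;
}
```

Real recovery paths are more nuanced; for instance, a driver may still request a reset after a non-fatal error, which this sketch does not model.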