From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Linux Kernel list <linux-kernel@vger.kernel.org>,
linux-pci@atrey.karlin.mff.cuni.cz, linux-ia64@vger.kernel.org
Cc: Linus Torvalds <torvalds@osdl.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Linas Vepstas <linas@austin.ibm.com>,
"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling
Date: Fri, 04 Mar 2005 21:40:58 +0900
Message-ID: <4228575A.8070708@jp.fujitsu.com>
In-Reply-To: <422428EC.3090905@jp.fujitsu.com>
Thanks for all the comments!
OK, I'd like to sort out our situation:
################
$ Here are 2 features:
- iochk_clear/read() interface for error "detection"
  by Seto ... me :-)
- callback, thread, and event notification for error "recovery"
  by Linas ... expert in PPC64
$ What will the "detection" interface provide?
- allow drivers to get error information
  - whether the device/bus was isolated, is going to be reset, was
    re-enabled, etc.
  - error status which the hardware and PCI subsystem provide
- allow drivers to do a "simple retry" easily
  - most soft errors(*1) would be recovered by a simple retry
  - covers cases where the device/bus was re-enabled but a retry is
    still required
$ What will the "recovery" infrastructure provide?
- allow drivers to help the OS's recovery
  - usually the OS cannot re-enable affected devices by itself
- allow drivers to respond to asynchronous error events
- allow drivers to implement "device specific recovery"
$ Difference of stance
- "detection"
  - Assume that soft errors are far more numerous than hard errors.
    (PCI-Express has ECC, but traditional PCI does not.)
  - Assume that it is not too late to attempt device isolation and/or
    recovery after a simple retry(*2), and that a retry would be
    required even if the recovery had gone well.
  - It doesn't matter whether device isolation is actually possible on
    the arch or not. The fundamental intention of this interface is to
    protect user applications from data pollution.
  - Currently DMA and asynchronous I/O are not targets.
- "recovery"
  - (I'd appreciate it if Linas could fill this in with his own words.)
  - (Maybe) it is based on the assumption that an erroneous device should
    be isolated immediately, irrespective of the type of the error.
  - (I guess that) once a device is isolated, it becomes harder to
    re-enable it. It seems like a kind of hotplug feature.
  - Currently there are few platforms which can isolate devices and
    attempt to recover from an I/O error.
$ How to use
- "detection" ... easy.
  - bracket I/Os with iochk_clear() and iochk_read()
  - if iochk_read() returns non-0, retry once and/or report the error
    to the user application.
- "recovery" ... rather hard.
  - (I'd appreciate it if Linas could fill this in with his own words.)
  - write a callback function that handles each event(*3)
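
As a rough illustration of the "detection" usage above, here is a
userspace mock of bracketing one read with iochk_clear()/iochk_read() and
retrying once. It is only a sketch: the iocookie layout, the signatures of
both calls, fake_readl(), faults_pending and read_with_retry() are
assumptions made for this example and are not taken from the patch.

```c
/*
 * Userspace mock, NOT the kernel patch: the iocookie layout, both
 * iochk_*() signatures and fake_readl() are assumptions for illustration.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int error; } iocookie;   /* hypothetical cookie */

static iocookie *active;      /* cookie of the bracket in progress */
static int faults_pending;    /* fake soft errors still to inject */

/* Start a checked section: forget old errors, start watching. */
static void iochk_clear(iocookie *c) { c->error = 0; active = c; }

/* End the section: non-0 if an error hit since iochk_clear(). */
static int iochk_read(iocookie *c) { active = NULL; return c->error; }

/* Fake register read: returns garbage while faults are pending. */
static uint32_t fake_readl(void)
{
    if (faults_pending > 0) {
        faults_pending--;
        if (active)
            active->error = 1;
        return 0xffffffffu;   /* bad data, must not reach the user */
    }
    return 0x12345678u;
}

/* Read once, retry once on error; 0 on success, -1 if both fail. */
static int read_with_retry(uint32_t *out)
{
    assert(out != NULL);
    for (int attempt = 0; attempt < 2; attempt++) {
        iocookie cookie;
        uint32_t val;

        iochk_clear(&cookie);
        val = fake_readl();
        if (iochk_read(&cookie) == 0) {
            *out = val;       /* only checked data is handed out */
            return 0;
        }
    }
    return -1;                /* report the error, never the bad data */
}
```

With faults_pending set to 1 the retry hides the error from the caller;
with 2 the second attempt also fails and the caller gets -1, which is the
point where a hard error would be suspected(*2).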
-----
*1:
Traditionally, there are 2 types of error:
- soft error:
  data was corrupted (e.g. due to low voltage, natural radiation, etc.)
  temporary error
- hard error:
  device or bus is physically broken (i.e. uncorrectable)
  permanent error
*2:
It's difficult to distinguish hard errors from soft errors without
any retry.
*3:
Linas, how many stages/events would you prefer to have? Is 3 enough?
e.g. IMHO:
IOERR_DETECTED
- An error was detected, so error logging or device isolation would be
  the major request. On PPC64, isolation would already be done by hardware.
IOERR_PREPARE_RECOVERY
- Requests preparation before the OS attempts error recovery.
IOERR_DO_RECOVERY
- Requests device specific recovery and the result of that recovery.
  The OS will gather all results and decide whether recovery succeeded.
IOERR_RECOVERED
- OS recovery succeeded.
IOERR_DEAD
- OS recovery failed.
And as Ben said and as you already proposed, I also think a single callback
is enough and better, like:
int pci_emergency_callback(pci_dev *dev, err_event event, void *extra)
It allows us to add new events if desired.
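
To make the single-callback idea concrete, here is a userspace sketch.
Only the IOERR_* names and the shape of pci_emergency_callback() come from
the proposal above. The types are spelled as struct pci_dev and
enum err_event so that it compiles as plain C, struct pci_dev is a mock,
and run_recovery() with its "stop at the first failed stage" policy is a
hypothetical OS side invented for this example.

```c
/*
 * Userspace sketch: struct pci_dev is a stand-in, and run_recovery()
 * is a hypothetical OS-side driver of the event sequence.
 */
#include <assert.h>
#include <stddef.h>

enum err_event {
    IOERR_DETECTED,
    IOERR_PREPARE_RECOVERY,
    IOERR_DO_RECOVERY,
    IOERR_RECOVERED,
    IOERR_DEAD,
};

struct pci_dev {                  /* mock, not the kernel structure */
    int can_recover;              /* pretend device recovery works */
    enum err_event last_event;    /* last event seen by the driver */
};

/* Single driver callback: 0 on success, non-0 on failure. */
static int pci_emergency_callback(struct pci_dev *dev,
                                  enum err_event event, void *extra)
{
    (void)extra;                  /* unused in this sketch */
    assert(dev != NULL);
    dev->last_event = event;

    switch (event) {
    case IOERR_DETECTED:          /* log the error, stop new I/O */
    case IOERR_PREPARE_RECOVERY:  /* quiesce before the OS acts */
        return 0;
    case IOERR_DO_RECOVERY:       /* device specific recovery */
        return dev->can_recover ? 0 : -1;
    case IOERR_RECOVERED:         /* resume normal operation */
    case IOERR_DEAD:              /* give up on the device */
        return 0;
    }
    return -1;                    /* event unknown to this driver */
}

/* Hypothetical OS side: walk the stages, gather results, decide. */
static enum err_event run_recovery(struct pci_dev **devs, size_t n)
{
    static const enum err_event stages[] = {
        IOERR_DETECTED, IOERR_PREPARE_RECOVERY, IOERR_DO_RECOVERY,
    };
    int failed = 0;

    /* Every device is asked in a stage, even after one has failed. */
    for (size_t s = 0; s < 3 && !failed; s++)
        for (size_t i = 0; i < n; i++)
            if (pci_emergency_callback(devs[i], stages[s], NULL) != 0)
                failed = 1;

    enum err_event verdict = failed ? IOERR_DEAD : IOERR_RECOVERED;
    for (size_t i = 0; i < n; i++)
        pci_emergency_callback(devs[i], verdict, NULL);
    return verdict;
}
```

Because all events go through one switch, adding an event later only adds
an enum value and a case label; a driver that does not know the new event
falls through to the final return.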
################
Thanks,
H.Seto