public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jeff Garzik <jgarzik@pobox.com>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Linux Kernel list <linux-kernel@vger.kernel.org>,
	linux-pci@atrey.karlin.mff.cuni.cz, linux-ia64@vger.kernel.org,
	Linus Torvalds <torvalds@osdl.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Linas Vepstas <linas@austin.ibm.com>,
	"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling
Date: Tue, 01 Mar 2005 11:37:24 -0500	[thread overview]
Message-ID: <42249A44.4020507@pobox.com> (raw)
In-Reply-To: <422428EC.3090905@jp.fujitsu.com>

Hidetoshi Seto wrote:
> Hi, long time no see :-)
> 
> Currently, I/O error is not a leading cause of system failure.
> However, since Linux nowadays is making great progress on its
> scalability, and ever larger number of PCI devices are being
> connected to a single high-performance server, the risk of the
> I/O error is increasing day by day.
> 
> For example, PCI parity error is one of the most common errors
> in the hardware world. However, the major cause of parity error
> is not hardware's error but software's - low voltage, humidity,
> natural radiation... etc. Even though, some platforms are nervous
> to parity error enough to shutdown the system immediately on such
> error. So if device drivers can retry its transaction once results
> as an error, we can reduce the risk of I/O errors.
> 
> So I'd like to suggest new interfaces that enable drivers to
> check - detect error and retry their I/O transaction easily.

I have been thinking about PCI system and parity errors, and how to 
handle them.  I do not think this is the correct approach.

A simple retry is... too simple.  If you are having a massive problem on 
your PCI bus, more action should be taken than a retry.

In my opinion each driver needs to be aware of PCI sys/parity errs, and 
handle them.  For network drivers, this is rather simple -- check the 
hardware, then restart the DMA engine.  Possibly turning off 
TSO/checksum to guarantee that bad packets are not accepted.  For SATA 
and SCSI drivers, this is more complex, as one must retry a number of 
queued disk commands, after resetting the hardware.

A new API handles none of this.

	Jeff




  parent reply	other threads:[~2005-03-01 16:38 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-03-01  8:33 [PATCH/RFC] I/O-check interface for driver's error handling Hidetoshi Seto
2005-03-01 14:42 ` Matthew Wilcox
2005-03-01 19:27   ` Linas Vepstas
2005-03-01 19:37     ` Linus Torvalds
2005-03-02  6:13     ` Hidetoshi Seto
2005-03-02 19:20       ` Linas Vepstas
2005-03-04  2:03         ` Hidetoshi Seto
2005-03-04 16:46           ` Linas Vepstas
2005-03-01 16:37 ` Jeff Garzik [this message]
2005-03-01 16:49   ` Linus Torvalds
2005-03-01 16:59     ` Matthew Wilcox
2005-03-01 17:10       ` Jesse Barnes
2005-03-01 18:33         ` Linas Vepstas
2005-03-01 22:27           ` Benjamin Herrenschmidt
2005-03-02 20:02             ` Linas Vepstas
2005-03-02 22:46               ` Benjamin Herrenschmidt
2005-03-02 23:37                 ` Linas Vepstas
2005-03-01 22:23         ` Benjamin Herrenschmidt
2005-03-02  3:13         ` Hidetoshi Seto
2005-03-04 13:54         ` Pavel Machek
2005-03-04 17:50           ` Jesse Barnes
2005-03-04 22:37           ` Benjamin Herrenschmidt
2005-03-04 22:57             ` Pavel Machek
2005-03-04 23:03               ` Benjamin Herrenschmidt
2005-03-04 23:18                 ` Pavel Machek
2005-03-04 23:27                   ` Benjamin Herrenschmidt
2005-03-02  2:28       ` Hidetoshi Seto
2005-03-02 17:44         ` Linas Vepstas
2005-03-02 18:03           ` linux-os
2005-03-02 22:40             ` Benjamin Herrenschmidt
2005-03-04  2:21           ` Hidetoshi Seto
2005-03-01 22:20     ` Benjamin Herrenschmidt
2005-03-02 18:22     ` Linas Vepstas
2005-03-02 18:41       ` Jesse Barnes
2005-03-02 19:46         ` Linas Vepstas
2005-03-02 22:43         ` Benjamin Herrenschmidt
2005-03-02 22:41       ` Benjamin Herrenschmidt
2005-03-02 23:30         ` Linas Vepstas
2005-03-02 23:40           ` Jesse Barnes
2005-03-01 19:17   ` Linas Vepstas
2005-03-01 22:15   ` Benjamin Herrenschmidt
2005-03-01 17:19 ` Andi Kleen
2005-03-01 18:08   ` Linus Torvalds
2005-03-01 18:45     ` Andi Kleen
2005-03-01 18:59     ` Linas Vepstas
2005-03-01 22:26     ` Benjamin Herrenschmidt
2005-03-01 22:24   ` Benjamin Herrenschmidt
2005-03-04 12:40 ` Hidetoshi Seto

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=42249A44.4020507@pobox.com \
    --to=jgarzik@pobox.com \
    --cc=benh@kernel.crashing.org \
    --cc=linas@austin.ibm.com \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@atrey.karlin.mff.cuni.cz \
    --cc=seto.hidetoshi@jp.fujitsu.com \
    --cc=tony.luck@intel.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox