From: Jeff Garzik <jgarzik@pobox.com>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Linux Kernel list <linux-kernel@vger.kernel.org>,
	linux-pci@atrey.karlin.mff.cuni.cz, linux-ia64@vger.kernel.org,
	Linus Torvalds <torvalds@osdl.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Linas Vepstas <linas@austin.ibm.com>,
	"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling
Date: Tue, 01 Mar 2005 16:37:24 +0000
Message-ID: <42249A44.4020507@pobox.com>
In-Reply-To: <422428EC.3090905@jp.fujitsu.com>

Hidetoshi Seto wrote:
> Hi, long time no see :-)
> 
> Currently, I/O errors are not a leading cause of system failure.
> However, as Linux makes great progress in scalability and ever
> larger numbers of PCI devices are connected to a single
> high-performance server, the risk of I/O errors is increasing
> day by day.
> 
> For example, PCI parity errors are among the most common errors
> in the hardware world. However, the major cause of parity errors
> is not a permanent hardware fault but a transient (soft) error
> caused by low voltage, humidity, natural radiation, etc. Even so,
> some platforms are sensitive enough to parity errors to shut down
> the system immediately when one occurs. So if device drivers can
> retry a transaction that results in an error, we can reduce the
> risk of I/O errors.
> 
> So I'd like to suggest new interfaces that enable drivers to
> check (detect errors) and retry their I/O transactions easily.

I have been thinking about PCI system and parity errors, and how to 
handle them.  I do not think this is the correct approach.

A simple retry is... too simple.  If you are having a massive problem on 
your PCI bus, more action should be taken than a retry.

In my opinion each driver needs to be aware of PCI system/parity errors, 
and handle them.  For network drivers, this is rather simple: check the 
hardware, then restart the DMA engine, possibly also turning off 
TSO/checksum offload to guarantee that bad packets are not accepted.  For 
SATA and SCSI drivers, this is more complex, as one must reset the 
hardware and then retry a number of queued disk commands.
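
The two recovery paths described above can be sketched as a pair of handlers. This is a userspace simulation under stated assumptions: struct fake_nic, struct fake_hba, nic_recover(), hba_recover() and MAX_QUEUED are hypothetical names, not kernel interfaces, and a real driver would perform these steps through its own reset and requeue paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical network adapter state (not a real kernel structure). */
struct fake_nic {
    bool hw_ok;            /* result of the hardware check */
    bool dma_running;
    bool rx_csum_offload;  /* hardware checksum verification on receive */
    bool tso;              /* TCP segmentation offload */
};

/* Network recovery: stop DMA, check the hardware, turn off offloads,
 * then restart the DMA engine. Returns -1 if the hardware check fails. */
static int nic_recover(struct fake_nic *nic)
{
    assert(nic != NULL);
    nic->dma_running = false;      /* quiesce the DMA engine first */
    if (!nic->hw_ok)
        return -1;                 /* hardware check failed: escalate */
    nic->rx_csum_offload = false;  /* verify checksums in software instead */
    nic->tso = false;
    nic->dma_running = true;       /* restart the DMA engine */
    return 0;
}

#define MAX_QUEUED 8               /* assumed queue depth */

struct fake_cmd {
    int  tag;
    bool done;                     /* completed before the error? */
};

/* Hypothetical SATA/SCSI host adapter state. */
struct fake_hba {
    struct fake_cmd queue[MAX_QUEUED];
    size_t nqueued;
    int resets;
};

/* Storage recovery: reset the hardware, then reissue every command that
 * had not completed. Returns the number of commands reissued. */
static size_t hba_recover(struct fake_hba *hba)
{
    size_t reissued = 0;
    assert(hba != NULL && hba->nqueued <= MAX_QUEUED);
    hba->resets++;                 /* 1. reset: in-flight commands are lost */
    for (size_t i = 0; i < hba->nqueued; i++) {
        if (!hba->queue[i].done)
            reissued++;            /* 2. stand-in for resubmitting the command */
    }
    return reissued;
}
```

The ordering matters in both handlers: the network path quiesces DMA before touching anything else, and the storage path resets the hardware before any queued command is resubmitted.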

A new API handles none of this.

	Jeff
