From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Linas Vepstas <linas@austin.ibm.com>
Cc: Greg KH <greg@kroah.com>,
	Linux Kernel list <linux-kernel@vger.kernel.org>,
	linux-ia64@vger.kernel.org, linux-pci@atrey.karlin.mff.cuni.cz
Subject: Re: [PATCH 1/6] PCIERR : interfaces for synchronous I/O error detection on driver
Date: Mon, 27 Mar 2006 11:37:36 +0900
Message-ID: <44274FF0.406@jp.fujitsu.com>
In-Reply-To: <20060324234306.GC21895@austin.ibm.com>

Linas Vepstas wrote:
> On Fri, Mar 24, 2006 at 04:47:25PM +0900, Hidetoshi Seto wrote:
>> However, some difficulty still remains to cover all possible error
>> situations even if we use callbacks. It will not help keeping data
>> integrity, passing no broken data to drivers and user lands, preventing
>> applications from going crazy or sudden death.
> 
> This is not true.  Although there are some subtle issues, (which
> I invite you to describe), the goal of the current design is to 
> insure data integrity, and make sure that neither the driver nor 
> the userland gets corrupted data. There shouldn't be any "crazy
> or sudden death" if the device drivers are any good.

OK, so we still share the same goal.

I failed to mention that, as you know, this synchronous error detection
is also required when the async callback needs to touch the hardware,
either to recover it or to pick up diagnostic data during kernel-initiated
recovery. (I found the phrase "pci_check_whatever() API" in your comment
in Documentation/pci-error-recovery.txt.)

> Of course, this depends on the hardware implementation. If
> your PCI bus sends corrupt data up to the driver ... all bets 
> are off. The design is predicated on the assumption that the
> hardware sends either good data or no data, and that the latter
> is associated with a bus state indicating an error has occurred.
> 
>>  - It will be useful if arch chooses panic on bus errors not to pass
>>    any broken data to un-reliable drivers.
> 
> I assume you meant "if arch chooses NOT to panic on bus errors ..."

Hmm, what I meant is this:
   There is an arch that chooses to reboot on a bus error.
   It does so because its design assumes that no driver is able to handle
   a bus error and that almost all drivers will carry on without checking
   the hardware status. So the arch chooses rebooting rather than polluting
   user data.
   The design lets the OS decide whether the system reboots or not, but
   the OS has no way to know which drivers actually check the hardware state.
Therefore this interface will help the OS know which drivers are reliable.
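
As a rough illustration of that last point, here is a minimal sketch of how
an arch could pick its action once it knows whether a driver checks for I/O
errors. Everything in it is hypothetical and only for illustration: the
names drv_info, checks_io_errors and on_bus_error are not taken from the
patch series or from any kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical actions an arch could take when a bus error is reported. */
enum bus_error_action { ACTION_REBOOT, ACTION_CONTINUE };

/* Hypothetical per-driver record kept by the OS. */
struct drv_info {
    const char *name;
    bool checks_io_errors; /* true if the driver uses the synchronous check */
};

/*
 * Decide what to do on a bus error attributed to the device owned by drv.
 * If the owner is unknown or does not check hardware state, broken data
 * could reach user land unnoticed, so fall back to rebooting.
 */
static enum bus_error_action on_bus_error(const struct drv_info *drv)
{
    if (drv == NULL || !drv->checks_io_errors)
        return ACTION_REBOOT;
    return ACTION_CONTINUE;
}
```

With this shape, a driver that has declared it checks for errors lets the
system continue, while an unknown or non-checking driver still gets the
conservative reboot.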

Of course, there are some arches that choose not to panic/reboot on a bus error.
I think they believe that all drivers running on the arch can handle
any type of error, or they have some special feature against errors...,
or they are just being idiotic about hardware errors.

Anyway, all that is certain is that:
  - To check the data from hardware, the driver needs to ask somewhere synchronously.
  - "Somewhere" would be a register, and/or something in the kernel/hardware.
  - The state check would be architecture-dependent routine work.
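
The three points above could be sketched roughly as follows. This is a
userspace model under stated assumptions, not the interface from the patch:
struct fake_dev, fake_readl, arch_error_clear, arch_error_pending and
checked_read are all hypothetical names, and the "bus" is simulated by a
sticky flag in the device structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: one data register plus a sticky error flag
 * that the simulated bus latches when a read fails. */
struct fake_dev {
    uint32_t data_reg;
    bool bus_error;    /* stays set until explicitly cleared */
    bool inject_fault; /* test knob: make the next reads fail */
};

/* Arch-dependent part (hypothetical): where the error state lives and how
 * it is cleared and queried would differ per architecture. */
static void arch_error_clear(struct fake_dev *dev)
{
    dev->bus_error = false;
}

static bool arch_error_pending(const struct fake_dev *dev)
{
    return dev->bus_error;
}

/* Simulated MMIO read: on a fault, latch the error and return all ones. */
static uint32_t fake_readl(struct fake_dev *dev)
{
    if (dev->inject_fault) {
        dev->bus_error = true;
        return 0xFFFFFFFFu;
    }
    return dev->data_reg;
}

/* Driver side: clear the state, read, then ask synchronously whether the
 * read can be trusted. Returns 0 and stores the value on success, or -1
 * (leaving *out untouched) if an error was detected. */
static int checked_read(struct fake_dev *dev, uint32_t *out)
{
    uint32_t val;

    arch_error_clear(dev);
    val = fake_readl(dev);
    if (arch_error_pending(dev))
        return -1;
    *out = val;
    return 0;
}
```

The point of the split is that only arch_error_clear() and
arch_error_pending() need to know where the state lives; the driver just
brackets its read with them and never passes on data from a failed read.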

Thanks,
H.Seto



