From: linas@austin.ibm.com (Linas Vepstas)
To: James Smart <James.Smart@Emulex.Com>
Cc: linux-scsi@vger.kernel.org, rlary@us.ibm.com
Subject: Re: PCI error recovery for the Emulex LPFC
Date: Tue, 31 Oct 2006 11:19:02 -0600
Message-ID: <20061031171902.GP6360@austin.ibm.com>
In-Reply-To: <454754CC.8040308@emulex.com>

On Tue, Oct 31, 2006 at 08:51:08AM -0500, James Smart wrote:
> Linas,
> 
> I don't know of anything in this area.
> I also need a deeper understanding of what the error was, and how
> it was injected. This plays into it.

When the PCI slot is frozen, the PCI bridge will block all writes
to the device, and will return all 0xffffffff for reads. All DMA
will be prevented from going through. 
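
To make that concrete, here is a small user-space model of what the
driver sees. It is only a sketch, not kernel code: fake_slot,
fake_readl, fake_writel and read_looks_frozen are names made up for
illustration. Since all ones can also be a legitimate register value,
the check is best treated as a hint rather than proof of a frozen slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one device register behind a PCI bridge. */
struct fake_slot {
    bool frozen;    /* true once the bridge has isolated the slot */
    uint32_t reg;   /* contents of the device register */
};

/* Reads from a frozen slot come back as all ones. */
uint32_t fake_readl(const struct fake_slot *s)
{
    return s->frozen ? 0xffffffffu : s->reg;
}

/* Writes to a frozen slot are blocked and never reach the device. */
void fake_writel(struct fake_slot *s, uint32_t val)
{
    if (!s->frozen)
        s->reg = val;
}

/* All ones is a hint that the slot may be frozen, not proof. */
bool read_looks_frozen(uint32_t val)
{
    return val == 0xffffffffu;
}

/* Walk through the behavior described above; returns 0 on success. */
int frozen_slot_demo(void)
{
    struct fake_slot s = { .frozen = false, .reg = 0 };

    fake_writel(&s, 0x1234u);
    assert(fake_readl(&s) == 0x1234u);          /* normal operation */

    s.frozen = true;                            /* slot gets isolated */
    fake_writel(&s, 0x5678u);                   /* blocked by the bridge */
    assert(read_looks_frozen(fake_readl(&s)));  /* read returns all ones */

    s.frozen = false;                           /* model only: unfreeze, no reset */
    assert(fake_readl(&s) == 0x1234u);          /* blocked write never landed */
    return 0;
}
```

Calling frozen_slot_demo() exercises the normal, frozen and unfrozen
cases in that order.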

> Also, PCI error recovery is not a simple task. 

I've implemented it for the ipr and symbios SCSI controllers,
and for the e100, e1000, ixgb and s2io ethernet cards.  If you
review the actual code, you will see it's fairly tiny. Mostly
I've discovered that if the device driver has clean, clear-cut
device-up/device-down routines, then recovery is straightforward.

FWIW, I've run some of the kernels & devices through 48-hour runs 
with thousands of errors injected and successfully recovered from.

> There are many
> aspects to the adapter messaging interface and the effects of the
> PCI error recovery scheme that have to be closely looked at. DMA
> errors can be very fatal, even if the PCI bus survives. In many
> cases, the only safe recovery is a hard adapter reset (with little
> to no interaction with the adapter to clean up).

Currently, all of the device drivers I mention above perform the
recovery with a hard reset. The generic API does not require this,
but this seems to be the simplest, most robust/reliable route.
I experimented with non-hard-reset on the s2io, which I got "almost
working". I don't know that it's worth the trouble.

Just to be clear, I'm referring to the infrastructure documented
in Documentation/pci-error-recovery.txt
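
For a rough idea of the hard-reset flow, here is a user-space mock.
It is loosely modeled on the error_detected, slot_reset and resume
callbacks of struct pci_error_handlers, but every type and function
below (the mock_ names) is a stand-in invented for illustration, not
the kernel API and not code from any of the drivers mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock stand-ins, NOT the kernel's own types. */
enum mock_channel_state { MOCK_IO_FROZEN, MOCK_IO_PERM_FAILURE };
enum mock_ers_result { MOCK_NEED_RESET, MOCK_RECOVERED, MOCK_DISCONNECT };

struct mock_adapter {
    bool hw_ready;  /* hardware has been (re)initialized */
    bool up;        /* driver is accepting I/O */
};

/* Hypothetical device-down routine: stop I/O without touching hardware. */
void mock_device_down(struct mock_adapter *a)
{
    a->up = false;
    a->hw_ready = false;
}

/* Hypothetical device-up routine: only succeeds once hardware is ready. */
bool mock_device_up(struct mock_adapter *a)
{
    if (!a->hw_ready)
        return false;
    a->up = true;
    return true;
}

/* Step 1: an error was reported; quiesce and ask for a hard reset. */
enum mock_ers_result mock_error_detected(struct mock_adapter *a,
                                         enum mock_channel_state state)
{
    mock_device_down(a);
    if (state == MOCK_IO_PERM_FAILURE)
        return MOCK_DISCONNECT;     /* nothing left to recover */
    return MOCK_NEED_RESET;
}

/* Step 2: the slot has been reset; bring the hardware back to a known state. */
enum mock_ers_result mock_slot_reset(struct mock_adapter *a)
{
    a->hw_ready = true;             /* stands in for re-enable and re-init */
    return MOCK_RECOVERED;
}

/* Step 3: recovery is complete; start accepting I/O again. */
void mock_resume(struct mock_adapter *a)
{
    mock_device_up(a);
}

/* Run one full recovery cycle; returns 0 on success. */
int hard_reset_recovery_demo(void)
{
    struct mock_adapter a = { .hw_ready = true, .up = true };

    assert(mock_error_detected(&a, MOCK_IO_FROZEN) == MOCK_NEED_RESET);
    assert(!a.up);
    assert(mock_slot_reset(&a) == MOCK_RECOVERED);
    mock_resume(&a);
    assert(a.up);
    return 0;
}
```

The point of the mock is that all three steps reduce to calls into the
device-down and device-up routines, which is why drivers that already
have those cleanly separated need little extra code.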

--linas


