From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Bernd Schubert <bs@q-leap.de>
Cc: Matthew Wilcox <matthew@wil.cx>, linux-scsi@vger.kernel.org
Subject: Re: [PATCH] scsi device recovery
Date: Wed, 12 Dec 2007 10:59:36 -0500
Message-ID: <1197475177.4203.29.camel@localhost.localdomain>
In-Reply-To: <200712121536.10665.bs@q-leap.de>
On Wed, 2007-12-12 at 15:36 +0100, Bernd Schubert wrote:
> On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
> > On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> > > below is a patch introducing device recovery, trying to prevent i/o
> > > errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
> >
> > Why doesn't the regular scsi_eh do what you need?
>
> First of all, it is presently simply not called when the two errors above do
> happen. This could be changed, of course.
Erm, I think you'll find the error handler does activate on
DID_SOFT_ERROR. It causes a retry via the eh. DID_NO_CONNECT is an
immediate error with no eh intervention because it means that the target
went away. Handling this as a retryable error isn't an option because
it will interfere with hotplug.
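To make the distinction above concrete, here is a minimal user-space sketch of how the two host bytes are treated differently. It is a simplified model, not the mid-layer's actual code: the `enum disposition` names and the `decide()` helper are hypothetical, and the numeric host byte values are quoted from memory of the kernel's SCSI headers, so check them against your tree.

```c
/* Host byte codes, assumed to match the kernel's SCSI headers. */
#define DID_OK          0x00
#define DID_NO_CONNECT  0x01
#define DID_SOFT_ERROR  0x0b

/* The host byte occupies bits 16-23 of the command result. */
#define host_byte(result) (((result) >> 16) & 0xff)

/* Hypothetical outcome names, used only in this sketch. */
enum disposition {
    DISP_OTHER,     /* not covered by this sketch */
    DISP_FAIL_NOW,  /* report the error upward immediately */
    DISP_RETRY      /* hand the command back for a retry */
};

/*
 * Model of the decision described above: DID_NO_CONNECT means the
 * target went away, so it is never retried; DID_SOFT_ERROR is
 * retried until the allowed retry count is used up.
 */
enum disposition decide(int result, int retries, int allowed)
{
    switch (host_byte(result)) {
    case DID_NO_CONNECT:
        return DISP_FAIL_NOW;
    case DID_SOFT_ERROR:
        return (retries < allowed) ? DISP_RETRY : DISP_FAIL_NOW;
    default:
        return DISP_OTHER;
    }
}
```

For example, `decide(DID_SOFT_ERROR << 16, 0, 5)` yields `DISP_RETRY`, while `decide(DID_NO_CONNECT << 16, 0, 5)` yields `DISP_FAIL_NOW` regardless of how many retries remain.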
> Secondly, I think scsi_eh is in most cases doing too much. We are fighting
> with flaky Infortrend boxes here, and scsi_eh sometimes manages to crash
> their scsi channels. In most cases it is sufficient to stall any io to the
> device and then to resume.
But that's basically the default behaviour of the error handler (stall
then resume).
> For most scsi devices one probably doesn't need a suspend time or it can be
> very small, this still needs to become configurable via sysfs.
You mean a wait time beyond what the error handler currently does
(basically it waits for the quiesce, begins error handling and then
sends a test unit ready when it finishes before restarting)?
> Thirdly, scsi_eh doesn't give up, in most cases, when the scsi channel of a
> Infortrend box crashed, it tried forever to recover.
> To improve this is still on my todo list.
Could you send traces for this? I thought the error handler had been
fixed over the last few years to always terminate. If there's a case
where it doesn't, this needs fixing.
James