From: Bernd Schubert <bs@q-leap.de>
To: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH 2/7] Allow requeuement on DID_SOFT_ERROR
Date: Wed, 3 Dec 2008 18:06:01 +0100
Message-ID: <200812031806.02179.bs@q-leap.de>
In-Reply-To: <1228321762.5551.23.camel@localhost.localdomain>
On Wednesday 03 December 2008 17:29:22 James Bottomley wrote:
> On Wed, 2008-12-03 at 17:00 +0100, Bernd Schubert wrote:
> > On Wednesday 03 December 2008 16:16:31 James Bottomley wrote:
> > > On Wed, 2008-12-03 at 13:17 +0100, Bernd Schubert wrote:
> > > > On Wednesday 26 November 2008 19:47:47 James Bottomley wrote:
> > > > > On Wed, 2008-11-26 at 18:46 +0100, Bernd Schubert wrote:
> > > > > > Activate the error handler if DID_SOFT_ERROR failed too often, but
> > > > > > only for commands which have a scmd->allowed > 1.
> > > > > > Also make a function out of a goto-block.
> > > > >
> > > > > What is the rationale for this? It really doesn't look right since
> > > > > DID_SOFT_ERROR is supposed to be for temporary out of resource
> > > > > conditions in the HBA driver ... activating the error handler isn't
> > > > > really going to fix this because the eh is taking us through a
> > > > > state model for device conditions, which DID_SOFT_ERROR shouldn't
> > > > > be.
> > > >
> > > > What do you suggest instead? Just returning an I/O error without
> > > > even trying to recover the device isn't nice.
> > >
> > > it doesn't do that ... it retries up to the retry limit before failing
> > > the command. There is an argument that we should treat this as other
> > > temporary resource conditions like BUSY and QUEUE_FULL, so return
> > > ADD_TO_MLQUEUE. On the other hand, DID_REQUEUE already does that, so
> > > this would lose the only unconditional DID_ code going generically
> > > through the retry path.
> > >
> > > > > If you just need a DID_FAIL to activate the eh, it can be added
> > > > > without changing the meaning of DID_SOFT_ERROR.
> > > > >
> > > > > Also, you changed the return to make it device blocking (which also
> > > > > doesn't look right) but didn't document that in the change log.
> > > >
> > > > Last year you suggested switching from NEEDS_RETRY to ADD_TO_MLQUEUE:
> > > >
> > > > http://www.mail-archive.com/linux-scsi%40vger.kernel.org/msg12475.html
> > > >
> > > > When I wrote the patch documentation, I had already forgotten about
> > > > it, sorry. Unfortunately, it didn't help much for our devices. So I
> > > > made it activate the eh only if it fails too often. With the eh
> > > > activated, devices can sometimes be recovered. But I'm certainly
> > > > grateful for any hints to further improve recovery and to prevent
> > > > I/O errors.
> > >
> > > Well, what exactly is the problem? changing to ADD_TO_MLQUEUE will
> > > retry intermittently up to the command timeout. If activating the
> > > error handler actually fixes the problem, then the driver was probably
> > > wrongly returning DID_SOFT_ERROR.
> >
> > Well, I have certainly no experience with hardware/driver programming but
> > all drivers I have looked at, seem to use DID_SOFT_ERROR as something
> > like DID_UNKNOWN_ERROR.
>
> DID_SOFT_ERROR means specifically that the driver ran into a resource or
> other soft (as in retryable) error.
>
> > I certainly do not insist on using ADD_TO_MLQUEUE instead of NEEDS_RETRY
> > and I will happily modify the patch, if you think NEEDS_RETRY is better.
> > But I would really prefer to try to recover the device when
> > DID_SOFT_ERROR comes up. I mean, without the eh we get an I/O error
> > anyway. So trying some device resets as a last attempt won't hurt, will
> > it?
>
> Yes, it will on a SAN. For an error condition internal to the driver
> what is the point of causing external disruptions to the devices?
Hrm, right. For SAN it is really a problem :(
>
> > And I have really seen some successful mpt fusion device recoveries.
>
> Well, my guess there would be the internal sequencer is hosed and the
> host reset corrected it, is that right? In which case, there might be a
> place where mpt fusion is returning DID_SOFT_ERROR incorrectly.
Well, these DID_SOFT_ERRORs are not easily reproducible. If they were, I
would already have debugged them in more detail.

Any idea what we can do?

Thanks,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
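
The three behaviours discussed in this thread for a command completing with
DID_SOFT_ERROR can be compared in a small user-space model. This is only a
sketch under stated assumptions, not the kernel's code: every identifier
below (model_disposition, model_policy, model_cmd, model_soft_error) is
hypothetical, and the third policy simplifies the patch by using immediate
retries instead of requeueing before the error handler is activated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the outcomes named in the thread; not kernel code. */
enum model_disposition {
    MODEL_SUCCESS,        /* finish the command; the caller sees the I/O error */
    MODEL_NEEDS_RETRY,    /* reissue, counted against the retry limit */
    MODEL_ADD_TO_MLQUEUE, /* requeue; bounded only by the command timeout */
    MODEL_FAILED          /* hand the command to the error handler (eh) */
};

/* The three policies argued over in the thread (names are assumptions). */
enum model_policy {
    POLICY_RETRY_THEN_FAIL, /* retry up to the limit, then fail the command */
    POLICY_REQUEUE,         /* treat like BUSY/QUEUE_FULL: always requeue */
    POLICY_RETRY_THEN_EH    /* retry up to the limit, then activate the eh */
};

/* Stand-in for the two counters the thread refers to as retries/allowed. */
struct model_cmd {
    int retries;
    int allowed;
};

/* Decide what happens to a command that completed with a soft error. */
enum model_disposition model_soft_error(struct model_cmd *cmd,
                                        enum model_policy policy)
{
    assert(cmd != NULL);

    /* Requeueing does not consume a retry in this model. */
    if (policy == POLICY_REQUEUE)
        return MODEL_ADD_TO_MLQUEUE;

    /* Both remaining policies retry until the limit is exhausted. */
    if (++cmd->retries <= cmd->allowed)
        return MODEL_NEEDS_RETRY;

    /* Patch idea: only commands with allowed > 1 escalate to the eh. */
    if (policy == POLICY_RETRY_THEN_EH && cmd->allowed > 1)
        return MODEL_FAILED;

    return MODEL_SUCCESS;
}
```

In this model the only difference between the first and third policy is the
final outcome once the retries are used up, which is where the SAN objection
applies: the escalation turns a driver-internal condition into resets that
other users of the device can see.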
Thread overview: 19+ messages
2008-11-26 17:40 [PATCH 0/7] scsi error handler improvements Bernd Schubert
2008-11-26 17:44 ` [PATCH 1/7] print eh activation Bernd Schubert
2008-11-26 18:47 ` James Bottomley
2008-12-03 11:19 ` Bernd Schubert
2008-12-03 15:16 ` James Bottomley
2008-12-03 15:52 ` Bernd Schubert
2008-11-26 17:46 ` [PATCH 2/7] Allow requeuement on DID_SOFT_ERROR Bernd Schubert
2008-11-26 18:47 ` James Bottomley
2008-12-03 12:17 ` Bernd Schubert
2008-12-03 15:16 ` James Bottomley
2008-12-03 16:00 ` Bernd Schubert
2008-12-03 16:29 ` James Bottomley
2008-12-03 17:06 ` Bernd Schubert [this message]
2008-11-26 18:25 ` [PATCH 03/07] Don't online offlined devices in scsi_target_quiesce() Bernd Schubert
2008-11-26 18:26 ` [PATCH 4/7] allow activation of eh on DID_NO_CONNECT Bernd Schubert
2008-11-26 18:29 ` [PATCH 5/7] time needs to be adjusted when eh was running Bernd Schubert
2009-01-07 18:09 ` Bernd Schubert
2008-11-26 18:31 ` [PATCH 6/7] SYNCHRONIZE_CACHE command used fixed value Bernd Schubert
2008-11-26 18:32 ` [PATCH 0/7] trivial: move a variable from function to if-scope Bernd Schubert