From: James Smart <James.Smart@emulex.com>
To: Roland Dreier <roland@purestorage.com>
Cc: linux-scsi <linux-scsi@vger.kernel.org>,
	Hannes Reinecke <hare@suse.de>,
	Jej B <James.Bottomley@hansenpartnership.com>
Subject: Re: SCSI error handling -- one error blocks the whole SCSI host
Date: Sat, 25 May 2013 14:07:32 -0400
Message-ID: <51A0FDE4.7050506@emulex.com>
In-Reply-To: <CAL1RGDWuriBQQcTngjn5M3fUg0VC9XUKC_0iMQbf2p2DKfxJsQ@mail.gmail.com>

Roland,

I agree, and am already working around that limitation.

-- james s


On 5/23/2013 2:14 PM, Roland Dreier wrote:
> At LSF this year, we had a discussion about error handling and in
> particular the problem that SCSI midlayer error handling waits for the
> entire SCSI host (HBA) to quiesce before it starts to abort commands
> etc.
>
> James made the suggestion that FC should handle things the way SAS
> does, because SAS has a strategy handler that does things the right
> way.  However, now that I finally sit down and look at the code, I
> don't see how this is the case.  It seems inherent in the way that
> scsi_eh_scmd_add() and the thread in scsi_error_handler() work (in
> particular the strategy handler can't even be called until host_failed
> == host_busy; we don't bump host_failed without SHOST_RECOVERY set,
> which stops queueing commands to any devices attached to the whole
> HBA).
>
> James, am I understanding your suggestion properly?  If so can you
> explain what you meant about the libsas code -- I see that it has its
> own strategy handler but as I said before we've already stopped every
> device attached to the HBA before we ever get there.
>
> To recapitulate the problem here, we might have a whole fabric
> attached to an HBA via SAS or FC, and be doing 500K IOPS happily to 50
> devices.  Then a single LUN goes wonky and all the IO stops while we
> try to recover that single device, which might take minutes.
>
> I know this has been discussed before, but can we find a way forward
> here?  Is there some way we can start with per-device error recovery
> and avoid disrupting IO that we can see is working fine?
>
> Thanks,
>    Roland


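The condition Roland describes in the quoted message (the strategy handler cannot be called until host_failed == host_busy, and SHOST_RECOVERY stops queueing to every device on the HBA) can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the midlayer code: struct toy_host and every toy_* function are hypothetical names invented for illustration, and a single in_recovery flag stands in for the SHOST_RECOVERY host state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: NOT the kernel's struct Scsi_Host. */
struct toy_host {
    int host_busy;     /* commands outstanding on the whole HBA */
    int host_failed;   /* commands handed over to error handling */
    bool in_recovery;  /* stands in for the SHOST_RECOVERY state */
};

/* Once recovery is flagged, new commands are refused for every device. */
static bool toy_can_queue(const struct toy_host *h)
{
    return !h->in_recovery;
}

/* Try to issue one command; returns false if the host is blocked. */
static bool toy_queue_cmd(struct toy_host *h)
{
    if (!toy_can_queue(h))
        return false;
    h->host_busy++;
    return true;
}

/* A command on a healthy device completes normally. */
static void toy_complete_cmd(struct toy_host *h)
{
    assert(h->host_busy > h->host_failed);
    h->host_busy--;
}

/* One command fails: flag recovery first, then count the failure. */
static void toy_eh_scmd_add(struct toy_host *h)
{
    h->in_recovery = true;
    h->host_failed++;
}

/*
 * The strategy handler may run only when every command still
 * outstanding on the host is a failed one, i.e. all healthy I/O
 * has drained.
 */
static bool toy_eh_can_run(const struct toy_host *h)
{
    return h->host_failed > 0 && h->host_failed == h->host_busy;
}
```

In this model, with three commands outstanding and one failure, toy_queue_cmd() is refused immediately for all devices, yet toy_eh_can_run() stays false until the two healthy commands have completed. That is the host-wide stall the thread is about.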