linux-scsi.vger.kernel.org archive mirror
From: michaelc@cs.wisc.edu
To: dm-devel@redhat.com, linux-scsi@vger.kernel.org
Subject: block and scsi fail fast fixes
Date: Wed,  4 Jun 2008 20:41:39 -0500
Message-ID: <1212630106-13413-1-git-send-email-michaelc@cs.wisc.edu>

The following patches fix two problems I have been seeing in Red Hat
bugzillas. The patches are made over scsi-misc, but all of them except
0006-block-and-drivers-separate-failfast-into-multiple-b.patch
could also be applied over scsi-rc-fixes or Linus's tree. That patch
also converts the scsi dh modules, which is why it does not apply to
the other kernels.

The first problem is that when a transport problem is detected and
the classes/drivers block the scsi_devices, there is IO in the driver
and IO in the scsi_device queues. For Fibre Channel we have the fast IO
fail tmo infrastructure to allow us to get IO in the driver up to
multipath, but IO in the queues remains until the dev_loss_tmo fires.
The difference between the timers can be minutes, so it looks like a
hang to the application. iSCSI has something similar to FC's fast IO
fail tmo, but it is called the replacement timeout. With this we will
fail all IO that is in the driver or queued, as well as any incoming IO.
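
To make the timing gap concrete, here is a minimal userspace sketch. It
is not kernel code. The struct, the enum and the `patched` flag are
hypothetical names used only for illustration, and it assumes the
intended behavior is that queued IO is failed when the fast IO fail
timer fires instead of waiting for dev_loss_tmo.

```c
/* Hypothetical model of when IO is failed upward after a transport problem. */

enum io_location {
    IO_IN_DRIVER,   /* IO already handed to the LLD */
    IO_IN_QUEUE     /* IO still sitting in the scsi_device queue */
};

struct tmo_cfg {
    unsigned int fast_io_fail_tmo;  /* seconds */
    unsigned int dev_loss_tmo;      /* seconds */
};

/*
 * Returns the number of seconds after the transport problem at which
 * the IO is failed up to the caller (for example multipath).
 * patched == 0 models the behavior described as the problem,
 * patched != 0 models the assumed behavior with the patches applied.
 */
static unsigned int io_fail_time(const struct tmo_cfg *cfg,
                                 enum io_location where, int patched)
{
    if (where == IO_IN_DRIVER || patched)
        return cfg->fast_io_fail_tmo;
    return cfg->dev_loss_tmo;
}
```

With made-up timer values of 5 and 300 seconds, queued IO in the
unpatched model is held for 300 seconds while IO in the driver comes
back after 5, which is the gap that looks like a hang.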

The first 5 patches try to provide common behavior:
0001-scsi-add-transport-host-byte-errors-v2.patch
0002-iscsi-class-libiscsi-and-qla4xxx-convert-to-new-tr.patch
0003-fc-class-Add-support-for-new-transport-errors.patch
0004-qla2xxx-use-new-host-byte-transport-errors.patch
0005-lpfc-start-to-use-new-trasnport-errors.patch

Basically, when we block a device we fail IO with DID_TRANSPORT_DISRUPTED.
When the fast io transport timer fires we fail IO with DID_TRANSPORT_FAILFAST.
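
The mapping described above can be sketched in plain C. This is a
simplified userspace model, not the code from the patches: the
`port_state` enum and `result_for_state()` are hypothetical, and the
numeric values given to the two host byte codes are assumptions made
for the example. The only thing taken as given is that the host byte
lives in bits 16-23 of the SCSI result word.

```c
#include <stdint.h>

/* Host byte codes; the numeric values here are assumed for illustration. */
enum host_byte {
    DID_OK                  = 0x00,
    DID_TRANSPORT_DISRUPTED = 0x0e,
    DID_TRANSPORT_FAILFAST  = 0x0f
};

/* Hypothetical transport states for a remote port or session. */
enum port_state {
    PORT_ONLINE,        /* no transport problem */
    PORT_BLOCKED,       /* problem detected, devices blocked */
    PORT_FAST_FAILED    /* fast IO fail timer has fired */
};

/* The host byte occupies bits 16-23 of the result word. */
static inline uint32_t make_result(enum host_byte hb)
{
    return (uint32_t)hb << 16;
}

static inline unsigned int host_byte_of(uint32_t result)
{
    return (result >> 16) & 0xff;
}

/* Pick the result that IO is completed with in each state. */
static uint32_t result_for_state(enum port_state state)
{
    switch (state) {
    case PORT_BLOCKED:
        return make_result(DID_TRANSPORT_DISRUPTED);
    case PORT_FAST_FAILED:
        return make_result(DID_TRANSPORT_FAILFAST);
    case PORT_ONLINE:
    default:
        return make_result(DID_OK);
    }
}
```

Keeping the two codes distinct lets the caller tell "blocked, may still
recover" apart from "the fast fail timer already gave up on this path".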

I converted qla2xxx and tried to convert lpfc (I was not sure about
some of the errors). zfcp and mpt need to be converted, but it looked
like they would be ok with the patches below. I could only test qla2xxx
and lpfc though.


The second problem is that multipath is not really good at handling a lot
of errors. It just retries all errors on a different path, so for transport
errors it makes a lot of sense to send them up to us pretty quickly. But
for device errors, driver errors, or weird ones in between, the scsi layer
is better at handling them because the multipath layer does not know
anything about scsi details.

The patches:
0006-block-and-drivers-separate-failfast-into-multiple-b.patch
0007-scsi-Support-fail-fast-bits.patch

are really simple and just break up the FAILFAST bits into device, driver
and transport bits, so the upper layer can ask the lower layers to fail
fast only certain types of errors. For multipath we only set the transport
fail fast bit, and I thought in the future maybe something like RAID
would set the device failfast bit and not want transport errors failed
fast to it.
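
A rough sketch of the split-bit idea follows. The flag names
(`FF_DEV`, `FF_TRANSPORT`, `FF_DRIVER`), the error classes and
`should_fail_fast()` are hypothetical and are not the identifiers used
in the patches; the sketch only shows how a lower layer could check
whether the submitter asked for a given class of error to be failed
fast.

```c
#include <stdbool.h>

/* Hypothetical per-request failfast flags, one bit per error class. */
enum ff_bits {
    FF_DEV       = 1u << 0,  /* fail fast on device errors */
    FF_TRANSPORT = 1u << 1,  /* fail fast on transport errors */
    FF_DRIVER    = 1u << 2   /* fail fast on driver errors */
};

/* Hypothetical classification of an error by the lower layer. */
enum err_class {
    ERR_DEVICE,
    ERR_TRANSPORT,
    ERR_DRIVER
};

/*
 * Returns true if the request should be failed upward right away,
 * false if the lower layer should keep applying its own retry and
 * error handling.
 */
static bool should_fail_fast(unsigned int req_flags, enum err_class err)
{
    switch (err) {
    case ERR_DEVICE:
        return (req_flags & FF_DEV) != 0;
    case ERR_TRANSPORT:
        return (req_flags & FF_TRANSPORT) != 0;
    case ERR_DRIVER:
        return (req_flags & FF_DRIVER) != 0;
    }
    return false;
}
```

In this model a multipath-style submitter passes only `FF_TRANSPORT`,
so transport errors come straight back while device and driver errors
stay with the scsi layer. A RAID-style submitter could pass only
`FF_DEV` instead.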





Thread overview: 9+ messages
2008-06-05  1:41 michaelc [this message]
2008-06-05  1:41 ` [PATCH 1/7] scsi: add transport host byte errors (v2) michaelc
2008-06-05  1:41   ` [PATCH 2/7] iscsi class, libiscsi and qla4xxx: convert to new transport host byte values michaelc
2008-06-05  1:41     ` [PATCH 3/7] fc class: Add support for new transport errors michaelc
2008-06-05  1:41       ` [PATCH 4/7] qla2xxx: use new host byte " michaelc
2008-06-05  1:41         ` [PATCH 5/7] lpfc: start to use new trasnport errors michaelc
2008-06-05  1:41           ` [PATCH 6/7] block and drivers: separate failfast into multiple bits michaelc
2008-06-05  1:41             ` [PATCH 7/7] scsi: Support fail fast bits michaelc
2008-08-19 15:35       ` [PATCH 3/7] fc class: Add support for new transport errors James Smart
