linux-scsi.vger.kernel.org archive mirror
* block and scsi fail fast fixes
@ 2008-06-05  1:41 michaelc
From: michaelc @ 2008-06-05  1:41 UTC
  To: dm-devel, linux-scsi

The following patches fix two problems I have been seeing in Red Hat
Bugzilla reports. The patches are made against scsi-misc, but except for
0006-block-and-drivers-separate-failfast-into-multiple-b.patch
they could also apply to scsi-rc-fixes or Linus's tree.
0006-block-and-drivers-separate-failfast-into-multiple-b.patch also
converts the scsi_dh modules, which is why it does not apply to
the other trees.

The first problem is that when a transport problem is detected and
the transport classes or drivers block the scsi_devices, there is IO in
the driver and IO in the scsi_device queues. For Fibre Channel we have the
fast_io_fail_tmo infrastructure to get IO in the driver up to multipath,
but IO in the queues remains until dev_loss_tmo fires. The difference
between the two timers can be minutes, so it looks like a hang to the
application. iSCSI has something similar to FC's fast_io_fail_tmo, but
it is called the replacement timeout. With these patches, when that timer
fires we fail all IO that is in the driver or queued, as well as any
incoming IO.

The first five patches try to provide common behavior across the
transport classes and drivers:
0001-scsi-add-transport-host-byte-errors-v2.patch
0002-iscsi-class-libiscsi-and-qla4xxx-convert-to-new-tr.patch
0003-fc-class-Add-support-for-new-transport-errors.patch
0004-qla2xxx-use-new-host-byte-transport-errors.patch
0005-lpfc-start-to-use-new-trasnport-errors.patch

In short, when we block a device we fail IO with DID_TRANSPORT_DISRUPTED,
and when the fast io fail timer (FC fast_io_fail_tmo or the iSCSI
replacement timeout) fires we fail IO with DID_TRANSPORT_FAILFAST.

I converted qla2xxx and tried to convert lpfc (I was not sure about
some of the errors). zfcp and mpt still need to be converted, but it
looked like they would be OK with the patches in this series. I was
only able to test qla2xxx and lpfc, though.


The second problem is that multipath is not really good at handling many
kinds of errors. It just retries every error on a different path, so for
transport errors it makes a lot of sense to send them up to multipath
quickly. Device errors, driver errors, and odd ones in between are
better handled by the SCSI layer, because the multipath layer does not
know anything about SCSI details.

The patches:
0006-block-and-drivers-separate-failfast-into-multiple-b.patch
0007-scsi-Support-fail-fast-bits.patch

are simple: they just split the FAILFAST bits into device, driver
and transport bits, so an upper layer can ask the lower layers to fail
fast only certain types of errors. For multipath we set only the
transport failfast bit, and I thought that in the future something like
RAID might set the device failfast bit and not want transport errors
failed fast to it.




Thread overview: 9+ messages
2008-06-05  1:41 block and scsi fail fast fixes michaelc
2008-06-05  1:41 ` [PATCH 1/7] scsi: add transport host byte errors (v2) michaelc
2008-06-05  1:41   ` [PATCH 2/7] iscsi class, libiscsi and qla4xxx: convert to new transport host byte values michaelc
2008-06-05  1:41     ` [PATCH 3/7] fc class: Add support for new transport errors michaelc
2008-06-05  1:41       ` [PATCH 4/7] qla2xxx: use new host byte " michaelc
2008-06-05  1:41         ` [PATCH 5/7] lpfc: start to use new trasnport errors michaelc
2008-06-05  1:41           ` [PATCH 6/7] block and drivers: separate failfast into multiple bits michaelc
2008-06-05  1:41             ` [PATCH 7/7] scsi: Support fail fast bits michaelc
2008-08-19 15:35       ` [PATCH 3/7] fc class: Add support for new transport errors James Smart
