From: James Smart <James.Smart@Emulex.Com>
To: Michael Reed <mdr@sgi.com>
Cc: linux-scsi@vger.kernel.org,
James Bottomley <James.Bottomley@SteelEye.com>,
Christoph Hellwig <hch@lst.de>, Jeremy Higdon <jeremy@sgi.com>
Subject: Re: [PATCH] Make scsi error recovery play nice with devices blocked by transport
Date: Wed, 28 Dec 2005 11:04:11 -0500 [thread overview]
Message-ID: <43B2B77B.20602@emulex.com> (raw)
In-Reply-To: <43A35498.6030903@sgi.com>
This patch looks ok. It's certainly an issue that we (Emulex) have been
fighting, with the answer being : always ensure that the device i/o timeout
is 2x (+ some fudge?) the devloss_tmo. As we recommend setting the i/o
timeout to 60 seconds for enterprise arrays, and as devloss_tmo defaults
to 30, this generally worked.
One other point though that should be addressed. This doesn't affect the
successive "bigger hammer" recovery steps in scsi_eh_ready_devs(). We
should have the function recognize that the scsi_eh_stu(),
scsi_eh_bus_device_reset(), scsi_eh_bus_reset() functions can fail due to
a cable pull (e.g. devices being blocked), which is ok, and does not
require the next level of reset. Causing a host reset because of a
temporary cable pull, or a bus reset (all targets on a bus to be reset)
because an unrelated one temporarily went away - are not good things.
-- james s
Michael Reed wrote:
> As no one has commented, I'm reposting this rfc as a patch. I've
> been using it for all my testing during development of the LSI MPT Fusion
> fc transport attribute patch and it has shown no ill effect.
>
> --
>
> Error recovery doesn't interact very well with fc targets which
> have been blocked by the fc transport. Error recovery continues
> to attempt to recover the target and ends up marking the fc target
> offline. Once offline, if the target returns before the remote port is
> removed, commands which could have been successfully reissued instead
> are completed with an error status due to the offline status
> of the target.
>
> This patch makes a couple of hopefully minor tweaks to the error
> recovery logic to work better with targets which have been blocked
> by the transport.
>
> First, if the target is blocked and error recovery gives up, don't
> put the device offline. Either the transport will delete the target
> thus disposing of any queued requests or it will unblock the target and
> requests will be reissued.
>
> Second, if a device is blocked, queue up commands being flushed from
> the done queue for retry instead of completing them with an error.
>
> Thanks,
> Mike Reed
>
>
>
> ------------------------------------------------------------------------
>
> diff -ru linux-2.6.15-rc5-git6-patched/drivers/scsi/scsi_error.c linux-2.6.15-rc5-git6-patched-mdr/drivers/scsi/scsi_error.c
> --- linux-2.6.15-rc5-git6-patched/drivers/scsi/scsi_error.c 2005-12-16 16:48:19.000000000 -0600
> +++ linux-2.6.15-rc5-git6-patched-mdr/drivers/scsi/scsi_error.c 2005-12-16 17:42:07.000000000 -0600
> @@ -1130,10 +1130,14 @@
> struct scsi_cmnd *scmd, *next;
>
> list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
> - sdev_printk(KERN_INFO, scmd->device,
> - "scsi: Device offlined - not"
> - " ready after error recovery\n");
> - scsi_device_set_state(scmd->device, SDEV_OFFLINE);
> + /* if blocked, transport will provide final device disposition */
> + if (!scsi_device_blocked(scmd->device)) {
> + sdev_printk(KERN_INFO, scmd->device,
> + "scsi: Device offlined - not"
> + " ready after error recovery\n");
> + scsi_device_set_state(scmd->device, SDEV_OFFLINE);
> + }
> +
> if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) {
> /*
> * FIXME: Handle lost cmds.
> @@ -1460,9 +1464,10 @@
>
> list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
> list_del_init(&scmd->eh_entry);
> - if (scsi_device_online(scmd->device) &&
> + if (scsi_device_blocked(scmd->device) ||
> + (scsi_device_online(scmd->device) &&
> !blk_noretry_request(scmd->request) &&
> - (++scmd->retries < scmd->allowed)) {
> + (++scmd->retries < scmd->allowed))) {
> SCSI_LOG_ERROR_RECOVERY(3, printk("%s: flush"
> " retry cmd: %p\n",
> current->comm,
> diff -ru linux-2.6.15-rc5-git6-patched/include/scsi/scsi_device.h linux-2.6.15-rc5-git6-patched-mdr/include/scsi/scsi_device.h
> --- linux-2.6.15-rc5-git6-patched/include/scsi/scsi_device.h 2005-12-05 12:39:40.000000000 -0600
> +++ linux-2.6.15-rc5-git6-patched-mdr/include/scsi/scsi_device.h 2005-12-16 17:42:07.000000000 -0600
> @@ -275,6 +275,11 @@
> int data_direction, void *buffer, unsigned bufflen,
> struct scsi_sense_hdr *, int timeout, int retries);
>
> +static inline int scsi_device_blocked(struct scsi_device *sdev)
> +{
> + return sdev->sdev_state == SDEV_BLOCK;
> +}
> +
> static inline unsigned int sdev_channel(struct scsi_device *sdev)
> {
> return sdev->channel;
next prev parent reply other threads:[~2005-12-28 16:04 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-12-16 23:58 [PATCH] Make scsi error recovery play nice with devices blocked by transport Michael Reed
2005-12-28 16:04 ` James Smart [this message]
2006-01-06 21:33 ` Michael Reed
2006-01-09 15:01 ` James Smart
2006-01-09 15:39 ` James Bottomley
2006-01-13 19:29 ` Michael Reed
2006-01-13 19:38 ` Christoph Hellwig
2006-01-13 19:50 ` Michael Reed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=43B2B77B.20602@emulex.com \
--to=james.smart@emulex.com \
--cc=James.Bottomley@SteelEye.com \
--cc=hch@lst.de \
--cc=jeremy@sgi.com \
--cc=linux-scsi@vger.kernel.org \
--cc=mdr@sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).