From: Michael Reed <mdr@sgi.com>
To: James.Smart@Emulex.Com
Cc: linux-scsi@vger.kernel.org,
James Bottomley <James.Bottomley@SteelEye.com>,
Christoph Hellwig <hch@lst.de>, Jeremy Higdon <jeremy@sgi.com>
Subject: Re: [PATCH] Make scsi error recovery play nice with devices blocked by transport
Date: Fri, 06 Jan 2006 15:33:50 -0600 [thread overview]
Message-ID: <43BEE23E.8040707@sgi.com> (raw)
In-Reply-To: <43B2B77B.20602@emulex.com>
James Smart wrote:
> This patch looks ok. It's certainly an issue that we (Emulex) have been
> fighting, with the answer being : always ensure that the device i/o timeout
> is 2x (+ some fudge?) the devloss_tmo. As we recommend setting the i/o
> timeout to 60 seconds for enterprise arrays, and as devloss_tmo defaults
> to 30, this generally worked.
>
> One other point though that should be addressed. This doesn't affect the
> successive "bigger hammer" recovery steps in scsi_eh_ready_devs(). We
> should have the function recognize that the scsi_eh_stu(),
> scsi_eh_bus_device_reset(), scsi_eh_bus_reset() functions can fail due to
> a cable pull (e.g. devices being blocked), which is ok, and does not
> require the next level of reset. Causing a host reset because of a
> temporary cable pull, or a bus reset (all targets on a bus to be reset)
> because an unrelated one temporarily went away - are not good things.
Well, a device can go away for reasons other than a cable pull.
Sometimes (in my experience with SGI's driver for qlogic fc cards)
harder resets can bring it back whereas waiting won't. Think firmware
problems....
I think letting the harder resets happen is a good thing (or at least
not a bad thing) as long as recovery waits for the driver to report that
the drive is gone (offline).
Mike
>
> -- james s
>
>
> Michael Reed wrote:
>>As no one has commented, I'm reposting this rfc as a patch. I've
>>been using it for all my testing during development of the LSI MPT Fusion
>>fc transport attribute patch and it has shown no ill effect.
>>
>>--
>>
>>Error recovery doesn't interact very well with fc targets which
>>have been blocked by the fc transport. Error recovery continues
>>to attempt to recover the target and ends up marking the fc target
>>offline. Once offline, if the target returns before the remote port is
>>removed, commands which could have been successfully reissued instead
>>are completed with an error status due to the offline status
>>of the target.
>>
>>This patch makes a couple of hopefully minor tweaks to the error
>>recovery logic to work better with targets which have been blocked
>>by the transport.
>>
>>First, if the target is blocked and error recovery gives up, don't
>>put the device offline. Either the transport will delete the target
>>thus disposing of any queued requests or it will unblock the target and
>>requests will be reissued.
>>
>>Second, if a device is blocked, queue up commands being flushed from
>>the done queue for retry instead of completing them with an error.
>>
>>Thanks,
>> Mike Reed
>>
>>
>>
>>------------------------------------------------------------------------
>>
>>diff -ru linux-2.6.15-rc5-git6-patched/drivers/scsi/scsi_error.c linux-2.6.15-rc5-git6-patched-mdr/drivers/scsi/scsi_error.c
>>--- linux-2.6.15-rc5-git6-patched/drivers/scsi/scsi_error.c 2005-12-16 16:48:19.000000000 -0600
>>+++ linux-2.6.15-rc5-git6-patched-mdr/drivers/scsi/scsi_error.c 2005-12-16 17:42:07.000000000 -0600
>>@@ -1130,10 +1130,14 @@
>> struct scsi_cmnd *scmd, *next;
>>
>> list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>>- sdev_printk(KERN_INFO, scmd->device,
>>- "scsi: Device offlined - not"
>>- " ready after error recovery\n");
>>- scsi_device_set_state(scmd->device, SDEV_OFFLINE);
>>+ /* if blocked, transport will provide final device disposition */
>>+ if (!scsi_device_blocked(scmd->device)) {
>>+ sdev_printk(KERN_INFO, scmd->device,
>>+ "scsi: Device offlined - not"
>>+ " ready after error recovery\n");
>>+ scsi_device_set_state(scmd->device, SDEV_OFFLINE);
>>+ }
>>+
>> if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) {
>> /*
>> * FIXME: Handle lost cmds.
>>@@ -1460,9 +1464,10 @@
>>
>> list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
>> list_del_init(&scmd->eh_entry);
>>- if (scsi_device_online(scmd->device) &&
>>+ if (scsi_device_blocked(scmd->device) ||
>>+ (scsi_device_online(scmd->device) &&
>> !blk_noretry_request(scmd->request) &&
>>- (++scmd->retries < scmd->allowed)) {
>>+ (++scmd->retries < scmd->allowed))) {
>> SCSI_LOG_ERROR_RECOVERY(3, printk("%s: flush"
>> " retry cmd: %p\n",
>> current->comm,
>>diff -ru linux-2.6.15-rc5-git6-patched/include/scsi/scsi_device.h linux-2.6.15-rc5-git6-patched-mdr/include/scsi/scsi_device.h
>>--- linux-2.6.15-rc5-git6-patched/include/scsi/scsi_device.h 2005-12-05 12:39:40.000000000 -0600
>>+++ linux-2.6.15-rc5-git6-patched-mdr/include/scsi/scsi_device.h 2005-12-16 17:42:07.000000000 -0600
>>@@ -275,6 +275,11 @@
>> int data_direction, void *buffer, unsigned bufflen,
>> struct scsi_sense_hdr *, int timeout, int retries);
>>
>>+static inline int scsi_device_blocked(struct scsi_device *sdev)
>>+{
>>+ return sdev->sdev_state == SDEV_BLOCK;
>>+}
>>+
>> static inline unsigned int sdev_channel(struct scsi_device *sdev)
>> {
>> return sdev->channel;
>
next prev parent reply other threads:[~2006-01-06 21:34 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-12-16 23:58 [PATCH] Make scsi error recovery play nice with devices blocked by transport Michael Reed
2005-12-28 16:04 ` James Smart
2006-01-06 21:33 ` Michael Reed [this message]
2006-01-09 15:01 ` James Smart
2006-01-09 15:39 ` James Bottomley
2006-01-13 19:29 ` Michael Reed
2006-01-13 19:38 ` Christoph Hellwig
2006-01-13 19:50 ` Michael Reed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=43BEE23E.8040707@sgi.com \
--to=mdr@sgi.com \
--cc=James.Bottomley@SteelEye.com \
--cc=James.Smart@Emulex.Com \
--cc=hch@lst.de \
--cc=jeremy@sgi.com \
--cc=linux-scsi@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.