From: Mike Christie <michaelc@cs.wisc.edu>
To: device-mapper development <dm-devel@redhat.com>
Cc: Levy_Jerome@emc.com, linux-scsi@vger.kernel.org
Subject: Re: do Symmetrix multipath-tools defaults need update ? or scsi-to-blk errors management ?
Date: Wed, 10 Jun 2009 18:34:52 -0500
Message-ID: <4A30431C.3030809@cs.wisc.edu>
In-Reply-To: <1766094670.1725191244670599365.JavaMail.root@zimbra16-e3.priv.proxad.net>
[-- Attachment #1: Type: text/plain, Size: 2820 bytes --]
On 06/10/2009 04:49 PM, christophe.varoqui@free.fr wrote:
> Hi Jerome,
>
> EMC recently asked my/one-of-your client to activate "queue_if_no_path" on Symmetrix logical units, which is not the current default setting in the upstream multipath-tools package.
>
> I'd like to know if you intend to submit a patch to change the default setting accordingly, or if you'd rather leave the no-queueing default unchanged and work on fixing the root cause of this issue.
>
> ::: Background information, root cause :::
>
> The Symmetrix array proved to return SCSI errors to io submitters in certain circumstances (I was told of errors on the R1+R2 network link). The linux kernel, lacking finesse in its SCSI->DM error reporting, ends up invalidating each path of the multipath in turn before the multipathd daemon gets a chance to revalidate. With "queue_if_no_path" disabled, the io errors end up in the FS layer and in the userspace submitter.
>
> ::: error log on a 2.6.9 (rhel 4.7) kernel :::
>
For RH 4.9 I did the attached patch, so this error is no longer
fastfailed (upstream does not fastfail this type of error when using
dm-multipath now). The scsi layer will now retry its normal 5 times,
then fail.
> SCSI error :<h b t l> return code 0x8000002
> current sday: sense key Aborted Command
> Additional sense: Internal target failure
> end_request: I/O error, dev sday, sector XXXXX
> device-mapper: dm-multipath: Failing path 67:32.
>
> ::: unfortunate side effect of queue_if_no_path :::
>
> Activating "queue_if_no_path" is certainly an efficient work-around for this kind of short-lived retriable error, but this feature compromises data-protection on clusters relying on persistent reservation to fence ios from passive nodes. Ironically, the reason is quite similar : SCSI return codes for reservation conflicts also end up invalidating each path of a multipath, and worse, the io causing the conflict gets queued ! and retried ! until the poor active node drops its reservation, unleashing data-corrupting ios from passive node queues on the logical unit.
>
> ::: error log on a 2.6.29.x kernel for a reservation conflict :::
>
> sd h:b:t:l: reservation conflict
> sd h:b:t:l: [sdu] Unhandled error code
> sd h:b:t:l: [sdu] Result: hostbyte=DID_OK driver_byte=DRIVER_OK,SUGGEST_OK
> end_request: I/O error, dev sdu, sector XXXXX
> device-mapper: dm-multipath: Failing path 65:64.
>
> ::: persistent reservation + queue_if_no_path, possible solution ? :::
>
> Seems to me scsi_lib.c::scsi_io_completion() should be able to cancel a reservation conflicting io and signal blk_end_request() with no error reported.
>
I was just about to post new blkerr patches. For this we just want
multipath to fail this IO right away, right? So have scsi return some
fatal error, then dm-multipath will see it and not retry that IO?
[-- Attachment #2: dont-failfast-dev-errs.patch --]
[-- Type: text/plain, Size: 528 bytes --]
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 7309f12..d5a3390 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1390,7 +1390,7 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 	case CHECK_CONDITION:
 		rtn = scsi_check_sense(scmd);
 		if (rtn == NEEDS_RETRY)
-			goto maybe_retry;
+			goto check_retry_count;
 		/* if rtn == FAILED, we have no sense information;
 		 * returning FAILED will wake the error handler thread
 		 * to collect the sense and redo the decide