From: Chandra Seetharaman <sekharan@us.ibm.com>
To: device-mapper development <dm-devel@redhat.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [dm-devel] i/o error due to all path failure with rdac
Date: Thu, 30 Oct 2008 11:56:35 -0700
Message-ID: <1225392995.14830.985.camel@chandra-ubuntu>
In-Reply-To: <E463DF2B2E584B4A82673F53D62C2EF45BD3B893@cosmail01.lsi.com>
[-- Attachment #1: Type: text/plain, Size: 6510 bytes --]
Hi Babu,
As Mike asked, can you provide the kernel version and the multipath
version?
Can you also provide the /var/log/messages file from the start (before
you fail the first path) to the finish of your test?
Also, what kind of I/O are you running?
BTW, if it is mainline code, can you apply the attached patch and see if
you get any better behavior?
regards,
chandra
On Fri, 2008-10-24 at 17:11 -0600, Moger, Babu wrote:
> Hi,
>
> I am running an online/offline test. I have two paths to the controller: one active and one passive. When I fail (offline) the active path (sde 8:64), device-mapper fails the passive path (sdf 8:80) as well, leading to an all-paths failure. Any ideas or hints?
>
> Here is the output of multipath -ll. I have only one LUN.
>
> [root@localhost ~]# multipath -ll
> mpathie (3600a0b80000f6a7d0000cff048fed59c) dm-2 LSI,INF-01-00
> [size=10G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
> \_ round-robin 0 [prio=2][enabled]
> \_ 3:0:0:0 sde 8:64 [active][undef]
> \_ round-robin 0 [prio=1][enabled]
> \_ 3:0:1:0 sdf 8:80 [active][undef]
>
>
> Here is the detailed log.
>
> Oct 24 16:50:50 localhost multipathd: sdf: rdac prio = 0
> Oct 24 16:51:06 localhost kernel: sd 3:0:0:0: [sde] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK
> Oct 24 16:51:06 localhost kernel: end_request: I/O error, dev sde, sector 1047072
> Oct 24 16:51:06 localhost kernel: device-mapper: multipath: Failing path 8:64.
> Oct 24 16:51:06 localhost multipathd: mpathie: rr_weight = 2 (controller setting)
> Oct 24 16:51:06 localhost multipathd: mpathie: pgfailback = 100 (controller setting)
> Oct 24 16:51:06 localhost multipathd: mpathie: no_path_retry = 10 (controller setting)
> Oct 24 16:51:06 localhost multipathd: pg_timeout = NONE (internal default)
> Oct 24 16:51:06 localhost multipathd: 8:64: mark as failed
> Oct 24 16:51:06 localhost multipathd: uevent 'change' from '/block/dm-2'
> Oct 24 16:51:06 localhost multipathd: UDEV_LOG=3
> Oct 24 16:51:06 localhost multipathd: ACTION=change
> Oct 24 16:51:06 localhost multipathd: DEVPATH=/block/dm-2
> Oct 24 16:51:06 localhost multipathd: SUBSYSTEM=block
> Oct 24 16:51:06 localhost multipathd: DM_TARGET=multipath
> Oct 24 16:51:06 localhost multipathd: DM_ACTION=PATH_FAILED
> Oct 24 16:51:06 localhost multipathd: DM_SEQNUM=1
> Oct 24 16:51:06 localhost multipathd: DM_PATH=8:64
> Oct 24 16:51:06 localhost multipathd: DM_NR_VALID_PATHS=1
> Oct 24 16:51:06 localhost multipathd: DM_NAME=mpathie
> Oct 24 16:51:06 localhost multipathd: DM_UUID=mpath-3600a0b80000f6a7d0000cff048fed59c
> Oct 24 16:51:06 localhost multipathd: MAJOR=253
> Oct 24 16:51:06 localhost multipathd: MINOR=2
> Oct 24 16:51:06 localhost multipathd: DEVTYPE=disk
> Oct 24 16:51:06 localhost multipathd: SEQNUM=1254
> Oct 24 16:51:06 localhost multipathd: UDEVD_EVENT=1
> Oct 24 16:51:06 localhost multipathd: dm-2: add map (uevent)
> Oct 24 16:51:08 localhost kernel: device-mapper: multipath: Failing path 8:80.
> Oct 24 16:51:08 localhost multipathd: mpathie: devmap event #3
> Oct 24 16:51:08 localhost multipathd: mpathie: discover
> Oct 24 16:51:08 localhost multipathd: mpathie: rr_weight = 2 (controller setting)
> Oct 24 16:51:08 localhost multipathd: mpathie: pgfailback = 100 (controller setting)
> Oct 24 16:51:08 localhost multipathd: mpathie: no_path_retry = 10 (controller setting)
> Oct 24 16:51:08 localhost multipathd: pg_timeout = NONE (internal default)
> Oct 24 16:51:08 localhost multipathd: 8:80: mark as failed
> Oct 24 16:51:08 localhost multipathd: mpathie: Entering recovery mode: max_retries=10
> Oct 24 16:51:08 localhost multipathd: uevent 'change' from '/block/dm-2'
> Oct 24 16:51:08 localhost multipathd: UDEV_LOG=3
> Oct 24 16:51:08 localhost multipathd: ACTION=change
> Oct 24 16:51:08 localhost multipathd: DEVPATH=/block/dm-2
> Oct 24 16:51:08 localhost multipathd: SUBSYSTEM=block
> Oct 24 16:51:08 localhost multipathd: DM_TARGET=multipath
> Oct 24 16:51:08 localhost multipathd: DM_ACTION=PATH_FAILED
> Oct 24 16:51:08 localhost multipathd: DM_SEQNUM=2
> Oct 24 16:51:08 localhost multipathd: DM_PATH=8:80
> Oct 24 16:51:08 localhost multipathd: DM_NR_VALID_PATHS=0
> Oct 24 16:51:08 localhost multipathd: DM_NAME=mpathie
> Oct 24 16:51:08 localhost multipathd: DM_UUID=mpath-3600a0b80000f6a7d0000cff048fed59c
> Oct 24 16:51:08 localhost multipathd: MAJOR=253
> Oct 24 16:51:08 localhost multipathd: MINOR=2
> Oct 24 16:51:08 localhost multipathd: DEVTYPE=disk
> Oct 24 16:51:08 localhost multipathd: SEQNUM=1255
> Oct 24 16:51:08 localhost multipathd: UDEVD_EVENT=1
> Oct 24 16:51:08 localhost multipathd: dm-2: add map (uevent)
> Oct 24 16:51:36 localhost kernel: rport-3:0-2: blocked FC remote port time out: removing target and saving binding
> Oct 24 16:51:36 localhost multipathd: sde: rdac checker reports path is down
> Oct 24 16:51:36 localhost multipathd: sde: mask = 0x8
> Oct 24 16:51:36 localhost kernel: sd 3:0:0:0: [sde] Synchronizing SCSI cache
> Oct 24 16:51:36 localhost kernel: sd 3:0:0:0: [sde] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
> Oct 24 16:51:36 localhost kernel: scsi 3:0:0:0: rdac: Detached
> Oct 24 16:51:36 localhost multipathd: uevent 'remove' from '/class/scsi_generic/sg5'
> Oct 24 16:51:36 localhost multipathd: UDEV_LOG=3
> Oct 24 16:51:36 localhost multipathd: ACTION=remove
> Oct 24 16:51:36 localhost multipathd: DEVPATH=/class/scsi_generic/sg5
> Oct 24 16:51:36 localhost multipathd: SUBSYSTEM=scsi_generic
> Oct 24 16:51:36 localhost multipathd: MAJOR=21
> Oct 24 16:51:36 localhost multipathd: MINOR=5
> Oct 24 16:51:36 localhost multipathd: PHYSDEVPATH=/devices/pci0000:00/0000:00:02.0/0000:06:00.3/0000:0b:01.0/host3/rport-3:0-2/target3:0:0/3:0:0:0
> Oct 24 16:51:36 localhost multipathd: PHYSDEVBUS=scsi
> Oct 24 16:51:36 localhost multipathd: PHYSDEVDRIVER=sd
> Oct 24 16:51:36 localhost multipathd: SEQNUM=1256
> Oct 24 16:51:36 localhost multipathd: UDEVD_EVENT=1
> Oct 24 16:51:36 localhost multipathd: DEVNAME=/dev/sg5
> Oct 24 16:51:36 localhost multipathd: uevent 'remove' from '/class/scsi_device/3:0:0:0'
> Oct 24 16:51:36 localhost multipathd: UDEV_LOG=3
> Oct 24 16:51:36 localhost kernel: device-mapper: multipath: Failing path 8:80.
> Oct 24 16:51:36 localhost multipathd: ACTION=remove
> Oct 24 16:51:36 localhost UnixSmash4[9200]: 7:UnixSmash has experienced a write failure.
>
> Thanks
> Babu Moger
>
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
[-- Attachment #2: retry_mode_select --]
[-- Type: text/plain, Size: 1317 bytes --]
Retry mode select.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>

Index: linux-2.6.27/drivers/scsi/device_handler/scsi_dh_rdac.c
===================================================================
--- linux-2.6.27.orig/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ linux-2.6.27/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -24,6 +24,7 @@
 #include <scsi/scsi_dh.h>
 
 #define RDAC_NAME "rdac"
+#define RDAC_RETRY_COUNT 5
 
 /*
  * LSI mode page stuff
@@ -476,21 +477,27 @@ static int send_mode_select(struct scsi_
 {
 	struct request *rq;
 	struct request_queue *q = sdev->request_queue;
-	int err = SCSI_DH_RES_TEMP_UNAVAIL;
+	int err, retry_cnt = RDAC_RETRY_COUNT;
 
+retry:
+	err = SCSI_DH_RES_TEMP_UNAVAIL;
 	rq = rdac_failover_get(sdev, h);
 	if (!rq)
 		goto done;
 
-	sdev_printk(KERN_INFO, sdev, "queueing MODE_SELECT command.\n");
+	sdev_printk(KERN_INFO, sdev, "%s MODE_SELECT command.\n",
+		(retry_cnt == RDAC_RETRY_COUNT) ? "queueing" : "retrying");
 
 	err = blk_execute_rq(q, NULL, rq, 1);
-	if (err != SCSI_DH_OK)
+	blk_put_request(rq);
+	if (err != SCSI_DH_OK) {
 		err = mode_select_handle_sense(sdev, h->sense);
+		if (err == SCSI_DH_RETRY && retry_cnt--)
+			goto retry;
+	}
 	if (err == SCSI_DH_OK)
 		h->state = RDAC_STATE_ACTIVE;
 
-	blk_put_request(rq);
 done:
 	return err;
 }
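
For anyone who wants to see the retry accounting outside the kernel, here is a rough user-space sketch of the same control flow. It is not the driver code: the DH_* status codes, fake_mode_select() and send_with_retry() are made-up stand-ins for the SCSI_DH_* values and the real MODE SELECT path, and only the "err == RETRY && retry_cnt--" test is taken from the patch.

```c
#include <assert.h>

/* Hypothetical status codes standing in for the SCSI_DH_* values. */
enum dh_status { DH_OK = 0, DH_RETRY = 1 };

#define RETRY_COUNT 5	/* same value as RDAC_RETRY_COUNT in the patch */

/* Hypothetical command: reports DH_RETRY until *failures_left reaches 0. */
static enum dh_status fake_mode_select(int *failures_left)
{
	if (*failures_left > 0) {
		(*failures_left)--;
		return DH_RETRY;
	}
	return DH_OK;
}

/*
 * Same shape as the patched send_mode_select(): issue the command and,
 * on a retryable result, go around again while the retry budget lasts.
 * The number of attempts made is stored in *attempts.
 */
static enum dh_status send_with_retry(int failures, int *attempts)
{
	enum dh_status err;
	int retry_cnt = RETRY_COUNT;

	assert(failures >= 0);
	*attempts = 0;
retry:
	(*attempts)++;
	err = fake_mode_select(&failures);
	/* Post-decrement: a budget of 5 allows 5 retries after the first try. */
	if (err == DH_RETRY && retry_cnt--)
		goto retry;
	return err;
}
```

With a budget of 5, send_with_retry(5, &n) succeeds on the sixth attempt, while six or more retryable failures make it give up after six attempts and return DH_RETRY. In other words the count bounds the retries, not the total number of commands sent.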