From: Chandra Seetharaman <sekharan@us.ibm.com>
To: device-mapper development <dm-devel@redhat.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [dm-devel] i/o error due to all path failure with rdac
Date: Thu, 30 Oct 2008 11:56:35 -0700
Message-ID: <1225392995.14830.985.camel@chandra-ubuntu>
In-Reply-To: <E463DF2B2E584B4A82673F53D62C2EF45BD3B893@cosmail01.lsi.com>
[-- Attachment #1: Type: text/plain, Size: 6510 bytes --]
Hi Babu,
As Mike asked, can you provide the kernel version and the multipath-tools
version?
Can you also provide the /var/log/messages file covering the whole test,
from the start (before you fail the first path) to the finish?
Also, what kind of I/O are you running?
BTW, if it is mainline code, can you apply the attached patch and see
whether the behavior improves?
regards,
chandra
On Fri, 2008-10-24 at 17:11 -0600, Moger, Babu wrote:
> Hi,
>
> I am running an online/offline test. I have two paths to the controller: one active and one passive. When I fail (offline) the active path (sde 8:64), device-mapper also fails the passive path (sdf 8:80), leading to an all-paths failure. Any ideas or hints?
>
> Here is the output of multipath -ll. I have only one LUN.
>
> [root@localhost ~]# multipath -ll
> mpathie (3600a0b80000f6a7d0000cff048fed59c) dm-2 LSI,INF-01-00
> [size=10G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
> \_ round-robin 0 [prio=2][enabled]
>  \_ 3:0:0:0 sde 8:64 [active][undef]
> \_ round-robin 0 [prio=1][enabled]
>  \_ 3:0:1:0 sdf 8:80 [active][undef]
>
>
> Here is the detailed log.
>
> Oct 24 16:50:50 localhost multipathd: sdf: rdac prio = 0
> Oct 24 16:51:06 localhost kernel: sd 3:0:0:0: [sde] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK
> Oct 24 16:51:06 localhost kernel: end_request: I/O error, dev sde, sector 1047072
> Oct 24 16:51:06 localhost kernel: device-mapper: multipath: Failing path 8:64.
> Oct 24 16:51:06 localhost multipathd: mpathie: rr_weight = 2 (controller setting)
> Oct 24 16:51:06 localhost multipathd: mpathie: pgfailback = 100 (controller setting)
> Oct 24 16:51:06 localhost multipathd: mpathie: no_path_retry = 10 (controller setting)
> Oct 24 16:51:06 localhost multipathd: pg_timeout = NONE (internal default)
> Oct 24 16:51:06 localhost multipathd: 8:64: mark as failed
> Oct 24 16:51:06 localhost multipathd: uevent 'change' from '/block/dm-2'
> Oct 24 16:51:06 localhost multipathd: UDEV_LOG=3
> Oct 24 16:51:06 localhost multipathd: ACTION=change
> Oct 24 16:51:06 localhost multipathd: DEVPATH=/block/dm-2
> Oct 24 16:51:06 localhost multipathd: SUBSYSTEM=block
> Oct 24 16:51:06 localhost multipathd: DM_TARGET=multipath
> Oct 24 16:51:06 localhost multipathd: DM_ACTION=PATH_FAILED
> Oct 24 16:51:06 localhost multipathd: DM_SEQNUM=1
> Oct 24 16:51:06 localhost multipathd: DM_PATH=8:64
> Oct 24 16:51:06 localhost multipathd: DM_NR_VALID_PATHS=1
> Oct 24 16:51:06 localhost multipathd: DM_NAME=mpathie
> Oct 24 16:51:06 localhost multipathd: DM_UUID=mpath-3600a0b80000f6a7d0000cff048fed59c
> Oct 24 16:51:06 localhost multipathd: MAJOR=253
> Oct 24 16:51:06 localhost multipathd: MINOR=2
> Oct 24 16:51:06 localhost multipathd: DEVTYPE=disk
> Oct 24 16:51:06 localhost multipathd: SEQNUM=1254
> Oct 24 16:51:06 localhost multipathd: UDEVD_EVENT=1
> Oct 24 16:51:06 localhost multipathd: dm-2: add map (uevent)
> Oct 24 16:51:08 localhost kernel: device-mapper: multipath: Failing path 8:80.
> Oct 24 16:51:08 localhost multipathd: mpathie: devmap event #3
> Oct 24 16:51:08 localhost multipathd: mpathie: discover
> Oct 24 16:51:08 localhost multipathd: mpathie: rr_weight = 2 (controller setting)
> Oct 24 16:51:08 localhost multipathd: mpathie: pgfailback = 100 (controller setting)
> Oct 24 16:51:08 localhost multipathd: mpathie: no_path_retry = 10 (controller setting)
> Oct 24 16:51:08 localhost multipathd: pg_timeout = NONE (internal default)
> Oct 24 16:51:08 localhost multipathd: 8:80: mark as failed
> Oct 24 16:51:08 localhost multipathd: mpathie: Entering recovery mode: max_retries=10
> Oct 24 16:51:08 localhost multipathd: uevent 'change' from '/block/dm-2'
> Oct 24 16:51:08 localhost multipathd: UDEV_LOG=3
> Oct 24 16:51:08 localhost multipathd: ACTION=change
> Oct 24 16:51:08 localhost multipathd: DEVPATH=/block/dm-2
> Oct 24 16:51:08 localhost multipathd: SUBSYSTEM=block
> Oct 24 16:51:08 localhost multipathd: DM_TARGET=multipath
> Oct 24 16:51:08 localhost multipathd: DM_ACTION=PATH_FAILED
> Oct 24 16:51:08 localhost multipathd: DM_SEQNUM=2
> Oct 24 16:51:08 localhost multipathd: DM_PATH=8:80
> Oct 24 16:51:08 localhost multipathd: DM_NR_VALID_PATHS=0
> Oct 24 16:51:08 localhost multipathd: DM_NAME=mpathie
> Oct 24 16:51:08 localhost multipathd: DM_UUID=mpath-3600a0b80000f6a7d0000cff048fed59c
> Oct 24 16:51:08 localhost multipathd: MAJOR=253
> Oct 24 16:51:08 localhost multipathd: MINOR=2
> Oct 24 16:51:08 localhost multipathd: DEVTYPE=disk
> Oct 24 16:51:08 localhost multipathd: SEQNUM=1255
> Oct 24 16:51:08 localhost multipathd: UDEVD_EVENT=1
> Oct 24 16:51:08 localhost multipathd: dm-2: add map (uevent)
> Oct 24 16:51:36 localhost kernel: rport-3:0-2: blocked FC remote port time out: removing target and saving binding
> Oct 24 16:51:36 localhost multipathd: sde: rdac checker reports path is down
> Oct 24 16:51:36 localhost multipathd: sde: mask = 0x8
> Oct 24 16:51:36 localhost kernel: sd 3:0:0:0: [sde] Synchronizing SCSI cache
> Oct 24 16:51:36 localhost kernel: sd 3:0:0:0: [sde] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
> Oct 24 16:51:36 localhost kernel: scsi 3:0:0:0: rdac: Detached
> Oct 24 16:51:36 localhost multipathd: uevent 'remove' from '/class/scsi_generic/sg5'
> Oct 24 16:51:36 localhost multipathd: UDEV_LOG=3
> Oct 24 16:51:36 localhost multipathd: ACTION=remove
> Oct 24 16:51:36 localhost multipathd: DEVPATH=/class/scsi_generic/sg5
> Oct 24 16:51:36 localhost multipathd: SUBSYSTEM=scsi_generic
> Oct 24 16:51:36 localhost multipathd: MAJOR=21
> Oct 24 16:51:36 localhost multipathd: MINOR=5
> Oct 24 16:51:36 localhost multipathd: PHYSDEVPATH=/devices/pci0000:00/0000:00:02.0/0000:06:00.3/0000:0b:01.0/host3/rport-3:0-2/target3:0:0/3:0:0:0
> Oct 24 16:51:36 localhost multipathd: PHYSDEVBUS=scsi
> Oct 24 16:51:36 localhost multipathd: PHYSDEVDRIVER=sd
> Oct 24 16:51:36 localhost multipathd: SEQNUM=1256
> Oct 24 16:51:36 localhost multipathd: UDEVD_EVENT=1
> Oct 24 16:51:36 localhost multipathd: DEVNAME=/dev/sg5
> Oct 24 16:51:36 localhost multipathd: uevent 'remove' from '/class/scsi_device/3:0:0:0'
> Oct 24 16:51:36 localhost multipathd: UDEV_LOG=3
> Oct 24 16:51:36 localhost kernel: device-mapper: multipath: Failing path 8:80.
> Oct 24 16:51:36 localhost multipathd: ACTION=remove
> Oct 24 16:51:36 localhost UnixSmash4[9200]: 7:UnixSmash has experienced a write failure.
>
> Thanks
> Babu Moger
[-- Attachment #2: retry_mode_select --]
[-- Type: text/plain, Size: 1317 bytes --]
Retry mode select.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Index: linux-2.6.27/drivers/scsi/device_handler/scsi_dh_rdac.c
===================================================================
--- linux-2.6.27.orig/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ linux-2.6.27/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -24,6 +24,7 @@
 #include <scsi/scsi_dh.h>
 
 #define RDAC_NAME "rdac"
+#define RDAC_RETRY_COUNT 5
 
 /*
  * LSI mode page stuff
@@ -476,21 +477,27 @@ static int send_mode_select(struct scsi_
 {
 	struct request *rq;
 	struct request_queue *q = sdev->request_queue;
-	int err = SCSI_DH_RES_TEMP_UNAVAIL;
+	int err, retry_cnt = RDAC_RETRY_COUNT;
 
+retry:
+	err = SCSI_DH_RES_TEMP_UNAVAIL;
 	rq = rdac_failover_get(sdev, h);
 	if (!rq)
 		goto done;
 
-	sdev_printk(KERN_INFO, sdev, "queueing MODE_SELECT command.\n");
+	sdev_printk(KERN_INFO, sdev, "%s MODE_SELECT command.\n",
+		(retry_cnt == RDAC_RETRY_COUNT) ? "queueing" : "retrying");
 
 	err = blk_execute_rq(q, NULL, rq, 1);
-	if (err != SCSI_DH_OK)
+	blk_put_request(rq);
+	if (err != SCSI_DH_OK) {
 		err = mode_select_handle_sense(sdev, h->sense);
+		if (err == SCSI_DH_RETRY && retry_cnt--)
+			goto retry;
+	}
 	if (err == SCSI_DH_OK)
 		h->state = RDAC_STATE_ACTIVE;
 
-	blk_put_request(rq);
 done:
 	return err;
 }
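
For readers who want to see the retry bound in isolation, the control flow of the patch can be sketched as a small userspace program. This is only an illustration under stated assumptions, not kernel code: struct fake_dev, issue_command(), send_with_retry() and the DH_* codes are hypothetical stand-ins for the SCSI device, the MODE SELECT request plus sense decoding, send_mode_select() and the SCSI_DH_* values. Only the "retry_cnt--" test and the queueing/retrying message mirror the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes standing in for the SCSI_DH_* values. */
enum dh_status { DH_OK = 0, DH_RETRY = 1, DH_IO_ERR = 2 };

#define RETRY_COUNT 5	/* same budget as RDAC_RETRY_COUNT in the patch */

/* Hypothetical device that answers "retry" a fixed number of times. */
struct fake_dev {
	int busy_replies;	/* attempts that will still return DH_RETRY */
	int attempts;		/* total commands issued so far */
};

/* Stand-in for sending MODE SELECT and decoding the sense data. */
static enum dh_status issue_command(struct fake_dev *dev)
{
	dev->attempts++;
	if (dev->busy_replies > 0) {
		dev->busy_replies--;
		return DH_RETRY;
	}
	return DH_OK;
}

/* One initial attempt plus at most RETRY_COUNT retries. */
static enum dh_status send_with_retry(struct fake_dev *dev)
{
	enum dh_status err;
	int retry_cnt = RETRY_COUNT;

	assert(dev != NULL);
retry:
	printf("%s command\n",
	       (retry_cnt == RETRY_COUNT) ? "queueing" : "retrying");
	err = issue_command(dev);
	/* Post-decrement: the test sees 5, 4, 3, 2, 1 (retry) and then 0 (stop). */
	if (err == DH_RETRY && retry_cnt--)
		goto retry;
	return err;
}
```

With a fake device that answers "retry" twice, send_with_retry() succeeds on the third attempt. With one that keeps answering "retry", it gives up after RETRY_COUNT + 1 attempts and returns the last status to the caller, which is the bounded behavior the patch aims for.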