From: Christophe Varoqui <christophe.varoqui@free.fr>
To: device-mapper development <dm-devel@redhat.com>
Subject: Re: Problems with multipathing
Date: Tue, 18 Apr 2006 06:47:58 +0200
Message-ID: <44446F7E.2060502@free.fr>
In-Reply-To: <4443A390.5010005@ludd.luth.se>
Roger Håkansson wrote:
> Christophe Varoqui wrote:
>
>> Do failover device nodes get reassigned during the rescan ?
>> Like, for example, a configured path sda gets removed and a new path sdb
>> appears ?
>>
>
> No, since I don't do a rescan on the bus but just on the target itself.
> When I had the controller in non-hubbed mode and did (when a controller
> has failed) "echo 1> /sys/class/scsi_host/host[1-2]/scan" I got two new
> devices, sde and sdf (I normally have sda, sdb, sdc and sdd).
> But if I instead did "echo 1 >
> /sys/class/scsi_device/[1-2]:0:0:0/device/rescan", I didn't get any new
> devices but the old ones start working again.
> Now, when I have the box in hubbed-mode, I can't seem to get new devices
> even when I do a scsi-host-scan, but just as before, a
> scsi-target-rescan will get my devices back to order again.
>
>
OK.
Have you tried sending a START_STOP SCSI command (with sg_start from
sg3_utils) to the affected LUN instead of rescanning the target?
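
As a rough sketch of that suggestion: the snippet below assumes an sg3_utils
release whose sg_start accepts the --start option (older releases use the
form "sg_start 1 <device>" instead). The device name /dev/sdb is only an
example taken from the output quoted further down, and DRY_RUN is a
hypothetical switch added for this illustration so the command can be
previewed without touching the LUN.

```shell
#!/bin/sh
# Sketch: send a START STOP UNIT (start) command to one path device.
# Assumptions: sg3_utils is installed and sg_start supports --start.
# DRY_RUN is invented for this example; it is not an sg3_utils feature.
start_lun() {
    dev="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be run.
        echo "sg_start --start $dev"
    else
        sg_start --start "$dev"
    fi
}

# Preview the command for the failed path sdb.
DRY_RUN=1 start_lun /dev/sdb
```

If the path comes back after this, a plain "multipath" run (as in the
transcript below) should still be needed to reinstate it in the map.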
> Also, I've noticed that it's not only when a controller fails that this
> happens, when a failed controller is "revived" the same thing might happen.
>
> As far as I've been able to tell, the more I/O-transactions at the time
> of the failure, the more likely that the (SCSI) device will be marked as
> "dead".
> If I do "while /bin/true; do dd if=/dev/zero of=/mnt/test count=20000;
> sleep 1; done" and fail (or revive) a controller, it seems to work in 50% of
> the cases. With a 2 sec sleep there is rarely any problem, but with no
> sleep at all it fails nearly 100% of the time.
> And in all types of tests, if I do a SCSI-target(path) rescan before
> multipath decides both paths are dead, both paths will work again and
> the multipath-device will never fail.
>
>
I see your features still don't include "queue_if_no_path". You seem to
really need it.
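
A minimal sketch of how queueing could be switched on, under some
assumptions: the map name mpath1 is taken from the output quoted below, the
device-mapper message is sent by hand to the live map, and DRY_RUN is a
hypothetical switch used only to preview the command. The persistent
equivalent is usually a features line in /etc/multipath.conf, shown here as
a comment; which section it belongs in depends on the multipath-tools
version and the array.

```shell
#!/bin/sh
# Sketch: make a multipath map queue I/O instead of failing it
# when every path is down.
# Persistent alternative (assumed to go in the matching device section
# of /etc/multipath.conf):
#     features "1 queue_if_no_path"
enable_queueing() {
    map="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be run.
        echo "dmsetup message $map 0 queue_if_no_path"
    else
        dmsetup message "$map" 0 queue_if_no_path
    fi
}

# Preview the command for the map shown in the multipath -l output.
DRY_RUN=1 enable_queueing mpath1
```

With queueing enabled, I/O blocks while no path is usable rather than being
returned to ext3 as an error, so a rescan that revives the paths lets the
writes complete. The trade-off is that processes hang for as long as no path
comes back.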
>> If so, the FC transport class is in charge of the timeout triggering the
>> dead devices removal.
>> A hardware handler wouldn't help here.
>>
>> Can you paste a before/after scsi rescan "multipath -l" output ?
>>
>>
>
> They are identical
>
> [root@asl005 ~]# multipath -l
> mpath1 (3600d0230000000000b01910b4d313400)
> [size=97 GB][features=0][hwhandler=0]
> \_ round-robin 0 [prio=0][active]
> \_ 1:0:0:0 sdb 8:16 [active][undef]
> \_ 2:0:0:0 sdc 8:32 [active][undef]
> [root@asl005 ~]# dmesg |tail -20
> SCSI error : <1 0 0 0> return code = 0x20008
> end_request: I/O error, dev sdb, sector 21247352
> end_request: I/O error, dev sdb, sector 21247360
> SCSI error : <1 0 0 0> return code = 0x20008
> end_request: I/O error, dev sdb, sector 21036576
> end_request: I/O error, dev sdb, sector 21036584
> Aborting journal on device dm-5.
> ext3_abort called.
> EXT3-fs error (device dm-5): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only
> EXT3-fs error (device dm-5) in start_transaction: Journal has aborted
> __journal_remove_journal_head: freeing b_committed_data
> printk: 254766 messages suppressed.
> Buffer I/O error on device dm-5, logical block 2092209
> lost page write due to I/O error on dm-5
> Buffer I/O error on device dm-5, logical block 2093234
> lost page write due to I/O error on dm-5
> printk: 485 messages suppressed.
> Buffer I/O error on device dm-5, logical block 1
> lost page write due to I/O error on dm-5
> [root@asl005 ~]# multipath -l
> mpath1 (3600d0230000000000b01910b4d313400)
> [size=97 GB][features=0][hwhandler=0]
> \_ round-robin 0 [prio=0][enabled]
> \_ 1:0:0:0 sdb 8:16 [failed][undef]
> \_ 2:0:0:0 sdc 8:32 [failed][undef]
> [root@asl005 ~]# echo 1 > /sys/class/fc_transport/target1\:0\:0/device/1\:0\:0\:0/rescan
> [root@asl005 ~]# multipath -l
> mpath1 (3600d0230000000000b01910b4d313400)
> [size=97 GB][features=0][hwhandler=0]
> \_ round-robin 0 [prio=0][enabled]
> \_ 1:0:0:0 sdb 8:16 [failed][undef]
> \_ 2:0:0:0 sdc 8:32 [failed][undef]
> [root@asl005 ~]# multipath
> [root@asl005 ~]# multipath -ll
> mpath1 (3600d0230000000000b01910b4d313400)
> [size=97 GB][features=0][hwhandler=0]
> \_ round-robin 0 [prio=0][active]
> \_ 1:0:0:0 sdb 8:16 [active][undef]
> \_ 2:0:0:0 sdc 8:32 [active][undef]