* Failover between two paths to one LU doesn't work on linux-iscsi
From: Michae Lyulko @ 2006-10-10 17:17 UTC
To: dm-devel

Hello,

I use the linux-iscsi initiator and device-mapper to access a LU that has two paths to it. When one of the paths fails, no fail-over occurs (I waited for 15 minutes). The device appears in /dev/mapper/.

I know that with the open-iscsi initiator the fail-over does happen.

1. Are there any known problems that can cause fail-over not to happen in linux-iscsi?
2. Is there any way to get debug information in order to understand more in-depth what's going on?
3. How can I know what the timeouts are and how to change them?
4. If you need more specific information, please write what can help.

Michael

* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: Eli Stair @ 2006-10-10 17:24 UTC
To: device-mapper development

What versions of the tools and kernel are you running? What are your settings for multipath.conf? Is multipathd running? If you run multipath -v2 manually, does it fail the path? Etc.

> 1. Are there any known problems that can cause fail-over not to happen
> in linux-iscsi?

> 2. Is there any way to get debug information in order to understand more
> in-depth what's going on?

man multipath && man multipathd
multipathd -d
multipath -v4

> 3. How can I know what the timeouts are and how to change them?

Timeouts occur at several levels. Your HBA or iSCSI driver will have (at least) one. Check the documentation for the software and hardware you're using (kernel module options), the dm-multipath docs, etc.

> 4. If you need more specific information, please write what can help.

http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=Home

I've never done multipath-iscsi, so I don't have direct experience. Try posting some more detailed info on your setup and what errors or info messages are occurring.

/eli

* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: Michae Lyulko @ 2006-10-10 19:55 UTC
To: device-mapper development

Eli Stair wrote:
>
> What versions of the tools and kernel are you running? What are your
> settings for multipath.conf? Is multipathd running? If you run
> multipath -v2 manually, does it fail the path? Etc.
>
>> 1. Are there any known problems that can cause fail-over not to happen
>> in linux-iscsi?
>
>> 2. Is there any way to get debug information in order to understand more
>> in-depth what's going on?
>
> man multipath && man multipathd
> multipathd -d
> multipath -v4
>
>> 3. How can I know what the timeouts are and how to change them?
>
> Timeouts occur at several levels. Your HBA or iSCSI driver will have
> (at least) one. Check the documentation for the aspects of
> software/hardware you're using (kernel module options), dm-multipath
> docs, etc.

I meant Multipath timeouts. iSCSI driver definitely has some.

>
>> 4. If you need more specific information, please write what can help.
>
> http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=Home
>
> I've never done multipath-iscsi, so don't have direct experience. Try
> posting some more detailed info on your setup and what errors or info
> messages are occurring.
>
> /eli
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

Here is some additional info (includes versions, debug info from multipath -v4, etc.).

multipathd -d doesn't seem to add any info (strange), and multipath -v2 doesn't fail the path. multipathd is running.

I can work with the device via /dev/mapper/3600d02300063ccc60000006215c0b501 or via /dev/dm-0: mkfs was successful.

I ran iozone benchmark and manually failed one of two paths. There was no failover to second path.
The iSCSI initiator retried to reconnect to the first path - unsuccessfully. When I restored the first path - the initiator reconnected to the target via the first path.

Thanks!

------------- INFO -------------
Versions:
device-mapper-1.01.01-1.6
multipath-tools-0.4.5-0.11

Machine:
2.6.5-7.244-smp; x86_64; SLES9 sp3

virgo4:~ # lsmod | grep dm_multipath
dm_multipath 38544 2 dm_round_robin
dm_mod 77536 2 dm_multipath

virgo4:~ # ps -ef | grep multipathd
root 17078 1 0 22:04 pts/0 00:00:00 /sbin/multipathd
root 17228 10503 0 22:13 pts/0 00:00:00 grep multipathd

virgo4:~ # multipath -p failover
dm names N
dm info 3600d02300063ccc60000006215c0b501 N
dm create 3600d02300063ccc60000006215c0b501 3600d02300063ccc60000006215c0b501 O
dm reload 3600d02300063ccc60000006215c0b501 O
dm resume 3600d02300063ccc60000006215c0b501 N
dm message 3600d02300063ccc60000006215c0b501 N switch_group 1
create: 3600d02300063ccc60000006215c0b501
[size=87 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=1]
  \_ 1:0:6:0 sda 8:0 [ready]
\_ round-robin 0 [prio=1]
  \_ 1:1:6:0 sdb 8:16 [ready]

virgo4:~ # multipath -l
dm names N
dm table 3600d02300063ccc60000006215c0b501 N
dm table 3600d02300063ccc60000006215c0b501 N
dm status 3600d02300063ccc60000006215c0b501 N
dm info 3600d02300063ccc60000006215c0b501 O
3600d02300063ccc60000006215c0b501
[size=87 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 1:0:6:0 sda 8:0 [active][ready]
\_ round-robin 0 [enabled]
  \_ 1:1:6:0 sdb 8:16 [active][ready]

virgo4:~ # multipath -v2
dm names N
dm table 3600d02300063ccc60000006215c0b501 N
dm table 3600d02300063ccc60000006215c0b501 N
dm status 3600d02300063ccc60000006215c0b501 N
dm info 3600d02300063ccc60000006215c0b501 O

virgo4:~ # multipath -v4
load path identifiers cache
3600d02300063ccc60000006215c0b501 1:0:6:0 sda 8:0 [ready] IFT /ES A16F-G
3600d02300063ccc60000006215c0b501 1:1:6:0 sdb 8:16 [ready] IFT /ES A16F-
dm-0 blacklisted
hda blacklisted
hdd blacklisted
loop0 blacklisted
loop1 blacklisted
loop2 blacklisted
loop3 blacklisted
loop4 blacklisted
loop5 blacklisted
loop6 blacklisted
loop7 blacklisted
md0 blacklisted
ram0 blacklisted
ram10 blacklisted
ram11 blacklisted
ram12 blacklisted
ram13 blacklisted
ram14 blacklisted
ram15 blacklisted
ram1 blacklisted
ram2 blacklisted
ram3 blacklisted
ram4 blacklisted
ram5 blacklisted
ram6 blacklisted
ram7 blacklisted
ram8 blacklisted
ram9 blacklisted
===== path info sda (mask 0x3f) =====
device sda is on bus scsi
bus = 1
dev_t = 8:0
size = 184320000
vendor = IFT
product = ES A16F-G
rev = 331J
h:b:t:l = 1:0:6:0
tgt_node_name = open error (Device or resource busy)
claimed = 1
serial = 0000006215C0B501
path checker = readsector0 (internal default)
state = 2
getprio = (null) (internal default)
prio = 1
uid = 3600d02300063ccc60000006215c0b501 (cache)
===== path info sdb (mask 0x3f) =====
device sdb is on bus scsi
bus = 1
dev_t = 8:16
size = 184320000
vendor = IFT
product = ES A16F-G
rev = 331J
h:b:t:l = 1:1:6:0
tgt_node_name = open error (Device or resource busy)
claimed = 1
serial = 0000006215C0B501
path checker = readsector0 (internal default)
state = 2
getprio = (null) (internal default)
prio = 1
uid = 3600d02300063ccc60000006215c0b501 (cache)
3600d02300063ccc60000006215c0b501 1:0:6:0 sda 8:0 1 [ready][claimed] IFT
3600d02300063ccc60000006215c0b501 1:1:6:0 sdb 8:16 1 [ready][claimed] IFT
dm names N
dm table 3600d02300063ccc60000006215c0b501 N
dm table 3600d02300063ccc60000006215c0b501 N
dm status 3600d02300063ccc60000006215c0b501 N
dm info 3600d02300063ccc60000006215c0b501 O
get_dm_mpvec for (null)
params = 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000
status = 1 0 0 2 1 A 0 1 0 8:0 A 0 E 0 1 0 8:16 A 0
*word = 0, len = 1
*word = 0, len = 1
*word = 2, len = 1
*word = 1, len = 1
*word = round-robin, len = 11
*word = 0, len = 1
*word = 1, len = 1
*word = 1, len = 1
*word = 8:0, len = 3
*word = 1, len = 1
*word = 1, len = 1
*word = 8:16, len = 4
*word = 1, len = 1
*word = 0, len = 1
*word = 2, len = 1
*word = A, len = 1
*word = 1, len = 1
*word = 0, len = 1
*word = A, len = 1
*word = 0, len = 1
*word = E, len = 1
*word = 1, len = 1
*word = 0, len = 1
*word = A, len = 1
*word = 0, len = 1
pgpolicy = failover (internal default)
selector = round-robin 0 (internal default)
features = 0 (internal default)
hwhandler = 0 (internal default)
rr_weight = 1 (internal default)
0 184320000 multipath 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000
set ACT_NOTHING: map unchanged
virgo4:~ #

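Editor's note: the "params =" line in the multipath -v4 output above is the device-mapper multipath table. The following is a minimal Python sketch of how that string can be read, assuming the usual layout of the multipath target (feature count and args, hardware handler count and args, number of path groups, initial group, then per group a selector, its args, a path count, a per-path arg count and the paths). The names parse_multipath_params and PathGroup are hypothetical and not part of multipath-tools.

```python
from dataclasses import dataclass, field


@dataclass
class PathGroup:
    selector: str
    # Each entry is (device "major:minor", list of per-path args).
    paths: list = field(default_factory=list)


def parse_multipath_params(params: str):
    """Split a dm 'multipath' params string into its fields (illustrative only)."""
    words = params.split()
    pos = 0

    def take(n=1):
        # Consume the next n words, failing loudly on a truncated string.
        nonlocal pos
        chunk = words[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("truncated multipath params")
        pos += n
        return chunk

    features = take(int(take()[0]))      # "0" means no feature args follow
    hwhandler = take(int(take()[0]))     # "0" means no hardware handler
    num_groups = int(take()[0])
    next_group = int(take()[0])          # group to try first (1-based)

    groups = []
    for _ in range(num_groups):
        selector = take()[0]             # e.g. "round-robin"
        take(int(take()[0]))             # selector args, unused here
        num_paths = int(take()[0])
        num_path_args = int(take()[0])
        group = PathGroup(selector)
        for _ in range(num_paths):
            device = take()[0]
            group.paths.append((device, take(num_path_args)))
        groups.append(group)
    return features, hwhandler, next_group, groups


if __name__ == "__main__":
    line = "0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000"
    print(parse_multipath_params(line))
```

Read this way, the map in the post has two path groups with one path each (8:0 and 8:16), which matches the "failover" grouping policy shown in the same output.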
* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: Dave Wysochanski @ 2006-10-11 14:16 UTC
To: device-mapper development

> I ran iozone benchmark and manually failed one of two paths. There was
> no failover to second path.
> The iSCSI initiator retried to reconnect to the first path -
> unsuccessfully. When I restored the first
> path - the initiator reconnected to the target via the first path.
>
> Thanks!
>
> ------------- INFO -------------
> Versions:
> device-mapper-1.01.01-1.6
> multipath-tools-0.4.5-0.11
>
> Machine:
> 2.6.5-7.244-smp; x86_64; SLES9 sp3
>

Did you set ConnFailTimeout to a non-zero value? This sounds like it's still at the default (reconnect indefinitely). You also need to set the Multipath variable, but it sounds like you may have done that already.

* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: Michael Lyulko @ 2006-10-12 18:49 UTC
To: device-mapper development

Dave Wysochanski wrote:
>> I ran iozone benchmark and manually failed one of two paths. There was
>> no failover to second path.
>> The iSCSI initiator retried to reconnect to the first path -
>> unsuccessfully. When I restored the first
>> path - the initiator reconnected to the target via the first path.
>>
>> Thanks!
>>
>> ------------- INFO -------------
>> Versions:
>> device-mapper-1.01.01-1.6
>> multipath-tools-0.4.5-0.11
>>
>> Machine:
>> 2.6.5-7.244-smp; x86_64; SLES9 sp3
>>
>
> Did you set ConnFailTimeout to a non-zero value? This sounds like it's
> still at default (reconnect indefinitely). You also need to set the
> Multipath variable, but it sounds like you may have done that already.
>

Thanks! I have now set ConnFailTimeout=30, and when one of the paths failed, a fail-over to the second path occurred (I had Multipath=portal). But after the first path is back (the iSCSI session is established and I can read from the device), multipath -l shows:

<snip>
\_ round-robin 0 [enabled]
  \_ 0:0:6:0 sda 8:0 [failed][ready]   // still "failed" - Michael
\_ round-robin 0 [active]
  \_ 0:1:6:0 sdb 8:16 [active][ready]

1. How can I configure the DM to automatically detect that a path is active again?
2. I don't have a multipath.conf in /etc. Is /usr/share/doc/packages/multipath-tools/ the right place to take it from?
3. What params in multipath.conf are a must in this case of fail-over setup regarding question no.1?

Thanks again!

* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: David Wysochanski @ 2006-10-12 20:28 UTC
To: device-mapper development

On Thu, 2006-10-12 at 20:49 +0200, Michael Lyulko wrote:
> Dave Wysochanski wrote:
> > Did you set ConnFailTimeout to a non-zero value? This sounds like it's
> > still at default (reconnect indefinitely). You also need to set the
> > Multipath variable, but it sounds like you may have done that already.
> >
> Thanks! Now I set ConnFailTimeout=30 and when one of the paths failed a
> fail-over to
> second path occurred (I had Multipath=portal). But after the first path
> is back (iSCSI session
> is established and I can read from the device) multipath -l shows:
> <snip>
> \_ round-robin 0 [enabled]
>   \_ 0:0:6:0 sda 8:0 [failed][ready]   // still "failed" - Michael
> \_ round-robin 0 [active]
>   \_ 0:1:6:0 sdb 8:16 [active][ready]
>
> 1. How can I configure the DM to automatically detect that a path is
> active again?

You probably want "failback immediate". There have also been some bugs with failback. I'm not sure about your version, so your mileage may vary.

> 2. I don't have a multipath.conf in /etc. Is
> /usr/share/doc/packages/multipath-tools/
> the right place to take it from?

Yes, you can start with this. I think there may be a problem with some of the examples: make sure the "blacklist" keyword is correct. At some point it changed from "devnode_blacklist" to "blacklist" (I'm not sure which one is in your version, but the example may be wrong).

> 3. What params in multipath.conf are a must in this case of fail-over
> setup regarding question no.1?
>

You might also want to look at the no_path_retry and/or queue_if_no_path options.

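Editor's note: the keyword check suggested above can be done by eye, but a small script makes it repeatable across hosts. The sketch below only reports which of the two section names mentioned in the reply ("devnode_blacklist" or "blacklist") a configuration text uses. It does not know which one a given multipath-tools version accepts, and the function name blacklist_keywords is hypothetical. It assumes "#" starts a comment in multipath.conf.

```python
import re

# Section names mentioned in the thread; "blacklist_exceptions" and similar
# longer names are deliberately not matched.
SECTION = re.compile(r"(devnode_blacklist|blacklist)\s*\{?$")


def blacklist_keywords(conf_text: str):
    """Return the blacklist section keywords found, in order of appearance."""
    found = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # ignore comments and padding
        match = SECTION.match(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    sample = """
    defaults {
        failback immediate
    }
    devnode_blacklist {
        devnode "^hd[a-z]"
    }
    """
    print(blacklist_keywords(sample))
```

Comparing the result with the keyword documented for the installed multipath-tools version is then a manual step.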
* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: Michael Lyulko @ 2006-10-18 18:55 UTC
To: device-mapper development

David Wysochanski wrote:
> On Thu, 2006-10-12 at 20:49 +0200, Michael Lyulko wrote:
>> Thanks! Now I set ConnFailTimeout=30 and when one of the paths failed a
>> fail-over to
>> second path occurred (I had Multipath=portal). But after the first path
>> is back (iSCSI session
>> is established and I can read from the device) multipath -l shows:
>> <snip>
>> \_ round-robin 0 [enabled]
>>   \_ 0:0:6:0 sda 8:0 [failed][ready]   // still "failed" - Michael
>> \_ round-robin 0 [active]
>>   \_ 0:1:6:0 sdb 8:16 [active][ready]
>>
>> 1. How can I configure the DM to automatically detect that a path is
>> active again?
>>
>
> you probably want "failback immediate"
> there's also been some bugs with failback - not sure about your version
> - your mileage may vary
>

"failback immediate" didn't help on either SLES9 sp3 or SLES10. The device mapper doesn't rescan the paths automatically, so when the failed path comes back to life, it is still "failed" in the multipath -l output.
Issuing "multipath -p failover" helps to return the path back to "active". I need an automatic "rescan".

>> 2. I don't have a multipath.conf in /etc. Is
>> /usr/share/doc/packages/multipath-tools/
>> the right place to take it from?
>>
>
> Yes, you can start with this. I think there may be a problem with some
> of the examples - make sure the "blacklist" keyword is correct - at some
> point it changed from "devnode_blacklist" to "blacklist" (not sure which
> one is in your version but example may be wrong).
>
>> 3. What params in multipath.conf are a must in this case of fail-over
>> setup regarding question no.1?
>>
>
> You might also want to look at no_path_retry and/or queue_if_no_path
> options.
>

SLES9 sp3: neither parameter is in the default multipath.conf, so I suppose they are not supported yet in this distribution.
SLES10: no_path_retry doesn't help. queue_if_no_path is not in the default multipath.conf.

Here is my configuration:

SLES9 sp3:
multipath.conf:
defaults {
    failback immediate
}

SLES10:
multipath.conf:
defaults {
    polling_interval 10
    failback immediate
    no_path_retry fail
}

Here is "multipath -l" output during the test:

SLES9 sp3:
before the path failed:
# multipath -l
dm names N
dm table 3600d02300063ccc60000006215c0b501 N
dm table 3600d02300063ccc60000006215c0b501 N
dm status 3600d02300063ccc60000006215c0b501 N
dm info 3600d02300063ccc60000006215c0b501 O
3600d02300063ccc60000006215c0b501
[size=87 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:6:0 sda 8:0 [active][ready]
\_ round-robin 0 [enabled]
  \_ 0:1:6:0 sdb 8:16 [active][ready]

after the path is back again:
# multipath -l
dm names N
dm table 3600d02300063ccc60000006215c0b501 N
dm table 3600d02300063ccc60000006215c0b501 N
dm status 3600d02300063ccc60000006215c0b501 N
dm info 3600d02300063ccc60000006215c0b501 O
3600d02300063ccc60000006215c0b501
[size=87 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
  \_ 0:0:6:0 sda 8:0 [failed][ready]
\_ round-robin 0 [active]
  \_ 0:1:6:0 sdb 8:16 [active][ready]

SLES10:
before the path failed:
# multipath -l
320000004cffb0995SEAGATE,ST336607FC
[size=34G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 0:0:0:0 sda 8:0 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:0:0 sdb 8:16 [active][undef]
#

after the path is back again:
# multipath -l
320000004cffb0995SEAGATE,ST336607FC
[size=34G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:0:0 sda 8:0 [failed][undef]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:0:0 sdb 8:16 [active][undef]
#

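Editor's note: the symptom reported above (a path that still shows "[failed]" in the first bracket after the link is restored) is easy to spot by script. The following Python sketch scans text in the "multipath -l" format shown in this message and lists such paths. It assumes the two bracketed fields on a path line are the device-mapper state followed by the path checker state. The function name failed_paths is hypothetical, and the output format of other multipath-tools versions may differ.

```python
import re

# Matches path lines such as:  \_ 0:0:6:0 sda 8:0 [failed][ready]
PATH_LINE = re.compile(
    r"(?P<hbtl>\d+:\d+:\d+:\d+)\s+(?P<dev>\S+)\s+(?P<majmin>\d+:\d+)\s+"
    r"\[(?P<dm_state>\w+)\]\[(?P<checker_state>\w+)\]"
)


def failed_paths(multipath_l_output: str):
    """Return (device, h:b:t:l, checker state) for paths the map marks failed."""
    return [
        (m["dev"], m["hbtl"], m["checker_state"])
        for m in PATH_LINE.finditer(multipath_l_output)
        if m["dm_state"] == "failed"
    ]


if __name__ == "__main__":
    sample = r"""
3600d02300063ccc60000006215c0b501
[size=87 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
  \_ 0:0:6:0 sda 8:0 [failed][ready]
\_ round-robin 0 [active]
  \_ 0:1:6:0 sdb 8:16 [active][ready]
"""
    for dev, hbtl, checker in failed_paths(sample):
        print(f"{dev} ({hbtl}) is failed in the map; checker says {checker}")
```

A path reported as failed in the map while its checker state is "ready" is the combination described in this message for SLES9 sp3.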
* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: Dave Wysochanski @ 2006-11-01 6:20 UTC
To: device-mapper development

On Wed, 2006-10-18 at 20:55 +0200, Michael Lyulko wrote:
> >>
> >> 1. How can I configure the DM to automatically detect that a path is
> >> active again?
> >>
> >
> > you probably want "failback immediate"
> > there's also been some bugs with failback - not sure about your version
> > - your mileage may vary
> >
> "failback immediate" didn't help on both SLES9 sp3 and SLES10. The
> device mapper doesn't
> rescan automatically the paths, so when the failed path is back to life,
> it is still "failed"
> in multipath -l output.
> Issuing "multipath -p failover" helps to return the path back to
> "active". I need an automatic "rescan".

This is supposed to be multipathd's job: it's supposed to have a path checker thread that repeatedly scans the paths and notifies the kernel to reinstate them. If the first state remains "[failed]", it indicates that either multipathd isn't running or it's not correctly notifying the kernel of the reinstated paths.

I seem to recall there was a sequencing problem, at least on SLES9 SP?, with boot.multipath, multipathd and iscsi: something like multipathd starting too early, which caused it not to monitor paths properly. Did you try restarting multipathd after everything is up (on SLES9)?

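Editor's note: the path checker behaviour described above can be pictured with a small Python sketch. This is not multipathd's code. It only illustrates the idea of probing each path (here by reading the first sector, in the spirit of the "readsector0" checker named in the earlier -v4 output) and reporting state changes that a daemon would then pass on to the kernel. The function names and the 512-byte sector size are assumptions.

```python
import os


def read_sector0_ok(device_path: str, sector_size: int = 512) -> bool:
    """Return True if the first sector of the device can be read."""
    try:
        fd = os.open(device_path, os.O_RDONLY)
    except OSError:
        return False
    try:
        return len(os.read(fd, sector_size)) == sector_size
    except OSError:
        return False
    finally:
        os.close(fd)


def check_once(paths, last_state, probe=read_sector0_ok):
    """Probe every path once and return the actions implied by state changes.

    last_state maps path -> bool and is updated in place; paths not seen
    before are assumed to have been up.
    """
    actions = {}
    for path in paths:
        up = probe(path)
        if up != last_state.get(path, True):
            actions[path] = "reinstate" if up else "fail"
        last_state[path] = up
    return actions


if __name__ == "__main__":
    # Simulated health instead of real block devices, so this runs anywhere.
    health = {"sda": False, "sdb": True}
    state = {}
    print(check_once(["sda", "sdb"], state, probe=health.get))  # sda goes down
    health["sda"] = True
    print(check_once(["sda", "sdb"], state, probe=health.get))  # sda comes back
```

If nothing runs a loop like this (for example because the daemon started before the iSCSI devices existed), a restored path would stay "[failed]" until something such as a manual multipath run touches the map, which is consistent with the behaviour reported in the thread.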