* Failover between two paths to one LU doesn't work on linux-iscsi
From: Michae Lyulko @ 2006-10-10 17:17 UTC
To: dm-devel
Hello,
I use the linux-iscsi initiator and device-mapper to access a LU that has
two paths to it.
When one of the paths fails, no fail-over occurs (I waited for 15 minutes).
The device appears in /dev/mapper/
I know that with the open-iscsi initiator the fail-over does happen.
1. Are there any known problems that can cause fail-over not to happen
in linux-iscsi?
2. Is there any way to get debug information in order to understand more
in-depth what's going on?
3. How can I know what the timeouts are and how to change them?
4. If you need more specific information, please write what can help.
Michael
* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: Eli Stair @ 2006-10-10 17:24 UTC
To: device-mapper development
What versions of the tools and kernel are you running? What are your
settings for multipath.conf? Is multipathd running? If you run
multipath -v2 manually, does it fail the path? Etc.
> 1. Are there any known problems that can cause fail-over not to happen
> in linux-iscsi?
> 2. Is there any way to get debug information in order to understand more
> in-depth what's going on?
man multipath && man multipathd
multipathd -d
multipath -v4
> 3. How can I know what the timeouts are and how to change them?
Timeouts occur at several levels. Your HBA or iSCSI driver will have
(at least) one. Check the documentation for the aspects of
software/hardware you're using (kernel module options), dm-multipath
docs, etc.
> 4. If you need more specific information, please write what can help.
http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=Home
I've never done multipath-iscsi, so don't have direct experience. Try
posting some more detailed info on your setup and what errors or info
messages are occurring.
/eli
* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: Michae Lyulko @ 2006-10-10 19:55 UTC
To: device-mapper development
Eli Stair wrote:
>
> What versions of the tools and kernel are you running? What are your
> settings for multipath.conf? Is multipathd running? If you run
> multipath -v2 manually, does it fail the path? Etc.
>
>> 1. Are there any known problems that can cause fail-over not to happen
>> in linux-iscsi?
>
>> 2. Is there any way to get debug information in order to understand more
>> in-depth what's going on?
>
> man multipath && man multipathd
> multipathd -d
> multipath -v4
>
>> 3. How can I know what the timeouts are and how to change them?
>
> Timeouts occur at several levels. Your HBA or iSCSI driver will have
> (at least) one. Check the documentation for the aspects of
> software/hardware you're using (kernel module options), dm-multipath
> docs, etc.
I meant the multipath timeouts. The iSCSI driver definitely has some.
Here is some additional info (includes versions, debug info from
multipath -v4, etc.)
multipathd -d doesn't seem to add any info (strange),
multipath -v2 doesn't fail the path.
multipathd is running.
I can work with the device via /dev/mapper/3600d02300063ccc60000006215c0b501
or via /dev/dm-0 -- mkfs was successful;
I ran iozone benchmark and manually failed one of two paths. There was
no failover to second path.
The iSCSI initiator retried to reconnect to the first path -
unsuccessfully. When I restored the first
path - the initiator reconnected to the target via the first path.
Thanks!
------------- INFO -------------
Versions:
device-mapper-1.01.01-1.6
multipath-tools-0.4.5-0.11
Machine:
2.6.5-7.244-smp; x86_64; SLES9 sp3
virgo4:~ # lsmod | grep dm_multipath
dm_multipath 38544 2 dm_round_robin
dm_mod 77536 2 dm_multipath
virgo4:~ # ps -ef | grep multipathd
root 17078 1 0 22:04 pts/0 00:00:00 /sbin/multipathd
root 17228 10503 0 22:13 pts/0 00:00:00 grep multipathd
virgo4:~ # multipath -p failover
dm names N
dm info 3600d02300063ccc60000006215c0b501 N
dm create 3600d02300063ccc60000006215c0b501 3600d02300063ccc60000006215c0b501 O
dm reload 3600d02300063ccc60000006215c0b501 O
dm resume 3600d02300063ccc60000006215c0b501 N
dm message 3600d02300063ccc60000006215c0b501 N switch_group 1
create: 3600d02300063ccc60000006215c0b501
[size=87 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=1]
\_ 1:0:6:0 sda 8:0 [ready]
\_ round-robin 0 [prio=1]
\_ 1:1:6:0 sdb 8:16 [ready]
virgo4:~ #
virgo4:~ # multipath -l
dm names N
dm table 3600d02300063ccc60000006215c0b501 N
dm table 3600d02300063ccc60000006215c0b501 N
dm status 3600d02300063ccc60000006215c0b501 N
dm info 3600d02300063ccc60000006215c0b501 O
3600d02300063ccc60000006215c0b501
[size=87 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 1:0:6:0 sda 8:0 [active][ready]
\_ round-robin 0 [enabled]
\_ 1:1:6:0 sdb 8:16 [active][ready]
virgo4:~ #
virgo4:~ #
virgo4:~ # multipath -v2
dm names N
dm table 3600d02300063ccc60000006215c0b501 N
dm table 3600d02300063ccc60000006215c0b501 N
dm status 3600d02300063ccc60000006215c0b501 N
dm info 3600d02300063ccc60000006215c0b501 O
virgo4:~ #
virgo4:~ #
virgo4:~ #
virgo4:~ # multipath -v4
load path identifiers cache
3600d02300063ccc60000006215c0b501 1:0:6:0 sda 8:0 [ready] IFT /ES A16F-G
3600d02300063ccc60000006215c0b501 1:1:6:0 sdb 8:16 [ready] IFT /ES A16F-
dm-0 blacklisted
hda blacklisted
hdd blacklisted
loop0 blacklisted
loop1 blacklisted
loop2 blacklisted
loop3 blacklisted
loop4 blacklisted
loop5 blacklisted
loop6 blacklisted
loop7 blacklisted
md0 blacklisted
ram0 blacklisted
ram10 blacklisted
ram11 blacklisted
ram12 blacklisted
ram13 blacklisted
ram14 blacklisted
ram15 blacklisted
ram1 blacklisted
ram2 blacklisted
ram3 blacklisted
ram4 blacklisted
ram5 blacklisted
ram6 blacklisted
ram7 blacklisted
ram8 blacklisted
ram9 blacklisted
===== path info sda (mask 0x3f) =====
device sda is on bus scsi
bus = 1
dev_t = 8:0
size = 184320000
vendor = IFT
product = ES A16F-G
rev = 331J
h:b:t:l = 1:0:6:0
tgt_node_name =
open error (Device or resource busy)
claimed = 1
serial = 0000006215C0B501
path checker = readsector0 (internal default)
state = 2
getprio = (null) (internal default)
prio = 1
uid = 3600d02300063ccc60000006215c0b501 (cache)
===== path info sdb (mask 0x3f) =====
device sdb is on bus scsi
bus = 1
dev_t = 8:16
size = 184320000
vendor = IFT
product = ES A16F-G
rev = 331J
h:b:t:l = 1:1:6:0
tgt_node_name =
open error (Device or resource busy)
claimed = 1
serial = 0000006215C0B501
path checker = readsector0 (internal default)
state = 2
getprio = (null) (internal default)
prio = 1
uid = 3600d02300063ccc60000006215c0b501 (cache)
3600d02300063ccc60000006215c0b501 1:0:6:0 sda 8:0 1 [ready][claimed] IFT
3600d02300063ccc60000006215c0b501 1:1:6:0 sdb 8:16 1 [ready][claimed] IFT
dm names N
dm table 3600d02300063ccc60000006215c0b501 N
dm table 3600d02300063ccc60000006215c0b501 N
dm status 3600d02300063ccc60000006215c0b501 N
dm info 3600d02300063ccc60000006215c0b501 O
get_dm_mpvec for (null)
params = 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000
status = 1 0 0 2 1 A 0 1 0 8:0 A 0 E 0 1 0 8:16 A 0
*word = 0, len = 1
*word = 0, len = 1
*word = 2, len = 1
*word = 1, len = 1
*word = round-robin, len = 11
*word = 0, len = 1
*word = 1, len = 1
*word = 1, len = 1
*word = 8:0, len = 3
*word = 1, len = 1
*word = 1, len = 1
*word = 8:16, len = 4
*word = 1, len = 1
*word = 0, len = 1
*word = 2, len = 1
*word = A, len = 1
*word = 1, len = 1
*word = 0, len = 1
*word = A, len = 1
*word = 0, len = 1
*word = E, len = 1
*word = 1, len = 1
*word = 0, len = 1
*word = A, len = 1
*word = 0, len = 1
pgpolicy = failover (internal default)
selector = round-robin 0 (internal default)
features = 0 (internal default)
hwhandler = 0 (internal default)
rr_weight = 1 (internal default)
0 184320000 multipath 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000
set ACT_NOTHING: map unchanged
virgo4:~ #
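For anyone trying to read the "status = ..." line in the -v4 output above, here is a minimal Python sketch of how its tokens can be decoded. It assumes the field order that matches the line shown (feature args, hardware-handler args, group count, next group, then per group: state, selector status args, path count, per-path selector args, and per path: device, A/F state, fail count). The name parse_mp_status is made up for illustration, and the layout may differ on other kernels.

```python
def parse_mp_status(status):
    """Decode a dm-multipath status string into a list of priority groups.

    Assumed layout (consistent with the status line in this thread):
    <n_feat> <feat...> <n_hw> <hw...> <n_groups> <next_group>
    then per group: <state> <n_sel> <sel...> <n_paths> <n_path_sel>
    then per path:  <dev> <A|F> <fail_count> <path_sel...>
    """
    tok = status.split()
    pos = 0

    def take(n=1):
        # Consume n tokens and fail loudly if the line is truncated.
        nonlocal pos
        chunk = tok[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("status line is shorter than expected")
        pos += n
        return chunk

    take(int(take()[0]))            # feature args: count, then values
    take(int(take()[0]))            # hardware-handler args
    num_groups = int(take()[0])
    take()                          # number of the group to use next
    groups = []
    for _ in range(num_groups):
        state = take()[0]           # e.g. A or E, as in the output above
        take(int(take()[0]))        # group-level selector status args
        num_paths = int(take()[0])
        sel_args = int(take()[0])   # selector status args per path
        paths = []
        for _ in range(num_paths):
            dev, pstate, fails = take(3)
            take(sel_args)
            paths.append({"dev": dev, "state": pstate,
                          "fail_count": int(fails)})
        groups.append({"state": state, "paths": paths})
    return groups
```

Fed the status line from this message, the sketch yields two groups with one path each (8:0 and 8:16), both paths in state A with a fail count of 0.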
* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: Dave Wysochanski @ 2006-10-11 14:16 UTC
To: device-mapper development
> I ran iozone benchmark and manually failed one of two paths. There was
> no failover to second path.
> The iSCSI initiator retried to reconnect to the first path -
> unsuccessfully. When I restored the first
> path - the initiator reconnected to the target via the first path.
>
> Thanks!
>
> ------------- INFO -------------
> Versions:
> device-mapper-1.01.01-1.6
> multipath-tools-0.4.5-0.11
>
> Machine:
> 2.6.5-7.244-smp; x86_64; SLES9 sp3
>
Did you set ConnFailTimeout to a non-zero value? This sounds like it's
still at default (reconnect indefinitely). You also need to set the
Multipath variable, but it sounds like you may have done that already.
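To spell out the reasoning: as far as I understand it, while the initiator keeps reconnecting it returns no I/O error, so dm-multipath has nothing to react to. Below is a toy Python model of that idea, not linux-iscsi code. The function io_result and its arguments are invented names, and treating 0 as "retry forever" simply follows the default described above.

```python
def io_result(conn_fail_timeout, outage_seconds):
    """Toy model: what upper layers see for I/O issued during a path outage.

    conn_fail_timeout == 0 is taken to mean "reconnect indefinitely"
    (an assumption based on the default described in this thread).
    """
    if conn_fail_timeout == 0 or outage_seconds < conn_fail_timeout:
        # The initiator is still retrying: the I/O stays pending and
        # dm-multipath sees no error that would make it fail the path.
        return "pending"
    # The initiator gave up: the error propagates and the path can be failed.
    return "error"
```

In this model a 15-minute outage with the timeout at 0 stays "pending" forever, which would match the behaviour reported, while any non-zero timeout shorter than the outage ends in "error".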
* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: Michael Lyulko @ 2006-10-12 18:49 UTC
To: device-mapper development
Dave Wysochanski wrote:
> Did you set ConnFailTimeout to a non-zero value? This sounds like it's
> still at default (reconnect indefinitely). You also need to set the
> Multipath variable, but it sounds like you may have done that already.
Thanks! I have now set ConnFailTimeout=30, and when one of the paths
failed, a fail-over to the second path occurred (I had Multipath=portal).
But after the first path is back (the iSCSI session is established and I
can read from the device), multipath -l shows:
<snip>
\_ round-robin 0 [enabled]
\_ 0:0:6:0 sda 8:0 [failed][ready] // still "failed" - Michael
\_ round-robin 0 [active]
\_ 0:1:6:0 sdb 8:16 [active][ready]
1. How can I configure the DM to automatically detect that a path is
active again?
2. I don't have a multipath.conf in /etc. Is
/usr/share/doc/packages/multipath-tools/
the right place to take it from?
3. What params in multipath.conf are a must in this case of fail-over
setup regarding question no.1?
Thanks again!
* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: David Wysochanski @ 2006-10-12 20:28 UTC
To: device-mapper development
On Thu, 2006-10-12 at 20:49 +0200, Michael Lyulko wrote:
> Thanks! Now I set ConnFailTimeout=30 and when one of the paths failed a
> fail-over to
> second path occurred (I had Multipath=portal). But after the first path
> is back (iSCSI session
> is established and I can read from the device) multipath -l shows:
> <snip>
> \_ round-robin 0 [enabled]
> \_ 0:0:6:0 sda 8:0 [failed][ready] // still "failed" - Michael
> \_ round-robin 0 [active]
> \_ 0:1:6:0 sdb 8:16 [active][ready]
>
> 1. How can I configure the DM to automatically detect that a path is
> active again?
You probably want "failback immediate".
There have also been some bugs with failback; I'm not sure about your
version, so your mileage may vary.
> 2. I don't have a multipath.conf in /etc. Is
> /usr/share/doc/packages/multipath-tools/
> the right place to take it from?
Yes, you can start with this. I think there may be a problem with some
of the examples - make sure the "blacklist" keyword is correct - at some
point it changed from "devnode_blacklist" to "blacklist" (not sure which
one is in your version but example may be wrong).
> 3. What params in multipath.conf are a must in this case of fail-over
> setup regarding question no.1?
>
You might also want to look at no_path_retry and/or queue_if_no_path
options.
* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: Michael Lyulko @ 2006-10-18 18:55 UTC
To: device-mapper development
David Wysochanski wrote:
> On Thu, 2006-10-12 at 20:49 +0200, Michael Lyulko wrote:
>> 1. How can I configure the DM to automatically detect that a path is
>> active again?
>>
>
> You probably want "failback immediate".
> There have also been some bugs with failback; I'm not sure about your
> version, so your mileage may vary.
>
>
"failback immediate" didn't help on either SLES9 sp3 or SLES10. The
device mapper doesn't rescan the paths automatically, so when the failed
path comes back to life it is still shown as "failed" in the
multipath -l output.
Issuing "multipath -p failover" returns the path to "active". I need an
automatic "rescan".
>> 2. I don't have a multipath.conf in /etc. Is
>> /usr/share/doc/packages/multipath-tools/
>> the right place to take it from?
>>
>
> Yes, you can start with this. I think there may be a problem with some
> of the examples - make sure the "blacklist" keyword is correct - at some
> point it changed from "devnode_blacklist" to "blacklist" (not sure which
> one is in your version but example may be wrong).
>
>
>> 3. What params in multipath.conf are a must in this case of fail-over
>> setup regarding question no.1?
>>
>>
>
> You might also want to look at no_path_retry and/or queue_if_no_path
> options.
>
SLES9 sp3: neither parameter is in the default multipath.conf, so I
suppose they are not supported yet in this distribution.
SLES10: no_path_retry doesn't help. queue_if_no_path is not in the
default multipath.conf.
Here is my configuration:
SLES9 sp3: multipath.conf:
defaults {
        failback immediate
}
SLES10: multipath.conf:
defaults {
        polling_interval 10
        failback immediate
        no_path_retry fail
}
Here is "multipath -l" output during the test:
SLES9 sp3:
before the path failed:
# multipath -l
dm names N
dm table 3600d02300063ccc60000006215c0b501 N
dm table 3600d02300063ccc60000006215c0b501 N
dm status 3600d02300063ccc60000006215c0b501 N
dm info 3600d02300063ccc60000006215c0b501 O
3600d02300063ccc60000006215c0b501
[size=87 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:6:0 sda 8:0 [active][ready]
\_ round-robin 0 [enabled]
\_ 0:1:6:0 sdb 8:16 [active][ready]
after the path is back again:
# multipath -l
dm names N
dm table 3600d02300063ccc60000006215c0b501 N
dm table 3600d02300063ccc60000006215c0b501 N
dm status 3600d02300063ccc60000006215c0b501 N
dm info 3600d02300063ccc60000006215c0b501 O
3600d02300063ccc60000006215c0b501
[size=87 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
\_ 0:0:6:0 sda 8:0 [failed][ready]
\_ round-robin 0 [active]
\_ 0:1:6:0 sdb 8:16 [active][ready]
SLES10:
before the path failed:
# multipath -l
320000004cffb0995SEAGATE,ST336607FC
[size=34G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
\_ 0:0:0:0 sda 8:0 [active][undef]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:0:0 sdb 8:16 [active][undef]
#
after the path is back again:
# multipath -l
320000004cffb0995SEAGATE,ST336607FC
[size=34G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
\_ 0:0:0:0 sda 8:0 [failed][undef]
\_ round-robin 0 [prio=0][active]
\_ 1:0:0:0 sdb 8:16 [active][undef]
#
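The before/after listings above differ only in the bracketed per-path columns, so the stuck state is easy to pick out with a script. Here is a minimal Python sketch, assuming the path-line layout shown in this message (h:b:t:l, device name, major:minor, then two bracketed fields, of which I take the first to be the device-mapper state). The names PATH_LINE and failed_paths are hypothetical.

```python
import re

# Matches path lines such as: \_ 0:0:6:0 sda 8:0 [failed][ready]
# Group lines ("\_ round-robin 0 [enabled]") do not match because they
# lack the h:b:t:l field.
PATH_LINE = re.compile(
    r"\\_\s+(?P<hbtl>\d+:\d+:\d+:\d+)\s+(?P<dev>\S+)\s+(?P<devt>\d+:\d+)\s+"
    r"\[(?P<dm_state>\w+)\]\[(?P<checker_state>\w+)\]"
)


def failed_paths(multipath_l_output):
    """Return device names whose first bracketed state is 'failed'."""
    failed = []
    for line in multipath_l_output.splitlines():
        match = PATH_LINE.search(line)
        if match and match.group("dm_state") == "failed":
            failed.append(match.group("dev"))
    return failed
```

Run against the "after the path is back again" listings, this reports sda on both SLES9 sp3 and SLES10; against the "before" listings it reports nothing.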
* Re: Failover between two paths to one LU doesn't work on linux-iscsi
From: Dave Wysochanski @ 2006-11-01 6:20 UTC
To: device-mapper development
On Wed, 2006-10-18 at 20:55 +0200, Michael Lyulko wrote:
> >>
> >> 1. How can I configure the DM to automatically detect that a path is
> >> active again?
> >>
> >
> > You probably want "failback immediate".
> > There have also been some bugs with failback; I'm not sure about your
> > version, so your mileage may vary.
> >
> >
> "failback immediate" didn't help on either SLES9 sp3 or SLES10. The
> device mapper doesn't rescan the paths automatically, so when the failed
> path comes back to life it is still shown as "failed" in the
> multipath -l output.
> Issuing "multipath -p failover" returns the path to "active". I need an
> automatic "rescan".
This is supposed to be multipathd's job - it's supposed to have a path
checker thread that repeatedly scans and notifies the kernel to
reinstate paths. If the first state remains "[failed]", it indicates
either multipathd isn't running or it's not notifying the kernel
correctly of the reinstated paths.
I seem to recall there was a sequencing problem - at least on SLES9 SP?
with boot.multipath, multipathd and iscsi - something like multipathd
starting too early, which caused it not to monitor paths properly. Did
you try restarting multipathd after everything is up (on SLES9)?
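To make the checker idea concrete, here is a small Python sketch of the decision a path checker loop has to make on each pass. It is only an illustration of the concept, not multipathd's code. plan_updates and the "up"/"down" probe values are invented for the example, and the action strings merely borrow the names of the dm-multipath messages I believe are used for this.

```python
def plan_updates(dm_state, checker_state):
    """Compare the kernel's view of each path with fresh probe results.

    dm_state:      {"sda": "active" or "failed", ...} as the map reports it
    checker_state: {"sda": "up" or "down", ...} from probing each path
    Returns a list of (action, device) tuples to apply to the kernel map.
    """
    actions = []
    for dev in sorted(dm_state):
        probe = checker_state.get(dev)
        if dm_state[dev] == "failed" and probe == "up":
            # Path answers again but the map still has it failed.
            actions.append(("reinstate_path", dev))
        elif dm_state[dev] == "active" and probe == "down":
            # Path stopped answering but the map still uses it.
            actions.append(("fail_path", dev))
    return actions
```

In the situation reported here, sda is "failed" in the map while the device is readable again, so one pass of such a loop would produce a single reinstate action for sda. If that never happens, the loop is either not running or not watching that path.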