* Problems with multipathd
From: Simon @ 2005-08-31 15:29 UTC
To: dm-devel
Hi,
I am trying to use multipath to provide a single block device for a
multipathed LUN, for failover. After some days of installation,
documentation reading and debugging I have solved a lot of problems, but not
all of them, and I need some help. I know it's a lot of text (sorry!), but I
think it's necessary to describe my problems.
I have marked my questions/comments with "===>". Please reply to these notes.
Thank you.
1.) *** System Description ***
Storage:
- Storage EVA-3000
- Controller-B connected to fabric-A and fabric-B
- one VDisk presented to host testhalde2 via controller-B to fabric-A and -B
Server (testhalde2):
- 1x HBA Qlogic 2340 connected to fabric-A
- 1x HBA Qlogic 2340 connected to fabric-B
- Kernel 2.6.12.5 (vanilla, gentoo)
- device-mapper-1.01.03, udev-058, multipath-tools-0.4.4
testhalde2 tmp # dmesg | fgrep device-mapper
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
device-mapper: dm-multipath version 1.0.4 loaded
device-mapper: dm-round-robin version 1.0.0 loaded
testhalde2 tmp # lsmod
Module Size Used by
qla2300 123904 0
qla2xxx 88208 4 qla2300
scsi_transport_fc 26880 1 qla2xxx
testhalde2 etc # cat multipath.conf
defaults {
multipath_tool "/sbin/multipath -v 0 -S"
udev_dir /dev
polling_interval 10
default_selector "round-robin 0"
default_path_grouping_policy failover
default_getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
default_prio_callout "/bin/false"
r_min_io 100
}
blacklist {
wwid 26353900f02796769
devnode "(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "hd[a-z][[0-9]*]"
devnode "cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
multipaths {
multipath {
wwid 3600508b40010079d0001900000460000
alias 150gb
path_grouping_policy failover
path_selector "round-robin 0"
}
}
devices {
device {
vendor "HP "
product "HSV100 "
path_grouping_policy multibus
path_checker tur
prio_callout "/sbin/pp_balance_units %d"
}
}
testhalde2 etc # cat /etc/udev/rules.d/20-multipath.rules
KERNEL="dm-[0-9]*", PROGRAM="/sbin/devmap_name %M %m", NAME="%k", SYMLINK="%c"
testhalde2 ~ # cat /etc/dev.d/block/multipath.dev
#!/bin/sh -e
print()
{
echo "`date +%H%M%S` - $1" >> /tmp/devd_multipath
}
print "ENV_ACTION: $ACTION" # debugging
if [ ! "${ACTION}" = add ] ; then
exit
fi
if [ "${DEVPATH:7:3}" = "dm-" ] ; then
dev=$(</sys${DEVPATH}/dev)
map=$(/sbin/devmap_name $dev)
print "KPARTX $map" # debugging
/sbin/kpartx -v -a /dev/$map >> /tmp/devd_multipath
else
print "ENV_DEVNAME: ${DEVNAME}" # debugging
/sbin/multipath ${DEVNAME}
fi
2.) *** Multipath in action ***
After rebooting testhalde2, I see the following:
testhalde2 tmp # ls /sys/block/
dm-0 loop0 loop3 loop6 ram1 ram12 ram15 ram4 ram7 sda
fd0 loop1 loop4 loop7 ram10 ram13 ram2 ram5 ram8 sdb
hda loop2 loop5 ram0 ram11 ram14 ram3 ram6 ram9
testhalde2 tmp # ls -lF /dev/mapper/
total 0
brw------- 1 root root 254, 0 Aug 31 12:20 150gb
crw-rw---- 1 root root 10, 63 Aug 31 2005 control
testhalde2 ~ # fdisk -l /dev/mapper/150gb
Disk /dev/mapper/150gb: 161.0 GB, 161061273600 bytes
255 heads, 63 sectors/track, 19581 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/mapper/150gb doesn't contain a valid partition table
===> Is it possible to _use_ partitions on this device? I know that it is
possible to create them, but what is the device-name (/dev/...) from
partition 1?
testhalde2 ~ # mkreiserfs /dev/mapper/150gb
mkreiserfs 3.6.19 (2003 www.namesys.com)
...
ReiserFS is successfully created on /dev/mapper/150gb.
testhalde2 ~ #
testhalde2 ~ # mount /dev/mapper/150gb /mnt/test/
testhalde2 ~ # touch /mnt/test/file # ok
testhalde2 ~ # rm /mnt/test/file # ok
testhalde2 rules.d # udevtest /sys/block/dm-0 block
udevtest.c: looking at device '/block/dm-0' from subsystem 'block'
udevtest.c: opened class_dev->name='dm-0'
udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, added symlink '%c'
udev_rules.c: add symlink '150gb'
udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, 'dm-0' becomes '%k'
udev_rules.c: configured rule in '/etc/udev/rules.d/50-udev.rules[63]' applied, 'dm-0' is ignored
testhalde2 tmp # ls -lF /dev/1*
ls: /dev/1*: No such file or directory
===> udevtest shows that udev reads the 20-multipath.rules rule. Why doesn't
udev create /dev/150gb?
testhalde2 tmp # multipath -l
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
\_ 0:0:0:1 sda 8:0 [ready ][active]
\_ round-robin 0 [enabled]
\_ 1:0:0:1 sdb 8:16 [ready ][active]
testhalde2 tmp # dmsetup table
150gb: 0 314572800 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:16 1000
testhalde2 tmp # cat devd_multipath # multipath.dev debugging output
...
142037 - ENV_DEVPATH: ram
142037 - ENV_DEVNAME: /dev/rd/9
142046 - ENV_ACTION: add
142046 - ENV_DEVPATH: sda
142046 - ENV_DEVNAME: /dev/sda
122045 - ENV_ACTION: add
122045 - ENV_DEVPATH: sdb
122045 - ENV_DEVNAME: /dev/sdb
testhalde2 tmp # fgrep dm devd_multipath
testhalde2 tmp #
===> My understanding of how sysfs, device-mapper, udev and multipath work
together is the following: after the HBA module qla2300 is loaded, the kernel
creates /sys/block/sda and /sys/block/sdb and executes udevsend - udevd - udev.
udev invokes /etc/dev.d/block/multipath.dev (ADD, sda/sdb). multipath.dev
executes multipath, which creates the device-mapper table and the
device-mapper device /sys/block/dm-0. Now we have a new device (dm-0), so
again: udevsend - udevd - udev and multipath.dev (ADD, dm-0). multipath.dev
should start kpartx, but the debug file /tmp/devd_multipath shows nothing! So
I think kpartx is never started. Is this behavior ok? It seems to work without
kpartx, so I don't understand why I need this tool.
testhalde2 ~ # multipath -v3
fd0 blacklisted
ram0 blacklisted
ram1 blacklisted
ram2 blacklisted
ram3 blacklisted
ram4 blacklisted
ram5 blacklisted
ram6 blacklisted
ram7 blacklisted
ram8 blacklisted
ram9 blacklisted
ram10 blacklisted
ram11 blacklisted
ram12 blacklisted
ram13 blacklisted
ram14 blacklisted
ram15 blacklisted
loop0 blacklisted
loop1 blacklisted
loop2 blacklisted
loop3 blacklisted
loop4 blacklisted
loop5 blacklisted
loop6 blacklisted
loop7 blacklisted
hda blacklisted
path sda not found in pathvec
===== path sda =====
vendor = HP
:
product = HSV100
rev = 3025
dev_t = 8:0
size = 314572800
h:b:t:l = 0:0:0:1
tgt_node_name = 0x50001fe150051d20
serial = P66C5E2AAQI010
path checker = tur (controler setting)
state = 2
getprio = /sbin/pp_balance_units %d (controler setting)
prio = 1
getuid = /sbin/scsi_id -g -u -s /block/%n (internal default)
uid = 3600508b40010079d0001900000460000 (callout)
path sdb not found in pathvec
===== path sdb =====
vendor = HP
product = HSV100
rev = 3025
dev_t = 8:16
size = 314572800
h:b:t:l = 1:0:0:1
tgt_node_name = 0x50001fe150051d20
serial = P66C5E2AAQI010
path checker = tur (controler setting)
state = 2
getprio = /sbin/pp_balance_units %d (controler setting)
prio = 1
getuid = /sbin/scsi_id -g -u -s /block/%n (internal default)
uid = 3600508b40010079d0001900000460000 (callout)
dm-0 blacklisted
#
# all paths :
#
3600508b40010079d0001900000460000 0:0:0:1 sda 8:0 [ready ][HSV100 ]
3600508b40010079d0001900000460000 1:0:0:1 sdb 8:16 [ready ][HSV100 ]
params = 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000
status = 1 0 0 2 1 A 0 1 0 8:0 A 0 E 0 1 0 8:16 A 0
pgpolicy = failover (LUN setting)
selector = round-robin 0 (LUN setting)
features = 0 (internal default)
hwhandler = 0 (internal default)
0 314572800 multipath 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000
action preset to 0
action set to 1
cannot signal daemon, pidfile not found
testhalde2 ~ #
testhalde2 ~ # ps ax | fgrep multipathd
10870 pts/0 SL 0:00 multipathd
10871 pts/0 SL 0:00 multipathd
10872 pts/0 SL 0:00 multipathd
10875 pts/0 S+ 0:00 fgrep multipathd
testhalde2 ~ # ls /var/run/multipathd.pid
ls: /var/run/multipathd.pid: No such file or directory
===> Does the system really need _three_ multipathd daemons and why is
there no pid file?
testhalde2 ~ # echo 10870 > /var/run/multipathd.pid
testhalde2 ~ # strace -f -p 10870 >/tmp/strace_multipatd 2>&1 &
[1] 11192
Now, I disable HBA-fabric-B port on the san-switch...
testhalde2 ~ # multipath -l
[ sleeping 35 seconds ]
open class /sys/block/sdc failed: No such file or directory
error calling out /sbin/scsi_id -g -u -s /block/sdc
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
\_ 1:0:0:1 sdb 8:16 [ready ][active]
\_ round-robin 0 [enabled]
\_ 0:0:0:1 sdc 8:32 [ready ][active]
testhalde2 ~ # multipath -l # again
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
\_ 1:0:0:1 sdb 8:16 [ready ][active]
\_ round-robin 0 [enabled]
\_ 0:0:0:0 8:32 [undef ][active]
testhalde2 tmp # touch /mnt/test/test # ok
testhalde2 tmp # rm /mnt/test/test # ok
testhalde2 tmp # ps ax | fgrep multipathd
10871 pts/0 SL 0:00 multipathd
10872 pts/0 SL 0:00 multipathd
10870 pts/0 SL 0:00 multipathd
11534 pts/0 S+ 0:00 fgrep multipathd
testhalde2 tmp # cat strace_multipatd
Process 10870 attached - interrupt to quit
testhalde2 tmp #
===> No output in the strace-debug file from multipathd. It seems that
multipathd doesn't recognize the changes.
Enabling HBA-fabric-B port on the san-switch...
testhalde2 tmp # multipath -l
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
\_ 1:0:0:1 sdb 8:16 [ready ][active]
\_ round-robin 0 [enabled]
\_ 0:0:0:1 sdc 8:32 [ready ][active]
testhalde2 tmp # touch /mnt/test/test # ok
testhalde2 tmp # rm /mnt/test/test # ok
Disabling HBA-fabric-A port on the other san-switch...
testhalde2 ~ # multipath -l
[ sleeping 35 seconds ]
1:0:0:1: cannot open /tmp/scsi-maj8-min16-11665: No such device or address
error calling out /sbin/scsi_id -g -u -s /block/sdb
error calling out /sbin/pp_balance_units 8:32
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active][first]
\_ 1:0:0:1 sdb 8:16 [faulty][active]
\_ round-robin 0 [enabled]
\_ 0:0:0:1 sdc 8:32 [ready ][active]
testhalde2 tmp # multipath -l # again
error calling out /sbin/pp_balance_units 8:32
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active][first]
\_ 0:0:0:0 8:16 [undef ][active]
\_ round-robin 0 [enabled]
\_ 0:0:0:1 sdc 8:32 [ready ][active]
testhalde2 tmp # touch /mnt/test/test # ok
testhalde2 tmp # rm /mnt/test/test # ok
===> Why do I get the "error calling out..." error only when I disable the
HBA-port from _fabric-A_?
Enabling HBA-fabric-A port...
testhalde2 tmp # multipath -l
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
\_ 0:0:0:1 sdc 8:32 [ready ][active]
\_ round-robin 0 [enabled]
\_ 1:0:0:1 sda 8:0 [ready ][active]
testhalde2 tmp # touch /mnt/test/test # ok
testhalde2 tmp # rm /mnt/test/test # ok
testhalde2 tmp # ps ax | fgrep multipathd
10871 pts/0 SL 0:00 multipathd
10872 pts/0 SL 0:00 multipathd
10870 pts/0 SL 0:00 multipathd
11534 pts/0 S+ 0:00 fgrep multipathd
testhalde2 tmp # cat strace_multipatd
Process 10870 attached - interrupt to quit
testhalde2 tmp #
===> Again: No output in the strace-debug file from multipathd.
SUMMARY:
========
The failover mechanism seems to work, but it's very slow (>= 35 sec).
I am sure that the host will die if there is a lot of I/O at that moment.
The documentation says that multipathd "is in charge of checking the paths
in case they come up or down", yet multipathd seems to do nothing. I think
that is the problem. What do you think?
Thanks a lot for your help
Simon
--
Simon
gistolero@gmx.de
* Re: Problems with multipathd
From: christophe varoqui @ 2005-08-31 19:56 UTC
To: gistolero, device-mapper development
On Wed, 2005-08-31 at 17:29 +0200, Simon wrote:
>
> testhalde2 tmp # ls -lF /dev/mapper/
> total 0
> brw------- 1 root root 254, 0 Aug 31 12:20 150gb
> crw-rw---- 1 root root 10, 63 Aug 31 2005 control
>
No /dev/150gb node :) ?
/etc/udev/rules.d/20-multipath.rules should create it, see below.
>
> testhalde2 ~ # fdisk -l /dev/mapper/150gb
> Disk /dev/mapper/150gb: 161.0 GB, 161061273600 bytes
> 255 heads, 63 sectors/track, 19581 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk /dev/mapper/150gb doesn't contain a valid partition table
>
>
> ===> Is it possible to _use_ partitions on this device? I know that it is
> possible to create them, but what is the device-name (/dev/...) from
> partition 1?
>
A little bit harder, but I guess so :
- remove the multipath map
- partition a path (/dev/sda for example)
- re-create the multipath map through '/sbin/multipath /dev/sda'
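The three steps above could be scripted roughly as follows. This is only a sketch: the map name 150gb and the path /dev/sda come from this thread, `dmsetup remove` is used here as one way to drop the map (an assumption, the reply does not name a command), and the `run` helper and the `DRY_RUN` variable are hypothetical names added so the sequence can be previewed without touching any device.

```shell
#!/bin/sh
# Sketch: repartition a multipathed LUN by working on one of its paths.
# Set DRY_RUN to a non-empty value to print the commands instead of
# executing them. run() and DRY_RUN are illustrative names only.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repartition_map() {
    map=$1      # multipath map name, e.g. 150gb
    path=$2     # one underlying path, e.g. /dev/sda

    run dmsetup remove "$map" || return 1       # 1. remove the multipath map (assumed command)
    run fdisk "$path" || return 1               # 2. partition a single path (interactive)
    run /sbin/multipath "$path" || return 1     # 3. re-create the map from that path
}

# Preview without changing anything:
#   DRY_RUN=1; repartition_map 150gb /dev/sda
```

The map has to be unused (unmounted) before it can be removed, so this is something to try on a test LUN first.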
> testhalde2 rules.d # udevtest /sys/block/dm-0 block
> udevtest.c: looking at device '/block/dm-0' from subsystem 'block'
> udevtest.c: opened class_dev->name='dm-0'
> udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, added symlink '%c'
> udev_rules.c: add symlink '150gb'
> udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, 'dm-0' becomes '%k'
> udev_rules.c: configured rule in '/etc/udev/rules.d/50-udev.rules[63]' applied, 'dm-0' is ignored
>
The default udev.rules file has a directive to ignore dm-*.
Something like:
KERNEL=="dm-[0-9]*", OPTIONS+="ignore_device"
/etc/udev/rules.d/20-multipath.rules is useless unless you comment
out this rule.
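One way to locate such a rule is to search the rules directory for lines that mention both dm- and an ignore option. A minimal sketch: the function name find_dm_ignore_rules is made up for this example, and the exact option text may differ between udev versions.

```shell
#!/bin/sh
# Print "file:line:text" for every non-comment udev rule that mentions
# "dm-" together with an "ignore" option.
# find_dm_ignore_rules is a hypothetical helper name.
find_dm_ignore_rules() {
    rules_dir=${1:-/etc/udev/rules.d}
    # /dev/null is passed as an extra file so grep always prints the
    # file name, even when only one rules file exists.
    grep -n 'dm-' "$rules_dir"/*.rules /dev/null 2>/dev/null |
        grep 'ignore' |
        grep -v '^[^:]*:[0-9]*:[[:space:]]*#'
}

# Example:
#   find_dm_ignore_rules /etc/udev/rules.d
```

Any line it prints is a candidate for the rule that makes udev skip dm-0.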
>
> testhalde2 tmp # ls -lF /dev/1*
> ls: /dev/1*: No such file or directory
>
>
> ===> udevtest shows that udev reads the 20-multipath.rules rule. Why doesn't
> udev create /dev/150gb?
>
See above
>
> testhalde2 tmp # multipath -l
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
> \_ 0:0:0:1 sda 8:0 [ready ][active]
> \_ round-robin 0 [enabled]
> \_ 1:0:0:1 sdb 8:16 [ready ][active]
>
>
> testhalde2 tmp # dmsetup table
> 150gb: 0 314572800 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:16 1000
>
>
> testhalde2 tmp # cat devd_multipath # multipath.dev debugging output
> ...
> 142037 - ENV_DEVPATH: ram
> 142037 - ENV_DEVNAME: /dev/rd/9
> 142046 - ENV_ACTION: add
> 142046 - ENV_DEVPATH: sda
> 142046 - ENV_DEVNAME: /dev/sda
> 122045 - ENV_ACTION: add
> 122045 - ENV_DEVPATH: sdb
> 122045 - ENV_DEVNAME: /dev/sdb
>
>
> testhalde2 tmp # fgrep dm devd_multipath
> testhalde2 tmp #
>
>
> ===> My understanding of how sysfs, device-mapper, udev and multipath work
> together is the following: after the HBA module qla2300 is loaded, the kernel
> creates /sys/block/sda and /sys/block/sdb and executes udevsend - udevd - udev.
> udev invokes /etc/dev.d/block/multipath.dev (ADD, sda/sdb). multipath.dev
> executes multipath, which creates the device-mapper table and the
> device-mapper device /sys/block/dm-0. Now we have a new device (dm-0), so
> again: udevsend - udevd - udev and multipath.dev (ADD, dm-0). multipath.dev
> should start kpartx, but the debug file /tmp/devd_multipath shows nothing! So
> I think kpartx is never started. Is this behavior ok? It seems to work without
> kpartx, so I don't understand why I need this tool.
>
>
kpartx is triggered only for dm-* "add" events, by the multipath.dev hotplug
script, *and* the script expects the node to be in /dev/
(not /dev/mapper/).
This problem is linked to the previous one.
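The dispatch logic described here can be sketched as a small POSIX shell function. This is an illustration, not the script shipped with multipath-tools: the function name and the returned keywords are hypothetical, and the function only decides what would be run.

```shell
#!/bin/sh
# Decide what a block hotplug handler should do for one event.
# Prints "kpartx" for an added dm-* device, "multipath" for any other
# added block device, and "skip" for every other action.
# hotplug_action and its keywords are illustrative names only.
hotplug_action() {
    action=$1       # value of $ACTION, e.g. add or remove
    devpath=$2      # value of $DEVPATH, e.g. /block/dm-0

    [ "$action" = add ] || { echo skip; return 0; }

    case ${devpath##*/} in
        dm-*) echo kpartx ;;
        *)    echo multipath ;;
    esac
}

# Example:
#   hotplug_action "$ACTION" "$DEVPATH"
```

The case pattern avoids the ${DEVPATH:7:3} substring expansion, which is a bash extension and may not work when /bin/sh is a different shell.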
>
> testhalde2 ~ # ps ax | fgrep multipathd
> 10870 pts/0 SL 0:00 multipathd
> 10871 pts/0 SL 0:00 multipathd
> 10872 pts/0 SL 0:00 multipathd
> 10875 pts/0 S+ 0:00 fgrep multipathd
>
> testhalde2 ~ # ls /var/run/multipathd.pid
> ls: /var/run/multipathd.pid: No such file or directory
>
> ===> Does the system really need _three_ multipathd daemons and why is
> there no pid file?
>
I don't know which ps/NPTL defaults Gentoo uses, but these may well be the
different threads of one daemon. Consecutive PID numbers are a sign of that.
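On kernels that expose /proc/<pid>/task, the threads of one process can be counted directly, which helps tell separate daemons from threads of a single daemon. A sketch under that assumption: count_tasks and its optional second argument are hypothetical names for this example. With the older LinuxThreads library each thread has its own PID, so this check would not group them.

```shell
#!/bin/sh
# Count the entries under <proc_root>/<pid>/task, one per thread.
# count_tasks is a hypothetical helper; proc_root defaults to /proc and
# is a parameter only so the function can be tried on a scratch tree.
count_tasks() {
    pid=$1
    proc_root=${2:-/proc}
    n=0
    for t in "$proc_root/$pid/task"/*; do
        [ -d "$t" ] && n=$((n + 1))
    done
    echo "$n"
}

# Example:
#   count_tasks "$(cat /var/run/multipathd.pid)"
```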
>
> testhalde2 ~ # echo 10870 > /var/run/multipathd.pid
> testhalde2 ~ # strace -f -p 10870 >/tmp/strace_multipatd 2>&1 &
> [1] 11192
>
Don't debug this way.
Use 'multipathd -v4' and check the log, or run 'strace -f multipathd'.
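When reading a -v4 log, most lines are per-tick noise. A filter such as the following keeps only the path state messages. It is a sketch: the function name is hypothetical, and the matched phrases are simply the ones that appear in the log excerpts posted later in this thread, which other versions may word differently.

```shell
#!/bin/sh
# Keep only multipathd lines that report a checker result or a path
# state change. Reads the given syslog file(s), or stdin when no file
# is given. path_events is an illustrative name.
path_events() {
    grep -E 'multipathd: .*(checker reports path is|reinstated|checker failed path|devmap event)' "$@"
}

# Example:
#   path_events /var/log/messages
```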
>
>
>
> Now, I disable HBA-fabric-B port on the san-switch...
>
> testhalde2 ~ # multipath -l
> [ sleeping 35 seconds ]
> open class /sys/block/sdc failed: No such file or directory
> error calling out /sbin/scsi_id -g -u -s /block/sdc
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
> \_ 1:0:0:1 sdb 8:16 [ready ][active]
> \_ round-robin 0 [enabled]
> \_ 0:0:0:1 sdc 8:32 [ready ][active]
>
> testhalde2 ~ # multipath -l # again
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
> \_ 1:0:0:1 sdb 8:16 [ready ][active]
> \_ round-robin 0 [enabled]
> \_ 0:0:0:0 8:32 [undef ][active]
>
Lower the timeouts in your Qlogic driver.
> testhalde2 tmp # touch /mnt/test/test # ok
> testhalde2 tmp # rm /mnt/test/test # ok
>
> testhalde2 tmp # ps ax | fgrep multipathd
> 10871 pts/0 SL 0:00 multipathd
> 10872 pts/0 SL 0:00 multipathd
> 10870 pts/0 SL 0:00 multipathd
> 11534 pts/0 S+ 0:00 fgrep multipathd
>
> testhalde2 tmp # cat strace_multipatd
> Process 10870 attached - interrupt to quit
> testhalde2 tmp #
>
> ===> No output in the strace-debug file from multipathd. It seems that
> multipathd doesn't recognize the changes.
>
Does the log agree with that?
>
> Enabling HBA-fabric-B port on the san-switch...
>
> testhalde2 tmp # multipath -l
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
> \_ 1:0:0:1 sdb 8:16 [ready ][active]
> \_ round-robin 0 [enabled]
> \_ 0:0:0:1 sdc 8:32 [ready ][active]
>
> testhalde2 tmp # touch /mnt/test/test # ok
> testhalde2 tmp # rm /mnt/test/test # ok
>
>
>
>
> Disabling HBA-fabric-A port on the other san-switch...
>
> testhalde2 ~ # multipath -l
> [ sleeping 35 seconds ]
> 1:0:0:1: cannot open /tmp/scsi-maj8-min16-11665: No such device or address
> error calling out /sbin/scsi_id -g -u -s /block/sdb
> error calling out /sbin/pp_balance_units 8:32
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active][first]
> \_ 1:0:0:1 sdb 8:16 [faulty][active]
> \_ round-robin 0 [enabled]
> \_ 0:0:0:1 sdc 8:32 [ready ][active]
>
> testhalde2 tmp # multipath -l # again
> error calling out /sbin/pp_balance_units 8:32
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active][first]
> \_ 0:0:0:0 8:16 [undef ][active]
> \_ round-robin 0 [enabled]
> \_ 0:0:0:1 sdc 8:32 [ready ][active]
>
> testhalde2 tmp # touch /mnt/test/test # ok
> testhalde2 tmp # rm /mnt/test/test # ok
>
>
> ===> Why do I get the "error calling out..." error only when I disable the
> HBA-port from _fabric-A_?
>
Your log shows this message when disabling B too.
These are scsi_id error messages.
>
> Enabling HBA-fabric-A port...
>
> testhalde2 tmp # multipath -l
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
> \_ 0:0:0:1 sdc 8:32 [ready ][active]
> \_ round-robin 0 [enabled]
> \_ 1:0:0:1 sda 8:0 [ready ][active]
>
> testhalde2 tmp # touch /mnt/test/test # ok
> testhalde2 tmp # rm /mnt/test/test # ok
>
>
> SUMMARY:
> ========
>
> The failover mechanism seems to work, but it's very slow (>= 35 sec).
> I am sure that the host will die if there is a lot of I/O at that moment.
> The documentation says that multipathd "is in charge of checking the paths
> in case they come up or down", yet multipathd seems to do nothing. I think
> that is the problem. What do you think?
>
I hope the previous comments clarify things a bit.
Also note that the 0.4.5 snapshots are much better suited to the task.
Consider upgrading.
And consider updating the wiki FAQ with the responses you found
enlightening :/
Regards,
--
christophe varoqui <christophe.varoqui@free.fr>
* Re: Re: Problems with multipathd
From: gistolero @ 2005-09-06 16:46 UTC
To: dm-devel
Hi,
> Also know the 0.4.5 snapshots are largely better suited to the task.
> Consider upgrading.
Now, I use multipath-tools-0.4.5, udev-068 and device-mapper-1.01.03.
> Lower the timeouts in your Qlogic driver.
===> I found some settings in /sys/module/qla2xxx/parameters/...,
but most of them are read-only values. I have changed ql2xretrycount
and ql2xsuspendcount but without success. Any suggestions for
this driver?
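To see at a glance which parameters can be changed at runtime, the permission bits can be listed next to each value. A minimal sketch: list_params is a hypothetical helper, and it only reports what sysfs exposes. It does not know which parameter, if any, affects the failover delay.

```shell
#!/bin/sh
# Print "<name> <rw|ro> <value>" for each file in a module's parameters
# directory, based on the owner write bit (so the result is the same
# when run as root). list_params is an illustrative name.
list_params() {
    dir=${1:-/sys/module/qla2xxx/parameters}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if [ -n "$(find "$f" -prune -perm -200)" ]; then
            mode=rw
        else
            mode=ro
        fi
        echo "${f##*/} $mode $(cat "$f")"
    done
}

# Example:
#   list_params /sys/module/qla2xxx/parameters
```

Parameters reported as ro can normally only be set when the module is loaded.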
I.) --- udev and udevstart ---
> Default udev.rules file has a directive to ignore dm-*
> Something like :
> KERNEL=="dm-[0-9]*", OPTIONS+="ignore_device"
>
> /etc/udev/rules.d/20-multipath.rules is useless unless you comment
> out this rule.
I have commented out this line, but udev still has difficulty creating these
links. Therefore I have changed /etc/dev.d/block/multipath.dev (the script
is attached at the end of this post) and added debug messages. The most
important modification is that kpartx uses the block device files in
/dev/mapper/... instead of /dev/...
===> Why isn't that the default? Are there any disadvantages?
testhalde2 ~ # multipath /dev/sda
create: 150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0
\_ 0:0:0:1 sda 8:0 [ready]
\_ 1:0:0:1 sdb 8:16 [ready]
testhalde2 ~ # multipath -ll
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sda 8:0 [active][ready]
\_ 1:0:0:1 sdb 8:16 [active][ready]
testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000
150gb2: 0 314504505 linear 254:0 64260
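In the multipath table line, each path appears as a major:minor pair followed by a count for the path selector. A small sketch that pulls those pairs out of `dmsetup table` output, which is handy for comparing the kernel's view with `multipath -ll`. map_paths is a hypothetical name, and the function simply treats every major:minor-shaped token on a multipath line as a path.

```shell
#!/bin/sh
# Read `dmsetup table` output on stdin and print "<map> <major:minor>"
# for every path of every multipath target. Other targets (such as the
# linear partition maps) are skipped. map_paths is an illustrative name.
map_paths() {
    while read -r name _start _len target rest; do
        [ "$target" = multipath ] || continue
        for tok in $rest; do
            case $tok in
                *[!0-9:]*|*:*:*) ;;                     # not digits:digits
                [0-9]*:[0-9]*) echo "${name%:} $tok" ;; # a path device
            esac
        done
    done
}

# Example:
#   dmsetup table | map_paths
```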
testhalde2 ~ # ls -lF /dev/mapper/
total 0
brw------- 1 root root 254, 0 Sep 6 15:04 150gb
brw------- 1 root root 254, 1 Sep 6 15:04 150gb1
brw------- 1 root root 254, 2 Sep 6 15:04 150gb2
crw-rw---- 1 root root 10, 63 Sep 6 2005 control
testhalde2 ~ # ls -lF /dev/1*
ls: /dev/1*: No such file or directory
testhalde2 ~ # udevstart
testhalde2 ~ # ls -lF /dev/1*
lrwxrwxrwx 1 root root 4 Sep 6 15:10 /dev/150gb -> dm-0
lrwxrwxrwx 1 root root 4 Sep 6 15:11 /dev/150gb1 -> dm-1
lrwxrwxrwx 1 root root 4 Sep 6 15:11 /dev/150gb2 -> dm-2
===> Without "udevstart" udev doesn't create the /dev/150gb*
links! Is this a udev bug?
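A quick check shows which maps in /dev/mapper still lack a matching name in /dev. This is a sketch with a hypothetical function name. The two directory arguments exist only so the check can be tried on a scratch directory.

```shell
#!/bin/sh
# Print every name in <mapper_dir> (except "control") that has no entry
# of the same name in <dev_dir>. A dangling symlink still counts as an
# existing entry. missing_dm_links is an illustrative name.
missing_dm_links() {
    mapper_dir=${1:-/dev/mapper}
    dev_dir=${2:-/dev}
    for node in "$mapper_dir"/*; do
        name=${node##*/}
        [ -e "$node" ] || continue
        [ "$name" = control ] && continue
        [ -e "$dev_dir/$name" ] || [ -L "$dev_dir/$name" ] || echo "$name"
    done
}

# Example:
#   missing_dm_links            # before and after running udevstart
```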
II.) --- Using multipathd ---
testhalde2 ~ # multipathd -v4
testhalde2 ~ # ps ax | fgrep multipathd
11024 pts/1 SL 0:00 multipathd -v4
11025 pts/1 SL 0:00 multipathd -v4
11026 pts/1 SL 0:00 multipathd -v4
11029 pts/1 SL 0:00 multipathd -v4
11030 pts/1 SL 0:00 multipathd -v4
11031 pts/1 SL 0:00 multipathd -v4
11032 pts/1 SL 0:00 multipathd -v4
11071 pts/1 S+ 0:00 fgrep multipathd
testhalde2 ~ # cat /var/run/multipathd.pid
11024
Yeah! Version 0.4.5 creates a pid and a socket file :-)
It's important that I start "multipath /dev/sda" _before_
multipathd! If I change this order, multipathd does nothing.
/var/log/messages shows "tick", "map garbage collection"
etc. and nothing about /dev/sda or /dev/sdb. It seems that
multipathd doesn't read the device-mapper table at startup.
===> Is this behavior ok?
testhalde2 ~ # less /var/log/messages
...
/etc/dev.d/multipath.dev (10904): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-0, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (10904): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (10904): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (10904): Getting /block/dm-0 major and minor number
/etc/dev.d/multipath.dev (10904): /block/dm-0 major:minor = 254:0
/etc/dev.d/multipath.dev (10904): Getting /block/dm-0 alias
/etc/dev.d/multipath.dev (10904): /block/dm-0 alias = /dev/mapper/150gb
/etc/dev.d/multipath.dev (10904): /sbin/kpartx -v -a /dev/mapper/150gb
/etc/dev.d/multipath.dev (10935): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-1, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (10935): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (10935): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (10935): Getting /block/dm-1 major and minor number
/etc/dev.d/multipath.dev (10935): /block/dm-1 major:minor = 254:1
/etc/dev.d/multipath.dev (10935): Getting /block/dm-1 alias
/etc/dev.d/multipath.dev (10963): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-2, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (10963): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (10963): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (10963): Getting /block/dm-2 major and minor number
/etc/dev.d/multipath.dev (10963): /block/dm-2 major:minor = 254:2
/etc/dev.d/multipath.dev (10963): Getting /block/dm-2 alias
/etc/dev.d/multipath.dev (10935): /block/dm-1 alias = /dev/mapper/150gb1
/etc/dev.d/multipath.dev (10935): /sbin/kpartx -v -a /dev/mapper/150gb1
/etc/dev.d/multipath.dev (10963): /block/dm-2 alias = /dev/mapper/150gb2
/etc/dev.d/multipath.dev (10963): /sbin/kpartx -v -a /dev/mapper/150gb2
multipathd: --------start up--------
multipathd: read /etc/multipath.conf
multipathd: fd0 blacklisted
...
multipathd: hda blacklisted
multipathd: path sda not found in pathvec
multipathd: ===== path sda =====
multipathd: bus = 1
multipathd: dev_t = 8:0
multipathd: size = 314572800
multipathd: vendor = HP
multipathd: product = HSV100
multipathd: rev = 3025
multipathd: h:b:t:l = 0:0:0:1
multipathd: tgt_node_name = 0x50001fe150051d20
multipathd: getuid = /sbin/scsi_id -g -u -s /block/%n (controler setting)
multipathd: uid = 3600508b40010079d0001900000460000 (callout)
multipathd: path sdb not found in pathvec
multipathd: ===== path sdb =====
multipathd: bus = 1
multipathd: dev_t = 8:16
multipathd: size = 314572800
multipathd: vendor = HP
multipathd: product = HSV100
multipathd: rev = 3025
multipathd: h:b:t:l = 1:0:0:1
multipathd: tgt_node_name = 0x50001fe150051d20
multipathd: getuid = /sbin/scsi_id -g -u -s /block/%n (controler setting)
multipathd: uid = 3600508b40010079d0001900000460000 (callout)
multipathd: dm-0 blacklisted
multipathd: dm-1 blacklisted
multipathd: dm-2 blacklisted
multipathd: discovered map 150gb
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: 8:0 ownership set
multipathd: 8:16 ownership set
multipathd: pgfailback = -2 (LUN setting)
multipathd: 150gb: event checker started
multipathd: path checkers start up
multipathd: tick
multipathd: ===== path sda =====
multipathd: bus = 1
multipathd: dev_t = 8:0
multipathd: size = 314572800
multipathd: vendor = HP
multipathd: product = HSV100
multipathd: rev = 3025
multipathd: h:b:t:l = 0:0:0:1
multipathd: tgt_node_name = 0x50001fe150051d20
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path checker = tur (controler setting)
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: reinstated
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: ===== path sda =====
multipathd: getprio = /bin/true (internal default)
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: ===== path sdb =====
multipathd: getprio = /bin/true (internal default)
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: ===== path sdb =====
multipathd: bus = 1
multipathd: dev_t = 8:16
multipathd: size = 314572800
multipathd: vendor = HP
multipathd: product = HSV100
multipathd: rev = 3025
multipathd: h:b:t:l = 1:0:0:1
multipathd: tgt_node_name = 0x50001fe150051d20
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path checker = tur (controler setting)
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: reinstated
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 4 times
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: delay next check 20s
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: delay next check 20s
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: tick
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
multipathd: tick
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: tick
...
*** Disabling san-port from HBA-1... ***
testhalde2 ~ # multipath -ll
[ sleeping 35 seconds ]
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sda 8:0 [active][faulty]
\_ 1:0:0:1 sdb 8:16 [active][ready]
testhalde2 ~ # multipath -ll
[ sleeping 10 seconds ]
failed to open /dev/sda
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sda 8:0 [failed][faulty]
\_ 1:0:0:1 sdb 8:16 [active][ready]
testhalde2 ~ # multipath -ll
[ sleeping 10 seconds ]
failed to open /dev/sda
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sda 8:0 [active][ready]
\_ 1:0:0:1 sdb 8:16 [active][ready]
testhalde2 ~ # ls /sys/block/
dm-0 fd0 loop1 loop4 loop7 ram10 ram13 ram2 ram5 ram8
dm-1 hda loop2 loop5 ram0 ram11 ram14 ram3 ram6 ram9
dm-2 loop0 loop3 loop6 ram1 ram12 ram15 ram4 ram7 sdb
testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000
150gb2: 0 314504505 linear 254:0 64260
testhalde2 ~ # less /var/log/messages
...
kernel: qla2300 0000:03:01.0: LOOP DOWN detected.
multipathd: tick
...
kernel: rport-0:0-3: blocked FC remote port time out: removing target
multipathd: 8:0: tur checker reports path is down
multipathd: checker failed path 8:0 in map 150gb
kernel: device-mapper: dm-multipath: Failing path 8:0.
multipathd: 150gb: devmap event #2
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: discovered map 150gb
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = F, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: 8:0 ownership set
multipathd: 8:16 ownership set
multipathd: pgfailback = -2 (LUN setting)
/etc/dev.d/multipath.dev (11579): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda/sda1, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11579): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11579): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11579): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (11571): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda/sda2, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11571): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11571): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11571): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (11604): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11604): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11604): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11604): Exiting: $ACTION != "add"
multipathd: tick
multipathd: map garbage collection
multipathd: tick
multipathd: tick
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
last message repeated 3 times
multipathd: map garbage collection
multipathd: tick
last message repeated 3 times
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: reinstated
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: 150gb: devmap event #3
multipathd: discovered map 150gb
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: 8:0 ownership set
multipathd: 8:16 ownership set
multipathd: pgfailback = -2 (LUN setting)
kernel: scsi0 (0:1): rejecting I/O to dead device
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
multipathd: tick
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 2 times
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: delay next check 20s
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
kernel: scsi0 (0:1): rejecting I/O to dead device
multipathd: tick
last message repeated 2 times
multipathd: map garbage collection
multipathd: tick
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
last message repeated 4 times
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 3 times
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: tick
multipathd: tick
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: map garbage collection
kernel: scsi0 (0:1): rejecting I/O to dead device
multipathd: tick
...
===> At first multipathd says "8:0: tur checker reports
path is down" and multipath shows sda as "failed" (ok).
A few seconds later sda is "ready" again and multipathd says
"8:0: tur checker reports path is up"?! I changed
nothing during this time.
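To follow just one path through a log like the one above, a small filter may help. This is only a sketch: the function name path_events is made up for illustration, and the patterns are copied from the multipathd messages shown in this log, so other multipath-tools versions may word them differently.

```shell
# Print only the multipathd lines that record a state report or state
# change for one path (given as major:minor), reading syslog text from
# stdin. The message wording is taken from the log in this post.
path_events() {
    dev=$1
    grep -E "multipathd: (${dev}: (tur checker reports path is (up|down)|reinstated)|checker failed path ${dev} )"
}
```

Usage would be something like `path_events 8:0 < /var/log/messages`, which should leave only the up/down, "checker failed" and "reinstated" lines for sda.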
*** Enabling san-switch port from HBA-1 ***
testhalde2 ~ # multipath -ll
testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb2: 0 314504505 linear 254:0 64260
testhalde2 ~ # less /var/log/messages
...
multipathd: tick
kernel: qla2300 0000:03:01.0: LIP reset occured (f7f7).
kernel: qla2300 0000:03:01.0: LOOP UP detected (2 Gbps).
...
kernel: SCSI device sdc: drive cache: write through
kernel: sdc: sdc1 sdc2
kernel: Attached scsi disk sdc at scsi0, channel 0, id 0, lun 1
kernel: Attached scsi generic sg1 at scsi0, channel 0, id 0, lun 1, type 0
scsi.agent[11856]: disk at /devices/pci0000:03/0000:03:01.0/host0/rport-0:0-3/target0:0:0/0:0:0:1
/etc/dev.d/multipath.dev (11909): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11909): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11909): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11909): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11909): multipath -v0 /dev/sdc
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
multipathd: tick
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
last message repeated 2 times
multipathd: map garbage collection
kernel: device-mapper: dm-multipath: error getting device
kernel: device-mapper: error adding target to table
/etc/dev.d/multipath.dev (11944): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc2, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11944): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11944): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11959): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc1, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11959): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11959): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11959): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11959): multipath -v0 /dev/sdc1
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
logger: /etc/dev.d/multipath.dev (11944): Checking/Creating multipath device-mapper table with multipath-tool
logger: /etc/dev.d/multipath.dev (11944): multipath -v0 /dev/sdc2
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
last message repeated 2 times
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
multipathd: tick
kernel: device-mapper: dm-multipath: error getting device
kernel: device-mapper: error adding target to table
kernel: device-mapper: device doesn't appear to be in the dev hash table.
multipathd: tick
multipathd: map garbage collection
multipathd: 150gb: remove dead map
multipathd: 150gb: reap event checker
multipathd: 8:0 is orphaned
multipathd: 8:16 is orphaned
multipathd: SIGHUP received
multipathd: tick
multipathd: tick
/etc/dev.d/multipath.dev (12002): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-3, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (12002): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (12002): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (12002): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (12018): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-4, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (12018): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (12018): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (12018): Exiting: $ACTION != "add"
multipathd: tick
...
===> An error occurs while device-mapper tries to update
the dm-table and deletes the "150gb" entry.
III.) --- Using multipath-tools without multipathd ---
Now, I try the same _without_ starting multipathd...
testhalde2 ~ # ps ax | fgrep multipathd
testhalde2 ~ #
testhalde2 ~ # multipath /dev/sda
create: 150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0
\_ 0:0:0:1 sda 8:0 [ready]
\_ 1:0:0:1 sdb 8:16 [ready]
testhalde2 ~ # multipath -ll
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sda 8:0 [active][ready]
\_ 1:0:0:1 sdb 8:16 [active][ready]
testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000
150gb2: 0 314504505 linear 254:0 64260
testhalde2 ~ # ls /dev/mapper/
150gb 150gb1 150gb2 control
*** Disabling san-switch port from HBA-1 ***
testhalde2 ~ # multipath -ll
[ sleeping 35 seconds ]
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sda 8:0 [active][faulty]
\_ 1:0:0:1 sdb 8:16 [active][ready]
testhalde2 ~ # multipath -ll
[ sleeping 10 seconds ]
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ #:#:#:# 8:0 [active]
\_ 1:0:0:1 sdb 8:16 [active][ready]
testhalde2 ~ # ls /sys/block/
dm-0 fd0 loop1 loop4 loop7 ram10 ram13 ram2 ram5 ram8
dm-1 hda loop2 loop5 ram0 ram11 ram14 ram3 ram6 ram9
dm-2 loop0 loop3 loop6 ram1 ram12 ram15 ram4 ram7 sdb
testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000
150gb2: 0 314504505 linear 254:0 64260
testhalde2 ~ # less /var/log/messages
...
kernel: qla2300 0000:03:01.0: LOOP DOWN detected.
kernel: rport-0:0-3: blocked FC remote port time out: removing target
/etc/dev.d/multipath.dev (11186): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda/sda1, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11186): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11186): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11186): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (11200): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda/sda2, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11200): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11200): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11200): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (11217): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11217): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11217): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11217): Exiting: $ACTION != "add"
...
*** Enabling san-switch port from HBA-1 ***
testhalde2 ~ # multipath -ll
[ sleeping 5 seconds ]
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ #:#:#:# 8:0 [active]
\_ 1:0:0:1 sdb 8:16 [active][ready]
testhalde2 ~ # multipath -ll
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
\_ 1:0:0:1 sdb 8:16 [active][ready]
\_ 0:0:0:1 sdc 8:32 [active][ready]
testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:16 1000 8:32 1000
150gb2: 0 314504505 linear 254:0 64260
testhalde2 ~ # ls /dev/mapper/
150gb 150gb1 150gb2 control
testhalde2 ~ # less /var/log/messages
...
kernel: sdc: sdc1 sdc2
kernel: Attached scsi disk sdc at scsi0, channel 0, id 0, lun 1
kernel: Attached scsi generic sg1 at scsi0, channel 0, id 0, lun 1, type 0
scsi.agent[11271]: disk at /devices/pci0000:03/0000:03:01.0/host0/rport-0:0-3/target0:0:0/0:0:0:1
/etc/dev.d/multipath.dev (11325): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11325): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11325): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11325): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11325): multipath -v0 /dev/sdc
/etc/dev.d/multipath.dev (11356): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc2, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11356): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11356): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11356): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11356): multipath -v0 /dev/sdc2
/etc/dev.d/multipath.dev (11377): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc1, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11377): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11377): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11377): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11377): multipath -v0 /dev/sdc1
...
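The change in the map across the port bounce (8:0 replaced by 8:32) is easier to see when the table is reduced to its path devices. The following is a rough sketch, assuming the table layout shown in the dmsetup output above. multipath_paths is a hypothetical helper name, and matching any field of the form major:minor is a heuristic, not a full parser of the multipath target line.

```shell
# List "<map name> <major:minor>" for every path in each multipath map,
# reading `dmsetup table` output from stdin. Other targets (for example
# the linear partition maps 150gb1 and 150gb2) are skipped.
multipath_paths() {
    awk '$4 == "multipath" {
        name = $1
        sub(/:$/, "", name)              # "150gb:" -> "150gb"
        for (i = 5; i <= NF; i++)
            if ($i ~ /^[0-9]+:[0-9]+$/)  # heuristic: looks like major:minor
                print name, $i
    }'
}
```

It would be used as `dmsetup table | multipath_paths`, once before and once after the port is re-enabled.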
VI.) --- Summary ---
===> Multipathing seems to work without multipathd, but not with it.
It's very slow, but Christophe Varoqui wrote that I have to lower
the HBA timeouts (unfortunately, I don't know how to do this,
see above). Do I really need multipathd? I suppose so :-)
> And consider updating the wiki FAQ with the response you found to be
> enlightening :/
As soon as I have multipath running, I will write up step-by-step
documentation.
Thanks again for your help,
Simon
#--- My /etc/dev.d/block/multipath.dev - Begin ---
#!/bin/bash
# (bash is required: the script uses ${DEVPATH:7:3} and $(<file))
# log to local file? (0 or 1)
# (print log file with "sort $logFile [| less]")
logToFile=1
# log to syslog? (0 or 1)
logToSyslog=1
# path to log file
# (only used if $logToFile==1)
logFile="/root/multipath.dev.log"
# syslog facility.priority (man syslog.conf)
# (only used if $logToSyslog==1)
syslogFacPrio="daemon.info"
# timeout for getting ${DEVPATH} alias in seconds
timeout=10
# be verbose in log file and/or syslog?
# (only used if $logToFile==1 and/or $logToSyslog==1)
verbose=1
# Don't touch
pid=$$
logCount=1;
log()
{
if [ ${logToFile} -eq 1 ]
then
msg="PID ${pid} - $(date +%Y%m%d-%H%M%S) -"
msg="${msg} Log entry ${logCount}: ${1}"
echo "${msg}" >> $logFile
logCount=$(($logCount + 1))
fi
if [ ${logToSyslog} -eq 1 ]
then
logger -p ${syslogFacPrio} "/etc/dev.d/multipath.dev (${pid}): ${1}"
fi
}
die()
{
log "DIED WITH ERROR: ${1}"
exit 1
}
exe()
{
log "${1}"
if ! ${1}
then
die "\"${1}\" failed (device blacklisted?)"
fi
}
end()
{
if [ -n "${1}" ]
then
log "${1}"
fi
exit 0
}
if [ ${verbose} -eq 1 ]
then
msg="Parameters: \$0=${0}, \$DEVPATH=${DEVPATH},"
msg="${msg} \$ACTION=${ACTION}, \$@=${@}"
log "${msg}"
if [ ${logToFile} -eq 1 ]
then
log "Logging to local file is enabled: log file is ${logFile}"
fi
if [ ${logToSyslog} -eq 1 ]
then
msg="Logging to syslog is enabled:"
msg="${msg} facility.priority is ${syslogFacPrio}"
log "${msg}"
fi
fi
if [ ! "${ACTION}" = add ]
then
if [ ${verbose} -eq 1 ]
then
log "Exiting: \$ACTION != \"add\""
fi
end
fi
if [ "${DEVPATH:7:3}" = "dm-" ]
then
log "Getting ${DEVPATH} major and minor number"
devMajorMinor=$(</sys${DEVPATH}/dev)
if [ -z "${devMajorMinor}" ]
then
die "Getting ${DEVPATH} major and minor number failed"
else
log "${DEVPATH} major:minor = ${devMajorMinor}"
fi
log "Getting ${DEVPATH} alias"
count=0
devAlias="none"
while [ ! -b "${devAlias}" ] && [ ${count} -le ${timeout} ]
do
devAlias="/dev/mapper/$(devmap_name ${devMajorMinor})"
if [ ${count} -ne 0 ]
then
sleep 1
fi
count=$(($count + 1))
done
# fail only if the alias is still not a block device after the loop
if [ ! -b "${devAlias}" ]
then
msg="Getting ${DEVPATH} alias failed (Found ${devAlias}, but"
msg="${msg} this isn't a block device)"
die "${msg}"
else
log "${DEVPATH} alias = ${devAlias}"
fi
exe "/sbin/kpartx -v -a ${devAlias}"
else
log "Checking/Creating multipath device-mapper table with multipath-tool"
exe "multipath -v0 ${DEVNAME}"
fi
#--- /etc/dev.d/block/multipath.dev - End ---
--
Simon
gistolero@gmx.de
* Re: Re: Problems with multipathd
2005-09-06 16:46 ` gistolero
@ 2005-09-06 20:47 ` christophe varoqui
2005-09-12 15:52 ` gistolero
0 siblings, 1 reply; 5+ messages in thread
From: christophe varoqui @ 2005-09-06 20:47 UTC (permalink / raw)
To: device-mapper development
On Tue, 2005-09-06 at 18:46 +0200, gistolero@gmx.de wrote:
> Hi,
> > Lower the timeouts in your Qlogic driver.
>
> ===> I found some settings in /sys/module/qla2xxx/parameters/...,
> but most of them are read-only values. I have changed ql2xretrycount
> and ql2xsuspendcount but without success. Any suggestions for
> this driver?
>
Here are the interesting ones, I guess.
[root@s64p17bibro ~]# find /sys/class/ -name "*tmo*"
/sys/class/fc_remote_ports/rport-1:0-3/dev_loss_tmo
/sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo
/sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo
/sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo
/sys/class/scsi_host/host1/lpfc_nodev_tmo
>
> I.) --- udev and udevstart ---
>
>
> > Default udev.rules file has a directive to ignore dm-*
> > Something like :
> > KERNEL=="dm-[0-9]*", OPTIONS+="ignore_device"
> >
> > /etc/udev/rules.d/20-multipath.rules is useless unless you comment
> > out this rule.
>
> I have commented out this line, but udev still has difficulty creating these
> links. Therefore I have changed /etc/dev.d/block/multipath.dev (the script
> is attached at the end of this post) and added debug messages. The most
> important modification is that kpartx uses the block device files in
> /dev/mapper/... instead of /dev/...
> ===> Why isn't that the default? Are there any disadvantages?
>
Not really. All distributors seem to have their own ideas about naming
policies. You should ask about, and follow the Gentoo philosophy I
guess.
>
> ===> Without "udevstart" udev doesn't create the /dev/150gb*
> links! Is this a udev bug?
>
You can still identify the udev problems keeping the node creation
in /dev/. Maybe all path setup is done in the initrd/initramfs without
multipath being able to react.
> Yeah! Version 0.4.5 creates a pid and a socket file :-)
> It's important that I start "multipath /dev/sda" _before_
> multipathd! If I change this order, multipathd does nothing.
> /var/log/messages shows "tick", "map garbage collection"
> etc. and nothing about /dev/sda or /dev/sdb. It seems that
> multipathd doesn't read the device-mapper table at startup.
> ===> Is this behavior ok?
>
No, and I can't reproduce this behaviour here.
<no map loaded>
[root@s64p17bibro multipath-tools-0.4.5]# multipathd -d
path checkers start up
<run /sbin/multipath>
device-mapper ioctl cmd 12 failed: No such device or address
ema4_lun1: event checker started
8:16: hp_sw checker reports path is up
8:16: reinstated
8:48: hp_sw checker reports path is ghost
8:48: reinstated
> ...
>
>
> ===> First multipathd says "8:0: tur checker reports
> path is down" and multipath prints sda "failed" (ok).
> After a few seconds sda is "ready" and multipathd says
> "8:0: tur checker reports path is up"?! I have changed
> nothing during this time.
>
Maybe the checker is confused by the long timeouts.
Worth another try after lowering them.
> *** Enabling san-switch port from HBA-1 ***
>
>
> testhalde2 ~ # multipath -ll
> testhalde2 ~ # dmsetup table
> 150gb1: 0 64197 linear 254:0 63
> 150gb2: 0 314504505 linear 254:0 64260
>
>
> testhalde2 ~ # less /var/log/messages
> ...
> multipathd: tick
> kernel: qla2300 0000:03:01.0: LIP reset occured (f7f7).
> kernel: qla2300 0000:03:01.0: LOOP UP detected (2 Gbps).
> ...
> kernel: SCSI device sdc: drive cache: write through
> kernel: sdc: sdc1 sdc2
> kernel: Attached scsi disk sdc at scsi0, channel 0, id 0, lun 1
> kernel: Attached scsi generic sg1 at scsi0, channel 0, id 0, lun 1, type 0
> scsi.agent[11856]: disk at /devices/pci0000:03/0000:03:01.0/host0/rport-0:0-3/target0:0:0/0:0:0:1
> /etc/dev.d/multipath.dev (11909): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc, $ACTION=add, $@=block
> /etc/dev.d/multipath.dev (11909): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (11909): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (11909): Checking/Creating multipath device-mapper table with multipath-tool
> /etc/dev.d/multipath.dev (11909): multipath -v0 /dev/sdc
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> multipathd: tick
> multipathd: tick
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> multipathd: tick
> last message repeated 2 times
> multipathd: map garbage collection
> kernel: device-mapper: dm-multipath: error getting device
> kernel: device-mapper: error adding target to table
> /etc/dev.d/multipath.dev (11944): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc2, $ACTION=add, $@=block
> /etc/dev.d/multipath.dev (11944): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (11944): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (11959): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc1, $ACTION=add, $@=block
> /etc/dev.d/multipath.dev (11959): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (11959): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (11959): Checking/Creating multipath device-mapper table with multipath-tool
> /etc/dev.d/multipath.dev (11959): multipath -v0 /dev/sdc1
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> logger: /etc/dev.d/multipath.dev (11944): Checking/Creating multipath device-mapper table with multipath-tool
> logger: /etc/dev.d/multipath.dev (11944): multipath -v0 /dev/sdc2
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> multipathd: tick
> last message repeated 2 times
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> multipathd: tick
> multipathd: tick
> kernel: device-mapper: dm-multipath: error getting device
> kernel: device-mapper: error adding target to table
> kernel: device-mapper: device doesn't appear to be in the dev hash table.
> multipathd: tick
> multipathd: map garbage collection
> multipathd: 150gb: remove dead map
> multipathd: 150gb: reap event checker
> multipathd: 8:0 is orphaned
> multipathd: 8:16 is orphaned
> multipathd: SIGHUP received
> multipathd: tick
> multipathd: tick
> /etc/dev.d/multipath.dev (12002): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-3, $ACTION=remove, $@=block
> /etc/dev.d/multipath.dev (12002): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (12002): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (12002): Exiting: $ACTION != "add"
> /etc/dev.d/multipath.dev (12018): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-4, $ACTION=remove, $@=block
> /etc/dev.d/multipath.dev (12018): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (12018): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (12018): Exiting: $ACTION != "add"
> multipathd: tick
> ...
>
>
> ===> An error occurs while device-mapper tries to update
> the dm-table and deletes the "150gb" entry.
>
Seems to point to a kernel issue.
FYI, /bin/multipath alone tries to update the map.
>
>
> III.) --- Using multipath-tools without multipathd ---
>
> Now, I try the same _without_ starting multipathd...
>
> ===> Multipathing seems to work without but not with multipathd.
> It's very slow, but Christophe Varoqui wrote that I have to lower
> the HBA timeouts (unfortunately, I don't know how to do this,
> see above). Do I really need multipathd? I suppose so :-)
>
multipathd is needed to reinstate paths.
In your case the rport disappears and reappears, so the mechanism is all
hotplug-driven and thus may work without the daemon ... if memory
resources permit hotplug and multipath(8) execution, that is.
>
Regards,
--
christophe varoqui <christophe.varoqui@free.fr>
* Re: Problems with multipathd
2005-09-06 20:47 ` christophe varoqui
@ 2005-09-12 15:52 ` gistolero
0 siblings, 0 replies; 5+ messages in thread
From: gistolero @ 2005-09-12 15:52 UTC (permalink / raw)
To: device-mapper development
>>===> I found some settings in /sys/module/qla2xxx/parameters/...,
>>but most of them are read-only values. I have changed ql2xretrycount
>>and ql2xsuspendcount but without success. Any suggestions for
>>this driver?
>>
>
> Here are the interesting one I guess.
>
> [root@s64p17bibro ~]# find /sys/class/ -name "*tmo*"
> /sys/class/fc_remote_ports/rport-1:0-3/dev_loss_tmo
> /sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo
> /sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo
> /sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo
> /sys/class/scsi_host/host1/lpfc_nodev_tmo
OK, I have a 6-second timeout now :-)
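For the record, here is a sketch of one way to set this, assuming the FC transport class exposes a writable dev_loss_tmo per remote port as in the find output quoted above. The function name and its optional second argument (an alternative directory, useful only for dry runs on a copy) are assumptions of this example, not part of any tool.

```shell
# Write one timeout (in seconds) to the dev_loss_tmo of every FC remote
# port. $2 is the class directory to work on; it defaults to the real
# sysfs location and exists only so the function can be tried elsewhere.
set_dev_loss_tmo() {
    seconds=$1
    class_dir=${2:-/sys/class/fc_remote_ports}
    for f in "$class_dir"/rport-*/dev_loss_tmo; do
        [ -e "$f" ] || continue          # glob matched nothing
        echo "$seconds" > "$f" || return 1
    done
}
```

Run as root, e.g. `set_dev_loss_tmo 6`. The value lives in sysfs only, so it is gone after a reboot and would have to be set again from an init script.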
>>I have commented out this line, but udev still has difficulty creating these
>>links. Therefore I have changed /etc/dev.d/block/multipath.dev (the script
>>is attached at the end of this post) and added debug messages. The most
>>important modification is that kpartx uses the block device files in
>>/dev/mapper/... instead of /dev/...
>>===> Why isn't that the default? Are there any disadvantages?
>>
>
> Not really. All distributors seem to have their own ideas about naming
> policies. You should ask about, and follow the Gentoo philosophy I
> guess.
I'm sure I'm not the only one who has problems with missing /dev/...
links. It's possible that multipath installs a device-mapper table without
errors, but kpartx then fails because udev hasn't created the links in
/dev/... So I think multipath.dev should run kpartx on /dev/mapper/...
instead of /dev/... by default.
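A sketch of what that could look like in the hotplug script, assuming the map name is already known: wait for the /dev/mapper node itself and only then call kpartx. wait_for_block_dev is a hypothetical helper, and 150gb is simply the alias used in this thread.

```shell
# Wait up to $2 seconds for $1 to exist as a block device.
# Returns 0 as soon as it does, 1 if the time runs out first.
wait_for_block_dev() {
    node=$1
    remaining=$2
    while [ ! -b "$node" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

In multipath.dev this would be used roughly as `wait_for_block_dev /dev/mapper/150gb 10 && /sbin/kpartx -v -a /dev/mapper/150gb`, so kpartx no longer depends on udev having created anything directly under /dev/.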
>>===> Without "udevstart" udev doesn't create the /dev/150gb*
>>links! Is this a udev bug?
>>
> You can still identify the udev problems keeping the node creation
> in /dev/. Maybe all path setup is done in the initrd/initramfs without
> multipath being able to react.
multipath is able to react. I don't understand why I have to execute udevstart.
>>===> First multipathd says "8:0: tur checker reports
>>path is down" and multipath prints sda "failed" (ok).
>>After a few seconds sda is "ready" and multipathd says
>>"8:0: tur checker reports path is up"?! I have changed
>>nothing during this time.
>>
>
> Maybe the checker is confused by the long timeouts.
> Worth another try after lowering them.
After lowering the timeouts to 6 seconds multipathd shows the same behavior.
>>===> Multipathing seems to work without but not with multipathd.
>>It's very slow, but Christophe Varoqui wrote that I have to lower
>>the HBA timeouts (unfortunately, I don't know how to do this,
>>see above). Do I really need multipathd? I suppose so :-)
>>
>
> multipathd is needed to reinstate paths.
> In your case the rport disappears and reappears, so the mechanism is all
> hotplug-driven and thus may work without the daemon ... if memory
> resources permit hotplug and multipath(8) execution, that is.
What do you mean by "In your case..."? Because kernel 2.6 and udev are
multipath-tools dependencies, all systems running multipath have the same
environment. They all use kernel 2.6 and udev, which is hotplug-driven. The
kernel starts this hotplug process and udev executes multipath. Sorry, but I
have to ask again: Do we really need multipathd?
After lowering dev_loss_tmo timeouts and stopping multipathd I have a working
multipath environment :-))) I tested this with a little perl script and a
mysql database:
My trafficmaker host ran 27 instances of this script in parallel:
...
for(my $count=1;$count<=1000000;$count++)
{
    ...
    my $sql="INSERT INTO $table VALUES($id,\"$value\")";
    my $return=$dbh->do($sql);
    ...
}
...
{
    my $sql="SELECT COUNT(*) FROM $table WHERE id=$id";
    my $sth=$dbh->prepare($sql);
    my $return=$sth->execute();
    ...
    $selectCount=$sth->fetchrow_array();
    ...;
}
The database host had to insert these 30-byte strings, and I started some
copy jobs (cp -a /usr/* /partition_mounted_with_multipath/ etc.) to increase
the I/O load. During this test I disabled and enabled the different
HBA switch ports, with the following result: it took 6 to 15 seconds before
"multipath -l" showed that a path was down (15 seconds because the host had a
CPU load of 30.0 and responded very slowly), but no INSERT got lost :-)))
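
The parallel start on the trafficmaker host can be sketched roughly as follows. This is not the wrapper I actually used; run_parallel is a made-up helper that starts N background copies of a command and waits for all of them.

```shell
#!/bin/sh
# Sketch only: start COUNT background copies of a command and wait for them.
# run_parallel COUNT COMMAND [ARGS...]   (hypothetical helper)
# Returns 0 only if every copy exited with status 0.
run_parallel() {
    count=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    [ "$failed" -eq 0 ]
}
```

For the test above this would be something like "run_parallel 27 perl insert_test.pl", where insert_test.pl is a placeholder for the script name.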
But sometimes multipath seems to be a bit confused...
1.) one path disabled
In the majority of cases multipath prints...
testhalde2 sbin # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ #:#:#:# 8:0 [active]
\_ 1:0:0:1 sdb 8:16 [active]
But sometimes I get...
testhalde2 usr # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 4:0:0:1 sdb 8:16 [active]
2.) all paths enabled (default)
In the majority of cases multipath prints...
testhalde2 sbin # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
\_ 1:0:0:1 sdb 8:16 [active]
\_ 0:0:0:1 sdc 8:32 [active]
But sometimes I get...
testhalde2 usr # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sdb 8:16 [active]
\_ round-robin 0 [enabled]
\_ 4:0:0:1 sdc 8:32 [active]
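
To catch these differences over time without reading every listing by hand, the output can be reduced to two numbers. A minimal sketch, assuming the "multipath -l" format shown above (path group lines contain "\_ round-robin", every other "\_" line is a path); summarize_map is a made-up name.

```shell
#!/bin/sh
# Sketch only: count path groups and paths in "multipath -l" output on stdin.
# summarize_map   (hypothetical helper)
summarize_map() {
    awk '
        /\\_ round-robin/ { groups++; next }   # path group line
        /\\_ /            { paths++ }          # any other "\_" line is a path
        END { printf "groups=%d paths=%d\n", groups, paths }
    '
}
```

Running "multipath -l | summarize_map" in a loop with a timestamp would show when the layout switches between one group with two paths and two groups with one path each.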
Regards
Simon
end of thread, other threads:[~2005-09-12 15:52 UTC | newest]
Thread overview: 5+ messages
2005-08-31 15:29 Problems with multipathd Simon
2005-08-31 19:56 ` christophe varoqui
2005-09-06 16:46 ` gistolero
2005-09-06 20:47 ` christophe varoqui
2005-09-12 15:52 ` gistolero