* Problems with multipathd
@ 2005-08-31 15:29 Simon
From: Simon @ 2005-08-31 15:29 UTC
To: dm-devel
Hi,
I am trying to use multipath to provide a single block device for a
multipathed LUN for failover reasons. After some days of installation,
documentation reading and debugging, I have solved a lot of problems, but not
all of them, and I need some help. I know it's a lot of text (sorry!!!), but I
think it's necessary to describe my problems.
I have marked my questions/comments with "===>". Please answer these notes.
Thank you.
1.) *** System Description ***
Storage:
- Storage EVA-3000
- Controller-B connected to fabric-A and fabric-B
- one VDisk presented to host testhalde2 via controller-B to fabric-A and -B
Server (testhalde2):
- 1x HBA Qlogic 2340 connected to fabric-A
- 1x HBA Qlogic 2340 connected to fabric-B
- Kernel 2.6.12.5 (vanilla, gentoo)
- device-mapper-1.01.03, udev-058, multipath-tools-0.4.4
testhalde2 tmp # dmesg | fgrep device-mapper
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
device-mapper: dm-multipath version 1.0.4 loaded
device-mapper: dm-round-robin version 1.0.0 loaded
testhalde2 tmp # lsmod
Module Size Used by
qla2300 123904 0
qla2xxx 88208 4 qla2300
scsi_transport_fc 26880 1 qla2xxx
testhalde2 etc # cat multipath.conf
defaults {
    multipath_tool "/sbin/multipath -v 0 -S"
    udev_dir /dev
    polling_interval 10
    default_selector "round-robin 0"
    default_path_grouping_policy failover
    default_getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    default_prio_callout "/bin/false"
    r_min_io 100
}
blacklist {
    wwid 26353900f02796769
    devnode "(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "hd[a-z][[0-9]*]"
    devnode "cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
multipaths {
    multipath {
        wwid 3600508b40010079d0001900000460000
        alias 150gb
        path_grouping_policy failover
        path_selector "round-robin 0"
    }
}
devices {
    device {
        vendor "HP "
        product "HSV100 "
        path_grouping_policy multibus
        path_checker tur
        prio_callout "/sbin/pp_balance_units %d"
    }
}
testhalde2 etc # cat /etc/udev/rules.d/20-multipath.rules
KERNEL="dm-[0-9]*", PROGRAM="/sbin/devmap_name %M %m", NAME="%k", SYMLINK="%c"
testhalde2 ~ # cat /etc/dev.d/block/multipath.dev
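#!/bin/sh -e
# Debug helper: append a timestamped message to /tmp/devd_multipath
print()
{
    echo "`date +%H%M%S` - $1" >> /tmp/devd_multipath
}
print "ENV_ACTION: $ACTION" # debugging
# Only "add" events are handled
if [ ! "${ACTION}" = add ] ; then
    exit
fi
# ${DEVPATH:7:3} and $(<file) are bash extensions; "dm-" means /block/dm-N
if [ "${DEVPATH:7:3}" = "dm-" ] ; then
    # device-mapper map: look up its name, then create partition maps
    dev=$(</sys${DEVPATH}/dev)
    map=$(/sbin/devmap_name $dev)
    print "KPARTX $map" # debugging
    /sbin/kpartx -v -a /dev/$map >> /tmp/devd_multipath
else
    # any other block device (e.g. a SCSI path): let multipath handle it
    print "ENV_DEVNAME: ${DEVNAME}" # debugging
    /sbin/multipath ${DEVNAME}
fi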
2.) *** Multipath in action ***
After rebooting testhalde2, I see the following:
testhalde2 tmp # ls /sys/block/
dm-0 loop0 loop3 loop6 ram1 ram12 ram15 ram4 ram7 sda
fd0 loop1 loop4 loop7 ram10 ram13 ram2 ram5 ram8 sdb
hda loop2 loop5 ram0 ram11 ram14 ram3 ram6 ram9
testhalde2 tmp # ls -lF /dev/mapper/
total 0
brw------- 1 root root 254, 0 Aug 31 12:20 150gb
crw-rw---- 1 root root 10, 63 Aug 31 2005 control
testhalde2 ~ # fdisk -l /dev/mapper/150gb
Disk /dev/mapper/150gb: 161.0 GB, 161061273600 bytes
255 heads, 63 sectors/track, 19581 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/mapper/150gb doesn't contain a valid partition table
===> Is it possible to _use_ partitions on this device? I know that it is
possible to create them, but what is the device name (/dev/...) of
partition 1?
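For reference: once partition maps exist, each partition is its own device-mapper map. The `dmsetup table` output in the follow-up further down shows 150gb1 and 150gb2 as `linear` maps on 254:0, which is the multipath map. Partition 1 is therefore reached as /dev/mapper/150gb1. The sketch below lists which maps sit on top of a given major:minor. It assumes `dmsetup table` output arrives on stdin, and the function name `maps_on_device` is made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: list the device-mapper maps stacked on a given
# major:minor, reading `dmsetup table` output on stdin.
# Input lines look like "<name>: <start> <length> <target> <args...>";
# for a linear target the first argument is the underlying device.
maps_on_device() {
    awk -v dev="$1" '$4 == "linear" && $5 == dev { sub(/:$/, "", $1); print $1 }'
}

# Example, using the table shown later in this thread (prints 150gb1, 150gb2):
printf '%s\n' \
    '150gb1: 0 64197 linear 254:0 63' \
    '150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000' \
    '150gb2: 0 314504505 linear 254:0 64260' |
    maps_on_device 254:0
```

On a live system the input would simply be `dmsetup table | maps_on_device 254:0`.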
testhalde2 ~ # mkreiserfs /dev/mapper/150gb
mkreiserfs 3.6.19 (2003 www.namesys.com)
...
ReiserFS is successfully created on /dev/mapper/150gb.
testhalde2 ~ #
testhalde2 ~ # mount /dev/mapper/150gb /mnt/test/
testhalde2 ~ # touch /mnt/test/file # ok
testhalde2 ~ # rm /mnt/test/file # ok
testhalde2 rules.d # udevtest /sys/block/dm-0 block
udevtest.c: looking at device '/block/dm-0' from subsystem 'block'
udevtest.c: opened class_dev->name='dm-0'
udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, added symlink '%c'
udev_rules.c: add symlink '150gb'
udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, 'dm-0' becomes '%k'
udev_rules.c: configured rule in '/etc/udev/rules.d/50-udev.rules[63]' applied, 'dm-0' is ignored
testhalde2 tmp # ls -lF /dev/1*
ls: /dev/1*: No such file or directory
===> udevtest shows that udev reads the 20-multipath.rules rule. Why doesn't
udev create /dev/150gb?
testhalde2 tmp # multipath -l
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
\_ 0:0:0:1 sda 8:0 [ready ][active]
\_ round-robin 0 [enabled]
\_ 1:0:0:1 sdb 8:16 [ready ][active]
testhalde2 tmp # dmsetup table
150gb: 0 314572800 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:16 1000
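The multipath table line above is dense. The decoder sketch below assumes the dm-multipath parameter layout as I understand it:

- the feature count and features;
- the hardware-handler argument count and arguments;
- the number of priority groups and the initial group;
- for each group: the path selector, its argument count, the number of paths, the number of arguments per path, and then each path with its arguments.

For the round-robin selector, the single per-path argument should be the repeat count (I/Os sent to a path before switching), which is 1000 here. With path_grouping_policy failover, each path sits in its own priority group. The function name `decode_multipath` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: decode the parameters of a dm-multipath table line
# (everything after the word "multipath" in `dmsetup table` output).
decode_multipath() {
    set -- $1                         # split the parameter string into words
    nfeat=$1; shift; shift "$nfeat"   # skip feature count and features
    nhw=$1; shift; shift "$nhw"       # skip hardware-handler args
    npg=$1; init=$2; shift 2
    echo "groups=$npg initial=$init"
    g=1
    while [ "$g" -le "$npg" ]; do
        sel=$1; nsel=$2; shift 2; shift "$nsel"   # selector and its args
        npaths=$1; npargs=$2; shift 2
        p=1
        while [ "$p" -le "$npaths" ]; do
            dev=$1; shift
            pargs=""
            i=0
            while [ "$i" -lt "$npargs" ]; do      # collect per-path args
                pargs="$pargs $1"; shift; i=$((i + 1))
            done
            echo "pg$g $sel $dev args:$pargs"
            p=$((p + 1))
        done
        g=$((g + 1))
    done
}

decode_multipath "0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:16 1000"
# groups=2 initial=1
# pg1 round-robin 8:32 args: 1000
# pg2 round-robin 8:16 args: 1000
```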
testhalde2 tmp # cat devd_multipath # multipath.dev debugging output
...
142037 - ENV_DEVPATH: ram
142037 - ENV_DEVNAME: /dev/rd/9
142046 - ENV_ACTION: add
142046 - ENV_DEVPATH: sda
142046 - ENV_DEVNAME: /dev/sda
122045 - ENV_ACTION: add
122045 - ENV_DEVPATH: sdb
122045 - ENV_DEVNAME: /dev/sdb
testhalde2 tmp # fgrep dm devd_multipath
testhalde2 tmp #
===> My understanding of how sysfs, device-mapper, udev and multipath work
together is the following: after the HBA module qla2300 is loaded, the kernel
creates /sys/block/sda and /sys/block/sdb and runs udevsend - udevd - udev.
udev invokes /etc/dev.d/block/multipath.dev (ADD, sda/sdb). multipath.dev runs
multipath, which creates the device-mapper table and the device-mapper device
/sys/block/dm-0. OK, now we have a new device (dm-0). And again: udevsend -
udevd - udev and multipath.dev (ADD, dm-0). multipath.dev should start kpartx,
but the debug file /tmp/devd_multipath shows nothing! So I think kpartx is
never started. Is this behavior OK? It seems to work without kpartx, so I don't
understand why I need this tool.
testhalde2 ~ # multipath -v3
fd0 blacklisted
ram0 blacklisted
ram1 blacklisted
ram2 blacklisted
ram3 blacklisted
ram4 blacklisted
ram5 blacklisted
ram6 blacklisted
ram7 blacklisted
ram8 blacklisted
ram9 blacklisted
ram10 blacklisted
ram11 blacklisted
ram12 blacklisted
ram13 blacklisted
ram14 blacklisted
ram15 blacklisted
loop0 blacklisted
loop1 blacklisted
loop2 blacklisted
loop3 blacklisted
loop4 blacklisted
loop5 blacklisted
loop6 blacklisted
loop7 blacklisted
hda blacklisted
path sda not found in pathvec
===== path sda =====
vendor = HP
product = HSV100
rev = 3025
dev_t = 8:0
size = 314572800
h:b:t:l = 0:0:0:1
tgt_node_name = 0x50001fe150051d20
serial = P66C5E2AAQI010
path checker = tur (controler setting)
state = 2
getprio = /sbin/pp_balance_units %d (controler setting)
prio = 1
getuid = /sbin/scsi_id -g -u -s /block/%n (internal default)
uid = 3600508b40010079d0001900000460000 (callout)
path sdb not found in pathvec
===== path sdb =====
vendor = HP
product = HSV100
rev = 3025
dev_t = 8:16
size = 314572800
h:b:t:l = 1:0:0:1
tgt_node_name = 0x50001fe150051d20
serial = P66C5E2AAQI010
path checker = tur (controler setting)
state = 2
getprio = /sbin/pp_balance_units %d (controler setting)
prio = 1
getuid = /sbin/scsi_id -g -u -s /block/%n (internal default)
uid = 3600508b40010079d0001900000460000 (callout)
dm-0 blacklisted
#
# all paths :
#
3600508b40010079d0001900000460000 0:0:0:1 sda 8:0 [ready ][HSV100 ]
3600508b40010079d0001900000460000 1:0:0:1 sdb 8:16 [ready ][HSV100 ]
params = 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000
status = 1 0 0 2 1 A 0 1 0 8:0 A 0 E 0 1 0 8:16 A 0
pgpolicy = failover (LUN setting)
selector = round-robin 0 (LUN setting)
features = 0 (internal default)
hwhandler = 0 (internal default)
0 314572800 multipath 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000
action preset to 0
action set to 1
cannot signal daemon, pidfile not found
testhalde2 ~ #
testhalde2 ~ # ps ax | fgrep multipathd
10870 pts/0 SL 0:00 multipathd
10871 pts/0 SL 0:00 multipathd
10872 pts/0 SL 0:00 multipathd
10875 pts/0 S+ 0:00 fgrep multipathd
testhalde2 ~ # ls /var/run/multipathd.pid
ls: /var/run/multipathd.pid: No such file or directory
===> Does the system really need _three_ multipathd daemons and why is
there no pid file?
testhalde2 ~ # echo 10870 > /var/run/multipathd.pid
testhalde2 ~ # strace -f -p 10870 >/tmp/strace_multipatd 2>&1 &
[1] 11192
Now, I disable HBA-fabric-B port on the san-switch...
testhalde2 ~ # multipath -l
[ sleeping 35 seconds ]
open class /sys/block/sdc failed: No such file or directory
error calling out /sbin/scsi_id -g -u -s /block/sdc
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
\_ 1:0:0:1 sdb 8:16 [ready ][active]
\_ round-robin 0 [enabled]
\_ 0:0:0:1 sdc 8:32 [ready ][active]
testhalde2 ~ # multipath -l # again
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
\_ 1:0:0:1 sdb 8:16 [ready ][active]
\_ round-robin 0 [enabled]
\_ 0:0:0:0 8:32 [undef ][active]
testhalde2 tmp # touch /mnt/test/test # ok
testhalde2 tmp # rm /mnt/test/test # ok
testhalde2 tmp # ps ax | fgrep multipathd
10871 pts/0 SL 0:00 multipathd
10872 pts/0 SL 0:00 multipathd
10870 pts/0 SL 0:00 multipathd
11534 pts/0 S+ 0:00 fgrep multipathd
testhalde2 tmp # cat strace_multipatd
Process 10870 attached - interrupt to quit
testhalde2 tmp #
===> No output in the strace-debug file from multipathd. It seems that
multipathd doesn't recognize the changes.
Enabling HBA-fabric-B port on the san-switch...
testhalde2 tmp # multipath -l
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
\_ 1:0:0:1 sdb 8:16 [ready ][active]
\_ round-robin 0 [enabled]
\_ 0:0:0:1 sdc 8:32 [ready ][active]
testhalde2 tmp # touch /mnt/test/test # ok
testhalde2 tmp # rm /mnt/test/test # ok
Disabling HBA-fabric-A port on the other san-switch...
testhalde2 ~ # multipath -l
[ sleeping 35 seconds ]
1:0:0:1: cannot open /tmp/scsi-maj8-min16-11665: No such device or address
error calling out /sbin/scsi_id -g -u -s /block/sdb
error calling out /sbin/pp_balance_units 8:32
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active][first]
\_ 1:0:0:1 sdb 8:16 [faulty][active]
\_ round-robin 0 [enabled]
\_ 0:0:0:1 sdc 8:32 [ready ][active]
testhalde2 tmp # multipath -l # again
error calling out /sbin/pp_balance_units 8:32
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active][first]
\_ 0:0:0:0 8:16 [undef ][active]
\_ round-robin 0 [enabled]
\_ 0:0:0:1 sdc 8:32 [ready ][active]
testhalde2 tmp # touch /mnt/test/test # ok
testhalde2 tmp # rm /mnt/test/test # ok
===> Why do I get the "error calling out..." error only when I disable the
HBA-port from _fabric-A_?
Enabling HBA-fabric-A port...
testhalde2 tmp # multipath -l
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
\_ 0:0:0:1 sdc 8:32 [ready ][active]
\_ round-robin 0 [enabled]
\_ 1:0:0:1 sda 8:0 [ready ][active]
testhalde2 tmp # touch /mnt/test/test # ok
testhalde2 tmp # rm /mnt/test/test # ok
testhalde2 tmp # ps ax | fgrep multipathd
10871 pts/0 SL 0:00 multipathd
10872 pts/0 SL 0:00 multipathd
10870 pts/0 SL 0:00 multipathd
11534 pts/0 S+ 0:00 fgrep multipathd
testhalde2 tmp # cat strace_multipatd
Process 10870 attached - interrupt to quit
testhalde2 tmp #
===> Again: No output in the strace-debug file from multipathd.
SUMMARY:
========
The failover mechanism seems to work, but it's very, very slow (>= 35 sec).
I am sure that the host will die if there is a lot of I/O at that moment.
The documentation says that multipathd "is in charge of checking the paths
in case they come up or down", but multipathd seems to do nothing... I think
that is the problem... What do you think?
Thanks a lot for your help
Simon
--
Simon
gistolero@gmx.de
* Re: Problems with multipathd
From: christophe varoqui @ 2005-08-31 19:56 UTC
To: gistolero, device-mapper development

On mer, 2005-08-31 at 17:29 +0200, Simon wrote:
>
> testhalde2 tmp # ls -lF /dev/mapper/
> total 0
> brw------- 1 root root 254, 0 Aug 31 12:20 150gb
> crw-rw---- 1 root root 10, 63 Aug 31 2005 control
>
No /dev/150gb node :) ?
/etc/udev/rules.d/20-multipath.rules should create it, see below.

>
> testhalde2 ~ # fdisk -l /dev/mapper/150gb
> Disk /dev/mapper/150gb: 161.0 GB, 161061273600 bytes
> 255 heads, 63 sectors/track, 19581 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk /dev/mapper/150gb doesn't contain a valid partition table
>
>
> ===> Is it possible to _use_ partitions on this device? I know that it is
> possible to create them, but what is the device name (/dev/...) of
> partition 1?
>
A little bit harder, but I guess so:
- remove the multipath map
- partition a path (/dev/sda for example)
- re-create the multipath map through '/sbin/multipath /dev/sda'

> testhalde2 rules.d # udevtest /sys/block/dm-0 block
> udevtest.c: looking at device '/block/dm-0' from subsystem 'block'
> udevtest.c: opened class_dev->name='dm-0'
> udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, added symlink '%c'
> udev_rules.c: add symlink '150gb'
> udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, 'dm-0' becomes '%k'
> udev_rules.c: configured rule in '/etc/udev/rules.d/50-udev.rules[63]' applied, 'dm-0' is ignored
>
The default udev.rules file has a directive to ignore dm-* devices.
Something like:

KERNEL=="dm-[0-9]*", OPTIONS+="ignore_device"

/etc/udev/rules.d/20-multipath.rules is useless unless you comment
out this rule.
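Following the reply above, here is a sketch of commenting out that rule. The udevtest output places it in /etc/udev/rules.d/50-udev.rules. The sketch assumes the rule appears there in exactly the form quoted (`KERNEL=="dm-[0-9]*", OPTIONS+="ignore_device"`). The default rule may look different on another udev version, so check the file first. The helper name `disable_dm_ignore` is hypothetical. It keeps a `.bak` copy of the file.

```sh
#!/bin/sh
# Hypothetical helper: comment out the rule that makes udev ignore dm-* devices.
# Assumes the rule starts with  KERNEL=="dm-[0-9]*"  and contains ignore_device.
disable_dm_ignore() {
    rules=$1                                  # e.g. /etc/udev/rules.d/50-udev.rules
    cp "$rules" "$rules.bak" || return 1      # keep the original as a backup
    # Prefix '#' to matching lines only; all other rules are copied unchanged.
    sed '/^KERNEL=="dm-\[0-9]\*".*ignore_device/s/^/#/' "$rules.bak" > "$rules"
}

# Usage: disable_dm_ignore /etc/udev/rules.d/50-udev.rules
```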
>
> testhalde2 tmp # ls -lF /dev/1*
> ls: /dev/1*: No such file or directory
>
>
> ===> udevtest shows that udev reads the 20-multipath.rules rule. Why doesn't
> udev create /dev/150gb?
>
See above.

>
> testhalde2 tmp # multipath -l
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
> \_ 0:0:0:1 sda 8:0 [ready ][active]
> \_ round-robin 0 [enabled]
> \_ 1:0:0:1 sdb 8:16 [ready ][active]
>
>
> testhalde2 tmp # dmsetup table
> 150gb: 0 314572800 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:16 1000
>
>
> testhalde2 tmp # cat devd_multipath # multipath.dev debugging output
> ...
> 142037 - ENV_DEVPATH: ram
> 142037 - ENV_DEVNAME: /dev/rd/9
> 142046 - ENV_ACTION: add
> 142046 - ENV_DEVPATH: sda
> 142046 - ENV_DEVNAME: /dev/sda
> 122045 - ENV_ACTION: add
> 122045 - ENV_DEVPATH: sdb
> 122045 - ENV_DEVNAME: /dev/sdb
>
>
> testhalde2 tmp # fgrep dm devd_multipath
> testhalde2 tmp #
>
>
> ===> My understanding of how sysfs, device-mapper, udev and multipath work
> together is the following: after the HBA module qla2300 is loaded, the kernel
> creates /sys/block/sda and /sys/block/sdb and runs udevsend - udevd - udev.
> udev invokes /etc/dev.d/block/multipath.dev (ADD, sda/sdb). multipath.dev runs
> multipath, which creates the device-mapper table and the device-mapper device
> /sys/block/dm-0. OK, now we have a new device (dm-0). And again: udevsend -
> udevd - udev and multipath.dev (ADD, dm-0). multipath.dev should start kpartx,
> but the debug file /tmp/devd_multipath shows nothing! So I think kpartx is
> never started. Is this behavior OK? It seems to work without kpartx, so I don't
> understand why I need this tool.
>
kpartx is triggered for dm-* "adds" only by the multipath.dev hotplug script,
*and* the script expects the node to be in /dev/ (not /dev/mapper/).
This problem is linked to the previous one.
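Putting Christophe's two points together: the dm-* branch of the hotplug script is tied to the udev ignore rule above. It also must hand kpartx a node that actually exists, which on this box is /dev/mapper/<map>. Simon's later version of the script makes that change.

Below is a dry-run sketch of the dispatch that prints the command instead of running it. Some details differ from the original script:

- The function name is hypothetical.
- The map name is passed in rather than looked up with /sbin/devmap_name, so the logic can be tried without hardware.
- It uses a POSIX `case` pattern instead of the bash-only `${DEVPATH:7:3}` substring.

```sh
#!/bin/sh
# Hypothetical dry-run of the multipath.dev "add" dispatch: prints the
# command the hotplug script would run instead of running it.
#   $1 = DEVPATH from hotplug (e.g. /block/dm-0, /block/sda)
#   $2 = map name for dm-* devices (what devmap_name would print), e.g. 150gb
multipath_dev_action() {
    devpath=$1
    map=$2
    case "$devpath" in
        /block/dm-*)
            # device-mapper map: point kpartx at the node that exists here,
            # /dev/mapper/<map>, not /dev/<map>
            echo "/sbin/kpartx -v -a /dev/mapper/$map" ;;
        *)
            # any other block device (path or partition): run multipath on it,
            # as the original script does with ${DEVNAME}
            echo "/sbin/multipath /dev/${devpath##*/}" ;;
    esac
}

# Usage: multipath_dev_action /block/dm-0 150gb
```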
>
> testhalde2 ~ # ps ax | fgrep multipathd
> 10870 pts/0 SL 0:00 multipathd
> 10871 pts/0 SL 0:00 multipathd
> 10872 pts/0 SL 0:00 multipathd
> 10875 pts/0 S+ 0:00 fgrep multipathd
>
> testhalde2 ~ # ls /var/run/multipathd.pid
> ls: /var/run/multipathd.pid: No such file or directory
>
> ===> Does the system really need _three_ multipathd daemons and why is
> there no pid file?
>
I don't know Gentoo's default ps/NPTL choice, but what you see there might
well be the different threads. Consecutive PID numbers are a sign.

>
> testhalde2 ~ # echo 10870 > /var/run/multipathd.pid
> testhalde2 ~ # strace -f -p 10870 >/tmp/strace_multipatd 2>&1 &
> [1] 11192
>
Don't debug this way. Use 'multipathd -v4' and read the log, or run
'strace -f multipathd'.

>
> Now, I disable HBA-fabric-B port on the san-switch...
>
> testhalde2 ~ # multipath -l
> [ sleeping 35 seconds ]
> open class /sys/block/sdc failed: No such file or directory
> error calling out /sbin/scsi_id -g -u -s /block/sdc
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
> \_ 1:0:0:1 sdb 8:16 [ready ][active]
> \_ round-robin 0 [enabled]
> \_ 0:0:0:1 sdc 8:32 [ready ][active]
>
> testhalde2 ~ # multipath -l # again
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
> \_ 1:0:0:1 sdb 8:16 [ready ][active]
> \_ round-robin 0 [enabled]
> \_ 0:0:0:0 8:32 [undef ][active]
>
Lower the timeouts in your Qlogic driver.

> testhalde2 tmp # touch /mnt/test/test # ok
> testhalde2 tmp # rm /mnt/test/test # ok
>
> testhalde2 tmp # ps ax | fgrep multipathd
> 10871 pts/0 SL 0:00 multipathd
> 10872 pts/0 SL 0:00 multipathd
> 10870 pts/0 SL 0:00 multipathd
> 11534 pts/0 S+ 0:00 fgrep multipathd
>
> testhalde2 tmp # cat strace_multipatd
> Process 10870 attached - interrupt to quit
> testhalde2 tmp #
>
> ===> No output in the strace-debug file from multipathd.
> It seems that multipathd doesn't recognize the changes.
>
Does the log agree with that?

>
> Enabling HBA-fabric-B port on the san-switch...
>
> testhalde2 tmp # multipath -l
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
> \_ 1:0:0:1 sdb 8:16 [ready ][active]
> \_ round-robin 0 [enabled]
> \_ 0:0:0:1 sdc 8:32 [ready ][active]
>
> testhalde2 tmp # touch /mnt/test/test # ok
> testhalde2 tmp # rm /mnt/test/test # ok
>
>
> Disabling HBA-fabric-A port on the other san-switch...
>
> testhalde2 ~ # multipath -l
> [ sleeping 35 seconds ]
> 1:0:0:1: cannot open /tmp/scsi-maj8-min16-11665: No such device or address
> error calling out /sbin/scsi_id -g -u -s /block/sdb
> error calling out /sbin/pp_balance_units 8:32
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active][first]
> \_ 1:0:0:1 sdb 8:16 [faulty][active]
> \_ round-robin 0 [enabled]
> \_ 0:0:0:1 sdc 8:32 [ready ][active]
>
> testhalde2 tmp # multipath -l # again
> error calling out /sbin/pp_balance_units 8:32
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active][first]
> \_ 0:0:0:0 8:16 [undef ][active]
> \_ round-robin 0 [enabled]
> \_ 0:0:0:1 sdc 8:32 [ready ][active]
>
> testhalde2 tmp # touch /mnt/test/test # ok
> testhalde2 tmp # rm /mnt/test/test # ok
>
>
> ===> Why do I get the "error calling out..." error only when I disable the
> HBA-port from _fabric-A_?
>
Your log shows this message when disabling B too. These are scsi_id error
messages.

>
> Enabling HBA-fabric-A port...
>
> testhalde2 tmp # multipath -l
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
> \_ 0:0:0:1 sdc 8:32 [ready ][active]
> \_ round-robin 0 [enabled]
> \_ 1:0:0:1 sda 8:0 [ready ][active]
>
> testhalde2 tmp # touch /mnt/test/test # ok
> testhalde2 tmp # rm /mnt/test/test # ok
>
>
> SUMMARY:
> ========
>
> The failover mechanism seems to work, but it's very, very slow (>= 35 sec).
> I am sure that the host will die if there is a lot of I/O at that moment.
> The documentation says that multipathd "is in charge of checking the paths
> in case they come up or down", but multipathd seems to do nothing... I think
> that is the problem... What do you think?
>
I hope the previous comments clarify things a bit.
Also note that the 0.4.5 snapshots are much better suited to the task.
Consider upgrading.

And consider updating the wiki FAQ with the responses you found
enlightening :/

Regards,
--
christophe varoqui <christophe.varoqui@free.fr>
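This follows Christophe's advice to read the `multipathd -v4` log instead of attaching strace. As Simon's follow-up below shows, at that verbosity the log is dominated by parser output (`*word = ...`) and `tick` lines. The filter sketch below keeps only path-state events. It assumes the message formats seen in that follow-up: `tur checker reports path is down`, `checker failed path`, `reinstated`, and the kernel's `Failing path`. The function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical filter: keep only path-state events from a multipathd -v4 log,
# dropping the "*word = ..." parser lines, "tick" lines and routine
# "path is up" reports. Patterns follow the log messages in this thread.
path_events() {
    grep -E 'path is down|reinstated|checker failed path|Failing path'
}

# Usage: path_events < /var/log/messages
```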
* Re: Re: Problems with multipathd
From: gistolero @ 2005-09-06 16:46 UTC
To: dm-devel

Hi,

> Also note that the 0.4.5 snapshots are much better suited to the task.
> Consider upgrading.

Now I use multipath-tools-0.4.5, udev-068 and device-mapper-1.01.03.

> Lower the timeouts in your Qlogic driver.

===> I found some settings in /sys/module/qla2xxx/parameters/..., but most of
them are read-only. I have changed ql2xretrycount and ql2xsuspendcount, but
without success. Any suggestions for this driver?

I.) --- udev and udevstart ---

> The default udev.rules file has a directive to ignore dm-* devices.
> Something like:
> KERNEL=="dm-[0-9]*", OPTIONS+="ignore_device"
>
> /etc/udev/rules.d/20-multipath.rules is useless unless you comment
> out this rule.

I have commented out this line, but udev still has difficulties creating these
links. Therefore I have changed /etc/dev.d/block/multipath.dev (the script is
attached at the end of this post) and added debug messages. The most important
modification is that kpartx uses the block device files in /dev/mapper/...
instead of /dev/...

===> Why isn't that the default? Are there any disadvantages?
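Regarding the qla2xxx parameters above: sysfs shows which parameters can be changed on a running system (owner write bit set) and which are read-only there. Read-only module parameters can generally still be given when the module is loaded, on the modprobe command line or through the modprobe configuration. This thread does not establish which qla2xxx parameters control how quickly a lost port is reported, so the sketch below only lists them. The helper name `module_params` is hypothetical, and its directory defaults to the path mentioned above.

```sh
#!/bin/sh
# Hypothetical helper: list each module parameter with "rw" if the owner
# write bit is set in sysfs (changeable at runtime) or "ro" otherwise,
# followed by its current value.
module_params() {
    dir=${1:-/sys/module/qla2xxx/parameters}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case $(ls -ld "$f") in
            ??w*) mode=rw ;;   # e.g. -rw-r--r--
            *)    mode=ro ;;   # e.g. -r--r--r--
        esac
        echo "${f##*/} $mode $(cat "$f" 2>/dev/null)"
    done
}

# Usage: module_params   (defaults to /sys/module/qla2xxx/parameters)
```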
testhalde2 ~ # multipath /dev/sda
create: 150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0
\_ 0:0:0:1 sda 8:0 [ready]
\_ 1:0:0:1 sdb 8:16 [ready]

testhalde2 ~ # multipath -ll
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sda 8:0 [active][ready]
\_ 1:0:0:1 sdb 8:16 [active][ready]

testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000
150gb2: 0 314504505 linear 254:0 64260

testhalde2 ~ # ls -lF /dev/mapper/
total 0
brw------- 1 root root 254, 0 Sep 6 15:04 150gb
brw------- 1 root root 254, 1 Sep 6 15:04 150gb1
brw------- 1 root root 254, 2 Sep 6 15:04 150gb2
crw-rw---- 1 root root 10, 63 Sep 6 2005 control

testhalde2 ~ # ls -lF /dev/1*
ls: /dev/1*: No such file or directory
testhalde2 ~ # udevstart
testhalde2 ~ # ls -lF /dev/1*
lrwxrwxrwx 1 root root 4 Sep 6 15:10 /dev/150gb -> dm-0
lrwxrwxrwx 1 root root 4 Sep 6 15:11 /dev/150gb1 -> dm-1
lrwxrwxrwx 1 root root 4 Sep 6 15:11 /dev/150gb2 -> dm-2

===> Without "udevstart", udev doesn't create the /dev/150gb* links! Is this
a udev bug?

II.) --- Using multipathd ---

testhalde2 ~ # multipathd -v4
testhalde2 ~ # ps ax | fgrep multipathd
11024 pts/1 SL 0:00 multipathd -v4
11025 pts/1 SL 0:00 multipathd -v4
11026 pts/1 SL 0:00 multipathd -v4
11029 pts/1 SL 0:00 multipathd -v4
11030 pts/1 SL 0:00 multipathd -v4
11031 pts/1 SL 0:00 multipathd -v4
11032 pts/1 SL 0:00 multipathd -v4
11071 pts/1 S+ 0:00 fgrep multipathd
testhalde2 ~ # cat /var/run/multipathd.pid
11024

Yeah! Version 0.4.5 creates a pid file and a socket file :-)

It's important that I start "multipath /dev/sda" _before_ multipathd! If I
change this order, multipathd does nothing: /var/log/messages shows "tick",
"map garbage collection" etc. and nothing about /dev/sda or /dev/sdb. It seems
that multipathd doesn't read the device-mapper table at startup.
===> Is this behavior ok?

testhalde2 ~ # less /var/log/messages
...
/etc/dev.d/multipath.dev (10904): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-0, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (10904): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (10904): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (10904): Getting /block/dm-0 major and minor number
/etc/dev.d/multipath.dev (10904): /block/dm-0 major:minor = 254:0
/etc/dev.d/multipath.dev (10904): Getting /block/dm-0 alias
/etc/dev.d/multipath.dev (10904): /block/dm-0 alias = /dev/mapper/150gb
/etc/dev.d/multipath.dev (10904): /sbin/kpartx -v -a /dev/mapper/150gb
/etc/dev.d/multipath.dev (10935): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-1, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (10935): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (10935): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (10935): Getting /block/dm-1 major and minor number
/etc/dev.d/multipath.dev (10935): /block/dm-1 major:minor = 254:1
/etc/dev.d/multipath.dev (10935): Getting /block/dm-1 alias
/etc/dev.d/multipath.dev (10963): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-2, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (10963): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (10963): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (10963): Getting /block/dm-2 major and minor number
/etc/dev.d/multipath.dev (10963): /block/dm-2 major:minor = 254:2
/etc/dev.d/multipath.dev (10963): Getting /block/dm-2 alias
/etc/dev.d/multipath.dev (10935): /block/dm-1 alias = /dev/mapper/150gb1
/etc/dev.d/multipath.dev (10935): /sbin/kpartx -v -a /dev/mapper/150gb1
/etc/dev.d/multipath.dev (10963): /block/dm-2 alias = /dev/mapper/150gb2
/etc/dev.d/multipath.dev (10963): /sbin/kpartx -v -a /dev/mapper/150gb2
multipathd: --------start up--------
multipathd: read /etc/multipath.conf
multipathd: fd0 blacklisted
...
multipathd: hda blacklisted
multipathd: path sda not found in pathvec
multipathd: ===== path sda =====
multipathd: bus = 1
multipathd: dev_t = 8:0
multipathd: size = 314572800
multipathd: vendor = HP
multipathd: product = HSV100
multipathd: rev = 3025
multipathd: h:b:t:l = 0:0:0:1
multipathd: tgt_node_name = 0x50001fe150051d20
multipathd: getuid = /sbin/scsi_id -g -u -s /block/%n (controler setting)
multipathd: uid = 3600508b40010079d0001900000460000 (callout)
multipathd: path sdb not found in pathvec
multipathd: ===== path sdb =====
multipathd: bus = 1
multipathd: dev_t = 8:16
multipathd: size = 314572800
multipathd: vendor = HP
multipathd: product = HSV100
multipathd: rev = 3025
multipathd: h:b:t:l = 1:0:0:1
multipathd: tgt_node_name = 0x50001fe150051d20
multipathd: getuid = /sbin/scsi_id -g -u -s /block/%n (controler setting)
multipathd: uid = 3600508b40010079d0001900000460000 (callout)
multipathd: dm-0 blacklisted
multipathd: dm-1 blacklisted
multipathd: dm-2 blacklisted
multipathd: discovered map 150gb
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: 8:0 ownership set
multipathd: 8:16 ownership set
multipathd: pgfailback = -2 (LUN setting)
multipathd: 150gb: event checker started
multipathd: path checkers start up
multipathd: tick
multipathd: ===== path sda =====
multipathd: bus = 1
multipathd: dev_t = 8:0
multipathd: size = 314572800
multipathd: vendor = HP
multipathd: product = HSV100
multipathd: rev = 3025
multipathd: h:b:t:l = 0:0:0:1
multipathd: tgt_node_name = 0x50001fe150051d20
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path checker = tur (controler setting)
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: reinstated
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: ===== path sda =====
multipathd: getprio = /bin/true (internal default)
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: ===== path sdb =====
multipathd: getprio = /bin/true (internal default)
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: ===== path sdb =====
multipathd: bus = 1
multipathd: dev_t = 8:16
multipathd: size = 314572800
multipathd: vendor = HP
multipathd: product = HSV100
multipathd: rev = 3025
multipathd: h:b:t:l = 1:0:0:1
multipathd: tgt_node_name = 0x50001fe150051d20
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path checker = tur (controler setting)
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: reinstated
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 4 times
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: delay next check 20s
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: delay next check 20s
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: tick
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
multipathd: tick
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: tick
...

*** Disabling san-port from HBA-1... ***

testhalde2 ~ # multipath -ll
[ sleeping 35 seconds ]
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sda 8:0 [active][faulty]
\_ 1:0:0:1 sdb 8:16 [active][ready]
testhalde2 ~ # multipath -ll
[ sleeping 10 seconds ]
failed to open /dev/sda
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sda 8:0 [failed][faulty]
\_ 1:0:0:1 sdb 8:16 [active][ready]
testhalde2 ~ # multipath -ll
[ sleeping 10 seconds ]
failed to open /dev/sda
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sda 8:0 [active][ready]
\_ 1:0:0:1 sdb 8:16 [active][ready]
testhalde2 ~ # ls /sys/block/
dm-0 fd0 loop1 loop4 loop7 ram10 ram13 ram2 ram5 ram8
dm-1 hda loop2 loop5 ram0 ram11 ram14 ram3 ram6 ram9
dm-2 loop0 loop3 loop6 ram1 ram12 ram15 ram4 ram7 sdb
testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000
150gb2: 0 314504505 linear 254:0 64260
testhalde2 ~ # less /var/log/messages
...
kernel: qla2300 0000:03:01.0: LOOP DOWN detected.
multipathd: tick
...
kernel: rport-0:0-3: blocked FC remote port time out: removing target
multipathd: 8:0: tur checker reports path is down
multipathd: checker failed path 8:0 in map 150gb
kernel: device-mapper: dm-multipath: Failing path 8:0.
multipathd: 150gb: devmap event #2
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: discovered map 150gb
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = F, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: 8:0 ownership set
multipathd: 8:16 ownership set
multipathd: pgfailback = -2 (LUN setting)
/etc/dev.d/multipath.dev (11579): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda/sda1, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11579): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11579): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11579): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (11571): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda/sda2, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11571): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11571): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11571): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (11604): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11604): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11604): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11604): Exiting: $ACTION != "add"
multipathd: tick
multipathd: map garbage collection
multipathd: tick
multipathd: tick
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
last message repeated 3 times
multipathd: map garbage collection
multipathd: tick
last message repeated 3 times
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: reinstated
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: 150gb: devmap event #3
multipathd: discovered map 150gb
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: 8:0 ownership set
multipathd: 8:16 ownership set
multipathd: pgfailback = -2 (LUN setting)
kernel: scsi0 (0:1): rejecting I/O to dead device
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
multipathd: tick
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 2 times
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: delay next check 20s
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
kernel: scsi0 (0:1): rejecting I/O to dead device
multipathd: tick
last message repeated 2 times
multipathd: map garbage collection
multipathd: tick
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
last message repeated 4 times
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 3 times
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: tick
multipathd: tick
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: map garbage collection
kernel: scsi0 (0:1): rejecting I/O to dead device
multipathd: tick
...

===> First multipathd says "8:0: tur checker reports path is down" and
multipath prints sda as "failed" (ok). After a few seconds sda is "ready"
again and multipathd says "8:0: tur checker reports path is up"?! I changed
nothing during this time.

*** Enabling san-switch port from HBA-1 ***

testhalde2 ~ # multipath -ll
testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb2: 0 314504505 linear 254:0 64260

testhalde2 ~ # less /var/log/messages
...
multipathd: tick
kernel: qla2300 0000:03:01.0: LIP reset occured (f7f7).
kernel: qla2300 0000:03:01.0: LOOP UP detected (2 Gbps).
...
kernel: SCSI device sdc: drive cache: write through
kernel: sdc: sdc1 sdc2
kernel: Attached scsi disk sdc at scsi0, channel 0, id 0, lun 1
kernel: Attached scsi generic sg1 at scsi0, channel 0, id 0, lun 1, type 0
scsi.agent[11856]: disk at /devices/pci0000:03/0000:03:01.0/host0/rport-0:0-3/target0:0:0/0:0:0:1
/etc/dev.d/multipath.dev (11909): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11909): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11909): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11909): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11909): multipath -v0 /dev/sdc
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
multipathd: tick
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
last message repeated 2 times
multipathd: map garbage collection
kernel: device-mapper: dm-multipath: error getting device
kernel: device-mapper: error adding target to table
/etc/dev.d/multipath.dev (11944): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc2, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11944): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11944): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11959): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc1, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11959): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11959): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11959): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11959): multipath -v0 /dev/sdc1
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
logger: /etc/dev.d/multipath.dev (11944): Checking/Creating multipath device-mapper table with multipath-tool
logger: /etc/dev.d/multipath.dev (11944): multipath -v0 /dev/sdc2
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
last message repeated 2 times
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
multipathd: tick
kernel: device-mapper: dm-multipath: error getting device
kernel: device-mapper: error adding target to table
kernel: device-mapper: device doesn't appear to be in the dev hash table.
multipathd: tick
multipathd: map garbage collection
multipathd: 150gb: remove dead map
multipathd: 150gb: reap event checker
multipathd: 8:0 is orphaned
multipathd: 8:16 is orphaned
multipathd: SIGHUP received
multipathd: tick
multipathd: tick
/etc/dev.d/multipath.dev (12002): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-3, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (12002): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (12002): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (12002): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (12018): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-4, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (12018): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (12018): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (12018): Exiting: $ACTION != "add"
multipathd: tick
...

===> An error occurs while device-mapper tries to update the dm table, and
the "150gb" entry is deleted.


III.) --- Using multipath-tools without multipathd ---

Now, I try the same _without_ starting multipathd...

testhalde2 ~ # ps ax | fgrep multipathd
testhalde2 ~ #
testhalde2 ~ # multipath /dev/sda
create: 150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0
 \_ 0:0:0:1 sda 8:0 [ready]
 \_ 1:0:0:1 sdb 8:16 [ready]

testhalde2 ~ # multipath -ll
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0 [active][ready]
 \_ 1:0:0:1 sdb 8:16 [active][ready]

testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000
150gb2: 0 314504505 linear 254:0 64260

testhalde2 ~ # ls /dev/mapper/
150gb  150gb1  150gb2  control

*** Disabling san-switch port from HBA-1 ***

testhalde2 ~ # multipath -ll
[ sleeping 35 seconds ]
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sda 8:0 [active][faulty]
 \_ 1:0:0:1 sdb 8:16 [active][ready]

testhalde2 ~ # multipath -ll
[ sleeping 10 seconds ]
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ #:#:#:# 8:0 [active]
 \_ 1:0:0:1 sdb 8:16 [active][ready]

testhalde2 ~ # ls /sys/block/
dm-0  fd0    loop1  loop4  loop7  ram10  ram13  ram2  ram5  ram8
dm-1  hda    loop2  loop5  ram0   ram11  ram14  ram3  ram6  ram9
dm-2  loop0  loop3  loop6  ram1   ram12  ram15  ram4  ram7  sdb

testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000
150gb2: 0 314504505 linear 254:0 64260

testhalde2 ~ # less /var/log/messages
...
kernel: qla2300 0000:03:01.0: LOOP DOWN detected.
kernel: rport-0:0-3: blocked FC remote port time out: removing target
/etc/dev.d/multipath.dev (11186): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda/sda1, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11186): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11186): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11186): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (11200): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda/sda2, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11200): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11200): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11200): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (11217): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11217): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11217): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11217): Exiting: $ACTION != "add"
...

*** Enabling san-switch port from HBA-1 ***

testhalde2 ~ # multipath -ll
[ sleeping 5 seconds ]
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ #:#:#:# 8:0 [active]
 \_ 1:0:0:1 sdb 8:16 [active][ready]

testhalde2 ~ # multipath -ll
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdb 8:16 [active][ready]
 \_ 0:0:0:1 sdc 8:32 [active][ready]

testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:16 1000 8:32 1000
150gb2: 0 314504505 linear 254:0 64260

testhalde2 ~ # ls /dev/mapper/
150gb  150gb1  150gb2  control

testhalde2 ~ # less /var/log/messages
...
kernel: sdc: sdc1 sdc2
kernel: Attached scsi disk sdc at scsi0, channel 0, id 0, lun 1
kernel: Attached scsi generic sg1 at scsi0, channel 0, id 0, lun 1, type 0
scsi.agent[11271]: disk at /devices/pci0000:03/0000:03:01.0/host0/rport-0:0-3/target0:0:0/0:0:0:1
/etc/dev.d/multipath.dev (11325): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11325): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11325): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11325): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11325): multipath -v0 /dev/sdc
/etc/dev.d/multipath.dev (11356): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc2, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11356): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11356): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11356): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11356): multipath -v0 /dev/sdc2
/etc/dev.d/multipath.dev (11377): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc1, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11377): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11377): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11377): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11377): multipath -v0 /dev/sdc1
...


VI.) --- Summary ---

===> Multipathing seems to work without multipathd, but not with it. It
is very slow, but Christophe Varoqui wrote that I have to lower the HBA
timeouts (unfortunately, I don't know how to do this, see above). Do I
really need multipathd? I suppose so :-)

> And consider updating the wiki FAQ with the response you found to be
> enlightening :/

As soon as I have multipath running, I will write step-by-step
documentation.

Thanks again for your help,
Simon


#--- My /etc/dev.d/block/multipath.dev - Begin ---
#!/bin/bash
# Uses bash-only features ($(<file), ${var:offset:length}), so run it with bash.

# log to local file? (0 or 1)
# (print log file with "sort $logFile [| less]")
logToFile=1

# log to syslog? (0 or 1)
logToSyslog=1

# path to log file
# (only used if $logToFile==1)
logFile="/root/multipath.dev.log"

# syslog facility.priority (man syslog.conf)
# (only used if $logToSyslog==1)
syslogFacPrio="daemon.info"

# timeout for getting ${DEVPATH} alias in seconds
timeout=10

# be verbose in log file and/or syslog?
# (only used if $logToFile==1 and/or $logToSyslog==1)
verbose=1

# Don't touch
pid=$$
logCount=1

log() {
    if [ "${logToFile}" -eq 1 ]
    then
        msg="PID ${pid} - $(date +%Y%m%d-%H%M%S) -"
        msg="${msg} Log entry ${logCount}: ${1}"
        echo "${msg}" >> "${logFile}"
        logCount=$((logCount + 1))
    fi
    if [ "${logToSyslog}" -eq 1 ]
    then
        logger -p "${syslogFacPrio}" "/etc/dev.d/multipath.dev (${pid}): ${1}"
    fi
}

die() {
    log "DIED WITH ERROR: ${1}"
    exit 1
}

# Run a command passed as one string (word splitting is intended here).
exe() {
    log "${1}"
    if ! ${1}
    then
        die "\"${1}\" failed (device blacklisted?)"
    fi
}

end() {
    if [ -n "${1}" ]
    then
        log "${1}"
    fi
    exit 0
}

if [ "${verbose}" -eq 1 ]
then
    msg="Parameters: \$0=${0}, \$DEVPATH=${DEVPATH},"
    msg="${msg} \$ACTION=${ACTION}, \$@=${*}"
    log "${msg}"
    if [ "${logToFile}" -eq 1 ]
    then
        log "Logging to local file is enabled: log file is ${logFile}"
    fi
    if [ "${logToSyslog}" -eq 1 ]
    then
        msg="Logging to syslog is enabled:"
        msg="${msg} facility.priority is ${syslogFacPrio}"
        log "${msg}"
    fi
fi

if [ "${ACTION}" != add ]
then
    if [ "${verbose}" -eq 1 ]
    then
        log "Exiting: \$ACTION != \"add\""
    fi
    end
fi

# DEVPATH is /block/<name>; characters 7-9 are "dm-" for device-mapper maps.
if [ "${DEVPATH:7:3}" = "dm-" ]
then
    log "Getting ${DEVPATH} major and minor number"
    devMajorMinor=$(</sys${DEVPATH}/dev)
    if [ -z "${devMajorMinor}" ]
    then
        die "Getting ${DEVPATH} major and minor number failed"
    else
        log "${DEVPATH} major:minor = ${devMajorMinor}"
    fi

    log "Getting ${DEVPATH} alias"
    count=0
    devAlias="none"
    while [ ! -b "${devAlias}" ] && [ "${count}" -le "${timeout}" ]
    do
        # wait before every retry, then look the alias up again
        if [ "${count}" -ne 0 ]
        then
            sleep 1
        fi
        devAlias="/dev/mapper/$(devmap_name "${devMajorMinor}")"
        count=$((count + 1))
    done
    # Test the result itself: the alias may appear on the very last try.
    if [ ! -b "${devAlias}" ]
    then
        msg="Getting ${DEVPATH} alias failed (Found ${devAlias}, but"
        msg="${msg} this isn't a block device)"
        die "${msg}"
    else
        log "${DEVPATH} alias = ${devAlias}"
    fi

    exe "/sbin/kpartx -v -a ${devAlias}"
else
    log "Checking/Creating multipath device-mapper table with multipath-tool"
    exe "multipath -v0 ${DEVNAME}"
fi
#--- /etc/dev.d/block/multipath.dev - End ---

--
Simon
gistolero@gmx.de

^ permalink raw reply	[flat|nested] 5+ messages in thread
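In the transcripts above, `dmsetup table` still lists 8:0 in the 150gb map after sda has disappeared from /sys/block. The following is a minimal POSIX sh sketch (not part of multipath-tools) for cross-checking that situation: it pulls the major:minor path entries out of the multipath lines of `dmsetup table` and looks each one up in the `dev` files under /sys/block, the same kind of file the multipath.dev script reads. The function names and the `SYSFS_ROOT` override are hypothetical, added only so the sketch can be tried against a copy of the sysfs tree.

```sh
#!/bin/sh
# Print the major:minor path entries of every multipath target found in
# "dmsetup table" output on stdin. Only tokens after the "multipath" keyword
# are scanned, so linear partition maps such as 150gb1/150gb2 are skipped.
dm_multipath_paths() {
    awk '{
        in_mp = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "multipath")
                in_mp = 1
            else if (in_mp && $i ~ /^[0-9]+:[0-9]+$/)
                print $i
        }
    }'
}

# Print the /sys/block name whose "dev" file holds the given major:minor.
# Returns 1 if no block device with that number exists any more.
# SYSFS_ROOT is a hypothetical override (defaults to /sys).
majmin_to_disk() {
    for devfile in "${SYSFS_ROOT:-/sys}"/block/*/dev; do
        [ -r "$devfile" ] || continue    # glob matched nothing
        if [ "$(cat "$devfile")" = "$1" ]; then
            basename "$(dirname "$devfile")"
            return 0
        fi
    done
    return 1
}

# For each multipath path, show whether a block device still backs it.
report_stale_paths() {
    dmsetup table | dm_multipath_paths | while read -r majmin; do
        if disk=$(majmin_to_disk "$majmin"); then
            echo "$majmin -> $disk"
        else
            echo "$majmin -> (no block device)"
        fi
    done
}
```

Run `report_stale_paths` as root. On the system above, after the HBA-1 port was disabled, it should report 8:0 as having no block device and 8:16 as sdb.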
* Re: Re: Problems with multipathd
2005-09-06 16:46 ` gistolero
@ 2005-09-06 20:47 ` christophe varoqui
2005-09-12 15:52 ` gistolero
0 siblings, 1 reply; 5+ messages in thread
From: christophe varoqui @ 2005-09-06 20:47 UTC (permalink / raw)
To: device-mapper development

On Tue, 2005-09-06 at 18:46 +0200, gistolero@gmx.de wrote:
> Hi,
> > Lower the timeouts in your Qlogic driver.
>
> ===> I found some settings in /sys/module/qla2xxx/parameters/...,
> but most of them are read-only values. I have changed ql2xretrycount
> and ql2xsuspendcount but without success. Any suggestions for
> this driver?
>
Here are the interesting ones, I guess.

[root@s64p17bibro ~]# find /sys/class/ -name "*tmo*"
/sys/class/fc_remote_ports/rport-1:0-3/dev_loss_tmo
/sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo
/sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo
/sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo
/sys/class/scsi_host/host1/lpfc_nodev_tmo

>
> I.) --- udev and udevstart ---
> >
> > Default udev.rules file has a directive to ignore dm-*
> > Something like :
> > KERNEL=="dm-[0-9]*", OPTIONS+="ignore_device"
> >
> > /etc/udev/rules.d/20-multipath.rules is useless unless you comment
> > out this rule.
>
> I have commented this line, but udev still has difficulties to create this
> links. Therefore I have changed /etc/dev.d/block/multipath.dev (the script
> is attached at the end of this post) and added debug messages. The most
> important modification is that kpartx uses the block-device-files in
> /dev/mapper/... instead of /dev/...
> ===> Why isn't that the default? Are there any disadvantages?
>
Not really. All distributors seem to have their own ideas about naming
policies. You should ask about, and follow, the Gentoo philosophy, I guess.

> ===> Without "udevstart" udev doesn't create the /dev/150gb*
> links! Is this a udev bug?
>
You can still identify the udev problems keeping the node creation in
/dev/. Maybe all path setup is done in the initrd/initramfs without
multipath being able to react.

> Yeah! Version 0.4.5 creates a pid and a socket file :-)
> It's important that I start "multipath /dev/sda" _before_
> multipathd! If I change this order, multipathd does nothing.
> /var/log/messages shows "tick", "map garbage collection"
> etc. and nothing about /dev/sda or /dev/sdb. It seems that
> multipathd doesn't read the device-mapper table at startup.
> ===> Is this behavior ok?
>
No, and I can't reproduce this behaviour here.

<no map loaded>
[root@s64p17bibro multipath-tools-0.4.5]# multipathd -d
path checkers start up
<run /sbin/multipath>
device-mapper ioctl cmd 12 failed: No such device or address
ema4_lun1: event checker started
8:16: hp_sw checker reports path is up
8:16: reinstated
8:48: hp_sw checker reports path is ghost
8:48: reinstated

> ...
>
>
> ===> First multipathd says "8:0: tur checker reports
> path is down" and multipath prints sda "failed" (ok).
> After a few seconds sda is "ready" and multipathd says
> "8:0: tur checker reports path is up"?! I have changed
> nothing during this time.
>
Maybe the checker is confused by the long timeouts.
Worth another try after the lowering.

> *** Enabling san-switch port from HBA-1 ***
>
>
> testhalde2 ~ # multipath -ll
> testhalde2 ~ # dmsetup table
> 150gb1: 0 64197 linear 254:0 63
> 150gb2: 0 314504505 linear 254:0 64260
>
>
> testhalde2 ~ # less /var/log/messages
> ...
> multipathd: tick
> kernel: qla2300 0000:03:01.0: LIP reset occured (f7f7).
> kernel: qla2300 0000:03:01.0: LOOP UP detected (2 Gbps).
> ...
> kernel: SCSI device sdc: drive cache: write through
> kernel: sdc: sdc1 sdc2
> kernel: Attached scsi disk sdc at scsi0, channel 0, id 0, lun 1
> kernel: Attached scsi generic sg1 at scsi0, channel 0, id 0, lun 1, type 0
> scsi.agent[11856]: disk at /devices/pci0000:03/0000:03:01.0/host0/rport-0:0-3/target0:0:0/0:0:0:1
> /etc/dev.d/multipath.dev (11909): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc, $ACTION=add, $@=block
> /etc/dev.d/multipath.dev (11909): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (11909): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (11909): Checking/Creating multipath device-mapper table with multipath-tool
> /etc/dev.d/multipath.dev (11909): multipath -v0 /dev/sdc
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> multipathd: tick
> multipathd: tick
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> multipathd: tick
> last message repeated 2 times
> multipathd: map garbage collection
> kernel: device-mapper: dm-multipath: error getting device
> kernel: device-mapper: error adding target to table
> /etc/dev.d/multipath.dev (11944): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc2, $ACTION=add, $@=block
> /etc/dev.d/multipath.dev (11944): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (11944): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (11959): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc1, $ACTION=add, $@=block
> /etc/dev.d/multipath.dev (11959): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (11959): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (11959): Checking/Creating multipath device-mapper table with multipath-tool
> /etc/dev.d/multipath.dev (11959): multipath -v0 /dev/sdc1
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> logger: /etc/dev.d/multipath.dev (11944): Checking/Creating multipath device-mapper table with multipath-tool
> logger: /etc/dev.d/multipath.dev (11944): multipath -v0 /dev/sdc2
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> multipathd: tick
> last message repeated 2 times
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> multipathd: tick
> multipathd: tick
> kernel: device-mapper: dm-multipath: error getting device
> kernel: device-mapper: error adding target to table
> kernel: device-mapper: device doesn't appear to be in the dev hash table.
> multipathd: tick
> multipathd: map garbage collection
> multipathd: 150gb: remove dead map
> multipathd: 150gb: reap event checker
> multipathd: 8:0 is orphaned
> multipathd: 8:16 is orphaned
> multipathd: SIGHUP received
> multipathd: tick
> multipathd: tick
> /etc/dev.d/multipath.dev (12002): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-3, $ACTION=remove, $@=block
> /etc/dev.d/multipath.dev (12002): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (12002): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (12002): Exiting: $ACTION != "add"
> /etc/dev.d/multipath.dev (12018): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-4, $ACTION=remove, $@=block
> /etc/dev.d/multipath.dev (12018): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (12018): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (12018): Exiting: $ACTION != "add"
> multipathd: tick
> ...
>
>
> ===> An error occurs while device-mapper tries to update
> the dm-table and deletes the "150gb" entry.
>
Seems to point to a kernel issue.
FYI, /bin/multipath alone tries to update the map.

>
>
> III.) --- Using multipath-tools without multipathd
>
> Now, I try the same _without_ starting multipathd...
>
> ===> Multipathing seems to work without but not with multipathd.
> It's very slow, but Christophe Varoqui wrote that I have to lower
> the HBA timeouts (unfortunately, I don't know how to do this,
> see above). Do I really need multipathd? I suppose so :-)
>
multipathd is needed to reinstate paths.
In your case the rport disappears and reappears, so the mechanism is all
hotplug-driven and thus may work without the daemon ... if memory
resources permit hotplug and multipath(8) execution, that is.

Regards,
--
christophe varoqui <christophe.varoqui@free.fr>

^ permalink raw reply	[flat|nested] 5+ messages in thread
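Christophe's `find /sys/class/ -name "*tmo*"` output points at the per-rport `dev_loss_tmo` attribute of the FC transport class; the `lpfc_nodev_tmo` entry belongs to the Emulex lpfc driver and should not be expected with the QLogic HBAs used here. The following is a minimal sketch, assuming the attribute is writable on this kernel and driver, that sets it on every remote port; later in the thread Simon reports ending up with a 6-second timeout. The function name and the `SYSFS_ROOT` override are hypothetical. The setting does not survive a reboot and may need reapplying when an rport is re-created.

```sh
#!/bin/sh
# Write a new dev_loss_tmo (seconds) to every FC remote port under sysfs and
# print the value read back afterwards. Returns non-zero if no rport was
# updated. SYSFS_ROOT is a hypothetical override (defaults to /sys).
set_dev_loss_tmo() {
    if [ -z "$1" ]; then
        echo "usage: set_dev_loss_tmo SECONDS" >&2
        return 2
    fi
    updated=0
    for f in "${SYSFS_ROOT:-/sys}"/class/fc_remote_ports/rport-*/dev_loss_tmo; do
        [ -w "$f" ] || continue    # glob matched nothing, or read-only
        if echo "$1" > "$f"; then
            echo "$f = $(cat "$f")"
            updated=$((updated + 1))
        else
            echo "failed to write $f" >&2
        fi
    done
    [ "$updated" -gt 0 ]
}
```

Usage (as root): `set_dev_loss_tmo 6`. Reading the value back shows what the driver actually accepted, which may differ from what was written.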
* Re: Problems with multipathd
2005-09-06 20:47 ` christophe varoqui
@ 2005-09-12 15:52 ` gistolero
0 siblings, 0 replies; 5+ messages in thread
From: gistolero @ 2005-09-12 15:52 UTC (permalink / raw)
To: device-mapper development

>>===> I found some settings in /sys/module/qla2xxx/parameters/...,
>>but most of them are read-only values. I have changed ql2xretrycount
>>and ql2xsuspendcount but without success. Any suggestions for
>>this driver?
>>
>
> Here are the interesting ones, I guess.
>
> [root@s64p17bibro ~]# find /sys/class/ -name "*tmo*"
> /sys/class/fc_remote_ports/rport-1:0-3/dev_loss_tmo
> /sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo
> /sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo
> /sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo
> /sys/class/scsi_host/host1/lpfc_nodev_tmo

OK, I have a 6-second timeout now :-)

>>I have commented this line, but udev still has difficulties to create this
>>links. Therefore I have changed /etc/dev.d/block/multipath.dev (the script
>>is attached at the end of this post) and added debug messages. The most
>>important modification is that kpartx uses the block-device-files in
>>/dev/mapper/... instead of /dev/...
>>===> Why isn't that the default? Are there any disadvantages?
>>
>
> Not really. All distributors seem to have their own ideas about naming
> policies. You should ask about, and follow, the Gentoo philosophy, I
> guess.

I'm sure I'm not the only one who has problems with missing /dev/...
links. It's possible that multipath installs a device-mapper table without
errors, but kpartx fails because udev doesn't create the links in /dev/...
So I think multipath.dev should execute kpartx with /dev/mapper/...
instead of /dev/... by default.

>>===> Without "udevstart" udev doesn't create the /dev/150gb*
>>links! Is this a udev bug?
>>
> You can still identify the udev problems keeping the node creation
> in /dev/. Maybe all path setup is done in the initrd/initramfs without
> multipath being able to react.

multipath is able to react. I don't understand why I have to execute
udevstart.

>>===> First multipathd says "8:0: tur checker reports
>>path is down" and multipath prints sda "failed" (ok).
>>After a few seconds sda is "ready" and multipathd says
>>"8:0: tur checker reports path is up"?! I have changed
>>nothing during this time.
>>
>
> Maybe the checker is confused by the long timeouts.
> Worth another try after the lowering.

After lowering the timeouts to 6 seconds, multipathd shows the same
behavior.

>>===> Multipathing seems to work without but not with multipathd.
>>It's very slow, but Christophe Varoqui wrote that I have to lower
>>the HBA timeouts (unfortunately, I don't know how to do this,
>>see above). Do I really need multipathd? I suppose so :-)
>>
>
> multipathd is needed to reinstate paths.
> In your case the rport disappears and reappears, so the mechanism is all
> hotplug-driven and thus may work without the daemon ... if memory
> resources permit hotplug and multipath(8) execution, that is.

What do you mean by "In your case..."? Because kernel 2.6 and udev are
multipath-tools dependencies, all systems running multipath have the same
environment. They all use kernel 2.6 and udev, which is hotplug-driven.
The kernel starts the hotplug process and udev executes multipath.

Sorry, but I have to ask again: Do we really need multipathd? After
lowering the dev_loss_tmo timeouts and stopping multipathd, I have a
working multipath environment :-)))

I tested this with a little Perl script and a MySQL database. My
trafficmaker host ran this script 27 times in parallel:

...
for(my $count=1;$count<=1000000;$count++)
{
    ...
    my $sql="INSERT INTO $table VALUES($id,\"$value\")";
    my $return=$dbh->do($sql);
    ...
}
...
{
    my $sql="SELECT COUNT(*) FROM $table WHERE id=$id";
    my $sth=$dbh->prepare($sql);
    my $return=$sth->execute();
    ...
    $selectCount=$sth->fetchrow_array();
    ...;
}

The database host had to insert these 30-byte strings, and I started
some copy jobs (cp -a /usr/* /partition_mounted_with_multipath/ etc.) to
increase the I/O load. During this test I disabled and enabled the
different HBA switch ports, with the following result: it took 6 to 15
seconds before "multipath -l" showed that a path was down (15 seconds
because the host had a CPU load of 30.0 and responded very slowly), but
no INSERT got lost :-)))

But sometimes multipath seems to be a bit confused...

1.) one path disabled

In the majority of cases multipath prints...

testhalde2 sbin # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ #:#:#:# 8:0 [active]
 \_ 1:0:0:1 sdb 8:16 [active]

But sometimes I get...

testhalde2 usr # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 4:0:0:1 sdb 8:16 [active]

2.) all paths enabled (default)

In the majority of cases multipath prints...

testhalde2 sbin # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdb 8:16 [active]
 \_ 0:0:0:1 sdc 8:32 [active]

But sometimes I get...

testhalde2 usr # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sdb 8:16 [active]
\_ round-robin 0 [enabled]
 \_ 4:0:0:1 sdc 8:32 [active]

Regards
Simon

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2005-09-12 15:52 UTC | newest]

Thread overview: 5+ messages
2005-08-31 15:29 Problems with multipathd Simon
2005-08-31 19:56 ` christophe varoqui
2005-09-06 16:46   ` gistolero
2005-09-06 20:47     ` christophe varoqui
2005-09-12 15:52       ` gistolero