* Problems with multipathd
@ 2005-08-31 15:29 Simon
From: Simon @ 2005-08-31 15:29 UTC
  To: dm-devel


Hi,

I am trying to use multipath to provide a single block device for a
multipathed LUN for failover reasons. After some days of installation,
documentation reading and debugging I have solved a lot of problems but not
all and I need some help. I know it's a lot of text (sorry!!!), but I think
it's necessary to describe my problems.

I have marked my questions/comments with "===>". Please answer these notes.
Thank you.



1.) *** System Description ***



Storage:

- Storage EVA-3000
- Controller-B connected to fabric-A and fabric-B
- one VDisk presented to host testhalde2 via controller-B to fabric-A and -B


Server (testhalde2):

- 1x HBA Qlogic 2340 connected to fabric-A
- 1x HBA Qlogic 2340 connected to fabric-B
- Kernel 2.6.12.5 (vanilla, gentoo)
- device-mapper-1.01.03, udev-058, multipath-tools-0.4.4


testhalde2 tmp # dmesg | fgrep device-mapper
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
device-mapper: dm-multipath version 1.0.4 loaded
device-mapper: dm-round-robin version 1.0.0 loaded



testhalde2 tmp # lsmod
Module                  Size  Used by
qla2300               123904  0 
qla2xxx                88208  4 qla2300
scsi_transport_fc      26880  1 qla2xxx


testhalde2 etc # cat multipath.conf
defaults {
        multipath_tool                  "/sbin/multipath -v 0 -S"
        udev_dir                        /dev
        polling_interval                10
        default_selector                "round-robin 0"
        default_path_grouping_policy    failover
        default_getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        default_prio_callout            "/bin/false"
        r_min_io                        100
}
blacklist {
        wwid 26353900f02796769
        devnode "(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "hd[a-z][[0-9]*]"
        devnode "cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
multipaths {
        multipath {
                wwid                    3600508b40010079d0001900000460000
                alias                   150gb
                path_grouping_policy    failover
                path_selector           "round-robin 0"
        }
}
devices {
        device {
                vendor                  "HP      "
                product                 "HSV100          "
                path_grouping_policy    multibus
                path_checker            tur
                prio_callout            "/sbin/pp_balance_units %d"
        }
}


testhalde2 etc # cat /etc/udev/rules.d/20-multipath.rules 
KERNEL="dm-[0-9]*", PROGRAM="/sbin/devmap_name %M %m", NAME="%k", SYMLINK="%c"


testhalde2 ~ # cat /etc/dev.d/block/multipath.dev
#!/bin/sh -e
print()
{
  echo "`date +%H%M%S` - $1" >> /tmp/devd_multipath
}
print "ENV_ACTION:  $ACTION" # debugging
if [ ! "${ACTION}" = add ] ; then
        exit
fi
if [ "${DEVPATH:7:3}" = "dm-" ] ; then
        dev=$(</sys${DEVPATH}/dev)
        map=$(/sbin/devmap_name $dev)
        print "KPARTX $map" # debugging
        /sbin/kpartx -v -a /dev/$map >> /tmp/devd_multipath
else
        print "ENV_DEVNAME: ${DEVNAME}" # debugging
        /sbin/multipath ${DEVNAME}
fi



2.) *** Multipath in action ***



After rebooting testhalde2, I see the following:


testhalde2 tmp # ls /sys/block/
dm-0  loop0  loop3  loop6  ram1   ram12  ram15  ram4  ram7  sda
fd0   loop1  loop4  loop7  ram10  ram13  ram2   ram5  ram8  sdb
hda   loop2  loop5  ram0   ram11  ram14  ram3   ram6  ram9


testhalde2 tmp # ls -lF /dev/mapper/
total 0
brw-------  1 root root 254,  0 Aug 31 12:20 150gb
crw-rw----  1 root root  10, 63 Aug 31  2005 control


testhalde2 ~ # fdisk -l /dev/mapper/150gb 
Disk /dev/mapper/150gb: 161.0 GB, 161061273600 bytes
255 heads, 63 sectors/track, 19581 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/mapper/150gb doesn't contain a valid partition table


===> Is it possible to _use_ partitions on this device? I know that it is
     possible to create them, but what is the device name (/dev/...) of
     partition 1?


testhalde2 ~ # mkreiserfs /dev/mapper/150gb      
mkreiserfs 3.6.19 (2003 www.namesys.com)
...
ReiserFS is successfully created on /dev/mapper/150gb.
testhalde2 ~ # 


testhalde2 ~ # mount /dev/mapper/150gb /mnt/test/
testhalde2 ~ # touch /mnt/test/file  # ok
testhalde2 ~ # rm /mnt/test/file     # ok


testhalde2 rules.d # udevtest /sys/block/dm-0 block
udevtest.c: looking at device '/block/dm-0' from subsystem 'block'
udevtest.c: opened class_dev->name='dm-0'
udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, added symlink '%c'
udev_rules.c: add symlink '150gb'
udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, 'dm-0' becomes '%k'
udev_rules.c: configured rule in '/etc/udev/rules.d/50-udev.rules[63]' applied, 'dm-0' is ignored


testhalde2 tmp # ls -lF /dev/1*
ls: /dev/1*: No such file or directory


===> udevtest shows that udev reads the 20-multipath.rules rule. Why doesn't
     udev create /dev/150gb?


testhalde2 tmp # multipath -l   
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
  \_ 0:0:0:1 sda  8:0     [ready ][active]
\_ round-robin 0 [enabled]
  \_ 1:0:0:1 sdb  8:16    [ready ][active]


testhalde2 tmp # dmsetup table
150gb: 0 314572800 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:16 1000


testhalde2 tmp # cat devd_multipath    # multipath.dev debugging output
...
142037 - ENV_DEVPATH: ram
142037 - ENV_DEVNAME: /dev/rd/9
142046 - ENV_ACTION:  add
142046 - ENV_DEVPATH: sda
142046 - ENV_DEVNAME: /dev/sda
122045 - ENV_ACTION:  add
122045 - ENV_DEVPATH: sdb
122045 - ENV_DEVNAME: /dev/sdb


testhalde2 tmp # fgrep dm devd_multipath 
testhalde2 tmp # 


===> My understanding of the sysfs/device-mapper/udev/multipath cooperation is
the following: After loading the HBA module qla2300, the kernel creates
/sys/block/sda and /sys/block/sdb and executes udevsend - udevd - udev. udev
invokes /etc/dev.d/block/multipath.dev (ADD, sda/sdb). multipath.dev executes
multipath, which creates the device-mapper table and the device-mapper device
/sys/block/dm-0. Ok, now we have a new device (dm-0). And again: udevsend -
udevd - udev and multipath.dev (ADD, dm-0). multipath.dev should start kpartx,
but the debug file /tmp/devd_multipath shows nothing! So, I think kpartx is
never started. Is this behavior ok? It seems to work without kpartx, so I don't
understand why I need this tool.
     


testhalde2 ~ # multipath -v3
fd0 blacklisted
ram0 blacklisted
ram1 blacklisted
ram2 blacklisted
ram3 blacklisted
ram4 blacklisted
ram5 blacklisted
ram6 blacklisted
ram7 blacklisted
ram8 blacklisted
ram9 blacklisted
ram10 blacklisted
ram11 blacklisted
ram12 blacklisted
ram13 blacklisted
ram14 blacklisted
ram15 blacklisted
loop0 blacklisted
loop1 blacklisted
loop2 blacklisted
loop3 blacklisted
loop4 blacklisted
loop5 blacklisted
loop6 blacklisted
loop7 blacklisted
hda blacklisted
path sda not found in pathvec

===== path sda =====
vendor = HP      
:
product = HSV100          
rev = 3025
dev_t = 8:0
size = 314572800
h:b:t:l = 0:0:0:1
tgt_node_name = 0x50001fe150051d20
serial = P66C5E2AAQI010
path checker = tur (controler setting)
state = 2
getprio = /sbin/pp_balance_units %d (controler setting)
prio = 1
getuid = /sbin/scsi_id -g -u -s /block/%n (internal default)
uid = 3600508b40010079d0001900000460000 (callout)
path sdb not found in pathvec

===== path sdb =====
vendor = HP      
product = HSV100          
rev = 3025
dev_t = 8:16
size = 314572800
h:b:t:l = 1:0:0:1
tgt_node_name = 0x50001fe150051d20
serial = P66C5E2AAQI010
path checker = tur (controler setting)
state = 2
getprio = /sbin/pp_balance_units %d (controler setting)
prio = 1
getuid = /sbin/scsi_id -g -u -s /block/%n (internal default)
uid = 3600508b40010079d0001900000460000 (callout)
dm-0 blacklisted
#
# all paths :
#
3600508b40010079d0001900000460000 0:0:0:1 sda  8:0     [ready ][HSV100          ]
3600508b40010079d0001900000460000 1:0:0:1 sdb  8:16    [ready ][HSV100          ]
params = 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000 
status = 1 0 0 2 1 A 0 1 0 8:0 A 0 E 0 1 0 8:16 A 0 
pgpolicy = failover (LUN setting)
selector = round-robin 0 (LUN setting)
features = 0 (internal default)
hwhandler = 0 (internal default)
0 314572800 multipath 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000
action preset to 0
action set to 1
cannot signal daemon, pidfile not found
testhalde2 ~ # 


testhalde2 ~ # ps ax | fgrep multipathd
10870 pts/0    SL     0:00 multipathd
10871 pts/0    SL     0:00 multipathd
10872 pts/0    SL     0:00 multipathd
10875 pts/0    S+     0:00 fgrep multipathd

testhalde2 ~ # ls /var/run/multipathd.pid
ls: /var/run/multipathd.pid: No such file or directory

===> Does the system really need _three_ multipathd daemons and why is
     there no pid file?


testhalde2 ~ # echo 10870 > /var/run/multipathd.pid
testhalde2 ~ # strace -f -p 10870 >/tmp/strace_multipatd 2>&1 &
[1] 11192




Now, I disable HBA-fabric-B port on the san-switch...

testhalde2 ~ # multipath -l
[ sleeping 35 seconds ]
open class /sys/block/sdc failed: No such file or directory
error calling out /sbin/scsi_id -g -u -s /block/sdc
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
  \_ 1:0:0:1 sdb  8:16    [ready ][active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:1 sdc  8:32    [ready ][active]

testhalde2 ~ # multipath -l       # again
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
  \_ 1:0:0:1 sdb  8:16    [ready ][active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:0      8:32    [undef ][active]

testhalde2 tmp # touch /mnt/test/test    # ok
testhalde2 tmp # rm /mnt/test/test       # ok

testhalde2 tmp # ps ax | fgrep multipathd
10871 pts/0    SL     0:00 multipathd
10872 pts/0    SL     0:00 multipathd
10870 pts/0    SL     0:00 multipathd
11534 pts/0    S+     0:00 fgrep multipathd

testhalde2 tmp # cat strace_multipatd 
Process 10870 attached - interrupt to quit
testhalde2 tmp # 

===> No output in the strace-debug file from multipathd. It seems that
     multipathd doesn't recognize the changes.
 


Enabling HBA-fabric-B port on the san-switch...

testhalde2 tmp # multipath -l
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
  \_ 1:0:0:1 sdb  8:16    [ready ][active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:1 sdc  8:32    [ready ][active]

testhalde2 tmp # touch /mnt/test/test    # ok
testhalde2 tmp # rm /mnt/test/test       # ok




Disabling HBA-fabric-A port on the other san-switch...

testhalde2 ~ # multipath -l
[ sleeping 35 seconds ]
1:0:0:1: cannot open /tmp/scsi-maj8-min16-11665: No such device or address
error calling out /sbin/scsi_id -g -u -s /block/sdb
error calling out /sbin/pp_balance_units 8:32
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active][first]
  \_ 1:0:0:1 sdb  8:16    [faulty][active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:1 sdc  8:32    [ready ][active]

testhalde2 tmp # multipath -l             # again
error calling out /sbin/pp_balance_units 8:32
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active][first]
  \_ 0:0:0:0      8:16    [undef ][active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:1 sdc  8:32    [ready ][active]

testhalde2 tmp # touch /mnt/test/test    # ok
testhalde2 tmp # rm /mnt/test/test       # ok


===> Why do I get the "error calling out..." error only when I disable the
     HBA-port from _fabric-A_?


Enabling HBA-fabric-A port...

testhalde2 tmp # multipath -l
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled][first]
  \_ 0:0:0:1 sdc  8:32    [ready ][active]
\_ round-robin 0 [enabled]
  \_ 1:0:0:1 sda  8:0     [ready ][active]

testhalde2 tmp # touch /mnt/test/test    # ok
testhalde2 tmp # rm /mnt/test/test       # ok


testhalde2 tmp # ps ax | fgrep multipathd
10871 pts/0    SL     0:00 multipathd
10872 pts/0    SL     0:00 multipathd
10870 pts/0    SL     0:00 multipathd
11534 pts/0    S+     0:00 fgrep multipathd

testhalde2 tmp # cat strace_multipatd 
Process 10870 attached - interrupt to quit
testhalde2 tmp # 

===> Again: No output in the strace-debug file from multipathd.



SUMMARY:
========

The failover mechanism seems to work, but it's very, very slow (>= 35 sec).
I am sure that the host will die if there is a lot of I/O at that moment.
The documentation says that multipathd "is in charge of checking the paths
in case they come up or down", yet multipathd seems to do nothing... I think
that is the problem... What do you think?


Thanks a lot for your help
Simon


-- 

Simon
gistolero@gmx.de


* Re: Problems with multipathd
@ 2005-08-31 19:56 ` christophe varoqui
From: christophe varoqui @ 2005-08-31 19:56 UTC
  To: gistolero, device-mapper development

On mer, 2005-08-31 at 17:29 +0200, Simon wrote:

> 
> testhalde2 tmp # ls -lF /dev/mapper/
> total 0
> brw-------  1 root root 254,  0 Aug 31 12:20 150gb
> crw-rw----  1 root root  10, 63 Aug 31  2005 control
> 
No /dev/150gb node :) ?
/etc/udev/rules.d/20-multipath.rules should create it, see below.

> 
> testhalde2 ~ # fdisk -l /dev/mapper/150gb 
> Disk /dev/mapper/150gb: 161.0 GB, 161061273600 bytes
> 255 heads, 63 sectors/track, 19581 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk /dev/mapper/150gb doesn't contain a valid partition table
> 
> 
> ===> Is it possible to _use_ partitions on this device? I know that it is
>      possible to create them, but what is the device name (/dev/...) of
>      partition 1?
> 
A little bit harder, but I guess so :
- remove the multipath map
- partition a path (/dev/sda for example)
- re-create the multipath map through '/sbin/multipath /dev/sda'
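
The three steps above could be sketched as the following shell function. This
is only a sketch under assumptions: the alias 150gb and the path /dev/sda are
taken from this thread, 'dmsetup remove' is used here as one possible way to
remove the map, and the run helper and the DRY_RUN variable are hypothetical
names introduced for illustration. The filesystem has to be unmounted first,
and repartitioning a device that already holds a filesystem will destroy it.

```sh
#!/bin/sh
# Sketch of the three steps: remove the map, partition one path, re-create it.
# MAP and PATH_DEV are assumptions taken from this thread; adjust as needed.
# Commands are only printed unless DRY_RUN=0 is set in the environment.
MAP=${MAP:-150gb}
PATH_DEV=${PATH_DEV:-/dev/sda}
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

repartition_map() {
    run dmsetup remove "$MAP"          # 1. remove the multipath map
    run fdisk "$PATH_DEV"              # 2. partition one path (interactive)
    run /sbin/multipath "$PATH_DEV"    # 3. re-create the map from that path
}
```

Calling repartition_map with the defaults only prints the three commands, so
the sequence can be reviewed before anything is changed.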

> testhalde2 rules.d # udevtest /sys/block/dm-0 block
> udevtest.c: looking at device '/block/dm-0' from subsystem 'block'
> udevtest.c: opened class_dev->name='dm-0'
> udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, added symlink '%c'
> udev_rules.c: add symlink '150gb'
> udev_rules.c: configured rule in '/etc/udev/rules.d/20-multipath.rules[3]' applied, 'dm-0' becomes '%k'
> udev_rules.c: configured rule in '/etc/udev/rules.d/50-udev.rules[63]' applied, 'dm-0' is ignored
> 
Default udev.rules file has a directive to ignore dm-*
Something like :
KERNEL=="dm-[0-9]*", OPTIONS+="ignore_device"

/etc/udev/rules.d/20-multipath.rules is useless unless you comment
out this rule.
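
To locate such a rule, something like the following could be used. It is a
sketch that assumes the rules live in /etc/udev/rules.d (as in this thread) and
simply lists every non-comment rule line mentioning "dm-", since the exact
wording of the ignore directive differs between udev versions. The function
name find_dm_rules is made up for illustration.

```sh
#!/bin/sh
# Print every active (non-comment) rule line that mentions "dm-", prefixed
# with file name and line number. The directory argument is optional.
find_dm_rules() {
    rules_dir=${1:-/etc/udev/rules.d}
    for f in "$rules_dir"/*.rules; do
        [ -f "$f" ] || continue
        # Skip lines whose first non-blank character is '#'.
        awk '!/^[ \t]*#/ && /dm-/ { print FILENAME ":" FNR ": " $0 }' "$f"
    done
}
```

Any line it reports from a file sorted after 20-multipath.rules (50-udev.rules
in the udevtest output above) is a candidate for commenting out.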

> 
> testhalde2 tmp # ls -lF /dev/1*
> ls: /dev/1*: No such file or directory
> 
> 
> ===> udevtest shows that udev reads the 20-multipath.rules rule. Why doesn't
>      udev create /dev/150gb?
> 
See above
> 
> testhalde2 tmp # multipath -l   
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
>   \_ 0:0:0:1 sda  8:0     [ready ][active]
> \_ round-robin 0 [enabled]
>   \_ 1:0:0:1 sdb  8:16    [ready ][active]
> 
> 
> testhalde2 tmp # dmsetup table
> 150gb: 0 314572800 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:16 1000
> 
> 
> testhalde2 tmp # cat devd_multipath    # multipath.dev debugging output
> ...
> 142037 - ENV_DEVPATH: ram
> 142037 - ENV_DEVNAME: /dev/rd/9
> 142046 - ENV_ACTION:  add
> 142046 - ENV_DEVPATH: sda
> 142046 - ENV_DEVNAME: /dev/sda
> 122045 - ENV_ACTION:  add
> 122045 - ENV_DEVPATH: sdb
> 122045 - ENV_DEVNAME: /dev/sdb
> 
> 
> testhalde2 tmp # fgrep dm devd_multipath 
> testhalde2 tmp # 
> 
> 
> ===> My understanding of the sysfs/device-mapper/udev/multipath cooperation is
> the following: After loading the HBA module qla2300, the kernel creates
> /sys/block/sda and /sys/block/sdb and executes udevsend - udevd - udev. udev
> invokes /etc/dev.d/block/multipath.dev (ADD, sda/sdb). multipath.dev executes
> multipath, which creates the device-mapper table and the device-mapper device
> /sys/block/dm-0. Ok, now we have a new device (dm-0). And again: udevsend -
> udevd - udev and multipath.dev (ADD, dm-0). multipath.dev should start kpartx,
> but the debug file /tmp/devd_multipath shows nothing! So, I think kpartx is
> never started. Is this behavior ok? It seems to work without kpartx, so I don't
> understand why I need this tool.
>      
> 
kpartx is triggered only for dm-* "add" events, by the multipath.dev hotplug
script, *and* the script expects the node to be in /dev/ (not /dev/mapper/).
This problem is linked to the previous one.
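
The node lookup described here can be illustrated with a small helper that
checks both locations. This is a sketch, not the code of the shipped hotplug
script: map_node and the DEV_ROOT override are hypothetical names introduced
for illustration.

```sh
#!/bin/sh
# Print the first existing node for a device-mapper map name. /dev/<map> is
# where the hotplug script looks; /dev/mapper/<map> is where the node shows up
# in this thread. DEV_ROOT is a hypothetical override, useful for testing.
map_node() {
    map=$1
    root=${DEV_ROOT:-/dev}
    for node in "$root/$map" "$root/mapper/$map"; do
        if [ -e "$node" ]; then
            echo "$node"
            return 0
        fi
    done
    return 1    # no node in either location
}
```

On the system described above, 'map_node 150gb' would be expected to print
/dev/mapper/150gb as long as the udev symlink in /dev/ is missing.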


> 
> testhalde2 ~ # ps ax | fgrep multipathd
> 10870 pts/0    SL     0:00 multipathd
> 10871 pts/0    SL     0:00 multipathd
> 10872 pts/0    SL     0:00 multipathd
> 10875 pts/0    S+     0:00 fgrep multipathd
> 
> testhalde2 ~ # ls /var/run/multipathd.pid
> ls: /var/run/multipathd.pid: No such file or directory
> 
> ===> Does the system really need _three_ multipathd daemons and why is
>      there no pid file?
> 
I don't know Gentoo's default ps/nptl choice, but these might well be the
different threads of one daemon. Consecutive PID numbers are a sign of that.

> 
> testhalde2 ~ # echo 10870 > /var/run/multipathd.pid
> testhalde2 ~ # strace -f -p 10870 >/tmp/strace_multipatd 2>&1 &
> [1] 11192
> 
Don't debug this way.
Use 'multipathd -v4' and see the log or 'strace -f multipathd'
> 
> 
> 
> Now, I disable HBA-fabric-B port on the san-switch...
> 
> testhalde2 ~ # multipath -l
> [ sleeping 35 seconds ]
> open class /sys/block/sdc failed: No such file or directory
> error calling out /sbin/scsi_id -g -u -s /block/sdc
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
>   \_ 1:0:0:1 sdb  8:16    [ready ][active]
> \_ round-robin 0 [enabled]
>   \_ 0:0:0:1 sdc  8:32    [ready ][active]
> 
> testhalde2 ~ # multipath -l       # again
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
>   \_ 1:0:0:1 sdb  8:16    [ready ][active]
> \_ round-robin 0 [enabled]
>   \_ 0:0:0:0      8:32    [undef ][active]
> 
Lower the timeouts in your Qlogic driver.

> testhalde2 tmp # touch /mnt/test/test    # ok
> testhalde2 tmp # rm /mnt/test/test       # ok
> 
> testhalde2 tmp # ps ax | fgrep multipathd
> 10871 pts/0    SL     0:00 multipathd
> 10872 pts/0    SL     0:00 multipathd
> 10870 pts/0    SL     0:00 multipathd
> 11534 pts/0    S+     0:00 fgrep multipathd
> 
> testhalde2 tmp # cat strace_multipatd 
> Process 10870 attached - interrupt to quit
> testhalde2 tmp # 
> 
> ===> No output in the strace-debug file from multipathd. It seems that
>      multipathd doesn't recognize the changes.
>  
Does the log agree with that?

> 
> Enabling HBA-fabric-B port on the san-switch...
> 
> testhalde2 tmp # multipath -l
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
>   \_ 1:0:0:1 sdb  8:16    [ready ][active]
> \_ round-robin 0 [enabled]
>   \_ 0:0:0:1 sdc  8:32    [ready ][active]
> 
> testhalde2 tmp # touch /mnt/test/test    # ok
> testhalde2 tmp # rm /mnt/test/test       # ok
> 
> 
> 
> 
> Disabling HBA-fabric-A port on the other san-switch...
> 
> testhalde2 ~ # multipath -l
> [ sleeping 35 seconds ]
> 1:0:0:1: cannot open /tmp/scsi-maj8-min16-11665: No such device or address
> error calling out /sbin/scsi_id -g -u -s /block/sdb
> error calling out /sbin/pp_balance_units 8:32
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active][first]
>   \_ 1:0:0:1 sdb  8:16    [faulty][active]
> \_ round-robin 0 [enabled]
>   \_ 0:0:0:1 sdc  8:32    [ready ][active]
> 
> testhalde2 tmp # multipath -l             # again
> error calling out /sbin/pp_balance_units 8:32
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active][first]
>   \_ 0:0:0:0      8:16    [undef ][active]
> \_ round-robin 0 [enabled]
>   \_ 0:0:0:1 sdc  8:32    [ready ][active]
> 
> testhalde2 tmp # touch /mnt/test/test    # ok
> testhalde2 tmp # rm /mnt/test/test       # ok
> 
> 
> ===> Why do I get the "error calling out..." error only when I disable the
>      HBA-port from _fabric-A_?
> 
Your log shows this message when disabling B too.
These are scsi_id error messages.
> 
> Enabling HBA-fabric-A port...
> 
> testhalde2 tmp # multipath -l
> 150gb (3600508b40010079d0001900000460000)
> [size=150 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [enabled][first]
>   \_ 0:0:0:1 sdc  8:32    [ready ][active]
> \_ round-robin 0 [enabled]
>   \_ 1:0:0:1 sda  8:0     [ready ][active]
> 
> testhalde2 tmp # touch /mnt/test/test    # ok
> testhalde2 tmp # rm /mnt/test/test       # ok
> 
> 

> SUMMARY:
> ========
> 
> The failover mechanism seems to work, but it's very, very slow (>= 35 sec).
> I am sure that the host will die if there is a lot of I/O at that moment.
> The documentation says that multipathd "is in charge of checking the paths
> in case they come up or down", yet multipathd seems to do nothing... I think
> that is the problem... What do you think?
> 
Hope the previous comments clarify things a bit.
Also note that the 0.4.5 snapshots are much better suited to the task.
Consider upgrading.
And consider updating the wiki FAQ with the responses you found
enlightening :/

Regards,
-- 
christophe varoqui <christophe.varoqui@free.fr>


* Re: Re: Problems with multipathd
@ 2005-09-06 16:46   ` gistolero
From: gistolero @ 2005-09-06 16:46 UTC
  To: dm-devel

Hi,

 > Also note that the 0.4.5 snapshots are much better suited to the task.
 > Consider upgrading.

Now, I use multipath-tools-0.4.5, udev-068 and device-mapper-1.01.03.


 > Lower the timeouts in your Qlogic driver.

===> I found some settings in /sys/module/qla2xxx/parameters/...,
but most of them are read-only values. I have changed ql2xretrycount
and ql2xsuspendcount but without success. Any suggestions for
this driver?
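
One way to see which of these parameters can be changed at runtime is to list
the files that carry a write permission bit. The sketch below assumes the
directory named above; the function name writable_params is made up for
illustration, and which parameter (if any) actually shortens the failover time
is not something this listing can tell.

```sh
#!/bin/sh
# List module parameter files whose owner-write bit is set, i.e. those that
# can be changed through sysfs at runtime. The directory argument is optional.
writable_params() {
    params_dir=${1:-/sys/module/qla2xxx/parameters}
    find "$params_dir" -maxdepth 1 -type f -perm -200 | sort
}
```

Parameters that do not show up in the output are read-only in sysfs and can
normally only be set when the module is loaded, for example as options on the
modprobe command line.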


I.) --- udev and udevstart ---


 > Default udev.rules file has a directive to ignore dm-*
 > Something like :
 > KERNEL=="dm-[0-9]*", OPTIONS+="ignore_device"
 >
 > /etc/udev/rules.d/20-multipath.rules is useless unless you comment
 > out this rule.

I have commented out this line, but udev still has difficulty creating these
links. Therefore I have changed /etc/dev.d/block/multipath.dev (the script
is attached at the end of this post) and added debug messages. The most
important modification is that kpartx uses the block device files in
/dev/mapper/... instead of /dev/...
===> Why isn't that the default? Are there any disadvantages?


testhalde2 ~ # multipath /dev/sda
create: 150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0
  \_ 0:0:0:1 sda 8:0  [ready]
  \_ 1:0:0:1 sdb 8:16 [ready]

testhalde2 ~ # multipath -ll
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:0:1 sda 8:0  [active][ready]
  \_ 1:0:0:1 sdb 8:16 [active][ready]

testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000
150gb2: 0 314504505 linear 254:0 64260
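
For reading this table: each line is "name: start length target args...", with
start and length in 512-byte sectors, and for the linear target the last
argument is the offset into the underlying device (254:0, i.e. the 150gb map).
The following sketch, with the made-up function name dm_sizes, turns such
lines into sizes. It assumes exactly the one-line-per-map layout shown here.

```sh
#!/bin/sh
# Read 'dmsetup table' output on stdin and print, for each map, its target
# type, its length in 512-byte sectors and its size in MiB (rounded down).
dm_sizes() {
    awk '{
        name = $1
        sub(/:$/, "", name)    # strip the trailing colon from the map name
        sectors = $3           # field 2 is the start sector, field 3 the length
        printf "%s %s %d sectors %d MiB\n", name, $4, sectors, sectors * 512 / 1048576
    }'
}
```

Used as 'dmsetup table | dm_sizes', the 150gb line works out to 153600 MiB,
which matches the 161061273600 bytes fdisk reported earlier, and the first
partition starts at sector 63 and ends right before the second one at 64260
(63 + 64197).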

testhalde2 ~ # ls -lF /dev/mapper/
total 0
brw-------  1 root root 254,  0 Sep  6 15:04 150gb
brw-------  1 root root 254,  1 Sep  6 15:04 150gb1
brw-------  1 root root 254,  2 Sep  6 15:04 150gb2
crw-rw----  1 root root  10, 63 Sep  6  2005 control

testhalde2 ~ # ls -lF /dev/1*
ls: /dev/1*: No such file or directory

testhalde2 ~ # udevstart

testhalde2 ~ # ls -lF /dev/1*
lrwxrwxrwx  1 root root 4 Sep  6 15:10 /dev/150gb -> dm-0
lrwxrwxrwx  1 root root 4 Sep  6 15:11 /dev/150gb1 -> dm-1
lrwxrwxrwx  1 root root 4 Sep  6 15:11 /dev/150gb2 -> dm-2


===> Without "udevstart" udev doesn't create the /dev/150gb*
links! Is this a udev bug?



II.) --- Using multipathd ---



testhalde2 ~ # multipathd -v4
testhalde2 ~ # ps ax | fgrep multipathd
11024 pts/1    SL     0:00 multipathd -v4
11025 pts/1    SL     0:00 multipathd -v4
11026 pts/1    SL     0:00 multipathd -v4
11029 pts/1    SL     0:00 multipathd -v4
11030 pts/1    SL     0:00 multipathd -v4
11031 pts/1    SL     0:00 multipathd -v4
11032 pts/1    SL     0:00 multipathd -v4
11071 pts/1    S+     0:00 fgrep multipathd
testhalde2 ~ # cat /var/run/multipathd.pid
11024

Yeah! Version 0.4.5 creates a pid and a socket file :-)
It's important that I start "multipath /dev/sda" _before_
multipathd! If I change this order, multipathd does nothing.
/var/log/messages shows "tick", "map garbage collection"
etc. and nothing about /dev/sda or /dev/sdb. It seems that
multipathd doesn't read the device-mapper table at startup.
===> Is this behavior ok?


testhalde2 ~ # less /var/log/messages
...
/etc/dev.d/multipath.dev (10904): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-0, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (10904): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (10904): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (10904): Getting /block/dm-0 major and minor number
/etc/dev.d/multipath.dev (10904): /block/dm-0 major:minor = 254:0
/etc/dev.d/multipath.dev (10904): Getting /block/dm-0 alias
/etc/dev.d/multipath.dev (10904): /block/dm-0 alias = /dev/mapper/150gb
/etc/dev.d/multipath.dev (10904): /sbin/kpartx -v -a /dev/mapper/150gb
/etc/dev.d/multipath.dev (10935): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-1, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (10935): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (10935): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (10935): Getting /block/dm-1 major and minor number
/etc/dev.d/multipath.dev (10935): /block/dm-1 major:minor = 254:1
/etc/dev.d/multipath.dev (10935): Getting /block/dm-1 alias
/etc/dev.d/multipath.dev (10963): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-2, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (10963): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (10963): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (10963): Getting /block/dm-2 major and minor number
/etc/dev.d/multipath.dev (10963): /block/dm-2 major:minor = 254:2
/etc/dev.d/multipath.dev (10963): Getting /block/dm-2 alias
/etc/dev.d/multipath.dev (10935): /block/dm-1 alias = /dev/mapper/150gb1
/etc/dev.d/multipath.dev (10935): /sbin/kpartx -v -a /dev/mapper/150gb1
/etc/dev.d/multipath.dev (10963): /block/dm-2 alias = /dev/mapper/150gb2
/etc/dev.d/multipath.dev (10963): /sbin/kpartx -v -a /dev/mapper/150gb2
multipathd: --------start up--------
multipathd: read /etc/multipath.conf
multipathd: fd0 blacklisted
...
multipathd: hda blacklisted
multipathd: path sda not found in pathvec
multipathd: ===== path sda =====
multipathd: bus = 1
multipathd: dev_t = 8:0
multipathd: size = 314572800
multipathd: vendor = HP
multipathd: product = HSV100
multipathd: rev = 3025
multipathd: h:b:t:l = 0:0:0:1
multipathd: tgt_node_name = 0x50001fe150051d20
multipathd: getuid = /sbin/scsi_id -g -u -s /block/%n (controler setting)
multipathd: uid = 3600508b40010079d0001900000460000 (callout)
multipathd: path sdb not found in pathvec
multipathd: ===== path sdb =====
multipathd: bus = 1
multipathd: dev_t = 8:16
multipathd: size = 314572800
multipathd: vendor = HP
multipathd: product = HSV100
multipathd: rev = 3025
multipathd: h:b:t:l = 1:0:0:1
multipathd: tgt_node_name = 0x50001fe150051d20
multipathd: getuid = /sbin/scsi_id -g -u -s /block/%n (controler setting)
multipathd: uid = 3600508b40010079d0001900000460000 (callout)
multipathd: dm-0 blacklisted
multipathd: dm-1 blacklisted
multipathd: dm-2 blacklisted
multipathd: discovered map 150gb
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: 8:0 ownership set
multipathd: 8:16 ownership set
multipathd: pgfailback = -2 (LUN setting)
multipathd: 150gb: event checker started
multipathd: path checkers start up
multipathd: tick
multipathd: ===== path sda =====
multipathd: bus = 1
multipathd: dev_t = 8:0
multipathd: size = 314572800
multipathd: vendor = HP
multipathd: product = HSV100
multipathd: rev = 3025
multipathd: h:b:t:l = 0:0:0:1
multipathd: tgt_node_name = 0x50001fe150051d20
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path checker = tur (controler setting)
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: reinstated
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: ===== path sda =====
multipathd: getprio = /bin/true (internal default)
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: ===== path sdb =====
multipathd: getprio = /bin/true (internal default)
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: ===== path sdb =====
multipathd: bus = 1
multipathd: dev_t = 8:16
multipathd: size = 314572800
multipathd: vendor = HP
multipathd: product = HSV100
multipathd: rev = 3025
multipathd: h:b:t:l = 1:0:0:1
multipathd: tgt_node_name = 0x50001fe150051d20
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path checker = tur (controler setting)
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: reinstated
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 4 times
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: delay next check 20s
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: delay next check 20s
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: tick
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
multipathd: tick
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: tick
...



*** Disabling san-port from HBA-1... ***


testhalde2 ~ # multipath -ll
[ sleeping 35 seconds ]
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:0:1 sda 8:0  [active][faulty]
  \_ 1:0:0:1 sdb 8:16 [active][ready]

testhalde2 ~ # multipath -ll
[ sleeping 10 seconds ]
failed to open /dev/sda
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:0:1 sda 8:0  [failed][faulty]
  \_ 1:0:0:1 sdb 8:16 [active][ready]

testhalde2 ~ # multipath -ll
[ sleeping 10 seconds ]
failed to open /dev/sda
150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:0:1 sda 8:0  [active][ready]
  \_ 1:0:0:1 sdb 8:16 [active][ready]


testhalde2 ~ # ls /sys/block/
dm-0  fd0    loop1  loop4  loop7  ram10  ram13  ram2  ram5  ram8
dm-1  hda    loop2  loop5  ram0   ram11  ram14  ram3  ram6  ram9
dm-2  loop0  loop3  loop6  ram1   ram12  ram15  ram4  ram7  sdb


testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000
150gb2: 0 314504505 linear 254:0 64260


testhalde2 ~ # less /var/log/messages
...
kernel: qla2300 0000:03:01.0: LOOP DOWN detected.
multipathd: tick
...
kernel:  rport-0:0-3: blocked FC remote port time out: removing target
multipathd: 8:0: tur checker reports path is down
multipathd: checker failed path 8:0 in map 150gb
kernel: device-mapper: dm-multipath: Failing path 8:0.
multipathd: 150gb: devmap event #2
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: discovered map 150gb
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = F, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: 8:0 ownership set
multipathd: 8:16 ownership set
multipathd: pgfailback = -2 (LUN setting)
/etc/dev.d/multipath.dev (11579): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda/sda1, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11579): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11579): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11579): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (11571): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda/sda2, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11571): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11571): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11571): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (11604): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11604): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11604): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11604): Exiting: $ACTION != "add"
multipathd: tick
multipathd: map garbage collection
multipathd: tick
multipathd: tick
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
last message repeated 3 times
multipathd: map garbage collection
multipathd: tick
last message repeated 3 times
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: reinstated
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: 150gb: devmap event #3
multipathd: discovered map 150gb
multipathd: *word = 0, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = round-robin, len = 11
multipathd: *word = 0, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = 8:0, len = 3
multipathd: *word = 8:16, len = 4
multipathd: *word = 1, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 2, len = 1
multipathd: *word = 0, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 1, len = 1
multipathd: *word = A, len = 1
multipathd: *word = 0, len = 1
multipathd: 8:0 ownership set
multipathd: 8:16 ownership set
multipathd: pgfailback = -2 (LUN setting)
kernel: scsi0 (0:1): rejecting I/O to dead device
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
multipathd: tick
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 2 times
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: delay next check 20s
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
kernel: scsi0 (0:1): rejecting I/O to dead device
multipathd: tick
last message repeated 2 times
multipathd: map garbage collection
multipathd: tick
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
last message repeated 4 times
multipathd: map garbage collection
multipathd: tick
last message repeated 5 times
multipathd: map garbage collection
multipathd: tick
last message repeated 3 times
multipathd: 8:16: tur checker reports path is up
multipathd: 8:16: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sdb =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: tick
multipathd: tick
multipathd: 8:0: tur checker reports path is up
multipathd: 8:0: delay next check 40s
multipathd: path prio refresh
multipathd: ===== path sda =====
multipathd: prio = 0
multipathd: uid = 3600508b40010079d0001900000460000 (cache)
multipathd: map garbage collection
kernel: scsi0 (0:1): rejecting I/O to dead device
multipathd: tick
...


===> First multipathd says "8:0: tur checker reports
path is down" and multipath prints sda "failed" (ok).
After a few seconds sda is "ready" and multipathd says
"8:0: tur checker reports path is up"?! I have changed
nothing during this time.
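
One way to follow this flip-flop over time is to reduce the path lines of
"multipath -ll" to one short record per path. This is only a sketch: it
assumes the output layout shown above (one "\_ h:b:t:l name major:minor
[dm state][checker state]" line per path), and the function name path_states
is made up for illustration.

```shell
#!/bin/sh
# Sketch: read "multipath -ll" output on stdin and print
# "<name> <major:minor> <dm state> <checker state>" for each path line.
# Assumes the 0.4.x layout shown in this post; path_states is a
# hypothetical helper name.
path_states() {
    awk '
        # Path lines start with "\_" followed by an h:b:t:l tuple.
        $1 == "\\_" && $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ {
            state = $5
            gsub(/\]\[/, " ", state)   # "[failed][faulty]" -> "[failed faulty]"
            gsub(/\[/, "", state)      # drop the opening bracket
            gsub(/\]/, "", state)      # drop the closing bracket
            print $3, $4, state
        }
    '
}
```

Usage would be something like "multipath -ll | path_states", run repeatedly
while the switch port is disabled. Lines without an h:b:t:l tuple (such as
the "#:#:#:#" placeholder) are skipped.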




*** Enabling san-switch port from HBA-1 ***


testhalde2 ~ # multipath -ll
testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb2: 0 314504505 linear 254:0 64260


testhalde2 ~ # less /var/log/messages
...
multipathd: tick
kernel: qla2300 0000:03:01.0: LIP reset occured (f7f7).
kernel: qla2300 0000:03:01.0: LOOP UP detected (2 Gbps).
...
kernel: SCSI device sdc: drive cache: write through
kernel:  sdc: sdc1 sdc2
kernel: Attached scsi disk sdc at scsi0, channel 0, id 0, lun 1
kernel: Attached scsi generic sg1 at scsi0, channel 0, id 0, lun 1,  type 0
scsi.agent[11856]: disk at /devices/pci0000:03/0000:03:01.0/host0/rport-0:0-3/target0:0:0/0:0:0:1
/etc/dev.d/multipath.dev (11909): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11909): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11909): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11909): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11909): multipath -v0 /dev/sdc
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
multipathd: tick
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
last message repeated 2 times
multipathd: map garbage collection
kernel: device-mapper: dm-multipath: error getting device
kernel: device-mapper: error adding target to table
/etc/dev.d/multipath.dev (11944): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc2, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11944): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11944): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11959): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc1, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11959): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11959): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11959): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11959): multipath -v0 /dev/sdc1
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
logger: /etc/dev.d/multipath.dev (11944): Checking/Creating multipath device-mapper table with multipath-tool
logger: /etc/dev.d/multipath.dev (11944): multipath -v0 /dev/sdc2
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
last message repeated 2 times
multipathd: Got request [dump pathvec]
multipathd: *word = dump, len = 4
multipathd: *word = pathvec, len = 7
multipathd: tick
multipathd: tick
kernel: device-mapper: dm-multipath: error getting device
kernel: device-mapper: error adding target to table
kernel: device-mapper: device doesn't appear to be in the dev hash table.
multipathd: tick
multipathd: map garbage collection
multipathd: 150gb: remove dead map
multipathd: 150gb: reap event checker
multipathd: 8:0 is orphaned
multipathd: 8:16 is orphaned
multipathd: SIGHUP received
multipathd: tick
multipathd: tick
/etc/dev.d/multipath.dev (12002): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-3, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (12002): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (12002): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (12002): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (12018): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-4, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (12018): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (12018): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (12018): Exiting: $ACTION != "add"
multipathd: tick
...


===> An error occurs while device-mapper tries to update
the dm-table and deletes the "150gb" entry.
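
Whether the multipath map is still loaded can be checked from "dmsetup table"
output. A minimal sketch, assuming the "name: start length target ..." line
layout shown above; has_multipath_map is a hypothetical helper name, not part
of multipath-tools.

```shell
#!/bin/sh
# Sketch: read "dmsetup table" output on stdin and succeed (exit 0) only
# if a map with the given name and target type "multipath" is listed.
# has_multipath_map is an illustrative name.
has_multipath_map() {
    awk -v name="$1" '
        # Field 1 is "<name>:", field 4 is the target type.
        $1 == (name ":") && $4 == "multipath" { found = 1 }
        END { exit !found }
    '
}
```

For example, "dmsetup table | has_multipath_map 150gb || echo map is gone"
would report the situation shown above, where only the two linear partition
maps are left.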



III.) --- Using multipath-tools without multipathd ---



Now, I try the same _without_ starting multipathd...


testhalde2 ~ # ps ax | fgrep multipathd
testhalde2 ~ #

testhalde2 ~ # multipath /dev/sda
create: 150gb (3600508b40010079d0001900000460000)
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0
  \_ 0:0:0:1 sda 8:0  [ready]
  \_ 1:0:0:1 sdb 8:16 [ready]

testhalde2 ~ # multipath -ll
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:0:1 sda 8:0  [active][ready]
  \_ 1:0:0:1 sdb 8:16 [active][ready]

testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000
150gb2: 0 314504505 linear 254:0 64260

testhalde2 ~ # ls /dev/mapper/
150gb  150gb1  150gb2  control


*** Disabling san-switch port from HBA-1 ***


testhalde2 ~ # multipath -ll
[ sleeping 35 seconds ]
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:0:1 sda 8:0  [active][faulty]
  \_ 1:0:0:1 sdb 8:16 [active][ready]

testhalde2 ~ # multipath -ll
[ sleeping 10 seconds ]
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ #:#:#:#      8:0 [active]
  \_ 1:0:0:1 sdb 8:16 [active][ready]

testhalde2 ~ # ls /sys/block/
dm-0  fd0    loop1  loop4  loop7  ram10  ram13  ram2  ram5  ram8
dm-1  hda    loop2  loop5  ram0   ram11  ram14  ram3  ram6  ram9
dm-2  loop0  loop3  loop6  ram1   ram12  ram15  ram4  ram7  sdb

testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:16 1000
150gb2: 0 314504505 linear 254:0 64260


testhalde2 ~ # less /var/log/messages
...
kernel: qla2300 0000:03:01.0: LOOP DOWN detected.
kernel:  rport-0:0-3: blocked FC remote port time out: removing target
/etc/dev.d/multipath.dev (11186): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda/sda1, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11186): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11186): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11186): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (11200): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda/sda2, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11200): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11200): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11200): Exiting: $ACTION != "add"
/etc/dev.d/multipath.dev (11217): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sda, $ACTION=remove, $@=block
/etc/dev.d/multipath.dev (11217): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11217): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11217): Exiting: $ACTION != "add"
...



*** Enabling san-switch port from HBA-1 ***


testhalde2 ~ # multipath -ll
[ sleeping 5 seconds ]
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ #:#:#:#      8:0  [active]
  \_ 1:0:0:1 sdb 8:16 [active][ready]

testhalde2 ~ # multipath -ll
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
  \_ 1:0:0:1 sdb 8:16 [active][ready]
  \_ 0:0:0:1 sdc 8:32 [active][ready]

testhalde2 ~ # dmsetup table
150gb1: 0 64197 linear 254:0 63
150gb: 0 314572800 multipath 0 0 1 1 round-robin 0 2 1 8:16 1000 8:32 1000
150gb2: 0 314504505 linear 254:0 64260

testhalde2 ~ # ls /dev/mapper/
150gb  150gb1  150gb2  control


testhalde2 ~ # less /var/log/messages
...
kernel:  sdc: sdc1 sdc2
kernel: Attached scsi disk sdc at scsi0, channel 0, id 0, lun 1
kernel: Attached scsi generic sg1 at scsi0, channel 0, id 0, lun 1,  type 0
scsi.agent[11271]: disk at /devices/pci0000:03/0000:03:01.0/host0/rport-0:0-3/target0:0:0/0:0:0:1
/etc/dev.d/multipath.dev (11325): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11325): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11325): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11325): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11325): multipath -v0 /dev/sdc
/etc/dev.d/multipath.dev (11356): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc2, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11356): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11356): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11356): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11356): multipath -v0 /dev/sdc2
/etc/dev.d/multipath.dev (11377): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc1, $ACTION=add, $@=block
/etc/dev.d/multipath.dev (11377): Logging to local file is enabled: log file is /root/multipath.dev.log
/etc/dev.d/multipath.dev (11377): Logging to syslog ist enabled: facility.priority is daemon.info
/etc/dev.d/multipath.dev (11377): Checking/Creating multipath device-mapper table with multipath-tool
/etc/dev.d/multipath.dev (11377): multipath -v0 /dev/sdc1
...


IV.) --- Summary ---


===> Multipathing seems to work without multipathd but not with it.
It's very slow, but Christophe Varoqui wrote that I have to lower
the HBA timeouts (unfortunately, I don't know how to do this,
see above). Do I really need multipathd? I suppose so :-)



 > And consider updating the wiki FAQ with the response you found to be
 > enlightening :/

As soon as I have multipath running, I will write step-by-step
documentation.


Thanks again for your help,
Simon



#--- My /etc/dev.d/block/multipath.dev - Begin ---

#!/bin/bash
# (bash is required: the script uses ${DEVPATH:7:3} and $(<file))

# log to local file? (0 or 1)
# (print log file with "sort $logFile [| less]")
logToFile=1

# log to syslog? (0 or 1)
logToSyslog=1

# path to log file
# (only used if $logToFile==1)
logFile="/root/multipath.dev.log"

# syslog facility.priority (man syslog.conf)
# (only used if $logToSyslog==1)
syslogFacPrio="daemon.info"

# timeout for getting ${DEVPATH} alias in seconds
timeout=10

# be verbose in log file and/or syslog?
# (only used if $logToFile==1 and/or $logToSyslog==1)
verbose=1

# Don't touch
pid=$$
logCount=1;

log()
{
   if [ ${logToFile} -eq 1 ]
   then
     msg="PID ${pid} - $(date +%Y%m%d-%H%M%S) -"
     msg="${msg} Log entry ${logCount}: ${1}"
     echo "${msg}" >> $logFile
     logCount=$(($logCount + 1))
   fi

   if [ ${logToSyslog} -eq 1 ]
   then
     logger -p ${syslogFacPrio} "/etc/dev.d/multipath.dev (${pid}): ${1}"
   fi
}

die()
{
   log "DIED WITH ERROR: ${1}"
   exit 1
}

exe()
{
   log "${1}"
   if ! ${1}
   then
     die "\"${1}\" failed (device blacklisted?)"
   fi
}

end()
{
   if [ -n "${1}" ]
   then
     log "${1}"
   fi
   exit 0
}

if [ ${verbose} -eq 1 ]
then
   msg="Parameters: \$0=${0}, \$DEVPATH=${DEVPATH},"
   msg="${msg} \$ACTION=${ACTION}, \$@=${@}"
   log "${msg}"

   if [ ${logToFile} -eq 1 ]
   then
     log "Logging to local file is enabled: log file is ${logFile}"
   fi

   if [ ${logToSyslog} -eq 1 ]
   then
     msg="Logging to syslog is enabled:"
     msg="${msg} facility.priority is ${syslogFacPrio}"
     log "${msg}"
   fi
fi

if [ ! "${ACTION}" = add ]
then
   if [ ${verbose} -eq 1 ]
   then
     log "Exiting: \$ACTION != \"add\""
   fi
   end
fi

if [ "${DEVPATH:7:3}" = "dm-" ]
then
   log "Getting ${DEVPATH} major and minor number"
   devMajorMinor=$(</sys${DEVPATH}/dev)
   if [ -z "${devMajorMinor}" ]
   then
     die "Getting ${DEVPATH} major and minor number failed"
   else
     log "${DEVPATH} major:minor = ${devMajorMinor}"
   fi

   log "Getting ${DEVPATH} alias"
   count=0
   devAlias="none"
   while [ ! -b ${devAlias} ] && [ ${count} -le ${timeout} ]
   do
     devAlias="/dev/mapper/$(devmap_name ${devMajorMinor})"
     if [ ${count} -ne 0 ]
     then
       sleep 1
     fi
     count=$(($count + 1))
   done

   if [ ${count} -gt ${timeout} ]
   then
     msg="Getting ${DEVPATH} alias failed (Found ${devAlias}, but"
     msg="${msg} this isn't a block device)"
     die "${msg}"
   else
     log "${DEVPATH} alias = ${devAlias}"
   fi

   exe "/sbin/kpartx -v -a ${devAlias}"

else
   log "Checking/Creating multipath device-mapper table with multipath-tool"
   exe "multipath -v0 ${DEVNAME}"
fi

#--- /etc/dev.d/block/multipath.dev - End ---

-- 

Simon
gistolero@gmx.de


* Re: Re: Problems with multipathd
  2005-09-06 16:46   ` gistolero
@ 2005-09-06 20:47     ` christophe varoqui
  2005-09-12 15:52       ` gistolero
  0 siblings, 1 reply; 5+ messages in thread
From: christophe varoqui @ 2005-09-06 20:47 UTC (permalink / raw)
  To: device-mapper development

On mar, 2005-09-06 at 18:46 +0200, gistolero@gmx.de wrote:
> Hi,

>  > Lower the timeouts in your Qlogic driver.
> 
> ===> I found some settings in /sys/module/qla2xxx/parameters/...,
> but most of them are read-only values. I have changed ql2xretrycount
> and ql2xsuspendcount but without success. Any suggestions for
> this driver?
> 
Here are the interesting ones, I guess.

[root@s64p17bibro ~]# find /sys/class/ -name "*tmo*"
/sys/class/fc_remote_ports/rport-1:0-3/dev_loss_tmo
/sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo
/sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo
/sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo
/sys/class/scsi_host/host1/lpfc_nodev_tmo

> 
> I.) --- udev and udevstart ---
> 
> 
>  > Default udev.rules file has a directive to ignore dm-*
>  > Something like :
>  > KERNEL=="dm-[0-9]*", OPTIONS+="ignore_device"
>  >
>  > /etc/udev/rules.d/20-multipath.rules is useless unless you comment
>  > out this rule.
> 
> I have commented out this line, but udev still has difficulty creating these
> links. Therefore I have changed /etc/dev.d/block/multipath.dev (the script
> is attached at the end of this post) and added debug messages. The most
> important modification is that kpartx uses the block-device-files in
> /dev/mapper/... instead of /dev/...
> ===> Why isn't that the default? Are there any disadvantages?
> 
Not really. All distributors seem to have their own ideas about naming
policies. You should ask about, and follow the Gentoo philosophy I
guess.

> 
> ===> Without "udevstart" udev doesn't create the /dev/150gb*
> links! Is this a udev bug?
> 
You can still identify the udev problems keeping the node creation
in /dev/. Maybe all path setup is done in the initrd/initramfs without
multipath being able to react.


> Yeah! Version 0.4.5 creates a pid and a socket file :-)
> It's important that I start "multipath /dev/sda" _before_
> multipathd! If I change this order, multipathd does nothing.
> /var/log/messages shows "tick", "map garbage collection"
> etc. and nothing about /dev/sda or /dev/sdb. It seems that
> multipathd doesn't read the device-mapper table at startup.
> ===> Is this behavior ok?
> 
No, and I don't reproduce this behaviour here.

<no map loaded>

[root@s64p17bibro multipath-tools-0.4.5]# multipathd -d
path checkers start up

<run /sbin/multipath>

device-mapper ioctl cmd 12 failed: No such device or address
ema4_lun1: event checker started
8:16: hp_sw checker reports path is up
8:16: reinstated
8:48: hp_sw checker reports path is ghost
8:48: reinstated

> ...
> 
> 
> ===> First multipathd says "8:0: tur checker reports
> path is down" and multipath prints sda "failed" (ok).
> After a few seconds sda is "ready" and multipathd says
> "8:0: tur checker reports path is up"?! I have changed
> nothing during this time.
> 
Maybe the checker is confused by the long timeouts.
Worth another try after the lowering.


> *** Enabling san-switch port from HBA-1 ***
> 
> 
> testhalde2 ~ # multipath -ll
> testhalde2 ~ # dmsetup table
> 150gb1: 0 64197 linear 254:0 63
> 150gb2: 0 314504505 linear 254:0 64260
> 
> 
> testhalde2 ~ # less /var/log/messages
> ...
> multipathd: tick
> kernel: qla2300 0000:03:01.0: LIP reset occured (f7f7).
> kernel: qla2300 0000:03:01.0: LOOP UP detected (2 Gbps).
> ...
> kernel: SCSI device sdc: drive cache: write through
> kernel:  sdc: sdc1 sdc2
> kernel: Attached scsi disk sdc at scsi0, channel 0, id 0, lun 1
> kernel: Attached scsi generic sg1 at scsi0, channel 0, id 0, lun 1,  type 0
> scsi.agent[11856]: disk at /devices/pci0000:03/0000:03:01.0/host0/rport-0:0-3/target0:0:0/0:0:0:1
> /etc/dev.d/multipath.dev (11909): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc, $ACTION=add, $@=block
> /etc/dev.d/multipath.dev (11909): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (11909): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (11909): Checking/Creating multipath device-mapper table with multipath-tool
> /etc/dev.d/multipath.dev (11909): multipath -v0 /dev/sdc
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> multipathd: tick
> multipathd: tick
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> multipathd: tick
> last message repeated 2 times
> multipathd: map garbage collection
> kernel: device-mapper: dm-multipath: error getting device
> kernel: device-mapper: error adding target to table
> /etc/dev.d/multipath.dev (11944): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc2, $ACTION=add, $@=block
> /etc/dev.d/multipath.dev (11944): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (11944): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (11959): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/sdc/sdc1, $ACTION=add, $@=block
> /etc/dev.d/multipath.dev (11959): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (11959): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (11959): Checking/Creating multipath device-mapper table with multipath-tool
> /etc/dev.d/multipath.dev (11959): multipath -v0 /dev/sdc1
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> logger: /etc/dev.d/multipath.dev (11944): Checking/Creating multipath device-mapper table with multipath-tool
> logger: /etc/dev.d/multipath.dev (11944): multipath -v0 /dev/sdc2
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> multipathd: tick
> last message repeated 2 times
> multipathd: Got request [dump pathvec]
> multipathd: *word = dump, len = 4
> multipathd: *word = pathvec, len = 7
> multipathd: tick
> multipathd: tick
> kernel: device-mapper: dm-multipath: error getting device
> kernel: device-mapper: error adding target to table
> kernel: device-mapper: device doesn't appear to be in the dev hash table.
> multipathd: tick
> multipathd: map garbage collection
> multipathd: 150gb: remove dead map
> multipathd: 150gb: reap event checker
> multipathd: 8:0 is orphaned
> multipathd: 8:16 is orphaned
> multipathd: SIGHUP received
> multipathd: tick
> multipathd: tick
> /etc/dev.d/multipath.dev (12002): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-3, $ACTION=remove, $@=block
> /etc/dev.d/multipath.dev (12002): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (12002): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (12002): Exiting: $ACTION != "add"
> /etc/dev.d/multipath.dev (12018): Parameters: $0=/etc/dev.d/block/multipath.dev, $DEVPATH=/block/dm-4, $ACTION=remove, $@=block
> /etc/dev.d/multipath.dev (12018): Logging to local file is enabled: log file is /root/multipath.dev.log
> /etc/dev.d/multipath.dev (12018): Logging to syslog ist enabled: facility.priority is daemon.info
> /etc/dev.d/multipath.dev (12018): Exiting: $ACTION != "add"
> multipathd: tick
> ...
> 
> 
> ===> An error occurs while device-mapper tries to update
> the dm-table and deletes the "150gb" entry.
> 
Seems to point to a kernel issue.
FYI, /bin/multipath alone tries to update the map.

> 
> 
> III.) --- Using multipath-tools without multipathd ---
> 
> Now, I try the same _without_ starting multipathd...
> 
> ===> Multipathing seems to work without but not with multipathd.
> It's very slow, but Christophe Varoqui wrote that I have to lower
> the HBA timeouts (unfortunately, I don't know how to do this,
> see above). Do I really need multipathd? I suppose so :-)
> 
multipathd is needed to reinstate paths.
In your case the rport disappears and reappears so the mechanism is all
hotplug-driven and thus may work without the daemon ... if memory
resources permit hotplug and multipath(8) execution, that is.
> 

Regards,
-- 
christophe varoqui <christophe.varoqui@free.fr>


* Re: Problems with multipathd
  2005-09-06 20:47     ` christophe varoqui
@ 2005-09-12 15:52       ` gistolero
  0 siblings, 0 replies; 5+ messages in thread
From: gistolero @ 2005-09-12 15:52 UTC (permalink / raw)
  To: device-mapper development

>>===> I found some settings in /sys/module/qla2xxx/parameters/...,
>>but most of them are read-only values. I have changed ql2xretrycount
>>and ql2xsuspendcount but without success. Any suggestions for
>>this driver?
>>
> 
> Here are the interesting ones, I guess.
> 
> [root@s64p17bibro ~]# find /sys/class/ -name "*tmo*"
> /sys/class/fc_remote_ports/rport-1:0-3/dev_loss_tmo
> /sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo
> /sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo
> /sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo
> /sys/class/scsi_host/host1/lpfc_nodev_tmo

OK, I have a 6-second timeout now :-)
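
A minimal sketch of how such a timeout might be lowered, assuming the
dev_loss_tmo attributes listed above are writable on the running kernel and
driver. The function name set_dev_loss_tmo and its optional sysfs root
argument are illustrative only; they are not part of multipath-tools or the
qla2xxx driver.

```shell
#!/bin/sh
# Sketch: write one timeout value to every FC remote port's dev_loss_tmo.
# $1 = timeout in seconds
# $2 = sysfs root (hypothetical parameter, defaults to /sys; useful for dry runs)
# Prints the number of attributes that were updated.
set_dev_loss_tmo() {
    tmo="$1"
    root="${2:-/sys}"
    changed=0
    for f in "$root"/class/fc_remote_ports/rport-*/dev_loss_tmo; do
        # Skip the unexpanded glob when no remote ports exist.
        [ -e "$f" ] || continue
        if echo "$tmo" > "$f"; then
            changed=$((changed + 1))
        else
            echo "could not write $f" >&2
        fi
    done
    echo "$changed"
}
```

Run as root, "set_dev_loss_tmo 6" would apply the 6-second value mentioned
above to every rport. The setting does not survive a reboot, and possibly not
an rport re-creation, so it may have to be reapplied from an init or hotplug
script.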


>>I have commented out this line, but udev still has difficulty creating these
>>links. Therefore I have changed /etc/dev.d/block/multipath.dev (the script
>>is attached at the end of this post) and added debug messages. The most
>>important modification is that kpartx uses the block device files in
>>/dev/mapper/... instead of /dev/...
>>===> Why isn't that the default? Are there any disadvantages?
>>
> 
> Not really. All distributors seem to have their own ideas about naming
> policies. You should ask about, and follow the Gentoo philosophy I
> guess.


I'm sure I'm not the only one who has problems with missing /dev/...
links. multipath can install a device-mapper table without errors while
kpartx still fails because udev hasn't created the links in /dev/... So I
think multipath.dev should run kpartx on /dev/mapper/... instead of
/dev/... by default.
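
The proposal above could be sketched roughly as follows. This is only an
illustration, not the attached multipath.dev script: the helper name
kpartx_node_for and its dev_root parameter are made up for the example, and
the fallback to /dev/<name> is an assumption about what a distribution-neutral
default might look like.

```shell
#!/bin/sh
# Sketch: choose the node that is handed to kpartx for a given map name.
# Prefers /dev/mapper/<name> (created by device-mapper itself) and only
# falls back to /dev/<name> (which depends on the udev rules).
# $1 = map name, $2 = device root (hypothetical parameter, defaults to /dev)
kpartx_node_for() {
    name="$1"
    dev_root="${2:-/dev}"
    if [ -e "$dev_root/mapper/$name" ]; then
        echo "$dev_root/mapper/$name"
    elif [ -e "$dev_root/$name" ]; then
        echo "$dev_root/$name"
    else
        # Neither node exists yet; let the caller decide what to do.
        return 1
    fi
}

# Usage inside a hotplug handler (not executed here):
#   node=$(kpartx_node_for 150gb) && kpartx -a "$node"
```

With this order kpartx no longer depends on udev having created the /dev/...
link first, which is the failure described above.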


>>===> Without "udevstart" udev doesn't create the /dev/150gb*
>>links! Is this a udev bug?
>>
> You can still identify the udev problems keeping the node creation
> in /dev/. Maybe all path setup is done in the initrd/initramfs without
> multipath being able to react.

multipath is able to react. I don't understand why I have to execute udevstart.



>>===> First multipathd says "8:0: tur checker reports
>>path is down" and multipath prints sda "failed" (ok).
>>After a few seconds sda is "ready" and multipathd says
>>"8:0: tur checker reports path is up"?! I have changed
>>nothing during this time.
>>
> 
> Maybe the checker is confused by the long timeouts.
> Worth another try after the lowering.

After lowering the timeouts to 6 seconds multipathd shows the same behavior.



>>===> Multipathing seems to work without multipathd but not with it.
>>It's very slow, but Christophe Varoqui wrote that I have to lower
>>the HBA timeouts (unfortunately, I don't know how to do this,
>>see above). Do I really need multipathd? I suppose so :-)
>>
> 
> multipathd is needed to reinstate paths.
> In your case the rport disappears and reappears, so the mechanism is all
> hotplug-driven and thus may work without the daemon ... if memory
> resources permit hotplug and multipath(8) execution, that is.

What do you mean by "In your case..."? Because kernel 2.6 and udev are
multipath-tools dependencies, all systems running multipath have the same
environment: they all use kernel 2.6 and udev, which is hotplug-driven. The
kernel starts the hotplug process and udev executes multipath. Sorry, but I
have to ask again: Do we really need multipathd?


After lowering the dev_loss_tmo timeouts and stopping multipathd I have a
working multipath environment :-))) I tested this with a small Perl script
and a MySQL database:



My trafficmaker host executed this script 27 times in parallel:

...
for(my $count=1;$count<=1000000;$count++)
{
   ...
   my $sql="INSERT INTO $table VALUES($id,\"$value\")";
   my $return=$dbh->do($sql);
   ...
}
...
{
   my $sql="SELECT COUNT(*) FROM $table WHERE id=$id";
   my $sth=$dbh->prepare($sql);
   my $return=$sth->execute();
   ...
   $selectCount=$sth->fetchrow_array();
   ...;
}


The database host had to insert these 30-byte strings, and I started some
copy jobs (cp -a /usr/* /partition_mounted_with_multipath/ etc.) to increase
the I/O load. During this test I disabled and enabled the different
HBA switch ports, with the following result: it took 6 to 15 seconds before
"multipath -l" showed that a path was down (15 seconds because the host had a
CPU load of 30.0 and responded very slowly), but no INSERT got lost :-)))

But sometimes multipath seems to be a bit confused...



1.) one path disabled

In the majority of cases multipath prints...

testhalde2 sbin # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ #:#:#:#     8:0  [active]
  \_ 1:0:0:1 sdb 8:16 [active]


But sometimes I get...

testhalde2 usr # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 4:0:0:1 sdb 8:16 [active]



2.) all paths enabled (default)

In the majority of cases multipath prints...

testhalde2 sbin # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
  \_ 1:0:0:1 sdb 8:16 [active]
  \_ 0:0:0:1 sdc 8:32 [active]


But sometimes I get...

testhalde2 usr # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:0:1 sdb 8:16 [active]
\_ round-robin 0 [enabled]
  \_ 4:0:0:1 sdc 8:32 [active]
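
To catch the moments when the output differs, the listings above can be
reduced to two numbers and logged over time. The following is a rough sketch
that assumes the "multipath -l" layout shown in this mail (path group lines
starting with "\_" in column one, path lines ending in "major:minor [state]");
the function name summarize_multipath is made up for the example, and other
multipath-tools versions may print a different layout.

```shell
#!/bin/sh
# Sketch: summarize "multipath -l" output read from stdin.
# Counts path groups (lines starting with "\_ " in column one) and
# paths (lines with a major:minor pair directly before a [state] tag).
summarize_multipath() {
    awk '
        /^\\_ / { groups++ }
        /[0-9]+:[0-9]+ +\[[a-z]+\]/ { paths++ }
        END { printf "groups=%d paths=%d\n", groups, paths }
    '
}

# Usage (not executed here): log one timestamped summary per second.
#   while sleep 1; do
#       echo "$(date '+%H:%M:%S') $(multipath -l | summarize_multipath)"
#   done
```

For the first listing under 2.) this gives one group with two paths, for the
second one two groups with two paths, so a change between the two states shows
up as a changed line in the log.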


Regards
Simon


Thread overview: 5+ messages
2005-08-31 15:29 Problems with multipathd Simon
2005-08-31 19:56 ` christophe varoqui
2005-09-06 16:46   ` gistolero
2005-09-06 20:47     ` christophe varoqui
2005-09-12 15:52       ` gistolero
