* HSG80, DM, multipath issues
From: D. North @ 2006-04-10 17:42 UTC
To: dm-devel
Hi.
Note: this starts as a multipath issue, but I traced it into DM also, and
ultimately, this is a behavior problem with the HSG80's, I think.
I am unable to get SuSE 10 and my HSG80's successfully working with multipath.
I believe this is because the HSG80's are not reporting geometry information
on the standby path to each lun, but seeing as I'm pretty stumped, I'm hoping
someone's got a better explanation or even a workaround.
I'm stone-cold new to the fibrechannel stuff, so it's easily possible I've set
up my configuration incorrectly. The multipath and dm stuff is new to me as
well, so I could also have made some mistakes there.
Starting with configuration info:
SuSE 10 intel 32 bit, kernel 2.6.13-15.8, Multipath-tools 0.4.4-4 (from
SuSE .... multipath itself claims to be version 0.4.5!!!!)
1 QLA2200F single-attach to an EMC DS-16B switch (firmware flashed
to bios 1.83)
DS-16B is attached to the storage array on ports 1 & 2 on EACH HSG80.
A single LUN, D4 is defined, and is online to controller 2 and is the
lun I'm working with .... currently booting off this lun as /dev/sdb
/dev/sda is "sort of" there, but gives errors to almost anything that
tries to touch it
Connection paths are of type 'SUN' on the HSG80
HSG80 version V87F-7 configured MULTIBUS_FAILOVER, SCSI-3
Observed behavior is that the multipath tools do not accept the standby
path from the HSG, claiming a size mismatch. SCSI inquiries are evidently
OK on the standby path for ident & existence, but geometry requests fail, as
shown below after the output of the multipath command:
# multipath -v3 -d
: (blacklists omitted ... sd{x} is not blacklisted)
path sda not found in pathvec
===== path sda =====
device sda is on bus scsi
bus = 1
dev_t = 8:0
size = 2097152 <-----------WRONG - WHERE does *THIS* come from?
vendor = DEC
product = HSG80
rev = V87F
h:b:t:l = 0:0:0:4
tgt_node_name = 0x50001fe1000b0ad0
serial = ZG03401489
path checker = tur (controler setting)
state = 1
getprio = /bin/true (internal default)
prio = 0
getuid = /sbin/scsi_id -g -u -s /block/%n (internal default)
uid = 360001fe1000b0ad00009034011590003 (callout)
path sdb not found in pathvec
===== path sdb =====
device sdb is on bus scsi
bus = 1
dev_t = 8:16
size = 443027195 <--------------RIGHT
vendor = DEC
product = HSG80
rev = V87F
h:b:t:l = 0:0:2:4
tgt_node_name = 0x50001fe1000b0ad0
serial = ZG03401159
path checker = tur (controler setting)
state = 2
getprio = /bin/true (internal default)
prio = 0
getuid = /sbin/scsi_id -g -u -s /block/%n (internal default)
uid = 360001fe1000b0ad00009034011590003 (callout)
#
# all paths :
#
360001fe1000b0ad00009034011590003 0:0:0:4 sda 8:0 [faulty][HSG80 ]
360001fe1000b0ad00009034011590003 0:0:2:4 sdb 8:16 [ready ][HSG80 ]
path size mismatch : discard 360001fe1000b0ad00009034011590003
pgpolicy = failover (LUN setting)
selector = round-robin (LUN setting)
features = 0 (internal default)
hwhandler = 0 (internal default)
0 2097152 multipath 0 0 2 1 round-robin 1 1 8:0 1000 round-robin 1 1 8:16 1000
action preset to 1
action set to 1
# scsiinfo -g /dev/sdb
Data from Rigid Disk Drive Geometry Page
----------------------------------------
Number of cylinders 72391
Number of heads 24
Starting write precomp 72391
Starting reduced current 72391
Drive step rate 0
Landing Zone Cylinder 0
RPL 0
Rotational Offset 0
Rotational Rate 3600
# scsiinfo -g /dev/sda
Unable to read Rigid Disk Geometry Page 04h
#
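For scale, the two sizes in the output above can be converted by hand. This is a minimal sketch, assuming the sizes are counts of 512-byte sectors (the usual unit of /sys/block/*/size); the helper name is made up for illustration. Under that assumption the sda figure is a suspiciously round 2^21 sectors, i.e. exactly 1 GiB, while the sdb figure matches the size of the real LUN:

```sh
# 1 GiB = 2^30 bytes = 2097152 sectors of 512 bytes (assumed sector size).
SECTORS_PER_GIB=2097152

# Print the whole number of GiB in a sector count (hypothetical helper).
sectors_to_gib() {
    echo $(( $1 / SECTORS_PER_GIB ))
}

sectors_to_gib 2097152      # size reported for sda, prints 1
sectors_to_gib 443027195    # size reported for sdb, prints 211
```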
Diagnostics I tried:
1) I patched the multipath command to allow a faulty path on the same wwid to
"fudge" a copy of the size from a good path to the same wwid, in order to get past
the multipath tools so I could try & get the device mapper set up.
Results:
Instead of (excerpts from unpatched multipath -v3 -d):
0 2097152 multipath 0 0 2 1 round-robin 1 1 8:0 1000 round-robin 1 1 8:16 1000
path size mismatch : discard 360001fe1000b0ad00009034011590003
I can now get:
0 443027195 multipath 0 0 2 1 round-robin 1 1 8:0 1000 round-robin 1 1 8:16 1000
create: lun4 (360001fe1000b0ad00009034011590003)
[size=211 GB][features="0"][hwhandler="0"]
\_ round-robin [best]
\_ 0:0:0:4 sda 8:0 [faulty]
\_ round-robin
\_ 0:0:2:4 sdb 8:16 [ready ]
Reissuing without -d results in:
device-mapper ioctl cmd 9 failed: Invalid argument
2) I tried to manually create the device map using the parameters generated in step 1:
# dmsetup remove_all ##just in case
# echo 0 443027195 multipath 0 0 2 1 round-robin 1 1 8:0 1000 round-robin 1 1 8:16 1000 | dmsetup create lun4
device-mapper ioctl cmd 9 failed: Invalid argument
Command failed
#
This creates a /dev/mapper/lun4, marked active, but apparently non-working, since fdisk is unable to read
from /dev/mapper/lun4. I'd sure like to know what that ioctl cmd 9 error is.... /var/log/messages now contains:
Apr 10 11:23:46 orthus-san kernel: device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
Apr 10 11:23:51 orthus-san kernel: device-mapper: dm-multipath version 1.0.4 loaded
Apr 10 11:23:54 orthus-san kernel: device-mapper: dm-round-robin version 1.0.0 loaded
Apr 10 11:23:54 orthus-san kernel: device-mapper: Unknown error
Apr 10 11:23:54 orthus-san kernel: device-mapper: error adding target to table
3) Then I dug into the dm_multipath module to try & track down the ioctl cmd 9 error.
After adding debugging info into {kernel}/drivers/md/dm-mpath.c, I find that in parse_priority_group(),
after the line:
nr_params = 1 + nr_selector_args
I log THIS with my debugging code:
Apr 10 11:15:46 orthus-san kernel: nr_params is 1001, nr_selector_args = 1000, pg->nr_pgpaths is 8
Whoops! ... THAT's not what I expected .... seems the parameters I sent to dmsetup are not what the
dm module is expecting. Is this because MAYBE dmsetup treats its arguments differently than the direct
calls into libdevmapper that multipath uses?
In any case, THIS seems to pass parse-muster with dm_multipath in this kernel:
# echo 0 443027195 multipath 0 0 2 1 round-robin 1 1 1 0 8:0 round-robin 1 1 1 0 8:16 | dmsetup create lun4
BUT.... the result isn't any happier, just different:
Apr 10 11:32:31 orthus-san kernel: device-mapper: device 8:0 too small for target
Apr 10 11:32:31 orthus-san kernel: device-mapper: dm-multipath: error getting device
Apr 10 11:32:31 orthus-san kernel: device-mapper: error adding target to table
Note that 8:0 is the faulty path to the HSG's, and apparently has the same busted geometry information.... :(
Anyway ... I figure I'm either missing something big, or this is going to be a LOT harder to get working
than I care to mess with.
Questions I hope someone can help with are:
a. (the big one!) Is there something I'm doing wrong, or a workaround, or something that would
help me get this up & running
b. Is the dmsetup test I show a valid way to be investigating this issue??
c. Any ideas on what other things I could try?
Thanks!
--
David North, rold5@tditx.com
The nicest thing about smacking your head against the wall is.......The feeling you get when you stop - anon
* Re: HSG80, DM, multipath issues
From: Christophe Varoqui @ 2006-04-10 18:28 UTC
To: device-mapper development
>
> a. (the big one!) Is there something I'm doing wrong, or a workaround, or something that would
> help me get this up & running
>
You mostly got it right.
HP indeed is not interested in providing a firmware with LUN size
inquiry enabled for this array. So we are bound to ugly workarounds.
The one I used is "force a bounce per LUN just after driver loading",
which this snippet does :
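# cat read_ghost_capa
# Standby ("ghost") paths report this placeholder capacity, in sectors.
dummy_capa=2097152
# -x matches the whole line only, so e.g. 20971520 is not picked up.
for i in $(grep -lx "$dummy_capa" /sys/block/sd*/size | awk -F/ '{print $4}')
do
    # Start the LUN through this path, then let the kernel re-read its size.
    sg_start -s 1 /dev/$i
    sleep 1
    # Note the spaces: "echo 1>file" would redirect stdout and write no "1".
    echo 1 > /sys/block/$i/device/rescan
done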
The multipath tools are rightly refusing to deal with paths with unknown
size. This won't change.
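
The check behind "path size mismatch" can be pictured roughly as follows. This is only a sketch of the idea, not the multipath-tools code: the function name and the "wwid device size" input format are assumptions made for illustration, and the sample lines reuse the values from the -v3 output earlier in the thread.

```sh
# Read "wwid device size" lines on stdin and print every wwid whose
# paths do not all report the same size (hypothetical helper).
mismatched_wwids() {
    awk '
        !($1 in size)  { size[$1] = $3; next }   # first path seen for this wwid
        size[$1] != $3 { bad[$1] = 1 }           # a later path disagrees
        END            { for (w in bad) print w }
    '
}

printf '%s\n' \
    "360001fe1000b0ad00009034011590003 sda 2097152" \
    "360001fe1000b0ad00009034011590003 sdb 443027195" | mismatched_wwids
```

Once the standby path reports the real capacity, both lines carry the same size and nothing is printed.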
That problem solved, you'll hit the next wall, which is the hardware
handler need for this array family.
Regards,
cvaroqui
* Re: HSG80, DM, multipath issues
From: D. North @ 2006-04-10 20:48 UTC
To: device-mapper development
Thus spake Christophe Varoqui (christophe.varoqui@free.fr):
> The one I used is "force a bounce per LUN just after driver loading",
> which this snippet does :
>
> sg_start -s 1 /dev/$i
Thanks! ... This got me going enough to get a /dev/mapper/lun4p1...p3 up & running!
This was done manually from a 'rescue mode' boot of SuSE 10.
After trying the required commands on a running system, it's clear now that the
device mapper setup has to be done in the initrd if I expect to boot from this
array. No problem there... that's just a comment.
What is troubling me more at the moment is the difference between the device
mapper table setup commands created by the multipath command and how they
are different from what dm-mpath.c is actually expecting.
I downloaded 0.4.7 multipath tools & verified that they produce the same
table setup issues that I'm having trouble with from the 0.4.5 tool that
is current with SuSE 10. For example, the SuSE version of multipath command
produces this:
0 443027195 multipath 0 0 2 1 round-robin 1 1 8:0 1000 round-robin 1 1 8:16 1000
Which produces the previously mentioned misparse in dm-mpath.c (my debugging mods)
Apr 10 15:30:35 orthus-san kernel: nr_params is 1001, nr_selector_args = 1000, pg->nr_pgpaths is 8
Multipath tools 0.4.7 also causes the same diagnostic message (from my
debugging modifications), and it is sending in the same number of arguments
(14) to dm-mpath.c. I don't, however, get the diagnostic message any more from
this version of multipath to determine what it is actually sending into
libdevmapper, so I am not, at the moment, absolutely certain that the argument
list is the same as 0.4.5.
To bypass the misparse, I can still use a command like this:
# echo 0 443027195 multipath 0 0 2 1 round-robin 1 1 1 0 8:0 round-robin 1 1 1 0 8:16 | dmsetup create lun4
I have successfully set up a device mapper mapping that I can read & write
to from a rescue system, so the above command does actually work.
Unfortunately, if I correctly understand what I'm seeing, this means that the
current multipath tools 0.4.7 is incompatible with the dm_multipath module in
my 2.6.13-15.8 kernel. Also, this would seem to mean that multipathd is also
incompatible with this kernel.
Is anyone aware of a workaround or even a better identification for this
incompatibility?
Thanks!
--
D. North
* Re: HSG80, DM, multipath issues
From: Christophe Varoqui @ 2006-04-10 21:16 UTC
To: device-mapper development
D. North wrote:
> Unfortunately, if I correctly understand what I'm seeing, this means that the
> current multipath tools 0.4.7 is incompatible with the dm_multipath module in
> my 2.6.13-15.8 kernel. Also, this would seem to mean that multipathd is also
> incompatible with this kernel.
>
>
Taken from "The Source", the multipath target syntax is :
/*-----------------------------------------------------------------
 * Constructor/argument parsing:
 * <#multipath feature args> [<arg>]*
 * <#hw_handler args> [hw_handler [<arg>]*]
 * <#priority groups>
 * <initial priority group>
 *     [<selector> <#selector args> [<arg>]*
 *      <#paths> <#per-path selector args>
 *         [<path> [<arg>]* ]+ ]+
 *---------------------------------------------------------------*/
1000 is the default number of I/Os routed to a path before the driver
switches to the next path in the same path group. This is the simple way
to achieve load-leveling through the "round-robin" selector.
Maybe confusingly, rr_min_io is passed as one of the "per-path selector args".
Thus, for a path group in your setup :
  <selector>                = round-robin
  <#selector args>          = 0
  [<arg>]*                  = NULL
  <#paths>                  = 1 (in your case)
  <#per-path selector args> = 1 (for rr_min_io)
  <path>                    = 8:0 (for one)
  [<arg>]*                  = 1000 (default rr_min_io, proper)
What strikes me in this map "0 443027195 multipath 0 0 2 1 round-robin 1
1 8:0 1000 round-robin 1 1 8:16 1000" is the lack of <#selector args>.
... but I can't reproduce that here; my map reads: "0 71122560 multipath 1
queue_if_no_path 0 2 1 round-robin 0 1 1 8:32 100 round-robin 0 1 1 8:80
100"
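
To see why the missing <#selector args> matters, the table can be walked by hand in the field order of the constructor comment quoted above. This is a sketch in plain sh, not the kernel parser: the function names are made up, and it assumes each count is read as a leading unsigned integer (so "8:0" reads as 8), which is what the reported debug line (nr_selector_args = 1000, nr_pgpaths 8) suggests.

```sh
# Mimic a "%u"-style read: keep only the leading digits of a word.
leading_uint() {
    printf '%s\n' "${1%%[!0-9]*}"
}

# describe_table "<table line>": print how each priority group is read.
# Runs in a subshell; returns 1 when the line runs out of words.
describe_table() (
    set -f                                    # no globbing while splitting
    set -- $1                                 # split the line into words
    shift 3                                   # <start> <length> "multipath"
    n=$(leading_uint "$1"); shift $((1 + n))  # <#feature args> + args
    n=$(leading_uint "$1"); shift $((1 + n))  # <#hw_handler args> + args
    groups=$1; shift 2                        # <#priority groups>, <initial pg>
    g=1
    while [ "$g" -le "$groups" ]; do
        selector=$1
        nr_sel=$(leading_uint "$2")
        shift $((2 + nr_sel))                 # selector, its count, its args
        nr_paths=$(leading_uint "$1")
        nr_path_args=$(leading_uint "$2")
        shift 2
        echo "group $g: selector=$selector selector_args=$nr_sel paths=$nr_paths per_path_args=$nr_path_args"
        need=$((nr_paths * (1 + nr_path_args)))
        if [ "$need" -gt "$#" ]; then
            echo "group $g: needs $need more words, only $# left"
            exit 1
        fi
        shift "$need"
        g=$((g + 1))
    done
)

bad="0 443027195 multipath 0 0 2 1 round-robin 1 1 8:0 1000 round-robin 1 1 8:16 1000"
good="0 443027195 multipath 0 0 2 1 round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000"
describe_table "$bad"  || echo "bad table rejected"
describe_table "$good" && echo "good table accepted"
```

On the first table the sketch reads paths=8 and per_path_args=1000 for group 1, because every field after the selector name is shifted by one; with "round-robin 0" each group reads as one path with one per-path argument.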
> Is anyone aware of a workaround or even a better identification for this
> incompatibility?
>
> Thanks!
>
* Re: HSG80, DM, multipath issues
From: D. North @ 2006-04-10 22:16 UTC
To: device-mapper development
Thus spake Christophe Varoqui (christophe.varoqui@free.fr):
> What strikes me in this map "0 443027195 multipath 0 0 2 1 round-robin 1
> 1 8:0 1000 round-robin 1 1 8:16 1000" is the lack of <#selector args>.
>
> ... but I can't reproduce that here; my map reads: "0 71122560 multipath 1
> queue_if_no_path 0 2 1 round-robin 0 1 1 8:32 100 round-robin 0 1 1 8:80
> 100"
Ok -- Thanks! .... That put me on the right track.
During my earlier testing, I tried several variations on the contents
of multipath.conf. The last one I'd tried followed a recommendation to
set path_selector to "round-robin". I have now set it to "round-robin 0",
as another recommendation suggested, and things are behaving quite
reasonably. I think that now it is mostly a matter of getting my initrd
right and it'll be working.
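
For reference, a multipath.conf fragment along these lines would carry the corrected selector. This is a sketch only: the actual file was not posted. The placement in a multipaths section is inferred from the "(LUN setting)" notes in the earlier -v3 output, and the wwid, alias and failover policy are taken from that same output.

```
multipaths {
    multipath {
        wwid                  360001fe1000b0ad00009034011590003
        alias                 lun4
        path_grouping_policy  failover
        # The trailing 0 is the <#selector args> count that was missing.
        path_selector         "round-robin 0"
    }
}
```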
Thanks!
--
D. North
* Re: HSG80, DM, multipath issues
From: Bernd Zeimetz @ 2006-04-10 22:00 UTC
To: device-mapper development
Hi,
>
> That problem solved, you'll hit the next wall, which is the hardware
> handler need for this array family.
if I remember right, somebody posted a HW handler for this type of array
a long time ago...!? Do you know what's up with it!?
best regards,
Bernd
* Re: HSG80, DM, multipath issues
From: Christophe Varoqui @ 2006-04-10 22:17 UTC
To: device-mapper development
Bernd Zeimetz wrote:
>> That problem solved, you'll hit the next wall, which is the hardware
>> handler need for this array family.
>
> if I remember right, somebody posted a HW handler for this type of array
> a long time ago...!? Do you know what's up with it!?
>
Yes, I store the different trials here :
http://christophe.varoqui.free.fr/multipath-tools/
(the dm-* files)
Mike Christie and Dave Olien coded those, at different times.
The main problem there is that HP won't endorse such handlers, nor
maintain, nor even advertise them ... so they are close to unmaintained.
Regards,
cvaroqui