From: Eli Stair <estair@ilm.com>
To: dm-devel@redhat.com
Subject: dm-multipath (multipathd) not removing/adding channels back on one device in a multipath array
Date: Mon, 09 Oct 2006 15:35:41 -0700
Message-ID: <452ACEBD.60205@ilm.com>
[-- Attachment #1: Type: text/plain, Size: 3569 bytes --]
All,
I'm seeing a repeatable problem in which multipathd fails to add and/or
remove paths to a single device on a dual-loop FC disk tray. Neither the
kernel's own path detection nor a manual run of 'multipath' shows the
problem. If I stop multipathd, the kernel sees the paths as unreachable
and marks them as 'failed' in the 'multipath -l' output. If I run
'multipath' manually, it _always_ picks up or removes the appropriate
channels for all devices. The failure only appears when multipathd is
left to auto-correct for path failures, and only a /single/ device (the
first FC drive in the array) is (reliably) affected.
With multipathd running, the drive that is enumerated as both /dev/sdb
and /dev/sdp (14-drive enclosure: sdb-sdo, with drive naming restarting
at /dev/sdp) gets skipped when its path is removed or added at least 50%
of the time. However long I wait, multipathd never makes another attempt
at fixing the path, whereas running 'multipath' immediately cleans up
the straggler and prints output to that effect. Of note: if I leave
multipathd off and do not manually run 'multipath' before reconnecting
the FC channel, then on disconnecting it again the system OOPSes
repeatedly and hard-crashes.
// Example of multipathd leaving the drive (mpath3) with only one path
"up", while the others have both paths present:
mpath10 (32000000c50e8df4b)
[size=136 GB][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=0][active]
\_ 2:0:8:0 sdj 8:144 [active][undef]
\_ 3:0:8:0 sdx 65:112 [active][undef]
mpath3 (320000011c6bdfbd5)
[size=136 GB][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=0][active]
\_ 2:0:0:0 sdb 8:16 [active][undef]
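A minimal sketch (Python, not part of multipath-tools) of how a listing
like the one above could be checked for maps that have fewer paths than
expected. The regular expressions assume only the 'multipath -l' layout
shown in this message; the names paths_per_map, degraded_maps and
expected_paths are hypothetical, chosen for illustration.

```python
import re

# Map header line, e.g. "mpath3 (320000011c6bdfbd5)"
MAP_RE = re.compile(r"^(\S+)\s+\(([0-9a-fA-F]+)\)\s*$")
# Path line, e.g. "\_ 2:0:0:0 sdb 8:16 [active][undef]"
PATH_RE = re.compile(r"^\s*\\_\s+(\d+:\d+:\d+:\d+)\s+(\S+)\s+(\d+:\d+)\s+\[")


def paths_per_map(text):
    """Return {map_name: [device, ...]} from 'multipath -l' style text."""
    maps = {}
    current = None
    for line in text.splitlines():
        header = MAP_RE.match(line)
        if header:
            current = header.group(1)
            maps[current] = []
            continue
        path = PATH_RE.match(line)
        if path and current is not None:
            maps[current].append(path.group(2))
    return maps


def degraded_maps(text, expected_paths=2):
    """Return only the maps that list fewer than expected_paths devices."""
    return {
        name: devs
        for name, devs in paths_per_map(text).items()
        if len(devs) < expected_paths
    }


# Sample input copied from the listing above.
SAMPLE = r"""
mpath10 (32000000c50e8df4b)
[size=136 GB][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=0][active]
\_ 2:0:8:0 sdj 8:144 [active][undef]
\_ 3:0:8:0 sdx 65:112 [active][undef]
mpath3 (320000011c6bdfbd5)
[size=136 GB][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=0][active]
\_ 2:0:0:0 sdb 8:16 [active][undef]
"""

print(degraded_maps(SAMPLE))  # {'mpath3': ['sdb']}
```

On this tray every drive should be reachable over both loops, so
expected_paths defaults to 2; that default is an assumption about this
particular setup and not a general rule.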
// Example output (snippet) of 'multipath -v4' after 'multipathd' fails
to fix it:
mpath3: set ACT_RELOAD (path group topology change)
reload: mpath3 (320000011c6bdfbd5)
[size=136 GB][features=0][hwhandler=0]
\_ round-robin 0 [prio=2][undef]
\_ 2:0:0:0 sdb 8:16 [active][ready]
\_ 3:0:0:0 sdp 8:240 [undef][ready]
The ACT_RELOAD line is what differs at this point; all the other
fully populated multipath devices show, for instance, "mpath0: set
ACT_NOTHING (map unchanged)". It seems that whatever criteria
multipathd uses to test a device and adjust its settings fail on this
first-enumerated disk when it starts looking at the drives through the
second FC loop.
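To spot the odd map out in a long 'multipath -v4' run, the "set ACT_..."
lines can be filtered mechanically. The following is a rough Python
sketch that assumes the line format seen in the two examples quoted in
this message and nothing more; pending_actions is a hypothetical helper
name, not something provided by multipath-tools.

```python
import re

# Matches lines such as "mpath3: set ACT_RELOAD (path group topology change)"
ACTION_RE = re.compile(r"^(\S+): set (ACT_[A-Z_]+) \((.*)\)\s*$")


def pending_actions(text):
    """Return {map_name: (action, reason)} for every action except ACT_NOTHING."""
    result = {}
    for line in text.splitlines():
        match = ACTION_RE.match(line.strip())
        if match and match.group(2) != "ACT_NOTHING":
            result[match.group(1)] = (match.group(2), match.group(3))
    return result


# The two action lines quoted in this message.
SAMPLE = """\
mpath0: set ACT_NOTHING (map unchanged)
mpath3: set ACT_RELOAD (path group topology change)
"""

print(pending_actions(SAMPLE))
```

Any map this reports after multipathd has had time to react is one that
'multipath' still wants to change, which is the symptom described above
for mpath3.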
I've attached a typescript of "multipathd -d" in one file, and the
'multipath -l' and 'multipath -v' output in a second file. Together they
show in detail the sequence of events on both loop addition and removal
from the system. dmesg output is also attached.
I'd love to be of as much assistance as I can, as I currently have eight
systems with this problem and can't do much with them as of yet. I
have a set of QLogic qla2300 controllers as well as different disk trays
that I'll be testing to see whether this is a controller- or
enclosure-specific issue.
Please let me know what more I can do/provide/try to make myself useful.
Cheers,
/eli
/////////////
Some info:
2x Opteron 248, 8GB RAM
Tyan S2882 and Arima HDAMA boards tested.
kernel 2.6.18 (64-bit, SMP, NUMA)
dm-multipath v0.4.7 (03/12, 2006)
2 LSI FC adapters single-port 2G
14-drive LSI FC JBOD tray (model 2600/0834) dual-controller 2G
#multipath.conf:
defaults {
    polling_interval 5
    path_grouping_policy multibus
    rr_min_io 100
    failback 15
    no_path_retry 2
}
[-- Attachment #2: multipathd-debug.out.bz2 --]
[-- Type: application/octet-stream, Size: 1565 bytes --]
[-- Attachment #3: multipath-debug.out.bz2 --]
[-- Type: application/octet-stream, Size: 8134 bytes --]
[-- Attachment #4: dmesg.2006-10-09.bz2 --]
[-- Type: application/octet-stream, Size: 8973 bytes --]