linux-raid.vger.kernel.org archive mirror
* multipath md devices
@ 2004-09-22 20:41 Anu Matthew
  2004-09-22 21:35 ` Doug Ledford
From: Anu Matthew @ 2004-09-22 20:41 UTC
  To: linux-raid

Hi,

We have multipath devices created on SAN LUNs. Say md0 is created on
/dev/sdj and /dev/sde, the latter being the alternate path for /dev/sdj.

I've noticed the following:

1) With little IO to the md device, if I pull out the cable to, say,
/dev/sdj, /proc/mdstat still shows both devices. /proc/mdstat won't
get updated until I start some considerable IO to the md device. Even
the mdadm scan/query output shows both paths, which is not true. Once
we start IO, /proc/mdstat reflects that one of the devices, /dev/sdj in
this case, has failed. Thereafter the mdadm output is correct too.

The link-down entries in syslog and dmesg appear almost instantly
when the cable is pulled out. This makes it very difficult to monitor
multipath devices, as we cannot rely on reading /proc/mdstat.
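
A minimal Python sketch of the kind of /proc/mdstat check a monitor would
run, assuming the usual `name[index]` member syntax with `(F)` marking a
faulty member. The function names and the sample text are hypothetical,
for illustration only. As described above, such a check can only report
what md already knows: right after an idle path is pulled, the array line
still lists `sdj[0]` without `(F)`, so nothing is flagged until IO hits
that path.

```python
import re

# A member token such as "sdj[0]" or "sdj[0](F)"; the optional letter in
# parentheses is md's status flag for that member (F = faulty).
MEMBER_RE = re.compile(r"^([\w/-]+)\[(\d+)\](?:\(([A-Z])\))?$")


def parse_mdstat(text):
    """Map each array name to {member device: flag}, flag '' when healthy."""
    arrays = {}
    for line in text.splitlines():
        # Array lines look like "md0 : active multipath sde[1] sdj[0](F)";
        # other lines (Personalities, block counts, unused devices) are skipped.
        name, sep, rest = line.partition(" : ")
        if not sep or not name.startswith("md"):
            continue
        members = {}
        for token in rest.split():
            m = MEMBER_RE.match(token)
            if m:
                members[m.group(1)] = m.group(3) or ""
        arrays[name.strip()] = members
    return arrays


def failed_members(arrays):
    """List (array, device) pairs that md currently marks as faulty."""
    return [
        (array, dev)
        for array, members in sorted(arrays.items())
        for dev, flag in sorted(members.items())
        if flag == "F"
    ]


def read_mdstat(path="/proc/mdstat"):
    """Parse the live file (only present when the md driver is loaded)."""
    with open(path) as f:
        return parse_mdstat(f.read())


# Illustrative contents after md has finally noticed the failed path.
SAMPLE = """Personalities : [multipath]
md0 : active multipath sde[1] sdj[0](F)
      1048576 blocks [2/1] [_U]

unused devices: <none>
"""
```

`failed_members(parse_mdstat(SAMPLE))` returns `[("md0", "sdj")]`; on a live
system, `failed_members(read_mdstat())` checks the real file, with the
detection delay described above.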

2) Another situation: device md0 is active, with healthy multipaths
/dev/sdj and /dev/sde, under reasonable IO activity. If the cable to
/dev/sdj is yanked out, md0 remains active, thanks to the alternate
path, sde. However, it fails to go back and reconstruct the spare path
allocation even after the fibre link is restored. If I then pull the
cable to sde, even 30 minutes later, the machine fails to write to
/dev/md0, because md does not notice that /dev/sdj is back online
unless I fail, remove and re-add /dev/sdj manually from the mdadm
command line. If something is hard mounted on /dev/md0, this may end
up in a system crash.

To conclude: if one path goes down, comes back after a while, and
then the second path goes down, md0 cannot be read unless someone
has manually failed, removed and re-added the first device (the one
that came back online) before the second path went down.
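
The manual workaround described above can be scripted. A minimal sketch,
assuming mdadm's manage-mode `--fail`, `--remove` and `--add` options run
as three separate commands; `readd_commands` and `readd_path` are
hypothetical helper names, and `dry_run` defaults to only printing so
nothing touches the array by accident.

```python
import subprocess


def readd_commands(array, device):
    """mdadm manage-mode steps to fail, remove, then re-add one member path."""
    return [
        ["mdadm", array, "--fail", device],
        ["mdadm", array, "--remove", device],
        ["mdadm", array, "--add", device],
    ]


def readd_path(array, device, dry_run=True):
    """Print the commands, or run them in order, stopping at the first error."""
    for cmd in readd_commands(array, device):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if mdadm exits non-zero,
            # so a failed step is never followed by the next one.
            subprocess.run(cmd, check=True)
```

`readd_path("/dev/md0", "/dev/sdj")` prints the three commands; with
`dry_run=False` (as root) they run in order.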

Any help towards this will be much appreciated.

Thanks,

--AM.

* Re: multipath md devices
@ 2004-09-22 21:55 Doug Griswold
From: Doug Griswold @ 2004-09-22 21:55 UTC
  To: linux-raid

>>> Doug Ledford <dledford@redhat.com> 09/22/04 5:35 PM >>>
On Wed, 2004-09-22 at 16:41, Anu Matthew wrote:
> Hi,
> 
> We have multipath devices created on SAN LUNs. Say md0 is created on
> /dev/sdj and /dev/sde, the latter being the alternate path for /dev/sdj.
> 
> I've noticed the following:
> 
> 1) With little IO to the md device, if I pull out the cable to, say,
> /dev/sdj, /proc/mdstat still shows both devices. /proc/mdstat won't
> get updated until I start some considerable IO to the md device. Even
> the mdadm scan/query output shows both paths, which is not true. Once
> we start IO, /proc/mdstat reflects that one of the devices, /dev/sdj in
> this case, has failed. Thereafter the mdadm output is correct too.
> 
> The link-down entries in syslog and dmesg appear almost instantly
> when the cable is pulled out. This makes it very difficult to monitor
> multipath devices, as we cannot rely on reading /proc/mdstat.

> /proc/mdstat will be correct once the first physical read/write on the
> yanked path fails.

Is this true even if the lightpath is not dead?  

> 2) Another situation: device md0 is active, with healthy multipaths
> /dev/sdj and /dev/sde, under reasonable IO activity. If the cable to
> /dev/sdj is yanked out, md0 remains active, thanks to the alternate
> path, sde. However, it fails to go back and reconstruct the spare path
> allocation even after the fibre link is restored. If I then pull the
> cable to sde, even 30 minutes later, the machine fails to write to
> /dev/md0, because md does not notice that /dev/sdj is back online
> unless I fail, remove and re-add /dev/sdj manually from the mdadm
> command line. If something is hard mounted on /dev/md0, this may end
> up in a system crash.
> 
> To conclude: if one path goes down, comes back after a while, and
> then the second path goes down, md0 cannot be read unless someone
> has manually failed, removed and re-added the first device (the one
> that came back online) before the second path went down.

> Yeah, IBM wrote a little app to help with that.  We stuffed it into the
> mdadm package we ship since that seemed the most appropriate place for
> it.  It's called mdmpd and that's its job, basically.  Very simple app,
> but it doesn't run on upstream kernels at the moment (it wants the md
> event interface, which hasn't yet been submitted upstream by Neil).
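
The idea behind such a daemon can be sketched as a reconcile loop. This is
not mdmpd's code: it is a minimal sketch that polls instead of using the md
event interface mentioned above, and the three callables (`get_failed`,
`path_is_alive`, `readd`) are hypothetical hooks the caller supplies, for
example a /proc/mdstat parser, a path probe, and an mdadm fail/remove/add
wrapper.

```python
import time
from typing import Callable, Iterable, List, Tuple

Member = Tuple[str, str]  # (array, device), e.g. ("md0", "sdj")


def reconcile_once(
    failed: Iterable[Member],
    path_is_alive: Callable[[str], bool],
    readd: Callable[[str, str], None],
) -> List[Member]:
    """Re-add every failed member whose path answers again; return those re-added."""
    restored = []
    for array, device in failed:
        if path_is_alive(device):
            readd(array, device)
            restored.append((array, device))
    return restored


def watch(get_failed, path_is_alive, readd, interval=10.0, rounds=None):
    """Run reconcile_once every `interval` seconds; forever if rounds is None."""
    n = 0
    while rounds is None or n < rounds:
        reconcile_once(get_failed(), path_is_alive, readd)
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval)
```

Injecting the hooks keeps the loop testable without real hardware; polling
trades a detection delay of up to `interval` seconds for not needing
kernel event support.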

> Any help towards this will be much appreciated.
> 
> Thanks,
> 
> --AM.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc.
         1801 Varsity Dr.
         Raleigh, NC 27606



Thread overview: 7+ messages
2004-09-22 20:41 multipath md devices Anu Matthew
2004-09-22 21:35 ` Doug Ledford
2004-09-22 22:20   ` Anu Matthew
2004-09-23  5:46   ` Luca Berra
2004-09-24  1:10     ` Neil Brown
2004-09-24 16:41       ` Doug Ledford
  -- strict thread matches above, loose matches on Subject: below --
2004-09-22 21:55 Doug Griswold
