* cluster raid
@ 2003-04-28 14:06 Miguel Biscaia
2003-04-28 14:16 ` Lars Marowsky-Bree
2003-04-30 20:52 ` Steven Dake
0 siblings, 2 replies; 10+ messages in thread
From: Miguel Biscaia @ 2003-04-28 14:06 UTC (permalink / raw)
To: 'linux-raid@vger.kernel.org'
Hello,
At my company we are implementing a shared-disk database architecture for
telecom call processing, based on Linux. I would like to use MD devices,
but I am not sure whether MD devices can be configured and started in
a cluster environment. Is it possible? In which kernel version?
Thanks in advance.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: cluster raid
2003-04-28 14:06 cluster raid Miguel Biscaia
@ 2003-04-28 14:16 ` Lars Marowsky-Bree
2003-04-30 20:52 ` Steven Dake
1 sibling, 0 replies; 10+ messages in thread
From: Lars Marowsky-Bree @ 2003-04-28 14:16 UTC (permalink / raw)
To: Miguel Biscaia, 'linux-raid@vger.kernel.org'
On 2003-04-28T15:06:03,
Miguel Biscaia <miguel-b-pias@ptinovacao.pt> said:
> Hello,
> At my company we are implementing a shared-disk database architecture for
> telecom call processing, based on Linux. I would like to use MD devices,
> but I am not sure whether MD devices can be configured and started in
> a cluster environment. Is it possible? In which kernel version?
They cannot be used in a shared configuration; this only works if no more
than one node has any given md device active at any time.
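In practice that means treating each md device as an exclusive, fail-over
resource: the cluster manager makes sure one node has stopped the array
before another assembles it. A minimal sketch of such a hand-over (device,
component, and mount-point names are purely illustrative):

    # on the node giving up the array
    umount /srv/data
    mdadm --stop /dev/md0

    # on the node taking over, only after the first node is done
    mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
    mount /dev/md0 /srv/data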
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
SuSE Labs - Research & Development, SuSE Linux AG
"If anything can go wrong, it will." "Chance favors the prepared (mind)."
-- Capt. Edward A. Murphy -- Louis Pasteur
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: cluster raid
2003-04-28 14:06 cluster raid Miguel Biscaia
2003-04-28 14:16 ` Lars Marowsky-Bree
@ 2003-04-30 20:52 ` Steven Dake
1 sibling, 0 replies; 10+ messages in thread
From: Steven Dake @ 2003-04-30 20:52 UTC (permalink / raw)
To: Miguel Biscaia; +Cc: 'linux-raid@vger.kernel.org'
Miguel,
My company (MontaVista Software) has been funding me to work on a solution
for this for some time. It requires node-node
communication/membership/locking/master failover and hence a cluster
manager. I am currently nearing completion of the cluster manager and will
then work on the cluster services required to support md. I'll post to
this list once I have something working that I can open source.
You cannot currently have the same md device open on multiple nodes. It is
possible to use RAID arrays on shared storage where each node only opens up
"its" assigned RAID devices. This can be enforced by RAID locking (see
the list for a patch for this technique). I have a patch that locks
based on geographical address/slot number, Fibre Channel WWN, or SCSI host
number, but I have not had time to make it available, and there wasn't much
response when I posted the original patch.
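As a rough illustration of the "each node opens only its own arrays"
convention (array names, UUIDs and devices below are placeholders, and this
is not the interface of my locking patch):

    # /etc/mdadm.conf on node 1 -- only the arrays node 1 may start
    DEVICE /dev/sda1 /dev/sdb1
    ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1

    # /etc/mdadm.conf on node 2
    DEVICE /dev/sda2 /dev/sdb2
    ARRAY /dev/md1 devices=/dev/sda2,/dev/sdb2

Each node then runs "mdadm --assemble --scan" and never touches the other
node's arrays; without the locking patch nothing enforces this, it is
purely by convention.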
Thanks
-steve
^ permalink raw reply [flat|nested] 10+ messages in thread
* cluster RAID
@ 2003-05-25 7:18 Kai-Min Sung
2003-05-25 10:57 ` Neil Brown
0 siblings, 1 reply; 10+ messages in thread
From: Kai-Min Sung @ 2003-05-25 7:18 UTC (permalink / raw)
To: linux-raid
Hi,
I have a shared storage environment (2 disks accessible by 2
nodes through iSCSI) and am trying to assemble the same RAID-1 array on
both nodes. Whenever I try to assemble the RAID-1 array on the second
node, it always begins reconstructing the mirror. My guess for why it's
doing this is that after the first node assembles the array, it marks a
dirty flag in the RAID metadata blocks on disk. (It only resets the
dirty flag when it deactivates the array). When the second node tries to
assemble the same array, it reads the metadata blocks and sees that it
is dirty. Then it proceeds with reconstruction. My question is: does
reconstruction happen simply because the dirty flag is set? Why doesn't
it first check whether the checksums on all the mirror disks match (i.e. the
array is consistent) and bypass reconstruction? Btw, I plan for both
nodes to be accessing different partitions in the array, so there
shouldn't be any synchronization problems. Also, I'm using mdadm-1.2.0
for my testing.
Regards,
Kai-Min Sung
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: cluster RAID
2003-05-25 7:18 cluster RAID Kai-Min Sung
@ 2003-05-25 10:57 ` Neil Brown
2003-05-25 11:51 ` Kai-Min Sung
2003-05-27 19:24 ` Steven Dake
0 siblings, 2 replies; 10+ messages in thread
From: Neil Brown @ 2003-05-25 10:57 UTC (permalink / raw)
To: Kai-Min Sung; +Cc: linux-raid
On Sunday May 25, k@kaisung.com wrote:
> Hi,
> I have a shared storage environment (2 disks accessible by 2
> nodes through iSCSI) and am trying to assemble the same RAID-1 array on
> both nodes. Whenever I try to assemble the RAID-1 array on the second
> node, it always begins reconstructing the mirror. My guess for why it's
> doing this is that after the first node assembles the array, it marks a
> dirty flag in the RAID metadata blocks on disk. (It only resets the
> dirty flag when it deactivates the array). When the second node tries to
> assemble the same array, it reads the metadata blocks and sees that it
> is dirty. Then it proceeds with reconstruction. My question is: does
> reconstruction happen simply because the dirty flag is set? Why doesn't
> it first check whether the checksums on all the mirror disks match (i.e. the
> array is consistent) and bypass reconstruction? Btw, I plan for both
> nodes to be accessing different partitions in the array, so there
> shouldn't be any synchronization problems. Also, I'm using mdadm-1.2.0
> for my testing.
What you are doing doesn't really make sense (at least not to me).
Having two hosts both trying to control a raid1 array cannot work, as
neither host can make any guarantees about consistency.
If you plan for both nodes to be accessing different partitions on the
array, why not be up-front about that and have two different raid1
arrays?
e.g. If your two drives are sda and sdb, then partition them into
sda1, sda2, sdb1, sdb2
and then make md0 on node X from sda1 and sdb1, and
md3 (or whatever) on node Y from sda2 and sdb2.
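A sketch of that (sizes and names are whatever suits you):

    # on node X
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

    # on node Y
    mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2

Each array then has exactly one controlling host, and a dirty superblock
only ever causes a resync on the node that owns that array.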
To answer your question - yes, the second node reconstructs because the
superblock is marked dirty. I'm not sure what you mean by "check if
the checksums on all mirror disks match". What checksums?
  Checksums of all data blocks? That is as much work as a complete resync.
  Checksums of superblocks? That wouldn't tell us anything useful.
NeilBrown
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: cluster RAID
2003-05-25 10:57 ` Neil Brown
@ 2003-05-25 11:51 ` Kai-Min Sung
2003-05-26 6:10 ` Neil Brown
2003-05-27 19:24 ` Steven Dake
1 sibling, 1 reply; 10+ messages in thread
From: Kai-Min Sung @ 2003-05-25 11:51 UTC (permalink / raw)
To: 'Neil Brown'; +Cc: linux-raid
Hi Neil,
I should probably elaborate a little more on my envisioned
setup. I'd like to set up a RAID array using shared disks, then run LVM
on top of the RAID array to carve out logical volumes. The logical
volumes would be used by one node at a time (no data sharing). For both
nodes to access the same LVM metadata, they must both activate the RAID
array. It seemed like someone had done something similar here:
http://marc.theaimsgroup.com/?l=linux-raid&m=98834623209225&w=2
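Concretely, what I have in mind is roughly this (volume group and volume
names are only placeholders):

    # done once, from whichever node has the array assembled
    pvcreate /dev/md0
    vgcreate sharedvg /dev/md0
    lvcreate -L 10G -n node1data sharedvg
    lvcreate -L 10G -n node2data sharedvg

    # each node then only ever mounts "its" logical volume

Both nodes still have to activate /dev/md0 to see the volume group, which
is where the dirty-flag behaviour above bites.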
Concerning the checksums, I was referring to the checksum output when I
run mdadm --examine on a RAID disk. I thought that was a checksum of all
the data blocks of the disk, but I guess not? So, basically the only
time you can guarantee a RAID array is consistent is once it's been
deactivated and the dirty flag cleared in the superblock?
Thanks,
-Kai
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: cluster RAID
2003-05-25 11:51 ` Kai-Min Sung
@ 2003-05-26 6:10 ` Neil Brown
2003-05-26 12:25 ` Sean Kormilo
0 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2003-05-26 6:10 UTC (permalink / raw)
To: Kai-Min Sung; +Cc: linux-raid
On Sunday May 25, k@kaisung.com wrote:
> Hi Neil,
> I should probably elaborate a little more on my envisioned
> setup. I'd like to set up a RAID array using shared disks, then run LVM
> on top of the RAID array to carve out logical volumes. The logical
> volumes would be used by one node at a time (no data sharing). For both
> nodes to access the same LVM metadata, they must both activate the RAID
> array. It seemed like someone had done something similar here:
>
> http://marc.theaimsgroup.com/?l=linux-raid&m=98834623209225&w=2
>
> Concerning the checksums, I was referring to the checksum output when I
> run mdadm --examine on a RAID disk. I thought that was a checksum of all
> the data blocks of the disk, but I guess not? So, basically the only
> time you can guarantee a RAID array is consistent is once it's been
> deactivated and the dirty flag cleared in the superblock?
It isn't clear what the author of the referenced mail is really doing,
but I stand by my position that this cannot work.
A raid array is not just a bunch of discs. It is also a controller.
With Linux soft raid, that controller is a computer running Linux.
In the case of hardware raid, it might be an embedded system on a board
somewhere. But there is still a single controller.
It probably would not be impossible to arrange a soft RAID system
where separate controllers (Linux systems) could co-operate and manage
separate parts of the array, but it would be a lot of work.
The checksum is the checksum of the superblock, not the whole array.
Yes, you can only guarantee consistency if the dirty flag is clear.
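Both fields are visible with --examine on any component device (the device
name here is arbitrary):

    mdadm --examine /dev/sda1

The "State" line in that output shows whether the superblock is marked
clean or dirty, and the "Checksum" line is calculated over the superblock
itself, not over the data.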
NeilBrown
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: cluster RAID
2003-05-26 6:10 ` Neil Brown
@ 2003-05-26 12:25 ` Sean Kormilo
0 siblings, 0 replies; 10+ messages in thread
From: Sean Kormilo @ 2003-05-26 12:25 UTC (permalink / raw)
To: Neil Brown; +Cc: Kai-Min Sung, linux-raid
Kai,
This sounds like a job for either OpenGFS or DRBD.
http://opengfs.sourceforge.net/
http://www.complang.tuwien.ac.at/reisner/drbd/
Sean.
--
Sean C. Kormilo, STORM Software Architect, Nortel Networks
email: skormilo@nortelnetworks.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: cluster RAID
2003-05-25 10:57 ` Neil Brown
2003-05-25 11:51 ` Kai-Min Sung
@ 2003-05-27 19:24 ` Steven Dake
1 sibling, 0 replies; 10+ messages in thread
From: Steven Dake @ 2003-05-27 19:24 UTC (permalink / raw)
To: Neil Brown; +Cc: Kai-Min Sung, linux-raid
Cluster RAID (accessing one storage device from multiple nodes) is
useful when using a clustered volume manager or clustered filesystem.
Without clustered RAID underneath, it is difficult to provide redundancy
unless the clustered volume manager provides this functionality (which
it currently does not).
It is possible to deal with the consistency issue, but doing so requires
node-node communication within the cluster and hence a cluster framework.
Thanks
-steve
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: cluster RAID
@ 2003-05-27 19:47 Brian Schwarz
0 siblings, 0 replies; 10+ messages in thread
From: Brian Schwarz @ 2003-05-27 19:47 UTC (permalink / raw)
To: 'Steven Dake', Neil Brown; +Cc: Kai-Min Sung, linux-raid
I'll apologize in advance for the blatant marketing.
Steven is correct: a clusterized volume manager with built-in RAID
functionality would be ideal in this situation. VERITAS plans to release
a software suite that includes clusterized versions of our volume manager
and the VERITAS File System.
The VERITAS File System and Volume Manager are available today on Linux.
This includes RAID, online config changes, multi-pathing, etc. The
clusterized versions will be available towards the end of this year. They
use some of the clustering infrastructure from our VERITAS Cluster Server
offering, which is available today on Linux as well.
For more info:
www.veritas.com/linux
www.veritas.com/forlinux
or contact me directly.
Regards,
-Brian
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread
Thread overview: 10+ messages
-- links below jump to the message on this page --
2003-04-28 14:06 cluster raid Miguel Biscaia
2003-04-28 14:16 ` Lars Marowsky-Bree
2003-04-30 20:52 ` Steven Dake
-- strict thread matches above, loose matches on Subject: below --
2003-05-25 7:18 cluster RAID Kai-Min Sung
2003-05-25 10:57 ` Neil Brown
2003-05-25 11:51 ` Kai-Min Sung
2003-05-26 6:10 ` Neil Brown
2003-05-26 12:25 ` Sean Kormilo
2003-05-27 19:24 ` Steven Dake
2003-05-27 19:47 Brian Schwarz