Using linux software raid (mdadm) in a shared-disk cluster.

From: John Hughes <john@Calva.COM>
To: linux-raid@vger.kernel.org
Subject: Using linux software raid (mdadm) in a shared-disk cluster.
Date: Tue, 14 Apr 2009 10:58:44 +0200
Message-ID: <49E45044.5000406@Calva.COM>

I've got a little shared disk cluster (parallel SCSI, external DELL 
PV210 disk cabinet).

I've used linux raid to make a nice RAID10 on the external disks.

I can access this from either machine in the cluster (only one at a time,
of course). It works very well and I'm happy.

Now I'm running XEN and I want to be able to migrate a XEN domU from one 
machine to the other while the domU is using the RAID10 device.  I can 
make this "work" using XEN's migration hooks - it calls a script when it 
has stopped the running domU and I can start the raid device on the 
destination node, ready for the arrival of the domU.

There is one small problem - I can't stop the RAID10 on the source node 
until the domU has finished, so it seems to me there is a window that 
could lead to data corruption:

Source node                             Destination node

mdadm --assemble /dev/md0 ....
Start migrate
domU suspended
call migration script
               \-------------------->   mdadm --assemble /dev/md0 ...
                                        domU starts running
...
domU destroyed
mdadm --stop /dev/md0

It seems to me that the source node could still be messing with the 
bitmap and resyncing between the moment the destination node
starts the RAID10 and the source node stops it[*].

Am I right?  Is there a window?

If there is a window it could be closed if there was some kind of mdadm 
--freeze command which would stop the sync activity, which could be run 
on the source node before doing the assemble on the destination node.

([*] - imagine some block is marked unsynced in the bitmap.  The 
destination node does the assemble, so now its in-memory bitmap has the 
block marked.  The source node syncs the block and updates the on-disk 
bitmap.  Now the destination node happens to write that block; it 
thinks the block is marked unsynced on the disk, so it doesn't bother 
updating the bitmap.  If the destination node crashes at this point 
there is a block on the disk that is unsynced, but the bitmap claims it's 
in sync.)
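
To make the footnote concrete, here is a toy model of the scenario in
Python. It is only a sketch of my reasoning, not of how md actually
manages its bitmap: the Node class, the "frozen" flag (standing in for
the hypothetical mdadm --freeze suggested above) and block number 7 are
all made up for illustration.

```python
class Node:
    """Toy model of one node's view of a shared write-intent bitmap."""

    def __init__(self, name, disk_bitmap):
        self.name = name
        self.disk_bitmap = disk_bitmap  # shared set: blocks marked unsynced on disk
        self.mem_bitmap = set()         # this node's in-memory copy
        self.frozen = False             # hypothetical "--freeze" state

    def assemble(self):
        # On assemble, the node loads the on-disk bitmap into memory.
        self.mem_bitmap = set(self.disk_bitmap)

    def resync(self, block):
        # A frozen node performs no sync activity at all.
        if self.frozen:
            return
        # Otherwise it syncs the block and clears it in memory and on disk.
        self.mem_bitmap.discard(block)
        self.disk_bitmap.discard(block)

    def write(self, block):
        # The on-disk bitmap is only updated if this node does not
        # already believe the block is marked unsynced.
        if block not in self.mem_bitmap:
            self.mem_bitmap.add(block)
            self.disk_bitmap.add(block)


def bitmap_covers_block_after_crash(freeze_source):
    """Return True if the on-disk bitmap still marks block 7 as unsynced
    at the moment the destination node crashes in the middle of a write."""
    disk_bitmap = {7}                   # block 7 starts out marked unsynced
    source = Node("source", disk_bitmap)
    dest = Node("destination", disk_bitmap)

    source.assemble()
    source.frozen = freeze_source       # freeze before the destination assembles
    dest.assemble()                     # destination caches "7 is unsynced"
    source.resync(7)                    # source clears bit 7 unless frozen
    dest.write(7)                       # destination skips the on-disk update
    # The destination crashes here, with the write to block 7 incomplete.
    return 7 in disk_bitmap


if __name__ == "__main__":
    print("without freeze:", bitmap_covers_block_after_crash(False))
    print("with freeze:   ", bitmap_covers_block_after_crash(True))
```

In this model the unfrozen run leaves block 7 unmarked on disk after the
crash, which is the window I am worried about; freezing the source before
the destination assembles keeps the bit set.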
