From: Ethan Wilson <ethan.wilson@shiftmail.org>
To: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: md with shared disks
Date: Mon, 10 Nov 2014 17:40:19 +0100	[thread overview]
Message-ID: <5460EA73.4080009@shiftmail.org> (raw)
In-Reply-To: <545F2630.8090307@true.co.za>

On 09/11/2014 09:30, Anton Ekermans wrote:
> Good day raiders,
> I have a question on md that I cannot find (up to date) answer to.
> We use SuperMicro server with 16 shared disks on a shared backplane 
> between two motherboards, running up to date CentOS7.
> If I create an array on one node, the other node can detect it. I put 
> GFS2 on top of the array so both system can share the filesystem, but 
> I want to know if md raid is safe to be used in this way with possibly 
> 2 active/active nodes changing the metadata at the same time. I've 
> disabled raid-check cron job on one node so they don't both resync the 
> drives weekly, but I suspect there's a lot more to it than that.
>
> If it's not possible, then alternatively some advice on strategy to 
> have a large active/active shared disk/filesystem would also be welcome.

Not possible, as far as I know: MD does not reload or exchange metadata 
with other MD peers; each instance thinks it is the only user of those 
disks.
If you share the arrays and one head fails a disk and starts 
reconstruction onto another disk while the other head still thinks the 
array is fine, havoc will certainly ensue.

Even without this worst-case scenario, data will probably still be lost 
because the two MD instances are not cache coherent: writes on one head 
do not invalidate the kernel cache for the same region on the other 
head, so reads on the other head will not see the changes just written 
if that area was already cached there.
GFS will actually attempt to invalidate such caches, but I am not sure 
to what extent: with raid5/6 it is probably not enough, because the 
stripe cache will hold stale data in a way that GFS presumably does not 
know about (it does not go away even with echo 3 > 
/proc/sys/vm/drop_caches). Maybe raid0/1/10 is safer... does anybody 
know if cache dropping works reliably there?
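For what it's worth, the kind of manual invalidation I have in mind is 
something like this (rough, untested sketch; /dev/md0 is just an 
example device, and I am not aware of a knob that simply empties the 
raid5/6 stripe cache):

  # drop the page/dentry/inode caches on the head that is about to read
  echo 3 > /proc/sys/vm/drop_caches

  # flush buffered data for the md device itself
  blockdev --flushbufs /dev/md0

  # the raid5/6 stripe cache is separate; its size can be inspected,
  # but I do not know of a documented way to simply invalidate it
  cat /sys/block/md0/md/stripe_cache_size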
But the problem of a consistent view of disk failures and raid 
reconstruction seems harder to overcome.

You can do an active/passive configuration, shutting down MD on one 
head and starting it on the other.
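The failover itself would be something along these lines (sketch only; 
the device names are placeholders):

  # head A (currently active): stop the array cleanly,
  # after unmounting whatever sits on top of it
  mdadm --stop /dev/md0

  # head B: assemble the same array from the shared disks
  mdadm --assemble /dev/md0 /dev/sd[b-e]
  # or, if mdadm.conf already describes it: mdadm --assemble --scan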
Another option is a crossed-active setup, or whatever it is called: 
some arrays are active on one head node and the other arrays on the 
other head node, so as to share the computational and bandwidth load.
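With mdadm.conf you could pin each array to one head, something like 
the following (the UUIDs are made up; I believe "AUTO -all" stops 
auto-assembly of anything not explicitly listed, but check the man 
page):

  # head A, /etc/mdadm.conf: only md0 is assembled here
  ARRAY /dev/md0 UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
  AUTO -all

  # head B, /etc/mdadm.conf: only md1 is assembled here
  ARRAY /dev/md1 UUID=11111111:22222222:33333333:44444444
  AUTO -all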

If other people have better ideas I am all ears.

Regards
EW


Thread overview: 7+ messages
2014-11-09  8:30 md with shared disks Anton Ekermans
2014-11-10 16:40 ` Ethan Wilson [this message]
2014-11-10 22:14 ` Stan Hoeppner
2014-11-13 13:14   ` Anton Ekermans
2014-11-13 20:56     ` Stan Hoeppner
2014-11-13 22:53       ` Ethan Wilson
2014-11-14  0:07         ` Stan Hoeppner
