From: Ethan Wilson <ethan.wilson@shiftmail.org>
To: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: md with shared disks
Date: Mon, 10 Nov 2014 17:40:19 +0100
Message-ID: <5460EA73.4080009@shiftmail.org>
In-Reply-To: <545F2630.8090307@true.co.za>
On 09/11/2014 09:30, Anton Ekermans wrote:
> Good day raiders,
> I have a question about md that I cannot find an up-to-date answer to.
> We use a SuperMicro server with 16 shared disks on a shared backplane
> between two motherboards, running up-to-date CentOS 7.
> If I create an array on one node, the other node can detect it. I put
> GFS2 on top of the array so both systems can share the filesystem, but
> I want to know if md raid is safe to use this way, with possibly
> 2 active/active nodes changing the metadata at the same time. I've
> disabled the raid-check cron job on one node so they don't both resync
> the drives weekly, but I suspect there's a lot more to it than that.
>
> If it's not possible, then some advice on an alternative strategy for
> a large active/active shared disk/filesystem would also be welcome.
Not possible, as far as I know: MD does not reload or exchange metadata
with other MD peers. MD assumes it is the only user of those disks.
If you attempt to share the arrays, and one head then marks a disk as
failed and starts reconstruction onto another disk while the other head
still thinks the array is fine, havoc will certainly ensue.
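A toy Python model of that scenario, assuming each head keeps its own private in-memory copy of the array member states and never hears about changes made by the other head (which is what "MD does not exchange metadata with its peers" amounts to). The device names are hypothetical, and this is not how MD stores its superblocks.

```python
from dataclasses import dataclass, field


@dataclass
class HeadView:
    """One head's private, in-memory view of the array members (toy model)."""
    name: str
    active: set = field(default_factory=set)
    spares: set = field(default_factory=set)

    def fail_and_rebuild(self, failed: str) -> str:
        # Mark a member as failed and start reconstruction onto a spare.
        # Only THIS head's view changes; nothing is sent to the peer head.
        self.active.discard(failed)
        spare = self.spares.pop()
        self.active.add(spare)
        return spare


def make_views():
    members = {"sda", "sdb", "sdc", "sdd"}  # hypothetical device names
    spares = {"sde"}
    # Both heads assemble the same array and start with identical views.
    a = HeadView("head-A", set(members), set(spares))
    b = HeadView("head-B", set(members), set(spares))
    return a, b


a, b = make_views()
rebuilt_onto = a.fail_and_rebuild("sdc")
print("head-A writes to:", sorted(a.active))
print("head-B writes to:", sorted(b.active))
# head-B still treats sdc as in sync and sde as a free spare,
# while head-A is rebuilding onto sde: the two views have diverged.
```

Every write either head makes from here on lands on a different set of disks, and neither head's parity or mirror state matches what the other one assumes.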
Even without this worst-case scenario, data will probably still be lost
because the two MD instances are not cache coherent: writes on one head
do not invalidate the kernel cache for the same region on the other
head, so reads performed on the other head will not see the data just
written if that area was already cached there.
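A minimal Python sketch of the stale-read problem, assuming each head has a private read cache over the same shared disk and that a write updates the local cache and the disk but nothing on the peer. This is a toy model for illustration only, not how the kernel page cache or MD actually works.

```python
class Head:
    """A host with a private read cache over a shared disk (toy model)."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk    # shared dict: block number -> data
        self.cache = {}     # private to this head: block number -> data

    def read(self, block):
        # Serve from the local cache if present, else read the disk and cache it.
        if block not in self.cache:
            self.cache[block] = self.disk.get(block)
        return self.cache[block]

    def write(self, block, data):
        # Write through to the shared disk and update only the LOCAL cache.
        # Nothing invalidates the peer's cached copy of this block.
        self.disk[block] = data
        self.cache[block] = data

    def drop_caches(self):
        # Rough analogue of what an external invalidation has to achieve.
        self.cache.clear()


shared_disk = {0: "old"}
a = Head("head-A", shared_disk)
b = Head("head-B", shared_disk)

b.read(0)           # head-B caches block 0 ("old")
a.write(0, "new")   # head-A updates the shared disk
print(b.read(0))    # still "old": head-B serves its stale cached copy
b.drop_caches()
print(b.read(0))    # "new" once the stale entry is gone
```

In this model a cache drop repairs the stale read; the concern with raid5/6 is that the stripe cache is a second private cache that such a drop does not reach.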
GFS does attempt to invalidate such caches, but I am not sure to what
extent. With raid5/6 it is probably not enough, because the stripe
cache holds stale data in a way that GFS probably does not know about
(it does not go away even with echo 3 > /proc/sys/vm/drop_caches).
Maybe raid0/1/10 are safer... does anybody know whether cache dropping
works well there?
But the problem of keeping a consistent view of disk failures and raid
reconstruction on both heads seems harder to overcome.
You can use an active/passive configuration, shutting down MD on one
head before starting it on the other.
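A rough sketch of the ordering an active/passive handover needs, under stated assumptions: the array name, member devices and mount point are hypothetical, the commands are only built as argv lists (not executed), and a real cluster manager would also have to fence a head that died rather than stopped cleanly. `mdadm --stop` and `mdadm --assemble` are standard mdadm modes; everything else here is illustrative.

```python
from typing import List, Tuple


def handover_commands(md: str, members: List[str],
                      mountpoint: str) -> List[Tuple[str, List[str]]]:
    """Ordered (role, argv) pairs to move one array from the old head to the new head.

    Hypothetical helper: role "old" runs on the head giving up the array,
    role "new" on the head taking it over.
    """
    return [
        # On the old head: stop using the array, then stop MD so it no
        # longer touches the disks or their metadata.
        ("old", ["umount", mountpoint]),
        ("old", ["mdadm", "--stop", md]),
        # On the new head: only now assemble the array and mount it.
        ("new", ["mdadm", "--assemble", md, *members]),
        ("new", ["mount", md, mountpoint]),
    ]


for role, argv in handover_commands("/dev/md0", ["/dev/sda", "/dev/sdb"], "/srv/data"):
    print(role, " ".join(argv))
```

The only point of the ordering is that at no moment do both heads have the array assembled.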
Another option is crossed-active, or whatever it is called: some arrays
are active on one head node and the other arrays on the other head
node, so as to share the computational and bandwidth burden.
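The crossed-active layout comes down to one invariant: every array is assembled on exactly one head at a time. A small Python sketch of that bookkeeping, assuming hypothetical array and head names, a simple round-robin split, and that a failed head's arrays are moved to the survivor one array at a time, as in the active/passive case.

```python
from typing import Dict, List


def assign(arrays: List[str], heads: List[str]) -> Dict[str, str]:
    """Round-robin arrays over heads; each array gets exactly one owner."""
    return {array: heads[i % len(heads)] for i, array in enumerate(arrays)}


def fail_head(owners: Dict[str, str], failed: str, survivor: str) -> Dict[str, str]:
    """Move every array owned by the failed head to the survivor."""
    return {array: (survivor if head == failed else head)
            for array, head in owners.items()}


owners = assign(["md0", "md1", "md2", "md3"], ["head-A", "head-B"])
print(owners)   # md0/md2 on head-A, md1/md3 on head-B: both heads do work
owners = fail_head(owners, "head-A", "head-B")
print(owners)   # all four arrays now on head-B
```

Because ownership is a plain mapping from array to head, no array can ever be active on two heads at once, which is exactly the property the shared active/active setup lacks.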
If other people have better ideas I am all ears.
Regards
EW
Thread overview: 7+ messages
2014-11-09 8:30 md with shared disks Anton Ekermans
2014-11-10 16:40 ` Ethan Wilson [this message]
2014-11-10 22:14 ` Stan Hoeppner
2014-11-13 13:14 ` Anton Ekermans
2014-11-13 20:56 ` Stan Hoeppner
2014-11-13 22:53 ` Ethan Wilson
2014-11-14 0:07 ` Stan Hoeppner