linux-raid.vger.kernel.org archive mirror
From: Vincent Pelletier <plr.vincent@gmail.com>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Split-Brain Protection for MD arrays
Date: Mon, 12 Dec 2011 21:18:28 +0100
Message-ID: <201112122118.28351.plr.vincent@gmail.com>
In-Reply-To: <CAGRgLy6=-naSGJw_tgiD5=ab7gWxyeQ2ysu-yCKa064Jih+cfA@mail.gmail.com>

On Monday 12 December 2011 19:51:23, you wrote:
> split-brain

I'm participating in the NEO[1] project (an object database server with
redundancy; that last part is what is relevant to this discussion), which
faces the same kind of problem (storage nodes dying whether or not the cluster
is functional, dead nodes coming back to life later, etc.). So we had to
design some countermeasures to handle split-brain.

I'm happy to recognise equivalents of some of the decisions we took on NEO,
and I'll be following this thread with attention (we have not sought much
review of our design so far).

I would suggest one thing:
Use a fixed increment for the "metadata version" number. Time representation
is not reliable IMHO, especially at the times when you need to set up an
array: faulty BIOS battery, old RTC drifting either way, no NTP to correct
this (either no server available or no client to access one).
If the timestamp is affected by timezone (and especially DST), that makes
matters worse.
Admittedly, a fixed increment exposes the user to problems if he decides to
independently run the two halves of a split brain, lets their data diverge,
reaches a (controllable) point where the version number is at some
convenient value, and then lets the array assemble itself and burst into
flames. The user has to jump through hoops to get there, though, whereas the
timestamp-based approach only requires a non-monotonic RTC to fail.
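
To make the comparison above concrete, here is a minimal Python sketch. It is
not MD or NEO code: the Member class, its "version" and "stamp" fields and
the helper functions are all hypothetical names chosen for illustration. It
shows a backwards clock step making the stale member look freshest under
timestamps, and two independently run halves ending up with equal counters.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical stand-in for one array member's metadata.
    name: str
    version: int = 0    # fixed-increment "metadata version"
    stamp: float = 0.0  # wall-clock time of the last metadata update


def record_update(member, wall_clock):
    """Bump the counter by one and store the (possibly wrong) wall clock."""
    member.version += 1
    member.stamp = wall_clock


def freshest_by_version(members):
    return max(members, key=lambda m: m.version)


def freshest_by_stamp(members):
    return max(members, key=lambda m: m.stamp)


def clock_step_demo():
    a, b = Member("a"), Member("b")
    record_update(a, 1000.0)
    record_update(b, 1000.0)
    # "b" drops out, then the RTC is reset backwards (e.g. dead battery)
    # before "a" records another update.
    record_update(a, 500.0)
    return freshest_by_version([a, b]).name, freshest_by_stamp([a, b]).name


def split_brain_tie_demo():
    a, b = Member("a", version=5), Member("b", version=5)
    # Each half is run on its own and records one update: the data has
    # diverged but the counters are equal again.
    record_update(a, 2000.0)
    record_update(b, 2001.0)
    return a.version == b.version


print(clock_step_demo())
print(split_brain_tie_demo())
```

In the first scenario the counter picks "a" (the member that really is up to
date) while the timestamp picks the stale "b". The second scenario is the
admitted weakness: a bare counter cannot tell the two diverged halves apart.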

Side note: if anyone knows a time source available to userland which is
affected neither by date/ntpd/ntpdate nor by timezones or DST (it may drift
while the computer is powered down, but if possible not while suspended),
please tell me.
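
As a partial pointer only, and assuming Linux: the POSIX/Linux clocks
CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW and CLOCK_BOOTTIME ignore timezones, DST
and jumps made with date/ntpdate. CLOCK_MONOTONIC is still slewed by NTP,
CLOCK_MONOTONIC_RAW is not, and CLOCK_BOOTTIME keeps counting across suspend.
All three restart at boot, so none of them carries a value across a
power-down. The read_clocks() helper below is a hypothetical name; it just
reads whichever of these clocks the running platform exposes.

```python
import time


def read_clocks():
    """Return {clock name: seconds} for the monotonic clocks available here."""
    readings = {}
    if not hasattr(time, "clock_gettime"):
        return readings  # e.g. on platforms without clock_gettime
    for name in ("CLOCK_MONOTONIC", "CLOCK_MONOTONIC_RAW", "CLOCK_BOOTTIME"):
        clock_id = getattr(time, name, None)
        if clock_id is None:
            continue  # this clock is not exposed on this platform
        try:
            readings[name] = time.clock_gettime(clock_id)
        except OSError:
            continue  # exposed by Python but rejected by the kernel
    return readings


for clock_name, seconds in read_clocks().items():
    print(clock_name, seconds)
```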

[1] http://pypi.python.org/pypi/neoppod

Regards,
-- 
Vincent Pelletier


Thread overview: 8+ messages
2011-12-12 18:51 Split-Brain Protection for MD arrays Alexander Lyakas
2011-12-12 20:18 ` Vincent Pelletier [this message]
2011-12-13  9:50   ` Alexander Lyakas
2011-12-15  3:02 ` NeilBrown
2011-12-15 14:29   ` Alexander Lyakas
2011-12-15 19:40     ` NeilBrown
2011-12-16 13:46       ` Roberto Spadim
2011-12-16 14:30       ` Alexander Lyakas
