Date: Wed, 25 Aug 2004 12:28:52 +0200
From: Lars Marowsky-Bree
To: Philipp Reisner, drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] Re: drbd question about secondary vs primary; drbddisk status problem
Message-ID: <20040825102852.GS3125@marowsky-bree.de>
References: <20040819110202.GO9601@marowsky-bree.de> <200408201452.52512.philipp.reisner@linbit.com> <200408251142.18807.philipp.reisner@linbit.com>
In-Reply-To: <200408251142.18807.philipp.reisner@linbit.com>
List-Id: Coordination of development

On 2004-08-25T11:42:18, Philipp Reisner said:

> So, the current policy is:
> * The primary node refuses to connect to a peer with higher generation
>   counts. This keeps the data intact. This is closely related to the other
>   after-split-brain policy I want to make explicit.

Makes sense.

> * Remember the options so far (for primary-after-split-brain):
>
>   - The node that was primary before split brain (current behaviour)
>   - The node that became primary during split brain
>   - The node that modified more of its data during the split-brain
>     situation [ Do not think about implementation yet, just about
>     the policy ]
>   - None, wait for operator's decision. [suggested by LMB]
>   - Node that is currently primary [see example above by LGE]

Minor clarification: I think the question is not about "Who becomes
primary", as the Sync* is decoupled from that status, but about which
side drbd deems to have the good data and thus makes the SyncMaster.

Looking at it from this angle, we have two dimensions:

- Node state after the split-brain heals. Each side can either be
  primary or secondary.

- The data state on each side.
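
To make the two dimensions concrete, here is a minimal Python sketch of the state each side brings to the reconnect. The names `Role`, `Side`, `modified` and `consistent` are illustrative assumptions, not drbd identifiers, and the two booleans are only one possible way to encode "data state".

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Role(Enum):
    """Node state of one side when the split-brain heals."""
    PRIMARY = "primary"
    SECONDARY = "secondary"


@dataclass(frozen=True)
class Side:
    role: Role               # dimension 1: node state
    modified: bool = False   # dimension 2 (assumed encoding): data changed during the split
    consistent: bool = True  # dimension 2 (assumed encoding): data is a consistent snapshot


def role_combinations():
    """All (side A, side B) role pairs that can meet on reconnect."""
    return list(product(Role, repeat=2))
```

The role dimension alone gives four combinations (primary/primary, primary/secondary, secondary/primary, secondary/secondary); the data state then splits each of those further.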
Now, obviously, if the node state of both sides is "primary", drbd can't
do anything automatically, but _must_ wait for admin intervention. It
can't resolve this internally, because doing so would destroy the layers
above.
-> _MUST_ wait for operator intervention.

(Embedded environments with a dumb cluster manager... Hmm... Ok, maybe
crashing one side (which inherently stops the higher layers and triggers
recovery), and thus reducing the problem to one of the somewhat simpler
ones below, might work...)

If only one side is primary, and the algorithms determine that this one
has the good data, and the other side has not touched the data in
between, this is also a simple case.

If both sides are secondary, but only one side has modified the data
since or been primary, again it's simple.

If one side is primary, but the other side has been primary in between
(but not at the time of the connect), drbd can either wait for a
higher-level intervention, or sync the now-secondary. Only two options,
nothing else makes sense. (Changing the data underneath the primary
strikes me as an exceptionally bad idea.)

If both sides are secondary, but both sides have modified the data
since, then we have several choices, like picking the most recent
(timestamp?), the most data modified, tossing a coin, or again waiting
for admin intervention. (Personally, I'd say operator intervention,
after very careful consideration of the problem, is in fact the only
choice; this scenario is only reached by a combination of several
_severe_ faults.)

A special case obviously exists if one secondary side has inconsistent
data and the other has a consistent snapshot, in which case it is a
somewhat safer assumption to sync automatically from the consistent to
the inconsistent side. This should be the default, but may be
configurable...


Sincerely,
    Lars Marowsky-Brée

-- 
High Availability & Clustering      \\\  ///
SUSE Labs, Research and Development \honk/
SUSE LINUX AG - A Novell company     \\//
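
The case analysis in the message above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not drbd's implementation: `Side`, `Action`, `decide`, and the switches `sync_stale_secondary` and `prefer_consistent` are hypothetical names; `touched` lumps together "modified the data" and "was primary in between"; an untouched secondary is treated as sufficient for the primary to be the sync source; and the "nothing to do" outcome for two untouched secondaries is an addition the message does not discuss.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


class Action(Enum):
    WAIT_FOR_OPERATOR = "wait for operator"
    SYNC_FROM_A = "A is sync source"
    SYNC_FROM_B = "B is sync source"
    NOTHING_TO_DO = "nothing to do"  # assumption: not covered in the message


@dataclass(frozen=True)
class Side:
    role: Role               # node state at the time of the connect
    touched: bool = False    # modified data or was primary during the split
    consistent: bool = True  # holds a consistent snapshot


def decide(a, b, sync_stale_secondary=False, prefer_consistent=True):
    """Pick an action for sides a and b (hypothetical policy sketch)."""
    # Both primary: never resolve automatically.
    if a.role is Role.PRIMARY and b.role is Role.PRIMARY:
        return Action.WAIT_FOR_OPERATOR

    # Exactly one primary: its data is never changed underneath it.
    if Role.PRIMARY in (a.role, b.role):
        primary_is_a = a.role is Role.PRIMARY
        secondary = b if primary_is_a else a
        from_primary = Action.SYNC_FROM_A if primary_is_a else Action.SYNC_FROM_B
        if not secondary.touched:
            return from_primary  # simple case
        # The secondary was primary in between: wait, or overwrite it.
        return from_primary if sync_stale_secondary else Action.WAIT_FOR_OPERATOR

    # Both secondary: consistent-to-inconsistent sync is the default.
    if prefer_consistent and a.consistent != b.consistent:
        return Action.SYNC_FROM_A if a.consistent else Action.SYNC_FROM_B
    if a.touched and b.touched:
        return Action.WAIT_FOR_OPERATOR  # several severe faults combined
    if a.touched:
        return Action.SYNC_FROM_A
    if b.touched:
        return Action.SYNC_FROM_B
    return Action.NOTHING_TO_DO
```

For example, `decide(Side(Role.SECONDARY, touched=True), Side(Role.PRIMARY))` waits for the operator by default, and returns `Action.SYNC_FROM_B` only when `sync_stale_secondary=True` is passed. The other proposals for the both-modified case (most recent, most data modified, coin toss) are deliberately left out of the sketch.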