From: Philipp Reisner
To: Lars Ellenberg, Lars Marowsky-Bree, drbd-dev@lists.linbit.com
Date: Fri, 20 Aug 2004 14:52:52 +0200
Message-Id: <200408201452.52512.philipp.reisner@linbit.com>
References: <20040819110202.GO9601@marowsky-bree.de> <20040819113205.GP9601@marowsky-bree.de>
Subject: [Drbd-dev] Re: drbd question about secondary vs primary; drbddisk status problem
Sender: drbd-dev-admin@lists.linbit.com
List-Id: Coordination of development

On Thursday 19 August 2004 14:14, Lars Ellenberg wrote:
[...]
> > I have already considered split-brain scenarios that end in
> > Primary/Primary (both StandAlone) in the new design (which I am
> > writing right now). What else?
>
> not so unlikely at all:
>
> if the primary dies (or is killed), but before dying somehow still
> managed to lose its drbd connection _and_ therefore incremented its
> "ConnectedCount"...
>
> the "slave" now goes Secondary->Primary, but, because it is < Connected,
> increments the ArbitraryCount...
>
> situation at the next connect:
>
> Flags: consistent, been primary last time
>
> former Primary   1:X:Y:a+1:b  :10  (now Secondary after reboot)
> current Primary  1:X:Y:a  :b+1:10
>
> doh. the current Primary is supposed to become SyncTarget... shitty.
> --> current Primary goes StandAlone.
>
> next connection attempt (initiated by the operator)
> ... -> "split brain detected"
> --> both go StandAlone
>
> we may have to introduce an additional counter, a "CRM count", and the
> CRM, after it has shot the other node, should to be safe promote with a
> drbdsetup "--crm" primary (cf. --human); that would at least resolve
> the scenario described above...
Hi,

Right, old topic: what should we do after a split-brain situation?
I have looked up my papers from 2001 to understand why it is done
the way it is today.

The situation:

  N1     N2
  P  --- S    Everything ok.
  P  -   S    Link breaks.
  P  -   P    A (also split-brained) cluster-mgr makes N2 primary too.
  X      X    Both nodes down.
  P  --- S    The current behaviour.

What should be done after split brain?

The current policy is that the node that was Primary before the
split-brain situation should be Primary afterwards. This policy is
hard-coded into DRBD. It is an arbitrary decision; I thought it was
a good idea.

The questions are:

Should this policy be configurable? (IMO: yes)

Which policies do we want to offer?

 * The node that was primary before split brain (current behaviour)
 * The node that became primary during split brain
 * The node that modified more of its data during the split-brain
   situation
 [ Do not think about implementation yet, just about the policy ]
 * others ?...

The second question to answer is: what should we do when the
connecting network heals? I.e.

  N1     N2
  P  --- S    Everything ok.
  P  -   S    Link breaks.
  P  -   P    A (also split-brained) cluster-mgr makes N2 primary too.
  ?  --- ?    What now ?

Current policy: the two nodes will refuse to connect. The
administrator has to resolve this.

Are there any other policies that would make sense?

-Philipp
-- 
: Dipl-Ing Philipp Reisner                     Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH         Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :