From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from soda.linbit (office.linbit [86.59.100.100])
	by mail.linbit.com (LINBIT Mail Daemon) with ESMTP id 4D3242E083C4
	for ; Tue, 12 Aug 2008 10:49:46 +0200 (CEST)
Date: Tue, 12 Aug 2008 10:49:46 +0200
From: Lars Ellenberg
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] 8.2.6 Peer disk state handling issue when attaching
Message-ID: <20080812084946.GC19857@soda.linbit>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
List-Id: Coordination of development

On Mon, Aug 11, 2008 at 11:52:30PM -0400, Graham, Simon wrote:
> I have noticed that with 8.2.6, if the role of a device is
> Secondary/Secondary and you detach and then re-attach a device, the peer
> disk state on the other node ends up as Consistent instead of UpToDate -
> it seems that in this case the code does not check if a resync is
> required and goes directly from DiskLess->Consistent on the side that is
> not doing the detach/attach.
>
> Here is a sample extract from the messages file on the two systems:
>
> First, on the system where you do the detach followed by attach
> (connection state is Connected when this starts, roles are
> Secondary/Secondary, disk UpToDate/UpToDate:
>
> Aug 9 04:53:11 node0 kernel: drbd16: disk( UpToDate -> Diskless )
>
> Aug 9 04:53:32 node0 kernel: drbd16: disk( Diskless -> Attaching )
> Aug 9 04:53:32 node0 kernel: drbd16: No usable activity log found.
> Aug 9 04:53:32 node0 kernel: drbd16: max_segment_size ( = BIO size ) = 32768
> Aug 9 04:53:32 node0 kernel: drbd16: reading of bitmap took 1 jiffies
> Aug 9 04:53:32 node0 kernel: drbd16: recounting of set bits took additional 0 jiffies
> Aug 9 04:53:32 node0 kernel: drbd16: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Aug 9 04:53:32 node0 kernel: drbd16: disk( Attaching -> Negotiating )
> Aug 9 04:53:32 node0 kernel: drbd16: Writing meta data super block now.
> Aug 9 04:53:32 node0 kernel: drbd16: disk( Negotiating -> UpToDate )
>
> On the other node (same starting state):
>
> Aug 9 04:53:11 node1 kernel: drbd16: pdsk( UpToDate -> Diskless )
>
> Aug 9 04:53:32 node1 kernel: drbd16: real peer disk state = Consistent
> Aug 9 04:53:32 node1 kernel: drbd16: pdsk( Diskless -> Consistent )
>
> I can see why the second node does not go to the UpToDate state - there
> is a check in _drbd_set_state such that it only overwrites Consistent
> with UpToDate if the connection state is also changing which it does not
> in this case. HOWEVER, I'm not sure this is the right place to fix it -
> it seems to me that we should check for a resync even in this case since
> one or both of the disks could have been Primary and modified the disk
> at some point and then been downgraded to Secondary - so we really need
> to call drbd_sync_handshake even in this case, but we don't seem to...
>
> I don't see any fixes post 8.2.6 that obviously address this but perhaps
> I missed something?

confirmed in current 8.2 git.

> If not, any thoughts on the right way to fix this?

I leave that question open for now.

-- 
: Lars Ellenberg                            Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :