* [Drbd-dev] 8.2.6 Peer disk state handling issue when attaching
@ 2008-08-12 3:52 Graham, Simon
2008-08-12 8:49 ` Lars Ellenberg
2008-09-15 18:26 ` Petrakis, Peter
0 siblings, 2 replies; 4+ messages in thread
From: Graham, Simon @ 2008-08-12 3:52 UTC
To: drbd-dev
I have noticed that with 8.2.6, if the roles are Secondary/Secondary and
you detach and then re-attach a device, the peer disk state on the other
node ends up as Consistent instead of UpToDate. It seems that in this case
the code does not check whether a resync is required and goes directly from
Diskless->Consistent on the side that is not doing the detach/attach.
Here is a sample extract from the messages file on the two systems:
First, on the system where you do the detach followed by attach
(connection state is Connected when this starts, roles are
Secondary/Secondary, disk states UpToDate/UpToDate):
Aug 9 04:53:11 node0 kernel: drbd16: disk( UpToDate -> Diskless )
Aug 9 04:53:32 node0 kernel: drbd16: disk( Diskless -> Attaching )
Aug 9 04:53:32 node0 kernel: drbd16: No usable activity log found.
Aug 9 04:53:32 node0 kernel: drbd16: max_segment_size ( = BIO size ) = 32768
Aug 9 04:53:32 node0 kernel: drbd16: reading of bitmap took 1 jiffies
Aug 9 04:53:32 node0 kernel: drbd16: recounting of set bits took additional 0 jiffies
Aug 9 04:53:32 node0 kernel: drbd16: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Aug 9 04:53:32 node0 kernel: drbd16: disk( Attaching -> Negotiating )
Aug 9 04:53:32 node0 kernel: drbd16: Writing meta data super block now.
Aug 9 04:53:32 node0 kernel: drbd16: disk( Negotiating -> UpToDate )
On the other node (same starting state):
Aug 9 04:53:11 node1 kernel: drbd16: pdsk( UpToDate -> Diskless )
Aug 9 04:53:32 node1 kernel: drbd16: real peer disk state = Consistent
Aug 9 04:53:32 node1 kernel: drbd16: pdsk( Diskless -> Consistent )
I can see why the second node does not go to the UpToDate state: there
is a check in _drbd_set_state such that it only overwrites Consistent
with UpToDate if the connection state is also changing, which it does not
in this case. However, I'm not sure this is the right place to fix it.
It seems to me that we should check for a resync even in this case, since
one or both of the disks could have been Primary and modified the disk
at some point and then been downgraded to Secondary. So we really need
to call drbd_sync_handshake even in this case, but we don't seem to.
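The check described above can be sketched as follows. This is a minimal
model, not the DRBD source: the enums, their ordering, and the names
guard_8_2_6, fixup_pdsk and check_scenarios are assumptions made for
illustration, and only the Connected case of the switch is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for DRBD's state enums; values and ordering are
 * assumptions for this sketch, not the real definitions. */
enum conn_state { WFConnection, Connected };
enum disk_state { Diskless, Outdated, Consistent, UpToDate };

struct dev_state {
	enum conn_state conn;
	enum disk_state pdsk;
};

/* Guard as described above for 8.2.6: the peer disk state is only
 * reconsidered when the connection state changes in the same transition. */
static bool guard_8_2_6(struct dev_state os, struct dev_state ns)
{
	return ns.conn != os.conn && ns.conn >= Connected &&
	       (ns.pdsk == Consistent || ns.pdsk == Outdated);
}

/* Hypothetical helper: models only the Connected case of the switch. */
static struct dev_state fixup_pdsk(struct dev_state os, struct dev_state ns)
{
	if (guard_8_2_6(os, ns) && ns.conn == Connected)
		ns.pdsk = UpToDate;
	return ns;
}

/* Self-check for the two transitions discussed in the text. */
static void check_scenarios(void)
{
	/* Peer re-attaches while the link stays Connected: no promotion. */
	struct dev_state os1 = { Connected, Diskless };
	struct dev_state ns1 = { Connected, Consistent };
	assert(fixup_pdsk(os1, ns1).pdsk == Consistent);

	/* Same peer disk state arriving with a fresh connection: promoted. */
	struct dev_state os2 = { WFConnection, Diskless };
	struct dev_state ns2 = { Connected, Consistent };
	assert(fixup_pdsk(os2, ns2).pdsk == UpToDate);
}
```

The first case corresponds to the node1 log: conn is Connected before and
after, so the guard is false and pdsk is left at Consistent.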
I don't see any fixes post 8.2.6 that obviously address this but perhaps
I missed something? If not, any thoughts on the right way to fix this?
Thanks,
Simon
* Re: [Drbd-dev] 8.2.6 Peer disk state handling issue when attaching
From: Lars Ellenberg @ 2008-08-12 8:49 UTC
To: drbd-dev
On Mon, Aug 11, 2008 at 11:52:30PM -0400, Graham, Simon wrote:
> I have noticed that with 8.2.6, if the role of a device is
> Secondary/Secondary and you detach and then re-attach a device, the peer
> disk state on the other node ends up as Consistent instead of UpToDate -
> it seems that in this case the code does not check if a resync is
> required and goes directly from DiskLess->Consistent on the side that is
> not doing the detach/attach.
>
> Here is a sample extract from the messages file on the two systems:
>
> First, on the system where you do the detach followed by attach
> (connection state is Connected when this starts, roles are
> Secondary/Secondary, disk states UpToDate/UpToDate):
>
> Aug 9 04:53:11 node0 kernel: drbd16: disk( UpToDate -> Diskless )
>
> Aug 9 04:53:32 node0 kernel: drbd16: disk( Diskless -> Attaching )
> Aug 9 04:53:32 node0 kernel: drbd16: No usable activity log found.
> Aug 9 04:53:32 node0 kernel: drbd16: max_segment_size ( = BIO size ) = 32768
> Aug 9 04:53:32 node0 kernel: drbd16: reading of bitmap took 1 jiffies
> Aug 9 04:53:32 node0 kernel: drbd16: recounting of set bits took additional 0 jiffies
> Aug 9 04:53:32 node0 kernel: drbd16: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Aug 9 04:53:32 node0 kernel: drbd16: disk( Attaching -> Negotiating )
> Aug 9 04:53:32 node0 kernel: drbd16: Writing meta data super block now.
> Aug 9 04:53:32 node0 kernel: drbd16: disk( Negotiating -> UpToDate )
>
> On the other node (same starting state):
>
> Aug 9 04:53:11 node1 kernel: drbd16: pdsk( UpToDate -> Diskless )
>
> Aug 9 04:53:32 node1 kernel: drbd16: real peer disk state = Consistent
> Aug 9 04:53:32 node1 kernel: drbd16: pdsk( Diskless -> Consistent )
>
> I can see why the second node does not go to the UpToDate state - there
> is a check in _drbd_set_state such that it only overwrites Consistent
> with UpToDate if the connection state is also changing which it does not
> in this case. HOWEVER, I'm not sure this is the right place to fix it -
> it seems to me that we should check for a resync even in this case since
> one or both of the disks could have been Primary and modified the disk
> at some point and then been downgraded to Secondary - so we really need
> to call drbd_sync_handshake even in this case, but we don't seem to...
>
> I don't see any fixes post 8.2.6 that obviously address this but perhaps
> I missed something?
Confirmed in current 8.2 git.
> If not, any thoughts on the right way to fix this?
I leave that question open for now.
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
* RE: [Drbd-dev] 8.2.6 Peer disk state handling issue when attaching
From: Petrakis, Peter @ 2008-09-15 18:26 UTC
To: drbd-dev
Hi All,
We've evolved the original patch to the following, which seems to solve
our problem. Can you see anything wrong with it? Thanks.
diff -r f9aa469f7044 drbd_main.c
--- a/drbd_main.c Fri Sep 12 13:51:40 2008 -0400
+++ b/drbd_main.c Mon Sep 15 10:04:48 2008 -0400
@@ -765,7 +765,7 @@
         ns.conn = Connected;
     }
 
-    if (ns.conn != os.conn && ns.conn >= Connected &&
+    if (ns.conn >= Connected &&
         (ns.disk == Consistent || ns.disk == Outdated)) {
         switch(ns.conn) {
         case WFBitMapT:
@@ -787,7 +787,7 @@
             WARN("Implicit set disk from Outdate to UpToDate\n");
     }
 
-    if (ns.conn != os.conn && ns.conn >= Connected &&
+    if (ns.conn >= Connected &&
         (ns.pdsk == Consistent || ns.pdsk == Outdated)) {
         switch(ns.conn) {
         case Connected:
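
To illustrate what dropping the ns.conn != os.conn term changes, here is a
minimal sketch that compares the two guards from the patch above for the
peer disk state. The enums and their ordering, and the helper names
guard_before, guard_after and compare_guards, are assumptions for
illustration; the switch bodies are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins; the ordering is an assumption of this sketch. */
enum conn_state { StandAlone, WFConnection, Connected };
enum disk_state { Diskless, Outdated, Consistent, UpToDate };

/* Guard before the patch: requires a connection state change. */
static bool guard_before(enum conn_state os_conn, enum conn_state ns_conn,
			 enum disk_state ns_pdsk)
{
	return ns_conn != os_conn && ns_conn >= Connected &&
	       (ns_pdsk == Consistent || ns_pdsk == Outdated);
}

/* Guard after the patch: the ns.conn != os.conn term is dropped. */
static bool guard_after(enum conn_state ns_conn, enum disk_state ns_pdsk)
{
	return ns_conn >= Connected &&
	       (ns_pdsk == Consistent || ns_pdsk == Outdated);
}

static void compare_guards(void)
{
	/* Re-attach while Connected: only the patched guard fires. */
	assert(!guard_before(Connected, Connected, Consistent));
	assert(guard_after(Connected, Consistent));

	/* Fresh connection: both guards fire. */
	assert(guard_before(WFConnection, Connected, Consistent));
	assert(guard_after(Connected, Consistent));

	/* Not connected: neither guard fires. */
	assert(!guard_before(StandAlone, WFConnection, Consistent));
	assert(!guard_after(WFConnection, Consistent));
}
```

One consequence visible in the sketch is that the patched guard no longer
depends on the old state at all, so it is evaluated the same way on every
transition that ends at Connected or above.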
Peter
* Re: [Drbd-dev] 8.2.6 Peer disk state handling issue when attaching
From: Philipp Reisner @ 2008-10-30 11:23 UTC
To: drbd-dev
On Monday, 15 September 2008 20:26:15, Petrakis, Peter wrote:
> Hi All,
>
> We've evolved the original patch to the following, which seems to solve
> our problem. Can you see anything wrong with it? Thanks.
>
> diff -r f9aa469f7044 drbd_main.c
> --- a/drbd_main.c Fri Sep 12 13:51:40 2008 -0400
> +++ b/drbd_main.c Mon Sep 15 10:04:48 2008 -0400
> @@ -765,7 +765,7 @@
> ns.conn = Connected;
> }
>
> - if (ns.conn != os.conn && ns.conn >= Connected &&
> + if (ns.conn >= Connected &&
> (ns.disk == Consistent || ns.disk == Outdated)) {
> switch(ns.conn) {
> case WFBitMapT:
> @@ -787,7 +787,7 @@
> WARN("Implicit set disk from Outdate to UpToDate\n");
> }
>
> - if (ns.conn != os.conn && ns.conn >= Connected &&
> + if (ns.conn >= Connected &&
> (ns.pdsk == Consistent || ns.pdsk == Outdated)) {
> switch(ns.conn) {
> case Connected:
>
>
Hi Peter,
There is nothing wrong with that patch. It is just my failure to follow
drbd-dev for the last month...
I have now committed nearly the same code change upstream. You will
find it there soon as 00f2ce70e0daaa72775b3712863fb29ee99581f3
(in drbd-8.0), and from there it will of course be propagated to 8.2.
-phil
--
: Dipl-Ing Philipp Reisner
: LINBIT | Your Way to High Availability
: Tel: +43-1-8178292-50, Fax: +43-1-8178292-82
: http://www.linbit.com
DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.