* RE: [Drbd-dev] DRBD8: failed to complete sync due to receiving bitmap in unexpected cstate
@ 2006-12-19 19:36 Graham, Simon
  2006-12-20 14:14 ` Philipp Reisner
  0 siblings, 1 reply; 2+ messages in thread
From: Graham, Simon @ 2006-12-19 19:36 UTC (permalink / raw)
  To: Graham, Simon, Lars Ellenberg, drbd-dev

>
> My theory was that there is a timing window relative to moving from the
> PauseSync{T|S} state such that one side can get there first and restart
> syncing before the other side.
>

Not sure if you've had any thoughts on this, but I have a theory about
this that was sparked by the problem I found today where we can still be
in the PausedSyncX state when sync finishes...

If you recall, the problem was that the sync source side would get into
WFBitMapS and never exit, and the target side would output:

unexpected cstate (PausedSyncT) in receive_bitmap

Here's my theory in a time sequence...

          Source                 Target
             |                      |
         <PausedSyncS>          <PausedSyncT>
             |                      |
       resync completes             |
             |                      |
          <Connected>               |
             |                      |
       high priority group          |
           finishes sync            |
         <aftr_isp->0>              |
             |                      |
     drbd_send_state                |
             |       ReportState    |
             +--------------------->|
             |     UUIDs            | *** Note: UUIDs haven't been updated here
             +<---------------------|     yet, so they still look out of date
             |     ReportState      |
             +<---------------------|
             |                      |
        drbd_sync_handshake         |
          hg>1                      |
      <WFBitMapS>                   |
             |    Bitmap            |
             +--------------------->| *** get unexpected cstate message,
             |                      |     plus never return bitmap
             |                      |
             |               Now we notice resync complete
             |                 <Connected>

Obviously this requires a lot of things to happen all together and
somewhat out of sequence, but I think it's feasible. As I see it, there
are actually several problems here:

1. When aftr_isp went to 0 we still initiated the resumption of resync
even though we are in Connected state
2. We ended up deciding to restart the resync because we got stale UUID
info from the target
3. The target side did not reply to the Bitmap, leaving the source stuck
in WFBitMapS
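
The sequence and the three problems above can be replayed in a small toy
model. This is only a sketch of the theory, not DRBD code: the Node class,
the uuids_updated flag and the replay() function are hypothetical names
invented for illustration, and the only behaviour taken from the logs is
the "unexpected cstate" message.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    cstate: str
    uuids_updated: bool = False  # hypothetical: UUIDs reflect the finished resync
    log: list = field(default_factory=list)


def receive_bitmap(target: Node) -> bool:
    """Return True if the target answers with its own bitmap."""
    if target.cstate == "PausedSyncT":
        target.log.append("unexpected cstate (PausedSyncT) in receive_bitmap")
        return False  # problem 3: no reply, so the source keeps waiting
    return True


def replay(source: Node, target: Node) -> None:
    # The source notices that the resync completed before the target does.
    source.cstate = "Connected"
    # Problem 1: aftr_isp -> 0 still triggers a state exchange while Connected.
    # The target answers with whatever UUIDs it has at this moment.
    peer_looks_out_of_date = not target.uuids_updated
    # Problem 2: stale UUIDs make the handshake decide on another resync.
    if peer_looks_out_of_date:
        source.cstate = "WFBitMapS"
        if receive_bitmap(target):
            source.cstate = "SyncSource"
    # Only now does the target notice that the resync completed.
    target.cstate = "Connected"
    target.uuids_updated = True


source, target = Node("PausedSyncS"), Node("PausedSyncT")
replay(source, target)
print(source.cstate, target.cstate, target.log)
```

In this model the source ends up in WFBitMapS while the target is already
Connected. If the target has reached Connected with updated UUIDs before the
state exchange, the source stays in Connected.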

I somehow don't think that putting a test for PausedSyncX in
receive_bitmap() is the correct solution here, but I'm not sure what
would be better... Any ideas?

Simon


* Re: [Drbd-dev] DRBD8: failed to complete sync due to receiving bitmap in unexpected cstate
  2006-12-19 19:36 [Drbd-dev] DRBD8: failed to complete sync due to receiving bitmap in unexpected cstate Graham, Simon
@ 2006-12-20 14:14 ` Philipp Reisner
  0 siblings, 0 replies; 2+ messages in thread
From: Philipp Reisner @ 2006-12-20 14:14 UTC (permalink / raw)
  To: drbd-dev

On Tuesday, 19 December 2006 20:36, Graham, Simon wrote:
> > My theory was that there is a timing window relative to moving from the
> > PauseSync{T|S} state such that one side can get there first and restart
> > syncing before the other side.
>
> Not sure if you've had any thoughts on this, but I have a theory about
> this that was sparked by the problem I found today where we can still be
> in the PausedSyncX state when sync finishes...
>
> If you recall, the problem was what the sync source side would get into
> WFBitMapS and never exit and the target side would output:
>

Hi Simon, 

[Back from vacation]

I just read your mail from the 12th of December. I went through
the kernel logs line by line.

There is a bit called SYNC_STARTED. This is needed to determine if
we should clear bits in the bitmap upon the completion of
normal application writes.

Since I had to introduce this during drbd-0.7, while the protocol
was frozen, I had to add the bit without adding a new packet to
the protocol.

I decided to set it with the first WriteAck sent from the SyncTarget
node to the SyncSource node.

  Before (without the SYNC_STARTED bit) it could happen that one
  node considered an app-write to happen during the resync 
  (and drbd_set_in_sync() should be called) but the other node
  considered it to happen before the resync (therefore it did
  not call drbd_set_in_sync()). 
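
  A minimal sketch of why tying the bit to the first WriteAck can make
  both nodes agree. It rests on an assumption that is not stated above:
  that acks travel from the SyncTarget to the SyncSource in one ordered
  stream, so both nodes see the first resync WriteAck at the same position
  relative to the acks for application writes. The packet names, block
  numbers and both functions are hypothetical.

```python
def classify_with_flag(stream):
    """Per app-write ack: was SYNC_STARTED already set when it passed by?"""
    sync_started = False
    during_resync = {}
    for kind, block in stream:
        if kind == "resync_ack":
            sync_started = True  # first resync WriteAck sets the bit
        elif kind == "app_ack":
            during_resync[block] = sync_started
    return during_resync


def classify_by_local_state(stream, entered_sync_at):
    """Without the bit: each node uses the moment its own cstate changed."""
    return {
        block: index >= entered_sync_at
        for index, (kind, block) in enumerate(stream)
        if kind == "app_ack"
    }


# Ordered ack stream; the target processes it while sending, the source
# while receiving (assumed to be the same order on both sides).
stream = [("app_ack", 10), ("app_ack", 11), ("resync_ack", 0), ("app_ack", 12)]

# With the bit, both nodes walk the same stream and reach the same verdict.
target_view = classify_with_flag(stream)
source_view = classify_with_flag(stream)
print(target_view == source_view)

# Without it, the nodes may change state at different points in the stream
# and then disagree about block 11.
print(classify_by_local_state(stream, 1) == classify_by_local_state(stream, 3))
```

  The disagreement in the second case is the situation where only one node
  would call drbd_set_in_sync() for the same application write.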


 Just another thing I wanted to mention:
 SyncPause only takes effect after the exchange of the bitmaps
 has finished.

I can reproduce an issue here: I disconnect two devices, where r1 is
configured to sync after r0.

1) I modify many blocks on r0, a few on r1.
2) When connecting them, r0 does its resync and r1 goes into sync pause.
3) Then I rewrite the same blocks on r1, and in the end the
   SyncSource of r1 does not recognise that the resync is finished.

I am working on this issue right now...

-phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austria        http://www.linbit.com :
