* [Drbd-dev] DRBD-8: failure to complete resync when connection lost and resyncing to primary
From: Graham, Simon @ 2006-09-24 12:35 UTC
To: drbd-dev
While testing the panic removal code, I came across a failure to
complete a resync when the primary side is the target of the resync.
Right now, unless the disk state is Negotiating, the code rejects any
attempt to resync with the primary side's disk as the target, even if
the primary's disk is inconsistent. I hit this with the following test
case:
1. Set one side primary.
2. Do detach/attach on the primary side. This starts a full resync with
   the secondary side as the source.
3. Forcibly disconnect the network (you can actually do 'drbdadm
   disconnect' on the secondary side!).
4. Reconnect the network. At this point, the resync is rejected.
Attached is some sample trace output showing the failure on the primary
side.
I'm thinking that this should be allowed IF the primary side disk is
inconsistent or diskless or otherwise bad; this means that the test in
drbd_sync_handshake:
    if (hg < 0 &&
        mdev->state.role == Primary && mdev->state.disk != Negotiating ) {
            ERR("I shall become SyncTarget, but I am primary!\n");
            drbd_force_state(mdev,NS(conn,StandAlone));
            drbd_thread_stop_nowait(&mdev->receiver);
            return conn_mask;
    }
should instead be:
    if (hg < 0 &&
        mdev->state.role == Primary && mdev->state.disk >= Consistent ) {
            ERR("I shall become SyncTarget, but I am primary!\n");
            drbd_force_state(mdev,NS(conn,StandAlone));
            drbd_thread_stop_nowait(&mdev->receiver);
            return conn_mask;
    }
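To make the difference between the two tests concrete, here is a minimal
standalone sketch that isolates just the condition. It is not the DRBD
source: the enum below is an assumed ordering of the disk states (worst
to best), the role enum is simplified, and the helper names
rejects_current() and rejects_proposed() are hypothetical, introduced
only for illustration.

```c
#include <stdbool.h>

/* ASSUMPTION: disk states ordered from worst to best, so that
 * "disk >= Consistent" means "Consistent or UpToDate". */
enum disk_state {
    Diskless, Attaching, Failed, Negotiating,
    Inconsistent, Outdated, DUnknown, Consistent, UpToDate
};

/* Simplified role enum, for illustration only. */
enum role { Unknown, Primary, Secondary };

/* Hypothetical helper: the test as it stands today. A primary that
 * would become SyncTarget (hg < 0) is rejected unless its disk is
 * in the Negotiating state. */
static bool rejects_current(int hg, enum role role, enum disk_state disk)
{
    return hg < 0 && role == Primary && disk != Negotiating;
}

/* Hypothetical helper: the proposed test. The primary is rejected as
 * SyncTarget only when its own disk is Consistent or better. */
static bool rejects_proposed(int hg, enum role role, enum disk_state disk)
{
    return hg < 0 && role == Primary && disk >= Consistent;
}
```

Under these assumptions, the case from the test above (hg = -1, role
Primary, disk Inconsistent) is rejected by the current test but accepted
by the proposed one, while a Primary with an UpToDate disk is still
refused as SyncTarget by both.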
Does that make sense?
Simon
---extract of messages from Primary---
Sep 24 08:17:03 snoopy kernel: drbd0: Forcing state change from bad state. Error would be: 'Refusing to be Primary without at least one UpToDate disk'
Sep 24 08:17:03 snoopy kernel: drbd0: old = { cs:WFConnection st:Primary/Unknown ds:Inconsistent/DUnknown r--- }
Sep 24 08:17:03 snoopy kernel: drbd0: new = { cs:WFReportParams st:Primary/Unknown ds:Inconsistent/DUnknown r--- }
Sep 24 08:17:03 snoopy kernel: drbd0: conn( WFConnection -> WFReportParams )
Sep 24 08:17:03 snoopy kernel: drbd0: data >>> HandShake (protocol 82)
Sep 24 08:17:03 snoopy kernel: drbd0: data <<< HandShake (protocol 82)
Sep 24 08:17:03 snoopy kernel: drbd0: Handshake successful: DRBD Network Protocol version 82
Sep 24 08:17:03 snoopy kernel: drbd0: data >>> ReportProtocol (11)
Sep 24 08:17:03 snoopy kernel: drbd0: data >>> SyncParam (10)
Sep 24 08:17:03 snoopy kernel: drbd0: data >>> ReportSizes (d 15007MiB, u 0MiB, c 15007MiB, max bio 1000, q order 0)
Sep 24 08:17:03 snoopy kernel: drbd0: data >>> ReportUUIDs Curr:ABFA2C5E8469C059, Bitmap:0000000000000001, HisSt:980DF0A708466D72, HisEnd:882E2BDEC183E244
Sep 24 08:17:03 snoopy kernel: drbd0: data >>> ReportState (s c861 { role( Primary ) peer( Unknown ) conn( WFReportParams ) disk( Inconsistent ) pdsk( DUnknown )})
Sep 24 08:17:03 snoopy kernel: drbd0: data <<< ReportProtocol (11)
Sep 24 08:17:03 snoopy kernel: drbd0: data <<< SyncParam (10)
Sep 24 08:17:03 snoopy kernel: drbd0: data <<< ReportSizes (d 15007MiB, u 0MiB, c 15007MiB, max bio 1000, q order 0)
Sep 24 08:17:03 snoopy kernel: drbd0: data <<< ReportUUIDs Curr:DC3A3BB9EB892584, Bitmap:ABFA2C5E8469C058, HisSt:980DF0A708466D72, HisEnd:882E2BDEC183E244
Sep 24 08:17:03 snoopy kernel: drbd0: drbd_sync_handshake:
Sep 24 08:17:03 snoopy kernel: drbd0: self ABFA2C5E8469C059:0000000000000001:980DF0A708466D72:882E2BDEC183E244
Sep 24 08:17:03 snoopy kernel: drbd0: peer DC3A3BB9EB892584:ABFA2C5E8469C058:980DF0A708466D72:882E2BDEC183E244
Sep 24 08:17:03 snoopy kernel: drbd0: uuid_compare()=-1
Sep 24 08:17:03 snoopy kernel: drbd0: I shall become SyncTarget, but I am primary!
Sep 24 08:17:03 snoopy kernel: drbd0: conn( WFReportParams -> StandAlone )
Sep 24 08:17:03 snoopy kernel: drbd0: error receiving ReportState, l: 4!
Sep 24 08:17:03 snoopy kernel: drbd0: asender starting cleanup
* Re: [Drbd-dev] DRBD-8: failure to complete resync when connection lost and resyncing to primary
From: Philipp Reisner @ 2006-09-25 13:41 UTC
To: drbd-dev
Hi Simon,
Thanks for this.
> 3. Forcibly disconnect the network (you can actually do 'drbdadm
> disconnect' on the secondary side!)
Ok, I fixed this.
http://lists.linbit.com/pipermail/drbd-cvs/2006-September/001262.html
> 4. Reconnect the network - at this point, the resync is rejected.
> I'm thinking that this should be allowed IF the primary side disk is
> inconsistent or diskless or otherwise bad; this means that the test in
> drbd_sync_handshake:
Yes, right. This is necessary for the work you have in your pipeline,
I see.
I will commit that.
-Phil
--
: Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria http://www.linbit.com :