All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: [Drbd-dev] Barrier assert failures with latest 8.0 sources
@ 2008-01-19 16:40 Graham, Simon
  2008-01-21 16:36 ` Lars Ellenberg
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Graham, Simon @ 2008-01-19 16:40 UTC (permalink / raw)
  To: Graham, Simon, drbd-dev

> I'm attempting to run with the latest 8.0 sources from Git (plus a
> couple of patches - basically the ones I have submitted that have not
> yet been applied) and am seeing a lot of assert failures in the
barrier
> code since the latest change to send barriers as early as possible. A
> representative trace for a device is attached - you will see that the
> device gets connected then pauses resync (not sure if this is really
> relevant) and then we start streaming the assert failures --
apparently
> we are off by one barrier from this point on...

Hmm.. maybe not as hard to diagnose as I thought -- when the drbd
connection is lost, we end up calling tl_clear which clears out the
transfer list _but_ leaves a single barrier in the list with number 4711
and req-cnt 0 (so oldest_barrier and newest_barrier both point to this
pseudo-barrier entry). 

When we reconnect and start processing requests again, when the first
barrier is needed, it will be number 4712 and will get added to the list
and the BarrierRq will be sent with this number. When the BarrierAck is
received, oldest_barrier is still 4711 though, leading to the assert
failure...

I'm not sure why tl_clear leaves this pseudo-barrier in the list...
shouldn't it simply leave the list completely empty just like tl_init
does?

Simon

^ permalink raw reply	[flat|nested] 8+ messages in thread
* [Drbd-dev] Barrier assert failures with latest 8.0 sources
@ 2008-01-19 16:25 Graham, Simon
  0 siblings, 0 replies; 8+ messages in thread
From: Graham, Simon @ 2008-01-19 16:25 UTC (permalink / raw)
  To: drbd-dev

Hi,

I'm attempting to run with the latest 8.0 sources from Git (plus a
couple of patches - basically the ones I have submitted that have not
yet been applied) and am seeing a lot of assert failures in the barrier
code since the latest change to send barriers as early as possible. A
representative trace for a device is attached - you will see that the
device gets connected then pauses resync (not sure if this is really
relevant) and then we start streaming the assert failures -- apparently
we are off by one barrier from this point on...

Any suggestions?
Simon

   5270:Jan 19 00:15:43 node1 kernel: drbd5: conn( Unconnected ->
WFConnection ) 
   6214:Jan 19 00:34:45 node1 kernel: drbd5: Handshake successful: DRBD
Network Protocol version 86
   6216:Jan 19 00:34:46 node1 kernel: drbd5: conn( WFConnection ->
WFReportParams ) 
   6219:Jan 19 00:34:46 node1 kernel: drbd5: Starting asender thread
(from drbd5_receiver [15383])
   6221:Jan 19 00:34:47 node1 kernel: drbd5: drbd_sync_handshake:
   6222:Jan 19 00:34:48 node1 kernel: drbd5: self
0D17E6C6F3714F2F:E0FEF0615D2424EF:2953DE3A10384390:169A43D80FC7A631
   6224:Jan 19 00:34:48 node1 kernel: drbd5: peer
0000000000000004:0000000000000000:0000000000000000:0000000000000000
   6226:Jan 19 00:34:49 node1 kernel: drbd5: uuid_compare()=2 by rule 3
   6228:Jan 19 00:34:50 node1 kernel: drbd5: Becoming sync source due to
disk states.
   6231:Jan 19 00:34:51 node1 kernel: drbd5: Writing meta data super
block now.
   6234:Jan 19 00:34:53 node1 kernel: drbd5: writing of bitmap took 0
jiffies
   6237:Jan 19 00:34:53 node1 kernel: drbd5: 1023 MB (262127 bits)
marked out-of-sync by on disk bit-map.
   6240:Jan 19 00:34:54 node1 kernel: drbd5: Writing meta data super
block now.
   6246:Jan 19 00:35:04 node1 kernel: drbd5: peer( Unknown -> Secondary
) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Inconsistent ) 
   6251:Jan 19 00:35:14 node1 kernel: drbd5: Writing meta data super
block now.
   6257:Jan 19 00:35:19 node1 kernel: drbd5:  uuid[History_start] now
E0FEF0615D2424EF
   6261:Jan 19 00:35:23 node1 kernel: drbd5:  uuid[Bitmap] now
BF161CBD2512E46A
   6264:Jan 19 00:35:28 node1 kernel: drbd5: conn( WFBitMapS ->
SyncSource ) 
   6279:Jan 19 00:35:53 node1 kernel: drbd5: Began resync as SyncSource
(will sync 1048508 KB [262127 bits set]).
   6281:Jan 19 00:35:56 node1 kernel: drbd5: Writing meta data super
block now.
   6290:Jan 19 00:36:07 node1 kernel: drbd5: conn( SyncSource ->
PausedSyncS ) aftr_isp( 0 -> 1 ) 
   6292:Jan 19 00:36:08 node1 kernel: drbd5: Resync suspended
   6303:Jan 19 00:36:23 node1 kernel: drbd5: peer_isp( 0 -> 1 ) 
   6308:Jan 19 00:36:25 node1 kernel: drbd5: ASSERT( b->br_number ==
barrier_nr ) in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:240
   6311:Jan 19 00:36:27 node1 kernel: drbd5: ASSERT( b->n_req ==
set_size ) in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:241
   6314:Jan 19 00:36:28 node1 kernel: drbd5: b->br_number = 4711 in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:245
   6318:Jan 19 00:36:29 node1 kernel: drbd5: barrier_nr = 4712 in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:246
   6322:Jan 19 00:36:30 node1 kernel: drbd5: b->n_req = 0 in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:249
   6326:Jan 19 00:36:31 node1 kernel: drbd5: set_size = 1 in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:250
   6334:Jan 19 00:36:37 node1 kernel: drbd5: ASSERT( b->br_number ==
barrier_nr ) in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:240
   6337:Jan 19 00:36:38 node1 kernel: drbd5: b->br_number = 4712 in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:245
   6340:Jan 19 00:36:38 node1 kernel: drbd5: barrier_nr = 4713 in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:246
   6343:Jan 19 00:36:39 node1 kernel: drbd5: ASSERT( b->br_number ==
barrier_nr ) in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:240
   6347:Jan 19 00:36:40 node1 kernel: drbd5: ASSERT( b->n_req ==
set_size ) in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:241
   6350:Jan 19 00:36:42 node1 kernel: drbd5: b->br_number = 4713 in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:245
   6353:Jan 19 00:36:43 node1 kernel: drbd5: barrier_nr = 4714 in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:246
   6356:Jan 19 00:36:43 node1 kernel: drbd5: b->n_req = 1 in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUIn_req = 1 in
/sandbox/sgraham/sn/drbd-git/platform/drbd/src/dist/BUILD/drbd-8.0.8/drb
d/drbd_main.c:249

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-01-23 14:03 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-19 16:40 [Drbd-dev] Barrier assert failures with latest 8.0 sources Graham, Simon
2008-01-21 16:36 ` Lars Ellenberg
2008-01-22  2:29 ` Graham, Simon
2008-01-22 16:20   ` Lars Ellenberg
     [not found] ` <342BAC0A5467384983B586A6B0B3767107E89D34@EXNA.corp.s tratus.com>
2008-01-22 20:49   ` Graham, Simon
2008-01-23 13:53   ` Graham, Simon
2008-01-23 14:03     ` Graham, Simon
  -- strict thread matches above, loose matches on Subject: below --
2008-01-19 16:25 Graham, Simon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.