Distributed Replicated Block Device (DRBD) development
* [Drbd-dev] Transaction log related assert messages running DRBD 8 trunk
@ 2006-07-25 18:56 Graham, Simon
  2006-07-26  8:11 ` Philipp Reisner
From: Graham, Simon @ 2006-07-25 18:56 UTC
  To: drbd-dev

Running some failover stress testing with the latest DRBD 8, I have
started to notice assert failures like this:

Jul 24 17:36:22 peer kernel: drbd1: ASSERT( b->br_number == barrier_nr )
in drbd/drbd_main.c:280 
Jul 24 17:36:22 peer kernel: drbd1: ASSERT( b->n_req == set_size ) in
drbd/drbd_main.c:281 

I'm not quite sure what these mean, but I do note that the code releases
the spin lock before the assert and it occurs to me that perhaps the
D_ASSERTs should also be done with the lock held (see below)?

Simon

--- code from drbd_main.c ---

void tl_release(drbd_dev *mdev,unsigned int barrier_nr,
		       unsigned int set_size)
{
	struct drbd_barrier *b;

	spin_lock_irq(&mdev->tl_lock);

	b = mdev->oldest_barrier;
	mdev->oldest_barrier = b->next;

	list_del(&b->requests);
	/* There could be requests on the list waiting for completion
	   of the write to the local disk, to avoid corruptions of
	   slab's data structures we have to remove the lists head */

	spin_unlock_irq(&mdev->tl_lock);

	D_ASSERT(b->br_number == barrier_nr);
	D_ASSERT(b->n_req == set_size);
...



* Re: [Drbd-dev] Transaction log related assert messages running DRBD 8 trunk
  2006-07-25 18:56 [Drbd-dev] Transaction log related assert messages running DRBD 8 trunk Graham, Simon
@ 2006-07-26  8:11 ` Philipp Reisner
From: Philipp Reisner @ 2006-07-26  8:11 UTC
  To: drbd-dev

On Tuesday, 25 July 2006 20:56, Graham, Simon wrote:
> Running some failover stress testing with the latest DRBD 8, I have
> started to notice assert failures like this:
>
> Jul 24 17:36:22 peer kernel: drbd1: ASSERT( b->br_number == barrier_nr )
> in drbd/drbd_main.c:280
> Jul 24 17:36:22 peer kernel: drbd1: ASSERT( b->n_req == set_size ) in
> drbd/drbd_main.c:281
>
> I'm not quite sure what these mean, but I do note that the code releases
> the spin lock before the assert and it occurs to me that perhaps the
> D_ASSERTs should also be done with the lock held (see below)?
>
> Simon
>
> --- code from drbd_main.c ---
>
> void tl_release(drbd_dev *mdev,unsigned int barrier_nr,
> 		       unsigned int set_size)
> {
> 	struct drbd_barrier *b;
>
> 	spin_lock_irq(&mdev->tl_lock);
>
> 	b = mdev->oldest_barrier;
> 	mdev->oldest_barrier = b->next;
>
> 	list_del(&b->requests);
> 	/* There could be requests on the list waiting for completion
> 	   of the write to the local disk, to avoid corruptions of
> 	   slab's data structures we have to remove the lists head */
>
> 	spin_unlock_irq(&mdev->tl_lock);
>
> 	D_ASSERT(b->br_number == barrier_nr);
> 	D_ASSERT(b->n_req == set_size);
> ...
>

Hi Simon,

Currently the code looks like this:

void tl_release(drbd_dev *mdev,unsigned int barrier_nr,
		       unsigned int set_size)
{
	struct drbd_barrier *b;

	spin_lock_irq(&mdev->tl_lock);

	b = mdev->oldest_barrier;
	mdev->oldest_barrier = b->next;

	list_del(&b->requests);
	/* There could be requests on the list waiting for completion
	   of the write to the local disk, to avoid corruptions of
	   slab's data structures we have to remove the lists head */

	spin_unlock_irq(&mdev->tl_lock);

	D_ASSERT(b->br_number == barrier_nr);
	D_ASSERT(b->n_req == set_size);

#ifdef DBG_ASSERTS
	if(b->br_number != barrier_nr) {
		DUMPI(b->br_number);
		DUMPI(barrier_nr);
	}
	if(b->n_req != set_size) {
		DUMPI(b->n_req);
		DUMPI(set_size);
	}
#endif

	kfree(b);
}


In case they are different, you should also see the numbers.
BTW, the spinlock only protects the linked lists. Looking at
the content of the barrier object is ok.
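
A minimal user-space sketch of that point, not the DRBD code itself: once
an object has been unlinked while the lock is held, the releasing thread
can read its fields without the lock, provided no other path still keeps
a pointer to it. All names here (struct tlog, tlog_release, and so on)
are hypothetical stand-ins, and a C11 atomic_flag plays the role of the
kernel spinlock.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct drbd_barrier. */
struct barrier {
	struct barrier *next;
	unsigned int br_number;
	unsigned int n_req;
};

/* Hypothetical stand-in for the transfer log inside the device. */
struct tlog {
	atomic_flag lock;        /* plays the role of tl_lock */
	struct barrier *oldest;  /* plays the role of oldest_barrier */
};

void tlog_init(struct tlog *tl)
{
	atomic_flag_clear(&tl->lock);
	tl->oldest = NULL;
}

/* The lock guards only the list linkage. */
struct barrier *tlog_pop_oldest(struct tlog *tl)
{
	struct barrier *b;

	while (atomic_flag_test_and_set_explicit(&tl->lock, memory_order_acquire))
		;	/* spin until the lock is free */
	b = tl->oldest;
	if (b)
		tl->oldest = b->next;
	atomic_flag_clear_explicit(&tl->lock, memory_order_release);
	return b;
}

/* Returns 1 if the ACK matches, 0 if it does not, -1 if the log is empty. */
int tlog_release(struct tlog *tl, unsigned int barrier_nr, unsigned int set_size)
{
	struct barrier *b = tlog_pop_oldest(tl);
	int ok;

	if (!b)
		return -1;
	/* b is no longer reachable through tl, so these reads need no lock
	   (assumption: nobody else holds a pointer to b). */
	ok = (b->br_number == barrier_nr) && (b->n_req == set_size);
	free(b);
	return ok;
}
```

If some other code path could still update n_req through its own pointer
to the barrier, this reasoning would no longer hold and the comparison
would have to move under the lock.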

PS: Recently I was quite active in these parts of the code; with
    the current SVN head, these ASSERTs should not trigger.

BTW: The meaning is, we send a number of write requests between
     two barriers. When the barrier ACK of the peer comes in,
     we verify that the peer wrote the same number of requests
     between those two barriers.
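
The check described above can be sketched roughly as follows. This is an
illustration under assumptions, not the actual DRBD data structures: the
sender counts the writes it issues in the current epoch, closes the epoch
when it sends a barrier, and compares the closed epoch against the
numbers reported in the barrier ACK. struct epoch, struct barrier_ack and
the three functions are hypothetical names; only br_number, n_req,
barrier_nr and set_size come from the code quoted in this thread.

```c
/* Hypothetical sender-side record of one epoch (writes between barriers). */
struct epoch {
	unsigned int br_number;  /* number of the barrier closing the epoch */
	unsigned int n_req;      /* writes sent during the epoch */
};

/* Hypothetical barrier ACK as reported by the peer. */
struct barrier_ack {
	unsigned int barrier_nr; /* barrier being acknowledged */
	unsigned int set_size;   /* writes the peer saw in that epoch */
};

/* Account for one write sent in the current epoch. */
void epoch_add_write(struct epoch *cur)
{
	cur->n_req++;
}

/* Sending a barrier closes the current epoch and opens the next one. */
struct epoch epoch_close(struct epoch *cur)
{
	struct epoch closed = *cur;

	cur->br_number++;
	cur->n_req = 0;
	return closed;
}

/* Mirrors the two D_ASSERT conditions: barrier number and write count. */
int ack_matches(const struct epoch *e, const struct barrier_ack *ack)
{
	return e->br_number == ack->barrier_nr && e->n_req == ack->set_size;
}
```

A mismatch in either field corresponds to one of the two ASSERT lines in
the log: the wrong barrier was acknowledged, or the two sides disagree on
how many writes fell between the barriers.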

-Phil
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
