Distributed Replicated Block Device (DRBD) development
From: Lars Ellenberg <Lars.Ellenberg@linbit.com>
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] DRBD-8: recent regression causing corruption and crashes
Date: Fri, 11 Aug 2006 20:45:59 +0200
Message-ID: <20060811184559.GG7373@soda.linbit>
In-Reply-To: <342BAC0A5467384983B586A6B0B3767103624F3C@EXNA.corp.stratus.com>

/ 2006-08-11 12:01:23 -0400
\ Graham, Simon:
> Quick update:
> 

How exactly do you "test"?
Which kernel and hardware?
(Sorry if you posted that earlier; just point me to it.)

I triggered a full sync (drbdadm invalidate),
and while that was running, accessed the Primary (SyncSource)
(cp -av /somethinghuge/ /mnt/drbd-mount-point/).
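
For anyone wanting to repeat this, the test above could be scripted roughly
as follows. This is only a sketch: the resource name "r0", the helper names
and the DRY_RUN switch are placeholders that do not come from this mail, and
it assumes the invalidate is issued on the peer, so that the Primary ends up
as SyncSource.

```shell
#!/bin/sh
# Sketch of the resync-plus-write test; names and paths are placeholders.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on the node whose data should be discarded (it becomes the SyncTarget).
start_full_sync() {
    run drbdadm invalidate "$1"
}

# Run on the Primary (SyncSource) while the resync is still in progress.
write_during_resync() {
    run cp -av "$1" "$2"
}
```

With DRY_RUN=1, "start_full_sync r0" only prints the drbdadm call. Set
DRY_RUN=0 on a real test pair, and only on a resource whose data you can
afford to resync.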

> > 1. I get errors during initial synchronization of a volume like this
> > that cause the resync to be aborted:
> > 
> > drbd15: tl_verify: failed to find req e51a4da0, sector 0 in list

I don't see those here.

> DRBD, Cmd: WriteAck, BlkId: SYNCER Sector: 0, AckLen: 8000

I don't see these either.

> > 2. I get panics with the following signature:- these look like they are
> > happening when a local write
> >     on the primary (which this node is) completes.
> 
> The panic signature seems to change - for example, I just got one like
> this in the receiver thread:
> 
> drbd15: ASSERT( drbd_req_get_sector(i) == sector ) in
> /sandbox/sgraham/sn/trunk/platform/drbd/8.0/drbd/drbd_main.c:313
> drbd15: tl_verify: found req e63d0240 but it has wrong sector (8 versus 0)

nor these.

> drbd15: in tl_clear_barrier:374: ap_pending_cnt = -1 < 0 !

this is bad...

What I do see here is "ap_pending > 0" still occurring too often when I
disconnect during resync + write activity, which effectively blocks the
Primary's I/O subsystem.  Seemingly we still have bugs in tl_clear :(
I need to look into that further.

> Code:  Bad EIP value.
>  <0>Fatal exception: panic in 5 seconds

ouch.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
