Distributed Replicated Block Device (DRBD) development
From: Lars Ellenberg <lars.ellenberg@linbit.com>
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] Troubleshooting digest failures?
Date: Wed, 6 Aug 2008 19:09:30 +0200
Message-ID: <20080806170930.GL32725@soda.linbit>
In-Reply-To: <20080806160409.GJ32725@soda.linbit>

On Wed, Aug 06, 2008 at 06:04:09PM +0200, Lars Ellenberg wrote:
> On Wed, Aug 06, 2008 at 08:56:25AM -0600, Gregor Mosheh wrote:
> > Hey guys.
> 
> Hello again.
> 
> Sorry for the joke, but I cannot help it.
> You know the story about "The hare and the hedgehog"?
> 
> > I've gotten no response from the user list,
> 
> now, that is not entirely true ;)
> 
> > so maybe it's time  
> > for a different tack debugging DRBD's innards...
> >
> > I've been having a problem which I describe here. The last posting is  
> > probably the most relevant.
> > http://www.gossamer-threads.com/lists/drbd/users/15119
> >
> > How would I go about debugging this?  Is there extra logging or
> > debugging which I can enable? Have any of you seen this before?
> 
> Anyways,
> apart from what I wrote in your thread, and the
> "What causes nodes to become out-of-sync?" thread,
>  http://www.gossamer-threads.com/lists/drbd/users/15081
> there is not much else I can say.
> 
> You said you have another cluster, not yet in production, where it did
> not occur so far, and you suggest it may be just the missing load that
> makes it "appear" healthy.
> 
> How about using it as test setup, and generate load on it,
> until you can provoke the symptom there, too?
> 
> To reverse that, if you cannot provoke the symptom there,
> I'd still point to hardware issues on the affected cluster.

also, please have a look at this thread, where I try to explain
why modifying in-flight data buffers would lead to these symptoms.
http://www.gossamer-threads.com/lists/drbd/users/15189
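
To make the in-flight modification argument concrete, here is a toy
model in plain shell. It is only an illustration under loud assumptions:
a temp file stands in for the data buffer, cksum stands in for whatever
digest algorithm is configured, and the function name is made up. It
does not touch DRBD at all.

```shell
# Toy model only (hypothetical helper, not DRBD code).
simulate_inflight_change() (
    buf=$(mktemp) || exit 1
    printf 'AAAAAAAA' > "$buf"
    # The sender takes the digest first ...
    sender=$(cksum < "$buf")
    # ... then something changes one byte of the buffer before it is sent ...
    printf 'B' | dd of="$buf" bs=1 seek=3 conv=notrunc 2>/dev/null
    # ... so the receiver digests different bytes than the sender did.
    receiver=$(cksum < "$buf")
    rm -f "$buf"
    echo "sender:   $sender"
    echo "receiver: $receiver"
    [ "$sender" != "$receiver" ] && echo "digest mismatch"
)
```

Calling simulate_inflight_change prints both checksums and reports the
mismatch, even though neither the "network" nor the "disk" corrupted
anything.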

also, when online-verify reports the out-of-sync sectors,
please do the
 # dd iflag=direct if=/dev/whatever bs=512 \
	skip=sector-offset count=size \
	of=nodename.dump
 # diff -U0 <(xxd node0.dump) <(xxd node1.dump)
trick (explained in the "what causes nodes to become out of sync"
thread) to get a diff of the hexdumps, so we can tell whether there are
  single bit flips,
  multiple word data changes, or
  completely unrelated stuff
in the corresponding sectors on the different nodes.
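
If eyeballing the hexdump diff is tedious, the differences can also be
counted. The following is a minimal sketch, not part of any DRBD
tooling: classify_dumps is a hypothetical helper name, and it assumes
both dump files cover the same sector range. It relies only on cmp -l,
which prints the 1-based offset of each differing byte followed by both
byte values in octal.

```shell
# Hypothetical helper: count differing bytes and bits between two dumps.
# Usage: classify_dumps node0.dump node1.dump
classify_dumps() (
    bytes=0
    bits=0
    # cmp exits non-zero when the files differ; that is expected here.
    { cmp -l "$1" "$2" || true; } | {
        while read -r off x y; do
            bytes=$((bytes + 1))
            # A leading 0 makes the shell parse the values as octal.
            v=$(( 0$x ^ 0$y ))
            # Count the bits set in the XOR of the two bytes.
            while [ "$v" -gt 0 ]; do
                bits=$(( bits + (v & 1) ))
                v=$(( v >> 1 ))
            done
        done
        echo "differing_bytes=$bytes differing_bits=$bits"
    }
)
```

As a rough reading: one differing byte with one differing bit looks
like a bit flip; a handful of differing bytes clustered together looks
like a word-sized change; and if most bytes differ, with about half of
all bits changed, the sectors probably hold unrelated data.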

-- 
: Lars Ellenberg                            Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
