From: Lars Ellenberg <lars.ellenberg@linbit.com>
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] Troubleshooting digest failures?
Date: Wed, 6 Aug 2008 19:09:30 +0200
Message-ID: <20080806170930.GL32725@soda.linbit>
In-Reply-To: <20080806160409.GJ32725@soda.linbit>
On Wed, Aug 06, 2008 at 06:04:09PM +0200, Lars Ellenberg wrote:
> On Wed, Aug 06, 2008 at 08:56:25AM -0600, Gregor Mosheh wrote:
> > Hey guys.
>
> Hello again.
>
> Sorry for the joke, but I cannot help it.
> You know the story about "The hare and the hedgehog"?
>
> > I've gotten no response from the user list,
>
> now, that is not entirely true ;)
>
> > so maybe it's time
> > for a different tack debugging DRBD's innards...
> >
> > I've been having a problem which I describe here. The last posting is
> > probably the most relevant.
> > http://www.gossamer-threads.com/lists/drbd/users/15119
> >
> > How would I go about debugging this? Is there extra logging or
> > debugging which I can enable? Have any of you seen this before?
>
> Anyways,
> apart from what I wrote in your thread, and the
> "What causes nodes to become out-of-sync?" thread,
> http://www.gossamer-threads.com/lists/drbd/users/15081
> there is not much else I can say.
>
> You said you have another cluster, not yet in production, where it did
> not occur so far, and you suggest it may be just the missing load that
> makes it "appear" healthy.
>
> How about using it as test setup, and generate load on it,
> until you can provoke the symptom there, too?
>
> To reverse that, if you cannot provoke the symptom there,
> I'd still point to hardware issues on the affected cluster.

Also, please have a look at this thread, where I try to explain
why modifying in-flight data buffers would lead to these symptoms:
http://www.gossamer-threads.com/lists/drbd/users/15189

Also, when online-verify reports the out-of-sync sectors,
please do the

  # dd iflag=direct if=/dev/whatever bs=512 \
       skip=sector-offset count=size \
       of=nodename.dump
  # diff -U0 <(xxd node0.dump) <(xxd node1.dump)

trick (explained in the "What causes nodes to become out-of-sync?"
thread): run the dd on each node, then diff the hexdumps,
so we can tell whether there are

  - single bit flips,
  - multiple-word data changes, or
  - completely unrelated data

in the corresponding sectors on the different nodes.
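
If the hexdump diff gets long, a rough per-sector count of differing
bytes can help decide which of those cases you are looking at before
reading the hex. The following is only a sketch: summarize_diff is a
made-up helper name (nothing shipped with DRBD), and it assumes both
dumps were taken with bs=512 from the same sector offset, so that
sector 0 in its output is the first dumped sector.

```shell
#!/usr/bin/env bash
# summarize_diff FILE_A FILE_B
# Hypothetical helper: count differing bytes per 512-byte sector.
summarize_diff() {
    # cmp -l prints one line per differing byte: the 1-based byte
    # offset, then the two byte values in octal.
    cmp -l "$1" "$2" 2>/dev/null | awk '
        { sector = int(($1 - 1) / 512); n[sector]++ }
        END {
            for (s in n)
                printf "sector %d: %d differing byte(s)\n", s, n[s]
        }
    ' | sort -k2 -n
}

# Usage, with the dump names from the dd example above:
#   summarize_diff node0.dump node1.dump
```

One or two differing bytes in a sector are worth checking for bit
flips in the xxd diff; a sector where most bytes differ more likely
holds unrelated data. That reading is a rule of thumb, not a diagnosis.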
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :