Distributed Replicated Block Device (DRBD) development
From: Philipp Reisner <philipp.reisner@linbit.com>
To: drbd-dev@lists.linbit.com
Cc: drbd-user@linbit.com
Subject: [Drbd-dev] Re: [DRBD-user] drbd_panic() in drbd_receiver.c
Date: Tue, 4 Jul 2006 12:07:57 +0200
Message-ID: <200607041207.57710.philipp.reisner@linbit.com>
In-Reply-To: <342BAC0A5467384983B586A6B0B37671031FB31E@EXNA.corp.stratus.com>

On Monday, 3 July 2006 19:03, Graham, Simon wrote:
> I too have been looking into this -- I agree with Damian and think it's
> very important that DRBD never panic in cases like this if it is to be
> used in an HA system -- I think the final approach has to be one of
> fixing up underlying disk errors where possible and returning an error
> to the caller where it is not possible to fix up.
>
> In this specific case (NegDReply), it seems that it would be OK to
> simply remove the panic() and complete the original request with an EIO
> error or somesuch - this does mean adding a call to
> drbd_bio_endio(bio,0) in addition to removing the panic() though.
>
> Even if this is acceptable, there are a bunch of other places where
> panic is currently done that, I think, also need to be changed,
> including:
>
> 1. In drbd_set_state, if the node is now Primary and does not have
>    access to good data; I think this can simply be removed since
>    drbd_fail_request_early already returns a failure to the caller in
>    this case.
>
> 2. Failure to write bitmap to disk; not sure what the right answer is
>    here - any suggestions? (Perhaps force the disk to be inconsistent
>    in some manner that will require a complete resync?)
>
> 3. Failure to write meta data to disk; ditto above, only harder -- if
>    you can't write to the meta-data area, you can't store data that
>    indicates the contents are bad...
>
> 4. Received NegRSDReply -- during resync, SyncTarget gets an error
>    from SyncSource. In this specific case, it seems to me that a
>    possible solution is to leave the block in question set in the
>    bitmap, ensure that the state is never set consistent on the
>    current SyncTarget, and ensure that no matter what happens, the
>    current SyncSource remains the best source of data.
>    A potential issue with this is that the SyncTarget will continue to
>    attempt to synchronize the block in question - since it's still set
>    in the bitmap, it will eventually be found again when the syncer
>    wraps round - maybe that's OK though (so long as there is some sort
>    of delay between attempts)?
>
> I am planning on implementing these, assuming there isn't any huge
> disagreement on the approach and assuming it isn't already in
> progress...
>
> Perhaps we should take this discussion to drbd-dev?
> Simon
>
> PS: Once the panics are gone, there is a second phase required which is
> to fix up underlying errors where possible -- for example, if the volume
> is consistent on both sides and a read on the primary fails, not only
> should the read be retried to the secondary but also the returned data
> should be rewritten on the primary -- for a class of errors, this will
> actually fix the problem as the disk will remap a bad block when the
> write is done; is anyone working on this?

Excellent ideas. If you do start working on this, please
base your work on the drbd-8.0 code, preferably the trunk.

PS: Moving this thread over to drbd-dev is a good idea.

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
