From: Philipp Reisner <philipp.reisner@linbit.com>
To: drbd-dev@lists.linbit.com
Cc: drbd-user@linbit.com
Subject: [Drbd-dev] Re: [DRBD-user] drbd_panic() in drbd_receiver.c
Date: Tue, 4 Jul 2006 12:07:57 +0200
Message-ID: <200607041207.57710.philipp.reisner@linbit.com>
In-Reply-To: <342BAC0A5467384983B586A6B0B37671031FB31E@EXNA.corp.stratus.com>

On Monday, 3 July 2006 19:03, Graham, Simon wrote:
> I too have been looking into this -- I agree with Damian and think it's
> very important that DRBD never panic in cases like this if it is to be
> used in an HA system -- I think the final approach has to be one of
> fixing up underlying disk errors where possible and returning an error
> to the caller where it is not possible to fix up.
>
> In this specific case (NegDReply), it seems that it would be OK to
> simply remove the panic() and complete the original request with an EIO
> error or somesuch - this does mean adding a call to
> drbd_bio_endio(bio,0) in addition to removing the panic() though.
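
The NegDReply idea above can be modelled in a few lines. This is only a
userspace sketch in Python, not DRBD code: Request, complete_request and
on_neg_d_reply are hypothetical names, and the dict of pending requests is
an assumed stand-in for whatever the driver uses to track outstanding reads.

```python
import errno


class Request:
    """Hypothetical stand-in for a pending read request (not a DRBD structure)."""

    def __init__(self, sector):
        self.sector = sector
        self.done = False
        self.error = 0


def complete_request(req, error):
    # Models finishing the original I/O and reporting the status to the caller.
    req.done = True
    req.error = error


def on_neg_d_reply(pending, sector):
    """The peer could not serve the read: fail the request instead of panicking."""
    req = pending.pop(sector, None)
    if req is None:
        return None  # nothing outstanding for this sector
    complete_request(req, -errno.EIO)
    return req
```

The point of the sketch is that the error is delivered to whoever issued the
request, and the node itself keeps running.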
>
> Even if this is acceptable, there are a bunch of other places where
> panic is currently done that, I think, also need to be changed,
> including:
>
> 1. In drbd_set_state if the node is now Primary and does not have access
>    to good data; I think this can simply be removed since
>    drbd_fail_request_early already returns a failure to the caller in
>    this case.
>
> 2. Failure to write bitmap to disk; not sure what the right answer is
>    here - any suggestions? (perhaps force the disk to be inconsistent in
>    some manner that will require a complete resync?)
>
> 3. Failure to write meta data to disk; ditto above only harder -- if you
>    can't write to the meta-data area, you can't store data that indicates
>    the contents are bad...
>
> 4. Received NegRSDReply -- during resync, SyncTarget gets error from
>    SyncSource; in this specific case, it seems to me that a possible
>    solution is to leave the block in question set in the bitmap, ensure
>    that the state is never set consistent on the current SyncTarget and
>    ensure that no matter what happens, the current SyncSource remains
>    the best source of data. A potential issue with this is that the
>    SyncTarget will continue to attempt to synchronize the block in
>    question - since it's still set in the bitmap it will eventually be
>    found again when the syncer wraps round - maybe that's OK though (so
>    long as there is some sort of delay between attempts)?
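
A rough model of point 4, assuming the bitmap can be represented as a set of
out-of-sync block numbers. This is a Python sketch of the proposed behaviour,
not DRBD code; resync_pass, fetch_block, retry_after and the delay value are
all made up for illustration.

```python
def resync_pass(out_of_sync, fetch_block, now, retry_after, delay=10):
    """One pass of a hypothetical syncer over the out-of-sync bitmap.

    out_of_sync: set of block numbers still marked in the bitmap
    fetch_block: callable returning True on success, False on a negative reply
    retry_after: dict mapping block -> earliest time to try it again
    Returns True only when the bitmap is empty, i.e. the only case in which
    the target could be marked consistent.
    """
    for block in sorted(out_of_sync):
        if retry_after.get(block, 0) > now:
            continue  # still backing off after an earlier failure
        if fetch_block(block):
            out_of_sync.discard(block)
            retry_after.pop(block, None)
        else:
            # Leave the bit set and schedule a delayed retry.
            retry_after[block] = now + delay
    return not out_of_sync
```

A failing block stays in the set, so the pass keeps returning False and the
target never counts as consistent while that block is outstanding; the
retry_after table is what provides the delay between attempts.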
>
> I am planning on implementing these, assuming there isn't any huge
> disagreement on the approach and assuming it isn't already in
> progress...
>
> Perhaps we should take this discussion to drbd-dev?
> Simon
>
> PS: Once the panics are gone, there is a second phase required which is
> to fix up underlying errors where possible -- for example, if the volume
> is consistent on both sides and a read on the primary fails, not only
> should the read be retried to the secondary but also the returned data
> should be rewritten on the primary -- for a class of errors, this will
> actually fix the problem as the disk will remap a bad block when the
> write is done; is anyone working on this?
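
The second phase described in the PS could look roughly like this. Again a
Python sketch under assumptions rather than DRBD code: read_with_repair is a
hypothetical name, local and peer are assumed objects offering read(block)
and write(block, data) that raise OSError on I/O failure, and both sides are
assumed to be consistent, as in the quoted scenario.

```python
def read_with_repair(block, local, peer):
    """Read a block; on local failure, fetch it from the peer and rewrite it.

    local, peer: hypothetical objects with read(block) and write(block, data)
    that raise OSError on I/O failure.
    """
    try:
        return local.read(block)
    except OSError:
        data = peer.read(block)  # a failure here propagates to the caller
    try:
        local.write(block, data)  # may let the disk remap a bad sector
    except OSError:
        pass  # repair is best effort; the caller still gets the peer's data
    return data
```

The rewrite is deliberately best effort: the caller's read succeeds as long
as the peer can supply the data, whether or not the local write fixes the
block.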

Excellent ideas. If you do start working on this, please base your
work on the drbd-8.0 code, preferably the trunk.

PS: Moving this thread over to drbd-dev is a good idea.

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
