linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@fb.com>
Cc: linux-raid@vger.kernel.org, Kernel-team@fb.com,
	songliubraving@fb.com, hch@infradead.org,
	dan.j.williams@intel.com
Subject: Re: [PATCH 5/9] raid5: log recovery
Date: Wed, 12 Aug 2015 13:51:18 +1000
Message-ID: <20150812135118.542939c4@noble>
In-Reply-To: <20150805213909.GA3197719@devbig257.prn2.facebook.com>

On Wed, 5 Aug 2015 14:39:09 -0700 Shaohua Li <shli@fb.com> wrote:

> On Wed, Aug 05, 2015 at 02:05:25PM +1000, NeilBrown wrote:
> > On Wed, 29 Jul 2015 17:38:45 -0700 Shaohua Li <shli@fb.com> wrote:
> > 
> > > This is the log recovery support. The process is quite straightforward.
> > > We scan the log and read all valid meta/data/parity into memory. If a
> > > stripe's data/parity checksum is correct, the stripe will be recovered.
> > > Otherwise, it's discarded and we don't scan the log further. The reclaim
> > > process guarantees that a stripe which starts to be flushed to the raid
> > > disks has complete data/parity and a correct checksum. To recover a
> > > stripe, we just copy its data/parity to the corresponding raid disks.
> > > 
> > > The tricky thing is the superblock update after recovery. We can't let
> > > the superblock point to the last valid meta block. The log might look like:
> > > | meta 1| meta 2| meta 3|
> > > meta 1 is valid, meta 2 is invalid, and meta 3 could be valid. If the
> > > superblock points to meta 1, we write a new valid meta 2n. If a crash
> > > happens again, the new recovery will start from meta 1. Since meta 2n is
> > > valid, recovery will think meta 3 is valid, which is wrong. The solution
> > > is to create a new meta in meta 2 with its seq == meta 1's seq + 2 and
> > > let the superblock point to meta 2. Recovery will then not think meta 3
> > > is a valid meta, because its seq is wrong.
> > 
> > I like the idea of using a slightly larger 'seq' to avoid collisions -
> > except that I would probably feel safer with a much larger seq. Maybe add
> > 1024 or something (at least 10).
> 
> ok 
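
For readers following the seq discussion above, here is a minimal sketch of
why a gap in 'seq' stops a stale meta block from being accepted. It is only a
toy model, not code from the patch: the Meta class, the scan() helper, the
rule that each meta must carry the previous seq + 1, and the SEQ_GAP value
are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical gap; the thread only suggests "1024 or something (at least 10)".
SEQ_GAP = 1024


@dataclass
class Meta:
    seq: int
    checksum_ok: bool


def scan(log: List[Meta], start: int) -> List[int]:
    """Return positions of meta blocks a recovery scan would accept.

    Assumed rule: scanning stops at the first block with a bad checksum
    or with a seq that is not exactly the previous seq + 1.
    """
    accepted = []
    expected = None
    for pos in range(start, len(log)):
        meta = log[pos]
        if not meta.checksum_ok:
            break
        if expected is not None and meta.seq != expected:
            break
        accepted.append(pos)
        expected = meta.seq + 1
    return accepted


def naive_rewrite() -> List[int]:
    # meta 1 valid, meta 2 torn, meta 3 stale but internally valid.
    log = [Meta(10, True), Meta(11, False), Meta(12, True)]
    # Superblock stays at meta 1; meta 2n reuses the next seq.
    log[1] = Meta(11, True)
    return scan(log, start=0)


def gapped_rewrite() -> List[int]:
    log = [Meta(10, True), Meta(11, False), Meta(12, True)]
    # New meta gets a seq far from anything stale; superblock moves to it.
    log[1] = Meta(10 + SEQ_GAP, True)
    return scan(log, start=1)
```

In this model naive_rewrite() accepts the stale meta 3 after a second crash,
while gapped_rewrite() stops at the new meta because meta 3's seq no longer
follows on from it.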
> > > 
> > > TODO:
> > > -recovery should run the stripe cache state machine in case of disk
> > > breakage.
> > 
> > Why?
> > 
> > When you write to the log, you write all of the blocks that need
> > updating, whether they are destined for a failed device or not.
> > 
> > When you recover, you then have all the blocks that you might want to
> > write.  So write all the ones for which you have working devices, and
> > ignore the rest.
> > 
> > Did I miss something?
> > 
> > Not that I object, but if it works....
> 
> I mean the case where a disk is broken. For example, the log has a stripe
> with data for disks 1, 2, 4. In recovery, disk 2 is broken. Just writing
> 1, 4 isn't good. If we run the state machine, we can read disk 3 and have
> an eventually consistent stripe.

But the log will have data for disk 1, 2, 4, and P and Q.
So if disk 2 is broken, we just write 1, 4, P, and Q and the data is
safe.
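
As a rough illustration of that argument, the sketch below replays a logged
stripe onto the working devices only, then checks that the block for the
failed disk can still be rebuilt from parity. It is a simplified model under
stated assumptions: four data disks, single XOR parity P only (Q is left out
because it needs Galois-field arithmetic), and made-up names (xor_blocks,
replay_stripe, demo) that do not come from the patch.

```python
from functools import reduce
from typing import Dict, List, Set


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def replay_stripe(disks: Dict[str, bytes],
                  logged: Dict[str, bytes],
                  failed: Set[str]) -> List[str]:
    """Copy logged blocks to working devices and skip failed ones."""
    written = []
    for dev, block in logged.items():
        if dev in failed:
            continue
        disks[dev] = block
        written.append(dev)
    return written


def demo():
    # On-disk state before the crash, with P consistent with the old data.
    old = {"d1": b"\x01\x01", "d2": b"\x02\x02",
           "d3": b"\x03\x03", "d4": b"\x04\x04"}
    disks = dict(old)
    disks["P"] = xor_blocks(*old.values())

    # The log holds new data for disks 1, 2, 4 plus the matching parity.
    new = {"d1": b"\x11\x11", "d2": b"\x22\x22", "d4": b"\x44\x44"}
    logged = dict(new)
    logged["P"] = xor_blocks(new["d1"], new["d2"], old["d3"], new["d4"])

    # Disk 2 is broken at recovery time, so its block is never written.
    written = replay_stripe(disks, logged, failed={"d2"})

    # Disk 2's new contents are still recoverable from P and the other disks.
    rebuilt_d2 = xor_blocks(disks["P"], disks["d1"], disks["d3"], disks["d4"])
    return written, rebuilt_d2 == new["d2"]
```

Here demo() writes d1, d4 and P, and the rebuilt block for disk 2 matches
what the log held for it, without reading disk 3 during replay.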

NeilBrown

