From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@fb.com>
Cc: linux-raid@vger.kernel.org, Kernel-team@fb.com,
songliubraving@fb.com, hch@infradead.org,
dan.j.williams@intel.com
Subject: Re: [PATCH 5/9] raid5: log recovery
Date: Wed, 12 Aug 2015 13:51:18 +1000
Message-ID: <20150812135118.542939c4@noble>
In-Reply-To: <20150805213909.GA3197719@devbig257.prn2.facebook.com>

On Wed, 5 Aug 2015 14:39:09 -0700 Shaohua Li <shli@fb.com> wrote:
> On Wed, Aug 05, 2015 at 02:05:25PM +1000, NeilBrown wrote:
> > On Wed, 29 Jul 2015 17:38:45 -0700 Shaohua Li <shli@fb.com> wrote:
> >
> > > This is the log recovery support. The process is quite straightforward.
> > > We scan the log and read all valid meta/data/parity into memory. If a
> > > stripe's data/parity checksum is correct, the stripe will be recovered.
> > > Otherwise, it's discarded and we don't scan the log further. The reclaim
> > > process guarantees that a stripe which starts to be flushed to the raid
> > > disks has complete data/parity and a correct checksum. To recover a
> > > stripe, we just copy its data/parity to the corresponding raid disks.
> > >
> > > The tricky thing is the superblock update after recovery. We can't let
> > > the superblock point to the last valid meta block. The log might look like:
> > > | meta 1| meta 2| meta 3|
> > > meta 1 is valid, meta 2 is invalid. meta 3 could be valid. If the
> > > superblock points to meta 1, we write a new valid meta 2n. If a crash
> > > happens again, the new recovery will start from meta 1. Since meta 2n is
> > > valid, recovery will think meta 3 is valid, which is wrong. The solution
> > > is to create a new meta in meta 2 with its seq == meta 1's seq + 2 and let
> > > the superblock point to meta 2. Recovery will not think meta 3 is a valid
> > > meta, because its seq is wrong.
> >
> > I like the idea of using a slightly larger 'seq' to avoid collisions -
> > except that I would probably feel safer with a much larger seq. Maybe add
> > 1024 or something (at least 10).
>
> ok
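
The seq argument above can be illustrated with a small model. This is only a
sketch, not the patch's code: the Meta class, the scan() helper, and the
SEQ_GAP constant are hypothetical names, and the model assumes that recovery
takes its starting seq from the block the superblock points to and accepts
each following meta block only if its checksum is good and its seq is exactly
one higher than the previous one.

```python
from dataclasses import dataclass


@dataclass
class Meta:
    seq: int           # sequence number stored in the meta block
    checksum_ok: bool  # whether the block's checksum verifies


def scan(log, start):
    """Return indices of the meta blocks a recovery pass would accept.

    Assumption of this sketch: scanning begins at the block the superblock
    points to and stops at the first bad checksum or unexpected seq.
    """
    accepted = []
    expected = log[start].seq
    for i in range(start, len(log)):
        meta = log[i]
        if not meta.checksum_ok or meta.seq != expected:
            break
        accepted.append(i)
        expected += 1
    return accepted


SEQ_GAP = 1024  # hypothetical gap, following the "much larger seq" suggestion

# | meta 1 | meta 2 | meta 3 |: meta 2 is invalid, meta 3 happens to be valid.
log = [Meta(10, True), Meta(11, False), Meta(12, True)]
first_recovery = scan(log, 0)

# Naive update: superblock still points at meta 1, new meta 2n gets seq 11.
naive = [log[0], Meta(11, True), log[2]]
naive_recovery = scan(naive, 0)

# Proposed update: the new meta skips ahead in seq and the superblock points
# at it, so the stale meta 3 no longer continues the sequence.
fixed = [log[0], Meta(log[0].seq + SEQ_GAP, True), log[2]]
fixed_recovery = scan(fixed, 1)
```

In this model the naive update makes a second recovery accept all three
blocks, including the stale meta 3, while the skipped-ahead seq leaves only
the new meta block accepted.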
> > >
> > > TODO:
> > > -recovery should run the stripe cache state machine in case of disk
> > > breakage.
> >
> > Why?
> >
> > When you write to the log, you write all of the blocks that need
> > updating, whether they are destined for a failed device or not.
> >
> > When you recover, you then have all the blocks that you might want to
> > write. So write all the ones for which you have working devices, and
> > ignore the rest.
> >
> > Did I miss something?
> >
> > Not that I object, but if it works....
>
> I mean the case where a disk is broken. For example, the log has a stripe
> with data for disks 1, 2, and 4. In recovery, disk 2 is broken. Just writing
> 1 and 4 isn't good. If we run the state machine, we can read disk 3 and have
> an eventually consistent stripe.

But the log will have data for disks 1, 2, and 4, and for P and Q.
So if disk 2 is broken, we just write 1, 4, P, and Q and the data is
safe.

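A toy model of that argument, as a sketch only: it uses a single XOR parity
block P (the Q syndrome is left out for brevity), four data disks, and made-up
block contents. The xor_blocks() helper and the dictionaries are hypothetical
and are not taken from the md code.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Stripe contents on the array before the interrupted write (made-up data).
old = {1: b"\x11" * 4, 2: b"\x22" * 4, 3: b"\x33" * 4, 4: b"\x44" * 4}
disks = dict(old)
disks["P"] = xor_blocks(*old.values())

# The log holds the new data for disks 1, 2, 4 and the matching new parity,
# which was computed over the new blocks plus the untouched disk 3.
new = {1: b"\xa1" * 4, 2: b"\xb2" * 4, 4: b"\xc4" * 4}
logged = dict(new)
logged["P"] = xor_blocks(new[1], new[2], old[3], new[4])

# Recovery: disk 2 has failed, so write only the blocks whose device works.
failed = {2}
for dev, block in logged.items():
    if dev not in failed:
        disks[dev] = block

# Disk 2's new contents can still be rebuilt from the survivors and parity.
rebuilt_d2 = xor_blocks(disks[1], disks[3], disks[4], disks["P"])
```

Here rebuilt_d2 equals the new data that was logged for disk 2, without
disk 3 being read during recovery.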
NeilBrown