From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@fb.com>
Cc: linux-raid@vger.kernel.org, songliubraving@fb.com,
hch@infradead.org, dan.j.williams@intel.com, Kernel-team@fb.com
Subject: Re: [PATCH V4 00/13] MD: a caching layer for raid5/6
Date: Thu, 16 Jul 2015 09:16:53 +1000
Message-ID: <20150716091653.7b970b32@noble>
In-Reply-To: <20150715194927.GA3502691@devbig257.prn2.facebook.com>

On Wed, 15 Jul 2015 12:49:37 -0700 Shaohua Li <shli@fb.com> wrote:
> On Wed, Jul 15, 2015 at 02:06:41PM +1000, NeilBrown wrote:
> > On Tue, 14 Jul 2015 20:16:17 -0700 Shaohua Li <shli@fb.com> wrote:
> >
> >
> > > I don't
> > > understand why you object to adding a superblock for cache. The advantage
> > > is it's self contained. And there is nothing about
> > > complexity/maintenance, as we can store the most necessary fields into
> > > the superblock.
> >
> > Because there is precisely 1 number that needs to be stored in the
> > superblock, and there seems no point having a superblock just to store
> > one number.
> > It isn't much extra complexity, but any extra thing is still an extra
> > thing.
> > Having the data section of the log device containing just a log is
> > elegant. Elegant is good.
> > If we decided that keeping two copies for superblocks was a good idea
> > (which I think it is, I just haven't created a "v1.3" layout yet), then
> > re-using the main superblock for the head-of-log pointer would instantly
> > give us two copies of that as well.
>
> I think I need 2 fields to find log head/tail in recovery. Currently
> cache superblock records checkpoint disk position (log tail) and
> checkpoint sequence number, which can be used to find log head. Just
> recording log tail doesn't work well (it might work, for example,
> zeroing sectors before log head, so we can identify log head. But it's
> really ugly and not efficient). I only found recovery_offset can be
> overloaded. Do you have any idea which other fields can be overloaded in the MD
> superblock?

If each metadata block contains
  - a magic number
  - a checksum of the block
  - a sequence number
  - a pointer to the "next" metadata block (which is equivalent to
    the size of all described data)
  - a pointer to the tail (oldest active metadata block).

Then given the address of any block in the log you can easily find the
head: walk the "next" pointers forward until you find a block
that has the wrong magic or checksum or sequence or previous pointer.
The last block that was consistent is the head.

You can then find the tail directly, and walk forward processing the
log.
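
To make the walk concrete, here is a minimal sketch in Python (not
kernel code).  Every name in it (MetaBlock, find_head, replay_order,
the MAGIC value, the CRC32 checksum, the dict standing in for the log
device) is a hypothetical chosen for illustration, not something taken
from the patch set.  It checks only magic, checksum and sequence, and
it assumes the starting address holds a valid metadata block.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MAGIC = 0x6C6F6721  # hypothetical magic value, for illustration only


@dataclass(frozen=True)
class MetaBlock:
    magic: int
    seq: int        # sequence number, +1 for each metadata block written
    next_pos: int   # position of the "next" metadata block
    tail_pos: int   # position of the oldest active metadata block
    checksum: int


def compute_checksum(magic: int, seq: int, next_pos: int, tail_pos: int) -> int:
    # Assumption: CRC32 over the other fields stands in for the real checksum.
    return zlib.crc32(f"{magic}:{seq}:{next_pos}:{tail_pos}".encode())


def make_block(seq: int, next_pos: int, tail_pos: int) -> MetaBlock:
    csum = compute_checksum(MAGIC, seq, next_pos, tail_pos)
    return MetaBlock(MAGIC, seq, next_pos, tail_pos, csum)


def is_valid(block: Optional[MetaBlock], expected_seq: int) -> bool:
    return (block is not None
            and block.magic == MAGIC
            and block.seq == expected_seq
            and block.checksum == compute_checksum(
                block.magic, block.seq, block.next_pos, block.tail_pos))


def find_head(log: Dict[int, MetaBlock], start: int) -> Tuple[int, int]:
    """Follow "next" pointers from a valid block.

    Returns (position of the head, number of steps walked)."""
    block = log.get(start)
    if block is None or not is_valid(block, block.seq):
        raise ValueError("start does not hold a valid metadata block")
    pos, steps = start, 0
    while True:
        nxt = log.get(block.next_pos)
        if not is_valid(nxt, block.seq + 1):
            return pos, steps  # last consistent block is the head
        pos, block, steps = block.next_pos, nxt, steps + 1


def replay_order(log: Dict[int, MetaBlock], start: int) -> List[int]:
    """Find the head, jump to its tail pointer, then walk forward to the head."""
    head, _ = find_head(log, start)
    pos = log[head].tail_pos
    order = [pos]
    while pos != head:
        block = log[pos]
        if not is_valid(log.get(block.next_pos), block.seq + 1):
            raise ValueError("chain from tail breaks before reaching head")
        pos = block.next_pos
        order.append(pos)
    return order


# Example log: position -> metadata block.  Position 40 holds a stale block
# from an earlier lap: valid magic and checksum, but the wrong sequence.
example_log = {
    0: make_block(10, 8, 0),
    8: make_block(11, 24, 0),
    24: make_block(12, 32, 8),
    32: make_block(13, 40, 8),
    40: make_block(3, 48, 40),
}
```

Starting from position 8, find_head(example_log, 8) stops at position
32 after two steps, because the block at 40 carries sequence 3 rather
than 14.  Starting from 32 itself it stops after zero steps.
replay_order(example_log, 8) then processes positions 8, 24 and 32 in
that order.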

Efficiency is not really an issue. On a clean shutdown (which should
be the norm), the md superblock will contain a pointer to the head, and
the "next" block after that can quickly be determined to be invalid.
On an unclean shutdown it is expected that we need to do a bit more
work, and skipping forward along the chain to find the head of the log
is the least of our worries.

NeilBrown