Re: [PATCH V4 00/13] MD: a caching layer for raid5/6

All of lore.kernel.org
 help / color / mirror / Atom feed

From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@fb.com>
Cc: linux-raid@vger.kernel.org, songliubraving@fb.com,
	hch@infradead.org, dan.j.williams@intel.com, Kernel-team@fb.com
Subject: Re: [PATCH V4 00/13] MD: a caching layer for raid5/6
Date: Wed, 15 Jul 2015 14:06:41 +1000	[thread overview]
Message-ID: <20150715140641.3c61cf78@noble> (raw)
In-Reply-To: <20150715031615.GA1292235@devbig257.prn2.facebook.com>

On Tue, 14 Jul 2015 20:16:17 -0700 Shaohua Li <shli@fb.com> wrote:

> On Wed, Jul 15, 2015 at 12:12:34PM +1000, NeilBrown wrote:
> > On Tue, 14 Jul 2015 17:45:04 -0700 Shaohua Li <shli@fb.com> wrote:
> > 
> > > On Fri, Jul 10, 2015 at 02:36:56PM +1000, NeilBrown wrote:
> > 
> > > > Yes it does.  Having a single sort of metadata block is an important
> > > > part of the goal.  How the code actually chooses to use these is a
> > > > separate issue that can change harmlessly.
> > > 
> > > Taking a close look to reuse MD superblock for caching. It turns out to
> > > be quite hacky. Suppose I use md_update_sb to update superblock when we
> > > checkpoint the log. So I update corresponding fields of mddev
> > > (resync_offset, recovery_offset). In md_update_sb, I must add a bunch of
> > > 'if (caching_disk) xxx' as raid disks shouldn't store the
> > > resync_offset/recovery_offset. Or I can add a new cache_update_sb, but I
> > > thought I must add the same hack code if we don't duplicate a lot of
> > > code.
> > 
> > in md_update_sb, in the loop:
> > 
> > 
> > 	/* First make sure individual recovery_offsets are correct */
> > 	rdev_for_each(rdev, mddev) {
> > 		if (rdev->raid_disk >= 0 &&
> > 		    mddev->delta_disks >= 0 &&
> > 		    !test_bit(In_sync, &rdev->flags) &&
> > 		    mddev->curr_resync_completed > rdev->recovery_offset)
> > 				rdev->recovery_offset = mddev->curr_resync_completed;
> > 
> > 	}
> > 
> > add something like:
> >                else if (rdev->is_cache)
> >                         rdev->recovery_offset =
> >                         mddev->cache->latest_checkpoint
> > 
> > 
> > In super_1_sync, where the code:
> > 
> > 	if (rdev->raid_disk >= 0 &&
> > 	    !test_bit(In_sync, &rdev->flags)) {
> > 		sb->feature_map |=
> > 			cpu_to_le32(MD_FEATURE_RECOVERY_OFFSET);
> > 		sb->recovery_offset =
> > 			cpu_to_le64(rdev->recovery_offset);
> > 		if (rdev->saved_raid_disk >= 0 && mddev->bitmap)
> > 			sb->feature_map |=
> > 				cpu_to_le32(MD_FEATURE_RECOVERY_BITMAP);
> > 	}
> > 
> > is, add something like
> > 	else if (rdev->is_a_cache_disk) {
> >               sb->feature_map |= MD_FEATURE_IMA_CACHE;
> >               sb->recovery_offset = cpu_to_le64(rdev->recovery_Offset);
> >         }
> > 
> > or just make the original code a little more general - I'm not sure
> > exactly how you flag the cache device.
> > 
> > You don't need to do this every time you checkpoint the log.  The
> > pointer just needs to point to somewhere in the log so that the
> > start/end can be found (each metadata block points to the next one).
> > You could leave it until the log wraps completely, though that probably
> > isn't ideal.
> > 
> > So when you checkpoint the log, if the ->recovery_offset of the cache
> > device is more than (say) 25% behind the new checkpoint location, just
> > set MD_CHANGE_PENDING and wake the md thread.
> > 
> > I don't see that as particularly hackish.
> 
> The policy above about when superblock should be written is fine with
> me, but I'd like to focus on where/how superblock should be written
> here. I'd say exporting a structure of cache (the
> cache->latest_checkpoint) to generic MD layer is very hackish.

OK, get the cache code to write the desired value into the
recovery_offset field, so the md code only has to look at that.
The core md code does still need to know there is a cache, and which is
the cache device - it cannot be completely unaware...

> md_update_sb writes all raid disks, that's bad since we just want to
> update cache disk.

How bad?  How often?  Would you really be able to notice?

And having a per-device "update superblock" flag is not completely out
of the question.  RAID10 could benefit from the clean/dirty state being
localized to the device which was actually dirty.

>                    Overloading some fields of MD superblock and using
> them with some 'if (cache)' stuff is not natural way too.

If we could foresee everything, we could assign everything its own
field.  But unfortunately I didn't.  That is why we have feature bits.
Different feature bits mean different fields have different meanings.


>                                                           I don't
> understand why you object adding a superblock for cache. The advantage
> is it's self contained. And there is nothing about
> complexity/maintaince, as we can store the most necessary fields into
> the superblock.

Because there is precisely 1 number that needs to be stored in the
superblock, and there seems no point having a superblock just to store
one number.
It isn't much extra complexity, but any extra thing is still an extra
thing.
Having the data section of the log device containing just a log is
elegant.  Elegant is good.
If we decided that keeping two copies for superblocks was a good idea
(which I think it is, I just haven't created a "v1.3" layout yet), then
re-using the main superblock for the head-of-log pointer would instantly
give us two copies of that as well.

NeilBrown

next prev parent reply	other threads:[~2015-07-15  4:06 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-23 21:37 [PATCH V4 00/13] MD: a caching layer for raid5/6 Shaohua Li
2015-06-23 21:37 ` [PATCH V4 01/13] MD: add a new disk role to present cache device Shaohua Li
2015-06-23 21:37 ` [PATCH V4 02/13] raid5: directly use mddev->queue Shaohua Li
2015-06-23 21:37 ` [PATCH V4 03/13] raid5: cache log handling Shaohua Li
2015-06-23 21:37 ` [PATCH V4 04/13] raid5: cache part of raid5 cache Shaohua Li
2015-06-23 21:37 ` [PATCH V4 05/13] raid5: cache reclaim support Shaohua Li
2015-06-23 21:37 ` [PATCH V4 06/13] raid5: cache IO error handling Shaohua Li
2015-06-23 21:37 ` [PATCH V4 07/13] raid5: cache device quiesce support Shaohua Li
2015-06-23 21:37 ` [PATCH V4 08/13] raid5: cache recovery support Shaohua Li
2015-06-23 21:37 ` [PATCH V4 09/13] raid5: add some sysfs entries Shaohua Li
2015-06-23 21:38 ` [PATCH V4 10/13] raid5: don't allow resize/reshape with cache support Shaohua Li
2015-06-23 21:38 ` [PATCH V4 11/13] raid5: guarantee cache release stripes in correct way Shaohua Li
2015-06-23 21:38 ` [PATCH V4 12/13] raid5: enable cache for raid array with cache disk Shaohua Li
2015-06-23 21:38 ` [PATCH V4 13/13] raid5: skip resync if caching is enabled Shaohua Li
2015-07-02  3:25 ` [PATCH V4 00/13] MD: a caching layer for raid5/6 Yuanhan Liu
2015-07-02 17:11   ` Shaohua Li
2015-07-03  2:18     ` Yuanhan Liu
2015-07-08  1:56 ` NeilBrown
2015-07-08  5:44   ` Shaohua Li
2015-07-09 23:21     ` NeilBrown
2015-07-10  4:08       ` Shaohua Li
2015-07-10  4:36         ` NeilBrown
2015-07-10  4:52           ` Shaohua Li
2015-07-10  5:10             ` NeilBrown
2015-07-10  5:18               ` Shaohua Li
2015-07-10  6:42                 ` NeilBrown
2015-07-10 17:48                   ` Shaohua Li
2015-07-13 22:22                     ` NeilBrown
2015-07-13 22:35                       ` Shaohua Li
2015-07-15  0:45           ` Shaohua Li
2015-07-15  2:12             ` NeilBrown
2015-07-15  3:16               ` Shaohua Li
2015-07-15  4:06                 ` NeilBrown [this message]
2015-07-15 19:49                   ` Shaohua Li
2015-07-15 23:16                     ` NeilBrown
2015-07-16  0:07                       ` Shaohua Li
2015-07-16  1:22                         ` NeilBrown
2015-07-16  4:13                           ` Shaohua Li
2015-07-16  6:07                             ` NeilBrown
2015-07-16 15:07                               ` John Stoffel
2015-07-20  0:03                                 ` NeilBrown
2015-07-20 14:11                                   ` John Stoffel
2015-07-16 17:40                               ` Shaohua Li
2015-07-17  3:47                                 ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150715140641.3c61cf78@noble \
    --to=neilb@suse.com \
    --cc=Kernel-team@fb.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@infradead.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=shli@fb.com \
    --cc=songliubraving@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.