linux-raid.vger.kernel.org archive mirror
From: Jonathan Brassow <jbrassow@redhat.com>
To: Phillip Susi <psusi@cfl.rr.com>
Cc: linux-raid@vger.kernel.org, Kergon Alasdair <agk@redhat.com>,
	Mike Snitzer <msnitzer@redhat.com>
Subject: Re: [dm-devel] [PATCH v3 0/8] dm-raid (raid456) target
Date: Thu, 6 Jan 2011 14:59:12 -0600
Message-ID: <3AD00905-D14B-4D3F-AB6C-06E0DC8009CD@redhat.com>
In-Reply-To: <4D25E617.70509@cfl.rr.com>


On Jan 6, 2011, at 9:56 AM, Phillip Susi wrote:

> On 1/6/2011 5:46 AM, NeilBrown wrote:
>> 3:	<#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN>
>
> Let me get this straight.  You specify a separate device to hold the
> metadata and write intent bitmap for each data device?  So for a 3 disk
> raid 5, lvm will need to create two logical volumes on each of the 3
> physical volumes, one of which will only be a single physical extent,
> and will hold the raid metadata and write intent bitmap?
>
> Why not just store the metadata on the main device like mdadm does today?

There is no single big reason to do things as I've proposed, just a lot
of little reasons...

1) Device-mapper already has a few cases where metadata is kept on  
separate devices from the data (snapshots and mirror log) and no cases  
where they are kept together.  This new raid module is similar to the  
mirroring case, where bitmaps are kept separately.

2) It seems a bit funny to specify a length (second param of the  
device-mapper CTR) and then expect the devices to be larger than their  
share of that amount to accommodate metadata.  You might say it is  
funny to have to specify a separate device to hold the metadata, but I  
would again give the mirror log as an example.

3) Where multiple physical devices form a single leg/component of the  
array, the argument for having a metadata device specifically tied to  
its data device as an indivisible unit is weakened.

4) Having the metadata on a separate logical device increases the  
flexibility of its placement.  You could have it at the beginning, in  
the middle, or at the end.  (The middle might actually be preferred  
for performance reasons.)  There are no offset calculations to perform  
in the kernel that depend on metadata placement.

5) Resizing an array might require the resizing of the metadata area.   
Because the devices are separate, there is no need to move around data  
or metadata to accommodate this.  If they were mixed in the same  
device and the metadata was at the beginning, that's a problem if the  
metadata no longer fits in its area.  Likewise, if the metadata were  
at the end of a mixed device, you would have to move it when growing.   
These problems are eliminated.

6) The metadata areas are not necessary in every case.  Some raid  
controllers handle the metadata on their own (dm-raid works with  
these).  You might say it is merely another flag on the CTR line to  
indicate whether to use metadata or not.  Perhaps, but having them  
separate means you can easily convert between the two types.

7) Clustering?  Perhaps one of the weaker arguments, but having the  
metadata separate allows it to easily grow to accommodate a bitmap /  
device / node, for example.  This is really the same argument as  
easily being able to reform/resize the metadata area.

8) Bitmaps/superblocks that are updated often could be placed on  
separate devices, like SSDs, while the data is on spinning media.  I'm  
not necessarily advocating this, but if someone wants to do it, I  
think they should be able to.

9) Flexibility for the future.  Imagine a mirror and you'd like to  
split off a leg - the data portion alone becomes the linear device.   
The metadata device could be discarded, or it could be recombined with  
the data device and reinserted into the array - having just the deltas
be played back from the original mirror that has remained actively
in-use.
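
As an illustration of points 2 and 6, here is a minimal Python sketch; it is not code from the patch set. It builds only the device portion of the table line in the form quoted above (<#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN>) and works out each device's share of the target length for raid4/5/6. The function names, the /dev/mapper/vg-* device names, and the example length are hypothetical, and the remaining CTR parameters are deliberately left out.

```python
def device_args(pairs):
    """Return '<#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN>'.

    pairs is a list of (metadata_device, data_device) tuples.
    """
    fields = [str(len(pairs))]
    for meta_dev, data_dev in pairs:
        fields += [meta_dev, data_dev]
    return " ".join(fields)


def per_device_share(length_sectors, raid_devs, parity_devs=1):
    """Sectors each data device must provide for a target of this length.

    parity_devs is 1 for raid4/raid5 and 2 for raid6. Stripe alignment
    is ignored here to keep the arithmetic visible.
    """
    data_devs = raid_devs - parity_devs
    if data_devs < 1:
        raise ValueError("not enough devices for this raid level")
    if length_sectors % data_devs:
        raise ValueError("length does not divide evenly across data devices")
    return length_sectors // data_devs


# Hypothetical device names for a 3-disk raid5.
pairs = [(f"/dev/mapper/vg-meta{i}", f"/dev/mapper/vg-data{i}") for i in range(3)]
args = device_args(pairs)
share = per_device_share(length_sectors=4194304, raid_devs=3)
print(args)
print(share)
```

With separate metadata devices, each data device needs exactly `share` sectors. If the metadata lived on the same device, each one would need `share` plus the metadata area, which is the mismatch with the CTR length that point 2 describes.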

None of these reasons is all that compelling in isolation; but
together, I think they make a pretty good case.  There is additional
flexibility here; and this is to be sacrificed for what?  A simpler
CTR line?  I don't know of anyone who enters these by hand rather
than using LVM, dm-raid, multipath, etc.  MD does it this way?
Well, this is device-mapper and it has its own idiosyncrasies and
precedents.

Also, I understand what you mean by your final question, but for those  
who are new to this I'd like to point out that we /are/ storing the  
metadata on the main physical device, but not the same logical  
device.  [Again, this will be the rule, but is flexible.]
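
To make the physical-versus-logical distinction concrete, here is a small Python sketch of how one physical volume might be carved into a one-extent metadata volume and a data volume. The `carve` helper, the extent counts, and the placement names are all hypothetical; in practice LVM would do this allocation.

```python
def carve(total_extents, meta_extents=1, placement="start"):
    """Split one physical volume into metadata and data extent ranges.

    Returns (meta_range, data_ranges). Each range is a half-open
    (first_extent, end_extent) tuple. The data volume has two ranges
    when the metadata sits in the middle.
    """
    if not 0 < meta_extents < total_extents:
        raise ValueError("metadata must leave room for data")
    if placement == "start":
        meta_start = 0
    elif placement == "middle":
        meta_start = (total_extents - meta_extents) // 2
    elif placement == "end":
        meta_start = total_extents - meta_extents
    else:
        raise ValueError(f"unknown placement: {placement}")
    meta_end = meta_start + meta_extents
    data = [(0, meta_start), (meta_end, total_extents)]
    # Drop empty ranges (metadata at the very start or very end).
    data = [r for r in data if r[0] < r[1]]
    return (meta_start, meta_end), data


for where in ("start", "middle", "end"):
    print(where, carve(total_extents=100, placement=where))
```

Whichever placement is chosen, the data volume is presented to the raid target as its own logical device, so the kernel has no placement-dependent offsets to compute (point 4 above).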

brassow
