From: Liu Bo <bo.li.liu@oracle.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 06/19] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result.
Date: Thu, 23 Feb 2017 16:37:57 -0800 [thread overview]
Message-ID: <20170224003757.GC24959@lim.localdomain> (raw)
In-Reply-To: <20161226062939.5841-7-quwenruo@cn.fujitsu.com>
On Mon, Dec 26, 2016 at 02:29:26PM +0800, Qu Wenruo wrote:
> Introduce a new function, __btrfs_map_block_v2().
>
> Unlike old btrfs_map_block(), which needs different parameter to handle
> different RAID profile, this new function uses unified btrfs_map_block
> structure to handle all RAID profile in a more meaningful method:
>
> Return physical address along with logical address for each stripe.
>
> For RAID1/Single/DUP (none-stripped):
> result would be like:
> Map block: Logical 128M, Len 10M, Type RAID1, Stripe len 0, Nr_stripes 2
> Stripe 0: Logical 128M, Physical X, Len: 10M Dev dev1
> Stripe 1: Logical 128M, Physical Y, Len: 10M Dev dev2
>
> Result will be as long as possible, since it's not stripped at all.
>
> For RAID0/10 (stripped without parity):
> Result will be aligned to full stripe size:
> Map block: Logical 64K, Len 128K, Type RAID10, Stripe len 64K, Nr_stripes 4
> Stripe 0: Logical 64K, Physical X, Len 64K Dev dev1
> Stripe 1: Logical 64K, Physical Y, Len 64K Dev dev2
> Stripe 2: Logical 128K, Physical Z, Len 64K Dev dev3
> Stripe 3: Logical 128K, Physical W, Len 64K Dev dev4
>
> For RAID5/6 (stripped with parity and dev-rotation)
> Result will be aligned to full stripe size:
> Map block: Logical 64K, Len 128K, Type RAID6, Stripe len 64K, Nr_stripes 4
> Stripe 0: Logical 64K, Physical X, Len 64K Dev dev1
> Stripe 1: Logical 128K, Physical Y, Len 64K Dev dev2
> Stripe 2: Logical RAID5_P, Physical Z, Len 64K Dev dev3
> Stripe 3: Logical RAID6_Q, Physical W, Len 64K Dev dev4
>
> The new unified layout should be very flex and can even handle things
> like N-way RAID1 (which old mirror_num basic one can't handle well).
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
> volumes.c | 181 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> volumes.h | 49 +++++++++++++++++
> 2 files changed, 230 insertions(+)
>
> diff --git a/volumes.c b/volumes.c
> index f17bdeed..11d1f0e8 100644
> --- a/volumes.c
> +++ b/volumes.c
> @@ -1593,6 +1593,187 @@ out:
> return 0;
> }
>
> +static inline struct btrfs_map_block *alloc_map_block(int num_stripes)
> +{
> + struct btrfs_map_block *ret;
> + int size;
> +
> + size = sizeof(struct btrfs_map_stripe) * num_stripes +
> + sizeof(struct btrfs_map_block);
> + ret = malloc(size);
> + if (!ret)
> + return NULL;
> + memset(ret, 0, size);
> + return ret;
> +}
> +
> +static int fill_full_map_block(struct map_lookup *map, u64 start, u64 length,
> + struct btrfs_map_block *map_block)
> +{
> + u64 profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
> + u64 bg_start = map->ce.start;
> + u64 bg_end = bg_start + map->ce.size;
> + u64 bg_offset = start - bg_start; /* offset inside the block group */
> + u64 fstripe_logical = 0; /* Full stripe start logical bytenr */
> + u64 fstripe_size = 0; /* Full stripe logical size */
> + u64 fstripe_phy_off = 0; /* Full stripe offset in each dev */
> + u32 stripe_len = map->stripe_len;
> + int sub_stripes = map->sub_stripes;
> + int data_stripes = nr_data_stripes(map);
> + int dev_rotation;
> + int i;
> +
> + map_block->num_stripes = map->num_stripes;
> + map_block->type = profile;
> +
> + /*
> + * Common full stripe data for stripe based profiles
> + */
> + if (profile & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10 |
> + BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) {
> + fstripe_size = stripe_len * data_stripes;
> + if (sub_stripes)
> + fstripe_size /= sub_stripes;
> + fstripe_logical = bg_offset / fstripe_size * fstripe_size +
> + bg_start;
> + fstripe_phy_off = bg_offset / fstripe_size * stripe_len;
> + }
> +
> + switch (profile) {
> + case BTRFS_BLOCK_GROUP_DUP:
> + case BTRFS_BLOCK_GROUP_RAID1:
> + case 0: /* SINGLE */
> + /*
> + * None-stripe mode,(Single, DUP and RAID1)
> + * Just use offset to fill map_block
> + */
> + map_block->stripe_len = 0;
> + map_block->start = start;
> + map_block->length = min(bg_end, start + length) - start;
> + for (i = 0; i < map->num_stripes; i++) {
> + struct btrfs_map_stripe *stripe;
> +
> + stripe = &map_block->stripes[i];
> +
> + stripe->dev = map->stripes[i].dev;
> + stripe->logical = start;
> + stripe->physical = map->stripes[i].physical + bg_offset;
> + stripe->length = map_block->length;
> + }
> + break;
> + case BTRFS_BLOCK_GROUP_RAID10:
> + case BTRFS_BLOCK_GROUP_RAID0:
> + /*
> + * Stripe modes without parity(0 and 10)
> + * Return the whole full stripe
> + */
> +
> + map_block->start = fstripe_logical;
> + map_block->length = fstripe_size;
> + map_block->stripe_len = map->stripe_len;
> + for (i = 0; i < map->num_stripes; i++) {
> + struct btrfs_map_stripe *stripe;
> + u64 cur_offset;
> +
> + /* Handle RAID10 sub stripes */
> + if (sub_stripes)
> + cur_offset = i / sub_stripes * stripe_len;
> + else
> + cur_offset = stripe_len * i;
> + stripe = &map_block->stripes[i];
> +
> + stripe->dev = map->stripes[i].dev;
> + stripe->logical = fstripe_logical + cur_offset;
> + stripe->length = stripe_len;
> + stripe->physical = map->stripes[i].physical +
> + fstripe_phy_off;
Looks like @fstripe_phy_off refers to the start offset of the stripe on devices,
but we may ask for an offset inside the stripe.
Thanks,
-liubo
> + }
> + break;
> + case BTRFS_BLOCK_GROUP_RAID5:
> + case BTRFS_BLOCK_GROUP_RAID6:
> + /*
> + * Stripe modes with parity and device rotation(5 and 6)
> + *
> + * Return the whole full stripe
> + */
> +
> + dev_rotation = (bg_offset / fstripe_size) % map->num_stripes;
> +
> + map_block->start = fstripe_logical;
> + map_block->length = fstripe_size;
> + map_block->stripe_len = map->stripe_len;
> + for (i = 0; i < map->num_stripes; i++) {
> + struct btrfs_map_stripe *stripe;
> + int dest_index;
> + u64 cur_offset = stripe_len * i;
> +
> + stripe = &map_block->stripes[i];
> +
> + dest_index = (i + dev_rotation) % map->num_stripes;
> + stripe->dev = map->stripes[dest_index].dev;
> + stripe->length = stripe_len;
> + stripe->physical = map->stripes[dest_index].physical +
> + fstripe_phy_off;
> + if (i < data_stripes) {
> + /* data stripe */
> + stripe->logical = fstripe_logical +
> + cur_offset;
> + } else if (i == data_stripes) {
> + /* P */
> + stripe->logical = BTRFS_RAID5_P_STRIPE;
> + } else {
> + /* Q */
> + stripe->logical = BTRFS_RAID6_Q_STRIPE;
> + }
> + }
> + break;
> + default:
> + return -EINVAL;
> + }
> + return 0;
> +}
> +
> +int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
> + u64 length, struct btrfs_map_block **map_ret)
> +{
> + struct cache_extent *ce;
> + struct map_lookup *map;
> + struct btrfs_map_block *map_block;
> + int ret;
> +
> + /* Eearly parameter check */
> + if (!length || !map_ret) {
> + error("wrong parameter for %s", __func__);
> + return -EINVAL;
> + }
> +
> + ce = search_cache_extent(&fs_info->mapping_tree.cache_tree, logical);
> + if (!ce)
> + return -ENOENT;
> + if (ce->start > logical)
> + return -ENOENT;
> +
> + map = container_of(ce, struct map_lookup, ce);
> + /*
> + * Allocate a full map_block anyway
> + *
> + * For write, we need the full map_block anyway.
> + * For read, it will be striped to the needed stripe before returning.
> + */
> + map_block = alloc_map_block(map->num_stripes);
> + if (!map_block)
> + return -ENOMEM;
> + ret = fill_full_map_block(map, logical, length, map_block);
> + if (ret < 0) {
> + free(map_block);
> + return ret;
> + }
> + /* TODO: Remove unrelated map_stripes for READ operation */
> +
> + *map_ret = map_block;
> + return 0;
> +}
> +
> struct btrfs_device *btrfs_find_device(struct btrfs_root *root, u64 devid,
> u8 *uuid, u8 *fsid)
> {
> diff --git a/volumes.h b/volumes.h
> index ee7d56ab..0a575557 100644
> --- a/volumes.h
> +++ b/volumes.h
> @@ -108,6 +108,51 @@ struct map_lookup {
> struct btrfs_bio_stripe stripes[];
> };
>
> +struct btrfs_map_stripe {
> + struct btrfs_device *dev;
> +
> + /*
> + * Logical address of the stripe start.
> + * Caller should check if this logical is the desired map start.
> + * It's possible that the logical is smaller or larger than desired
> + * map range.
> + *
> + * For P/Q stipre, it will be BTRFS_RAID5_P_STRIPE
> + * and BTRFS_RAID6_Q_STRIPE.
> + */
> + u64 logical;
> +
> + u64 physical;
> +
> + /* The length of the stripe */
> + u64 length;
> +};
> +
> +struct btrfs_map_block {
> + /*
> + * The logical start of the whole map block.
> + * For RAID5/6 it will be the bytenr of the full stripe start,
> + * so it's possible that @start is smaller than desired map range
> + * start.
> + */
> + u64 start;
> +
> + /*
> + * The logical length of the map block.
> + * For RAID5/6 it will be total data stripe size
> + */
> + u64 length;
> +
> + /* Block group type */
> + u64 type;
> +
> + /* Stripe length, for non-stripped mode, it will be 0 */
> + u32 stripe_len;
> +
> + int num_stripes;
> + struct btrfs_map_stripe stripes[];
> +};
> +
> #define btrfs_multi_bio_size(n) (sizeof(struct btrfs_multi_bio) + \
> (sizeof(struct btrfs_bio_stripe) * (n)))
> #define btrfs_map_lookup_size(n) (sizeof(struct map_lookup) + \
> @@ -187,6 +232,10 @@ int btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
> u64 logical, u64 *length,
> struct btrfs_multi_bio **multi_ret, int mirror_num,
> u64 **raid_map_ret);
> +
> +/* TODO: Use this map_block_v2 to replace __btrfs_map_block() */
> +int __btrfs_map_block_v2(struct btrfs_fs_info *fs_info, int rw, u64 logical,
> + u64 length, struct btrfs_map_block **map_ret);
> int btrfs_next_bg(struct btrfs_mapping_tree *map_tree, u64 *logical,
> u64 *size, u64 type);
> static inline int btrfs_next_bg_metadata(struct btrfs_mapping_tree *map_tree,
> --
> 2.11.0
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-02-24 0:38 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-26 6:29 [PATCH v2 00/19] Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 01/19] btrfs-progs: raid56: Introduce raid56 header for later recovery usage Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 02/19] btrfs-progs: raid56: Introduce tables for RAID6 recovery Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 03/19] btrfs-progs: raid56: Allow raid6 to recover 2 data stripes Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 04/19] btrfs-progs: raid56: Allow raid6 to recover data and p Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 05/19] btrfs-progs: Introduce wrapper to recover raid56 data Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 06/19] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result Qu Wenruo
2017-02-24 0:37 ` Liu Bo [this message]
2017-02-24 0:45 ` Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 07/19] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 08/19] btrfs-progs: csum: Introduce function to read out one data csum Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 09/19] btrfs-progs: scrub: Introduce structures to support fsck scrub for RAID56 Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 10/19] btrfs-progs: scrub: Introduce function to scrub mirror based tree block Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 11/19] btrfs-progs: scrub: Introduce function to scrub mirror based data blocks Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 12/19] btrfs-progs: scrub: Introduce function to scrub one extent Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 13/19] btrfs-progs: scrub: Introduce function to scrub one data stripe Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 14/19] btrfs-progs: scrub: Introduce function to verify parities Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 15/19] btrfs-progs: extent-tree: Introduce function to check if there is any extent in given range Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 16/19] btrfs-progs: scrub: Introduce function to recover data parity Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 17/19] btrfs-progs: scrub: Introduce a function to scrub one full stripe Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 18/19] btrfs-progs: scrub: Introduce function to check a whole block group Qu Wenruo
2016-12-26 6:29 ` [PATCH v2 19/19] btrfs-progs: fsck: Introduce offline scrub function Qu Wenruo
2016-12-26 8:42 ` [PATCH v2 00/19] Btrfs offline scrub Qu Wenruo
2016-12-29 18:15 ` [PATCH v2 00/19] Goffredo Baroncelli
2016-12-30 0:40 ` Qu Wenruo
2016-12-30 18:39 ` Goffredo Baroncelli
2017-01-03 0:25 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170224003757.GC24959@lim.localdomain \
--to=bo.li.liu@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=quwenruo@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).