linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Su Yue <suy.fnst@cn.fujitsu.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 3/3] btrfs-progs: check: exclude extents of tree blocks and extent items
Date: Tue, 28 Nov 2017 16:40:13 +0800	[thread overview]
Message-ID: <a23d78eb-6eff-f80c-38e0-c70e26b2f729@cn.fujitsu.com> (raw)
In-Reply-To: <d7bf2d14-a2e1-9d27-ac09-80933717d57c@gmx.com>



On 11/28/2017 04:25 PM, Qu Wenruo wrote:
> 
> 
> On 2017年11月28日 15:47, Su Yue wrote:
>>
>>
>> On 11/28/2017 12:05 PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2017年11月28日 10:38, Su Yue wrote:
>>>> Thanks for review.
>>>>
>>>> On 11/27/2017 06:37 PM, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2017年11月27日 11:13, Su Yue wrote:
>>>>>> Commit d17d6663c99c ("btrfs-progs: lowmem check: Fix regression which
>>>>>> screws up extent allocator") removes pin_metadata_blocks() from
>>>>>> lowmem repair.
>>>>>> So we have to find another way to exclude extents which should be
>>>>>> occupied by tree blocks.
>>>>>
>>>>> First thing first, any tree block which is referred by will not be
>>>>> freed.
>>>>>
>>>>> Unless some special case, like extent tree initialization which clears
>>>>> the whole extent tree so we can't check if one tree block should be
>>>>> freed using extent tree, there is no need to explicitly pin existing
>>>>> tree blocks.
>>>>>
>>>>> Their creation/freeing is already handled well.
>>>>>
>>>> If I understand the logic of extents correctly:
>>>>
>>>> The extents in free space cache are handled in the way like
>>>> cache_block_group() in extent-tree.c.
>>>> It traverses all extents items then marks all holes free in free space
>>>> cache.
>>>>
>>>> Yes, in a normal situation, extents are already handled well.
>>>> But original and lowmem check try to repair corrupted extent tree.
>>>>
>>>> Suppose a situation:
>>>> 1) Let an extent item of one tree block is corrupted or missed.
>>>> 2) The correct extent in free space cache will be marked as free.
>>>> 3) Next CoW may allocate the "used" extents from free space
>>>>      and insert an extent item.
>>>> 4) Lowmem repair increases refs of the extent item and
>>>>      causes a wrong extent item.
>>>> OR
>>>> 3) Lowmem repair inserts the extent item firstly.
>>>> 4) CoW may allocate the "used" extents from free space.
>>>>      BUG_ON failure of inserting the extent item.
>>>
>>> OK, this explains things, but still not optimal.
>>>
>>> Such behavior should not happen if nothing is exposed.
>>> Pinning down all tree blocks is never a light operation and should be
>>> avoided if possible.
>>>
>>
>> Agreed.
>>
>>> It would be nice to do it when starting a new transaction to modify
>>> the fs.
>>>
>>
>> Now, there is only one transaction while checking chunks and extents
>> because of previous wrong call of pin_metadata_blocks().
>> It's meaningless to put it at start of new transaction now.
> 
> Because you start that transaction under all condition.
> 
> Normally, transaction should only be started if you're sure you will
> modify the filesystem.
> 
> Start them unconditionally is not a preferred behavior.
> 
>>
>> I will split the transaction into many minors. Then lowmem repair will
>> speed up if no error in chunks and extents. But it is just a little
>> optimization.
> 
> Compared to iterating all tree blocks, it's a big optimization.
> 
> Just think of filesystem with several TB metadata.
> 
>>
>>>>
>>>>> Please explain the reason why you need to pin extents first.
>>>>>
>>>> What I do in the patch is like origin mode's.
>>>> Pining extents first ensures every extents(corrupted and uncorrupted)
>>>> will not be rellocated.
>>>
>>> Only extent tree reinit will pin down all metadata block in original
>>> mode while normal repair won't.
>>>
>>> So you're still doing something which is not done in original mode,
>>> which either needs to be explained in detail or fix original mode first.
>>>
>>
>> You are right. I did miss details of how original mode frees extent
>> records.
>>
>> After come thinking, pining extents is still necessary in lowmem
>> repair.
>>
>> Original mode has extent cache to record all extents and free normal
>> extents while traversing tree blocks.
>> After traversal, only corrupted extent records exist in the cache.
>> At this moment, it pins them and start transactions to repair.
>>
>> Lowmem mode does not store anything like extent records. It begins
>> transactions(in further patch) in traversal. We can not guarantee
>> extents allocated from free space is occupied by one tree block which
>> has not been traversed.
>>
>> Although pining extents is heavy and painfully work, it seems
>> inevitable. Otherwise, I have no better idea than pining all
>> extents.
> 
> Firstly, only missing backref in extent tree will cause problem.
> 
> Even we have corrupted backref or extra backref which doesn't have any
> tree block/data referring it, it won't cause a big problem.
> 
> So for lowmem mode, we can do it in a different way, just like seed
> device, we allocate new meta chunks, and only use new meta chunks for
> our tree operations.
> 
> This will avoid any extra tree iteration, and easier to implement AFAIK.
> (Just mark all existing block groups full, to force chunk allocation)
> 

Seems feasible! I will try to implement it.

Thanks,
Su

> Thanks,
> Qu
> 
>>
>> Thanks,
>> Su
>>
>>>>
>>>> Thanks,
>>>> Su
>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>>>
>>>>>> Modify pin_down_tree_blocks() only for code reuse.
>>>>>> So behavior of pin_metadata_blocks() which works with option
>>>>>> 'init-extent-tree' is not influenced.
>>>>>>
>>>>>> Introduce exclude_blocks_and_extent_items() to mark extents of all
>>>>>> tree
>>>>>> blocks dirty in fs_info->excluded_extents. It also calls
>>>>>> exclude_extent_items() to exclude extent_items in extent tree.
>>>>>>
>>>>>> Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
>>>>>> ---
>>>>>>     cmds-check.c | 122
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
>>>>>>     1 file changed, 112 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/cmds-check.c b/cmds-check.c
>>>>>> index c7b570bef9c3..f39285c5a3c2 100644
>>>>>> --- a/cmds-check.c
>>>>>> +++ b/cmds-check.c
>>>>>> @@ -13351,6 +13351,7 @@ out:
>>>>>>         return err;
>>>>>>     }
>>>>>>     +static int exclude_blocks_and_extent_items(struct btrfs_fs_info
>>>>>> *fs_info);
>>>>>>     /*
>>>>>>      * Low memory usage version check_chunks_and_extents.
>>>>>>      */
>>>>>> @@ -13363,12 +13364,22 @@ static int
>>>>>> check_chunks_and_extents_v2(struct btrfs_fs_info *fs_info)
>>>>>>         struct btrfs_root *root1;
>>>>>>         struct btrfs_root *root;
>>>>>>         struct btrfs_root *cur_root;
>>>>>> +    struct extent_io_tree excluded_extents;
>>>>>>         int err =;
>>>>>>         int ret;
>>>>>>           root =s_info->fs_root;
>>>>>>           if (repair) {
>>>>>> +        extent_io_tree_init(&excluded_extents);
>>>>>> +        fs_info->excluded_extents =excluded_extents;
>>>>>> +        ret =xclude_blocks_and_extent_items(fs_info);
>>>>>> +        if (ret) {
>>>>>> +            error("failed to exclude tree blocks and extent items");
>>>>>> +            return ret;
>>>>>> +        }
>>>>>> +        reset_cached_block_groups(fs_info);
>>>>>> +
>>>>>>             trans =trfs_start_transaction(fs_info->extent_root, 1);
>>>>>>             if (IS_ERR(trans)) {
>>>>>>                 error("failed to start transaction before check");
>>>>>> @@ -13437,6 +13448,8 @@ out:
>>>>>>                 err |=et;
>>>>>>             else
>>>>>>                 err &=BG_ACCOUNTING_ERROR;
>>>>>> +        extent_io_tree_cleanup(&excluded_extents);
>>>>>> +        fs_info->excluded_extents =ULL;
>>>>>>         }
>>>>>>           if (trans)
>>>>>> @@ -13534,40 +13547,106 @@ init:
>>>>>>         return 0;
>>>>>>     }
>>>>>>     -static int pin_down_tree_blocks(struct btrfs_fs_info *fs_info,
>>>>>> -                struct extent_buffer *eb, int tree_root)
>>>>>> +static int exclude_extent_items(struct btrfs_fs_info *fs_info,
>>>>>> +                struct extent_io_tree *tree)
>>>>>> +{
>>>>>> +    struct btrfs_root *root =s_info->extent_root;
>>>>>> +    struct btrfs_key key;
>>>>>> +    struct btrfs_path path;
>>>>>> +    struct extent_buffer *eb;
>>>>>> +    int slot;
>>>>>> +    int ret;
>>>>>> +    u64 start;
>>>>>> +    u64 end;
>>>>>> +
>>>>>> +    ASSERT(tree);
>>>>>> +    btrfs_init_path(&path);
>>>>>> +    key.objectid =;
>>>>>> +    key.type =;
>>>>>> +    key.offset =;
>>>>>> +
>>>>>> +    ret =trfs_search_slot(NULL, root, &key, &path, 0, 0);
>>>>>> +    if (ret < 0)
>>>>>> +        goto out;
>>>>>> +
>>>>>> +    while (1) {
>>>>>> +        eb =ath.nodes[0];
>>>>>> +        slot =ath.slots[0];
>>>>>> +        btrfs_item_key_to_cpu(eb, &key, slot);
>>>>>> +        if (key.type !=TRFS_EXTENT_ITEM_KEY &&
>>>>>> +            key.type !=TRFS_METADATA_ITEM_KEY)
>>>>>> +            goto next;
>>>>>> +        start =ey.objectid;
>>>>>> +        if (key.type =BTRFS_EXTENT_ITEM_KEY)
>>>>>> +            end =tart + key.offset;
>>>>>> +        else
>>>>>> +            end =tart + fs_info->nodesize;
>>>>>> +
>>>>>> +        set_extent_dirty(tree, start, end - 1);
>>>>>> +next:
>>>>>> +        ret =trfs_next_item(root, &path);
>>>>>> +        if (ret > 0) {
>>>>>> +            ret =;
>>>>>> +            goto out;
>>>>>> +        }
>>>>>> +        if (ret < 0)
>>>>>> +            goto out;
>>>>>> +    }
>>>>>> +out:
>>>>>> +    if (ret)
>>>>>> +        error("failed to exclude extent items");
>>>>>> +    btrfs_release_path(&path);
>>>>>> +    return ret;
>>>>>> +}
>>>>>> +
>>>>>> +static int traverse_tree_blocks(struct btrfs_fs_info *fs_info,
>>>>>> +                struct extent_buffer *eb, int tree_root,
>>>>>> +                int pin)
>>>>>>     {
>>>>>>         struct extent_buffer *tmp;
>>>>>>         struct btrfs_root_item *ri;
>>>>>>         struct btrfs_key key;
>>>>>> +    struct extent_io_tree *tree;
>>>>>>         u64 bytenr;
>>>>>>         int level =trfs_header_level(eb);
>>>>>>         int nritems;
>>>>>>         int ret;
>>>>>>         int i;
>>>>>> +    u64 end =b->start + eb->len;
>>>>>>     +    if (pin)
>>>>>> +        tree =fs_info->pinned_extents;
>>>>>> +    else
>>>>>> +        tree =s_info->excluded_extents;
>>>>>>         /*
>>>>>> -     * If we have pinned this block before, don't pin it again.
>>>>>> +     * If we have pinned/excluded this block before, don't do it
>>>>>> again.
>>>>>>          * This can not only avoid forever loop with broken filesystem
>>>>>>          * but also give us some speedups.
>>>>>>          */
>>>>>> -    if (test_range_bit(&fs_info->pinned_extents, eb->start,
>>>>>> -               eb->start + eb->len - 1, EXTENT_DIRTY, 0))
>>>>>> +    if (test_range_bit(tree, eb->start, end - 1, EXTENT_DIRTY, 0))
>>>>>>             return 0;
>>>>>>     -    btrfs_pin_extent(fs_info, eb->start, eb->len);
>>>>>> +    if (pin)
>>>>>> +        btrfs_pin_extent(fs_info, eb->start, eb->len);
>>>>>> +    else
>>>>>> +        set_extent_dirty(tree, eb->start, end - 1);
>>>>>>           nritems =trfs_header_nritems(eb);
>>>>>>         for (i =; i < nritems; i++) {
>>>>>>             if (level =0) {
>>>>>> +            bool is_extent_root;
>>>>>>                 btrfs_item_key_to_cpu(eb, &key, i);
>>>>>>                 if (key.type !=TRFS_ROOT_ITEM_KEY)
>>>>>>                     continue;
>>>>>>                 /* Skip the extent root and reloc roots */
>>>>>> -            if (key.objectid =BTRFS_EXTENT_TREE_OBJECTID ||
>>>>>> -                key.objectid =BTRFS_TREE_RELOC_OBJECTID ||
>>>>>> +            if (key.objectid =BTRFS_TREE_RELOC_OBJECTID ||
>>>>>>                     key.objectid =BTRFS_DATA_RELOC_TREE_OBJECTID)
>>>>>>                     continue;
>>>>>> +            is_extent_root >> +                key.objectid ==
>>>>>> BTRFS_EXTENT_TREE_OBJECTID;
>>>>>> +            /* If pin, skip the extent root */
>>>>>> +            if (pin && is_extent_root)
>>>>>> +                continue;
>>>>>>                 ri =trfs_item_ptr(eb, i, struct btrfs_root_item);
>>>>>>                 bytenr =trfs_disk_root_bytenr(eb, ri);
>>>>>>     @@ -13582,7 +13661,7 @@ static int pin_down_tree_blocks(struct
>>>>>> btrfs_fs_info *fs_info,
>>>>>>                     fprintf(stderr, "Error reading root block\n");
>>>>>>                     return -EIO;
>>>>>>                 }
>>>>>> -            ret =in_down_tree_blocks(fs_info, tmp, 0);
>>>>>> +            ret =raverse_tree_blocks(fs_info, tmp, 0, pin);
>>>>>>                 free_extent_buffer(tmp);
>>>>>>                 if (ret)
>>>>>>                     return ret;
>>>>>> @@ -13601,7 +13680,8 @@ static int pin_down_tree_blocks(struct
>>>>>> btrfs_fs_info *fs_info,
>>>>>>                     fprintf(stderr, "Error reading tree block\n");
>>>>>>                     return -EIO;
>>>>>>                 }
>>>>>> -            ret =in_down_tree_blocks(fs_info, tmp, tree_root);
>>>>>> +            ret =raverse_tree_blocks(fs_info, tmp, tree_root,
>>>>>> +                           pin);
>>>>>>                 free_extent_buffer(tmp);
>>>>>>                 if (ret)
>>>>>>                     return ret;
>>>>>> @@ -13611,6 +13691,12 @@ static int pin_down_tree_blocks(struct
>>>>>> btrfs_fs_info *fs_info,
>>>>>>         return 0;
>>>>>>     }
>>>>>>     +static int pin_down_tree_blocks(struct btrfs_fs_info *fs_info,
>>>>>> +                struct extent_buffer *eb, int tree_root)
>>>>>> +{
>>>>>> +    return traverse_tree_blocks(fs_info, eb, tree_root, 1);
>>>>>> +}
>>>>>> +
>>>>>>     static int pin_metadata_blocks(struct btrfs_fs_info *fs_info)
>>>>>>     {
>>>>>>         int ret;
>>>>>> @@ -13622,6 +13708,22 @@ static int pin_metadata_blocks(struct
>>>>>> btrfs_fs_info *fs_info)
>>>>>>         return pin_down_tree_blocks(fs_info,
>>>>>> fs_info->tree_root->node, 1);
>>>>>>     }
>>>>>>     +static int exclude_tree_blocks(struct btrfs_fs_info *fs_info,
>>>>>> +                struct extent_buffer *eb, int tree_root)
>>>>>> +{
>>>>>> +    return traverse_tree_blocks(fs_info, fs_info->tree_root->node,
>>>>>> 1, 0);
>>>>>> +}
>>>>>> +
>>>>>> +static int exclude_blocks_and_extent_items(struct btrfs_fs_info
>>>>>> *fs_info)
>>>>>> +{
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    ret =xclude_extent_items(fs_info, fs_info->excluded_extents);
>>>>>> +    if (ret)
>>>>>> +        return ret;
>>>>>> +    return exclude_tree_blocks(fs_info, fs_info->tree_root->node, 1);
>>>>>> +}
>>>>>> +
>>>>>>     static int reset_block_groups(struct btrfs_fs_info *fs_info)
>>>>>>     {
>>>>>>         struct btrfs_block_group_cache *cache;
>>>>>>
>>>>>
>>>>
>>>>
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubscribe
>>>> linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



      reply	other threads:[~2017-11-28  8:36 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-27  3:13 [PATCH 0/3] btrfs-progs: check: fix up bugs of lowmem mode Su Yue
2017-11-27  3:13 ` [PATCH 1/3] btrfs-progs: check: release path in repair_extent_data_item() Su Yue
2017-11-27 10:21   ` Qu Wenruo
2017-11-27  3:13 ` [PATCH 2/3] btrfs-progs: check: record returned errors after walk_down_tree_v2() Su Yue
2017-11-27 10:32   ` Qu Wenruo
2017-11-27  3:13 ` [PATCH 3/3] btrfs-progs: check: exclude extents of tree blocks and extent items Su Yue
2017-11-27 10:37   ` Qu Wenruo
2017-11-28  2:38     ` Su Yue
2017-11-28  4:05       ` Qu Wenruo
2017-11-28  7:47         ` Su Yue
2017-11-28  8:25           ` Qu Wenruo
2017-11-28  8:40             ` Su Yue [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a23d78eb-6eff-f80c-38e0-c70e26b2f729@cn.fujitsu.com \
    --to=suy.fnst@cn.fujitsu.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).