From: Anand Jain <anand.jain@oracle.com>
To: bo.li.liu@oracle.com, Nikolay Borisov <nborisov@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2] Btrfs: avoid losing data raid profile when deleting a device
Date: Mon, 16 Oct 2017 12:22:44 +0800
Message-ID: <9d30d375-d0e9-fa29-84df-d07a303259e9@oracle.com>
In-Reply-To: <20171013205112.GA17177@lim.localdomain>
On 10/14/2017 04:51 AM, Liu Bo wrote:
> On Wed, Oct 11, 2017 at 10:38:50AM +0300, Nikolay Borisov wrote:
>>
>>
>> On 10.10.2017 20:53, Liu Bo wrote:
>>> We've avoided losing the data raid profile when doing balance, but it
>>> turns out that deleting a device can also result in the same
>>> problem.
>>>
>>> This fixes the problem by creating an empty data chunk before
>>> relocating the data chunk.
>>
>> Why is this needed - to copy the metadata of the to-be-relocated chunk
>> into the newly created empty chunk? I don't entirely understand that
>> code, but doesn't this seem a bit like a hack in order to stash some
>> information? Perhaps you could elaborate on the logic a bit more in the
>> changelog?
>>
>>>
>>> Metadata/System chunks are supposed to have non-zero bytes used all the
>>> time, so their raid profile is persistent.
>>
>> I think this changelog is a bit scarce on detail as to the culprit of
>> the problem. Could you perhaps add a sentence or two about the underlying
>> logic which drops the raid profile when a chunk is empty?
>>
>
> Fair enough.
>
> The problem is the same as the one that commit 2c9fe8355258 ("btrfs: Fix
> lost-data-profile caused by balance bg") fixed.
>
> Similar to doing balance, deleting a device also moves all chunks on
> this disk to other available disks; after the move succeeds, it removes
> those chunks.
>
> If our last data chunk is empty and part of it happens to be on this
> disk, then there is no data chunk left in this btrfs after the device
> has been deleted. Any following write will then try to create a new
> data chunk and end up with a 'single' data chunk, because the only
> available data raid profile at that point is 'single'.
So you are referring to a raid1 group profile that spans 3 or more
devices, since otherwise the single group profile is all that would fit
anyway? Is there a reproducer?
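
(For my own understanding: the fallback seems to behave roughly like the
standalone sketch below. This is a simplified model only, not the actual
btrfs chunk allocator; next_data_profile() and the BG_* flags are made up
for illustration.)

/*
 * Model of how the profile for the next data chunk is derived from the
 * data block groups that currently exist: once the last data block
 * group is removed, nothing is left to carry the raid1 bit.
 */
#include <stdio.h>

#define BG_DATA   (1ULL << 0)
#define BG_RAID1  (1ULL << 1)

static unsigned long long next_data_profile(const unsigned long long *bgs,
					     int nr)
{
	unsigned long long avail = 0;
	int i;

	for (i = 0; i < nr; i++)
		if (bgs[i] & BG_DATA)
			avail |= bgs[i] & ~BG_DATA;

	return avail;	/* 0 means fall back to 'single' */
}

int main(void)
{
	/* one empty raid1 data chunk, about to go away with the device */
	unsigned long long before[] = { BG_DATA | BG_RAID1 };

	printf("before delete: %s\n",
	       next_data_profile(before, 1) & BG_RAID1 ? "raid1" : "single");
	printf("after delete:  %s\n",
	       next_data_profile(NULL, 0) & BG_RAID1 ? "raid1" : "single");
	return 0;
}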
Thanks, Anand
> thanks,
> -liubo
>
>>>
>>> Reported-by: James Alandt <James.Alandt@wdc.com>
>>> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
>>> ---
>>>
>>> v2: - return the correct error.
>>> - move helper ahead of __btrfs_balance().
>>>
>>> fs/btrfs/volumes.c | 84 ++++++++++++++++++++++++++++++++++++++++++------------
>>> 1 file changed, 65 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 4a72c45..a74396d 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -3018,6 +3018,48 @@ static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info)
>>> return ret;
>>> }
>>>
>>> +/*
>>> + * return 1 : allocate a data chunk successfully,
>>> + * return <0: errors during allocating a data chunk,
>>> + * return 0 : no need to allocate a data chunk.
>>> + */
>>> +static int btrfs_may_alloc_data_chunk(struct btrfs_fs_info *fs_info,
>>> + u64 chunk_offset)
>>> +{
>>> + struct btrfs_block_group_cache *cache;
>>> + u64 bytes_used;
>>> + u64 chunk_type;
>>> +
>>> + cache = btrfs_lookup_block_group(fs_info, chunk_offset);
>>> + ASSERT(cache);
>>> + chunk_type = cache->flags;
>>> + btrfs_put_block_group(cache);
>>> +
>>> + if (chunk_type & BTRFS_BLOCK_GROUP_DATA) {
>>> + spin_lock(&fs_info->data_sinfo->lock);
>>> + bytes_used = fs_info->data_sinfo->bytes_used;
>>> + spin_unlock(&fs_info->data_sinfo->lock);
>>> +
>>> + if (!bytes_used) {
>>> + struct btrfs_trans_handle *trans;
>>> + int ret;
>>> +
>>> + trans = btrfs_join_transaction(fs_info->tree_root);
>>> + if (IS_ERR(trans))
>>> + return PTR_ERR(trans);
>>> +
>>> + ret = btrfs_force_chunk_alloc(trans, fs_info,
>>> + BTRFS_BLOCK_GROUP_DATA);
>>> + btrfs_end_transaction(trans);
>>> + if (ret < 0)
>>> + return ret;
>>> +
>>> + return 1;
>>> + }
>>> + }
>>> + return 0;
>>> +}
>>> +
>>> static int insert_balance_item(struct btrfs_fs_info *fs_info,
>>> struct btrfs_balance_control *bctl)
>>> {
>>> @@ -3476,7 +3518,6 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>>> u32 count_meta = 0;
>>> u32 count_sys = 0;
>>> int chunk_reserved = 0;
>>> - u64 bytes_used = 0;
>>>
>>> /* step one make some room on all the devices */
>>> devices = &fs_info->fs_devices->devices;
>>> @@ -3635,28 +3676,21 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>>> goto loop;
>>> }
>>>
>>> - ASSERT(fs_info->data_sinfo);
>>> - spin_lock(&fs_info->data_sinfo->lock);
>>> - bytes_used = fs_info->data_sinfo->bytes_used;
>>> - spin_unlock(&fs_info->data_sinfo->lock);
>>> -
>>> - if ((chunk_type & BTRFS_BLOCK_GROUP_DATA) &&
>>> - !chunk_reserved && !bytes_used) {
>>> - trans = btrfs_start_transaction(chunk_root, 0);
>>> - if (IS_ERR(trans)) {
>>> - mutex_unlock(&fs_info->delete_unused_bgs_mutex);
>>> - ret = PTR_ERR(trans);
>>> - goto error;
>>> - }
>>> -
>>> - ret = btrfs_force_chunk_alloc(trans, fs_info,
>>> - BTRFS_BLOCK_GROUP_DATA);
>>> - btrfs_end_transaction(trans);
>>> + if (!chunk_reserved) {
>>> + /*
>>> + * We may be relocating the only data chunk we have,
>>> + * which could potentially end up with losing data's
>>> + * raid profile, so lets allocate an empty one in
>>> + * advance.
>>> + */
>>> + ret = btrfs_may_alloc_data_chunk(fs_info,
>>> + found_key.offset);
>>> if (ret < 0) {
>>> mutex_unlock(&fs_info->delete_unused_bgs_mutex);
>>> goto error;
>>> + } else if (ret == 1) {
>>> + chunk_reserved = 1;
>>> }
>>> - chunk_reserved = 1;
>>> }
>>>
>>> ret = btrfs_relocate_chunk(fs_info, found_key.offset);
>>> @@ -4419,6 +4453,18 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
>>> chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent);
>>> btrfs_release_path(path);
>>>
>>> + /*
>>> + * We may be relocating the only data chunk we have,
>>> + * which could potentially end up with losing data's
>>> + * raid profile, so lets allocate an empty one in
>>> + * advance.
>>> + */
>>> + ret = btrfs_may_alloc_data_chunk(fs_info, chunk_offset);
>>> + if (ret < 0) {
>>> + mutex_unlock(&fs_info->delete_unused_bgs_mutex);
>>> + goto done;
>>> + }
>>> +
>>> ret = btrfs_relocate_chunk(fs_info, chunk_offset);
>>> mutex_unlock(&fs_info->delete_unused_bgs_mutex);
>>> if (ret && ret != -ENOSPC)
>>>