From: Nikolay Borisov <nborisov@suse.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: Remove the duplicated and sometimes too cautious btrfs_can_relocate()
Date: Thu, 18 Jul 2019 14:18:58 +0300 [thread overview]
Message-ID: <68da954b-98d0-68e2-51ff-75aade830d2c@suse.com> (raw)
In-Reply-To: <20190718054857.8970-1-wqu@suse.com>
On 18.07.19 г. 8:48 ч., Qu Wenruo wrote:
> [BUG]
> The following script can easily cause unexpected ENOSPC:
> umount $dev &> /dev/null
> umount $mnt &> /dev/null
>
> mkfs.btrfs -b 1G -m single -d single $dev -f > /dev/null
>
> mount $dev $mnt -o enospc_debug
>
> for i in $(seq -w 0 511); do
> xfs_io -f -c "pwrite 0 1m" $mnt/inline_$i > /dev/null
> done
> sync
>
> btrfs balance start --full $mnt || return 1
> sync
>
> # This will report -ENOSPC
> btrfs balance start --full $mnt || return 1
> umount $mnt
>
> Also, btrfs/156 can also fail due to ENOSPC.
>
> [CAUSE]
> The ENOSPC is reported by btrfs_can_relocate().
>
> In btrfs_can_relocate(), it does the following check:
> - If the block group is empty
> If empty, definitely we can relocate this block group.
> - If we are not the only block group and we have enough space
> Then we can relocate this block group.
>
> Above two checks are completely OK, although I could argue they doesn't
> make much sense, but the following check is vague and even sometimes
> too cautious to cause ENOSPC:
> - If we can allocate a new block group as large as current one.
> If we failed previous two checks, we must pass this to relocate this
> block group.
btrfs_can_relocate chunk requires min_free to be allocatable.
min_free is defined as the used space in the block group being
relocated, which I think is correct. Also I find the logic which
adjusts min_free and dev_min to also be correct. Finally the function
checks whether the device's freespace is fragmented by trying to find a
device chunk with the appropriate size. The question is - can we really
have a device that has enough free space, yet is fragmented such that
find_free_dev_extent fails which results in failing the allocation? I
think the answer is no since we allocate in chunk granularity. What am I missing?
OTOH, in btrfs_inc_block_group_ro we only allocate a chunk if:
a) we are changing raid profiles
b) if inc_block_group_ro fails for our block group.
And for b) I'm a bit puzzled as to what the code is supposed to mean. We have:
num_bytes = cache->key.offset - cache->reserved - cache->pinned -
cache->bytes_super - btrfs_block_group_used(&cache->item);
sinfo_used = btrfs_space_info_used(sinfo, true);
if (sinfo_used + num_bytes + min_allocable_bytes <=
sinfo->total_bytes) {
//set ro
}
This means if the free space in the block group + the used space in the
space info is smaller than the total space in
the space info - make this block group RO. What's the rationale behind that?
next prev parent reply other threads:[~2019-07-18 11:19 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-18 5:48 [PATCH] btrfs: Remove the duplicated and sometimes too cautious btrfs_can_relocate() Qu Wenruo
2019-07-18 11:16 ` Filipe Manana
2019-07-18 11:23 ` Nikolay Borisov
2019-07-18 11:35 ` Qu Wenruo
2019-07-18 11:18 ` Nikolay Borisov [this message]
2019-07-18 12:48 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=68da954b-98d0-68e2-51ff-75aade830d2c@suse.com \
--to=nborisov@suse.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox