From: Josef Bacik <josef@toxicpanda.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC 1/3] btrfs: Introduce per-profile available space facility
Date: Mon, 30 Dec 2019 11:14:14 -0500 [thread overview]
Message-ID: <ab05c368-d1ff-1002-0d83-5a8d33973233@toxicpanda.com> (raw)
In-Reply-To: <20191225133938.115733-2-wqu@suse.com>
On 12/25/19 8:39 AM, Qu Wenruo wrote:
> [PROBLEM]
> There are some locations in btrfs that require an accurate estimation of
> how many new bytes can be allocated from unallocated space.
>
> We have two types of estimation:
> - Factor based calculation
> Just use all unallocated space, divide by the profile factor
> One obvious user is can_overcommit().
>
> - Chunk allocator like calculation
> This will emulate the chunk allocator behavior, to get a proper
> estimation.
> The only user is btrfs_calc_avail_data_space(), utilized by
> btrfs_statfs().
> The problem is that the function is not generic enough, and can't
> handle things like RAID5/6.
>
> Current factor based calculation can't handle the following case:
> devid 1 unallocated: 1T
> devid 2 unallocated: 10T
> metadata type: RAID1
>
> If using factor, we can use (1T + 10T) / 2 = 5.5T free space for
> metadata.
> But in fact we can only get 1T free space, as we're limited by the
> smallest device for RAID1.
>
> [SOLUTION]
> This patch will introduce the skeleton of per-profile available space
> calculation, which more-or-less matches what the chunk allocator would do.
>
> The difference between it and chunk allocator is mostly on rounding and
> [0, 1M) reserved space handling, which shouldn't cause practical impact.
>
> The newly introduced per-profile available space calculation will
> calculate available space for each type, using chunk-allocator like
> calculation.
>
> With that facility, for the above device layout we get the full available
> space array:
> RAID10: 0 (not enough devices)
> RAID1: 1T
> RAID1C3: 0 (not enough devices)
> RAID1C4: 0 (not enough devices)
> DUP: 5.5T
> RAID0: 2T
> SINGLE: 11T
> RAID5: 1T
> RAID6: 0 (not enough devices)
>
> Or for a more complex example:
> devid 1 unallocated: 1T
> devid 2 unallocated: 1T
> devid 3 unallocated: 10T
>
> We will get an array of:
> RAID10: 0 (not enough devices)
> RAID1: 2T
> RAID1C3: 1T
> RAID1C4: 0 (not enough devices)
> DUP: 6T
> RAID0: 3T
> SINGLE: 12T
> RAID5: 2T
> RAID6: 0 (not enough devices)
>
> And for each profile, we do a chunk-allocator-level calculation.
> The code looks like:
>
> clear_virtual_used_space_of_all_rw_devices();
> do {
> 	/*
> 	 * The same as the chunk allocator, except that besides used
> 	 * space we also take virtual used space into consideration.
> 	 */
> 	sort_device_with_virtual_free_space();
>
> 	/*
> 	 * Unlike the chunk allocator, we don't need to bother with
> 	 * hole/stripe size, so we use the smallest device to make
> 	 * sure we can allocate as many stripes as the regular chunk
> 	 * allocator would.
> 	 */
> 	stripe_size = device_with_smallest_free->avail_space;
>
> 	/*
> 	 * Allocate a virtual chunk; the allocated virtual chunk
> 	 * increases virtual used space, allowing the next iteration
> 	 * to properly emulate chunk allocator behavior.
> 	 */
> 	ret = alloc_virtual_chunk(stripe_size, &allocated_size);
> 	if (ret == 0)
> 		avail += allocated_size;
> } while (ret == 0);
>
> As we always select the device with the least free space (just like the
> chunk allocator), for the above 1T + 10T devices we will allocate a 1T
> virtual chunk in the first iteration, then run out of devices in the next
> iteration.
>
> Thus we only get 1T of free space for the RAID1 type, just like what the
> chunk allocator would do.
>
> This patch is just the skeleton; we only do the per-profile chunk
> calculation at mount time.
>
> Later commits will update per-profile available space at other proper
> timings.
>
> Suggested-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
Thread overview: 11+ messages
2019-12-25 13:39 [PATCH RFC 0/3] Introduce per-profile available space array to avoid over-confident can_overcommit() Qu Wenruo
2019-12-25 13:39 ` [PATCH RFC 1/3] btrfs: Introduce per-profile available space facility Qu Wenruo
2019-12-30 16:14 ` Josef Bacik [this message]
2019-12-25 13:39 ` [PATCH RFC 2/3] btrfs: Update per-profile available space when device size/used space get updated Qu Wenruo
2019-12-30 16:17 ` Josef Bacik
2019-12-31 0:25 ` Qu Wenruo
2019-12-25 13:39 ` [PATCH RFC 3/3] btrfs: space-info: Use per-profile available space in can_overcommit() Qu Wenruo
2019-12-30 16:17 ` Josef Bacik
2019-12-27 18:32 ` [PATCH RFC 0/3] Introduce per-profile available space array to avoid over-confident can_overcommit() Josef Bacik
2019-12-28 1:09 ` Qu Wenruo
2019-12-30 14:29 ` Josef Bacik