From: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
To: <kreijack@inwind.it>, <1i5t5.duncan@cox.net>, <rwhite@pobox.com>,
<lists@colorremedies.com>
Cc: <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v2 1/3] Btrfs: get more accurate output in df command.
Date: Sat, 13 Dec 2014 17:57:38 +0800 [thread overview]
Message-ID: <548C0D92.2030608@cn.fujitsu.com> (raw)
In-Reply-To: <548B2D34.9060509@inwind.it>
On 12/13/2014 02:00 AM, Goffredo Baroncelli wrote:
> On 12/11/2014 09:31 AM, Dongsheng Yang wrote:
>> When function btrfs_statfs() calculate the tatol size of fs, it is calculating
>> the total size of disks and then dividing it by a factor. But in some usecase,
>> the result is not good to user.
> I am checking it; to me it seems a good improvement. However
> I noticed that df now doesn't seem to report anymore the space consumed
> by the meta-data chunk; eg:
>
> # I have two disks of 5GB each
> $ sudo ~/mkfs.btrfs -f -m raid1 -d raid1 /dev/vgtest/disk /dev/vgtest/disk1
>
> $ df -h /mnt/btrfs1/
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/vgtest-disk 5.0G 1.1G 4.0G 21% /mnt/btrfs1
>
> $ sudo btrfs fi show
> Label: none uuid: 884414c6-9374-40af-a5be-3949cdf6ad0b
> Total devices 2 FS bytes used 640.00KB
> devid 2 size 5.00GB used 2.01GB path /dev/dm-1
> devid 1 size 5.00GB used 2.03GB path /dev/dm-0
>
> $ sudo ./btrfs fi df /mnt/btrfs1/
> Data, RAID1: total=1.00GiB, used=512.00KiB
> Data, single: total=8.00MiB, used=0.00B
> System, RAID1: total=8.00MiB, used=16.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, RAID1: total=1.00GiB, used=112.00KiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=16.00MiB, used=0.00B
>
> In this case the filesystem is empty (it was a new filesystem !). However a
> 1G metadata chunk was already allocated. This is the reasons why the free
> space is only 4Gb.
Actually, in the original btrfs_statfs(), the space in metadata chunk
was also not be considered
as available. But you will get 5G in this case with original
btrfs_statfs() because there is a bug
in it. As I use another implementation to calculate the @available in df
command, I did not
mention this problem. I can describe it as below.
In original btrfs_statfs(), we only consider the free space in data
chunk as available.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
list_for_each_entry_rcu(found, head, list) {
if (found->flags & BTRFS_BLOCK_GROUP_DATA) {
int i;
total_free_data += found->disk_total -
found->disk_used;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
In the later, we will add the total_free_data to @available in output of df.
buf->f_bavail = total_free_data;
BUT:
This is incorrect! It should be:
buf->f_bavail = div_u64(total_free_data, factor);
That said this bug will add one more (data_chunk_disk_total -
data_chunk_disk_used = 1G)
to @available by mistake. Unfortunately, the free space in
metadata_chunk is also 1G.
Then we will get 5G in this case you provided here.
Conclusion:
Even in the original btrfs_statfs(), the free space in metadata is
also considered as not available.
But one bug in it make it to be misunderstood. My new btrfs_statfs()
still consider the free space
in metadata as not available, furthermore, I add it into @used.
>
> On my system the ratio metadata/data is 234MB/8.82GB = ~3%, so ignoring the
> metadata chunk from the free space may not be a big problem.
>
>
>
>
>
>> Example:
>> # mkfs.btrfs -f /dev/vdf1 /dev/vdf2 -d raid1
>> # mount /dev/vdf1 /mnt
>> # dd if=/dev/zero of=/mnt/zero bs=1M count=1000
>> # df -h /mnt
>> Filesystem Size Used Avail Use% Mounted on
>> /dev/vdf1 3.0G 1018M 1.3G 45% /mnt
>> # btrfs fi show /dev/vdf1
>> Label: none uuid: f85d93dc-81f4-445d-91e5-6a5cd9563294
>> Total devices 2 FS bytes used 1001.53MiB
>> devid 1 size 2.00GiB used 1.85GiB path /dev/vdf1
>> devid 2 size 4.00GiB used 1.83GiB path /dev/vdf2
>> a. df -h should report Size as 2GiB rather than as 3GiB.
>> Because this is 2 device raid1, the limiting factor is devid 1 @2GiB.
>>
>> b. df -h should report Avail as 0.97GiB or less, rather than as 1.3GiB.
>> 1.85 (the capacity of the allocated chunk)
>> -1.018 (the file stored)
>> +(2-1.85=0.15) (the residual capacity of the disks
>> considering a raid1 fs)
>> ---------------
>> = 0.97
>>
>> This patch drops the factor at all and calculate the size observable to
>> user without considering which raid level the data is in and what's the
>> size exactly in disk.
>> After this patch applied:
>> # mkfs.btrfs -f /dev/vdf1 /dev/vdf2 -d raid1
>> # mount /dev/vdf1 /mnt
>> # dd if=/dev/zero of=/mnt/zero bs=1M count=1000
>> # df -h /mnt
>> Filesystem Size Used Avail Use% Mounted on
>> /dev/vdf1 2.0G 1.3G 713M 66% /mnt
>> # df /mnt
>> Filesystem 1K-blocks Used Available Use% Mounted on
>> /dev/vdf1 2097152 1359424 729536 66% /mnt
>> # btrfs fi show /dev/vdf1
>> Label: none uuid: e98c1321-645f-4457-b20d-4f41dc1cf2f4
>> Total devices 2 FS bytes used 1001.55MiB
>> devid 1 size 2.00GiB used 1.85GiB path /dev/vdf1
>> devid 2 size 4.00GiB used 1.83GiB path /dev/vdf2
>> a). The @Size is 2G as we expected.
>> b). @Available is 700M = 1.85G - 1.3G + (2G - 1.85G).
>> c). @Used is changed to 1.3G rather than 1018M as above. Because
>> this patch do not treat the free space in metadata chunk
>> and system chunk as available to user. It's true, user can
>> not use these space to store data, then it should not be
>> thought as available. At the same time, it will make the
>> @Used + @Available == @Size as possible to user.
>>
>> Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
>> ---
>> Changelog:
>> v1 -> v2:
>> a. account the total_bytes in medadata chunk and
>> system chunk as used to user.
>> b. set data_stripes to the correct value in RAID0.
>>
>> fs/btrfs/extent-tree.c | 13 ++----------
>> fs/btrfs/super.c | 56 ++++++++++++++++++++++----------------------------
>> 2 files changed, 26 insertions(+), 43 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index a84e00d..9954d60 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -8571,7 +8571,6 @@ static u64 __btrfs_get_ro_block_group_free_space(struct list_head *groups_list)
>> {
>> struct btrfs_block_group_cache *block_group;
>> u64 free_bytes = 0;
>> - int factor;
>>
>> list_for_each_entry(block_group, groups_list, list) {
>> spin_lock(&block_group->lock);
>> @@ -8581,16 +8580,8 @@ static u64 __btrfs_get_ro_block_group_free_space(struct list_head *groups_list)
>> continue;
>> }
>>
>> - if (block_group->flags & (BTRFS_BLOCK_GROUP_RAID1 |
>> - BTRFS_BLOCK_GROUP_RAID10 |
>> - BTRFS_BLOCK_GROUP_DUP))
>> - factor = 2;
>> - else
>> - factor = 1;
>> -
>> - free_bytes += (block_group->key.offset -
>> - btrfs_block_group_used(&block_group->item)) *
>> - factor;
>> + free_bytes += block_group->key.offset -
>> + btrfs_block_group_used(&block_group->item);
>>
>> spin_unlock(&block_group->lock);
>> }
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index 54bd91e..40f41a2 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -1641,6 +1641,8 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
>> u64 used_space;
>> u64 min_stripe_size;
>> int min_stripes = 1, num_stripes = 1;
>> + /* How many stripes used to store data, without considering mirrors. */
>> + int data_stripes = 1;
>> int i = 0, nr_devices;
>> int ret;
>>
>> @@ -1657,12 +1659,15 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
>> if (type & BTRFS_BLOCK_GROUP_RAID0) {
>> min_stripes = 2;
>> num_stripes = nr_devices;
>> + data_stripes = num_stripes;
>> } else if (type & BTRFS_BLOCK_GROUP_RAID1) {
>> min_stripes = 2;
>> num_stripes = 2;
>> + data_stripes = 1;
>> } else if (type & BTRFS_BLOCK_GROUP_RAID10) {
>> min_stripes = 4;
>> num_stripes = 4;
>> + data_stripes = 2;
>> }
>>
>> if (type & BTRFS_BLOCK_GROUP_DUP)
>> @@ -1733,14 +1738,17 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
>> i = nr_devices - 1;
>> avail_space = 0;
>> while (nr_devices >= min_stripes) {
>> - if (num_stripes > nr_devices)
>> + if (num_stripes > nr_devices) {
>> num_stripes = nr_devices;
>> + if (type & BTRFS_BLOCK_GROUP_RAID0)
>> + data_stripes = num_stripes;
>> + }
>>
>> if (devices_info[i].max_avail >= min_stripe_size) {
>> int j;
>> u64 alloc_size;
>>
>> - avail_space += devices_info[i].max_avail * num_stripes;
>> + avail_space += devices_info[i].max_avail * data_stripes;
>> alloc_size = devices_info[i].max_avail;
>> for (j = i + 1 - num_stripes; j <= i; j++)
>> devices_info[j].max_avail -= alloc_size;
>> @@ -1772,15 +1780,13 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
>> static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>> {
>> struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb);
>> - struct btrfs_super_block *disk_super = fs_info->super_copy;
>> struct list_head *head = &fs_info->space_info;
>> struct btrfs_space_info *found;
>> u64 total_used = 0;
>> + u64 total_alloc = 0;
>> u64 total_free_data = 0;
>> int bits = dentry->d_sb->s_blocksize_bits;
>> __be32 *fsid = (__be32 *)fs_info->fsid;
>> - unsigned factor = 1;
>> - struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
>> int ret;
>>
>> /*
>> @@ -1792,38 +1798,20 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>> rcu_read_lock();
>> list_for_each_entry_rcu(found, head, list) {
>> if (found->flags & BTRFS_BLOCK_GROUP_DATA) {
>> - int i;
>> -
>> - total_free_data += found->disk_total - found->disk_used;
>> + total_free_data += found->total_bytes - found->bytes_used;
>> total_free_data -=
>> btrfs_account_ro_block_groups_free_space(found);
>> -
>> - for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
>> - if (!list_empty(&found->block_groups[i])) {
>> - switch (i) {
>> - case BTRFS_RAID_DUP:
>> - case BTRFS_RAID_RAID1:
>> - case BTRFS_RAID_RAID10:
>> - factor = 2;
>> - }
>> - }
>> - }
>> + total_used += found->bytes_used;
>> + } else {
>> + /* For metadata and system, we treat the total_bytes as
>> + * not available to file data. So show it as Used in df.
>> + **/
>> + total_used += found->total_bytes;
>> }
>> -
>> - total_used += found->disk_used;
>> + total_alloc += found->total_bytes;
>> }
>> -
>> rcu_read_unlock();
>>
>> - buf->f_blocks = div_u64(btrfs_super_total_bytes(disk_super), factor);
>> - buf->f_blocks >>= bits;
>> - buf->f_bfree = buf->f_blocks - (div_u64(total_used, factor) >> bits);
>> -
>> - /* Account global block reserve as used, it's in logical size already */
>> - spin_lock(&block_rsv->lock);
>> - buf->f_bfree -= block_rsv->size >> bits;
>> - spin_unlock(&block_rsv->lock);
>> -
>> buf->f_bavail = total_free_data;
>> ret = btrfs_calc_avail_data_space(fs_info->tree_root, &total_free_data);
>> if (ret) {
>> @@ -1831,8 +1819,12 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>> return ret;
>> }
>> - buf->f_bavail += div_u64(total_free_data, factor);
>> + buf->f_bavail += total_free_data;
>> buf->f_bavail = buf->f_bavail >> bits;
>> + buf->f_blocks = total_alloc + total_free_data;
>> + buf->f_blocks >>= bits;
>> + buf->f_bfree = buf->f_blocks - (total_used >> bits);
>> +
>> mutex_unlock(&fs_info->chunk_mutex);
>> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>
>>
>
next prev parent reply other threads:[~2014-12-13 10:00 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-11 8:31 [PATCH v2 1/3] Btrfs: get more accurate output in df command Dongsheng Yang
2014-12-11 8:31 ` [PATCH v2 2/3] Btrfs: raid56: simplify the parameter of nr_parity_stripes() Dongsheng Yang
2014-12-16 6:21 ` Satoru Takeuchi
2014-12-11 8:31 ` [PATCH v2 3/3] Btrfs: adapt df command to RAID5/6 Dongsheng Yang
2014-12-12 18:00 ` [PATCH v2 1/3] Btrfs: get more accurate output in df command Goffredo Baroncelli
2014-12-13 0:50 ` Duncan
2014-12-13 10:21 ` Dongsheng Yang
2014-12-13 9:57 ` Dongsheng Yang [this message]
2014-12-12 19:25 ` Goffredo Baroncelli
2014-12-14 11:29 ` Dongsheng Yang
[not found] ` <CABmMA7tw9BDsBXGHLO4vjcO4gaYmZPb_BQV8w22griqFvCJpPA@mail.gmail.com>
2014-12-14 14:32 ` Grzegorz Kowal
2014-12-15 1:21 ` Dongsheng Yang
2014-12-15 6:06 ` Robert White
2014-12-15 7:49 ` Robert White
2014-12-15 8:26 ` Dongsheng Yang
2014-12-15 9:36 ` Robert White
2014-12-16 3:30 ` Standards Problems [Was: [PATCH v2 1/3] Btrfs: get more accurate output in df command.] Robert White
2014-12-16 3:52 ` Robert White
2014-12-16 11:30 ` Dongsheng Yang
2014-12-16 13:24 ` Dongsheng Yang
2014-12-16 19:52 ` Robert White
2014-12-17 11:38 ` Dongsheng Yang
2014-12-18 4:07 ` Robert White
2014-12-18 8:02 ` Duncan
2014-12-23 12:31 ` Dongsheng Yang
2014-12-27 1:10 ` Robert White
2015-01-05 9:59 ` Dongsheng Yang
2014-12-31 0:15 ` Zygo Blaxell
2015-01-05 9:56 ` Dongsheng Yang
2015-01-05 10:07 ` [PATCH v2 1/3] Btrfs: get more accurate output in df command Dongsheng Yang
2015-01-05 10:07 ` [PATCH v2 2/3] Btrfs: raid56: simplify the parameter of nr_parity_stripes() Dongsheng Yang
2015-01-05 10:07 ` [PATCH v2 3/3] Btrfs: adapt df command to RAID5/6 Dongsheng Yang
2014-12-19 3:32 ` [PATCH v2 1/3] Btrfs: get more accurate output in df command Zygo Blaxell
[not found] ` <548F1EA7.9050505@inwind.it>
2014-12-16 13:47 ` Dongsheng Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=548C0D92.2030608@cn.fujitsu.com \
--to=yangds.fnst@cn.fujitsu.com \
--cc=1i5t5.duncan@cox.net \
--cc=kreijack@inwind.it \
--cc=linux-btrfs@vger.kernel.org \
--cc=lists@colorremedies.com \
--cc=rwhite@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).