public inbox for linux-btrfs@vger.kernel.org
From: Anand Jain <anand.jain@oracle.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	linux-btrfs@vger.kernel.org, kreijack@inwind.it
Subject: Re: [PATCH] btrfs-progs: match the max chunk size to the kernel
Date: Wed, 25 Mar 2020 16:23:31 +0800	[thread overview]
Message-ID: <d8f32f65-70fc-cd18-bf20-e3569b396217@oracle.com> (raw)
In-Reply-To: <6b56d946-922f-c69f-a478-0f7f7244b98f@oracle.com>



On 3/12/20 7:12 PM, Anand Jain wrote:
> On 3/11/20 9:01 PM, Qu Wenruo wrote:
>>
>>
>> On 2020/3/11 下午7:33, Anand Jain wrote:
>>> For chunk type Single, %metadata_profile and %data_profile in
>>> create_raid_groups() are NULL, so we continue to use the initially
>>> created 8MB chunks for both metadata and data.
>>>
>>> 8MB is too small. The kernel's default chunk size for type Single is
>>> 256MB. Further, the chunk created by mkfs.btrfs will stay unless it is
>>> relocated or cleaned up during balance. Consider an ENOSPC case caused
>>> by the 8MB metadata chunk filling up.
>>>
>>> I don't see any reason why mkfs.btrfs should create 8MB chunks for
>>> chunk type Single; instead it could match the kernel allocation
>>> size of 256MB for the chunk type Single.
>>>
>>> For other chunk types, create_one_raid_group() is called and creates
>>> the required bigger chunks, so there is no change with this patch. Also
>>> for fs sizes (-b option) smaller than 256MB there is no issue, as the
>>> chunk sizes are 10% of the requested fs size up to the maximum of
>>> 256MB.
>>>
>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>> ---
>>> The fio in generic/299 is failing to create the files which shall be
>>> deleted in the later part of the test case, and the failure happens
>>> only with MKFS_OPTIONS="-n64K -msingle" and not with other chunk
>>> types. This is a bit inconsistent. It appears that in the case of the
>>> Single chunk type it fails to create (or touch) the necessary file
>>> because the metadata is full, so increasing the metadata chunk size to
>>> the size the kernel would create will add consistency.
>>
>> Have you tried all existing btrfs-progs test cases?
>> IIRC there are some minimal-device-related corner cases preventing us
>> from using a larger default chunk size.
>>
> 

The test case which checks for the min size is
   tests/mkfs-tests/010-minimal-size

It depends on mkfs reporting the min size, which in turn depends on
the function btrfs_min_dev_size() doing a static calculation of the
minimum size. IMO this can be updated to match the kernel
allocations, as it's not a defined API.

But we are creating chunks in userspace just to inform the kernel
about the user-preferred chunk profile types (and the Single profile gets
stuck at the initial temporary 8MB chunk).

Anyway, I am recalling this patch, as Goffredo is working [1] on
a way to communicate the user-preferred profiles to the kernel,
which means there would be no need to allocate the chunks in userspace;
that should indirectly fix the single-chunk-size inconsistency
between the kernel and progs.

[1]
https://www.spinics.net/lists/linux-btrfs/msg99915.html

Thanks, Anand

>   I forgot btrfs-progs tests. Will run them.
> 
>> Despite that, for the generic/299 error, I believe it would be more
>> appropriate to address the problem in the ticket space system
> 
>   Agreed.
> 
>> rather than in the
>> initial metadata chunk size.
> 
>> As btrfs can do metadata overcommit as long as we have enough
>> unallocated space, the initial chunk size should have minimal
>> impact.
> 
>   IMO the problem is: if all the unallocated space has been occupied by
>   the data chunks, leading to ENOSPC for the metadata, then we have an
>   imbalance which we didn't design for in the kernel.
> 
>   To further debug ENOSPC issues, the approach should be the ability to
>   tune chunk sizes on demand. Generally, inconsistency makes debugging
>   more difficult. IMO it's ok to fix the inconsistency in the chunk sizes.
> 
> Thanks, Anand
> 
>> But don't get me wrong, I'm pretty fine with the unified minimal chunk
>> size. I just don't believe it's the proper fix for your problem, and I
>> want to be extra safe before we hit some strange problems.
>>
>> Thanks,
>> Qu
>>
>>>
>>>   volumes.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/volumes.c b/volumes.c
>>> index b46bf5988a95..d56f2fc897e3 100644
>>> --- a/volumes.c
>>> +++ b/volumes.c
>>> @@ -1004,7 +1004,7 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>>       struct list_head *cur;
>>>       struct map_lookup *map;
>>>       int min_stripe_size = SZ_1M;
>>> -    u64 calc_size = SZ_8M;
>>> +    u64 calc_size = SZ_256M;
>>>       u64 min_free;
>>>       u64 max_chunk_size = 4 * calc_size;
>>>       u64 avail = 0;
>>>
>>
> 


Thread overview: 4+ messages
2020-03-11 11:33 [PATCH] btrfs-progs: match the max chunk size to the kernel Anand Jain
2020-03-11 13:01 ` Qu Wenruo
2020-03-12 11:12   ` Anand Jain
2020-03-25  8:23     ` Anand Jain [this message]
