linux-btrfs.vger.kernel.org archive mirror
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Mark Fasheh <mfasheh@suse.de>
Cc: <linux-btrfs@vger.kernel.org>, Chris Mason <clm@fb.com>,
	Josef Bacik <jbacik@fb.com>
Subject: Re: [PATCH RFC 00/14] Accurate qgroup reserve framework
Date: Fri, 11 Sep 2015 08:43:23 +0800
Message-ID: <55F223AB.1080400@cn.fujitsu.com>
In-Reply-To: <20150910210104.GS1145@wotan.suse.de>



Mark Fasheh wrote on 2015/09/10 14:01 -0700:
> Hi Qu,
>
> On Tue, Sep 08, 2015 at 04:56:52PM +0800, Qu Wenruo wrote:
>> [[BUG]]
>> One of the most common ways to trigger the bug is the following:
>> 1) Enable quota
>> 2) Limit excl of qgroup 5 to 16M
>> 3) Write [0,2M) of a file inside subvol 5 10 times without sync
>>
>> EDQUOT will be triggered at about the 8th write.
>
> Does this happen on all kernels with qgroups or is this related to your
> recent rewrite?
All kernels.

My recent rewrite only affects the accounting part (the excl/rfer
numbers); the reserve part is largely independent of the accounting part.

But I have to admit that my rewrite did introduce some
incompatibilities with the old reserve code.

One of the most obvious ones was fixed by the hot fix introduced in
late 4.2-rc.
And there are still some hidden ones. For example, the old code frees
reserved space at end_trans() time.

But with the new accounting rewrite, we shouldn't free it until
commit_trans(), as reserved space is only converted into rfer/excl at
commit_trans().
If it is freed too early, as the old code does, the limit may be
exceeded.

Thankfully, all of these will be addressed in the big patchset.
>
>
>> [[CAUSE]]
>> The problem is caused by the fact that qgroup will reserve space even
>> if the data space is already reserved.
>>
>> In the above reproducer, each time we do a buffered write of [0,2M),
>> qgroup reserves 2M of space. But the 1st write has already reserved
>> 2M, and from then on we don't need to reserve any more data space, as
>> we are only rewriting [0,2M).
>>
>> Also, the reserved space will only be freed *ONCE* when its backref is
>> run at commit_transaction() time.
>>
>> That causes the reserved space to leak.
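The arithmetic behind the reproducer can be sketched with a toy model. Everything below (`struct toy_qgroup`, `toy_reserve_naive()`, `toy_reproducer()`, `TOY_SZ_1M`) is hypothetical and not btrfs code; it models only the data reservation as described in [[CAUSE]]: every buffered write of [0,2M) charges a full 2M and nothing is freed before commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's SZ_1M. */
#define TOY_SZ_1M (1024ULL * 1024ULL)

/* Hypothetical toy qgroup: only the limit and the outstanding data
 * reservation are modeled; rfer/excl accounting and metadata are ignored. */
struct toy_qgroup {
	uint64_t max_excl; /* limit, e.g. 16M */
	uint64_t reserved; /* reserved and not yet freed */
};

/* Old behavior as described above: every buffered write reserves its full
 * length, whether or not the range is already reserved. Returns false
 * where the kernel would fail with EDQUOT. */
static bool toy_reserve_naive(struct toy_qgroup *qg, uint64_t len)
{
	assert(len > 0);
	if (qg->reserved + len > qg->max_excl)
		return false;
	qg->reserved += len;
	return true;
}

/* Rewrite [0,2M) up to `writes` times without freeing anything in between
 * (no sync, no commit). Returns the 1-based index of the first write that
 * fails, or 0 if all of them succeed. */
static int toy_reproducer(uint64_t limit, int writes)
{
	struct toy_qgroup qg = { .max_excl = limit, .reserved = 0 };

	for (int i = 1; i <= writes; i++) {
		if (!toy_reserve_naive(&qg, 2 * TOY_SZ_1M))
			return i;
	}
	return 0;
}
```

With a 16M limit the toy model fails on the 9th write: 8 x 2M fills the limit exactly, although only 2M of distinct data was ever written. Metadata reservations are not modeled here, which would plausibly explain why the real failure shows up around the 8th write.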
>>
>> [[FIX]]
>> The fix is not a simple one, as currently btrfs_qgroup_reserve() follows
>
> Indeed, this is quite a large patch series and I see no testing details from
> you. Can you please at the least provide a single reproducer in the form of
> something that can be added to xfstests?
As Filipe mentioned, it has already been submitted to fstests.

And sorry for not mentioning it in the cover letter.

BTW, more qgroup test cases are coming soon, covering the many errors
exposed during the development of this patchset.
>
>
>> the very bad btrfs space allocating principle:
>>    Allocate as much as you need, even if it's not fully used.
>>
>> So for accurate qgroup reserve, we introduce a completely new framework
>> for data and metadata.
>> 1) Per-inode data reserve map
>>     Now, each inode will have a data reserve map, recording which range
>>     of data is already reserved.
>>     If we are writing a range which is already reserved, we won't need to
>>     reserve space again.
>>
>>     Also, since qgroup is only accounted at commit_trans(), once data
>>     is committed to disk and its metadata is inserted into the current
>>     tree, we should free the reserved data range, but still keep the
>>     reserved space until commit_trans().
>>
>>     So delayed_ref_head gets new members to record how much space is
>>     reserved, and that space is freed at commit_trans() time.
>>
>> 2) Per-root metadata reserve counter
>>     For metadata (tree blocks), it's impossible to know in advance
>>     exactly how much space will be used.
>>     And with the new qgroup accounting framework, the old
>>     free-at-end_trans() behavior may lead to exceeding the limit.
>>
>>     So we record how much metadata space is reserved for each root, and
>>     free it at commit_trans() time.
>>     This method is not perfect, but since metadata is comparatively
>>     small, it should be good enough.
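The per-inode data reserve map described in 1) can be sketched as follows. This is a hypothetical illustration, not the patchset's code: judging by the patch titles, the real map stores non-overlapping ranges, whereas this sketch swaps in a fixed-size per-sector bitmap (4K sectors, a 4M file) because it is easier to check. `toy_rsv_map_reserve()` returns only the bytes not already reserved, which is the amount that would be charged to the qgroup. `toy_rsv_map_release()` clears a range once its data is on disk and returns the bytes that, per the description above, would be handed to the delayed_ref_head and freed only at commit_trans().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_SECTORSIZE 4096ULL
#define TOY_MAX_SECTORS 1024ULL /* covers a 4M file; an arbitrary bound */

/* Hypothetical per-inode reserve map: one flag per 4K sector. */
struct toy_rsv_map {
	bool reserved[TOY_MAX_SECTORS];
};

static void toy_rsv_map_init(struct toy_rsv_map *map)
{
	memset(map, 0, sizeof(*map));
}

/* Sector indexes [first, last) covered by a sector-aligned byte range. */
static void toy_range_to_sectors(uint64_t start, uint64_t len,
				 uint64_t *first, uint64_t *last)
{
	assert(start % TOY_SECTORSIZE == 0 && len % TOY_SECTORSIZE == 0);
	*first = start / TOY_SECTORSIZE;
	*last = (start + len) / TOY_SECTORSIZE;
	assert(*last <= TOY_MAX_SECTORS);
}

/* Mark [start, start + len) reserved and return the bytes that were not
 * reserved before, i.e. the only part that needs a qgroup reservation. */
static uint64_t toy_rsv_map_reserve(struct toy_rsv_map *map,
				    uint64_t start, uint64_t len)
{
	uint64_t first, last, newly = 0;

	toy_range_to_sectors(start, len, &first, &last);
	for (uint64_t s = first; s < last; s++) {
		if (!map->reserved[s]) {
			map->reserved[s] = true;
			newly += TOY_SECTORSIZE;
		}
	}
	return newly;
}

/* Clear [start, start + len) once its data reaches disk and return the
 * bytes that were reserved there. The caller would record that amount in
 * the delayed ref head instead of freeing it, so the qgroup reservation
 * lives on until commit_trans(). */
static uint64_t toy_rsv_map_release(struct toy_rsv_map *map,
				    uint64_t start, uint64_t len)
{
	uint64_t first, last, released = 0;

	toy_range_to_sectors(start, len, &first, &last);
	for (uint64_t s = first; s < last; s++) {
		if (map->reserved[s]) {
			map->reserved[s] = false;
			released += TOY_SECTORSIZE;
		}
	}
	return released;
}
```

In the reproducer, the first write of [0,2M) would return 2M from `toy_rsv_map_reserve()` and every later rewrite would return 0, so the qgroup is charged once instead of ten times.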
>>
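For the per-root metadata counter in 2), a minimal sketch under the same caveats: `struct toy_root_qgroup`, `toy_meta_reserve()` and `toy_commit_trans()` are hypothetical names, and the limit check is simplified to excl + reserved + request <= max_excl. It illustrates why the reservation is dropped only at commit_trans(): until then, the metadata it stands for has not been accounted into excl, so dropping it early would let later reservations pass the limit check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-root state: the limit, the usage already accounted,
 * and the metadata reserved during the current transaction. */
struct toy_root_qgroup {
	uint64_t max_excl;
	uint64_t excl;          /* accounted usage, updated at commit */
	uint64_t meta_reserved; /* per-root metadata reserve counter */
};

/* Reserve an estimate for tree blocks this root may dirty. The exact
 * amount is unknown in advance, so the caller passes a pessimistic
 * estimate. Returns false where the kernel would fail with EDQUOT. */
static bool toy_meta_reserve(struct toy_root_qgroup *r, uint64_t bytes)
{
	if (r->excl + r->meta_reserved + bytes > r->max_excl)
		return false;
	r->meta_reserved += bytes;
	return true;
}

/* At commit_trans(): account the metadata actually used, then drop the
 * whole per-root reservation at once. Freeing it earlier (e.g. at
 * end_trans()) would leave a window where neither excl nor the reserve
 * counter reflects that usage. */
static void toy_commit_trans(struct toy_root_qgroup *r, uint64_t meta_used)
{
	assert(meta_used <= r->meta_reserved); /* sketch assumes a safe estimate */
	r->excl += meta_used;
	r->meta_reserved = 0;
}
```

Here `meta_used` stands in for whatever the transaction actually dirtied; the sketch assumes it never exceeds the estimate, which is exactly the "not perfect" part of the approach.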
>> More detailed info can be found in each commit message and in source
>> comments.
>>
>> Qu Wenruo (19):
>>    btrfs: qgroup: New function declaration for new reserve implement
>>    btrfs: qgroup: Implement data_rsv_map init/free functions
>>    btrfs: qgroup: Introduce new function to search most left reserve
>>      range
>>    btrfs: qgroup: Introduce function to insert non-overlap reserve range
>>    btrfs: qgroup: Introduce function to reserve data range per inode
>>    btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function
>>    btrfs: qgroup: Introduce function to release reserved range
>>    btrfs: qgroup: Introduce function to release/free reserved data range
>>    btrfs: delayed_ref: Add new function to record reserved space into
>>      delayed ref
>>    btrfs: delayed_ref: release and free qgroup reserved at proper timing
>>    btrfs: qgroup: Introduce new functions to reserve/free metadata
>>    btrfs: qgroup: Use new metadata reservation.
>>    btrfs: extent-tree: Add new verions of btrfs_check_data_free_space
>>    btrfs: Switch to new check_data_free_space
>>    btrfs: fallocate: Add support to accurate qgroup reserve
>>    btrfs: extent-tree: Add new version of btrfs_delalloc_reserve_space
>>    btrfs: extent-tree: Use new __btrfs_delalloc_reserve_space function
>>    btrfs: qgroup: Cleanup old inaccurate facilities
>>    btrfs: qgroup: Add handler for NOCOW and inline
>
> I took a quick look through a few of these; none of them have any trace_*
> functions, yet you're adding several new entry points to the qgroup code.
> Those are incredibly useful for debugging on live systems, and in fact I've
> got a patch which reintroduces the ones you removed in your last patch
> series ;)
Sounds great.

I was planning to add them after the patchset was merged, but since it
can no longer make it into 4.3, I'll add tracepoints between 4.3 and 4.4.

BTW, I'm not much of a fan of debugging with tracepoints, as they're
less convenient than pr_info().
And of course, they take more code than pr_info().
(Yep, I'm quite lazy.)

Are there any good practices for making full use of tracepoints when
debugging?

Thanks,
Qu
>
> This time around, can you please provide tracepoints for at least your new
> high-level entry point functions into the qgroup code?
>
> Thanks,
> 	--Mark
>
> --
> Mark Fasheh
>
