From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <fdmanana@gmail.com>, Josef Bacik <jbacik@fb.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v2 00/23] Rework btrfs qgroup reserved space framework
Date: Fri, 9 Oct 2015 16:19:47 +0800
Message-ID: <561778A3.10102@cn.fujitsu.com>
In-Reply-To: <CAL3q7H5HVA9osJ-rLmCas1SVDJbBWS=LvOoJHk8vHtf46HCvTQ@mail.gmail.com>
Filipe Manana wrote on 2015/10/09 07:41 +0100:
> On Fri, Oct 9, 2015 at 6:45 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>>
>> Josef Bacik wrote on 2015/10/08 21:36 -0700:
>>>
>>> On 10/08/2015 07:11 PM, Qu Wenruo wrote:
>>>>
>>>> In previous rework of qgroup, we succeeded in fixing qgroup accounting
>>>> part, making the rfer/excl numbers accurate.
>>>>
>>>> But that's just part of qgroup work, another part of qgroup still has
>>>> quite a lot problem, that's qgroup reserve space part which will lead to
>>>> EQUOT even we are far from the limit.
>>>>
>>>> [[BUG]]
>>>> The easiest way to trigger the bug is,
>>>> 1) Enable quota
>>>> 2) Limit excl of qgroup 5 to 16M
>>>> 3) Write [0,2M) of a file inside subvol 5 10 times without sync
>>>>
>>>> EQUOT will be triggered at about the 8th write.
>>>> But after remount, we can still write until about 15M.
>>>>
>>>> [[CAUSE]]
>>>> The problem is caused by the fact that qgroup will reserve space even
>>>> the data space is already reserved.
>>>>
>>>> In above reproducer, each time we buffered write [0,2M) qgroup will
>>>> reserve 2M space, but in fact, at the 1st time, we have already reserved
>>>> 2M and from then on, we don't need to reserved any data space as we are
>>>> only writing [0,2M).
>>>>
>>>> Also, the reserved space will only be freed *ONCE* when its backref is
>>>> run at commit_transaction() time.
>>>>
>>>> That's causing the reserved space leaking.
>>>>
>>>> [[FIX]]
>>>> The fix is not a simple one, as currently btrfs_qgroup_reserve() will
>>>> allocate whatever caller asked for.
>>>>
>>>> So for accurate qgroup reserve, we introduce a completely new framework
>>>> for data and metadata.
>>>> 1) Per-inode data reserve map
>>>> Now, each inode will have a data reserve map, recording which range
>>>> of data is already reserved.
>>>> If we are writing a range which is already reserved, we won't need to
>>>> reserve space again.
>>>>
>>>> Also, for the fact that qgroup is only accounted at commit_trans(),
>>>> for data commit into disc and its metadata is also inserted into
>>>> current tree, we should free the data reserved range, but still keep
>>>> the reserved space until commit_trans().
>>>>
>>>> So delayed_ref_head will have new members to record how much space is
>>>> reserved and free them at commit_trans() time.
>>>
>>>
>>> This is already handled by setting DELALLOC in the io_tree, we do
>>> similar sort of stuff for the normal enospc accounting, why not use
>>> that? Thanks,
>>>
>>> Josef
>>
>>
>> Thanks for pointing this out.
>>
>> I was also searching for a existing facility, but didn't find one as I'm not
>> familiar with io_tree.
>>
>> After a quick glance, it seems quite fit the need, but not completely sure.
>>
>> I'll keep investigating on it and try to use it.
>>
>> BTW, from what I understand, __btrfs_buffered_write() should cause the range
>> to be DEALLOC, but I didn't find any call to set_extent_delalloc(),
>> it that done in other place?
>
> __btrfs_buffered_write() -> btrfs_dirty_pages() -> btrfs_set_extent_delalloc()
>
Thanks,
I also found the call sequence with dump_stack().

And to Josef: after some reading, the timing of clearing DELALLOC is not
perfect for the qgroup case.

For the buffered/mmap write case, the difference is acceptable, as DELALLOC
is set at buffered write or page mkwrite time.
The only difference is that DELALLOC is cleared a little early, at
cow_file_range() rather than at finish_ordered_io() as in my patchset.

But with the DELALLOC flag alone, we can't handle fallocate(), as it
doesn't use DELALLOC at all.

        Current                 |               Patchset
btrfs_fallocate()               |btrfs_fallocate()
  *NO* DELALLOC flag set/clear  |-> btrfs_qgroup_reserve()
                                |   -> reserve qgroup space
                                |      for each needed range.
                                |-> btrfs_prealloc_file_range()
                                |   -> free qgroup space

So at least one extra extent flag is needed for accurate qgroup reserve.

But still, thanks a lot, as I can now reuse io_tree for this
rather than hand-coding over 1K lines of new code.
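
To make the extra-flag point above concrete, here is a rough userspace
sketch. It is not code from the patchset and not the real io_tree API:
struct toy_inode, the TOY_* bits and every toy_* function are made-up
names, and a flat per-block array stands in for io_tree. It only models
two things: a range is charged to the qgroup once no matter how often
it is rewritten, and the fallocate path can be tracked without ever
touching DELALLOC.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NBLOCKS    1024u    /* toy inode: 1024 blocks of 4 KiB = 4 MiB */
#define TOY_BLOCK_SIZE 4096ull

/* Hypothetical per-block state bits, standing in for extent flags. */
#define TOY_DELALLOC        (1u << 0)
#define TOY_QGROUP_RESERVED (1u << 1)  /* the "extra flag"; name is invented */

struct toy_inode {
	uint8_t  flags[TOY_NBLOCKS];
	uint64_t qgroup_reserved;      /* bytes currently charged to the qgroup */
};

/* Set `bit` on blocks [start, start + len); return bytes that were newly set. */
static uint64_t toy_set_bit(struct toy_inode *in, unsigned start,
			    unsigned len, uint8_t bit)
{
	uint64_t changed = 0;

	assert(start <= TOY_NBLOCKS && len <= TOY_NBLOCKS - start);
	for (unsigned i = start; i < start + len; i++) {
		if (!(in->flags[i] & bit)) {
			in->flags[i] |= bit;
			changed += TOY_BLOCK_SIZE;
		}
	}
	return changed;
}

/* Clear `bit` on blocks [start, start + len); return bytes that had it set. */
static uint64_t toy_clear_bit(struct toy_inode *in, unsigned start,
			      unsigned len, uint8_t bit)
{
	uint64_t changed = 0;

	assert(start <= TOY_NBLOCKS && len <= TOY_NBLOCKS - start);
	for (unsigned i = start; i < start + len; i++) {
		if (in->flags[i] & bit) {
			in->flags[i] &= (uint8_t)~bit;
			changed += TOY_BLOCK_SIZE;
		}
	}
	return changed;
}

/* Charge the qgroup only for blocks that are not reserved yet. */
static uint64_t toy_qgroup_reserve(struct toy_inode *in, unsigned start,
				   unsigned len)
{
	uint64_t newly = toy_set_bit(in, start, len, TOY_QGROUP_RESERVED);

	in->qgroup_reserved += newly;
	return newly;
}

/* Give back the reservation for blocks that actually hold one. */
static uint64_t toy_qgroup_free(struct toy_inode *in, unsigned start,
				unsigned len)
{
	uint64_t freed = toy_clear_bit(in, start, len, TOY_QGROUP_RESERVED);

	in->qgroup_reserved -= freed;
	return freed;
}

/* Buffered write: the range becomes DELALLOC and is reserved at most once. */
static void toy_buffered_write(struct toy_inode *in, unsigned start,
			       unsigned len)
{
	toy_set_bit(in, start, len, TOY_DELALLOC);
	toy_qgroup_reserve(in, start, len);
}

/* fallocate: DELALLOC is never set, so only the extra bit tracks the range. */
static uint64_t toy_fallocate_reserve(struct toy_inode *in, unsigned start,
				      unsigned len)
{
	return toy_qgroup_reserve(in, start, len);
}
```

With a zero-initialized struct toy_inode, calling
toy_buffered_write(&in, 0, 512) ten times leaves in.qgroup_reserved at
2 MiB instead of 20 MiB. toy_fallocate_reserve(&in, 512, 256) then
reserves 1 MiB more while flags[512] still has TOY_DELALLOC clear, and
toy_qgroup_free(&in, 512, 256) returns it, mirroring the right-hand
column of the table.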
Thanks,
Qu
>>
>> Thanks,
>> Qu
>>