linux-btrfs.vger.kernel.org archive mirror
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <fdmanana@gmail.com>, Josef Bacik <jbacik@fb.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v2 00/23] Rework btrfs qgroup reserved space framework
Date: Fri, 9 Oct 2015 16:19:47 +0800
Message-ID: <561778A3.10102@cn.fujitsu.com>
In-Reply-To: <CAL3q7H5HVA9osJ-rLmCas1SVDJbBWS=LvOoJHk8vHtf46HCvTQ@mail.gmail.com>



Filipe Manana wrote on 2015/10/09 07:41 +0100:
> On Fri, Oct 9, 2015 at 6:45 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>>
>> Josef Bacik wrote on 2015/10/08 21:36 -0700:
>>>
>>> On 10/08/2015 07:11 PM, Qu Wenruo wrote:
>>>>
>>>> In the previous rework of qgroup, we succeeded in fixing the qgroup
>>>> accounting part, making the rfer/excl numbers accurate.
>>>>
>>>> But that's only part of the qgroup work. Another part still has quite
>>>> a lot of problems: the qgroup space reservation part, which can lead to
>>>> EDQUOT even when we are far from the limit.
>>>>
>>>> [[BUG]]
>>>> The easiest way to trigger the bug is,
>>>> 1) Enable quota
>>>> 2) Limit excl of qgroup 5 to 16M
>>>> 3) Write [0,2M) of a file inside subvol 5 10 times without sync
>>>>
>>>> EDQUOT will be triggered at about the 8th write.
>>>> But after remount, we can still write until about 15M.
>>>>
>>>> [[CAUSE]]
>>>> The problem is caused by the fact that qgroup will reserve space even
>>>> when the data space is already reserved.
>>>>
>>>> In the above reproducer, each time we do a buffered write of [0,2M),
>>>> qgroup will reserve 2M of space. But in fact, the 1st write has already
>>>> reserved 2M, and from then on we don't need to reserve any more data
>>>> space, as we are only writing [0,2M).
>>>>
>>>> Also, the reserved space will only be freed *ONCE*, when its backref is
>>>> run at commit_transaction() time.
>>>>
>>>> That's what causes the reserved space to leak.
>>>>
>>>> [[FIX]]
>>>> The fix is not a simple one, as currently btrfs_qgroup_reserve() will
>>>> allocate whatever the caller asks for.
>>>>
>>>> So for accurate qgroup reservation, we introduce a completely new
>>>> framework for data and metadata.
>>>> 1) Per-inode data reserve map
>>>>      Now, each inode will have a data reserve map, recording which ranges
>>>>      of data are already reserved.
>>>>      If we are writing a range which is already reserved, we won't need to
>>>>      reserve space again.
>>>>
>>>>      Also, because qgroup is only accounted at commit_trans(), once data
>>>>      has been committed to disk and its metadata inserted into the
>>>>      current tree, we should free the reserved data range, but still keep
>>>>      the reserved space until commit_trans().
>>>>
>>>>      So delayed_ref_head will have new members to record how much space is
>>>>      reserved, and free it at commit_trans() time.
>>>
>>>
>>> This is already handled by setting DELALLOC in the io_tree, we do
>>> similar sort of stuff for the normal enospc accounting, why not use
>>> that?  Thanks,
>>>
>>> Josef
>>
>>
>> Thanks for pointing this out.
>>
>> I was also searching for an existing facility, but didn't find one as I'm not
>> familiar with io_tree.
>>
>> After a quick glance, it seems to fit the need quite well, but I'm not
>> completely sure.
>>
>> I'll keep investigating it and try to use it.
>>
>> BTW, from what I understand, __btrfs_buffered_write() should cause the range
>> to be DELALLOC, but I didn't find any call to set_extent_delalloc(),
>> is that done in some other place?
>
> __btrfs_buffered_write() -> btrfs_dirty_pages() -> btrfs_set_extent_delalloc()
>
Thanks,

I also found the call sequence using dump_stack().

And to Josef: after some reading, the timing of clearing DELALLOC is not
perfect for the qgroup case.

For the buffered/mapped write case, the difference is acceptable, as DELALLOC
is marked at buffered write or page mkwrite time.
The only difference is that DELALLOC is cleared a little early, at
cow_file_range(), rather than at finish_ordered_io() as in my patchset.
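
As a rough illustration of why a per-range flag fits the buffered write
case, here is a toy Python model (not kernel code). It compares charging
the full length on every write with charging only the pages that do not
yet carry a flag. The class names, the 4 KiB page size and the set-based
"flag" are assumptions made for the sketch, and metadata reservation is
ignored entirely.

```python
# Toy model only: none of these names are btrfs kernel APIs.
PAGE = 4096
MiB = 1024 * 1024


class NaiveReserve:
    """Charge the full write length every time, even for rewrites."""

    def __init__(self, limit):
        self.limit = limit
        self.reserved = 0  # bytes currently charged to the qgroup

    def write(self, start, length):
        if self.reserved + length > self.limit:
            raise OSError("EDQUOT")
        self.reserved += length


class RangeReserve:
    """Charge only pages that are not already flagged as reserved."""

    def __init__(self, limit):
        self.limit = limit
        self.reserved = 0
        self.flagged = set()  # page indexes carrying the flag

    def write(self, start, length):
        first = start // PAGE
        last = (start + length + PAGE - 1) // PAGE
        new_pages = set(range(first, last)) - self.flagged
        need = len(new_pages) * PAGE
        if self.reserved + need > self.limit:
            raise OSError("EDQUOT")
        self.reserved += need
        self.flagged |= new_pages


def count_successful_writes(model, attempts=10):
    """Rewrite [0, 2M) repeatedly without sync; stop at the first EDQUOT."""
    done = 0
    for _ in range(attempts):
        try:
            model.write(0, 2 * MiB)
        except OSError:
            break
        done += 1
    return done
```

With a 16M limit, the naive model runs out of quota before finishing the
ten writes, while the range-aware model completes all of them and keeps
only 2M reserved.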

But if we use the DELALLOC flag, we can't handle fallocate(), as it doesn't
use DELALLOC at all.

Current                         | Patchset
btrfs_fallocate()               | btrfs_fallocate()
*NO* DELALLOC flag set/clear    | -> btrfs_qgroup_reserve()
                                |    -> reserve qgroup space
                                |       for each needed range.
                                | -> btrfs_prealloc_file_range()
                                |    -> free qgroup space

So at least one extra extent flag is needed for accurate qgroup reservation.
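
The fallocate() case can be sketched with another toy Python model (again
not kernel code). It assumes a hypothetical second per-range flag, here
called qgroup_reserved, tracked in 1 MiB units. The flag name, the unit
size and all function names are made up for illustration. The point is
only that a fallocate-style path never touches DELALLOC, so the
reservation has to be recorded somewhere else.

```python
# Toy model only: names and the 1 MiB unit are illustrative assumptions.
UNIT = 1024 * 1024


class Inode:
    def __init__(self):
        self.delalloc = set()         # units flagged DELALLOC
        self.qgroup_reserved = set()  # units with the hypothetical extra flag
        self.qgroup_bytes = 0         # bytes charged to the qgroup


def buffered_write(inode, units):
    """Buffered writes mark the written range DELALLOC."""
    inode.delalloc |= set(units)


def fallocate_reserve(inode, units):
    """Reserve only units not flagged yet; DELALLOC is never touched."""
    new_units = set(units) - inode.qgroup_reserved
    inode.qgroup_reserved |= new_units
    charged = len(new_units) * UNIT
    inode.qgroup_bytes += charged
    return charged


def prealloc_done(inode, units):
    """After preallocation, drop the flag and the bytes it stood for."""
    done = set(units) & inode.qgroup_reserved
    inode.qgroup_reserved -= done
    inode.qgroup_bytes -= len(done) * UNIT
```

For example, reserving units 0..3 and then 2..5 charges 4 units and then
only 2 more, and the DELALLOC set stays empty the whole time, which is
why a DELALLOC-only scheme would have nothing to account against.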

But still, thanks a lot, as I can now reuse io_tree for such operations
rather than hand-coding over 1K lines of new code.

Thanks,
Qu


Thread overview: 28+ messages
2015-10-09  2:11 [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Qu Wenruo
2015-10-09  2:11 ` [PATCH v2 01/23] btrfs: qgroup: New function declaration for new reserve implement Qu Wenruo
2015-10-09  2:11 ` [PATCH v2 02/23] btrfs: qgroup: Implement data_rsv_map init/free functions Qu Wenruo
2015-10-09  2:15 ` [PATCH v2 03/23] btrfs: qgroup: Introduce new function to search most left reserve range Qu Wenruo
2015-10-09  2:15 ` [PATCH v2 04/23] btrfs: qgroup: Introduce function to insert non-overlap " Qu Wenruo
2015-10-09  2:15 ` [PATCH v2 05/23] btrfs: qgroup: Introduce function to reserve data range per inode Qu Wenruo
2015-10-09  2:18 ` [PATCH v2 06/23] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function Qu Wenruo
2015-10-09  2:18 ` [PATCH v2 07/23] btrfs: qgroup: Introduce function to release reserved range Qu Wenruo
2015-10-09  2:18 ` [PATCH v2 08/23] btrfs: qgroup: Introduce function to release/free reserved data range Qu Wenruo
2015-10-09  2:18 ` [PATCH v2 09/23] btrfs: delayed_ref: Add new function to record reserved space into delayed ref Qu Wenruo
2015-10-09  2:22 ` [PATCH v2 10/23] btrfs: delayed_ref: release and free qgroup reserved at proper timing Qu Wenruo
2015-10-09  2:22 ` [PATCH v2 11/23] btrfs: qgroup: Introduce new functions to reserve/free metadata Qu Wenruo
2015-10-09  2:22 ` [PATCH v2 12/23] btrfs: qgroup: Use new metadata reservation Qu Wenruo
2015-10-09  2:22 ` [PATCH v2 13/23] btrfs: extent-tree: Add new version of btrfs_check_data_free_space and btrfs_free_reserved_data_space Qu Wenruo
2015-10-09  2:25 ` [PATCH v2 14/23] btrfs: extent-tree: Switch to new check_data_free_space and free_reserved_data_space Qu Wenruo
2015-10-09  2:25 ` [PATCH v2 15/23] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve/release_space Qu Wenruo
2015-10-09  2:25 ` [PATCH v2 16/23] btrfs: extent-tree: Switch to new delalloc space reserve and release Qu Wenruo
2015-10-09  2:30 ` [PATCH v2 18/23] btrfs: qgroup: Add handler for NOCOW and inline Qu Wenruo
2015-10-09  2:30 ` [PATCH v2 19/23] btrfs: Add handler for invalidate page Qu Wenruo
2015-10-09  2:34 ` [PATCH v2 20/23] btrfs: qgroup: Add new trace point for qgroup data reserve Qu Wenruo
2015-10-09  2:34 ` [PATCH v2 21/23] btrfs: fallocate: Add support to accurate qgroup reserve Qu Wenruo
2015-10-09  2:34 ` [PATCH v2 22/23] btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size Qu Wenruo
2015-10-09  2:34 ` [PATCH v2 23/23] btrfs: qgroup: Avoid calling btrfs_free_reserved_data_space in clear_bit_hook Qu Wenruo
2015-10-09  4:08 ` [PATCH v2 17/23] btrfs: qgroup: Cleanup old inaccurate facilities Qu Wenruo
2015-10-09  4:36 ` [PATCH v2 00/23] Rework btrfs qgroup reserved space framework Josef Bacik
2015-10-09  5:45   ` Qu Wenruo
2015-10-09  6:41     ` Filipe Manana
2015-10-09  8:19       ` Qu Wenruo [this message]
