From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Jeff Mahoney <jeffm@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: qgroups, properly handle no reservations
Date: Thu, 22 Feb 2018 10:05:36 +0800 [thread overview]
Message-ID: <ac3ffb4b-c60a-fd1e-be50-db99376a6e1e@gmx.com> (raw)
In-Reply-To: <f0c73e34-dc33-d2eb-56d6-2b936f1f967b@suse.com>
[-- Attachment #1.1: Type: text/plain, Size: 6957 bytes --]
On 2018年02月22日 09:50, Jeff Mahoney wrote:
> On 2/21/18 8:36 PM, Qu Wenruo wrote:
>>
>>
>> On 2018年02月22日 04:19, jeffm@suse.com wrote:
>>> From: Jeff Mahoney <jeffm@suse.com>
>>>
>>> There are several places where we call btrfs_qgroup_reserve_meta and
>>> assume that a return value of 0 means that the reservation was successful.
>>>
>>> Later, we use the original bytes value passed to that call to free
>>> bytes during error handling or to pass the number of bytes reserved to
>>> the caller.
>>>
>>> This patch returns -ENODATA when we don't perform a reservation so that
>>> callers can make the distinction. This also lets call sites not
>>> necessarily care whether qgroups are enabled.
>>
>> IMHO if we don't need to reserve, returning 0 seems good enough.
>> Caller doesn't really need to care if it has reserved some bytes.
>>
>> Or is there any special case where we need to distinguish such case?
>
> Anywhere where the reservation takes place prior to the transaction
> starting, which is pretty much everywhere. We wait until transaction
> commit to flip the bit to turn on quotas, which means that if a
> transaction commits that enables quotas lands in between the reservation
> being take and any error handling that involves freeing the reservation,
> we'll end up with an underflow.
So the same case as btrfs_qgroup_reserve_data().
In that case we could use ret > 0 to indicates the real bytes we
reserved, instead of -ENODATA which normally means error.
>
> This is the first patch of a series I'm working on, but it can stand
> alone. The rest is the patch set I mentioned when we talked a few
> months ago where the lifetimes of reservations are incorrect. We can't
> just drop all the reservations at the end of the transaction because 1)
> the lifetime of some reservations can cross transactions and 2) because
> especially in the start_transaction case, we do the reservation prior to
> waiting to join the transaction. So if the transaction we're waiting on
> commits, our reservation goes away with it but we continue on as if we
> still have it.
Right, the same problem I also addressed in patchset "[PATCH v2 00/10]
Use split qgroup rsv type".
In 6th patch, "[PATCH v2 06/10] btrfs: qgroup: Use
root->qgroup_meta_rsv_* to record qgroup meta reserved space" qgroup
meta reserve will only be increased if we succeeded in reserving
metadata, so later free won't underflow that number.
Thanks,
Qu
>
> -Jeff
>
>> Thanks,
>> Qu
>>
>>>
>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>>> ---
>>> fs/btrfs/extent-tree.c | 33 ++++++++++++++++-----------------
>>> fs/btrfs/qgroup.c | 4 ++--
>>> fs/btrfs/transaction.c | 5 ++++-
>>> 3 files changed, 22 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index c1618ab9fecf..2d5e963fae88 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -5988,20 +5988,18 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
>>> u64 *qgroup_reserved,
>>> bool use_global_rsv)
>>> {
>>> - u64 num_bytes;
>>> int ret;
>>> struct btrfs_fs_info *fs_info = root->fs_info;
>>> struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
>>> + /* One for parent inode, two for dir entries */
>>> + u64 num_bytes = 3 * fs_info->nodesize;
>>>
>>> - if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>>> - /* One for parent inode, two for dir entries */
>>> - num_bytes = 3 * fs_info->nodesize;
>>> - ret = btrfs_qgroup_reserve_meta(root, num_bytes, true);
>>> - if (ret)
>>> - return ret;
>>> - } else {
>>> + ret = btrfs_qgroup_reserve_meta(root, num_bytes, true);
>>> + if (ret == -ENODATA) {
>>> num_bytes = 0;
>>> - }
>>> + ret = 0;
>>> + } else if (ret)
>>> + return ret;
>>>
>>> *qgroup_reserved = num_bytes;
>>>
>>> @@ -6057,6 +6055,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
>>> enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_FLUSH_ALL;
>>> int ret = 0;
>>> bool delalloc_lock = true;
>>> + u64 qgroup_reserved;
>>>
>>> /* If we are a free space inode we need to not flush since we will be in
>>> * the middle of a transaction commit. We also don't need the delalloc
>>> @@ -6090,17 +6089,17 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
>>> btrfs_calculate_inode_block_rsv_size(fs_info, inode);
>>> spin_unlock(&inode->lock);
>>>
>>> - if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>>> - ret = btrfs_qgroup_reserve_meta(root,
>>> - nr_extents * fs_info->nodesize, true);
>>> - if (ret)
>>> - goto out_fail;
>>> - }
>>> + qgroup_reserved = nr_extents * fs_info->nodesize;
>>> + ret = btrfs_qgroup_reserve_meta(root, qgroup_reserved, true);
>>> + if (ret == -ENODATA) {
>>> + ret = 0;
>>> + qgroup_reserved = 0;
>>> + } if (ret)
>>> + goto out_fail;
>>>
>>> ret = btrfs_inode_rsv_refill(inode, flush);
>>> if (unlikely(ret)) {
>>> - btrfs_qgroup_free_meta(root,
>>> - nr_extents * fs_info->nodesize);
>>> + btrfs_qgroup_free_meta(root, qgroup_reserved);
>>> goto out_fail;
>>> }
>>>
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index aa259d6986e1..5d9e011243c6 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -3025,7 +3025,7 @@ int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
>>>
>>> if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
>>> !is_fstree(root->objectid) || num_bytes == 0)
>>> - return 0;
>>> + return -ENODATA;
>>>
>>> BUG_ON(num_bytes != round_down(num_bytes, fs_info->nodesize));
>>> trace_qgroup_meta_reserve(root, (s64)num_bytes);
>>> @@ -3057,7 +3057,7 @@ void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
>>> struct btrfs_fs_info *fs_info = root->fs_info;
>>>
>>> if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
>>> - !is_fstree(root->objectid))
>>> + !is_fstree(root->objectid) || num_bytes == 0)
>>> return;
>>>
>>> BUG_ON(num_bytes != round_down(num_bytes, fs_info->nodesize));
>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>> index 04f07144b45c..ab67b73bd7fa 100644
>>> --- a/fs/btrfs/transaction.c
>>> +++ b/fs/btrfs/transaction.c
>>> @@ -510,7 +510,10 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
>>> qgroup_reserved = num_items * fs_info->nodesize;
>>> ret = btrfs_qgroup_reserve_meta(root, qgroup_reserved,
>>> enforce_qgroups);
>>> - if (ret)
>>> + if (ret == -ENODATA) {
>>> + ret = 0;
>>> + qgroup_reserved = 0;
>>> + } else if (ret)
>>> return ERR_PTR(ret);
>>>
>>> num_bytes = btrfs_calc_trans_metadata_size(fs_info, num_items);
>>>
>>
>
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]
next prev parent reply other threads:[~2018-02-22 2:05 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-21 20:19 [PATCH] btrfs: qgroups, properly handle no reservations jeffm
2018-02-22 1:36 ` Qu Wenruo
2018-02-22 1:50 ` Jeff Mahoney
2018-02-22 2:05 ` Qu Wenruo [this message]
2018-03-07 16:02 ` David Sterba
2018-03-08 1:02 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ac3ffb4b-c60a-fd1e-be50-db99376a6e1e@gmx.com \
--to=quwenruo.btrfs@gmx.com \
--cc=jeffm@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).