From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <fdmanana@gmail.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
Mark Fasheh <mfasheh@suse.de>
Subject: Re: [PATCH v2] btrfs: qgroup: Fix qgroup accounting when creating snapshot
Date: Thu, 14 Apr 2016 09:11:51 +0800 [thread overview]
Message-ID: <570EEE57.4000201@cn.fujitsu.com> (raw)
In-Reply-To: <CAL3q7H5k_NWHFwvVQSdDLr0Gsg+ENgG3QSd_24Vi0XU3NZ-OFg@mail.gmail.com>
Filipe Manana wrote on 2016/04/13 17:23 +0100:
> On Tue, Apr 12, 2016 at 8:35 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> Current btrfs qgroup design implies a requirement that after calling
>> btrfs_qgroup_account_extents() there must be a commit root switch.
>>
>> Normally this is OK, as btrfs_qgroup_accounting_extents() is only called
>> inside btrfs_commit_transaction() just be commit_cowonly_roots().
>>
>> However there is a exception at create_pending_snapshot(), which will
>> call btrfs_qgroup_account_extents() but no any commit root switch.
>>
>> In case of creating a snapshot whose parent root is itself (create a
>> snapshot of fs tree), it will corrupt qgroup by the following trace:
>> (skipped unrelated data)
>> ======
>> btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 1
>> qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 0, excl = 0
>> qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 16384, excl = 16384
>> btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0
>> ======
>>
>> The problem here is in first qgroup_account_extent(), the
>> nr_new_roots of the extent is 1, which means its reference got
>> increased, and qgroup increased its rfer and excl.
>>
>> But at second qgroup_account_extent(), its reference got decreased, but
>> between these two qgroup_account_extent(), there is no switch roots.
>> This leads to the same nr_old_roots, and this extent just got ignored by
>> qgroup, which means this extent is wrongly accounted.
>>
>> Fix it by call commit_cowonly_roots() after qgroup_account_extent() in
>> create_pending_snapshot(), with needed preparation.
>>
>> Reported-by: Mark Fasheh <mfasheh@suse.de>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>> changelog:
>> v2:
>> Fix a soft lockup caused by missing switch_commit_root() call.
>> Fix a warning caused by dirty-but-not-committed root.
>>
>> Note:
>> This may be the dirtiest hack I have ever done.
>
> I don't like it either. But, more importantly, I don't think this is
> correct. See below.
>
>> As there are already several different judgment to check if a fs root
>> should be updated. From root->last_trans to root->commit_root ==
>> root->node.
>>
>> With this patch, we must switch the root of at least related fs tree
>> and extent tree to allow qgroup to call
>> btrfs_qgroup_account_extents().
>> But this will break some transid judgement, as transid is already
>> updated to current transid.
>> (maybe we need a special sub-transid for qgroup use only?)
>>
>> As long as current qgroup use commit_root to determine old_roots,
>> there is no better idea though.
>> ---
>> fs/btrfs/transaction.c | 96 +++++++++++++++++++++++++++++++++++++-------------
>> 1 file changed, 71 insertions(+), 25 deletions(-)
>>
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index 43885e5..0f299a56 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -311,12 +311,13 @@ loop:
>> * when the transaction commits
>> */
>> static int record_root_in_trans(struct btrfs_trans_handle *trans,
>> - struct btrfs_root *root)
>> + struct btrfs_root *root,
>> + int force)
>> {
>> - if (test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
>> - root->last_trans < trans->transid) {
>> + if ((test_bit(BTRFS_ROOT_REF_COWS, &root->state) &&
>> + root->last_trans < trans->transid) || force) {
>> WARN_ON(root == root->fs_info->extent_root);
>> - WARN_ON(root->commit_root != root->node);
>> + WARN_ON(root->commit_root != root->node && !force);
>>
>> /*
>> * see below for IN_TRANS_SETUP usage rules
>> @@ -331,7 +332,7 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans,
>> smp_wmb();
>>
>> spin_lock(&root->fs_info->fs_roots_radix_lock);
>> - if (root->last_trans == trans->transid) {
>> + if (root->last_trans == trans->transid && !force) {
>> spin_unlock(&root->fs_info->fs_roots_radix_lock);
>> return 0;
>> }
>> @@ -402,7 +403,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle *trans,
>> return 0;
>>
>> mutex_lock(&root->fs_info->reloc_mutex);
>> - record_root_in_trans(trans, root);
>> + record_root_in_trans(trans, root, 0);
>> mutex_unlock(&root->fs_info->reloc_mutex);
>>
>> return 0;
>> @@ -1383,7 +1384,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>> dentry = pending->dentry;
>> parent_inode = pending->dir;
>> parent_root = BTRFS_I(parent_inode)->root;
>> - record_root_in_trans(trans, parent_root);
>> + record_root_in_trans(trans, parent_root, 0);
>>
>> cur_time = current_fs_time(parent_inode->i_sb);
>>
>> @@ -1420,7 +1421,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>> goto fail;
>> }
>>
>> - record_root_in_trans(trans, root);
>> + record_root_in_trans(trans, root, 0);
>> btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
>> memcpy(new_root_item, &root->root_item, sizeof(*new_root_item));
>> btrfs_check_and_init_root_item(new_root_item);
>> @@ -1516,6 +1517,62 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>> goto fail;
>> }
>>
>> + /*
>> + * Account qgroups before insert the dir item
>> + * As such dir item insert will modify parent_root, which could be
>> + * src root. If we don't do it now, wrong accounting may be inherited
>> + * to snapshot qgroup.
>> + *
>> + * For reason locking tree_log_mutex, see btrfs_commit_transaction()
>> + * comment
>> + */
>> + mutex_lock(&root->fs_info->tree_log_mutex);
>> +
>> + ret = commit_fs_roots(trans, root);
>> + if (ret) {
>> + mutex_unlock(&root->fs_info->tree_log_mutex);
>> + goto fail;
>> + }
>> +
>> + ret = btrfs_qgroup_prepare_account_extents(trans, root->fs_info);
>> + if (ret < 0) {
>> + mutex_unlock(&root->fs_info->tree_log_mutex);
>> + goto fail;
>> + }
>> + ret = btrfs_qgroup_account_extents(trans, root->fs_info);
>> + if (ret < 0) {
>> + mutex_unlock(&root->fs_info->tree_log_mutex);
>> + goto fail;
>> + }
>> + /*
>> + * Now qgroup are all updated, we can inherit it to new qgroups
>> + */
>> + ret = btrfs_qgroup_inherit(trans, fs_info,
>> + root->root_key.objectid,
>> + objectid, pending->inherit);
>> + if (ret < 0) {
>> + mutex_unlock(&root->fs_info->tree_log_mutex);
>> + goto fail;
>> + }
>> + /*
>> + * qgroup_account_extents() must be followed by a
>> + * switch_commit_roots(), or next qgroup_account_extents() will
>> + * be corrupted
>> + */
>> + ret = commit_cowonly_roots(trans, root);
>> + if (ret) {
>> + mutex_unlock(&root->fs_info->tree_log_mutex);
>> + goto fail;
>> + }
>> + /*
>> + * Just like in btrfs_commit_transaction(), we need to
>> + * switch_commit_roots().
>> + * However this time we don't need to do a full one,
>> + * excluding tree root and chunk root should be OK.
>> + */
>> + switch_commit_roots(trans->transaction, root->fs_info);
>
> This will undo commit 2b9dbef272b63c561aab0a5be34fd428f7b710f5 (Btrfs:
> keep dropped roots in cache until transaction commit). Won't it?
>
> So create_pending_snapshot() / create_pending_snapshots() are called
> at transaction commit before btrfs_qgroup_account_extents() is called.
> And with create_pending_snapshot() now calling switch_commit_roots()
> it means the roots for deleted snapshots and subvolumes are now gone
> before btrfs_qgroup_account_extents() gets called, so it can't do the
> correct calculations anymore for the cases described in that commit.
> That is, btrfs_qgroup_account_extents() must always be called before
> switch_commit_roots().
> Or did I miss something?
You're right, in create_pending_snapshot(), we should not delete the
dropped roots.
Although that's not hard to fix, we can add a new parameter to
switch_commit_roots() and not to delete dropped roots at
create_pending_snapshot().
Only the final switch_commit_roots() will really remove dropped roots.
(Just making the already dirty hack more ugly)
I hope to find a better method to handle such case, but still no idea yet.
My idea was to introduce sub-transid, but that's something existed but
then removed, and won't really make things significantly better.
If any advice can be provided it would be quite nice.
Thanks,
Qu
>
> We should have had a test case for that commit mentioned above, but
> that's another story...
>
>> + mutex_unlock(&root->fs_info->tree_log_mutex);
>> +
>> ret = btrfs_insert_dir_item(trans, parent_root,
>> dentry->d_name.name, dentry->d_name.len,
>> parent_inode, &key,
>> @@ -1527,6 +1584,12 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>> goto fail;
>> }
>>
>> + /*
>> + * Force parent root to be updated, as we recorded it before its
>> + * last_trans == cur_transid
>> + */
>> + record_root_in_trans(trans, parent_root, 1);
>> +
>> btrfs_i_size_write(parent_inode, parent_inode->i_size +
>> dentry->d_name.len * 2);
>> parent_inode->i_mtime = parent_inode->i_ctime =
>> @@ -1559,23 +1622,6 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>> goto fail;
>> }
>>
>> - /*
>> - * account qgroup counters before qgroup_inherit()
>> - */
>> - ret = btrfs_qgroup_prepare_account_extents(trans, fs_info);
>> - if (ret)
>> - goto fail;
>> - ret = btrfs_qgroup_account_extents(trans, fs_info);
>> - if (ret)
>> - goto fail;
>> - ret = btrfs_qgroup_inherit(trans, fs_info,
>> - root->root_key.objectid,
>> - objectid, pending->inherit);
>> - if (ret) {
>> - btrfs_abort_transaction(trans, root, ret);
>> - goto fail;
>> - }
>> -
>> fail:
>> pending->error = ret;
>> dir_item_existed:
>> --
>> 2.8.0
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
next prev parent reply other threads:[~2016-04-14 1:12 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-12 7:35 [PATCH v2] btrfs: qgroup: Fix qgroup accounting when creating snapshot Qu Wenruo
2016-04-13 16:23 ` Filipe Manana
2016-04-14 1:11 ` Qu Wenruo [this message]
2016-04-14 1:30 ` Qu Wenruo
2016-04-14 9:21 ` Filipe Manana
2016-04-14 12:04 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=570EEE57.4000201@cn.fujitsu.com \
--to=quwenruo@cn.fujitsu.com \
--cc=fdmanana@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=mfasheh@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox