From: Miao Xie <miaox@cn.fujitsu.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 2/5] Btrfs: fix missing flush when committing a transaction
Date: Thu, 01 Nov 2012 16:16:43 +0800 [thread overview]
Message-ID: <50922FEB.8030900@cn.fujitsu.com> (raw)
In-Reply-To: <20121101080420.GA2554@liubo.cn.oracle.com>
On thu, 1 Nov 2012 16:04:27 +0800, Liu Bo wrote:
> (sorry, forgot to cc linux-btrfs.)
> On Thu, Nov 01, 2012 at 03:51:41PM +0800, Miao Xie wrote:
>> On Thu, 1 Nov 2012 15:44:43 +0800, Liu Bo wrote:
>>> On Thu, Nov 01, 2012 at 03:33:14PM +0800, Miao Xie wrote:
>>>> Consider the following case:
>>>> Task1 Task2
>>>> start_transaction
>>>> commit_transaction
>>>> check pending snapshots list and the
>>>> list is empty.
>>>> add pending snapshot into list
>>>> skip the delalloc flush
>>>> end_transaction
>>>> ...
>>>>
>>>> And then the problem that the snapshot is different with the source subvolume
>>>> happen.
>>>>
>>>
>>> This is weird, create_snapshot() will first add pending snapshot into
>>> list and then commit the transaction itself, regardless of if the
>>> snapshot is different with others or not.
>>
>> But the transaction may be committed by the other task, and the snapshot
>> creation task just wait until it ends.
>>
>
> It's possible that a commit tranaction becomes a end transaction when it
> finds itself is already in commit.
>
> So if snapshot creation starts the transaction, it will increment the
> transaction's num_writers, why does not the other task wait for its
> end_transacion?
>
> I doubt if this can really happen anyway...
>
> Can you elaborate the situation more?
Task1 Task2
start_transaction
start_transaction
commit_transaction
set in_commit to 1
check pending snapshots list and the list is empty.
add pending snapshot into list
skip the delalloc flush
commit_transaction
find in_commit is 1
end_transaction (num_writer--)
wait_for_commit
num_writer is 1
continue committing the transaction
...
Thanks
Miao
>
> thanks,
> liubo
>
>>>
>>> How do you find this?
>>
>> Just by review the code. I think it can be triggered
>>
>> Thanks
>> Miao
>>
>>>
>>> thanks,
>>> liubo
>>>
>>>> This patch fixes the above problem by flush all pending stuffs when all the
>>>> other tasks end the transaction.
>>>>
>>>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
>>>> ---
>>>> fs/btrfs/transaction.c | 74 ++++++++++++++++++++++++++++++-----------------
>>>> 1 files changed, 47 insertions(+), 27 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>>> index 6d0d5a0..d9a9a70 100644
>>>> --- a/fs/btrfs/transaction.c
>>>> +++ b/fs/btrfs/transaction.c
>>>> @@ -1401,6 +1401,48 @@ static void cleanup_transaction(struct btrfs_trans_handle *trans,
>>>> kmem_cache_free(btrfs_trans_handle_cachep, trans);
>>>> }
>>>>
>>>> +static int btrfs_flush_all_pending_stuffs(struct btrfs_trans_handle *trans,
>>>> + struct btrfs_root *root)
>>>> +{
>>>> + int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT);
>>>> + int snap_pending = 0;
>>>> + int ret;
>>>> +
>>>> + if (!flush_on_commit) {
>>>> + spin_lock(&root->fs_info->trans_lock);
>>>> + if (!list_empty(&trans->transaction->pending_snapshots))
>>>> + snap_pending = 1;
>>>> + spin_unlock(&root->fs_info->trans_lock);
>>>> + }
>>>> +
>>>> + if (flush_on_commit || snap_pending) {
>>>> + btrfs_start_delalloc_inodes(root, 1);
>>>> + btrfs_wait_ordered_extents(root, 1);
>>>> + }
>>>> +
>>>> + ret = btrfs_run_delayed_items(trans, root);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + /*
>>>> + * running the delayed items may have added new refs. account
>>>> + * them now so that they hinder processing of more delayed refs
>>>> + * as little as possible.
>>>> + */
>>>> + btrfs_delayed_refs_qgroup_accounting(trans, root->fs_info);
>>>> +
>>>> + /*
>>>> + * rename don't use btrfs_join_transaction, so, once we
>>>> + * set the transaction to blocked above, we aren't going
>>>> + * to get any new ordered operations. We can safely run
>>>> + * it here and no for sure that nothing new will be added
>>>> + * to the list
>>>> + */
>>>> + btrfs_run_ordered_operations(root, 1);
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> /*
>>>> * btrfs_transaction state sequence:
>>>> * in_commit = 0, blocked = 0 (initial)
>>>> @@ -1418,7 +1460,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>>>> int ret = -EIO;
>>>> int should_grow = 0;
>>>> unsigned long now = get_seconds();
>>>> - int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT);
>>>>
>>>> btrfs_run_ordered_operations(root, 0);
>>>>
>>>> @@ -1491,39 +1532,14 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>>>> should_grow = 1;
>>>>
>>>> do {
>>>> - int snap_pending = 0;
>>>> -
>>>> joined = cur_trans->num_joined;
>>>> - if (!list_empty(&trans->transaction->pending_snapshots))
>>>> - snap_pending = 1;
>>>>
>>>> WARN_ON(cur_trans != trans->transaction);
>>>>
>>>> - if (flush_on_commit || snap_pending) {
>>>> - btrfs_start_delalloc_inodes(root, 1);
>>>> - btrfs_wait_ordered_extents(root, 1);
>>>> - }
>>>> -
>>>> - ret = btrfs_run_delayed_items(trans, root);
>>>> + ret = btrfs_flush_all_pending_stuffs(trans, root);
>>>> if (ret)
>>>> goto cleanup_transaction;
>>>>
>>>> - /*
>>>> - * running the delayed items may have added new refs. account
>>>> - * them now so that they hinder processing of more delayed refs
>>>> - * as little as possible.
>>>> - */
>>>> - btrfs_delayed_refs_qgroup_accounting(trans, root->fs_info);
>>>> -
>>>> - /*
>>>> - * rename don't use btrfs_join_transaction, so, once we
>>>> - * set the transaction to blocked above, we aren't going
>>>> - * to get any new ordered operations. We can safely run
>>>> - * it here and no for sure that nothing new will be added
>>>> - * to the list
>>>> - */
>>>> - btrfs_run_ordered_operations(root, 1);
>>>> -
>>>> prepare_to_wait(&cur_trans->writer_wait, &wait,
>>>> TASK_UNINTERRUPTIBLE);
>>>>
>>>> @@ -1536,6 +1552,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>>>> } while (atomic_read(&cur_trans->num_writers) > 1 ||
>>>> (should_grow && cur_trans->num_joined != joined));
>>>>
>>>> + ret = btrfs_flush_all_pending_stuffs(trans, root);
>>>> + if (ret)
>>>> + goto cleanup_transaction;
>>>> +
>>>> /*
>>>> * Ok now we need to make sure to block out any other joins while we
>>>> * commit the transaction. We could have started a join before setting
>>>> --
>>>> 1.7.6.5
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
next prev parent reply other threads:[~2012-11-01 8:16 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-01 7:33 [PATCH 2/5] Btrfs: fix missing flush when committing a transaction Miao Xie
[not found] ` <20121101074443.GC1591@liubo.cn.oracle.com>
[not found] ` <50922A0D.80103@cn.fujitsu.com>
2012-11-01 8:04 ` Liu Bo
2012-11-01 8:16 ` Miao Xie [this message]
2012-11-01 9:00 ` Liu Bo
2012-11-01 10:18 ` Miao Xie
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50922FEB.8030900@cn.fujitsu.com \
--to=miaox@cn.fujitsu.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.